Query Log Analysis for Improving User Access to NCBI Web Services
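Basic Information
- Grant number: 8943212
- Principal investigator:
- Amount: $208,000
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not concluded
- Source:
- Keywords: Appetitive Behavior; Biological; Case-Control Studies; Data; Databases; Drug Formulations; Evaluation; Goals; Information Services; Internet; Investigation; Journals; Lead; Link; Methods; Molecular Biology; Names; Population; Process; PubMed; Publishing; Research; Resources; Retrieval; Source; System; base; design; experience; improved; meetings; operation; research and development; success; web services
Project Abstract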
Over the last decade, the online search for biological information has progressed rapidly and has become an integral part of the scientific discovery process. Today, it is virtually impossible to conduct R&D in biomedicine without relying on the kind of Web resources developed and maintained by the NCBI. Indeed, each day millions of users search for biological information via NCBI's online Entrez system. However, finding data relevant to a user's information need is not always easy in Entrez. Improving our understanding of the growing population of Entrez users, their information needs, and the way in which they meet these needs opens opportunities to improve the information services and information access provided by NCBI. One resource for understanding and characterizing the patrons of search engines is their transaction logs. Our previous investigation of PubMed query logs led us to develop and deploy several useful applications that assist user searches and retrieval, in particular two query formulation aids in PubMed: Related Queries and Query Autocomplete. Inspired by this success, we have continued using log analysis to identify research problems that are closely related to NCBI operations.
Among all Entrez databases, PubMed is the most heavily used and often serves as an entry point for people to access related data in other Entrez databases. In 2013-2014, we compared two automatic approaches for computing relatedness between journals: one compares similar articles published by the two journals, and the other compares articles (in the two journals) that were accessed by the same set of users in PubMed query logs. The two methods are thus built on distinct sources: article content vs. usage. Accordingly, we found significant differences between the results of the two approaches. Furthermore, we compared both methods to a third approach that is based on article citation information. In a case study, the comparison shows that the usage-based method produces results similar to those based on article citation information. This is not unexpected, because previous research has suggested correlations between article access and citations. Taken together, this research demonstrates that content similarity and usage information in query logs can be complementary in finding related items (e.g. related journals; related articles). The article usage information in query logs could be particularly useful when citation information is not available.
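The usage-based idea can be illustrated with a minimal sketch. The abstract does not state which similarity measure was used, so the Jaccard overlap of user sets below is an assumption, and the log format, function names, and toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def journal_user_sets(log_rows):
    """Map each journal to the set of users who accessed any of its articles."""
    users_by_journal = defaultdict(set)
    for user_id, journal in log_rows:
        users_by_journal[journal].add(user_id)
    return users_by_journal


def jaccard(a, b):
    """Jaccard overlap of two sets (an assumed measure, not the authors')."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def usage_relatedness(log_rows):
    """Score every journal pair by the overlap of the users who accessed them."""
    users = journal_user_sets(log_rows)
    return {
        (j1, j2): jaccard(users[j1], users[j2])
        for j1, j2 in combinations(sorted(users), 2)
    }


# Hypothetical toy log: (user id, journal of the accessed article).
toy_log = [
    ("u1", "Journal A"), ("u2", "Journal A"), ("u3", "Journal A"),
    ("u2", "Journal B"), ("u3", "Journal B"),
    ("u4", "Journal C"),
]
scores = usage_relatedness(toy_log)
```

In this toy log, Journal A and Journal B share two of three distinct users and so score higher than either does with Journal C, which shares none. A content-based counterpart would replace the user sets with features derived from article text.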
In 2011, we analyzed PubMed logs to characterize user information needs and search behaviors. One of our main findings was that PubMed users frequently search for author names. However, author name ambiguity (e.g. multiple authors in PubMed share the name 'Zhiyong Lu') may lead to irrelevant retrieval results. To improve the PubMed user experience with author name queries, an author name disambiguation system based on author profiling and agglomerative clustering was recently developed. In particular, we contributed to the design and evaluation of this project in 2013-2014. When our system was integrated into the PubMed search engine, the overall click-through rate of PubMed users on author name query results improved from 34.9% to 36.9%.
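Agglomerative clustering for name disambiguation can be sketched as follows. This is not the deployed PubMed system: the profile features, the single-linkage rule, the Jaccard similarity, and the stopping threshold are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two feature sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_similarity(c1, c2, profiles):
    """Single linkage: the best similarity between any pair of members."""
    return max(jaccard(profiles[i], profiles[j]) for i in c1 for j in c2)


def agglomerative_cluster(profiles, threshold):
    """Merge the most similar clusters until no pair reaches the threshold."""
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = cluster_similarity(clusters[a], clusters[b], profiles)
                if best is None or s > best[0]:
                    best = (s, a, b)
        if best[0] < threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical profiles for four papers that carry the same author name.
profiles = [
    {"coauthor:A", "topic:text mining"},
    {"coauthor:A", "coauthor:B", "topic:text mining"},
    {"coauthor:C", "topic:optics"},
    {"coauthor:C", "coauthor:D", "topic:optics"},
]
clusters = agglomerative_cluster(profiles, threshold=0.5)
```

Each resulting cluster stands for one presumed individual. Here papers 0 and 1 end up together, as do papers 2 and 3, because the two groups share no features. The threshold controls how readily papers are attributed to the same person.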
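Project Outcomes
Number of journal articles (0)
Number of monographs (0)
Number of research awards (0)
Number of conference papers (0)
Number of patents (0)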
Other grants by Zhiyong Lu
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 9362446
- Fiscal year:
- Funding amount: $208,000
- Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
- Grant number: 9564626
- Fiscal year:
- Funding amount: $208,000
- Project category:
Machine Learning and Natural Language Processing for Biomedical Applications
- Grant number: 10927050
- Fiscal year:
- Funding amount: $208,000
- Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 10007525
- Fiscal year:
- Funding amount: $208,000
- Project category:
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
- Grant number: 8149607
- Fiscal year:
- Funding amount: $208,000
- Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 8558092
- Fiscal year:
- Funding amount: $208,000
- Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 9796762
- Fiscal year:
- Funding amount: $208,000
- Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
- Grant number: 8344934
- Fiscal year:
- Funding amount: $208,000
- Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 8943240
- Fiscal year:
- Funding amount: $208,000
- Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
- Grant number: 8558091
- Fiscal year:
- Funding amount: $208,000
- Project category:
Similar Overseas Grants
Defining the biological boundaries to sustain extant life on Mars
- Grant number: DP240102658
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Discovery Projects
Advanced Multiscale Biological Imaging using European Infrastructures
- Grant number: EP/Y036654/1
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Research Grant
Open Access Block Award 2024 - Marine Biological Association
- Grant number: EP/Z532538/1
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Research Grant
NSF/BIO-DFG: Biological Fe-S intermediates in the synthesis of nitrogenase metalloclusters
- Grant number: 2335999
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Standard Grant
DESIGN: Driving Culture Change in a Federation of Biological Societies via Cohort-Based Early-Career Leaders
- Grant number: 2334679
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Standard Grant
Collaborative Research: The Interplay of Water Condensation and Fungal Growth on Biological Surfaces
- Grant number: 2401507
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Standard Grant
REU Site: Modeling the Dynamics of Biological Systems
- Grant number: 2243955
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Standard Grant
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
- Grant number: 2411529
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Standard Grant
Collaborative Research: Conference: Large Language Models for Biological Discoveries (LLMs4Bio)
- Grant number: 2411530
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Standard Grant
Collaborative Research: NSF-ANR MCB/PHY: Probing Heterogeneity of Biological Systems by Force Spectroscopy
- Grant number: 2412551
- Fiscal year: 2024
- Funding amount: $208,000
- Project category: Standard Grant