Query Log Analysis for Improving User Access to NCBI Web Services
Basic information
- Grant number: 9564626
- Principal investigator:
- Amount: $1.6063 million
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year:
- Funding country: United States
- Project period:
- Project status: Not concluded
- Source:
- Keywords: Abbreviations; Algorithms; Biological; Caring; Data; Databases; Formulation; Funding; Goals; Grant; Guidelines; Improve Access; Information Services; Internet; Investigation; Journals; Knowledge; Learning; Link; Modeling; Molecular Biology; Names; Occupations; Paper; Population; Process; PubMed; Publishing; Research; Research Personnel; Resources; Retrieval; Source; Specific qualifier value; System; Text; Transact; United States National Institutes of Health; Work; base; improved; online resource; phrases; research and development; search engine; sensor; success; tool; virtual; web services
Project abstract
Over the last decade, online search for biological information has progressed rapidly and has become an integral part of the scientific discovery process. Today, it is virtually impossible to conduct R&D in biomedicine without relying on the kind of Web resources developed and maintained by the NCBI. Indeed, each day millions of users search for biological information via NCBI's online Entrez system. However, finding data relevant to a user's information need is not always easy in Entrez. A better understanding of the growing population of Entrez users, their information needs, and the ways in which they meet those needs opens opportunities to improve the information services and information access provided by NCBI.
Among all Entrez databases, PubMed is the most used and often serves as an entry point for people to access related data in other databases. One resource for understanding and characterizing users of the PubMed search engine is its transaction logs. Our previous investigation of PubMed search logs led us to develop and deploy several applications that assist users with searching and retrieval, in particular with query formulation in PubMed: Related Queries, Query Autocomplete, and Author Name Disambiguation.
Inspired by past success, we have continued using log analysis to improve access to NCBI resources. For example, we have used user clicks to identify articles that users considered relevant to their own queries. In 2016-2017, we used deep learning models to understand the relationship between a query and the content of potentially relevant articles. This approach is robust and outperforms both traditional IR algorithms and related shallow and deep models based on continuous representations of text, with better results on the under-specified query and term mismatch problems.
Of course, multiple factors indicate whether an article is relevant to the searcher. These include the connection between the query and the content, how recent the article is, whether other people found the article relevant, etc. PubMed's new Best Match sort order (using a Learning to Rank algorithm) combines a number of different scores and sources of information to identify the articles most relevant to a query. This has significantly improved the results of our relevance rankings since Spring 2017.
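As a rough illustration of combining several relevance signals into one ranking, the sketch below scores articles with a weighted sum. It is not the Best Match implementation: the `Article` fields, the `WEIGHTS` values, and the `rank` function are all hypothetical, and the hand-picked weights merely stand in for what a learning-to-rank model would learn from click data.

```python
from dataclasses import dataclass


@dataclass
class Article:
    pmid: str           # identifier (hypothetical values in the usage below)
    text_match: float   # query/content similarity, assumed scaled to [0, 1]
    recency: float      # assumed scaled so that 1.0 is the newest article
    click_rate: float   # assumed fraction of past searchers who clicked


# Hand-picked weights (an assumption); a learning-to-rank model would
# fit these from logged user behavior instead.
WEIGHTS = {"text_match": 0.6, "recency": 0.1, "click_rate": 0.3}


def score(article: Article) -> float:
    """Weighted sum of the three signals."""
    return (
        WEIGHTS["text_match"] * article.text_match
        + WEIGHTS["recency"] * article.recency
        + WEIGHTS["click_rate"] * article.click_rate
    )


def rank(articles: list) -> list:
    """Return articles ordered from highest to lowest combined score."""
    return sorted(articles, key=score, reverse=True)


if __name__ == "__main__":
    a = Article("A", text_match=0.9, recency=0.1, click_rate=0.1)
    b = Article("B", text_match=0.5, recency=0.9, click_rate=0.9)
    print([art.pmid for art in rank([a, b])])
```

In this toy input, article B ranks first even though A matches the query text more closely, because recency and click behavior also contribute to the score.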
We are continuing the effort begun by our work on TermVariants. When a term is used in a query, documents using equivalent terms are usually also desired. A seemingly trivial example is singular and plural forms of a term. But care must be taken to avoid retrieving irrelevant articles. For example, naively applying plural rules to abbreviations is often not helpful. Guidelines are being developed to show where these expansions will be helpful.
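The caution about abbreviations can be sketched as follows. This is a deliberately simplistic illustration, not the TermVariants method: `looks_like_abbreviation` and `expand_plural` are hypothetical names, and the "short, all upper case" test is an assumed heuristic.

```python
def looks_like_abbreviation(term: str) -> bool:
    """Assumed heuristic: short, all-upper-case tokens are abbreviations."""
    return len(term) <= 5 and term.isupper()


def expand_plural(term: str) -> set:
    """Return the term plus a naive '+s' plural, unless expansion looks risky.

    Abbreviations and terms already ending in 's' are left unexpanded.
    """
    if looks_like_abbreviation(term) or term.lower().endswith("s"):
        return {term}
    return {term, term + "s"}


if __name__ == "__main__":
    for t in ["tumor", "ALS", "AIDS"]:
        print(t, sorted(expand_plural(t)))
```

Here "tumor" gains the variant "tumors", while "ALS" and "AIDS" are left alone so that they are not treated as plurals or pluralized themselves.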
To better understand queries, we developed a Field Sensor to completely identify the portions and aims of a query. In other words, we identify which part of the query is an author name, a journal title, a date, or key phrases describing the knowledge the searcher would like to uncover. One practical use for this tool is reminding users who are looking for information, rather than for specific articles, about our improved relevance search.
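To make the idea of labeling query parts concrete, here is a minimal rule-based sketch. It is not the Field Sensor itself: the regular expressions, the tiny `KNOWN_JOURNALS` list, and the `tag_query` function are assumptions made purely for illustration.

```python
import re

# Hypothetical lookup list; a real system would need a full journal catalog.
KNOWN_JOURNALS = {"nature", "science", "bioinformatics"}
YEAR = re.compile(r"^(19|20)\d{2}$")               # four-digit year
AUTHOR = re.compile(r"^[A-Z][a-z]+ [A-Z]{1,2}$")   # e.g. "Lu Z"


def tag_query(query: str) -> dict:
    """Assign each token (or token pair) of a query to an assumed field."""
    fields = {"author": [], "journal": [], "date": [], "phrase": []}
    tokens = query.split()
    i = 0
    while i < len(tokens):
        pair = " ".join(tokens[i:i + 2])
        if len(tokens) - i >= 2 and AUTHOR.match(pair):
            fields["author"].append(pair)   # last name followed by initials
            i += 2
        elif YEAR.match(tokens[i]):
            fields["date"].append(tokens[i])
            i += 1
        elif tokens[i].lower() in KNOWN_JOURNALS:
            fields["journal"].append(tokens[i])
            i += 1
        else:
            fields["phrase"].append(tokens[i])  # everything else: key phrase
            i += 1
    return fields


if __name__ == "__main__":
    print(tag_query("Lu Z Bioinformatics 2017 query log analysis"))
```

A query whose tokens all fall into the `phrase` field is the kind of informational search that, per the text above, could trigger a reminder about relevance-based sorting.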
We continue to improve our handling and understanding of author names in PubMed articles. Principal Investigators on NIH-funded grants make up a particularly important subset of PubMed authors. Additional information about these authors is available from their grants. Information about published papers recorded in grants allows us to do a better job of connecting papers and authors. These authors can be identified more reliably across different institutional affiliations, across changes in research focus, and even across different names used by the same author.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Zhiyong Lu
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 9362446
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Machine Learning and Natural Language Processing for Biomedical Applications
- Grant number: 10927050
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 10007525
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
- Grant number: 8149607
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 9796762
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 8558092
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
- Grant number: 8344934
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
- Grant number: 8943212
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
- Grant number: 8943240
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
- Grant number: 8558091
- Fiscal year:
- Funding amount: $1.6063 million
- Project category:
Similar overseas grants
CAREER: Transferring biological networks emergent principles to drone swarm collaborative algorithms
- Grant number: 2339373
- Fiscal year: 2024
- Funding amount: $1.6063 million
- Project category: Continuing Grant
Point-of-care optical spectroscopy platform and novel ratio-metric algorithms for rapid and systematic functional characterization of biological models in vivo
- Grant number: 10655174
- Fiscal year: 2023
- Funding amount: $1.6063 million
- Project category:
Statistical Inference from Multiscale Biological Data: theory, algorithms, applications
- Grant number: EP/Y037375/1
- Fiscal year: 2023
- Funding amount: $1.6063 million
- Project category: Research Grant
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2021
- Funding amount: $1.6063 million
- Project category: Discovery Grants Program - Individual
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2019
- Funding amount: $1.6063 million
- Project category: Discovery Grants Program - Individual
Building flexible biological particle detection algorithms for emerging real-time instrumentation
- Grant number: 2278799
- Fiscal year: 2019
- Funding amount: $1.6063 million
- Project category: Studentship
CAREER: Microscopy Image Analysis to Aid Biological Discovery: Optics, Algorithms, and Community
- Grant number: 2019967
- Fiscal year: 2019
- Funding amount: $1.6063 million
- Project category: Standard Grant
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2018
- Funding amount: $1.6063 million
- Project category: Discovery Grants Program - Individual
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2017
- Funding amount: $1.6063 million
- Project category: Discovery Grants Program - Individual
Analysis of words: algorithms for biological sequences, music and texts
- Grant number: RGPIN-2016-03661
- Fiscal year: 2016
- Funding amount: $1.6063 million
- Project category: Discovery Grants Program - Individual