Query Log Analysis for Improving User Access to NCBI Web Services

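Basic information

  • Grant number:
    8558091
  • Principal investigator:
  • Amount:
    $260,300
  • Host institution:
  • Host institution country:
    United States
  • Project category:
  • Fiscal year:
  • Funding country:
    United States
  • Project period:
  • Project status:
    Not concluded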

Project abstract

Over the last decade, online search for biological information has progressed rapidly and become an integral part of the scientific discovery process. Today, it is virtually impossible to conduct R&D in biomedicine without relying on the kind of Web resources developed and maintained by the NCBI. Indeed, each day millions of users search for biological information via NCBI's online Entrez system. However, finding data relevant to a user's information need is not always easy in Entrez. Improving our understanding of the growing population of Entrez users, their information needs, and the ways in which they meet those needs opens opportunities to improve the information services and access provided by NCBI. One resource for understanding and characterizing the patrons of search engines is their transaction logs. Our previous investigation of PubMed query logs led us to develop and deploy several applications that assist users with query formulation in PubMed, namely Related Queries and Query Autocomplete. Inspired by their success, we have continued to use log analysis to identify research problems closely related to NCBI operations.

Among all Entrez databases, PubMed is the most used and often serves as an entry point to related data in other Entrez databases. In 2011-2012, we studied the usage of PubMed articles in relation to their citations. The citation count of an article has long been an important measure of its quality and impact. Recently there has been increasing interest in the correlation between citations and number of downloads, investigating whether the latter can act as a predictive indicator or an alternative means of evaluation. Our experiments based on the citation and query logs of PubMed show that there is a strong correlation between citation counts and the number of full-text accesses for PubMed articles. The highest correlation, 0.6, was obtained when 6-month total full-text access and 2-year total citations were counted and articles with fewer than 2 citations were excluded. As there is generally a lag between when an article is published and when it is cited in another article, we found that the best correlation occurs when citations are computed 3 months after publication. We also analyzed the public PLoS usage data and found that the correlation between citations (from CrossRef) and total PDF downloads is 0.655, which is very similar to the result on our PubMed dataset.

Another study on query log analysis that we conducted in 2011-2012 was the development of search filters from PubMed click-through data to enable topic-specific literature searches. Search filters have been developed and shown to improve access to the immense and ever-growing body of publications in the biomedical domain. However, to date the number of filters remains quite limited because current filter development methods require significant human involvement. We therefore developed an automated method to build topic-specific filters on the basis of users' search logs from PubMed. Specifically, for a given topic, we first detect relevant user queries and use their corresponding clicks to construct a topic-relevant article set. Next, we use statistics to identify terms that best represent the topic-relevant document set. Lastly, the selected representative terms are combined with Boolean operators and evaluated on benchmark datasets to derive the final filter with the best performance. We applied our method to develop filters for four clinical topics: nephrology, diabetes, pregnancy and depression. For the nephrology filter, our method achieved performance comparable to the state of the art (sensitivity of 91.3%, specificity of 98.7%, precision of 94.6%, accuracy of 97.2%). Similarly high performance (over 90% on all measures) was obtained for the other three search filters.
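
The four measures reported for the search filters can be illustrated with a small sketch. The abstract does not describe how the evaluation was implemented, so the function names, the set-based representation of articles, and the toy article IDs below are all assumptions made for illustration; only the standard definitions of sensitivity, specificity, precision and accuracy are relied on.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def filter_metrics(retrieved, relevant, collection):
    """Score a search filter against a benchmark (hypothetical helper).

    retrieved:  set of article IDs returned by the filter
    relevant:   set of article IDs judged on-topic in the benchmark
    collection: set of all article IDs in the benchmark
    """
    # Ignore any IDs that fall outside the benchmark collection.
    retrieved = retrieved & collection
    relevant = relevant & collection

    tp = len(retrieved & relevant)        # on-topic and returned
    fp = len(retrieved - relevant)        # off-topic but returned
    fn = len(relevant - retrieved)        # on-topic but missed
    tn = len(collection) - tp - fp - fn   # off-topic and not returned

    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "precision": safe_ratio(tp, tp + fp),
        "accuracy": safe_ratio(tp + tn, len(collection)),
    }


# Toy benchmark of ten articles; the IDs are made up, not PubMed data.
benchmark = set(range(1, 11))
on_topic = {1, 2, 3, 4}
returned = {1, 2, 3, 5}

metrics = filter_metrics(returned, on_topic, benchmark)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this framing, choosing "the final filter with the best performance" would amount to scoring each candidate Boolean combination of terms this way and keeping the best one; which measure drives that choice is not stated in the abstract.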
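
The citation and full-text-access analysis in the abstract can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, as are the function names, the (accesses, citations) record layout, and the toy numbers. The only detail taken from the text is the exclusion of articles with fewer than 2 citations.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def access_citation_correlation(articles, min_citations=2):
    """Correlate full-text accesses with citations (hypothetical helper).

    `articles` is a list of (full_text_accesses, citations) pairs.
    Articles with fewer than `min_citations` citations are dropped first.
    """
    kept = [(a, c) for a, c in articles if c >= min_citations]
    accesses = [a for a, _ in kept]
    citations = [c for _, c in kept]
    return pearson(accesses, citations)


# Toy data: (6-month full-text accesses, 2-year citations). Not real PubMed data.
toy_articles = [(120, 5), (300, 12), (45, 1), (80, 3), (500, 20), (10, 0)]
r = access_citation_correlation(toy_articles)
print(f"Pearson r over articles with >= 2 citations: {r:.3f}")
```

A rank-based coefficient such as Spearman's would be an equally plausible reading of the text; download and citation counts are typically heavy-tailed, which is one reason the choice of coefficient can matter.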

Project outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other grants by Zhiyong Lu

Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    9362446
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    9564626
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Machine Learning and Natural Language Processing for Biomedical Applications
  • Grant number:
    10927050
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    10007525
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Automatic Analysis and Annotation of Document Keywords in Biomedical Literature
  • Grant number:
    8149607
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    8558092
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    9796762
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    8344934
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Query Log Analysis for Improving User Access to NCBI Web Services
  • Grant number:
    8943212
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:
Named Entity Recognition and Relationship Extraction in Biomedicine
  • Grant number:
    8943240
  • Fiscal year:
  • Funding amount:
    $260,300
  • Project category:

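Similar NSFC grants

Research on DEA-Benchmarking Methods and Dynamic Games for Enterprise Performance Evaluation
  • Grant number:
    70571028
  • Approval year:
    2005
  • Funding amount:
    CNY 165,000
  • Project category:
    General Program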

Similar overseas grants

An innovative EDI data, insights & peer benchmarking platform enabling global business leaders to build data-led EDI strategies, plans and budgets.
  • Grant number:
    10100319
  • Fiscal year:
    2024
  • Funding amount:
    $260,300
  • Project category:
    Collaborative R&D
BioSynth Trust: Developing understanding and confidence in flow cytometry benchmarking synthetic datasets to improve clinical and cell therapy diagnos
  • Grant number:
    2796588
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
    Studentship
Collaborative Research: SHF: Medium: A Comprehensive Modeling Framework for Cross-Layer Benchmarking of In-Memory Computing Fabrics: From Devices to Applications
  • Grant number:
    2347024
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
    Standard Grant
Elements: CausalBench: A Cyberinfrastructure for Causal-Learning Benchmarking for Efficacy, Reproducibility, and Scientific Collaboration
  • Grant number:
    2311716
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
    Standard Grant
Benchmarking collisional rates and hot electron transport in high-intensity laser-matter interaction
  • Grant number:
    2892813
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
    Studentship
Collaborative Research: BeeHive: A Cross-Problem Benchmarking Framework for Network Biology
  • Grant number:
    2233969
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
    Continuing Grant
FET: Medium: Quantum Algorithms, Complexity, Testing and Benchmarking
  • Grant number:
    2311733
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
    Continuing Grant
Establishing and benchmarking advanced methods to comprehensively characterize somatic genome variation in single human cells
  • Grant number:
    10662975
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
Collaborative Research: BeeHive: A Cross-Problem Benchmarking Framework for Network Biology
  • Grant number:
    2233968
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
    Continuing Grant
Benchmarking Quantum Advantage
  • Grant number:
    EP/Y004418/1
  • Fiscal year:
    2023
  • Funding amount:
    $260,300
  • Project category:
    Research Grant