Data analysis tools for leveraging massive public data to improve hypothesis-driven research
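Basic information
- Grant number: 10598130
- Principal investigator:
- Amount: $428,200
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2022
- Funding country: United States
- Project period: 2022-04-01 to 2027-02-28
- Project status: Ongoing
- Source: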
- Keywords: Acute; Biological; Collection; Communities; Computer software; Congresses; Data; Data Analyses; Data Sources; Development; Disease; Generations; Heart; Individual; Measurement; Medical; Methods; Molecular; National Institute of General Medical Sciences; Patients; Process; Reproducibility; Research; Research Personnel; Running; Sample Size; Sampling; Source; Speed; Statistical Data Interpretation; Statistical Methods; Technology; Training; United States; United States National Institutes of Health; Work; cost; crowdsourcing; data resource; design; experimental study; follow-up; high throughput technology; improved; large scale data; public repository; recruit; tool
Project summary
There is a crisis of reproducibility and replicability of scientific results. This crisis is an increasing source of
concern in both the scientific and popular press. The crisis is so acute that the United States Congress is currently
investigating reproducibility of the scientific process. At the heart of this crisis is a collection of problems including
small sample sizes, under-powered studies, under-trained data analysts, and an inability to directly leverage prior
results in the statistical analysis of smaller, hypothesis-driven experiments using high-throughput technologies.
Advances in technology have dramatically reduced the cost and difficulty of collecting high-throughput molecular
data. Large collections of raw data are increasingly publicly available but are usually incorporated into individual
analyses by NIGMS and other investigators on an ad hoc basis. Meanwhile, the other costs of running a designed,
hypothesis-driven study have not decreased at the same pace as technological advances. It is still expensive to
identify, recruit, collect, and follow up samples even if the high-throughput measurements themselves are cheap.
Despite the incredible amount of available public data, it is still common practice to perform statistical inference
in these hypothesis-driven experiments study-by-study, only indirectly including previous data, estimates, and
results. As a result, findings from these studies may be highly variable, unreliable, or unreplicable. Our group has focused
on developing statistical methods, data resources, software, and training that allow researchers to borrow
strength empirically from public repositories, large-scale data generation projects, and crowd-sourced data to
improve inference in individual, hypothesis-driven studies. We propose to build on our work in developing
statistical data sources, methods, software, and training that facilitate and speed the work of our biological and
medical collaborators. The result will be a research community that can take advantage of public data already
collected at a large cost to the NIH to improve power, reduce required sample sizes, and improve replication in
many new hypothesis-driven molecular studies of development and disorder.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Jeffrey T. Leek
Tackling the widespread and critical impact of batch effects in high-throughput data
- DOI: 10.1038/nrg2825
- Published: 2010-09-14
- Journal:
- Impact factor: 52.000
- Authors: Jeffrey T. Leek; Robert B. Scharpf; Héctor Corrada Bravo; David Simcha; Benjamin Langmead; W. Evan Johnson; Donald Geman; Keith Baggerly; Rafael A. Irizarry
- Corresponding author: Rafael A. Irizarry
Transparency and reproducibility in artificial intelligence
- DOI: 10.1038/s41586-020-2766-y
- Published: 2020-10-14
- Journal:
- Impact factor: 48.500
- Authors: Benjamin Haibe-Kains; George Alexandru Adam; Ahmed Hosny; Farnoosh Khodakarami; Levi Waldron; Bo Wang; Chris McIntosh; Anna Goldenberg; Anshul Kundaje; Casey S. Greene; Tamara Broderick; Michael M. Hoffman; Jeffrey T. Leek; Keegan Korthauer; Wolfgang Huber; Alvis Brazma; Joelle Pineau; Robert Tibshirani; Trevor Hastie; John P. A. Ioannidis; John Quackenbush; Hugo J. W. L. Aerts
- Corresponding author: Hugo J. W. L. Aerts
Erratum to: Practical impacts of genomic data “cleaning” on biological discovery using surrogate variable analysis
- DOI: 10.1186/s12859-016-1152-0
- Published: 2016-08-10
- Journal:
- Impact factor: 3.300
- Authors: Andrew E. Jaffe; Thomas Hyde; Joel Kleinman; Daniel R. Weinberger; Joshua G. Chenoweth; Ronald D. McKay; Jeffrey T. Leek; Carlo Colantuoni
- Corresponding author: Carlo Colantuoni
Other grants by Jeffrey T. Leek
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
- Grant number: 10330636
- Fiscal year: 2022
- Funding amount: $428,200
- Project category:
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
- Grant number: 10654376
- Fiscal year: 2022
- Funding amount: $428,200
- Project category:
A massive study of data science to address the scientific reproducibility crisis
- Grant number: 9100338
- Fiscal year: 2016
- Funding amount: $428,200
- Project category:
A massive study of data science to address the scientific reproducibility crisis
- Grant number: 9244046
- Fiscal year: 2016
- Funding amount: $428,200
- Project category:
Statistical models for biological and technical variation in RNA sequencing
- Grant number: 8593469
- Fiscal year: 2013
- Funding amount: $428,200
- Project category:
Statistical models for biological and technical variation in RNA sequencing
- Grant number: 9264553
- Fiscal year: 2013
- Funding amount: $428,200
- Project category:
Statistical models for biological and technical variation in RNA sequencing
- Grant number: 8722575
- Fiscal year: 2013
- Funding amount: $428,200
- Project category:
Similar overseas grants
ICBR Capacity: Biological Collections: Infrastructure improvement and data preservation of the Tetrapods Collection at the Ohio State University Museum of Biological Diversity.
- Grant number: 2312986
- Fiscal year: 2023
- Funding amount: $428,200
- Project category: Continuing Grant
Biological Collections: Crucial upgrades to specimen storage, organization, and database management at the rapidly growing Texas A&M Collection of Fishes
- Grant number: 2035082
- Fiscal year: 2021
- Funding amount: $428,200
- Project category: Standard Grant
ICBR Capacity: Biological Collections: Updates to the Operation of the Algal Resources Collection
- Grant number: 2113785
- Fiscal year: 2021
- Funding amount: $428,200
- Project category: Continuing Grant
FSML: Minding the Gap - Improving Year-Round Data Collection to Support Continued and Expanded Biological Research in the McMurdo Dry Valleys of Antarctica
- Grant number: 2114156
- Fiscal year: 2021
- Funding amount: $428,200
- Project category: Standard Grant
I-Corps: Venturi Vacuum Device for Biological and Particulate Sample Collection
- Grant number: 1952291
- Fiscal year: 2020
- Funding amount: $428,200
- Project category: Standard Grant
Design, prototype and testing of equipment for applying flock onto swab sticks for collection of biological specimens
- Grant number: 60607
- Fiscal year: 2020
- Funding amount: $428,200
- Project category: Feasibility Studies
CSBR: NATURAL HISTORY COLLECTIONS: TRANSFORMING ACCESSIBILITY TO THE RICH, SITE-BASED, MULTI-TAXON COLLECTION OF ARCHBOLD BIOLOGICAL STATION
- Grant number: 1458229
- Fiscal year: 2015
- Funding amount: $428,200
- Project category: Continuing Grant
Legal model of access to biological sample collection and profit sharing in France.
- Grant number: 26780047
- Fiscal year: 2014
- Funding amount: $428,200
- Project category: Grant-in-Aid for Young Scientists (B)
COLLECTION, ADVERTISEMENT AND DISTRIBUTION OF BIOLOGICAL RESPO
- Grant number: 8656609
- Fiscal year: 2013
- Funding amount: $428,200
- Project category:
COLLECTION, ADVERTISEMENT AND DISTRIBUTION OF BIOLOGICAL RESPO
- Grant number: 8317485
- Fiscal year: 2010
- Funding amount: $428,200
- Project category: