A massive study of data science to address the scientific reproducibility crisis
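Basic information
- Grant number: 9244046
- Principal investigator:
- Amount: $364,500
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2016
- Funding country: United States
- Project period: 2016-04-01 to 2020-03-31
- Project status: Completed
- Source: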
- Keywords: Acute; Address; Affect; Area; Behavior; Characteristics; Communication; Computer software; Congresses; Consensus; Course Content; Data; Data Analyses; Data Analytics; Data Collection; Data Science; Discipline; Disclosure; Drops; Education; Enrollment; Galaxy; Geography; Goals; Growth; Health; Heart; Knowledge; Lead; Measures; Medical; Methodology; Methods; Modeling; Price; Process; Protocols documentation; Publications; Randomized; Reproducibility; Research; Research Infrastructure; Research Personnel; Series; Source; Statistical Methods; Statistical Models; Students; Time; Training; Training Programs; United States; Variant; cohort; design; experience; experimental study; improved; massive open online courses; open source; programs; prospective; public health relevance; randomized trial; skills; statistics; success; tool
Project abstract
DESCRIPTION (provided by applicant): There is a crisis of reproducibility and replicability of scientific results. This crisis is an increasing source of concern in both the scientific and popular press. The crisis is so acute that the United States Congress is currently investigating the reproducibility of the scientific process. At the heart of the crisis is a shortage of data analytic skill throughout the scientific enterprise. There is an emerging consensus that the best way to address the crisis is to increase data analytic training, particularly around reproducibility and replicability. In this application we (1) propose the first formal statistical model for reproducibility and replicability and then use data and experiments from the largest massive online open program in data science in the world to (2) perform randomized studies to improve our knowledge about which statistical methods and protocols lead to increased reproducibility and replicability in the hands of average users and (3) analyze learner, course, and content characteristics that increase learner success and throughput, in order to increase the number of trained data analysts worldwide. To accomplish goals (2) and (3) we will use the largest and highest-throughput data science program in the world: the Johns Hopkins Data Science Specialization. This specialization, developed by the investigators of this project, consists of nine courses that are offered every month. Since the launch of this program in April 2014, these classes have seen more than two million enrollments, and nearly all of the learners' experiences have been recorded as data. Furthermore, the MOOC platform for this series permits random assignment of quiz questions and content. We will disseminate our results through open source software, analysis protocols, our popular blog, and the Data Science Specialization to maximally improve data science training and reduce the scientific replication and reproducibility problem.
The size of this program means that by increasing the quality of the program and the number of completing students by even a small percentage, we can affect global data analytic behavior.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Jeffrey T. Leek
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
- Grant number: 10598130
- Fiscal year: 2022
- Funding amount: $364,500
- Project category:
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
- Grant number: 10330636
- Fiscal year: 2022
- Funding amount: $364,500
- Project category:
Data analysis tools for leveraging massive public data to improve hypothesis-driven research
- Grant number: 10654376
- Fiscal year: 2022
- Funding amount: $364,500
- Project category:
A massive study of data science to address the scientific reproducibility crisis
- Grant number: 9100338
- Fiscal year: 2016
- Funding amount: $364,500
- Project category:
Statistical models for biological and technical variation in RNA sequencing
- Grant number: 8593469
- Fiscal year: 2013
- Funding amount: $364,500
- Project category:
Statistical models for biological and technical variation in RNA sequencing
- Grant number: 9264553
- Fiscal year: 2013
- Funding amount: $364,500
- Project category:
Statistical models for biological and technical variation in RNA sequencing
- Grant number: 8722575
- Fiscal year: 2013
- Funding amount: $364,500
- Project category:
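Similar National Natural Science Foundation of China (NSFC) grants
Research on neuromorphic vision object recognition algorithms driven by spatiotemporal sequences
- Grant number: 61906126
- Approval year: 2019
- Funding amount: 240,000 CNY
- Project category: Young Scientists Fund
Ontology-driven spatial semantic modeling of address data and address matching methods
- Grant number: 41901325
- Approval year: 2019
- Funding amount: 220,000 CNY
- Project category: Young Scientists Fund
Research on optimized address mapping table design and memory access optimization for large-capacity solid-state drives
- Grant number: 61802133
- Approval year: 2018
- Funding amount: 230,000 CNY
- Project category: Young Scientists Fund
Research on IP address-driven multipath routing and traffic transmission control
- Grant number: 61872252
- Approval year: 2018
- Funding amount: 640,000 CNY
- Project category: General Program
Research on memory safety defense techniques for targets of memory attacks
- Grant number: 61802432
- Approval year: 2018
- Funding amount: 250,000 CNY
- Project category: Young Scientists Fund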
Similar overseas grants
Climate Change Effects on Pregnancy via a Traditional Food
- Grant number: 10822202
- Fiscal year: 2024
- Funding amount: $364,500
- Project category:
Effects of Aging on Neuronal Lysosomal Damage Responses Driven by CMT2B-linked Rab7
- Grant number: 10678789
- Fiscal year: 2023
- Funding amount: $364,500
- Project category:
Functional, structural, and computational consequences of NMDA receptor ablation at medial prefrontal cortex synapses
- Grant number: 10677047
- Fiscal year: 2023
- Funding amount: $364,500
- Project category:
Design and testing of a novel circumesophageal cuff for chronic bilateral subdiaphragmatic vagal nerve stimulation (sVNS)
- Grant number: 10702126
- Fiscal year: 2023
- Funding amount: $364,500
- Project category:
Rapid measurement of novel harm reduction housing on HIV risk, treatment uptake, drug use and supply
- Grant number: 10701309
- Fiscal year: 2023
- Funding amount: $364,500
- Project category: