III: Medium: Detecting Low Dimensional Structures in Genomic Data

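Basic information

  • Award number:
    1705197
  • Principal investigator:
  • Amount:
    $1.1997 million
  • Host institution:
  • Host institution country:
    United States
  • Award type:
    Standard Grant
  • Fiscal year:
    2017
  • Funding country:
    United States
  • Project period:
    2017-08-15 to 2022-07-31
  • Project status:
    Completed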

Project abstract

New sequencing technologies have made genomics a big data science. These data are complex and span many variables. Extracting biological information from genomic sequence often requires reducing that complexity. A number of computational approaches exist, but they often introduce errors because of the assumptions they make about the data. This project will lead to the development of novel approaches specific to the type of genomic data collected. One of these data types represents the DNA sequence (genotype data); the other comes from natural modifications to the sequence when genes are expressed (methylation data). The new methods will identify important differences more accurately in the two data types by correctly modeling the unique properties of these data in a statistical framework. Methods developed during this project will have a great impact on the genomics field, where researchers may discover the genetic basis of complex diseases. The broader impacts of this project are gaining a deeper insight into the genetic basis of complex diseases, distributing the novel methods through public webservers and software tools for academic research and educational purposes, and training undergraduate students, graduate students, and postdoctoral scholars. In particular, this project will provide training to underrepresented groups with a summer intensive program that recruits minorities traditionally underrepresented in STEM fields.
Discovering a low dimensional structure in high dimensional genomic data is a very important procedure in genomic studies because this structure may reveal unknown confounding factors in the data as well as other important properties such as the ethnicity of individuals. Several dimensionality reduction methods are widely used in genomics, but they may not generate an accurate low dimensional structure from genomic data because their underlying statistical assumptions are often violated in the data. This project proposes to develop dimensionality reduction methods designed for genomic data, especially for methylation and genotype data. These methods will incorporate unique properties present in genomic data, such as the discrete nature and correlation structure of genotype data and the different methylation patterns across cell types and tissues. This project will also analyze the asymptotic behavior of the novel methods using random matrix theory. Three strategies will be used to validate the methods. First, for all genomics applications, datasets with gold-standard information are available. Second, simulated data based on current practices in the genomics community will be used to evaluate genomics applications. For example, it is standard in the community to simulate the genetics of admixed individuals by combining the genotypes of individuals of known ancestry from a reference dataset such as the 1000 Genomes Project. Third, the team will evaluate the general algorithms by generating simulated data from various generative models, both to validate that the algorithms have the expected asymptotic behavior and to examine how they perform when their assumptions are violated. The methods will contribute both to the statistical field, by improving current dimensionality reduction methods, and to the genomics field, by releasing software tools. The broader impacts of this project are gaining a deeper insight into the genetic basis of complex diseases, distributing the methods through public webservers and software tools for academic research and educational purposes, and training undergraduate students, graduate students, and postdoctoral scholars. In particular, this project will provide training to underrepresented groups with a summer intensive program that recruits minorities traditionally underrepresented in STEM fields.
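The second validation strategy in the abstract (simulating admixed individuals from reference genotypes, then checking whether a low dimensional structure recovers ancestry) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the project: the function names, the synthetic allele frequencies standing in for a real reference panel, the choice to copy each SNP independently (which ignores the correlation structure the abstract highlights), and the use of ordinary PCA as the dimensionality reduction step rather than the methods the project proposes.

```python
import numpy as np


def simulate_reference(n_individuals, allele_freqs, rng):
    """Draw diploid genotypes (allele counts 0, 1, or 2) for one reference population."""
    return rng.binomial(2, allele_freqs, size=(n_individuals, allele_freqs.size))


def simulate_admixed(pop_a, pop_b, n_individuals, rng):
    """Combine reference genotypes into admixed individuals.

    Simplifying assumption: each SNP is copied independently from a random
    donor in population A or B, so linkage disequilibrium is ignored.
    """
    n_snps = pop_a.shape[1]
    snp_index = np.arange(n_snps)
    # Hypothetical per-individual ancestry proportion from population A.
    alpha = rng.uniform(0.0, 1.0, size=n_individuals)
    from_a = rng.random((n_individuals, n_snps)) < alpha[:, None]
    donors_a = pop_a[rng.integers(0, pop_a.shape[0], size=(n_individuals, n_snps)), snp_index]
    donors_b = pop_b[rng.integers(0, pop_b.shape[0], size=(n_individuals, n_snps)), snp_index]
    return np.where(from_a, donors_a, donors_b), alpha


def top_pcs(genotypes, k):
    """Project individuals onto the top-k principal components (plain PCA via SVD)."""
    centered = genotypes - genotypes.mean(axis=0)
    scale = centered.std(axis=0)
    keep = scale > 0  # drop SNPs with no variation before standardizing
    z = centered[:, keep] / scale[keep]
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    return u[:, :k] * s[:k]


rng = np.random.default_rng(0)
n_snps = 2000
# Synthetic allele frequencies stand in for a real reference dataset.
freqs_a = rng.uniform(0.1, 0.9, n_snps)
freqs_b = rng.uniform(0.1, 0.9, n_snps)
pop_a = simulate_reference(100, freqs_a, rng)
pop_b = simulate_reference(100, freqs_b, rng)

admixed, alpha = simulate_admixed(pop_a, pop_b, 100, rng)
pcs = top_pcs(admixed, k=2)
r = np.corrcoef(pcs[:, 0], alpha)[0, 1]
print(f"|corr(PC1, simulated ancestry proportion)| = {abs(r):.2f}")
```

Because the ancestry proportions are known by construction, the correlation between the first component and `alpha` gives a simple check of whether the recovered structure tracks ancestry. A more realistic simulation would copy haplotype segments rather than single SNPs.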

Project outputs

Journal articles (28)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Enhancing droplet-based single-nucleus RNA-seq resolution using the semi-supervised machine learning classifier DIEM
  • DOI:
    10.1038/s41598-020-67513-5
  • Publication date:
    2020-07-03
  • Journal:
  • Impact factor:
    4.6
  • Authors:
    Alvarez, Marcus;Rahmani, Elior;Pajukanta, Paivi
  • Corresponding author:
    Pajukanta, Paivi
Leveraging allelic imbalance to refine fine-mapping for eQTL studies
  • DOI:
    10.1371/journal.pgen.1008481
  • Publication date:
    2019-12-01
  • Journal:
  • Impact factor:
    4.5
  • Authors:
    Zou, Jennifer;Hormozdiari, Farhad;Eskin, Eleazar
  • Corresponding author:
    Eskin, Eleazar
Contribution of common and rare variants to bipolar disorder susceptibility in extended pedigrees from population isolates.
  • DOI:
    10.1038/s41398-020-0758-1
  • Publication date:
    2020
  • Journal:
  • Impact factor:
    6.8
  • Authors:
    Sul, Jae Hoon;Service, Susan K;Huang, Alden Y;Ramensky, Vasily;Hwang, Sun-Goo;Teshiba, Terri M;Park, Young Jun;Ori, Anil PS;Zhang, Zhongyang;Mullins, Niamh;Olde Loohuis, Loes M;Fears, Scott C;Araya, Carmen;Araya, Xinia;Spesny, Mitzi;Bejaran
  • Corresponding author:
    Bejaran
Stochasticity constrained by deterministic effects of diet and age drive rumen microbiome assembly dynamics
  • DOI:
    10.1038/s41467-020-15652-8
  • Publication date:
    2020-04-20
  • Journal:
  • Impact factor:
    16.6
  • Authors:
    Furman, Ori;Shenhav, Liat;Mizrahi, Itzhak
  • Corresponding author:
    Mizrahi, Itzhak
ForestQC: Quality control on genetic variants from next-generation sequencing data using random forest
  • DOI:
    10.1371/journal.pcbi.1007556
  • Publication date:
    2019-12-01
  • Journal:
  • Impact factor:
    4.3
  • Authors:
    Li, Jiajin;Jew, Brandon;Sul, Jae Hoon
  • Corresponding author:
    Sul, Jae Hoon

Other publications by Eleazar Eskin

Improving the usability and archival stability of bioinformatics software
  • DOI:
    10.1186/s13059-019-1649-8
  • Publication date:
    2019-02-27
  • Journal:
  • Impact factor:
    9.400
  • Authors:
    Serghei Mangul;Lana S. Martin;Eleazar Eskin;Ran Blekhman
  • Corresponding author:
    Ran Blekhman
Systematic benchmarking of omics computational tools
  • DOI:
    10.1038/s41467-019-09406-4
  • Publication date:
    2019-03-27
  • Journal:
  • Impact factor:
    15.700
  • Authors:
    Serghei Mangul;Lana S. Martin;Brian L. Hill;Angela Ka-Mei Lam;Margaret G. Distler;Alex Zelikovsky;Eleazar Eskin;Jonathan Flint
  • Corresponding author:
    Jonathan Flint
Discrete profile comparison using information bottleneck
  • DOI:
    10.1186/1471-2105-7-s1-s8
  • Publication date:
    2006-03-20
  • Journal:
  • Impact factor:
    3.300
  • Authors:
    Sean O'Rourke;Gal Chechik;Robin Friedman;Eleazar Eskin
  • Corresponding author:
    Eleazar Eskin
MEF: Malicious Email Filter - A UNIX Mail Filter That Detects Malicious Windows Executables
Dealing with large diagonals in kernel matrices
  • DOI:
    10.1007/bf02530507
  • Publication date:
    2003-06-01
  • Journal:
  • Impact factor:
    0.600
  • Authors:
    Jason Weston;Bernhard Schölkopf;Eleazar Eskin;Christina Leslie;William Stafford Noble
  • Corresponding author:
    William Stafford Noble

Other grants held by Eleazar Eskin

III: Medium: Causal inference in biobanks: Leveraging genetics to infer causal relationships using electronic health records
  • Award number:
    2106908
  • Fiscal year:
    2021
  • Award amount:
    $1.1997 million
  • Award type:
    Continuing Grant
III: Small: Replication Studies for High Dimensional Data: Insights into Confounding and Heterogeneity
  • Award number:
    1910885
  • Fiscal year:
    2019
  • Award amount:
    $1.1997 million
  • Award type:
    Continuing Grant
III: Small: Causal and Statistical Inference in the Presence of Confounding Factors
  • Award number:
    1320589
  • Fiscal year:
    2013
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant
BSF:2012304:Methods for Preprocessing Population Sequence Data
  • Award number:
    1331176
  • Fiscal year:
    2013
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant
III: Medium: Meta-analysis reinterpreted using causal graphs
  • Award number:
    1302448
  • Fiscal year:
    2013
  • Award amount:
    $1.1997 million
  • Award type:
    Continuing Grant
III: Medium: Private Identification of Relatives and Private GWAS: First Steps in the New Field of CryptoGenomics
  • Award number:
    1065276
  • Fiscal year:
    2011
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant
III: Small: Inference of Causal Regulatory Relationships from Genetic Studies
  • Award number:
    0916676
  • Fiscal year:
    2009
  • Award amount:
    $1.1997 million
  • Award type:
    Continuing Grant
Collaborative Research: Design and Analysis of Compressed Sensing DNA Microarrays
  • Award number:
    0729049
  • Fiscal year:
    2007
  • Award amount:
    $1.1997 million
  • Award type:
    Continuing Grant
Collaborative Research: SEIII: Estimating Haplotype Frequencies
  • Award number:
    0731455
  • Fiscal year:
    2007
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant
Collaborative Research: SEIII: Estimating Haplotype Frequencies
  • Award number:
    0513612
  • Fiscal year:
    2005
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant

Similar overseas grants

SaTC: CORE: Medium: After the Breach: Detecting Lateral Movement, Reconnaissance, and Exfiltration in Enterprise Networks
  • Award number:
    2152644
  • Fiscal year:
    2022
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant
III: Medium: Collaborative Research: Detecting and Controlling Network-based Spread of Hospital Acquired Infections
  • Award number:
    1955797
  • Fiscal year:
    2020
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant
III: Medium: Collaborative Research: Detecting and Controlling Network-based Spread of Hospital Acquired Infections
  • Award number:
    1955883
  • Fiscal year:
    2020
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant
III: Medium: Collaborative Research: Detecting and Controlling Network-based Spread of Hospital Acquired Infections
  • Award number:
    1955939
  • Fiscal year:
    2020
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant
CPS: Medium: Detecting and Controlling Unwanted Data Flows in the Internet of Things
  • Award number:
    1953740
  • Fiscal year:
    2019
  • Award amount:
    $1.1997 million
  • Award type:
    Cooperative Agreement
CPS: Medium: Detecting and Controlling Unwanted Data Flows in the Internet of Things
  • Award number:
    1739809
  • Fiscal year:
    2018
  • Award amount:
    $1.1997 million
  • Award type:
    Cooperative Agreement
NeTS: Medium: Collaborative Research: Detecting and Localizing Spectrum Offenders Using Crowdsourcing
  • Award number:
    1563928
  • Fiscal year:
    2016
  • Award amount:
    $1.1997 million
  • Award type:
    Continuing Grant
NeTS: Medium: Collaborative Research: Detecting and Localizing Spectrum Offenders Using Crowdsourcing
  • Award number:
    1564287
  • Fiscal year:
    2016
  • Award amount:
    $1.1997 million
  • Award type:
    Continuing Grant
CSR: Medium: Highly Scalable and Accurate System Support for Detecting Misbehaving Users and Mitigating Criminal Activities in Realtime Online Video-Based Services
  • Award number:
    1162614
  • Fiscal year:
    2012
  • Award amount:
    $1.1997 million
  • Award type:
    Continuing Grant
SHF: Medium: RacePro: Automatically Detecting API Races in Deployed Systems
  • Award number:
    1162021
  • Fiscal year:
    2012
  • Award amount:
    $1.1997 million
  • Award type:
    Standard Grant