A Software Framework for Exploring 1,000 Genomes of African Descent
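Basic information
- Grant number: 9096211
- Principal investigator:
- Amount: $450,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2015
- Funding country: United States
- Project period: 2015-07-01 to 2018-06-30
- Project status: Completed
- Source: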
- Keywords: Africa; African; Algorithms; Americas; Architecture; Asthma; Authorization documentation; Bacteria; Caribbean region; Cataloging; Catalogs; Central America; Communities; Computational algorithm; Computer software; DNA Sequence; DNA Sequence Databases; Data; Data Analyses; Data Set; Databases; Devices; Disease; Genes; Genetic Variation; Genetic study; Genome; Genomics; Goals; Health; Human; Human Genome; Hypersensitivity; Individual; Investigation; Length; Licensing; Life; Location; Maps; Methods; Modeling; Mutation; Mutation Detection; National Heart, Lung, and Blood Institute; Nucleic Acid Regulatory Sequences; Nucleotides; Population; Process; Protocols documentation; RNA Splicing; Reading; Research Personnel; Resources; Retrieval; Risk; Scheme; Scientist; Sepsis; Sequence Alignment; Site; Software Framework; Software Tools; South America; Speed; System; Time; United States; Variant; base; data sharing; database of Genotypes and Phenotypes; deep sequencing; design; fusion gene; gene discovery; genome database; genome sequencing; high risk; human subject; indexing; interest; microbial; next generation sequencing; novel; open source; prevent; programs; reference genome; sample collection; search engine; software development; terabyte; tool; trait
Project abstract
DESCRIPTION (provided by applicant): We propose to create new software and analysis methods designed to make possible the exploration of a unique dataset, the 1,004 genomes sequenced by the Consortium on Asthma among African-Ancestry Populations in the Americas (CAAPA). The size of this dataset, over 130 terabytes, currently prevents it from being explored with alignment-based tools, and researchers are instead limited to using the much smaller files containing single-nucleotide variants. Our proposed software will make this dataset and others like it available for real-time searching, a capability that is not yet possible for any genomic database of this size.
Since the early 1990s, scientists have used DNA sequence databases to study a wide range of problems, including novel gene discovery, mutation detection, the investigation of larger structural variants, and evolutionary processes. The ability to search all known genes and genomes using BLAST and similar programs has long been assumed, and sequence search engines throughout the world provide this ability. However, the vast size of the CAAPA dataset makes it impossible to search the data itself using current tools. One cannot look for specific mutations, extract and re-analyze data for any particular gene or regulatory region, or look for structural variants.
Newer, fast next-generation sequence alignment programs such as Bowtie, originally developed in our group, allow far faster alignment of NGS reads to the genome, but even these programs cannot search data on the scale of CAAPA in real time. Different architectures need to be designed and built to accommodate these very large datasets. The CAAPA exploration system (CESYS) will use a combination of a highly efficient database, very fast storage, and fast search algorithms to achieve our goals.
This project aims to accomplish several goals that will dramatically enhance the value of CAAPA. First, the data will be made available to a very large community of researchers, who can use it not only to study the genetics of asthma and allergy in the CAAPA populations, but also to compare these subjects to other groups. The data currently resides on hard drives and is available only to a small number of the project's PIs, a situation that limits its value. Second, by creating an authentication system consistent with dbGaP, we will create a data sharing model that other projects can use and that will remove some of the technical barriers to sharing genome data from human subjects. Third, as part of building the database, we will re-call all the SNPs using the newly released human genome build (hg20), creating a consistent set of variants that we will also share freely through the project database. Fourth, we will identify all bacterial contaminants, including those in a subset of subjects known to have bloodstream infections at the time of sample collection. Fifth, we will identify structural variants unique to the CAAPA population, which we can then explore for any association with the risk of asthma.
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Kathleen C Barnes
The CD14(−159) polymorphism is not associated with circulating sCD14 nor total serum IgE in an asthmatic population of African descent
- DOI: 10.1016/s0091-6749(02)81809-7
- Publication date: 2002-01-01
- Journal:
- Impact factor:
- Authors: April Zambelli-Weiner; Bernadatte Gray; Paul N Levett; Raana P Naidu; Kathleen C Barnes
- Corresponding author: Kathleen C Barnes
Body mass index associates with asthma and respiratory symptoms but is not explained by diet in a caucasian isolate
- DOI: 10.1016/s0091-6749(02)81811-5
- Publication date: 2002-01-01
- Journal:
- Impact factor:
- Authors: Kathyrn B Held; Rasika A Mathias; Kathleen C Barnes
- Corresponding author: Kathleen C Barnes
Other grants by Kathleen C Barnes
PRIDE Academy: Impact of Ancestry and Gender to omics of lung diseases
- Grant number: 10077882
- Fiscal year: 2019
- Funding amount: $450,900
- Project category:
PRIDE Academy: Impact of Ancestry and Gender to omics of lung diseases
- Grant number: 10378108
- Fiscal year: 2019
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 10094181
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 10331294
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 9522470
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
New Approaches for Empowering Studies of Asthma in Populations of African Descent
- Grant number: 9256781
- Fiscal year: 2016
- Funding amount: $450,900
- Project category:
A Software Framework for Exploring 1,000 Genomes of African Descent
- Grant number: 9301024
- Fiscal year: 2015
- Funding amount: $450,900
- Project category:
Integrative Genomics in Asthmatics of African Descent
- Grant number: 9230688
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
The autophagic pathway and atopic asthma: role of IL-33 and ST2
- Grant number: 8811919
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
The autophagic pathway and atopic asthma: role of IL-33 and ST2
- Grant number: 8677159
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
Similar overseas grants
Tracing the African roots of Sri-Lanka Portuguese
- Grant number: AH/Z505717/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Bovine herpesvirus 4 as a vaccine platform for African swine fever virus antigens in pigs
- Grant number: BB/Y006224/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Commercialisation of African Youth Enterprise Programme
- Grant number: ES/Y010752/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Resilient and Equitable Nature-based Pathways in Southern African Rangelands (REPAiR)
- Grant number: NE/Z503459/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Evaluating the effectiveness and sustainability of integrating helminth control with seasonal malaria chemoprevention in West African children
- Grant number: MR/X023133/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Fellowship
Understanding differences in host responses to African swine fever virus
- Grant number: BB/Z514457/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Fellowship
The impact on human health of restoring degraded African drylands
- Grant number: MR/Y019806/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
CAREER: Habitability of the Hadean Earth - A South African perspective
- Grant number: 2336044
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Continuing Grant
Nowcasting with Artificial Intelligence for African Rainfall: NAIAR
- Grant number: NE/Y000420/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Assessing the role of the lithospheric mantle during passive margin development - insights from the South Atlantic African margin
- Grant number: 2305552
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Standard Grant













