A Software Framework for Exploring 1,000 Genomes of African Descent
Basic information
- Grant number: 9096211
- Principal investigator:
- Amount: $450,900
- Host institution:
- Host institution country: United States
- Project category:
- Fiscal year: 2015
- Funding country: United States
- Period: 2015-07-01 to 2018-06-30
- Project status: Completed
- Source:
- Keywords: Africa; African; Algorithms; Americas; Architecture; Asthma; Authorization documentation; Bacteria; Caribbean region; Cataloging; Catalogs; Central America; Communities; Computational algorithm; Computer software; DNA Sequence; DNA Sequence Databases; Data; Data Analyses; Data Set; Databases; Devices; Disease; Genes; Genetic Variation; Genetic study; Genome; Genomics; Goals; Health; Human; Human Genome; Hypersensitivity; Individual; Investigation; Length; Licensing; Life; Location; Maps; Methods; Modeling; Mutation; Mutation Detection; National Heart, Lung, and Blood Institute; Nucleic Acid Regulatory Sequences; Nucleotides; Population; Process; Protocols documentation; RNA Splicing; Reading; Research Personnel; Resources; Retrieval; Risk; Scheme; Scientist; Sepsis; Sequence Alignment; Site; Software Framework; Software Tools; South America; Speed; System; Time; United States; Variant; base; data sharing; database of Genotypes and Phenotypes; deep sequencing; design; fusion gene; gene discovery; genome database; genome sequencing; high risk; human subject; indexing; interest; microbial; next generation sequencing; novel; open source; prevent; programs; reference genome; sample collection; search engine; software development; terabyte; tool; trait
Project abstract
DESCRIPTION (provided by applicant): We propose to create new software and analysis methods designed to make possible the exploration of a unique dataset, the 1,004 genomes sequenced by the Consortium on Asthma among African-Ancestry Populations in the Americas (CAAPA). The size of this dataset, over 130 terabytes, currently prevents it from being explored with alignment-based tools, and researchers are instead limited to using the much smaller files containing single-nucleotide variants. Our proposed software will make this dataset and others like it available for real-time searching, a capability that is not yet possible for any genomic database of this size.
Since the early 1990s, scientists have used DNA sequence databases to study a wide range of problems, including novel gene discovery, mutation detection, the investigation of larger structural variants, and evolutionary processes. The ability to search all known genes and genomes using BLAST and similar programs has long been assumed, and sequence search engines throughout the world provide this ability. However, the vast size of the CAAPA dataset makes it impossible to search the data itself using current tools. One cannot look for specific mutations, extract and re-analyze data for any particular gene or regulatory region, or look for structural variants.
Newer next-generation sequence alignment programs such as Bowtie, originally developed in our group, allow far faster alignment of NGS reads to the genome, but even these programs cannot search data on the scale of CAAPA in real time. Different architectures need to be designed and built to accommodate these very large datasets. The CAAPA exploration system (CESYS) will use a combination of a highly efficient database, very fast storage, and fast search algorithms to achieve our goals.
This project aims to accomplish several goals that will dramatically enhance the value of CAAPA. First, the data will be made available to a very large community of researchers, who can use it not only to study the genetics of asthma and allergy in the CAAPA populations, but also to compare these subjects to other groups. The data currently resides on hard drives and is available only to a small number of the project's PIs, a situation that limits its value. Second, by creating an authentication system consistent with dbGaP, we will create a data sharing model that other projects can use and that will remove some of the technical barriers to sharing genome data from human subjects. Third, as part of building the database, we will re-call all the SNPs using the newly released human genome build (hg20), creating a consistent set of variants that we will also share freely through the project database. Fourth, we will identify all bacterial contaminants, including those in a subset of subjects known to have bloodstream infections at the time of sample collection. Fifth, we will identify structural variants unique to the CAAPA population, which we can then explore for any association with the risk of asthma.
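The abstract does not describe how CESYS is built internally. As a rough illustration of why an index makes lookups fast compared with scanning every read, the following is a minimal sketch of a k-mer inverted index with seed-and-verify exact search. The function names (`build_kmer_index`, `find_exact`), the toy reads, and the choice of `k` are all hypothetical and are not taken from the project; a system at the 130-terabyte scale would need compressed, disk-backed structures rather than an in-memory dictionary.

```python
from collections import defaultdict


def build_kmer_index(reads, k):
    """Map each k-mer to the set of (read_id, offset) positions where it occurs."""
    index = defaultdict(set)
    for read_id, seq in enumerate(reads):
        for offset in range(len(seq) - k + 1):
            index[seq[offset:offset + k]].add((read_id, offset))
    return index


def find_exact(query, reads, index, k):
    """Return sorted (read_id, offset) pairs where `query` occurs exactly.

    The first k-mer of the query is used as a seed; each seed hit is then
    verified against the full query, so only candidate positions are checked.
    """
    if len(query) < k:
        raise ValueError("query must be at least k bases long")
    hits = []
    for read_id, offset in index.get(query[:k], ()):
        if reads[read_id].startswith(query, offset):
            hits.append((read_id, offset))
    return sorted(hits)


# Toy data (hypothetical): three short reads and one query.
reads = ["ACGTACGTGA", "TTACGTGACC", "GGGGCCCCAA"]
index = build_kmer_index(reads, k=4)
hits = find_exact("ACGTGA", reads, index, k=4)
print(hits)
```

The index is built once, and each query then touches only the positions sharing its seed k-mer, which is the general idea behind index-based search tools; the data structures named in the abstract itself are not specified.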
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Kathleen C Barnes
The CD14(−159) polymorphism is not associated with circulating sCD14 nor total serum IgE in an asthmatic population of African descent
- DOI: 10.1016/s0091-6749(02)81809-7
- Published: 2002-01-01
- Journal:
- Impact factor:
- Authors: April Zambelli-Weiner; Bernadatte Gray; Paul N Levett; Raana P Naidu; Kathleen C Barnes
- Corresponding author: Kathleen C Barnes
Body mass index associates with asthma and respiratory symptoms but is not explained by diet in a caucasian isolate
- DOI: 10.1016/s0091-6749(02)81811-5
- Published: 2002-01-01
- Journal:
- Impact factor:
- Authors: Kathyrn B Held; Rasika A Mathias; Kathleen C Barnes
- Corresponding author: Kathleen C Barnes
Other grants by Kathleen C Barnes
PRIDE Academy: Impact of Ancestry and Gender to omics of lung diseases
- Grant number: 10077882
- Fiscal year: 2019
- Funding amount: $450,900
- Project category:
PRIDE Academy: Impact of Ancestry and Gender to omics of lung diseases
- Grant number: 10378108
- Fiscal year: 2019
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 10094181
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 10331294
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 9522470
- Fiscal year: 2018
- Funding amount: $450,900
- Project category:
New Approaches for Empowering Studies of Asthma in Populations of African Descent
- Grant number: 9256781
- Fiscal year: 2016
- Funding amount: $450,900
- Project category:
A Software Framework for Exploring 1,000 Genomes of African Descent
- Grant number: 9301024
- Fiscal year: 2015
- Funding amount: $450,900
- Project category:
Integrative Genomics in Asthmatics of African Descent
- Grant number: 9230688
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
The autophagic pathway and atopic asthma: role of IL-33 and ST2
- Grant number: 8811919
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
Integrative Genomics in Asthmatics of African Descent
- Grant number: 9244716
- Fiscal year: 2014
- Funding amount: $450,900
- Project category:
Similar overseas grants
Tracing the African roots of Sri-Lanka Portuguese
- Grant number: AH/Z505717/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Bovine herpesvirus 4 as a vaccine platform for African swine fever virus antigens in pigs
- Grant number: BB/Y006224/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Commercialisation of African Youth Enterprise Programme
- Grant number: ES/Y010752/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Resilient and Equitable Nature-based Pathways in Southern African Rangelands (REPAiR)
- Grant number: NE/Z503459/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Evaluating the effectiveness and sustainability of integrating helminth control with seasonal malaria chemoprevention in West African children
- Grant number: MR/X023133/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Fellowship
Understanding differences in host responses to African swine fever virus
- Grant number: BB/Z514457/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Fellowship
The impact on human health of restoring degraded African drylands
- Grant number: MR/Y019806/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
CAREER: Habitability of the Hadean Earth - A South African perspective
- Grant number: 2336044
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Continuing Grant
Nowcasting with Artificial Intelligence for African Rainfall: NAIAR
- Grant number: NE/Y000420/1
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Research Grant
Assessing the role of the lithospheric mantle during passive margin development - insights from the South Atlantic African margin
- Grant number: 2305552
- Fiscal year: 2024
- Funding amount: $450,900
- Project category: Standard Grant


















