A Software Framework for Exploring 1,000 Genomes of African Descent
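Basic Information
- Grant number: 9096211
- Principal investigator:
- Amount: $450,900
- Host institution:
- Host institution country: United States
- Project type:
- Fiscal year: 2015
- Funding country: United States
- Project period: 2015-07-01 to 2018-06-30
- Project status: Completed
- Source: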
- Keywords: Africa; African; Algorithms; Americas; Architecture; Asthma; Authorization documentation; Bacteria; Caribbean region; Cataloging; Catalogs; Central America; Communities; Computational algorithm; Computer software; DNA Sequence; DNA Sequence Databases; Data; Data Analyses; Data Set; Databases; Devices; Disease; Genes; Genetic Variation; Genetic study; Genome; Genomics; Goals; Health; Human; Human Genome; Hypersensitivity; Individual; Investigation; Length; Licensing; Life; Location; Maps; Methods; Modeling; Mutation; Mutation Detection; National Heart, Lung, and Blood Institute; Nucleic Acid Regulatory Sequences; Nucleotides; Population; Process; Protocols documentation; RNA Splicing; Reading; Research Personnel; Resources; Retrieval; Risk; Scheme; Scientist; Sepsis; Sequence Alignment; Site; Software Framework; Software Tools; South America; Speed; System; Time; United States; Variant; base; data sharing; database of Genotypes and Phenotypes; deep sequencing; design; fusion gene; gene discovery; genome database; genome sequencing; high risk; human subject; indexing; interest; microbial; next generation sequencing; novel; open source; prevent; programs; reference genome; sample collection; search engine; software development; terabyte; tool; trait
Project Summary
DESCRIPTION (provided by applicant): We propose to create new software and analysis methods that make it possible to explore a unique dataset: the 1,004 genomes sequenced by the Consortium on Asthma among African-Ancestry Populations in the Americas (CAAPA). The size of this dataset, over 130 terabytes, currently prevents it from being explored with alignment-based tools; researchers are instead limited to the much smaller files containing single-nucleotide variants. Our proposed software will make this dataset, and others like it, available for real-time searching, a capability not yet available for any genomic database of this size. Since the early 1990s, scientists have used DNA sequence databases to study a wide range of problems, including novel gene discovery, mutation detection, the investigation of larger structural variants, and evolutionary processes. The ability to search all known genes and genomes using BLAST and similar programs has long been taken for granted, and sequence search engines throughout the world provide it. However, the vast size of the CAAPA dataset makes it impossible to search the data itself with current tools: one cannot look for specific mutations, extract and re-analyze data for a particular gene or regulatory region, or look for structural variants. Newer next-generation sequencing (NGS) alignment programs such as Bowtie, originally developed in our group, align NGS reads to the genome far faster, but even these programs cannot search data on the scale of CAAPA in real time. Different architectures need to be designed and built to accommodate such very large datasets. The CAAPA Exploration System (CESYS) will combine a highly efficient database, very fast storage, and fast search algorithms to achieve our goals. This project aims to accomplish several goals that will dramatically enhance the value of CAAPA.
First, the data will be made available to a very large community of researchers, who can use it not only to study the genetics of asthma and allergy in the CAAPA populations but also to compare these subjects with other groups. The data currently resides on hard drives and is available only to a small number of the project's PIs, which limits its value. Second, by creating an authentication system consistent with dbGaP, we will establish a data-sharing model that other projects can use and that will remove some of the technical barriers to sharing genome data from human subjects. Third, as part of building the database, we will re-call all SNPs against the newly released human genome build (GRCh38/hg38), creating a consistent set of variants that we will also share freely through the project database. Fourth, we will identify all bacterial contaminants, including those in a subset of subjects known to have had bloodstream infections at the time of sample collection. Fifth, we will identify structural variants unique to the CAAPA population, which we can then test for association with asthma risk.
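The summary does not say how CESYS will make raw reads searchable in real time. One common building block for that kind of search is an inverted k-mer index, and the sketch below illustrates only that general idea under stated assumptions; it is not the project's design. The names `ReadIndex`, `kmers`, and `revcomp`, the default k of 11, and the substring verification step are all hypothetical choices made for illustration. The sketch finds every stored read that contains a query sequence on either DNA strand.

```python
from __future__ import annotations

from collections import defaultdict

# Complement table for DNA; N (unknown base) maps to itself.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int):
    """Yield every length-k window of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


class ReadIndex:
    """Toy in-memory inverted index mapping k-mers to the reads that contain them.

    Hypothetical illustration only; not the CESYS architecture.
    """

    def __init__(self, k: int = 11):  # k = 11 is an arbitrary illustrative default
        self.k = k
        self.reads = {}                   # read_id -> read sequence
        self.postings = defaultdict(set)  # k-mer -> ids of reads containing it

    def add(self, read_id: str, seq: str) -> None:
        """Store a read and record each of its k-mers in the postings lists."""
        self.reads[read_id] = seq
        for km in kmers(seq, self.k):
            self.postings[km].add(read_id)

    def search(self, query: str) -> set[str]:
        """Return ids of reads that contain query on either strand."""
        if len(query) < self.k:
            raise ValueError("query must be at least k bases long")
        hits = set()
        for q in (query, revcomp(query)):
            candidates = None
            # A read containing q must contain every k-mer of q: intersect postings.
            for km in kmers(q, self.k):
                ids = self.postings.get(km, set())
                candidates = set(ids) if candidates is None else candidates & ids
                if not candidates:
                    break
            # Sharing all k-mers does not guarantee a contiguous match; verify.
            for rid in candidates or ():
                if q in self.reads[rid]:
                    hits.add(rid)
        return hits


if __name__ == "__main__":
    idx = ReadIndex(k=4)
    idx.add("r1", "ACGTACGTTT")
    idx.add("r2", "GGGGCCCCAA")
    idx.add("r3", "AAACGTACGA")
    print(sorted(idx.search("CGTACG")))  # ['r1', 'r3']
    print(sorted(idx.search("TTGGGG")))  # ['r2'], matched via the reverse strand
```

Intersecting postings lists avoids scanning every read for each query. However, an uncompressed in-memory dictionary like this one would not scale to a 130-terabyte collection. Aligners such as Bowtie instead rely on a compressed Burrows-Wheeler (FM) index, so a production system would need compressed, disk-backed structures of that kind.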
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Kathleen C Barnes
The CD14(−159) polymorphism is not associated with circulating sCD14 nor total serum IgE in an asthmatic population of African descent
- DOI: 10.1016/s0091-6749(02)81809-7
- Published: 2002-01-01
- Journal:
- Impact factor:
- Authors: April Zambelli-Weiner; Bernadatte Gray; Paul N Levett; Raana P Naidu; Kathleen C Barnes
- Corresponding author: Kathleen C Barnes
Body mass index associates with asthma and respiratory symptoms but is not explained by diet in a caucasian isolate
- DOI: 10.1016/s0091-6749(02)81811-5
- Published: 2002-01-01
- Journal:
- Impact factor:
- Authors: Kathyrn B Held; Rasika A Mathias; Kathleen C Barnes
- Corresponding author: Kathleen C Barnes
Other grants by Kathleen C Barnes
PRIDE Academy: Impact of Ancestry and Gender to omics of lung diseases
- Grant number: 10077882
- Fiscal year: 2019
- Funding amount: $450,900
- Project type:
PRIDE Academy: Impact of Ancestry and Gender to omics of lung diseases
- Grant number: 10378108
- Fiscal year: 2019
- Funding amount: $450,900
- Project type:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 10094181
- Fiscal year: 2018
- Funding amount: $450,900
- Project type:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 10331294
- Fiscal year: 2018
- Funding amount: $450,900
- Project type:
Multi-omic studies of asthma severity in an African ancestry population
- Grant number: 9522470
- Fiscal year: 2018
- Funding amount: $450,900
- Project type:
New Approaches for Empowering Studies of Asthma in Populations of African Descent
- Grant number: 9256781
- Fiscal year: 2016
- Funding amount: $450,900
- Project type:
A Software Framework for Exploring 1,000 Genomes of African Descent
- Grant number: 9301024
- Fiscal year: 2015
- Funding amount: $450,900
- Project type:
Integrative Genomics in Asthmatics of African Descent
- Grant number: 9230688
- Fiscal year: 2014
- Funding amount: $450,900
- Project type:
The autophagic pathway and atopic asthma: role of IL-33 and ST2
- Grant number: 8811919
- Fiscal year: 2014
- Funding amount: $450,900
- Project type:
Integrative Genomics in Asthmatics of African Descent
- Grant number: 9244716
- Fiscal year: 2014
- Funding amount: $450,900
- Project type:
Similar overseas grants
Clinical decision support algorithm to optimize management of respiratory tract infection in children attending primary health facilities in Kilimanjaro Region, Tanzania
- Grant number: 10734148
- Fiscal year: 2023
- Funding amount: $450,900
- Project type:
Moving Beyond the Individual- A Data-driven Approach to Improving the Evidence on the Role of Community and Societal Determinants of HIV among Adolescent Girls and Young Women in Sub-Saharan Africa
- Grant number: 10619319
- Fiscal year: 2023
- Funding amount: $450,900
- Project type:
DSpace: Utilizing Data Science to Predict and Improve Health Outcomes in Pediatric HIV
- Grant number: 10749123
- Fiscal year: 2023
- Funding amount: $450,900
- Project type:
Leveraging artificial intelligence/machine learning-based technology to overcome specialized training and technology barriers for the diagnosis and prognostication of colorectal cancer in Africa
- Grant number: 10712793
- Fiscal year: 2023
- Funding amount: $450,900
- Project type:
Training of machine learning algorithms for the classification of accelerometer-measured bednet use and related behaviors associated with malaria risk
- Grant number: 10727374
- Fiscal year: 2023
- Funding amount: $450,900
- Project type: