Next generation imputation for huge data sets
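Basic information
- Grant number: BB/L020726/1
- Principal investigator:
- Amount: $592,900
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2014
- Funding country: United Kingdom
- Start and end dates: 2014 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract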
Knowledge gained from genome sequencing has great potential for improving the direction and rate of genetic change in livestock breeding, and for biological discovery in animal science. However, huge numbers of individuals will need to be sequenced to unlock this potential, and the current cost of sequencing for livestock is several hundreds or thousands of pounds per individual. This will remain a barrier to using this data routinely until the unit cost is of the order of tens of pounds. One promising approach to reducing costs whilst maintaining the quality of the resulting data is low-coverage next-generation sequencing (lcNGS). With lcNGS, large numbers of individuals can have their sequences sampled at low cost per individual, but each individual sequence will have substantial missing information. Accuracy is restored by inferring the missing data using a process known as imputation. In livestock this process is made more efficient by the pedigree structures of livestock populations.
Imputation using single nucleotide polymorphism (SNP) data from chips has been successfully applied in livestock. However, these methods are not optimal for imputation from lcNGS data for several reasons. (i) SNP-chip genotypes are highly accurate and data points are missing only occasionally due to technical issues. In contrast, lcNGS data has much less certainty over the true genotype at a particular locus, and the missing data is randomly spread over the whole genome. (ii) SNP-chip genotypes cover only a small fraction of the genetic variation present in the genome in comparison to sequence data, so the computational techniques for imputing sequence data need to be much more efficient for practical use. (iii) The range of data produced by lcNGS is rapidly evolving, requiring next-generation imputation algorithms to be very flexible.
The proposed imputation algorithm will address these issues from a novel direction by combining two approaches: heuristic and probabilistic. Heuristic algorithms use basic principles of inheritance and so are fast and accurate. They are well suited to animal breeding since they use pedigree to make inferences from the abundance of closely related individuals from large families, with large portions of the genome shared between pairs of individuals. However, heuristic methods can fail if such data is lacking or is unreliable across all or parts of the genome. Probabilistic algorithms primarily use Hidden Markov Models to mimic inheritance statistically and are computationally more demanding, slower, and inherently less accurate than heuristic algorithms. They have been developed primarily for application to human populations in which the pedigree structures, for example small sibships, are not well suited to exploiting the power of heuristic algorithms. The proposed algorithm will obtain synergy from combining the two approaches as they have complementary strengths in the recovery of information and computational efficiency.
The overall objective is therefore to develop a generic imputation system that is capable of imputing in data sets of the order of millions of animals and can cope with the wide variety of data types that may appear from lcNGS. New heuristic approaches will be adopted to develop data that can be integrated with probabilistic approaches and combined into a novel hybrid algorithm. Efficient data handling and storage frameworks, and a user interface, will be developed to ensure the algorithm is computationally efficient, easy to use, and readily available to users. The algorithm will be benchmarked using a range of real and simulated data sets and historical, real SNP-chip data to ensure it remains backwards compatible with current or previous technology.
The availability of the algorithm will enable breeders to accumulate sequence data on millions of animals at low unit cost, and in turn prompt greater accuracy of selection and innovation in breeding goals.
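As a rough illustration of the hybrid idea in the abstract, the sketch below tries a pedigree-based heuristic rule first and falls back to a probabilistic call from low-coverage read counts when the rule does not apply. It is not the project's algorithm: the function names, the 0/1/2 genotype coding, the 1% read error rate, and the use of a simple per-locus binomial likelihood in place of the Hidden Markov Models mentioned above are all assumptions made for this example.

```python
from math import comb

# Genotypes are coded as the count of alternative alleles: 0, 1 or 2 (assumed coding).
GENOTYPES = (0, 1, 2)


def heuristic_child_genotype(sire, dam):
    """Return the child's genotype when Mendelian rules fix it, else None."""
    if sire in (0, 2) and dam in (0, 2):
        # A homozygous parent can pass on only one kind of allele.
        return sire // 2 + dam // 2
    return None


def genotype_likelihoods(ref_reads, alt_reads, error=0.01):
    """Binomial likelihood of the observed read counts under each genotype."""
    n = ref_reads + alt_reads
    likelihoods = []
    for g in GENOTYPES:
        # Probability that a single read shows the alternative allele.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(
            comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt) ** ref_reads
        )
    return likelihoods


def call_genotype(sire, dam, ref_reads, alt_reads):
    """Use the heuristic rule when it applies, otherwise the most likely genotype."""
    fixed = heuristic_child_genotype(sire, dam)
    if fixed is not None:
        return fixed, "heuristic"
    if ref_reads + alt_reads == 0:
        # No parental information and no reads: leave the genotype missing.
        return None, "missing"
    lik = genotype_likelihoods(ref_reads, alt_reads)
    return max(GENOTYPES, key=lambda g: lik[g]), "probabilistic"
```

For example, `call_genotype(0, 2, 0, 0)` resolves the locus from the parents alone, whereas `call_genotype(1, 1, 1, 1)` has to rely on the reads. A realistic method would also share information along the chromosome and across relatives, which this single-locus sketch deliberately omits.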
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
MOESM1 of Potential of gene drives with genome editing to increase genetic gain in livestock breeding programs
- DOI: 10.6084/m9.figshare.c.3666634_d1
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Gonen S
- Corresponding author: Gonen S
A hybrid method for the imputation of genomic data in livestock populations.
- DOI: 10.1186/s12711-017-0300-y
- Published: 2017-03-03
- Journal:
- Impact factor: 0
- Authors: Antolín R; Nettelblad C; Gorjanc G; Money D; Hickey JM
- Corresponding author: Hickey JM
MOESM5 of A hybrid method for the imputation of genomic data in livestock populations
- DOI: 10.6084/m9.figshare.c.3708046_d5
- Published: 2017
- Journal:
- Impact factor: 0
- Authors: Antolín R
- Corresponding author: Antolín R
A family-based phasing algorithm for sequence data
- DOI: 10.1101/504480
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Battagin M
- Corresponding author: Battagin M
Effect of manipulating recombination rates on response to selection in livestock breeding programs.
- DOI: 10.1186/s12711-016-0221-1
- Published: 2016-06-22
- Journal:
- Impact factor: 0
- Authors: Battagin M; Gorjanc G; Faux AM; Johnston SE; Hickey JM
- Corresponding author: Hickey JM
Other publications by John Hickey
Spatial Dissection of the Bone Marrow Microenvironment in Multiple Myeloma By High Dimensional Multiplex Tissue Imaging
- DOI: 10.1182/blood-2023-189255
- Published: 2023-11-02
- Journal:
- Impact factor:
- Authors: Marc-Andrea Baertsch; Alexander Brobeil; John Hickey; Maximilian Haist; Alexandra Maria Poos; Guolan Lu; Wilson Kuswanto; Christian Schuerch; Harald Voehringer; Wolfgang Huber; Gunhild Mechtersheimer; Carsten Mueller-Tidow; Peter Schirmacher; Katja Weisel; Roland Fenk; Hartmut Goldschmidt; Yury Goltsev; Marc S. Raab; Niels Weinhold; Garry P. Nolan
- Corresponding author: Garry P. Nolan
Colonisation of clearfelled coupes by rainforest tree species from mature mixed forest edges, Tasmania, Australia
- DOI: 10.1016/j.foreco.2006.11.021
- Published: 2007-03-15
- Journal:
- Impact factor:
- Authors: John Tabor; Chris McElhinny; John Hickey; Jeff Wood
- Corresponding author: Jeff Wood
Other grants held by John Hickey
A general method for the imputation of genomic data in crop species
- Grant number: BB/R002061/1
- Fiscal year: 2017
- Funding amount: $592,900
- Project type: Research Grant
Analysis of quantitative genetic traits in a huge data set
- Grant number: BB/N006178/1
- Fiscal year: 2016
- Funding amount: $592,900
- Project type: Research Grant
15AGRITECHCAT3 Precision Breeding: Broilers from Sequence to Consequence
- Grant number: BB/N004728/1
- Fiscal year: 2015
- Funding amount: $592,900
- Project type: Research Grant
Developing next generation genetic improvement tools from next generation sequencing
- Grant number: BB/M009254/1
- Fiscal year: 2015
- Funding amount: $592,900
- Project type: Research Grant
15AGRITECHCAT3 Innovative NextGen pig breeding using DNA sequence data
- Grant number: BB/N004736/1
- Fiscal year: 2015
- Funding amount: $592,900
- Project type: Research Grant
NIRG: FARSPhase: a Flexible, widely Applicable, Robust, and Scalable phasing algorithm for human genetics
- Grant number: MR/M000370/1
- Fiscal year: 2015
- Funding amount: $592,900
- Project type: Research Grant
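Similar grants from the National Natural Science Foundation of China
Regulatory mechanism of triploidy caused by cyclin-dependent kinase Cdk1-mediated resorption of the first polar body in oocytes
- Grant number: 82371660
- Approval year: 2023
- Funding amount: 490,000 CNY
- Project type: General Program
Next Generation Majorana Nanowire Hybrids
- Grant number:
- Approval year: 2020
- Funding amount: 200,000 CNY
- Project type:
Second-harmonic nonlinear optical microscopy imaging for prostate cancer diagnosis and a preliminary study of drug efficacy
- Grant number: 30470495
- Approval year: 2004
- Funding amount: 200,000 CNY
- Project type: General Program
Similar overseas grants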
CAREER: Real-Time First-Principles Approach to Understanding Many-Body Effects on High Harmonic Generation in Solids
- Grant number: 2337987
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Continuing Grant
Collaborative Research: Constraining next generation Cascadia earthquake and tsunami hazard scenarios through integration of high-resolution field data and geophysical models
- Grant number: 2325311
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Standard Grant
RII Track-4:NSF: In-Situ/Operando Characterizations of Single Atom Catalysts for Clean Fuel Generation
- Grant number: 2327349
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Standard Grant
ERI: Non-Contact Ultrasound Generation and Detection for Tissue Functional Imaging and Biomechanical Characterization
- Grant number: 2347575
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Standard Grant
SBIR Phase II: Thermally-optimized power amplifiers for next-generation telecommunication and radar
- Grant number: 2335504
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Cooperative Agreement
CAREER: Next-generation Logic, Memory, and Agile Microwave Devices Enabled by Spin Phenomena in Emergent Quantum Materials
- Grant number: 2339723
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Continuing Grant
CAREER: Securing Next-Generation Transportation Infrastructure: A Traffic Engineering Perspective
- Grant number: 2339753
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Standard Grant
CAREER: Ultralow phase noise signal generation using Kerr-microresonator optical frequency combs
- Grant number: 2340973
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Continuing Grant
Next-Generation Distributed Graph Engine for Big Graphs
- Grant number: DP240101322
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Discovery Projects
Next Generation Fluorescent Tools for Measuring Autophagy Dynamics in Cells
- Grant number: DP240100465
- Fiscal year: 2024
- Funding amount: $592,900
- Project type: Discovery Projects