Computational Methods for Phenotype Prediction to Assist Plant Breeding

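Basic Information

  • Grant number:
    RGPIN-2021-04056
  • Principal investigator:
  • Amount:
    $17,500
  • Host institution:
  • Host institution country:
    Canada
  • Program:
    Discovery Grants Program - Individual
  • Fiscal year:
    2021
  • Funding country:
    Canada
  • Project period:
    2021-01-01 to 2022-12-31
  • Project status:
    Completed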

Project Abstract

To meet the food demands of a growing population, crop breeding efficiency needs to improve substantially. The long-term goal of my research program is to develop a suite of Artificial Intelligence tools that assist plant breeding by using whole-genome information (genotype) together with environmental factors. The short-term goal for the next five years is to predict crops' physical properties or traits (phenotype) from genotype data alone. This will give breeders effective trait selection and accelerate their breeding programs. Phenotype prediction is challenging because the number of features in genotype data far exceeds the number of samples. Existing methods usually require extraordinarily large computational resources or fail to find the link between genotype and phenotype. In this proposal, multiple strategies will be developed to reduce the number of features, increase the sample size, and improve prediction accuracy.

Objective 1 is to reduce the number of features in the genotype data using advanced sampling algorithms. We will modify existing algorithms to make them suitable for large, imbalanced data (such as plant data) without introducing selection bias. The performance of the algorithms will be evaluated on Arabidopsis thaliana, lentil, and wheat data. The resulting features will serve as feasible input to a prediction algorithm that requires only modest computational resources.

Objective 2 is to increase the sample size by developing a synthetic data generator that can produce data with characteristics similar to those of plant data. A set of statistical criteria will be developed to measure the similarity between synthetic and real data. The synthetic data will be generated using a practical machine learning model. The data generator, together with the results from Objective 1, will provide sufficient training data for the phenotype prediction model. It can also reduce the need to establish extremely large collections of plant genotype/phenotype data.

Objective 3 is to combine the results from Objectives 1 and 2 and predict plant phenotypes using a Deep Learning (DL) model. The model developed in my previous work has proven effective on bacterial data (which have a small number of features) and will be modified to fit plant data. Further, an interpretation layer will be added to the DL model to explain its results. Domain experts can examine this interpretable information and validate the predictions.

The impact will be in three areas: 1. It will advance the development of data comparison standards, by providing a set of statistical criteria for similarity measurement. 2. It will speed up and enhance selection in plant breeding, by suggesting genomic characteristics that are reliably associated with plant phenotypes. 3. It will contribute to solving problems with large-feature-small-sample data, by developing DL models that can be generalized to other fields.
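Objective 2 calls for statistical criteria that measure how similar synthetic data are to real data, but the abstract does not say which criteria will be used. As a purely illustrative sketch, one simple candidate is the mean absolute difference in per-marker allele frequency between a real and a synthetic genotype matrix. Everything in the block is an assumption made for illustration: the 0/1/2 genotype coding, the function names (`allele_frequencies`, `mean_abs_frequency_gap`, `simulate_genotypes`), and the toy generator, which merely stands in for the machine learning model described above.

```python
import random


def allele_frequencies(genotypes):
    """Per-marker alternate-allele frequency for a samples x markers matrix.

    Assumes genotypes are coded 0/1/2 (copies of the alternate allele).
    """
    n_samples = len(genotypes)
    n_markers = len(genotypes[0])
    return [
        sum(row[j] for row in genotypes) / (2 * n_samples)
        for j in range(n_markers)
    ]


def mean_abs_frequency_gap(real, synthetic):
    """Mean absolute allele-frequency difference across markers.

    0.0 means identical per-marker frequencies; 1.0 is the maximum possible.
    """
    real_freq = allele_frequencies(real)
    synth_freq = allele_frequencies(synthetic)
    if len(real_freq) != len(synth_freq):
        raise ValueError("matrices must have the same number of markers")
    gaps = [abs(r - s) for r, s in zip(real_freq, synth_freq)]
    return sum(gaps) / len(gaps)


def simulate_genotypes(freqs, n_samples, rng):
    """Toy stand-in for a data generator (hypothetical).

    Each genotype is the sum of two independent allele draws.
    """
    return [
        [(rng.random() < p) + (rng.random() < p) for p in freqs]
        for _ in range(n_samples)
    ]


rng = random.Random(0)
true_freqs = [rng.uniform(0.05, 0.95) for _ in range(200)]

# Few real samples, many markers: the large-feature-small-sample setting.
real = simulate_genotypes(true_freqs, 50, rng)
good_synth = simulate_genotypes(true_freqs, 500, rng)   # matches real frequencies
bad_synth = simulate_genotypes([0.5] * 200, 500, rng)   # ignores real frequencies

gap_good = mean_abs_frequency_gap(real, good_synth)
gap_bad = mean_abs_frequency_gap(real, bad_synth)
print(f"matched generator gap:    {gap_good:.3f}")
print(f"mismatched generator gap: {gap_bad:.3f}")
```

A generator that reproduces the real marker frequencies should score a smaller gap than one that ignores them. A single marginal statistic like this says nothing about correlations between markers, so a practical set of criteria would presumably need more than one measure.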

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Yan, Yan

Angong Niuhuang Pill as adjuvant therapy for treating acute cerebral infarction and intracerebral hemorrhage: A meta-analysis of randomized controlled trials
  • DOI:
    10.1016/j.jep.2019.03.043
  • Published:
    2019-06-12
  • Journal:
  • Impact factor:
    5.4
  • Authors:
    Liu, Hanwei;Yan, Yan;Shan, Hong
  • Corresponding author:
    Shan, Hong
MiR-130a-3p regulates FUNDC1-mediated mitophagy by targeting GJA1 in myocardial ischemia/reperfusion injury.
  • DOI:
    10.1038/s41420-023-01372-7
  • Published:
    2023-02-25
  • Journal:
  • Impact factor:
    7
  • Authors:
    Yan, Yan;Tian, Liu-yang;Jia, Qian;Han, Yang;Tian, Yu;Chen, Hui-ning;Cui, Sai-jia;Xi, Jie;Yao, Yong-ming;Zhao, Xiao-jing
  • Corresponding author:
    Zhao, Xiao-jing
Perturb and optimize users' location privacy using geo-indistinguishability and location semantics.
  • DOI:
    10.1038/s41598-022-24893-0
  • Published:
    2022-11-28
  • Journal:
  • Impact factor:
    4.6
  • Authors:
    Yan, Yan;Xu, Fei;Mahmood, Adnan;Dong, Zhuoyue;Sheng, Quan Z.
  • Corresponding author:
    Sheng, Quan Z.
Release behavior of nano-silver textiles in simulated perspiration fluids
  • DOI:
    10.1177/0040517512439922
  • Published:
    2012-03
  • Journal:
  • Impact factor:
    2.3
  • Authors:
    Yan, Yan;Yang, Haifeng;Li, Junfang;Lu, Xiaojing;Wang, Chao
  • Corresponding author:
    Wang, Chao
Intraoperative low-dose dopamine is associated with worse survival in patients with hepatocellular carcinoma: A propensity score matching analysis.
  • DOI:
    10.3389/fonc.2022.947172
  • Published:
    2022
  • Journal:
  • Impact factor:
    4.7
  • Authors:
    Wang, Yan;Xue, Ruifeng;Xing, Wei;Li, Qiang;Gei, Liba;Yan, Fang;Mai, Dongmei;Zeng, Weian;Yan, Yan;Chen, Dongtai
  • Corresponding author:
    Chen, Dongtai

Other grants by Yan, Yan

Computational Methods for Phenotype Prediction to Assist Plant Breeding
  • Grant number:
    RGPIN-2021-04056
  • Fiscal year:
    2022
  • Funding amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Computational Methods for Phenotype Prediction to Assist Plant Breeding
  • Grant number:
    DGECR-2021-00348
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Program:
    Discovery Launch Supplement

Similar NSFC (National Natural Science Foundation of China) grants

Computational Methods for Analyzing Toponome Data
  • Grant number:
    60601030
  • Approval year:
    2006
  • Funding amount:
    170,000 CNY
  • Program:
    Young Scientists Fund

Similar overseas grants

III: Small: Computational Methods for Multi-dimensional Data Integration to Improve Phenotype Prediction
  • Grant number:
    2246796
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Program:
    Standard Grant
Statistical Methods for Inferring Gene-Phenotype Associations Using Omic Data from Gene Knockout and Human Phenotype Studies
  • Grant number:
    10733165
  • Fiscal year:
    2023
  • Funding amount:
    $17,500
  • Program:
Data-Driven Computational Methods Utilizing Artificial Intelligence to Optimize Phenotype in Human Induced Pluripotent Stem Cell Bioprocessing
  • Grant number:
    546785-2020
  • Fiscal year:
    2022
  • Funding amount:
    $17,500
  • Program:
    Postgraduate Scholarships - Doctoral
Computational Methods for Phenotype Prediction to Assist Plant Breeding
  • Grant number:
    RGPIN-2021-04056
  • Fiscal year:
    2022
  • Funding amount:
    $17,500
  • Program:
    Discovery Grants Program - Individual
Federated and transfer learning methods for cross-ancestry and cross-phenotype integration of genomic datasets
  • Grant number:
    10564023
  • Fiscal year:
    2022
  • Funding amount:
    $17,500
  • Program:
PheBC: bias correction methods for EHR derived phenotype
  • Grant number:
    10471166
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Program:
Biology-aware machine learning methods for characterizing microbiome genotype and phenotype
  • Grant number:
    10696960
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Program:
PheBC: bias correction methods for EHR derived phenotype
  • Grant number:
    10839649
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Program:
Computational Methods for Phenotype Prediction to Assist Plant Breeding
  • Grant number:
    DGECR-2021-00348
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Program:
    Discovery Launch Supplement
Biology-aware machine learning methods for characterizing microbiome genotype and phenotype
  • Grant number:
    10275055
  • Fiscal year:
    2021
  • Funding amount:
    $17,500
  • Program: