
Understanding recombination through tractable statistical analysis of whole genome sequences


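Basic information

Grant number:
BB/N00874X/1
Principal investigator:
Richard Everitt
Amount:
$329,800
Host institution:
University of Reading
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2016
Funding country:
United Kingdom
Start and end dates:
2016 to (no data)
Project status:
Completed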

Source:
https://gtr.ukri.org/projects?ref=BB%2FN00874X%2F1
Keywords:
Understanding recombination through tractable statistical

Project abstract

This project concerns the analysis of whole genome sequence data: that is, the complete DNA sequence, or genetic code, of an organism. The technology for acquiring such a sequence is relatively new. It was used to sequence a single human genome in the "Human Genome Project", which finished in 2003. That project took 13 years and cost $2.7 billion. However, technological advances in the past 10 years have caused the cost of sequencing genomes to drop dramatically. It now costs only $1,000 to sequence a human genome, and this price continues to fall.

Why should one wish to sequence a genome? The human genome can be thought of as the blueprint for building a human. Each person has a slightly different genome: the parts that are common to everyone are what make us human; the parts that differ are responsible for the (genetic) differences between us. Understanding our genomes, through studying both the common parts of the genome and the differences, promises scientific breakthroughs in many areas, particularly medicine. For example, Genomics England are currently planning to sequence 100,000 genomes for the purposes of improving clinical practice in dealing with rare disease, cancer and infectious disease.

The study of DNA sequences is not restricted to humans. It is also useful to obtain whole genome sequences from many other living things. This project is focussed on analysing genetic information obtained from bacteria. There are many reasons for studying bacteria, one of the most obvious being that some bacteria cause disease in plants and livestock (affecting our food supply) and in humans (affecting our health). Obtaining a better understanding of the genetic code of bacteria offers the promise of both tracking the spread of infections and reducing the occurrence of disease.

However, although sequence data is relatively easy to obtain, extracting useful information from it can be very difficult. Genome sequences can be stored on computers in text files as a long sequence of letters. A single gene might consist of a few hundred or a few thousand letters. A whole bacterial genome, containing thousands of genes, might be 3 million letters long. Most scientific projects involve studying a population (tens, hundreds or thousands) of these genomes, so it is not unusual for a dataset to consist of over a billion letters. To make sense of such a large, complicated dataset, mathematical methods, implemented as computer programs, are required.

This project is concerned with the development of such mathematical methods, and their implementation. The aim of the project is to learn about the evolution of bacteria by studying their genome sequences. As a rule, bacteria reproduce clonally: each individual has only a single parent. However, in some cases they can also exchange DNA, in a manner related to sexual reproduction in humans. It is of great scientific interest to understand such exchanges for several reasons, including that they are one of the main ways in which bacteria can become resistant to antibiotics. MRSA is an example of an antibiotic-resistant bacterial strain that has been of significant concern to the NHS. Understanding how antibiotic resistance is acquired is one way in which scientists can help to tackle such problems. Analysing whole genome sequences using mathematical methods, as is done in this project, is fundamental to these investigations.

Currently there are several computer programs that can be used to investigate the exchange of DNA, and important discoveries have been made through using them. However, when they are used on whole genome sequences, some of the programs run too slowly to be useful in many cases (sometimes they take months), and others either cannot detect genetic exchange events or provide only an incomplete picture of them. This project is developing new programs that are both accurate and fast enough to be useful.
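
The dataset sizes quoted in the abstract can be checked with simple arithmetic. The sketch below is illustrative only: it assumes every genome in a collection has the abstract's example length of 3 million letters, and the function name `dataset_letters` is hypothetical, not part of any project software.

```python
# Back-of-the-envelope dataset sizes, using the figures quoted in the abstract.
# Assumption: every genome has the same length (real genomes vary).
GENOME_LENGTH = 3_000_000  # letters in one bacterial genome (abstract's example)


def dataset_letters(n_genomes: int, genome_length: int = GENOME_LENGTH) -> int:
    """Total number of letters in a collection of equally long genomes."""
    return n_genomes * genome_length


if __name__ == "__main__":
    for n in (10, 100, 1000):  # "tens, hundreds or thousands" of genomes
        print(f"{n:>5} genomes -> {dataset_letters(n):,} letters")
```

Under this assumption, a collection passes one billion letters once it holds 334 or more genomes, which is consistent with the abstract's remark that billion-letter datasets are not unusual.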
