PLUTo: Phyloinformatic Literature Unlocking Tools. Software for making published phyloinformatic data discoverable, open, and reusable
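Basic information
- Grant number: BB/K015702/1
- Principal investigator:
- Amount: $151,300
- Host institution:
- Host institution country: United Kingdom
- Project category: Research Grant
- Fiscal year: 2014
- Funding country: United Kingdom
- Start and end dates: 2014 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract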
Phylogenetic data, and the trees inferred from them, represent a hugely valuable resource for evolutionary biological research. The data are often expensive and time-consuming to acquire, and the results from analyses of these data - typically trees - represent a vast investment of effort and expertise across the global community of bioinformaticians and systematists. Trees, and their underlying character data, are often repurposed in other areas of biology, notably in evolutionary studies that seek to test patterns of genomic evolution or macroevolutionary trends. Despite the enormous value of these data, recent research by the PDRA estimates that less than 4% of the phylogenetic trees published in 2010 are available in machine-readable form.
Our proposal stands at the leading edge of content mining technology. We will create Open Source 'data liberation' software tools that will allow us to unlock the greater proportion of phyloinformatic data from where they are currently buried in the literature. These will include phylogenetic trees, branch lengths and support values (extracted from the SVG content of PDF files), analytical methods and indices of data quality (from figure legends and the main body of the text) and the underlying molecular and morphological character data. We will also derive full bibliographic and geographical data for each source paper. We will test, refine and perfect these tools by applying them to PLoS, BMC, Elsevier, Wiley and Springer online content from the 21st century. Once the data are extracted, we will ensure that their immense interdisciplinary (evolutionary biology, ecology, ethology, palaeobiology and conservation) and legacy potential is realised by making them available online in an explicitly open manner. We will also use the data ourselves in order to address several related questions concerning research effort, phyloinformatic data quality and the progress of systematic research.
While there is renewed interest and emphasis on curating underlying research data and results (exemplified by projects such as TreeBASE, Dryad, BMC's partnership with LabArchives, and FigShare), these ventures rely upon author submission, which is rarely mandated by journals. Uptake has been slow and coverage is woeful. The data archiving success of NCBI/GenBank for nucleotide sequences (N.B., not alignments, trees or other results, and certainly not morphology) is the exception rather than the rule in the Biological Sciences. For the foreseeable future, therefore, there is a pressing need to retrospectively gather data from the published literature.
This project is extremely novel in its scale and ambition. If successful in re-extracting the majority of phylogenetic data from the last decade, the software will easily be adapted and modified by others to suit the data re-extraction needs of other areas of science. This will better harness the billions of pounds of research money hitherto invested in obtaining and analyzing data, only for it to have been locked down and subsequently obfuscated in PDF publications when projects are completed. The project is also widely trans-disciplinary, bringing together a macroevolutionary phylogeneticist (Wills), a chemoinformaticist (Murray-Rust), and a young, up-and-coming researcher (Mounce). The potential wider benefits of this project are vast and diverse; content mining techniques are estimated to be capable of generating up to £200 billion annually in added value for Europe alone. We cannot claim to generate those benefits directly, but we will create open tools and generate open data that will greatly facilitate other commercial, industrial and academic ventures.
Project outputs
Journal articles (10)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Phylogenetic incongruence and homoplasy in the appendages and bodies of arthropods: why broad character sampling is best
- DOI: 10.1093/zoolinnean/zlz024
- Publication date: 2019-09-01
- Journal:
- Impact factor: 2.8
- Authors: Brinkworth, Andrew R.; Sansom, Robert; Wills, Matthew A.
- Corresponding author: Wills, Matthew A.
Ecological Transitions and the Shape of the Decapod Tree of Life
- DOI: 10.1093/icb/icac052
- Publication date: 2022-05-24
- Journal:
- Impact factor: 2.6
- Authors: Davis, Katie E.; De Grave, Sammy; Wills, Matthew A.
- Corresponding author: Wills, Matthew A.
Bird clades with less complex appendicular skeletons tend to have higher species richness.
- DOI: 10.1038/s41467-023-41415-2
- Publication date: 2023-09-19
- Journal:
- Impact factor: 16.6
- Authors: Brinkworth, Andrew; Green, Emily; Li, Yimeng; Oyston, Jack; Ruta, Marcello; Wills, Matthew A.
- Corresponding author: Wills, Matthew A.
Global cooling as a driver of diversification in a major marine clade.
- DOI: 10.1038/ncomms13003
- Publication date: 2016-10-04
- Journal:
- Impact factor: 16.6
- Authors: Davis, Katie E.; Hill, Jon; Astrop, Tim I.; Wills, Matthew A.
- Corresponding author: Wills, Matthew A.
Divergent vertebral formulae shape the evolution of axial complexity in mammals.
- DOI: 10.1038/s41559-023-01982-5
- Publication date: 2023-03
- Journal:
- Impact factor: 16.8
- Authors:
- Corresponding author:
Other publications by Matthew Wills
Levels of physical activity in people with chronic pain
- DOI:
- Publication date: 2017
- Journal:
- Impact factor: 1.1
- Authors: R. Parker; E. Bergman; Anelisiwe Mntambo; S. Stubbs; Matthew Wills
- Corresponding author: Matthew Wills
Other grants held by Matthew Wills
Susceptibility to mass extinctions: Ammonites as a case study for integrating morphological, developmental, phylogenetic and biomechanical data
- Grant number: NE/K014951/1
- Fiscal year: 2014
- Funding amount: $151,300
- Project category: Research Grant
The Arthropod Supertree of Life: An Online Interactive Resource for Testing Patterns in Arthropod Evolution and Biodiversity
- Grant number: BB/K006754/1
- Fiscal year: 2012
- Funding amount: $151,300
- Project category: Research Grant
Similar overseas grants
Evolution of halophytes: a phyloinformatic approach to understanding and exploiting the traits underlying salt-tolerance in plants
- Grant number: LP100100143
- Fiscal year: 2011
- Funding amount: $151,300
- Project category: Linkage Projects


















