EAGER Collaborative: Bringing Together Computational and Linguistic Methods to Extract 'Dark' Geosciences Data for the EarthCube Framework
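Basic Information
- Award number: 1242902
- Principal investigator:
- Amount: $129,400
- Institution:
- Institution country: United States
- Project type: Standard Grant
- Fiscal year: 2012
- Funding country: United States
- Period: 2012-07-15 to 2013-06-30
- Project status: Completed
- Source:
- Keywords: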
Project Abstract
A large percentage of valuable geoscience data is based on the analysis of discrete samples and is collected manually (e.g., paleontological collections, structural/tectonic data, petrographic/mineralogic data, economic data, geochemical measurements, rock mechanics, etc.). Often, these data are reported only in tables in the published literature or in PDF files or spreadsheets on individual investigator websites. Commonly, these data are not registered on or entered into standardized, publicly accessible databases. As a result, for these data to be discovered and used or reused, researchers or other interested parties must manually comb through the text, figures, and appendices of journal articles or the websites of individual investigators, sometimes having to sift through raw experimental data. This process is extremely time-intensive and slows both scientific discovery and the verification of research results. As a result, a vast amount of surface-Earth geoscience data is currently inaccessible. This inaccessible data is termed "Dark Data". This EAGER combines the expertise of top-notch computer scientists and geoscientists whose goal is to create a search algorithm that brings this dark data to light in a way that will enable the next generation of integrative geoscience research. The approach will involve developing an innovative search engine "crawler" that will comb the geoscience literature and bring dark data to light from the text and figures in this corpus. The cyberinfrastructure tool being developed will be able to interpret the semantics of English text and the concepts of geoscience. The tool will be piloted by examining entries in the Macrostrat database, a structured spatial database of lithologic and geochronologic information, and then employing a geoscience ontology by means of the Hazy framework for information extraction.
The project will address to what extent dark data is presently accessible and whether it can be extracted and placed into an accessible format and repository where it can be discovered by web services or other search engines. Broader impacts of the work include training of graduate students and increasing the infrastructure for science through the development of a new and much-needed data search tool.
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Christopher Re
Do Multimodal Foundation Models Understand Enterprise Workflows? A Benchmark for Business Process Management Tasks
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Michael Wornow; A. Narayan; Ben T Viggiano; Ishan S. Khare; Tathagat Verma; Tibor Thompson; Miguel Angel Fuentes Hernandez; Sudharsan Sundar; Chloe Trujillo; Krrish Chawla; Rongfei Lu; Justin Shen; Divya Nagaraj; Joshua Martinez; Vardhan Agrawal; Althea Hudson; Nigam H. Shah; Christopher Re
- Corresponding author: Christopher Re
Other Grants by Christopher Re
Collaborative Research: Hardware-Aware Matrix Computations for Deep Learning Applications
- Award number: 2247015
- Fiscal year: 2023
- Amount: $129,400
- Project type: Standard Grant
AF: Medium: Collaborative Research: Beyond Sparsity: Refined Measures of Complexity for Linear Algebra
- Award number: 1763315
- Fiscal year: 2018
- Amount: $129,400
- Project type: Continuing Grant
AF:III:Small:Collaborative Research: New Frontiers in Join Algorithms: Optimality, Noise, and Richer Languages
- Award number: 1318205
- Fiscal year: 2013
- Amount: $129,400
- Project type: Standard Grant
AF:III:Small:Collaborative Research: New Frontiers in Join Algorithms: Optimality, Noise, and Richer Languages
- Award number: 1356918
- Fiscal year: 2013
- Amount: $129,400
- Project type: Standard Grant
CAREER: A Scalable, Declarative, Imprecise Database Management System
- Award number: 1353606
- Fiscal year: 2013
- Amount: $129,400
- Project type: Continuing Grant
CAREER: A Scalable, Declarative, Imprecise Database Management System
- Award number: 1054009
- Fiscal year: 2011
- Amount: $129,400
- Project type: Continuing Grant
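Similar National Natural Science Foundation of China (NSFC) Grants
Relationships among team human-capital hierarchy types, team collaboration processes, and team effectiveness outcomes in the context of digital intelligence
- Award number: 72372084
- Approval year: 2023
- Amount: CNY 400,000
- Project type: General Program
Dynamic cooperative regulation of mitochondrial and nuclear oxidative phosphorylation genes in bivalves with doubly uniparental inheritance
- Award number: 32302965
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Effects of cooperative interactions among soil fauna on soil organic matter formation under changing precipitation
- Award number: 42307409
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Signal communication and cooperation mechanisms and regulation strategies of denitrifying anaerobic methane-oxidizing microbial consortia
- Award number: 52300068
- Approval year: 2023
- Amount: CNY 300,000
- Project type: Young Scientists Fund
Collaboration models and performance improvement strategies for online medical teams
- Award number: 72371111
- Approval year: 2023
- Amount: CNY 410,000
- Project type: General Program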
Similar Overseas Grants
Collaborative Research: SaTC: EDU: RoCCeM: Bringing Robotics, Cybersecurity and Computer Science to the Middled School Classroom
- Award number: 2312057
- Fiscal year: 2023
- Amount: $129,400
- Project type: Standard Grant
Collaborative Research: SaTC: EDU: RoCCeM: Bringing Robotics, Cybersecurity and Computer Science to the Middled School Classroom
- Award number: 2312058
- Fiscal year: 2023
- Amount: $129,400
- Project type: Standard Grant
Digitization TCN: Collaborative Research: Bringing Asia to digital life: mobilizing underrepresented Asian herbarium collections in the US to propel biodiversity discovery
- Award number: 2101846
- Fiscal year: 2021
- Amount: $129,400
- Project type: Continuing Grant
Digitization TCN: Collaborative Research: Bringing Asia to digital life: mobilizing underrepresented Asian herbarium collections in the US to propel biodiversity discovery
- Award number: 2101966
- Fiscal year: 2021
- Amount: $129,400
- Project type: Standard Grant
Digitization TCN: Collaborative Research: Bringing Asia to digital life: mobilizing underrepresented Asian herbarium collections in the US to propel biodiversity discovery
- Award number: 2101773
- Fiscal year: 2021
- Amount: $129,400
- Project type: Standard Grant