Automating the Integration of EPA Databases
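Basic Information
- Award number: 0306899
- Principal investigator:
- Amount: $900,000
- Host institution:
- Host institution country: United States
- Project type: Continuing Grant
- Fiscal year: 2003
- Funding country: United States
- Period: 2003-08-15 to 2007-07-31
- Project status: Completed
- Source:
- Keywords: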
Project Abstract
Due to the wide range of geographic scales and complex tasks the Government must administer, its data is split in many different ways and is collected at different times by different agencies. The resulting massive data heterogeneity means one cannot effectively locate, share, or compare data across sources, let alone achieve computational data interoperability. To date, all approaches to wrap data collections, or even to create mappings across comparable datasets, require manual effort. Despite some promising work, the automated creation of such mappings is still in its infancy, since equivalences and differences manifest themselves at all levels, from individual data values through metadata to the explanatory text surrounding the data collection as a whole. More general methods are required to effectively address this problem.
Viewing the data mapping problem as a variant of the cross-language mapping problem of Machine Translation (MT), this project will employ the new statistical algorithms developed since 1990 in the MT community to discover correspondences across comparable datasets at all levels. In MT, the techniques align words and word sequences across languages. This research will adapt and extend the techniques to consider not only data values (the analogue of words) but also data format/orthography, metadata information, and associated textual information (metadata descriptions, footnotes, etc.) in the alignment process, and to perform alignment learning at three levels: individual data cell level, set of cells (column) level, and multi-column level. Multi-level alignment has not been attempted in MT before. These powerful learning techniques have never been applied to metadata schema integration and/or database alignment or wrapping. If these automatically learned mappings are effective, the amount of manual labor required in database wrapping should be significantly reduced.
Two sets of domain data will be used. Air quality data will be provided by EPA staff at the California Air Resources Board in Sacramento, who periodically integrate data from some 35 regional Air Quality Management Districts throughout California into a single California-wide database, and pass this along to the Federal EPA in North Carolina. Fire emissions data will be provided by a different set of EPA offices, the USDA/Forest Service, and the Department of Interior.
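To make the column-level alignment idea concrete, here is a minimal sketch in Python. It is not the project's method: the abstract refers to statistical alignment models from MT, whereas this sketch swaps in a much simpler stand-in, greedy matching of columns by the Jaccard overlap of their cell values. The function names (`jaccard`, `align_columns`), the `threshold` parameter, and the toy tables are all hypothetical and chosen only for illustration.

```python
from itertools import product


def jaccard(a, b):
    """Jaccard similarity of two collections, compared as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def align_columns(table_a, table_b, threshold=0.3):
    """Greedily pair columns of two tables by overlap of their cell values.

    Each table is a dict mapping a column name to a list of cell values.
    Returns a list of (column_a, column_b, score), best matches first.
    The threshold is an assumed cut-off, not a value from the project.
    """
    scored = sorted(
        (
            (jaccard(values_a, values_b), col_a, col_b)
            for (col_a, values_a), (col_b, values_b)
            in product(table_a.items(), table_b.items())
        ),
        key=lambda item: item[0],
        reverse=True,
    )
    used_a, used_b, pairs = set(), set(), []
    for score, col_a, col_b in scored:
        if score < threshold:
            break  # remaining candidates are all weaker
        if col_a in used_a or col_b in used_b:
            continue  # each column is matched at most once
        pairs.append((col_a, col_b, score))
        used_a.add(col_a)
        used_b.add(col_b)
    return pairs


# Hypothetical toy data, not taken from the project.
district = {"site": ["A1", "B2", "C3"], "pollutant": ["O3", "PM10", "O3"]}
statewide = {"station_id": ["A1", "B2", "D4"], "species": ["O3", "PM10", "CO"]}

for col_a, col_b, score in align_columns(district, statewide):
    print(f"{col_a} -> {col_b} (score {score:.2f})")
```

A value-overlap score only captures the data-value signal. The abstract also calls for format, metadata, and surrounding text to feed into the alignment, and for learning at the cell and multi-column levels, none of which this sketch attempts.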
Project Outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Eduard Hovy
A Framework for Effective Annotation of Information from Closed Captions Using Ontologies
- DOI: 10.1007/s10844-005-0188-9
- Published: 2005-09-01
- Journal:
- Impact factor: 3.400
- Authors: Latifur Khan; Dennis McLeod; Eduard Hovy
- Corresponding author: Eduard Hovy
A Sentiment Consolidation Framework for Meta-Review Generation
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Miao Li; Jey Han Lau; Eduard Hovy
- Corresponding author: Eduard Hovy
ezCoref: A Scalable Approach for Collecting Crowdsourced Annotations for Coreference Resolution
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: A. Crowdsourced; David Bamman; Olivia Lewke; Rachel Bawden; Rico Sennrich; Alexandra Birch; Ari Bornstein; Arie Cattan; Ido Dagan; Hong Chen; Zhenhua Fan; Hao Lu; Alan Yuille; Eduard Hovy; Mitch Marcus; M. Palmer; Lance; Rodney Huddleston. 2002; Frédéric Landragin; T. Poibeau; Bernard Vic; Belinda Z. Li; Gabriel Stanovsky; Robert L Logan; Andrew McCallum; Sameer Singh
- Corresponding author: Sameer Singh
What is Your Data Worth to GPT? LLM-Scale Data Valuation with Influence Functions
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: Sang Keun Choe; Hwijeen Ahn; Juhan Bae; Kewen Zhao; Minsoo Kang; Youngseog Chung; Adithya Pratapa; W. Neiswanger; Emma Strubell; Teruko Mitamura; Jeff Schneider; Eduard Hovy; Roger Grosse; Eric Xing
- Corresponding author: Eric Xing
Cooperative Semi-Supervised Transfer Learning of Machine Reading Comprehension
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Oliver Bender; F. Och; Y. Bengio; Réjean Ducharme; Pascal Vincent; Kevin Clark; Quoc Minh; V. Le; J. Devlin; Ming; Kenton Lee; Adam Fisch; Alon Talmor; Robin Jia; Minjoon Seo; Michael R. Glass; A. Gliozzo; Rishav Chakravarti; Ian Goodfellow; Jean Pouget; Mehdi Mirza; Serhii Havrylov; Ivan Titov. 2017; Emergence; Jun; Jiatao Gu; Jiajun Shen; Marc’Aurelio; Matthew Henderson; I. Casanueva; Nikola Mrkšić; Pei; Tsung; Ivan Vulić; Yikang Shen; Yi Tay; Che Zheng; Dara Bahri; Donald; Metzler Aaron; Courville; Structformer; Ashish Vaswani; Noam M. Shazeer; Niki Parmar; Thomas Wolf; Lysandre Debut; Julien Victor Sanh; Clement Chaumond; Anthony Delangue; Pier; Tim ric Cistac; Rémi Rault; Morgan Louf; Qizhe Xie; Eduard Hovy; Silei Xu; Sina J. Semnani; Giovanni Campagna
- Corresponding author: Giovanni Campagna
Other grants by Eduard Hovy
EAGER: A Method to Retrieve Non-Textual Data from Widespread Repositories
- Award number: 1450545
- Fiscal year: 2014
- Funding amount: $900,000
- Project type: Standard Grant
III: EAGER: Automatically Building Test Collections Using Implicit Relevance Signals from the Web
- Award number: 1304939
- Fiscal year: 2012
- Funding amount: $900,000
- Project type: Standard Grant
EAGER: Constructing, Indexing, and Searching Super-Enriched Document Representations in the Cloud
- Award number: 1265301
- Fiscal year: 2012
- Funding amount: $900,000
- Project type: Standard Grant
III: EAGER: Automatically Building Test Collections Using Implicit Relevance Signals from the Web
- Award number: 1147810
- Fiscal year: 2011
- Funding amount: $900,000
- Project type: Standard Grant
EAGER: Constructing, Indexing, and Searching Super-Enriched Document Representations in the Cloud
- Award number: 1143703
- Fiscal year: 2011
- Funding amount: $900,000
- Project type: Standard Grant
Collaborative Research III-COR: From a Pile of Documents to a Collection of Information: A Framework for Multi-Dimensional Text Analysis
- Award number: 0705091
- Fiscal year: 2007
- Funding amount: $900,000
- Project type: Standard Grant
Collaborative Research: Language Processing Technology for Electronic Rulemaking
- Award number: 0429360
- Fiscal year: 2004
- Funding amount: $900,000
- Project type: Continuing Grant
SGER COLLABORATIVE: A Testbed for eRulemaking Data
- Award number: 0328175
- Fiscal year: 2003
- Funding amount: $900,000
- Project type: Standard Grant
Collaborative Research: Interlingual Annotation of Multilingual Text Corpora
- Award number: 0325021
- Fiscal year: 2003
- Funding amount: $900,000
- Project type: Standard Grant
ITR: Information Discovery in Digital Government: Self-extending Topic Maps and Ontologies (GrowOnto)
- Award number: 0205111
- Fiscal year: 2002
- Funding amount: $900,000
- Project type: Continuing Grant
Similar international grants
Collaborative Research: Constraining next generation Cascadia earthquake and tsunami hazard scenarios through integration of high-resolution field data and geophysical models
- Award number: 2325311
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Standard Grant
Collaborative Research: Concurrent Design Integration of Products and Remanufacturing Processes for Sustainability and Life Cycle Resilience
- Award number: 2348641
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Standard Grant
MCA Pilot PUI: From glomeruli to pollination: vertical integration of neural encoding through ecologically-relevant behavior
- Award number: 2322310
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Continuing Grant
CAREER: The Contagion Science: Integration of inhaled transport mechanics principles inside the human upper respiratory tract at multi scales
- Award number: 2339001
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Continuing Grant
Seamless integration of Financial data into ESG data
- Award number: 10099890
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Collaborative R&D
COMPAS: co integration of microelectronics and photonics for air and water sensors
- Award number: 10108154
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: EU-Funded
Linking the HTLV-1 pre-integration complex to the chromatin
- Award number: MR/Y002083/1
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Research Grant
Collaborative Research: CyberTraining: Implementation: Medium: Transforming the Molecular Science Research Workforce through Integration of Programming in University Curricula
- Award number: 2321045
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Standard Grant
MCA Pilot PUI: Neural Signaling and Mechanisms Underlying Sensory Integration and Plasticity
- Award number: 2322317
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Standard Grant
Integration of Advanced Experiments, Imaging and Computation for Synergistic Structure-Performance Design of Powders and Materials in Additive Manufac
- Award number: EP/Y036778/1
- Fiscal year: 2024
- Funding amount: $900,000
- Project type: Research Grant