EAGER: Mining a Year of Speech
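Basic information
- Award number: 1048900
- Principal investigator:
- Amount: $99,900
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2010
- Funding country: United States
- Project period: 2010-08-15 to 2012-07-31
- Status: Completed
- Source:
- Keywords: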
Project abstract
Technologies for storing and processing vast amounts of text are mature and well-defined. In contrast, technologies for browsing or mining content from large collections of non-textual material, especially audio and video, are less well developed. Large-scale data mining of text has helped transform the relevant disciplines; the disciplines that deal with spoken language will reap similar benefits from large, accessible, searchable corpora.

This project explores the difficult problem of providing rich, intelligent data mining capabilities for a substantial collection of spoken audio data in American and British English. It applies and extends state-of-the-art techniques to offer sophisticated, rapid, and flexible access to a richly annotated corpus of a year of speech (about 9,000 hours, 100 million words, or 2 terabytes), derived from the Linguistic Data Consortium, the British National Corpus, and other existing resources. This is ten times more data than researchers in fields such as phonetics, linguistics, and psychology have previously used, and 100 to 1,000 times the amount used in common practice.

Speech-to-text alignment and search tools will open a new universe of data to researchers in many fields, from linguistics and phonetics to anthropology, speech communication, oral history, and media studies. Audio and video usage on the internet is large and growing at an extraordinary rate, offering ever larger amounts of an ever wider range of material. Reliable automatic annotation, indexing, and search of this material will allow researchers to examine the distribution of both form and content across time, space, and social structure.
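The abstract names speech-to-text alignment, indexing, and search as the core tools. As a rough illustration only, the sketch below assumes a forced aligner has already produced word tokens with start and end times for each recording, and shows how an inverted index over those tokens can return the time spans where a phrase is spoken. The names `AlignedWord`, `build_index`, and `search_phrase` are hypothetical and do not describe the project's actual tools; at the scale described (about 100 million words), a real system would more likely use a disk-backed search index than in-memory Python dictionaries.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedWord:
    """One word token with its time span (hypothetical format assumed for aligner output)."""
    recording_id: str
    word: str
    start_s: float  # start time in seconds
    end_s: float    # end time in seconds


def build_index(tokens):
    """Build an inverted index: lower-cased word -> list of (recording_id, position).

    Assumes the tokens of each recording arrive in time order.
    Also returns the per-recording token lists so hits can be mapped back to times.
    """
    by_recording = defaultdict(list)
    index = defaultdict(list)
    for tok in tokens:
        pos = len(by_recording[tok.recording_id])
        by_recording[tok.recording_id].append(tok)
        index[tok.word.lower()].append((tok.recording_id, pos))
    return index, by_recording


def search_phrase(index, by_recording, phrase):
    """Return (recording_id, start_s, end_s) for each consecutive occurrence of `phrase`."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = []
    # Start from every occurrence of the first word, then check the following tokens.
    for rec, pos in index.get(words[0], []):
        seq = by_recording[rec]
        end = pos + len(words)
        if end <= len(seq) and all(
            seq[pos + i].word.lower() == w for i, w in enumerate(words)
        ):
            hits.append((rec, seq[pos].start_s, seq[end - 1].end_s))
    return hits


if __name__ == "__main__":
    tokens = [
        AlignedWord("rec1", "a", 0.00, 0.10),
        AlignedWord("rec1", "year", 0.10, 0.42),
        AlignedWord("rec1", "of", 0.42, 0.55),
        AlignedWord("rec1", "speech", 0.55, 1.02),
        AlignedWord("rec2", "speech", 3.20, 3.71),
    ]
    index, by_rec = build_index(tokens)
    print(search_phrase(index, by_rec, "year of speech"))  # [('rec1', 0.1, 1.02)]
    print(search_phrase(index, by_rec, "speech"))  # [('rec1', 0.55, 1.02), ('rec2', 3.2, 3.71)]
```

Returning time spans rather than text positions is the point of the alignment step: a researcher can jump straight to the audio for each hit.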
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Mark Liberman
Dimensions of Speech and Language Disturbance in Psychosis and Computational Linguistic Markers
- DOI: 10.1016/j.biopsych.2022.02.144
- Published: 2022-05-01
- Authors: Sunny Tang; Katrin Hänsel; Yan Cong; Sarah Berretta; Sunghye Cho; Amir Nikzad; Aarush Mehta; Sameer Pradhan; James Fiumara; Mark Liberman
- Corresponding author: Mark Liberman
Ruptured Appendicitis after Laparoscopic Roux-en-Y Gastric Bypass: Pitfalls in Diagnosing a Surgical Abdomen in the Morbidly Obese
- DOI: 10.1381/096089203322618812
- Published: 2003-12-01
- Impact factor: 3.100
- Authors: Amir Mehran; Mark Liberman; Raul Rosenthal; Samuel Szomstein
- Corresponding author: Samuel Szomstein
CLiFF Notes: Research in the Language, Information and Computation Laboratory of the University of Pennsylvania
- Published: 1995
- Impact factor: 0
- Authors: Norm Badler; F. B. Baldwin; Nicola J. Bessell; Eric Brill; Sharon Cote; Barbara Di Eugenio; Alexis Dimitriadis; Jon Freeman; Christopher W. Geib; A. Gertner; Daniel Hardt; Michael Hegarty; Shyam Kapur; Jonathan Kaye; Michael H. Kelly; Libby Levison; Mark Liberman; D. R. Mani; Mitch Marcus Michael; B. Moore; Michael Niv; Charles L. Ortiz; Jong Cheol Park; Sandeep Prasada Scott
- Corresponding author: Sandeep Prasada Scott
/l/ VARIATION IN AMERICAN ENGLISH: A CORPUS
- Published: 2012
- Impact factor: 0
- Authors: Jiahong Yuan; Mark Liberman
- Corresponding author: Mark Liberman
LOOKING BACK, MOVING FORWARD: Why underlying representations?
- Impact factor: 0
- Authors: Larry M. Hyman; Jeffrey Heinz; Sharon Inkelas; Keith Johnson; Mark Liberman
- Corresponding author: Mark Liberman
Other grants by Mark Liberman
CI-NEW: NIEUW: Novel Incentives and Workflows in Linguistic Data Collection and Annotation
- Award number: 1730377
- Fiscal year: 2017
- Amount: $99,900
- Project type: Standard Grant
Language Preservation 2.0: Crowdsourcing Oral Language Documentation using Mobile Devices
- Award number: 1160639
- Fiscal year: 2012
- Amount: $99,900
- Project type: Standard Grant
Prosodic Systems in New Guinea: Integrating computational and typological approaches to linguistic analysis
- Award number: 0951651
- Fiscal year: 2010
- Amount: $99,900
- Project type: Standard Grant
Collaborative Research: OLAC: Accessing the World's Language Resources
- Award number: 0723357
- Fiscal year: 2007
- Amount: $99,900
- Project type: Continuing Grant
ITR-SCOTUS: A Resource for Collaborative Research in Speech Technology, Linguistics, Decision Processes and the Law
- Award number: 0325739
- Fiscal year: 2003
- Amount: $99,900
- Project type: Continuing Grant
Electronic Materials For Natural Language Research
- Award number: 9113530
- Fiscal year: 1991
- Amount: $99,900
- Project type: Standard Grant
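Similar NSFC grants
Genome mining-based study of secondary metabolites that inhibit Staphylococcus epidermidis biofilm formation
- Award number: 21242003
- Approval year: 2012
- Amount: CNY 100,000
- Project type: Special Fund Project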
Similar international grants
NeTS: Small: NSF-DST: Modernizing Underground Mining Operations with Millimeter-Wave Imaging and Networking
- Award number: 2342833
- Fiscal year: 2024
- Amount: $99,900
- Project type: Standard Grant
Development of social attention indicators of emerging technologies and science policies with network analysis and text mining
- Award number: 24K16438
- Fiscal year: 2024
- Amount: $99,900
- Project type: Grant-in-Aid for Early-Career Scientists
FightAMR: Novel global One Health surveillance approach to fight AMR using Artificial Intelligence and big data mining
- Award number: MR/Y034422/1
- Fiscal year: 2024
- Amount: $99,900
- Project type: Research Grant
ART: Mining the Rich Vein of Research in Montana
- Award number: 2331325
- Fiscal year: 2024
- Amount: $99,900
- Project type: Cooperative Agreement
Toward carbon-neutral society: Development of a full-sustainable eco-friendly green mining process for gold recovery
- Award number: 24K17540
- Fiscal year: 2024
- Amount: $99,900
- Project type: Grant-in-Aid for Early-Career Scientists
DISES Investigating mercury biogeochemical cycling via mixed-methods in complex artisanal gold mining landscapes and implications for community health
- Award number: 2307870
- Fiscal year: 2024
- Amount: $99,900
- Project type: Standard Grant
Generating green hydrogen from mining wastes
- Award number: IM240100202
- Fiscal year: 2024
- Amount: $99,900
- Project type: Mid-Career Industry Fellowships
Novel Hydrophobic Concrete for Durable and Resilient Mining Infrastructure
- Award number: LP230100288
- Fiscal year: 2024
- Amount: $99,900
- Project type: Linkage Projects
SBIR Phase I: Electromagnetic-ablative PGM Refining for In-situ Asteroid Mining
- Award number: 2327078
- Fiscal year: 2024
- Amount: $99,900
- Project type: Standard Grant
Temporal Graph Mining for Anomaly Detection
- Award number: DP240101547
- Fiscal year: 2024
- Amount: $99,900
- Project type: Discovery Projects