Automatically building vocabularies from web forum text
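Basic information
- Grant number: 485322-2015
- Principal investigator:
- Amount: $18,200
- Host institution:
- Host institution country: Canada
- Project category: Engage Grants Program
- Fiscal year: 2015
- Funding country: Canada
- Duration: 2015-01-01 to 2016-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract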
A staggering amount of text is written online every day, much of it in discussions taking place in online forums
focused on topics such as motorsports, pets, and collectibles. This provides an opportunity for businesses to
analyze this text to glean insights into consumer demands and opinions, which can then give a business a
competitive advantage. Due to the large volume of data, the analyses of such conversations would ideally be
done automatically. The key to building an automatic text analysis system is a vocabulary of important terms.
Such a vocabulary could be taken from a dictionary; however, because of the specialized topics that online
forums focus on, many important terms are not listed in general dictionaries. For example, online forums
contain many non-standard words, including intentional variant spellings (e.g., "coooolll") and novel
combinations (e.g., "defuel"), as well as noise (e.g., usernames) and unintentional spelling errors.
The goal of this project is to identify the above categories of words in order to automatically build a vocabulary
of key terms for an online forum community. The project will apply a range of statistical techniques that draw
on information including word frequency, character sequences in words, and the contexts in which words
appear. The automatic construction of such a vocabulary will help improve the performance of the subsequent
text processing and analysis tools that try to zero in on relevant and important pieces of information given the
vast amounts of noisy and unstructured data in online forums. In turn, such tools can improve the quality of the
online services that VerticalScope provides to its forum users, as well as the attractiveness of VerticalScope's
services to various businesses. These improvements would help VerticalScope become a leading player in data
science research and development, thereby benefiting the Canadian economy. The work will further benefit the
broader public by enabling businesses to better understand consumer needs and develop more attractive products.
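As a rough illustration of the kind of signals the abstract mentions (word frequency and character sequences), the following minimal sketch separates elongated variant spellings from frequent out-of-dictionary terms. It is not the project's actual method: the tiny `GENERAL_DICTIONARY`, the function names, and the `min_count` threshold are all hypothetical choices made for this example.

```python
import re
from collections import Counter

# Hypothetical stand-in for a general dictionary; a real system would load a full word list.
GENERAL_DICTIONARY = {"cool", "the", "car", "is", "so", "to", "need", "i", "tank"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def skeleton(token):
    """Collapse every run of a repeated character to one character ('coooolll' -> 'col')."""
    return re.sub(r"(.)\1+", r"\1", token)


def build_vocabulary(posts, dictionary, min_count=2):
    """Return (variants, candidates) for the tokens not found in the dictionary.

    variants:   elongated spellings mapped to the dictionary word sharing their skeleton
    candidates: other out-of-dictionary tokens seen at least min_count times (assumed threshold)
    """
    counts = Counter(tok for post in posts for tok in tokenize(post))

    # Index dictionary words by skeleton so elongated forms can be matched back to them.
    by_skeleton = {}
    for word in sorted(dictionary):
        by_skeleton.setdefault(skeleton(word), word)

    variants, candidates = {}, {}
    for tok, n in counts.items():
        if tok in dictionary:
            continue
        # A run of three or more identical characters suggests intentional elongation.
        if re.search(r"(.)\1\1", tok) and skeleton(tok) in by_skeleton:
            variants[tok] = by_skeleton[skeleton(tok)]
        elif n >= min_count:
            candidates[tok] = n
    return variants, candidates


posts = ["The car is so coooolll", "I need to defuel the tank", "defuel is cool"]
variants, candidates = build_vocabulary(posts, GENERAL_DICTIONARY)
print(variants)    # {'coooolll': 'cool'}
print(candidates)  # {'defuel': 2}
```

Frequency alone would not separate usernames or typos from genuine new terms; the abstract's third signal, the contexts in which words appear, is not modeled in this sketch.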
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Cook, ChristopherPaul
Automating Dictionary Construction for Better Natural Language Processing
- Grant number: RGPIN-2015-05615
- Fiscal year: 2022
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Automating Dictionary Construction for Better Natural Language Processing
- Grant number: RGPIN-2015-05615
- Fiscal year: 2021
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Automating Dictionary Construction for Better Natural Language Processing
- Grant number: RGPIN-2015-05615
- Fiscal year: 2019
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Automating Dictionary Construction for Better Natural Language Processing
- Grant number: RGPIN-2015-05615
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Automating Dictionary Construction for Better Natural Language Processing
- Grant number: RGPIN-2015-05615
- Fiscal year: 2017
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Automating Dictionary Construction for Better Natural Language Processing
- Grant number: RGPIN-2015-05615
- Fiscal year: 2016
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Automating Dictionary Construction for Better Natural Language Processing
- Grant number: RGPIN-2015-05615
- Fiscal year: 2015
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Computational models of subtractive word formation processes
- Grant number: 363418-2008
- Fiscal year: 2009
- Funding amount: $18,200
- Project category: Postgraduate Scholarships - Doctoral
Computational models of subtractive word formation processes
- Grant number: 363418-2008
- Fiscal year: 2008
- Funding amount: $18,200
- Project category: Postgraduate Scholarships - Doctoral
Research in Computational Linguistics
- Grant number: 318970-2005
- Fiscal year: 2005
- Funding amount: $18,200
- Project category: Postgraduate Scholarships - Master's
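Similar National Natural Science Foundation of China (NSFC) grants
Role and mechanism of a porous Ti-MSNs@MGF+DX anti-inflammatory and myogenic system applied to temporomandibular joint prostheses
- Grant number: 82370984
- Approval year: 2023
- Funding amount: CNY 480,000
- Project category: General Program
Regulatory mechanism of targeted starch modification by high-quality BE mutant enzymes constructed from amylopectin building blocks
- Grant number: 31771933
- Approval year: 2017
- Funding amount: CNY 600,000
- Project category: General Program
Research on the structure of groups and several difficult problems
- Grant number: 10771180
- Approval year: 2007
- Funding amount: CNY 300,000
- Project category: General Program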
Similar overseas grants
TRUST2 - Improving TRUST in artificial intelligence and machine learning for critical building management
- Grant number: 10093095
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Collaborative R&D
Facilitating circular construction practices in the UK: A data driven online marketplace for waste building materials
- Grant number: 10113920
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: SME Support
FABB-HVDC (Future Aerospace power conversion Building Blocks for High Voltage DC electrical power systems)
- Grant number: 10079892
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Legacy Department of Trade & Industry
Opening Spaces and Places for the Inclusion of Indigenous Knowledge, Voice and Identity: Moving Indigenous People out of the Margins
- Grant number: 477924
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Salary Programs
Stories of Divided Politics: Polarisation and Bridge-Building in Colombia and Britain
- Grant number: EP/Y03628X/1
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Research Grant
Building Desirable and Resilient Public Media Futures: Establishing the Centre for Public Values, Technology & Society
- Grant number: MR/X033651/1
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Fellowship
Expanding syphilis screening among pregnant women in Indonesia using the rapid dual test for syphilis & HIV with capacity building: The DUALIS Study
- Grant number: MR/Y004825/1
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Research Grant
Building recovery and resilience in severe mental illness: Leveraging the role of social determinants in illness trajectories and interventions
- Grant number: MR/Z503514/1
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Research Grant
Building Partnerships to Conserve Limestone Pavements
- Grant number: NE/Y004930/1
- Fiscal year: 2024
- Funding amount: $18,200
- Project category: Research Grant













