Identifying General Product and Brand Names in Online Forums

Basic Information

Project Abstract

One key data component for engaging and acquiring customers is the ability to identify product names in online forums. Extracting product and brand names therefore generates data that helps vendors and, in turn, increases overall business revenue. The traditional approach to named-entity extraction requires manually labelled training sets, which are time-consuming and expensive to produce. In addition, we deal with many product or service sectors that differ in size, language, and content. Fitting a supervised model based on human-annotated data for each product or brand in each sector (vertical) can therefore be very expensive. The three major challenges are: 1) the high cost of generating training data for each type of product, 2) covering a large and divergent range of products from all verticals, and 3) disambiguating different types of entities based on context. When training data is absent or scarce, an alternative is to use semi-supervised learning algorithms such as bootstrapping, and meta-learning methods such as self-training and co-training. With these methods, a model can be trained with very little annotated data, or none at all. This research investigates a hybrid approach that combines transfer learning and semi-supervised learning to identify and extract named entities in our domain of interest. In transfer learning, the lack of annotated data in the target domain is addressed by adapting annotated data from other domains.

VerticalScope (the industry partner) is a Canadian company that is becoming a leading player in data science research and development for understanding user-generated content on the Internet in a variety of sectors. The proposed project contributes to the growth of VerticalScope, and as a result to the Canadian economy, by allowing the company to apply cutting-edge research to improve the user experience on its forums, and hence the attractiveness of its service to businesses.
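
The bootstrapping idea mentioned in the abstract can be illustrated with a small sketch. This is not the project's method, only a toy example under stated assumptions: the function name `bootstrap_brands`, the sample posts, the "preceding word" pattern, and the capitalisation heuristic are all hypothetical choices made for illustration. Starting from a single seed brand, the loop learns which context words tend to precede known brands and then uses those contexts to propose new brand candidates.

```python
from collections import Counter


def bootstrap_brands(sentences, seed_brands, rounds=3, min_hits=2):
    """Grow a brand lexicon from a few seeds using left-context patterns.

    Hypothetical helper for illustration; real systems would use richer
    features and a confidence model instead of a raw count threshold.
    """
    brands = set(seed_brands)
    for _ in range(rounds):
        # Step 1: count the word that immediately precedes each known brand.
        patterns = Counter()
        for tokens in sentences:
            for prev, tok in zip(tokens, tokens[1:]):
                if tok in brands:
                    patterns[prev] += 1
        trusted = {p for p, n in patterns.items() if n >= min_hits}

        # Step 2: a capitalised token that follows a trusted context word
        # becomes a new brand candidate (a crude assumption).
        new = set()
        for tokens in sentences:
            for prev, tok in zip(tokens, tokens[1:]):
                if prev in trusted and tok[0].isupper() and tok not in brands:
                    new.add(tok)

        if not new:  # nothing learned this round, so stop early
            break
        brands |= new
    return brands


# Toy forum posts (invented for this example), already tokenised.
posts = [
    "I bought a Honda last year".split(),
    "my Honda runs great".split(),
    "I bought a Toyota instead".split(),
    "she drives a Honda daily".split(),
    "he drives a Subaru now".split(),
    "we saw a movie yesterday".split(),
]

found = bootstrap_brands(posts, {"Honda"})
print(sorted(found))
```

Only one labelled example ("Honda") is supplied; the remaining names are picked up from shared context. The same loop also shows the main risk of bootstrapping: a single noisy pattern can pull in wrong candidates, which is why the count threshold `min_hits` is there.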

Project Outcomes

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Makrehchi, Masoud

Improving clustering performance using independent component analysis and unsupervised feature learning
Content Tree Word Embedding for document representation
  • DOI: 10.1016/j.eswa.2017.08.021
  • Published: 2017-12-30
  • Journal:
  • Impact factor: 8.5
  • Authors: Kamkarhaghighi, Mehran; Makrehchi, Masoud
  • Corresponding author: Makrehchi, Masoud

Other grants by Makrehchi, Masoud

Algorithms and applications of Link Mining: Making Sense of Network Data
  • Grant number: RGPIN-2021-03380
  • Fiscal year: 2022
  • Funding amount: $18,200
  • Program: Discovery Grants Program - Individual
Algorithms and applications of Link Mining: Making Sense of Network Data
  • Grant number: RGPIN-2021-03380
  • Fiscal year: 2021
  • Funding amount: $18,200
  • Program: Discovery Grants Program - Individual
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number: RGPIN-2014-06591
  • Fiscal year: 2019
  • Funding amount: $18,200
  • Program: Discovery Grants Program - Individual
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number: RGPIN-2014-06591
  • Fiscal year: 2018
  • Funding amount: $18,200
  • Program: Discovery Grants Program - Individual
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number: RGPIN-2014-06591
  • Fiscal year: 2017
  • Funding amount: $18,200
  • Program: Discovery Grants Program - Individual
Detecting relevant segment of text in legal domain
  • Grant number: 499514-2016
  • Fiscal year: 2016
  • Funding amount: $18,200
  • Program: Engage Grants Program
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number: RGPIN-2014-06591
  • Fiscal year: 2016
  • Funding amount: $18,200
  • Program: Discovery Grants Program - Individual
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number: RGPIN-2014-06591
  • Fiscal year: 2015
  • Funding amount: $18,200
  • Program: Discovery Grants Program - Individual
Computer assisted generation and transformation of web content
  • Grant number: 477757-2015
  • Fiscal year: 2015
  • Funding amount: $18,200
  • Program: Engage Grants Program
Towards Predicting Socio-economic Systems by Mining Social Media Data
  • Grant number: RGPIN-2014-06591
  • Fiscal year: 2014
  • Funding amount: $18,200
  • Program: Discovery Grants Program - Individual

Similar National Natural Science Foundation of China (NSFC) grants

Toward a general theory of intermittent aeolian and fluvial nonsuspended sediment transport
  • Grant number:
  • Approval year: 2022
  • Funding amount: CNY 550,000
  • Program:

Similar overseas grants

Collaborative Research: Superinvaders: testing a general hypothesis of forest invasions by woody species across the Americas
  • Grant number: 2331278
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Standard Grant
The influence of general English proficiency and attitudes/orientation towards English on the development of productive knowledge of English collocations
  • Grant number: 24K04026
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Grant-in-Aid for Scientific Research (C)
A general strategy to synthesize semiconducting polymers within one minute
  • Grant number: 24K08518
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Grant-in-Aid for Scientific Research (C)
Postdoctoral Fellowship: STEMEdIPRF: Increasing geoscience enrollment and engagement by transforming perceptions of geoscience among students and the general public
  • Grant number: 2327348
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Standard Grant
Collaborative Research: AF: Small: Structural Graph Algorithms via General Frameworks
  • Grant number: 2347322
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Standard Grant
REU Site: Quantitative Rules of Life: General Theories across Biological Systems
  • Grant number: 2349052
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Standard Grant
Collaborative Research: Superinvaders: testing a general hypothesis of forest invasions by woody species across the Americas
  • Grant number: 2331277
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Standard Grant
RII Track-4:NSF: Construction of New Additive and Semi-Implicit General Linear Methods
  • Grant number: 2327484
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Standard Grant
A general continuum theory of polycrystalline materials
  • Grant number: EP/X037800/1
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Research Grant
Fully general-relativistic magneto-hydrodynamic simulations beyond Relativity with GPUs
  • Grant number: ST/Z000424/1
  • Fiscal year: 2024
  • Funding amount: $18,200
  • Program: Research Grant