Aspect Classification of Social Documents
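Basic information
- Grant number: 488936-2015
- Principal investigator:
- Amount: $18,200
- Host institution:
- Host institution country: Canada
- Project category: Engage Grants Program
- Fiscal year: 2015
- Funding country: Canada
- Project period: 2015-01-01 to 2016-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract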
This research project aims to create tools for analyzing, modelling, and classifying social documents into general categories. The research concerns two languages: French and English. Twitter is a popular short messaging service available through Web pages, desktop software, and mobile software. In this research, Twitter posts are treated as a corpus of social documents. The main steps of the project are explained in terms of monolingual social document classification and cross-language social document classification.
In this research project, various well-known classification models from the machine learning field, such as Naïve Bayes, Random Forest, and Support Vector Machine, among others, will be tested and compared with one another. Moreover, tweet Natural Language Processing (NLP) tools such as part-of-speech taggers and a dependency parser will be used to extract several NLP-based features of the tweets for the classifiers. Our objective is to identify the optimal combination of features that yields good prediction results while avoiding overfitting.
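As a rough illustration of one of the models named above, the following is a minimal multinomial Naïve Bayes sketch over bag-of-words features. The toy corpus, the category names, the whitespace tokenizer, and the function names (`tokenize`, `train_naive_bayes`, `predict`) are all assumptions made for this example. It is not the project's implementation, and it omits the part-of-speech and dependency features the abstract mentions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for a real tweet tokenizer)."""
    return text.lower().split()


def train_naive_bayes(labeled_docs):
    """Collect class counts and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label with the highest log-posterior under add-one smoothing."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label in class_counts:
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen in training
            count = word_counts[label][token]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy corpus; the categories are illustrative only.
TRAIN = [
    ("great goal in the match tonight", "sports"),
    ("the team won the final match", "sports"),
    ("new phone release with better camera", "technology"),
    ("software update improves the phone battery", "technology"),
]

model = train_naive_bayes(TRAIN)
print(predict(model, "the match was great"))
print(predict(model, "phone camera update"))
```

Comparing several models, as the abstract proposes, would mean training each one on the same features and evaluating them on held-out tweets, which is also how overfitting would show up.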
Cross-lingual text classification is a major challenge in NLP, since training data is often available in only one language (the target language) and not in the language of the documents we want to classify (the source language). Classifying French tweets will be more complex and challenging, as we do not have affordable tweet NLP tools for this language with which to apply the same classification method. One can proceed in two ways. First, the monolingual classification method explained earlier can be applied to the French social documents, using fewer NLP features. The second solution is cross-lingual text classification using topic-dependent word probabilities. Given social documents in the two languages, one can also adopt a naïve approach that combines multiple independent monolingual (and cross-language) text classifiers.
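The abstract does not say how the independent classifiers would be combined. One simple possibility, shown here purely as an assumption, is soft voting: average the per-class probabilities that each classifier outputs and pick the class with the highest average. The function name `combine_predictions`, the optional weights, and the example probabilities are hypothetical.

```python
def combine_predictions(distributions, weights=None):
    """Average per-class probabilities from several classifiers (soft voting).

    `distributions` is a list of dicts mapping label -> probability,
    one dict per classifier. A label missing from a dict counts as 0.0.
    """
    if not distributions:
        raise ValueError("at least one distribution is required")
    if weights is None:
        weights = [1.0] * len(distributions)
    if len(weights) != len(distributions):
        raise ValueError("weights and distributions must have the same length")
    total_weight = sum(weights)
    # Sort the labels so that ties are broken deterministically.
    labels = sorted(set().union(*distributions))
    combined = {}
    for label in labels:
        weighted_sum = sum(
            weight * dist.get(label, 0.0)
            for weight, dist in zip(weights, distributions)
        )
        combined[label] = weighted_sum / total_weight
    best_label = max(labels, key=lambda label: combined[label])
    return best_label, combined


# Invented outputs of two classifiers for the same French tweet.
monolingual_fr = {"sports": 0.55, "politics": 0.45}
cross_lingual = {"sports": 0.30, "politics": 0.70}

label, scores = combine_predictions([monolingual_fr, cross_lingual])
print(label, scores)
```

Unequal weights could be passed if one classifier is known to be more reliable on held-out data; equal weighting is the naïve default.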
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants by Sadat, Fatiha
Coping with Zero-Shot Translation and its Explainability
- Grant number: RGPIN-2019-07242
- Fiscal year: 2022
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Coping with Zero-Shot Translation and its Explainability
- Grant number: RGPIN-2019-07242
- Fiscal year: 2021
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Coping with Zero-Shot Translation and its Explainability
- Grant number: RGPIN-2019-07242
- Fiscal year: 2020
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Coping with Zero-Shot Translation and its Explainability
- Grant number: RGPIN-2019-07242
- Fiscal year: 2019
- Funding amount: $18,200
- Project category: Discovery Grants Program - Individual
Identification of follow up notion from radiologist dictated reports
- Grant number: 530877-2018
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Engage Grants Program
Unsupervised and Transfer Learning for Words segmentation in Korean Social Media
- Grant number: 523512-2018
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Engage Plus Grants Program
Information Extraction from medical dictated reports
- Grant number: 530559-2018
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Connect Grants Level 1
Towards Developing Digital Language Tools to Build and Enhance Cultural Heritage Knowledge
- Grant number: 514027-2017
- Fiscal year: 2017
- Funding amount: $18,200
- Project category: Connect Grants Level 1
Developing a Domain-based Ontology using Permanent Banking Instructions
- Grant number: 522417-2017
- Fiscal year: 2017
- Funding amount: $18,200
- Project category: Engage Grants Program
Bridging Languages in Social Networks and Semi-Supervised Learning for a Compact Representation
- Grant number: 508048-2016
- Fiscal year: 2016
- Funding amount: $18,200
- Project category: Engage Grants Program
Similar overseas grants
The Personal Data Economy, the Corporatist State Model, and a Global Framework for an Emergent Classification of Social, Political, and Economic Power
- Grant number: 2096931
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Studentship
A classification of public policy contributing to social innovation and an analysis of effect mechanism
- Grant number: 18K12698
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Grant-in-Aid for Early-Career Scientists
Exploration of methods of measurement and analyses of theory-based social class classification for health research in Japan
- Grant number: 18K19699
- Fiscal year: 2018
- Funding amount: $18,200
- Project category: Grant-in-Aid for Challenging Research (Exploratory)
Encrypted social media traffic classification
- Grant number: 522181-2017
- Fiscal year: 2017
- Funding amount: $18,200
- Project category: Engage Grants Program
Development of a general classification framework under the Neyman-Pearson Paradigm, with biomedical and social applications
- Grant number: 1613338
- Fiscal year: 2016
- Funding amount: $18,200
- Project category: Standard Grant
Online fuzzy classification of social communities
- Grant number: 392992-2010
- Fiscal year: 2012
- Funding amount: $18,200
- Project category: Postgraduate Scholarships - Doctoral
Development of the Autism Classification System of Functioning: Social Communication
- Grant number: 224232
- Fiscal year: 2011
- Funding amount: $18,200
- Project category: Operating Grants
Online fuzzy classification of social communities
- Grant number: 392992-2010
- Fiscal year: 2011
- Funding amount: $18,200
- Project category: Postgraduate Scholarships - Doctoral
Online fuzzy classification of social communities
- Grant number: 392992-2010
- Fiscal year: 2010
- Funding amount: $18,200
- Project category: Postgraduate Scholarships - Doctoral
Scholars Award: Measure for Measure: Social Ontologies of Classification
- Grant number: 0849052
- Fiscal year: 2009
- Funding amount: $18,200
- Project category: Continuing Grant