Unsupervised and Transfer Learning for Words segmentation in Korean Social Media
Basic Information
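- Grant number: 523512-2018
- Principal investigator: Sadat, Fatiha
- Amount: $9,100
- Host institution: UQAM (Université du Québec à Montréal)
- Host institution country: Canada
- Program: Engage Plus Grants Program
- Fiscal year: 2018
- Funding country: Canada
- Duration: 2018-01-01 to 2019-12-31
- Status: Completed
- Source:
- Keywords: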
Project Abstract
Social media has pervaded virtually every aspect of our lives. People use social media regularly in many ways: networking, productivity, education, navigation and everything in between. Processing multilingual social media is very challenging, because the number of languages involved can quickly become overwhelming.

UQAM's team, under the direction of Prof. Fatiha Sadat, has completed an NSERC Engage project with the industrial partner Advanced Symbolic Inc. (ASI), entitled 'Bridging Languages in Social Networks and Semi-Supervised Learning for a Compact Representation', whose goal was to pre-process Chinese and Japanese social media. The resulting linguistic tools performed well in terms of accuracy and F-measure, outperforming other segmentation tools by up to +4.80 points for Japanese segmentation and up to +2.69 points for Chinese.

The current NSERC Engage Plus project aims to extend this initial project by adding Korean as a third language in ASI's Asian-language Artificial Intelligence pipeline.

In this project, our main concern is the linguistic pre-processing of South Korean Twitter short messages (limited to 140 characters), which are unstructured and highly noisy. Because no annotated data exist for this noisy South Korean social media data, unsupervised and transfer learning techniques will be explored and investigated. Building such a tool will help ASI design a complete, automatic data mining and Natural Language Processing pipeline for Asian languages, an area in which the company currently has very little expertise.
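The abstract reports segmentation gains in F-measure "points" without saying how the score is computed. The sketch below shows one common way to score word segmentation: word-level precision, recall and F-measure over character spans, where a predicted word counts only if both of its boundaries match a gold word. This is an illustrative assumption, not the project's documented evaluation protocol; the function names `to_spans` and `segmentation_prf` and the Korean example are hypothetical.

```python
from typing import List, Set, Tuple


def to_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Map a segmented string to the set of (start, end) character offsets of its words."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for word in words:
        end = start + len(word)
        spans.add((start, end))
        start = end
    return spans


def segmentation_prf(gold: List[str], pred: List[str]) -> Tuple[float, float, float]:
    """Word-level precision, recall and F-measure for one segmented string.

    A predicted word is correct only if both of its boundaries match a gold word.
    Both segmentations must cover exactly the same characters.
    """
    if "".join(gold) != "".join(pred):
        raise ValueError("gold and predicted segmentations cover different text")
    gold_spans, pred_spans = to_spans(gold), to_spans(pred)
    correct = len(gold_spans & pred_spans)
    if correct == 0:
        return 0.0, 0.0, 0.0
    precision = correct / len(pred_spans)
    recall = correct / len(gold_spans)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical example: "나는학교에간다" ("I go to school") written without spaces,
# as often happens in noisy tweets.
gold = ["나는", "학교에", "간다"]
pred = ["나는", "학교", "에", "간다"]
p, r, f = segmentation_prf(gold, pred)
print(f"P={p:.4f} R={r:.4f} F={f * 100:.2f} pts")  # prints P=0.5000 R=0.6667 F=57.14 pts
```

Here two of the four predicted words match gold words exactly, so precision is 2/4 and recall is 2/3. Assuming scores are reported on a 0-100 scale, a difference such as "+4.80 points" would be a difference between two values like the 57.14 printed above.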
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other grants held by Sadat, Fatiha
Coping with Zero-Shot Translation and its Explainability
- Grant number: RGPIN-2019-07242
- Fiscal year: 2022
- Amount: $9,100
- Program: Discovery Grants Program - Individual
Coping with Zero-Shot Translation and its Explainability
- Grant number: RGPIN-2019-07242
- Fiscal year: 2021
- Amount: $9,100
- Program: Discovery Grants Program - Individual
Coping with Zero-Shot Translation and its Explainability
- Grant number: RGPIN-2019-07242
- Fiscal year: 2020
- Amount: $9,100
- Program: Discovery Grants Program - Individual
Coping with Zero-Shot Translation and its Explainability
- Grant number: RGPIN-2019-07242
- Fiscal year: 2019
- Amount: $9,100
- Program: Discovery Grants Program - Individual
Identification of follow up notion from radiologist dictated reports
- Grant number: 530877-2018
- Fiscal year: 2018
- Amount: $9,100
- Program: Engage Grants Program
Information Extraction from medical dictated reports
- Grant number: 530559-2018
- Fiscal year: 2018
- Amount: $9,100
- Program: Connect Grants Level 1
Towards Developing Digital Language Tools to Build and Enhance Cultural Heritage Knowledge
- Grant number: 514027-2017
- Fiscal year: 2017
- Amount: $9,100
- Program: Connect Grants Level 1
Developing a Domain-based Ontology using Permanent Banking Instructions
- Grant number: 522417-2017
- Fiscal year: 2017
- Amount: $9,100
- Program: Engage Grants Program
Bridging Languages in Social Networks and Semi-Supervised Learning for a Compact Representation
- Grant number: 508048-2016
- Fiscal year: 2016
- Amount: $9,100
- Program: Engage Grants Program
Aspect Classification of Social Documents
- Grant number: 488936-2015
- Fiscal year: 2015
- Amount: $9,100
- Program: Engage Grants Program
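Similar NSFC (National Natural Science Foundation of China) grants
Research on Spiking-Transfer Learning Methods with Temporal Transfer Capability
- Grant number: 61806040
- Year approved: 2018
- Amount: CNY 200,000
- Program: Young Scientists Fund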
Similar overseas grants
Trustworthy Hypothesis Transfer Learning
- Grant number: DE240101089
- Fiscal year: 2024
- Amount: $9,100
- Program: Discovery Early Career Researcher Award
Adaptive Multi-Source Transfer Learning Approaches for Environmental Challenges
- Grant number: EP/Y002539/1
- Fiscal year: 2024
- Amount: $9,100
- Program: Research Grant
Transfer Learning for Monte Carlo Methods
- Grant number: EP/Y022300/1
- Fiscal year: 2024
- Amount: $9,100
- Program: Research Grant
CAREER: New data integration approaches for efficient and robust meta-estimation, model fusion and transfer learning
- Grant number: 2337943
- Fiscal year: 2024
- Amount: $9,100
- Program: Continuing Grant
Transfer learning in the prediction of catalytic activity of photosensitizers
- Grant number: 23K13744
- Fiscal year: 2023
- Amount: $9,100
- Program: Grant-in-Aid for Early-Career Scientists
Development of an sEMG-based Human-Computer Interface Utilizing Deep Transfer Learning and Continual Learning
- Grant number: 23H03445
- Fiscal year: 2023
- Amount: $9,100
- Program: Grant-in-Aid for Scientific Research (B)
Transfer of statistical learning from perception to production
- Grant number: 2346989
- Fiscal year: 2023
- Amount: $9,100
- Program: Standard Grant
Collaborative Research: New Theory and Methods for High-Dimensional Multi-Task and Transfer Learning Inference
- Grant number: 2324490
- Fiscal year: 2023
- Amount: $9,100
- Program: Continuing Grant
Collaborative Research: New Theory and Methods for High-Dimensional Multi-Task and Transfer Learning Inference
- Grant number: 2324489
- Fiscal year: 2023
- Amount: $9,100
- Program: Continuing Grant
Transfer learning leveraging large-scale transcriptomics to map disrupted gene networks in cardiovascular disease
- Grant number: 10696753
- Fiscal year: 2023
- Amount: $9,100
- Program: