BIGDATA: F: Large-Scale Transductive Learning from Heterogeneous Data Sources
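Basic information
- Award number: 1546329
- Principal investigator:
- Amount: $1,188,500
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2016
- Funding country: United States
- Duration: 2016-01-01 to 2020-12-31
- Status: Completed
- Source:
- Keywords:
Project abstract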
Important problems in the big-data era involve predictions based on heterogeneous sources of information and the dependency structures in data. In recommendation systems, for example, predictions need to be made not only based on observed user ratings over items (movies, books, music, shopping products, etc.), but also based on information such as demographical data of users and textual descriptions of items. In event detection from textual data (news stories, tweets, maintenance reports, legal documents, etc.), joint inference must be based on who (agents), what (event types or topics), where (locations) and when (dates), and also based on the connections among agents (in social networks), topics (in an event-type ontology), locations (in a map) and temporal co-occurrences. The fundamental research questions therefore include: (1) how to develop a unified optimization framework for predictions based on heterogeneous information and dependency structures in various kinds of tasks; (2) how to make the inference computationally tractable when the combined space of model parameters is extremely large; and (3) how to significantly enhance the prediction power of the system by leveraging massively available unlabeled data in addition to human-annotated training data which are often sparse.
This project will address the three challenges via the following four approaches.
(1) A unified representation of heterogeneous information sources using product graphs: This framework aims to represent heterogeneous sources of data and intra-source dependencies, such as social connections among users, semantic similarities among items, contextual correlations among keywords, topical similarities among documents, hierarchical relations among topic labels, and so on. Each data source will be represented using a graph, and the individual graphs of multiple sources will be combined into a product graph where each node corresponds to a tuple of nodes in the individual graphs, and each link aggregates the links in the individual graphs.
(2) Transductive learning over graph products: This project plans to reduce the inference problems in a broad range of prediction tasks to semi-supervised transductive learning problems over the product graphs mentioned above. The training data in each task (of classification, regression or link prediction) will be represented as a subset of labeled (or scored) nodes in the product graph, and the labels (or scores) of those nodes will be propagated over the links in the product graph until convergence. This project will study various kinds of graph transductions theoretically and empirically.
(3) Large-scale optimization algorithms: The induced product graphs are typically extremely large. To address the computational bottlenecks, this project will develop new scalable algorithms based on theoretical properties and computational characteristics of spectral graph products, including adapted versions of rank-reduced matrix factorization, aggressive basis pruning, and sampling-based low-rank approximation.
(4) Thorough evaluations in multiple important applications: The proposed new approach will be evaluated on benchmark data collections for context-aware collaborative filtering, semi-structured event detection and tracking, and expert finding via multi-source social network analysis.
The proposed work, if successful, will offer principled solutions for enhancing the prediction power of systems in a broad range of tasks, whenever recommendation, classification and regression are involved. Technical impacts of the proposed work are expected in multiple research fields. For further information see the project web site at: http://nyc.lti.cs.cmu.edu/gp-trans/index.html
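As an illustration of approaches (1) and (2), the following is a minimal sketch, not the project's actual algorithm. It assumes two sources (a user graph and an item graph), takes the Kronecker (tensor) product as the product graph, and uses a standard label-propagation update with a hypothetical damping parameter `alpha`. The function names `normalized_adjacency` and `propagate_on_product_graph` are invented for this example. The sketch relies on the identity (S_u ⊗ S_i) vec(F) = vec(S_u F S_i^T) for row-major vectorization, so the product graph is never materialized.

```python
import numpy as np

def normalized_adjacency(A):
    """Symmetric normalization D^{-1/2} A D^{-1/2} of an adjacency matrix."""
    d = A.sum(axis=1)
    # Guard against isolated nodes (degree 0) before taking the inverse root.
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(np.maximum(d, 1e-12)), 0.0)
    return A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def propagate_on_product_graph(A_users, A_items, Y, alpha=0.5,
                               tol=1e-8, max_iter=1000):
    """Propagate scores over the Kronecker product of two graphs.

    Y[u, i] holds the observed score for the (user u, item i) node and 0
    for unlabeled nodes. The update
        F <- alpha * S_u F S_i^T + (1 - alpha) * Y
    equals one propagation step on the product graph, computed on the
    factor graphs only.
    """
    S_u = normalized_adjacency(A_users)
    S_i = normalized_adjacency(A_items)
    F = Y.astype(float)
    for _ in range(max_iter):
        F_next = alpha * (S_u @ F @ S_i.T) + (1 - alpha) * Y
        if np.max(np.abs(F_next - F)) < tol:  # stop at convergence
            return F_next
        F = F_next
    return F

# Toy example: a 3-user path graph, a 2-item graph, one observed score.
A_users = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
A_items = np.array([[0, 1], [1, 0]], dtype=float)
Y = np.zeros((3, 2))
Y[0, 0] = 1.0
F = propagate_on_product_graph(A_users, A_items, Y)
```

Working on the factors keeps each iteration at the cost of two small matrix products, whereas the explicit product graph here would already have 6 nodes and grows multiplicatively with every added source.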
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Yiming Yang
Triple-Cation Perovskite Resistive Switching Memory with Enhanced Endurance and Retention
- DOI: 10.1021/acsaelm.0c00674
- Published: 2020
- Journal:
- Impact factor: 4.7
- Authors: Yang Huang; Lingzhi Tang; Chen Wang; Hongbo Fan; Zhenxuan Zhao; Huaqiang Wu; Min Xu; Rensheng Shen; Yiming Yang; Jiming Bian
- Corresponding author: Jiming Bian
Say CHEESE: Common Human Emotional Expression Set Encoder and Its Application to Analyze Deceptive Communication
- DOI:
- Published: 2018
- Journal:
- Impact factor: 0
- Authors: Taylan K. Sen; Md. Kamrul Hasan; Minh Tran; Yiming Yang; Ehsan Hoque
- Corresponding author: Ehsan Hoque
Cross-Lingual Pseudo-Relevance Feedback Using a Comparable Corpus
- DOI: 10.1007/3-540-45691-0_12
- Published: 2001
- Journal:
- Impact factor: 0
- Authors: Monica Rogati; Yiming Yang
- Corresponding author: Yiming Yang
High Stability of Dielectric Permittity for K0.5Na0.5NbO3-Based Lead Free Piezoelectric Ceramics
- DOI: 10.1080/00150193.2010.482848
- Published: 2010-10
- Journal:
- Impact factor: 0.8
- Authors: Hongliang Du; Zhuo Xu; Shaobo Qu; Zhen Yang; Jingbo Zhao; Yiming Yang; Weidong Mo; Song Xia
- Corresponding author: Song Xia
Characteristics and sources of volatile organic compounds during pollution episodes and clean periods in the Beijing-Tianjin-Hebei region
- DOI: 10.1016/j.scitotenv.2021.149491
- Published: 2021
- Journal:
- Impact factor: 9.8
- Authors: Suding Yang; Xin Li; Mengdi Song; Ying Liu; Xuena Yu; Shiyi Chen; Sihua Lu; Wenjie Wang; Yiming Yang; Limin Zeng; Yuanhang Zhang
- Corresponding author: Yuanhang Zhang
Other grants by Yiming Yang
III: Small: Multi-field Hierarchical Discovery and Tracking (mf-HDT) of Emerging Topics
- Award number: 1216282
- Fiscal year: 2012
- Amount: $1,188,500
- Award type: Standard Grant
III-COR: Collaborative Research: User-centric, Adaptive and Collaborative Information Filtering
- Award number: 0704689
- Fiscal year: 2007
- Amount: $1,188,500
- Award type: Standard Grant
KDI: Universal Information Access: Translingual Retrieval, Summarization, Tracking, Detection and Validation
- Award number: 9873009
- Fiscal year: 1998
- Amount: $1,188,500
- Award type: Standard Grant
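Similar grants from the National Natural Science Foundation of China
Dissecting the molecular genetic network of LARGE6, a key regulator of grain number per panicle in rice
- Award number:
- Approval year: 2022
- Amount: 300,000 CNY
- Award type: Young Scientists Fund
Properties of topological quasiparticles in quantum spin liquids: quantum Monte Carlo and a new large-N theory
- Award number:
- Approval year: 2020
- Amount: 620,000 CNY
- Award type: General Program
Molecular mechanism by which the Large Grain gene regulates seed weight in Brassica napus
- Award number: 31972875
- Approval year: 2019
- Amount: 580,000 CNY
- Award type: General Program
Study of a retinal neovascularization model in Large PB/PB mice
- Award number: 30971650
- Approval year: 2009
- Amount: 80,000 CNY
- Award type: General Program
Mechanism of the gene discs large in posterior localization in the Drosophila oocyte and in body-axis polarity formation
- Award number: 30800648
- Approval year: 2008
- Amount: 200,000 CNY
- Award type: Young Scientists Fund
Molecular regulation of α-DG glycosylation and expression by the LARGE gene in oral cancer cells
- Award number: 30772435
- Approval year: 2007
- Amount: 290,000 CNY
- Award type: General Program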
Similar overseas grants
BIGDATA: F: Collaborative Research: Practical Analysis of Large-Scale Data with Lyme Disease Case Study
- Award number: 1934319
- Fiscal year: 2019
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: Collaborative Research: F: Efficient Distributed Computation of Large-Scale Graph Problems in Epidemiology and Contagion Dynamics
- Award number: 1931628
- Fiscal year: 2019
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: IA: Enabling Large-Scale, Privacy-Preserving Genomic Computing with a Hardware-Assisted Secure Big-Data Analytics Framework
- Award number: 1838083
- Fiscal year: 2019
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: F: Collaborative Research: Theory and Practice of Randomized Algorithms for Ultra-Large-Scale Signal Processing
- Award number: 1838177
- Fiscal year: 2018
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: F: Computationally Efficient Algorithms for Large-Scale Crossed Random Effects Models
- Award number: 1837931
- Fiscal year: 2018
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: F: Algorithms for Tensor-Based Modeling of Large Scale Structured Data
- Award number: 1837985
- Fiscal year: 2018
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: F: Collaborative Research: Theory and Practice of Randomized Algorithms for Ultra-Large-Scale Signal Processing
- Award number: 1838131
- Fiscal year: 2018
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: F: Collaborative Research: Practical Analysis of Large-Scale Data with Lyme Disease Case Study
- Award number: 1740325
- Fiscal year: 2017
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: F: Collaborative Research: Practical Analysis of Large-Scale Data with Lyme Disease Case Study
- Award number: 1740312
- Fiscal year: 2017
- Amount: $1,188,500
- Award type: Standard Grant
BIGDATA: Collaborative Research: F: Efficient Distributed Computation of Large-Scale Graph Problems in Epidemiology and Contagion Dynamics
- Award number: 1633720
- Fiscal year: 2016
- Amount: $1,188,500
- Award type: Standard Grant