BIGDATA: F: Large-Scale Transductive Learning from Heterogeneous Data Sources

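Basic Information

Award number:
1546329
Principal investigator:
Yiming Yang
Amount:
$1.1885 million
Institution:
Carnegie-Mellon University
Institution country:
United States
Project type:
Standard Grant
Fiscal year:
2016
Funding country:
United States
Start and end dates:
2016-01-01 to 2020-12-31
Project status:
Completed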

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=1546329&HistoricalAwards=false
Keywords:
BIGDATA Large Scale Transductive Learning

Project Abstract

Important problems in the big-data era involve predictions based on heterogeneous sources of information and the dependency structures in data. In recommendation systems, for example, predictions need to be made not only based on observed user ratings over items (movies, books, music, shopping products, etc.), but also based on information such as demographical data of users and textual descriptions of items. In event detection from textual data (news stories, tweets, maintenance reports, legal documents, etc.), joint inference must be based on who (agents), what (event types or topics), where (locations) and when (dates), and also based on the connections among agents (in social networks), topics (in an event-type ontology), locations (in a map) and temporal co-occurrences. The fundamental research questions therefore include: (1) how to develop a unified optimization framework for predictions based on heterogeneous information and dependency structures in various kinds of tasks; (2) how to make the inference computationally tractable when the combined space of model parameters is extremely large; and (3) how to significantly enhance the prediction power of the system by leveraging massively available unlabeled data in addition to human-annotated training data which are often sparse.

This project will address the three challenges via the following four approaches.

(1) A unified representation of heterogeneous information sources using product graphs: This framework aims to represent heterogeneous sources of data and intra-source dependencies, such as social connections among users, semantic similarities among items, contextual correlations among keywords, topical similarities among documents, hierarchical relations among topic labels, and so on. Each data source will be represented using a graph, and the individual graphs of multiple sources will be combined into a product graph where each node corresponds to a tuple of nodes in the individual graphs, and each link aggregates the links in the individual graphs.

(2) Transductive learning over graph products: This project plans to reduce the inference problems in a broad range of prediction tasks to semi-supervised transductive learning problems over the product graphs mentioned above. The training data in each task (of classification, regression or link prediction) will be represented as a subset of labeled (or scored) nodes in the product graph, and the labels (or scores) of those nodes will be propagated over the links in the product graph until convergence. This project will study various kinds of graph transductions theoretically and empirically.

(3) Large-scale optimization algorithms: The induced product graphs are typically extremely large. To address the computational bottlenecks, this project will develop new scalable algorithms based on theoretical properties and computational characteristics of spectral graph products, including adapted versions of rank-reduced matrix factorization, aggressive basis pruning, and sampling-based low-rank approximation.

(4) Thorough evaluations in multiple important applications: The proposed new approach will be evaluated on benchmark data collections for context-aware collaborative filtering, semi-structured event detection and tracking, and expert finding via multi-source social network analysis.

The proposed work, if successful, will offer principled solutions for enhancing the prediction power of systems in a broad range of tasks, whenever recommendation, classification and regression are involved. Technical impacts of the proposed work are expected in multiple research fields. For further information see the project web site at: http://nyc.lti.cs.cmu.edu/gp-trans/index.html
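
The product-graph construction and label propagation described in approaches (1) and (2) can be illustrated with a minimal sketch. It is not the project's actual method: it assumes the Kronecker (tensor) product as the graph product, symmetric degree normalization, and a standard damped propagation update with a hypothetical damping factor `alpha`. All function names are illustrative. The second function shows one way the explicit product graph can be avoided, using the identity (A ⊗ B) vec(Y) = vec(A Y Bᵀ) for row-major vectorization.

```python
import numpy as np

def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2} (assumes no isolated nodes)."""
    inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * inv_sqrt[:, None] * inv_sqrt[None, :]

def propagate_naive(s_users, s_items, y, alpha=0.5, iters=200):
    """Propagate labels on the explicit product graph (assumed: Kronecker product)."""
    s = np.kron(s_users, s_items)      # (m*n) x (m*n) matrix; each node is a (user, item) pair
    y0 = y.reshape(-1)                 # row-major vectorization of the m x n label matrix
    f = y0.copy()
    for _ in range(iters):
        f = alpha * (s @ f) + (1 - alpha) * y0
    return f.reshape(y.shape)

def propagate_factored(s_users, s_items, y, alpha=0.5, iters=200):
    """Same update without building the product graph: (A kron B) vec(Y) = vec(A Y B^T)."""
    f = y.copy()
    for _ in range(iters):
        f = alpha * (s_users @ f @ s_items.T) + (1 - alpha) * y
    return f

# Toy data (made up for illustration): 3 users on a path, 2 linked items, self-loops added.
users = normalize(np.array([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]))
items = normalize(np.array([[1.0, 1.0], [1.0, 1.0]]))
labels = np.zeros((3, 2))
labels[0, 0] = 1.0                     # one observed (user 0, item 0) score

scores = propagate_factored(users, items, labels)
print(np.allclose(scores, propagate_naive(users, items, labels)))
```

The factored version works only with the small per-source matrices, which is one reason the structure of graph products matters when the product graph itself is too large to store.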

Project Outcomes

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Yiming Yang

Triple-Cation Perovskite Resistive Switching Memory with Enhanced Endurance and Retention

DOI:
10.1021/acsaelm.0c00674
Published:
2020
Journal:
ACS Applied Electronic Materials
Impact factor:
4.7
Authors:
Yang Huang; Lingzhi Tang; Chen Wang; Hongbo Fan; Zhenxuan Zhao; Huaqiang Wu; Min Xu; Rensheng Shen; Yiming Yang; Jiming Bian
Corresponding author:
Jiming Bian

Say CHEESE: Common Human Emotional Expression Set Encoder and Its Application to Analyze Deceptive Communication

DOI:
Published:
2018
Journal:
IEEE International Conference on Automatic Face & Gesture Recognition
Impact factor:
0
Authors:
Taylan K. Sen; Md. Kamrul Hasan; Minh Tran; Yiming Yang; Ehsan Hoque
Corresponding author:
Ehsan Hoque

Cross-Lingual Pseudo-Relevance Feedback Using a Comparable Corpus

DOI:
10.1007/3-540-45691-0_12
Published:
2001
Journal:
Impact factor:
0
Authors:
Monica Rogati; Yiming Yang
Corresponding author:
Yiming Yang

High Stability of Dielectric Permittivity for K0.5Na0.5NbO3-Based Lead Free Piezoelectric Ceramics

DOI:
10.1080/00150193.2010.482848
Published:
2010-10
Journal:
Ferroelectrics
Impact factor:
0.8
Authors:
Hongliang Du; Zhuo Xu; Shaobo Qu; Zhen Yang; Jingbo Zhao; Yiming Yang; Weidong Mo; Song Xia
Corresponding author:
Song Xia

Characteristics and sources of volatile organic compounds during pollution episodes and clean periods in the Beijing-Tianjin-Hebei region

DOI:
10.1016/j.scitotenv.2021.149491
Published:
2021
Journal:
Science of The Total Environment
Impact factor:
9.8
Authors:
Suding Yang; Xin Li; Mengdi Song; Ying Liu; Xuena Yu; Shiyi Chen; Sihua Lu; Wenjie Wang; Yiming Yang; Limin Zeng; Yuanhang Zhang
Corresponding author:
Yuanhang Zhang