Collaborative Research: New Statistical Learning and Scalable Computing for Large Unstructured Data
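Basic Information
- Award number: 1415308
- Principal investigator:
- Amount: $229,000
- Host institution:
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-08-01 to 2017-07-31
- Status: Completed
- Source:
- Keywords:
Project Abstract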
This proposal focuses on fundamental issues concerning unstructured data arising from text-heavy documents, where the underlying data exhibit distinctive characteristics such as large volume, wide variety, and high velocity of change. Automating information extraction is critical in the information age and has high utility in online surveys and in threat detection and prevention. The integrated program of research and education will have significant impact in many fields, including machine learning and data mining, natural language processing, opinion surveys, business forecasting and service, health research, and social and political science. This will stimulate interdisciplinary research and collaboration with scientists from disparate fields. The proposed project requires extensive algorithm and software development for target applications. In particular, advanced computational tools will be developed through MapReduce over distributed computational platforms such as OpenMP, MPI and Hadoop, and documentation of the software will be disseminated along with the technology transfer.
Unstructured data pose great challenges in that text documents need to be embedded and integrated with numerical input for statistical modeling, which requires overparameterized modeling to achieve accurate prediction and unbiased inference for high-dimensional data. The proposed research aims to develop new statistical methods and tools for sentiment analysis and text summarization that utilize word relations through graphs, and for personalized prediction in recommender systems. It borrows information from all available sources for document summarization, including tagged and untagged documents, leading to higher tagging accuracy. This will enhance information storage, sorting, processing and filtering.
Moreover, the project develops a novel approach for accurate personalized prediction that utilizes the heterogeneity among users, which affects everyday personalization in areas such as service, recommendation and advertising. More importantly, the proposed statistical methodology and scalable computational algorithms will be useful for other types of unstructured data. Finally, many of the advanced optimization techniques and computing procedures to be developed will also be applicable to other types of "big data" problems.
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Annie Qu
At-harvest prediction of grey mould risk in pear fruit in long-term cold storage
- DOI: 10.1016/j.cropro.2009.01.001
- Published: 2009-05-01
- Journal:
- Impact factor:
- Authors: Robert A. Spotts; Maryna Serdani; Kelly M. Wallis; Monika Walter; Trish Harris-Virgin; Kim Spotts; David Sugar; Chang Lin Xiao; Annie Qu
- Corresponding author: Annie Qu
Dynamic Tensor Recommender Systems
- DOI:
- Published: 2021
- Journal:
- Impact factor: 0
- Authors: Yanqing Zhang; Xuan Bi; Niansheng Tang; Annie Qu
- Corresponding author: Annie Qu
Dynamic Tensor Recommender System
- DOI: 10.11159/icsta19.09
- Published: 2019-08
- Journal:
- Impact factor: 6
- Authors: Yanqing Zhang; Xuan Bi; Niansheng Tang; Annie Qu
- Corresponding author: Annie Qu
Imputed Factor Regression for High-dimensional Block-wise Missing Data
- DOI: 10.5705/ss.202018.0008
- Published: 2020
- Journal:
- Impact factor: 1.4
- Authors: Yanqing Zhang; Niansheng Tang; Annie Qu
- Corresponding author: Annie Qu
Discussion of Fan et al.’s paper “Gaining efficiency via weighted estimators for multivariate failure time data”
- DOI: 10.1007/s11425-009-0135-2
- Published: 2009-06-01
- Journal:
- Impact factor: 1.500
- Authors: Annie Qu; Lan Xue
- Corresponding author: Lan Xue
Other grants by Annie Qu
Collaborative Research: Integrative Heterogeneous Learning for Intensive Complex Longitudinal Data
- Award number: 2210640
- Fiscal year: 2022
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: New Statistical Learning for Complex Heterogeneous Data
- Award number: 2019461
- Fiscal year: 2020
- Award amount: $229,000
- Award type: Standard Grant
FRG: Collaborative Research: Generative Learning on Unstructured Data with Applications to Natural Language Processing and Hyperlink Prediction
- Award number: 1952406
- Fiscal year: 2020
- Award amount: $229,000
- Award type: Standard Grant
Conference on Statistical Learning and Data Science
- Award number: 1818546
- Fiscal year: 2018
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: New Statistical Learning for Complex Heterogeneous Data
- Award number: 1821198
- Fiscal year: 2018
- Award amount: $229,000
- Award type: Standard Grant
Personalized classification, moment selection, and time-varying networks for large-scale longitudinal data
- Award number: 1308227
- Fiscal year: 2013
- Award amount: $229,000
- Award type: Standard Grant
Model selection and efficient learning for high dimensional clustered data
- Award number: 0906660
- Fiscal year: 2009
- Award amount: $229,000
- Award type: Standard Grant
CAREER: Semiparametric and Non-Parametric Models for Correlated Data
- Award number: 0902232
- Fiscal year: 2008
- Award amount: $229,000
- Award type: Continuing Grant
CAREER: Semiparametric and Non-Parametric Models for Correlated Data
- Award number: 0348764
- Fiscal year: 2004
- Award amount: $229,000
- Award type: Continuing Grant
Semiparametric Models for Correlated Data: The Quadratic Inference Function Approach
- Award number: 0103513
- Fiscal year: 2001
- Award amount: $229,000
- Award type: Standard Grant
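Similar NSFC grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Award amount: CNY 0
- Award type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Award amount: CNY 240,000
- Award type: Special Fund Project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Award amount: CNY 240,000
- Award type: Special Fund Project
Cell Research (细胞研究)
- Award number: 30824808
- Approval year: 2008
- Award amount: CNY 240,000
- Award type: Special Fund Project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Award amount: CNY 450,000
- Award type: General Program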
Similar overseas grants
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
- Award number: 2348998
- Fiscal year: 2025
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
- Award number: 2348999
- Fiscal year: 2025
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: Resolving the LGM ventilation age conundrum: New radiocarbon records from high sedimentation rate sites in the deep western Pacific
- Award number: 2341426
- Fiscal year: 2024
- Award amount: $229,000
- Award type: Continuing Grant
Collaborative Research: Resolving the LGM ventilation age conundrum: New radiocarbon records from high sedimentation rate sites in the deep western Pacific
- Award number: 2341424
- Fiscal year: 2024
- Award amount: $229,000
- Award type: Continuing Grant
Collaborative Research: New to IUSE: EDU DCL:Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Award number: 2315700
- Fiscal year: 2024
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: AF: Small: New Directions in Algorithmic Replicability
- Award number: 2342244
- Fiscal year: 2024
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: On New Directions for the Derivation of Wave Kinetic Equations
- Award number: 2306378
- Fiscal year: 2024
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: New to IUSE: EDU DCL:Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Award number: 2315699
- Fiscal year: 2024
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: Understanding New Labor Relations for the 21st Century
- Award number: 2346230
- Fiscal year: 2024
- Award amount: $229,000
- Award type: Standard Grant
Collaborative Research: New Regression Models and Methods for Studying Multiple Categorical Responses
- Award number: 2415067
- Fiscal year: 2024
- Award amount: $229,000
- Award type: Continuing Grant