Collaborative Research: New statistical learning and scalable computation for large unstructured data
Basic Information
- Award number: 1415500
- Amount: $255,600
- Host institution country: United States
- Award type: Standard Grant
- Fiscal year: 2014
- Funding country: United States
- Project period: 2014-08-01 to 2017-07-31
- Status: Completed
Abstract
This proposal focuses on fundamental issues concerning unstructured data arising from text-heavy documents, where the underlying data exhibit distinctive characteristics such as large volume, great variety, and high velocity of change. Automating information extraction is critical in the information age and has high utility in online surveys and in threat detection and prevention. The integrated program of research and education will have significant impact in many fields, including machine learning and data mining, natural language processing, opinion surveys, business forecasting and services, health research, and social and political science. It will also stimulate interdisciplinary research and collaboration with scientists from disparate fields. The project requires extensive algorithm and software development for target applications. In particular, advanced computational tools will be developed using MapReduce over parallel and distributed computing platforms such as OpenMP, MPI, and Hadoop, and documentation of the software will be disseminated along with the technology transfer.
Unstructured data pose great challenges: text documents must be embedded and integrated with numerical input for statistical modeling, which requires overparameterized models to achieve accurate prediction and unbiased inference for high-dimensional data. The proposed research aims to develop new statistical methods and tools for sentiment analysis and text summarization that exploit word relations through graphs, as well as for personalized prediction in recommender systems. For document summarization, it borrows information from all available documents, both tagged and untagged, leading to higher tagging accuracy. This will enhance information storage, sorting, processing, and filtering.
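As an illustration only, the step of turning text documents into numerical input with MapReduce can be sketched in plain Python. This is a minimal sketch, not the project's software: the whitespace tokenizer and the helper names `map_phase`, `shuffle`, `reduce_phase`, and `to_matrix` are hypothetical, and the map, shuffle, and reduce stages run in a single process rather than on Hadoop or MPI.

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # Emits one ((doc_id, term), 1) pair per token.
    return [((doc_id, term), 1) for term in text.lower().split()]


def shuffle(pairs):
    # Group emitted values by key, as a MapReduce framework does between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Sum the partial counts for one (doc_id, term) key.
    return key, sum(values)


def to_matrix(counts, doc_ids):
    # Build a document-term count matrix over a sorted vocabulary.
    vocab = sorted({term for (_, term) in counts})
    column = {term: j for j, term in enumerate(vocab)}
    rows = {doc_id: [0] * len(vocab) for doc_id in doc_ids}
    for (doc_id, term), count in counts.items():
        rows[doc_id][column[term]] = count
    return vocab, [rows[doc_id] for doc_id in doc_ids]


# Toy corpus; hypothetical data for illustration only.
docs = {"d1": "good movie good plot", "d2": "bad plot"}
pairs = chain.from_iterable(map_phase(d, text) for d, text in docs.items())
counts = dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
vocab, X = to_matrix(counts, list(docs))
print(vocab)  # ['bad', 'good', 'movie', 'plot']
print(X)      # [[0, 2, 1, 1], [1, 0, 0, 1]]
```

Each row of `X` is a numerical representation of one document that can be joined with other numerical covariates; in a real deployment the map and reduce functions would be distributed across workers while the key-based grouping stays the same.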
Moreover, the project develops a novel approach for accurate personalized prediction that exploits heterogeneity among all users, with everyday impact on personalization in services, recommendation, and advertising. More importantly, the proposed statistical methodology and scalable computational algorithms will be valuable for other types of unstructured data. Finally, many of the advanced optimization techniques and computing procedures to be developed will also apply to other "big data" problems.
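One common way to let predictions vary across users is a biased matrix-factorization model, in which each user receives a bias term and a latent factor vector. The sketch below assumes that model and fits it by stochastic gradient descent; it is a standard textbook technique swapped in for illustration, not the novel approach developed in this project, and the function name `fit_biased_mf`, the toy ratings, and all hyperparameter values are hypothetical.

```python
import math
import random


def fit_biased_mf(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i by SGD; returns a predict(u, i) function."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    b_user = [0.0] * n_users  # per-user deviation from the global mean
    b_item = [0.0] * n_items  # per-item deviation from the global mean
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_users)]  # user factors
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_items)]  # item factors

    def predict(u, i):
        return mu + b_user[u] + b_item[i] + sum(p * q for p, q in zip(P[u], Q[i]))

    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, r in data:
            err = r - predict(u, i)
            # L2-regularized gradient steps on biases and latent factors.
            b_user[u] += lr * (err - reg * b_user[u])
            b_item[i] += lr * (err - reg * b_item[i])
            for f in range(k):
                p_uf, q_if = P[u][f], Q[i][f]
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return predict


def rmse(predict_fn, data):
    return math.sqrt(sum((r - predict_fn(u, i)) ** 2 for u, i, r in data) / len(data))


# Toy (user, item, rating) triples; hypothetical data for illustration only.
ratings = [(0, 0, 5), (0, 1, 4), (1, 0, 1), (1, 2, 2), (2, 1, 5), (2, 2, 4)]
predict = fit_biased_mf(ratings, n_users=3, n_items=3)
global_mean = sum(r for _, _, r in ratings) / len(ratings)
baseline_rmse = rmse(lambda u, i: global_mean, ratings)
train_rmse = rmse(predict, ratings)
print(f"global-mean RMSE {baseline_rmse:.3f}, fitted RMSE {train_rmse:.3f}")
```

Here user 1 rates everything low, so the fitted user bias should pull that user's predictions below the global mean; the regularization weight `reg` controls how far each user may deviate from the population.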
Project Outcomes
- Journal articles: 0
- Monographs: 0
- Research awards: 0
- Conference papers: 0
- Patents: 0
Other Publications by Xiaotong Shen
Adaptive Regularization through Entire Solution Surface
- Published: 2008
- Impact factor: 0
- Authors: WU Seongho; Xiaotong Shen; C. Geyer
- Corresponding author: C. Geyer
Associations between plasma metals and hemoglobin in female college students with dysmenorrhea
- DOI: 10.1016/j.heliyon.2024.e37778
- Published: 2024-09-30
- Authors: Qingzhi Hou; Yuchen Zhang; Hua Yang; Yunjie Wang; Zexi Xu; Jiujing Lin; Jia Li; Chenyang Hou; Zhanhui Qiu; Haoran Zhang; Ping Zhang; Xiangsheng Xue; Xiaotong Shen; Xinghua Xu; Hui Zou; Zhenrui Ma; Jing Gao; Xiaomei Li
- Corresponding author: Xiaomei Li
Vehicle Autonomy Using Cooperative Perception for Mobility-on-Demand Systems
- Published: 2015
- Impact factor: 0
- Authors: Seong; T. Bandyopadhyay; B. Qin; Z. J. Chong; Wei Liu; Xiaotong Shen; S. Pendleton; J. Fu; M. Ang; Emilio Frazzoli; D. Rus
- Corresponding author: D. Rus
A DUF4281 domain-containing protein (homologue of ABA4) of Phaeodactylum tricornutum regulates the biosynthesis of fucoxanthin
- DOI: 10.1016/j.algal.2022.102728
- Published: 2022-06-01
- Impact factor: 4.500
- Authors: Xiaotong Shen; Kehou Pan; Lin Zhang; Baohua Zhu; Yun Li; Jichang Han
- Corresponding author: Jichang Han
Pyridine N-Oxide-Promoted Cobalt-Catalyzed Dioxygen-Mediated Methane Oxidation
- DOI: 10.1021/acs.joc.3c00770
- Published: 2023-08-04
- Impact factor: 3.600
- Authors: Bingyin Meng; Luyao Liu; Xiaotong Shen; Wu Fan; Suhua Li
- Corresponding author: Suhua Li
Other Grants by Xiaotong Shen
FRG: Collaborative Research: Generative Learning on Unstructured Data with Applications to Natural Language Processing and Hyperlink Prediction
- Award number: 1952539
- Fiscal year: 2020
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: Collaborative Learning for Multimodal Data
- Award number: 1712564
- Fiscal year: 2017
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: Automatic Video Interpretation and Description
- Award number: 1721216
- Fiscal year: 2017
- Amount: $255,600
- Award type: Standard Grant
Structured classification and regression
- Award number: 0906616
- Fiscal year: 2009
- Amount: $255,600
- Award type: Standard Grant
Collaborative Proposal: International Research and Education: Workshops in Statistics
- Award number: 0634639
- Fiscal year: 2006
- Amount: $255,600
- Award type: Standard Grant
Inference and Prediction in a Complex Discovery Process
- Award number: 0604394
- Fiscal year: 2006
- Amount: $255,600
- Award type: Standard Grant
Nonseparable Multiclass Learning for Object Tracking
- Award number: 0354881
- Fiscal year: 2003
- Amount: $255,600
- Award type: Continuing Grant
Nonseparable Multiclass Learning for Object Tracking
- Award number: 0328802
- Fiscal year: 2003
- Amount: $255,600
- Award type: Continuing Grant
Semiparametric and Nonparametric Inferences
- Award number: 0072635
- Fiscal year: 2000
- Amount: $255,600
- Award type: Standard Grant
Similar NSFC Grants
Research on Quantum Field Theory without a Lagrangian Description
- Award number: 24ZR1403900
- Approval year: 2024
- Amount: CNY 0
- Project type: Provincial/municipal project
Cell Research
- Award number: 31224802
- Approval year: 2012
- Amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 31024804
- Approval year: 2010
- Amount: CNY 240,000
- Project type: Special fund project
Cell Research
- Award number: 30824808
- Approval year: 2008
- Amount: CNY 240,000
- Project type: Special fund project
Research on the Rapid Growth Mechanism of KDP Crystal
- Award number: 10774081
- Approval year: 2007
- Amount: CNY 450,000
- Project type: General program
Similar International Grants
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
- Award number: 2348998
- Fiscal year: 2025
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
- Award number: 2348999
- Fiscal year: 2025
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: New to IUSE: EDU DCL: Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Award number: 2315700
- Fiscal year: 2024
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: Resolving the LGM ventilation age conundrum: New radiocarbon records from high sedimentation rate sites in the deep western Pacific
- Award number: 2341426
- Fiscal year: 2024
- Amount: $255,600
- Award type: Continuing Grant
Collaborative Research: Resolving the LGM ventilation age conundrum: New radiocarbon records from high sedimentation rate sites in the deep western Pacific
- Award number: 2341424
- Fiscal year: 2024
- Amount: $255,600
- Award type: Continuing Grant
Collaborative Research: On New Directions for the Derivation of Wave Kinetic Equations
- Award number: 2306378
- Fiscal year: 2024
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: AF: Small: New Directions in Algorithmic Replicability
- Award number: 2342244
- Fiscal year: 2024
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: New to IUSE: EDU DCL: Diversifying Economics Education through Plug and Play Video Modules with Diverse Role Models, Relevant Research, and Active Learning
- Award number: 2315699
- Fiscal year: 2024
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: Understanding New Labor Relations for the 21st Century
- Award number: 2346230
- Fiscal year: 2024
- Amount: $255,600
- Award type: Standard Grant
Collaborative Research: New Regression Models and Methods for Studying Multiple Categorical Responses
- Award number: 2415067
- Fiscal year: 2024
- Amount: $255,600
- Award type: Continuing Grant