REU Site: University of North Carolina at Greensboro in Complex Data Analysis using Statistical and Machine Learning Tools

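Basic information

  • Grant number:
    1950549
  • Principal investigator:
  • Amount:
    $324,000
  • Host institution:
  • Host institution country:
    United States
  • Project type:
    Standard Grant
  • Fiscal year:
    2020
  • Funding country:
    United States
  • Period:
    2020-01-01 to 2023-12-31
  • Project status:
    Completed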

Project abstract

In this age of big data, technological innovations allow the collection of massive amounts of data at low cost. Sizes of hundreds of gigabytes, terabytes, and even petabytes are no longer uncommon. Appropriate handling and analysis of such data is vital training for the future generation of graduates. With this goal in mind, the proposed REU program aims to provide 10-week sophisticated training in “Complex Data Analysis using Statistical and Machine Learning Tools” to eight (8) highly motivated nationally selected undergraduates from Mathematical Sciences during summers of 2020-2022. The faculty mentors bring a rich and diverse experience to this training program. PI Gupta, Co-PI Gao, Senior Personnel Richter and Stufken are statisticians; Senior Personnel Mohanty is a Computer Science faculty specializing in machine learning tools; and Senior Personnel Sun is a statistical geneticist in the Department of Mathematics and Statistics. The training program will motivate the student participants, particularly those from under-represented minorities, to go on to graduate programs in mathematical sciences and become better trained professionals capable of handling societal data analytics needs. As part of broader professional training, students will undertake trips to major research centers in North Carolina such as SAS, SAMSI (Statistical and Applied Mathematical Sciences Institute), and the Joint School of Nano Science and Nano Engineering. We expect that the research completed as part of this training will be of very high quality and will lead to journal articles and conference presentations. Complexity in data can come in a variety of ways. High dimensionality of the data is one such complexity where the number of variables can be relatively large as compared to the data size. Data contamination is another type of complexity. Students will learn the art of simultaneous handling of dimensionality-reduction and outlier detection as part of one of the projects. 
Noise-added data (to create confidentiality in data before public release) is another type of complexity. It is becoming common for researchers to have access only to scrambled data, and not the real data. In one of the projects, we will talk about why and how data are scrambled and de-scrambled using randomized response models, retaining aggregate-level properties and ensuring anonymity for respondents. The machine learning component of the REU program will focus on leveraging the capabilities of sophisticated tools such as Unsupervised and Supervised Machine Learning, and Deep Learning for identification of social media posts related to disease symptoms, and for prediction of temporal trends in disease propagation. Violation of model assumptions is another source of complexity that necessitates the use of nonparametric techniques such as resampling methods. In many matched-pairs design situations, a mixture of complete and incomplete pairs of data is available. Rather than ignoring data from incomplete pairs, we will train students in methods designed for analyzing such data. In another project, students will be trained in subdata selection techniques, which are very helpful in dealing with data of enormous size. This project will address questions of the type (1) what size should the subdata have to ensure a reliable analysis; and (2) for a given size, how should the subdata be selected? The overarching goal in all of these projects will be to train students in recognizing various complexities in data and finding the right techniques to handle such data. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
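The abstract's point that scrambled data can hide individual answers while retaining aggregate-level properties can be illustrated with a small sketch. The abstract does not say which randomized response models the project uses, so the example below assumes Warner's classic binary model purely for illustration: each respondent reports the true yes/no answer with probability p and the opposite answer otherwise. The function names, the value p = 0.75, and the simulated population are all hypothetical.

```python
import random


def scramble(true_answers, p, rng):
    """Warner-style scrambling (assumed model): report the true answer
    with probability p, otherwise report its opposite."""
    return [a if rng.random() < p else (not a) for a in true_answers]


def estimate_proportion(reported, p):
    """Recover the population proportion of 'yes' from scrambled reports.

    Under the assumed model, P(report yes) = p*pi + (1 - p)*(1 - pi),
    so pi = (P(report yes) - (1 - p)) / (2p - 1).
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the reports uninformative")
    lam = sum(reported) / len(reported)  # observed share of 'yes' reports
    return (lam - (1 - p)) / (2 * p - 1)


# Simulated population (hypothetical): 30% truly hold the sensitive trait.
rng = random.Random(0)
true_pi = 0.30
n = 100_000
truth = [rng.random() < true_pi for _ in range(n)]

reported = scramble(truth, p=0.75, rng=rng)
pi_hat = estimate_proportion(reported, p=0.75)
print(f"estimated proportion: {pi_hat:.3f}")
```

No single scrambled report reveals a respondent's true answer, yet the estimate should land close to the simulated proportion because the scrambling probability is known. The price is a larger variance than a direct survey of the same size would have.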

Project outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other publications by Sat Gupta

Accounting for Lack of Trust in Optional Binary RRT Models Using a Unified Measure of Privacy and Efficiency
A generalized estimator for finite population mean in the presence of measurement errors in stratified random sampling
Life and Work of C. R. Rao
Improved ratio-type estimators using stratified double-ranked set sampling
Avoiding surgical site infections in neurosurgical procedures
  • DOI:
  • Publication year:
    2015
  • Journal:
  • Impact factor:
    0
  • Authors:
    Jennifer L. Fencl; F. Wood; Sat Gupta; Vangela Swofford; M. Morgan; D. Green
  • Corresponding author:
    D. Green

Other grants held by Sat Gupta

REU Site: University of North Carolina at Greensboro - Complex Data Analysis using Statistical and Machine Learning Tools
  • Grant number:
    2244160
  • Fiscal year:
    2023
  • Amount:
    $324,000
  • Project type:
    Standard Grant
Advances in Interdisciplinary Statistics and Combinatorics, October 10-12, 2014
  • Grant number:
    1417056
  • Fiscal year:
    2014
  • Amount:
    $324,000
  • Project type:
    Standard Grant
International Conference on Advances in Interdisciplinary Statistics and Combinatorics
  • Grant number:
    1212830
  • Fiscal year:
    2012
  • Amount:
    $324,000
  • Project type:
    Standard Grant
International Conference on Advances in Interdisciplinary Statistics and Combinatorics
  • Grant number:
    0726015
  • Fiscal year:
    2007
  • Amount:
    $324,000
  • Project type:
    Standard Grant

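Similar NSFC (National Natural Science Foundation of China) grants

Rational design, synthesis, and antitumor activity of novel WDR5 protein WIN-site inhibitors
  • Grant number:
  • Approval year:
    2021
  • Amount:
    CNY 300,000
  • Project type:
    Young Scientists Fund
High-performance Ta4SiTe4-based organic/inorganic composite flexible thermoelectric films with a conformal structure
  • Grant number:
    52172255
  • Approval year:
    2021
  • Amount:
    CNY 580,000
  • Project type:
    General Program
Research on high-standard basic farmland construction based on the LESA (Land Evaluation and Site Assessment) framework for protecting important farmland
  • Grant number:
    41340011
  • Approval year:
    2013
  • Amount:
    CNY 200,000
  • Project type:
    Special Fund Project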

Similar overseas grants

Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
  • Grant number:
    2348998
  • Fiscal year:
    2025
  • Amount:
    $324,000
  • Project type:
    Standard Grant
Collaborative Research: REU Site: Earth and Planetary Science and Astrophysics REU at the American Museum of Natural History in Collaboration with the City University of New York
  • Grant number:
    2348999
  • Fiscal year:
    2025
  • Amount:
    $324,000
  • Project type:
    Standard Grant
REU Site: University of Colorado, Engineering Smart Biomaterials
  • Grant number:
    2348856
  • Fiscal year:
    2024
  • Amount:
    $324,000
  • Project type:
    Standard Grant
REU Site: Research Experience in Functional Materials for Undergraduates in Chemistry at the University of South Florida
  • Grant number:
    2349085
  • Fiscal year:
    2024
  • Amount:
    $324,000
  • Project type:
    Standard Grant
REU Site: Research Experiences for Undergraduates in Physics and Astronomy at the University of Toledo
  • Grant number:
    2349585
  • Fiscal year:
    2024
  • Amount:
    $324,000
  • Project type:
    Continuing Grant
REU Site: Research Experiences for Undergraduates in Algebra and Discrete Mathematics at Auburn University
  • Grant number:
    2349684
  • Fiscal year:
    2024
  • Amount:
    $324,000
  • Project type:
    Continuing Grant
REU Site: Research in Symmetries at the University of Kentucky
  • Grant number:
    2349261
  • Fiscal year:
    2024
  • Amount:
    $324,000
  • Project type:
    Continuing Grant
REU Site: Undergraduate Mathematical Science Research at James Madison University
  • Grant number:
    2349593
  • Fiscal year:
    2024
  • Amount:
    $324,000
  • Project type:
    Standard Grant
REU Site: Nevis Laboratories Columbia University for Summers 2024-2026
  • Grant number:
    2349438
  • Fiscal year:
    2024
  • Amount:
    $324,000
  • Project type:
    Continuing Grant
REU Site: Summer Undergraduate Research in Chemistry and Biochemistry at Miami University
  • Grant number:
    2349468
  • Fiscal year:
    2024
  • Amount:
    $324,000
  • Project type:
    Continuing Grant