
Testing and Deep Learning for Functional Data

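Basic Information

Award number:
2210891
Principal investigator:
Jane-Ling Wang
Amount:
$300,000
Host institution:
University of California-Davis
Host institution country:
United States
Award type:
Standard Grant
Fiscal year:
2022
Funding country:
United States
Start and end dates:
2022-07-01 to 2025-06-30
Project status:
Not yet concluded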

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2210891&HistoricalAwards=false
Keywords:
Testing; Deep Learning; Functional Data

Project Abstract

The proposed research involves two distinctive fields, functional data analysis (FDA) and deep learning. Functional data are random functions, which have become increasingly common due to technological advances to handle massive data. Examples include climate or air pollution data collected over a period of time. The field has emerged as a mainstream research area, but the literature is mainly focused on estimation problems and has not yet leveraged the advantages of deep learning methods. This project aims to fill these gaps. It includes several new tests for functional data and employs deep learning, instead of the conventional nonparametric smoothing methods, to handle functional data. The proposed approaches will be applied to various functional data, including evaluating the effect of pollutants on lung cancer mortality and explaining the effects of physical activity on health. A major emphasis is the development of new theory and algorithms. Computer code associated with the research will be publicly disseminated as R- or Python packages. The research findings will be incorporated in graduate curricula, undergraduate research projects, and short courses at workshops. They will also be presented at professional meetings. Student researchers will receive training in research, computing and communication skills.

Although functional data are intrinsically infinite dimensional, measurements are only available at discrete locations, which may vary from subject to subject. The number of measurement locations per subject can be small (sparse functional data) or grow with the sample size (intensely sampled functional data). The proposed research covers all types of sampling plans and employs, whenever feasible, a single platform that is universally applicable. Such an approach is important as it is not trivial to judge whether the sampling plan for a particular dataset is intense or sparse. It also has the merit that the theory is unified and automatically reveals the phase transitions of the convergence rates of the corresponding estimators.

Project 1 (Hypothesis Testing for Functional Linear Models) aims at developing a general framework for hypothesis testing under the setting of functional linear models. Existing methods focus on testing a specific null hypothesis using a tailored test and are not well suited for testing the temporal duration of the effect of a functional covariate, such as the impact of PM2.5 on lung cancer. None of them has been shown to be optimal for a composite null hypothesis. We propose a single platform to test the null hypothesis that the regression coefficient of a functional covariate resides in a closed subspace of all possible coefficient functions. The proposed test, which resembles the classical F-test, is simple and includes tests for global nullity, partial nullity and domain of the coefficient function as special cases.

Project 2 (Testing Homogeneity and Independence for Functional Data) addresses the challenges of two fundamental tasks, testing the homogeneity (equal distributions) and independence of functional data. Such tests are infeasible when the functional process can only be sampled at a few discrete locations, a situation that is ubiquitous in longitudinal studies. For each task, we propose a customized version, marginal homogeneity or marginal independence, that has practical implications and is feasible for theory and implementation.

Project 3 (Deep learning for Functional Data) aims at bringing the success of deep learning to bear with functional data. Surprisingly, the application of deep neural networks to functional data has been scarce and remains an open problem. A recent approach, developed by a team led by the PI, uses neural networks to search for the optimal basis functions to represent a functional input that automatically adapts to the prediction task in hand. We propose to expand the reach and theoretical understanding of this adaptive basis approach. Another objective is to design new methodology to impute partially observed functional data that uses Transformers, a deep neural network that transforms a given sequence of elements, such as the sequence of words in a sentence, into another sequence.

The project will offer a broad range of new opportunities for interdisciplinary training of a future generation of statisticians and will contribute to enhancing a more inclusive atmosphere in statistical sciences.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
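
The abstract distinguishes sparse functional data (few measurement locations per subject) from intensely sampled functional data (a number of locations that grows with the sample size), with locations that may vary from subject to subject. The following is a minimal simulation sketch of those two sampling plans, not code from the project. The curve model (a random sine plus cosine), the 2 to 5 points per subject in the sparse plan, and the "half the sample size" rule in the intense plan are all illustrative assumptions, and every function name is hypothetical.

```python
import math
import random


def simulate_subject(n_points, rng, noise_sd=0.1):
    """Simulate one subject's curve on [0, 1], observed at random locations.

    The underlying curve a*sin(2*pi*t) + b*cos(2*pi*t) is an assumed toy
    model; only noisy values at discrete locations are returned.
    """
    a = rng.gauss(0.0, 1.0)
    b = rng.gauss(0.0, 0.5)
    # Measurement locations differ from subject to subject.
    locations = sorted(rng.random() for _ in range(n_points))
    values = [
        a * math.sin(2 * math.pi * t)
        + b * math.cos(2 * math.pi * t)
        + rng.gauss(0.0, noise_sd)
        for t in locations
    ]
    return locations, values


def simulate_sample(n_subjects, points_per_subject, seed=0):
    """Simulate n_subjects curves; points_per_subject(rng) gives each count."""
    rng = random.Random(seed)
    return [
        simulate_subject(points_per_subject(rng), rng)
        for _ in range(n_subjects)
    ]


def sparse_plan(rng):
    """Sparse plan (assumed): 2 to 5 measurements per subject."""
    return rng.randint(2, 5)


def make_intense_plan(n_subjects):
    """Intense plan (assumed): the count grows with the sample size."""
    m = max(10, n_subjects // 2)
    return lambda rng: m


n = 100
sparse = simulate_sample(n, sparse_plan)
intense = simulate_sample(n, make_intense_plan(n))

print("sparse counts:", sorted({len(locs) for locs, _ in sparse}))
print("intense counts:", sorted({len(locs) for locs, _ in intense}))
```

In a real dataset the per-subject counts usually fall somewhere between these two extremes, which is the situation the abstract describes as hard to classify.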
