Sparse linear models: Their existence and stability

Basic information

Grant number:
EP/W011905/1
Principal investigator:
Joab Winkler
Amount:
$101.2 thousand
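Host institution:
University of Sheffield
Host institution country:
United Kingdom
Project category:
Research Grant
Fiscal year:
2022
Funding country:
United Kingdom
Start and end dates:
2022 to (no data)
Project status:
Completed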

Source:
https://gtr.ukri.org/projects?ref=EP%2FW011905%2F1
Keywords:
Sparse linear models Their existence

Project abstract

Large quantities of data are gathered in many domains of life, including the medical, financial, retail, industrial and social media domains. This data must be analysed such that its properties can be extracted and the underlying system understood. It is necessary to distinguish between the quantity of data, which may be large, and the information contained in the data, which may allow it to be represented, with acceptable accuracy, by a simple model. This simple model captures fundamental properties of the system, such that it can be used for the determination of the response of the system on new (unseen) data. An example of a simple model is a low degree polynomial, but this proposal considers a sparse model, which is another example of a simple model.

A sparse model of a system is a model in which the dominant input variables (predictors) that determine the output, rather than all the input variables, are identified. Genomics provides an example of a sparse model because there are about 30,000 genes in the human body, but not all genes are associated directly with cancer. It is therefore desirable to identify the genes that are most directly associated with cancer, such that treatment is focused on the dominant contributory factors, rather than factors whose role in the cause of cancer is minor.

Sparsity of the solution x of the linear algebraic equation Ax=b is imposed by regularisation in the 1-norm (the lasso). This is different from regularisation in the 2-norm (Tikhonov regularisation), which imposes stability on x. The lasso is not understood as well as Tikhonov regularisation because of the absence of a 1-norm matrix decomposition, but fundamental properties of a regularised solution of Ax=b are independent of the norm in which the regularisation is imposed. For example, a regularised solution in both norms must be stable and the error between it and the exact solution must be small. This proposal considers these properties of a regularised solution when regularisation by the lasso is used.

Computations on sparse models are, in general, simpler and faster than computations on exact dense models, which is advantageous, and important theoretical issues that must be addressed are considered in this proposal. A sparse model is an approximation of an exact dense model and there is therefore an error associated with a sparse model. A good sparse model is a model in which this error is small, and this sparse model is accepted because this small error is balanced by the greater physical insight allowed by a sparse model. Furthermore, a sparse model must be computationally reliable such that results derived from it are numerically stable, and thus a good sparse model must have a small error and be stable.

It cannot, however, be assumed that all inputs yield an approximate input-output relationship that is sparse, stable and has a small error. It is therefore necessary to establish the class of inputs for which these properties are, and are not, satisfied. It follows that there are many issues to be considered before a sparse model can be used with confidence of its correctness. This proposal addresses these issues and it will include theoretical results and computational experiments.

The benefits of the proposed research extend to the many areas in which a sparse model is used to model an input-output relationship. These applications include the medical, financial, retail, industrial and social media domains (as stated above). Apart from the computational advantages of a sparse model (stated above), the desirability of a sparse model follows from its simplicity, and it is therefore easier to obtain a physical understanding of the input-output relationship of the system.
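
The contrast the abstract draws between 1-norm regularisation (the lasso) and 2-norm regularisation (Tikhonov) of Ax=b can be illustrated with a minimal NumPy sketch. It is an illustration only and not the project's method: the lasso solver used here (ISTA, a proximal gradient iteration), the synthetic data, the regularisation weight `lam`, and all function names are assumptions introduced for this example. The lasso objective is taken as 0.5*||Ax-b||_2^2 + lam*||x||_1 and the Tikhonov objective as ||Ax-b||_2^2 + lam*||x||_2^2.

```python
import numpy as np


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: shrink each entry towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_ista(A, b, lam, n_iter=2000):
    """Minimise 0.5 * ||A x - b||_2^2 + lam * ||x||_1 with ISTA (assumed solver)."""
    # Step size 1/L, where L = ||A||_2^2 is the Lipschitz constant of the gradient.
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - b)
        x = soft_threshold(x - step * grad, step * lam)
    return x


def tikhonov(A, b, lam):
    """Minimise ||A x - b||_2^2 + lam * ||x||_2^2 via the regularised normal equations."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)


def demo(seed=0):
    """Build a synthetic problem with a sparse exact solution and solve it both ways."""
    rng = np.random.default_rng(seed)
    m, n = 100, 20                      # hypothetical problem size
    A = rng.standard_normal((m, n))
    x_true = np.zeros(n)
    x_true[[2, 7, 11]] = [3.0, -2.0, 4.0]   # only three dominant predictors
    b = A @ x_true + 0.01 * rng.standard_normal(m)
    x_lasso = lasso_ista(A, b, lam=1.0)
    x_tik = tikhonov(A, b, lam=1.0)
    return x_true, x_lasso, x_tik


if __name__ == "__main__":
    x_true, x_lasso, x_tik = demo()
    print("nonzeros (lasso):   ", np.count_nonzero(x_lasso))
    print("nonzeros (Tikhonov):", np.count_nonzero(x_tik))
    print("error (lasso):      ", np.linalg.norm(x_lasso - x_true))
    print("error (Tikhonov):   ", np.linalg.norm(x_tik - x_true))
```

On a well-conditioned problem like this one, both regularised solutions stay close to the exact solution, but only the soft-thresholding step of the lasso sets coefficients exactly to zero. The Tikhonov solution shrinks every coefficient without eliminating any, which is the distinction between imposing sparsity and imposing stability described above.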
