A Statistical Study on Bayes Statistics and Ensemble Learning

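Basic Information

Grant number: 14084210
Principal investigator: MURATA Noboru
Amount: $44,800
Host institution: Waseda University
Host institution country: Japan
Project category: Grant-in-Aid for Scientific Research on Priority Areas
Fiscal year: 2002
Funding country: Japan
Project period: 2002 to 2005
Project status: Completed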

Project Abstract

To study boosting algorithms, we consider the structure of a space of general learning models that is naturally introduced by the Bregman divergence. We discuss and clarify statistical properties such as robustness against noise and outliers, asymptotic efficiency depending on the sizes of the training sample and the learning model, and the Bayes optimality and consistency of the convex functions that induce Bregman divergences. Based on these considerations, we have proposed a new generic class of boosting algorithms called "U-Boost".

Moreover, extending boosting algorithms to density estimation, we have proposed an algorithm for regression problems. In this algorithm, Gaussian processes in reproducing kernel Hilbert spaces are used as regressors, and estimating functions based on Bregman divergences are used for inference.

Our study revealed a close relationship between boosting algorithms and support vector machines, so we have also studied the generalization error of support vector machines from algebraic and geometric viewpoints.

For practical applications, we addressed the following problems.

Estimating a huge probability table for graphical models and Bayesian networks frequently leads to an explosion in the number of parameters. To avoid this, we have constructed a mixture model based on the concept of ensemble learning. The model consists of simple tables and has rather good generalization error. We discussed an estimation algorithm for the model, which is an extension of the EM algorithm, from the viewpoint of information geometry.

We also worked on constructing an on-line algorithm for boosting, in order to apply boosting to learning problems such as reinforcement learning, in which large amounts of data are observed one after another. We considered methods for reconstructing the objective function from sequentially obtained data and compared them with ordinary off-line boosting algorithms.
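For readers unfamiliar with the Bregman divergence that the abstract builds on, the following is a minimal illustrative sketch of its standard textbook definition, D(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> for a convex function phi. It is not the project's U-Boost algorithm. The function names and example vectors are hypothetical, chosen only for illustration.

```python
import math

def bregman_divergence(phi, grad_phi, p, q):
    """Standard definition: phi(p) - phi(q) - <grad_phi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

# Convex function 1: squared Euclidean norm.
# Its Bregman divergence is the squared Euclidean distance.
def sq_norm(x):
    return sum(xi * xi for xi in x)

def sq_norm_grad(x):
    return [2.0 * xi for xi in x]

# Convex function 2: negative entropy, sum of x*log(x).
# On probability vectors its Bregman divergence is the KL divergence.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

# Hypothetical example vectors (both sum to 1, all entries positive).
p = [0.2, 0.3, 0.5]
q = [0.4, 0.4, 0.2]

d_euclid = bregman_divergence(sq_norm, sq_norm_grad, p, q)
d_kl = bregman_divergence(neg_entropy, neg_entropy_grad, p, q)

print("squared Euclidean:", d_euclid)
print("KL divergence:", d_kl)
```

Different choices of the convex function give different divergences, which is the sense in which a family of convex functions can index a family of learning criteria.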
