Beyond Clustering: Unsupervised Modeling with Complex Representations

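Basic information

Grant number:
EP/E042694/1
Principal investigator:
Katherine Heller
Amount:
$297,200
Host institution:
University of Cambridge
Host institution country:
United Kingdom
Project type:
Fellowship
Fiscal year:
2008
Funding country:
United Kingdom
Duration:
2008 to (no data)
Project status:
Completed

Source:
https://gtr.ukri.org/projects?ref=EP%2FE042694%2F1
Keywords:
Beyond Clustering Unsupervised Modeling Complex

Project abstract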

The field of Machine Learning strives to develop new theory and algorithms that improve the ability of computers to recognize patterns, make autonomous decisions, and make predictions based on data. New advances in Machine Learning have broad impact in other scientific fields, in commerce, and in the daily lives of individuals. For example, they can help neuroscientists analyze high-dimensional brain imaging data, improve online product recommendation systems, or help individuals automatically organize their digital photo albums.

Clustering is an important unsupervised Machine Learning tool for a variety of problems. Abstractly, clustering is discovering groups of data points that belong together. As an example, if given the task of clustering animals, one might group them together by type (mammals, reptiles, amphibians), or alternatively by size (small or large). Automated clustering tools have been used to cluster gene expression data in order to elucidate gene function, automatically group news articles on the web by topic, automatically categorize music by genre, and spatio-temporally cluster climate data to improve climate prediction.

While clustering is a wonderful tool for many applications, it is actually quite limited. In many situations the data being modeled can have a much richer and more complex hidden representation than the simple assignment of each data point to a cluster. For example, data points can actually belong to multiple clusters simultaneously (e.g. the movie Scream could belong to both the horror movie cluster and the comedy cluster). The hidden representation of the data could be structured, for example sentences can be represented by parse trees. The data being modeled might have multiple latent features (like images which can contain multiple objects). Moreover, the total number of latent features might not be known, and therefore should not be specified or limited a priori. This flexibility is provided by the use of nonparametric Bayesian methods, which will play a fundamental role in this proposal.

My main goal is to advance the state-of-the-art for unsupervised machine learning, by developing principled, theoretically sound, probabilistic models and algorithms, which extend a clustering paradigm to problems which need richer representations. These richer and more complex representations for data provide the ability to model data well in the many situations in which clustering is not good enough. In addition to advancing the theory, I will also develop efficient learning and inference algorithms for the probabilistic models that use these representations.

The starting point for much of my work will be nonparametric Bayesian methods, and in particular, the Indian Buffet Process (IBP). Nonparametric methods are designed to be very flexible, and can model data better than inflexible models with a fixed number of parameters. My methods will be able to automatically infer the correct model size (number of parameters) from the data.

I will focus on six specific new contributions to unsupervised machine learning. First, I will develop probabilistic models in which each data point can simultaneously belong to multiple overlapping clusters. Second, I will extend the clustering-on-demand paradigm to relational data creating a method that will enable computers to perform simple forms of analogical reasoning. Third, I will develop efficient methods for learning and inference in IBPs. Fourth, using the IBP I will create a new approach to Independent Components Analysis (a widely-used signal processing method) making it possible to automatically learn the number of components in a signal. Fifth, I will develop new probabilistic unsupervised methods for computers to transfer what they have learned on one task to other tasks. Finally, I will explore new uses of advanced probability theory and stochastic processes in the design of practical nonparametric machine learning methods.
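
As a rough illustration of the Indian Buffet Process named in the abstract, the sketch below draws a binary feature matrix from the standard sequential ("customers and dishes") construction of the IBP prior. It is not taken from the project itself and does not represent the models or inference algorithms the proposal sets out to develop. The function names `sample_poisson` and `sample_ibp`, the parameter `alpha`, and the seed handling are assumptions made for this example. The point it shows is that the number of latent features (columns) is not fixed in advance, and that a data point (row) can hold several features at once.

```python
import math
import random


def sample_poisson(rate, rng):
    """Draw from Poisson(rate) with Knuth's multiplication method.

    Adequate for the small rates used here; hypothetical helper.
    """
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_points, alpha, seed=0):
    """Sample a binary matrix Z from the IBP prior (illustrative sketch).

    Row i is a data point, column k is a latent feature. Data point i
    (1-based) reuses an existing feature k with probability m_k / i,
    where m_k is how many earlier points hold it, and then adds
    Poisson(alpha / i) brand-new features.
    """
    rng = random.Random(seed)
    feature_counts = []  # m_k for every feature created so far
    rows = []            # set of active feature indices per data point

    for i in range(1, num_points + 1):
        active = set()
        # Reuse existing features in proportion to their popularity.
        for k, m_k in enumerate(feature_counts):
            if rng.random() < m_k / i:
                active.add(k)
        # Create new features; the total number is never specified up front.
        for _ in range(sample_poisson(alpha / i, rng)):
            active.add(len(feature_counts))
            feature_counts.append(0)
        for k in active:
            feature_counts[k] += 1
        rows.append(active)

    num_features = len(feature_counts)
    return [[1 if k in row else 0 for k in range(num_features)] for row in rows]


if __name__ == "__main__":
    for row in sample_ibp(num_points=8, alpha=2.0, seed=0):
        print("".join(str(v) for v in row))
```

Larger values of `alpha` tend to produce more features per data point. The matrix width differs from one seed to the next, which is the sense in which the model size is left open rather than chosen beforehand.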

Project outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Katherine Heller

OTC Product: BioSafe Diabetes Risk Assessment

DOI:
10.1331/japha.2008.08529
Published:
2008-07-01
Journal:
Research article
Impact factor:
Authors:
Katherine Heller
Corresponding author:
Katherine Heller

Performance of machine learning models for predicting high-severity symptoms in multiple sclerosis

DOI:
10.1038/s41598-024-63888-x
Published:
2025-05-25
Journal:
Scientific Reports
Impact factor:
3.900
Authors:
Subhrajit Roy;Diana Mincu;Lev Proleev;Chintan Ghate;Jennifer S. Graves;David F. Steiner;Fletcher Lee Hartsell;Katherine Heller
Corresponding author:
Katherine Heller

OTC Product: SinuCleanse for Rhinosinusitis

DOI:
10.1331/154434506775268607
Published:
2006-01-01
Journal:
Research article
Impact factor:
Authors:
Katherine Heller
Corresponding author:
Katherine Heller

Evaluating the Usability and Impact of an Artificial Intelligence-Powered Clinical Decision Support System for Depression Treatment

DOI:
10.1016/j.biopsych.2020.02.451
Published:
2020-05-01
Journal:
Conference abstract
Impact factor:
Authors:
Myriam Tanguay-Sela;David Benrimoh;Kelly Perlman;Sonia Israel;Joseph Mehltretter;Caitrin Armstrong;Robert Fratila;Sagar Parikh;Jordan Karp;Katherine Heller;Ipsit Vahia;Daniel Blumberger;Sherif Karama;Simone Vigod;Gail Myhr;Ruben Martins;Colleen Rollins;Christina Popescu;Eryn Lundrigan;Emily Snook
Corresponding author:
Emily Snook

The Case for Globalizing Fairness: A Mixed Methods Study on Colonialism, AI, and Health in Africa

DOI:
Published:
2024
Journal:
arXiv.org
Impact factor:
0
Authors:
M. Asiedu;Awa Dieng;Alexander Haykel;Negar Rostamzadeh;Stephen R. Pfohl;Chirag Nagpal;Maria Nagawa;Abigail Oppong;Sanmi Koyejo;Katherine Heller
Corresponding author:
Katherine Heller