Beyond Clustering: Unsupervised Modeling with Complex Representations
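Basic information
- Grant number: EP/E042694/1
- Principal investigator:
- Amount: $297,200
- Host institution:
- Host institution country: United Kingdom
- Project type: Fellowship
- Fiscal year: 2008
- Funding country: United Kingdom
- Duration: 2008 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract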
The field of Machine Learning strives to develop new theory and algorithms that improve the ability of computers to recognize patterns, make autonomous decisions, and make predictions based on data. New advances in Machine Learning have broad impact in other scientific fields, in commerce, and in the daily lives of individuals. For example, they can help neuroscientists analyze high-dimensional brain imaging data, improve online product recommendation systems, or help individuals automatically organize their digital photo albums.

Clustering is an important unsupervised Machine Learning tool for a variety of problems. Abstractly, clustering is discovering groups of data points that belong together. As an example, if given the task of clustering animals, one might group them together by type (mammals, reptiles, amphibians), or alternatively by size (small or large). Automated clustering tools have been used to cluster gene expression data in order to elucidate gene function, automatically group news articles on the web by topic, automatically categorize music by genre, and spatio-temporally cluster climate data to improve climate prediction.

While clustering is a wonderful tool for many applications, it is actually quite limited. In many situations the data being modeled can have a much richer and more complex hidden representation than the simple assignment of each data point to a cluster. For example, data points can actually belong to multiple clusters simultaneously (e.g. the movie Scream could belong to both the horror movie cluster and the comedy cluster). The hidden representation of the data could be structured; for example, sentences can be represented by parse trees. The data being modeled might have multiple latent features (like images, which can contain multiple objects). Moreover, the total number of latent features might not be known, and therefore should not be specified or limited a priori. This flexibility is provided by the use of nonparametric Bayesian methods, which will play a fundamental role in this proposal.

My main goal is to advance the state of the art for unsupervised machine learning by developing principled, theoretically sound, probabilistic models and algorithms that extend the clustering paradigm to problems which need richer representations. These richer and more complex representations for data provide the ability to model data well in the many situations in which clustering is not good enough. In addition to advancing the theory, I will also develop efficient learning and inference algorithms for the probabilistic models that use these representations.

The starting point for much of my work will be nonparametric Bayesian methods, and in particular, the Indian Buffet Process (IBP). Nonparametric methods are designed to be very flexible, and can model data better than inflexible models with a fixed number of parameters. My methods will be able to automatically infer the correct model size (number of parameters) from the data.

I will focus on six specific new contributions to unsupervised machine learning. First, I will develop probabilistic models in which each data point can simultaneously belong to multiple overlapping clusters. Second, I will extend the clustering-on-demand paradigm to relational data, creating a method that will enable computers to perform simple forms of analogical reasoning. Third, I will develop efficient methods for learning and inference in IBPs. Fourth, using the IBP I will create a new approach to Independent Components Analysis (a widely used signal processing method), making it possible to automatically learn the number of components in a signal. Fifth, I will develop new probabilistic unsupervised methods for computers to transfer what they have learned on one task to other tasks. Finally, I will explore new uses of advanced probability theory and stochastic processes in the design of practical nonparametric machine learning methods.
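To make the Indian Buffet Process named in the abstract a little more concrete, the following is a minimal sketch of drawing a binary feature-assignment matrix from the standard IBP prior, in which each row is a data point and each column is a latent feature (an overlapping "cluster"). It is an illustration only and is not taken from the project itself: the function names `sample_poisson` and `sample_ibp`, the concentration parameter name `alpha`, and the seed are all assumptions made for this example. The number of columns is not fixed in advance; it is an outcome of the draw.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw from Poisson(lam) using Knuth's multiplication method (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def sample_ibp(n_points, alpha, seed=0):
    """Sample a binary matrix from the IBP prior (hypothetical helper for illustration).

    Data point i (1-based) takes each existing feature k with probability
    m_k / i, where m_k is how many earlier points already have feature k,
    and then adds Poisson(alpha / i) brand-new features.
    """
    rng = random.Random(seed)
    counts = []  # m_k: number of points that currently own feature k
    rows = []
    for i in range(1, n_points + 1):
        # Revisit features that earlier points introduced.
        row = [1 if rng.random() < m / i else 0 for m in counts]
        # Introduce new features owned (so far) only by this point.
        n_new = sample_poisson(alpha / i, rng)
        row.extend([1] * n_new)
        for k, z in enumerate(row):
            if k < len(counts):
                counts[k] += z
            else:
                counts.append(z)
        rows.append(row)
    # Pad earlier rows with zeros so every row has one entry per feature.
    n_features = len(counts)
    return [r + [0] * (n_features - len(r)) for r in rows]


if __name__ == "__main__":
    for row in sample_ibp(6, alpha=2.0, seed=1):
        print("".join(str(z) for z in row))
```

A row with several ones corresponds to a data point that belongs to several overlapping clusters at once, which is the situation the abstract contrasts with ordinary clustering. Larger values of `alpha` tend to produce more features per point in this sketch.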
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Katherine Heller
OTC Product: BioSafe Diabetes Risk Assessment
- DOI: 10.1331/japha.2008.08529
- Published: 2008-07-01
- Journal:
- Impact factor:
- Authors: Katherine Heller
- Corresponding author: Katherine Heller
Performance of machine learning models for predicting high-severity symptoms in multiple sclerosis
- DOI: 10.1038/s41598-024-63888-x
- Published: 2025-05-25
- Journal:
- Impact factor: 3.900
- Authors: Subhrajit Roy; Diana Mincu; Lev Proleev; Chintan Ghate; Jennifer S. Graves; David F. Steiner; Fletcher Lee Hartsell; Katherine Heller
- Corresponding author: Katherine Heller
OTC Product: SinuCleanse for Rhinosinusitis
- DOI: 10.1331/154434506775268607
- Published: 2006-01-01
- Journal:
- Impact factor:
- Authors: Katherine Heller
- Corresponding author: Katherine Heller
Evaluating the Usability and Impact of an Artificial Intelligence-Powered Clinical Decision Support System for Depression Treatment
- DOI: 10.1016/j.biopsych.2020.02.451
- Published: 2020-05-01
- Journal:
- Impact factor:
- Authors: Myriam Tanguay-Sela; David Benrimoh; Kelly Perlman; Sonia Israel; Joseph Mehltretter; Caitrin Armstrong; Robert Fratila; Sagar Parikh; Jordan Karp; Katherine Heller; Ipsit Vahia; Daniel Blumberger; Sherif Karama; Simone Vigod; Gail Myhr; Ruben Martins; Colleen Rollins; Christina Popescu; Eryn Lundrigan; Emily Snook
- Corresponding author: Emily Snook
The Case for Globalizing Fairness: A Mixed Methods Study on Colonialism, AI, and Health in Africa
- DOI:
- Published: 2024
- Journal:
- Impact factor: 0
- Authors: M. Asiedu; Awa Dieng; Alexander Haykel; Negar Rostamzadeh; Stephen R. Pfohl; Chirag Nagpal; Maria Nagawa; Abigail Oppong; Sanmi Koyejo; Katherine Heller
- Corresponding author: Katherine Heller
Other grants held by Katherine Heller
CAREER: Interacting Dynamic Bayesian Models for Social Behavior and Reasoning
- Grant number: 1553465
- Fiscal year: 2016
- Funding amount: $297,200
- Project type: Standard Grant
BRAIN EAGER: Integrative Cross-Modal and Cross-Species Brain Models: Motivation and Reward
- Grant number: 1451017
- Fiscal year: 2014
- Funding amount: $297,200
- Project type: Standard Grant
Bayesian Models of Social Behavior Using Online Resources
- Grant number: 1339593
- Fiscal year: 2013
- Funding amount: $297,200
- Project type: Standard Grant
Workshop for Women in Machine Learning
- Grant number: 1346800
- Fiscal year: 2013
- Funding amount: $297,200
- Project type: Standard Grant
Bayesian Models of Social Behavior using Online Resources
- Grant number: 1048563
- Fiscal year: 2011
- Funding amount: $297,200
- Project type: Standard Grant
Similar overseas grants
Data-driven phenotyping of central disorders of hypersomnolence with unsupervised clustering: toward more reliable diagnostic criteria
- Grant number: 481046
- Fiscal year: 2023
- Funding amount: $297,200
- Project type:
Unsupervised Deep Learning-Based Detection and Clustering for Biodiversity Analysis using Underwater Imagery
- Grant number: 575766-2022
- Fiscal year: 2022
- Funding amount: $297,200
- Project type: Alexander Graham Bell Canada Graduate Scholarships - Master's
A Study on Deep Learning of Unsupervised Image Segmentation by Differentiable Clustering
- Grant number: 20K19837
- Fiscal year: 2020
- Funding amount: $297,200
- Project type: Grant-in-Aid for Early-Career Scientists
CIF: CAREER: Robust, Interpretable, and Efficient Unsupervised Learning with K-set Clustering
- Grant number: 1845076
- Fiscal year: 2019
- Funding amount: $297,200
- Project type: Continuing Grant
Cardiac alterations in young adults born very preterm: unsupervised clustering identifies subgroups related to perinatal complications.
- Grant number: 383200
- Fiscal year: 2018
- Funding amount: $297,200
- Project type:
Statistical theory of unsupervised learning with a focus on clustering methods
- Grant number: 26880031
- Fiscal year: 2014
- Funding amount: $297,200
- Project type: Grant-in-Aid for Research Activity Start-up
Effective improvement of time-series pattern recognition systems using clustering and unsupervised adaptive training
- Grant number: 23700218
- Fiscal year: 2011
- Funding amount: $297,200
- Project type: Grant-in-Aid for Young Scientists (B)
NetSE Small: Unsupervised flow-based clustering
- Grant number: 0915552
- Fiscal year: 2009
- Funding amount: $297,200
- Project type: Continuing Grant
Unsupervised clustering of very large databases.
- Grant number: 304299-2004
- Fiscal year: 2005
- Funding amount: $297,200
- Project type: Postgraduate Scholarships - Doctoral
Unsupervised clustering of very large databases.
- Grant number: 304299-2004
- Fiscal year: 2004
- Funding amount: $297,200
- Project type: Postgraduate Scholarships - Doctoral