Learning from Multiple-Instance and Unlabeled Data


Basic Information

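  • Award Number:
    9988314
  • Principal Investigator:
    Sally A. Goldman
  • Amount:
    $217,200
  • Host Institution:
    Washington University
  • Host Institution Country:
    United States
  • Award Type:
    Standard Grant
  • Fiscal Year:
    2000
  • Funding Country:
    United States
  • Period:
    2000-09-01 to 2003-08-31
  • Status:
    Completed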

Project Abstract

Sally A. Goldman
Department of Computer Science
Washington University
St. Louis, MO 63130

PROJECT SUMMARY

In standard supervised learning, each example is given a label with the correct (or possibly noisy) classification. In unsupervised learning, none of the individual examples is labeled; there is just a single overall label. This project is studying two learning models that fall between these two extremes.

In the multiple-instance model, the learner receives only labeled collections (or bags) of examples. A bag is classified as positive if and only if at least one of the examples in the bag is classified as positive by the target concept. Supervised and unsupervised learning can be viewed as two special cases of this model: in supervised learning, each example is in its own bag, and in unsupervised learning, all examples are together in one bag. The multiple-instance model was motivated by the drug activity prediction problem, in which each example is a possible shape for a molecule of interest and each bag contains all likely shapes for that molecule. By accurately predicting which molecules will bind to an unknown protein, one can accelerate the discovery of new drugs and thereby reduce cost. Existing multiple-instance learning algorithms use boolean labels for the bags. In the drug activity prediction problem, however, the true label is a real-valued affinity measurement that gives the strength of the binding. This project is performing an in-depth study of learning in the multiple-instance model with real-valued labels, including empirical studies using real drug binding data. Other application areas will also be explored.

This project is also studying learning when much of the available data is unlabeled. In many application areas (e.g., classifying web pages as appropriate or inappropriate for minors, or medical applications) there is a small amount of labeled data along with a large pool of unlabeled data. This project is studying techniques that use the unlabeled data to improve the performance of standard supervised learning algorithms. In particular, it is studying a method of co-training in which two independent learning algorithms are first trained on the labeled data. Then, using statistical techniques, each learner repeatedly selects some of the unlabeled data to label for the other learner. The project will perform both empirical and theoretical studies to understand the limitations of various approaches and to develop better learning algorithms.
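The multiple-instance labeling rule and its two special cases can be sketched in a few lines of Python. This is an illustrative sketch, not the project's algorithm: instances are assumed to be plain feature tuples, the target concept `in_box` (an axis-parallel box) and the real-valued `affinity` score are hypothetical stand-ins, and taking the maximum instance score as a real-valued bag label is only one assumed way to generalize the boolean rule (with 0/1 scores it reduces to "at least one instance is positive").

```python
from typing import Callable, Sequence, Tuple

Instance = Tuple[float, ...]
Bag = Sequence[Instance]


def bag_label(bag: Bag, concept: Callable[[Instance], bool]) -> bool:
    """Boolean multiple-instance label: positive iff at least one instance is positive."""
    return any(concept(x) for x in bag)


def real_bag_label(bag: Bag, score: Callable[[Instance], float]) -> float:
    """Assumed real-valued generalization: the bag takes its best instance score.

    With 0/1 scores this agrees with bag_label, since max acts as logical OR.
    """
    return max(score(x) for x in bag)


# Hypothetical target concept: an axis-parallel box in a 2-D feature space.
def in_box(x: Instance) -> bool:
    return 0.0 <= x[0] <= 1.0 and 0.0 <= x[1] <= 1.0


# Hypothetical real-valued "affinity" that decays with distance from the box centre.
def affinity(x: Instance) -> float:
    dist_sq = (x[0] - 0.5) ** 2 + (x[1] - 0.5) ** 2
    return 1.0 / (1.0 + dist_sq)


examples = [(0.5, 0.5), (2.0, 2.0), (3.0, -1.0)]

# Supervised special case: every example sits in its own bag.
singleton_bags = [[x] for x in examples]
print([bag_label(b, in_box) for b in singleton_bags])  # [True, False, False]

# Other extreme: all examples share one bag, so only one overall label survives.
print(bag_label(examples, in_box))  # True

# Real-valued bag label: the strongest-scoring instance ("shape") decides the value.
print(real_bag_label(examples, affinity))  # 1.0
```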
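The co-training loop described in the summary can be illustrated with a small Python sketch. Everything here is an assumption for illustration: the two learners (`NearestCentroid` and `KNN`) are hypothetical stand-ins for the "two independent learning algorithms", and a simple per-learner confidence score replaces the statistical selection techniques the project studies. Both learners start from the same labeled data; in each round, each learner labels the unlabeled points it is most confident about and adds them to the other learner's training set.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


class NearestCentroid:
    """Hypothetical learner A: predicts the class whose mean point is nearest."""

    def fit(self, xs: Sequence[Point], ys: Sequence[int]) -> "NearestCentroid":
        self.centroids = {}
        for label in set(ys):
            pts = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = (sum(p[0] for p in pts) / len(pts),
                                     sum(p[1] for p in pts) / len(pts))
        return self

    def predict(self, x: Point) -> Tuple[int, float]:
        # Confidence: how much closer the winning centroid is than the runner-up.
        dists = sorted((math.dist(x, c), label) for label, c in self.centroids.items())
        gap = dists[1][0] - dists[0][0] if len(dists) > 1 else 0.0
        return dists[0][1], gap


class KNN:
    """Hypothetical learner B: majority vote among the k nearest training points."""

    def __init__(self, k: int = 3):
        self.k = k

    def fit(self, xs: Sequence[Point], ys: Sequence[int]) -> "KNN":
        self.data = list(zip(xs, ys))
        return self

    def predict(self, x: Point) -> Tuple[int, float]:
        # Confidence: fraction of the k neighbours that agree with the majority label.
        nearest = sorted(self.data, key=lambda p: math.dist(x, p[0]))[: self.k]
        votes = [y for _, y in nearest]
        label = max(set(votes), key=votes.count)
        return label, votes.count(label) / len(votes)


def co_train(labeled: List[Tuple[Point, int]], unlabeled: List[Point],
             per_round: int = 1, rounds: int = 10) -> Tuple[NearestCentroid, KNN]:
    train_a, train_b = list(labeled), list(labeled)  # both start from the labeled data
    pool = list(unlabeled)
    a, b = NearestCentroid(), KNN(k=3)
    for _ in range(rounds):
        if not pool:
            break
        a.fit(*zip(*train_a))
        b.fit(*zip(*train_b))
        # Each learner picks the pool points it is most confident about...
        picks_a = sorted(pool, key=lambda x: a.predict(x)[1], reverse=True)[:per_round]
        picks_b = sorted(pool, key=lambda x: b.predict(x)[1], reverse=True)[:per_round]
        # ...and labels them for the OTHER learner's training set.
        train_b += [(x, a.predict(x)[0]) for x in picks_a]
        train_a += [(x, b.predict(x)[0]) for x in picks_b]
        pool = [x for x in pool if x not in picks_a and x not in picks_b]
    return a.fit(*zip(*train_a)), b.fit(*zip(*train_b))


labeled = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((5.0, 5.0), 1), ((5.2, 4.9), 1)]
unlabeled = [(0.1, 0.3), (0.4, 0.2), (4.8, 5.1), (5.5, 5.3), (0.3, -0.1), (4.9, 4.7)]
a, b = co_train(labeled, unlabeled)
print(a.predict((0.5, 0.5))[0], b.predict((4.5, 4.5))[0])  # 0 1
```

Because each learner ranks only its own predictions, the two confidence scores do not need to be on the same scale.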

Project Outputs

Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)

Other Grants by Sally Goldman

Applying Multiple-Instance Learning to Content-Based Image Retrieval
  • Award Number:
    0329241
  • Fiscal Year:
    2003
  • Funding Amount:
    $217,200
  • Award Type:
    Continuing Grant
Applying Learning Theory to Networking Problems
  • Award Number:
    9734940
  • Fiscal Year:
    1998
  • Funding Amount:
    $217,200
  • Award Type:
    Standard Grant
NSF Young Investigator: New Directions in Computational Learning Theory
  • Award Number:
    9357707
  • Fiscal Year:
    1993
  • Funding Amount:
    $217,200
  • Award Type:
    Continuing Grant
The Role of the Environment in On-Line Learning
  • Award Number:
    9110108
  • Fiscal Year:
    1991
  • Funding Amount:
    $217,200
  • Award Type:
    Standard Grant

Similar NSFC (National Natural Science Foundation of China) Grants

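Long-Time-Series Fusion of Multi-Source Northern Hemisphere Snow Depth Data Based on Multiple Collocation
  • Award Number:
    42001289
  • Approval Year:
    2020
  • Funding Amount:
    CNY 240,000
  • Award Type:
    Young Scientists Fund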

Similar Overseas Grants

NEMO - Net zero events using multiple open data sources
  • Award Number:
    10114096
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    SME Support
Vaccination of poultry infected with multiple Salmonella serovars
  • Award Number:
    LP230100209
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Linkage Projects
MULTI-STRESS: Quantifying the impacts of multiple stressors in multiple dimensions to improve ecological forecasting
  • Award Number:
    NE/Z000130/1
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Research Grant
Collaborative Research: NSFDEB-NERC: Warming's silver lining? Thermal compensation at multiple levels of organization may promote stream ecosystem stability in response to drought
  • Award Number:
    2312706
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Standard Grant
Transition Metal - Main Group Multiple Bonding
  • Award Number:
    2349123
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Standard Grant
CAREER: Stochasticity and Resilience in Reinforcement Learning: From Single to Multiple Agents
  • Award Number:
    2339794
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Continuing Grant
Development of Canine Multiple-Switch-Receptor CAR-T Cells That Overcome Functional Suppression by Solid Tumors
  • Award Number:
    24K18030
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Grant-in-Aid for Early-Career Scientists
Collaborative Research: Multiple Team Membership (MTM) through Technology: A path towards individual and team wellbeing?
  • Award Number:
    2345652
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Standard Grant
Collaborative Research: Phenotypic and lineage diversification after key innovation(s): multiple evolutionary pathways to air-breathing in labyrinth fishes and their allies
  • Award Number:
    2333683
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Continuing Grant
Collaborative Research: Phenotypic and lineage diversification after key innovation(s): multiple evolutionary pathways to air-breathing in labyrinth fishes and their allies
  • Award Number:
    2333684
  • Fiscal Year:
    2024
  • Funding Amount:
    $217,200
  • Award Type:
    Continuing Grant