Learning and inference with large image corpora


Basic information

Grant number:
RGPIN-2020-06848
Principal investigator:
Fleet, David
Amount:
approximately $40,100
Host institution:
University of Toronto
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2021
Funding country:
Canada
Start and end dates:
2021-01-01 to 2022-12-31
Project status:
Completed

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=741382
Keywords:
Learning inference large image corpora

Project abstract

This proposal targets two domains within my broader research program on computer vision and machine learning: 1) deep latent variable models for large-scale data; and 2) algorithms for 3D molecular reconstruction with electron cryo-microscopy (cryo-EM). We recently formulated a new class of probabilistic encoder-decoder based on two principles: 1) symmetry, for which the encoder and decoder are consistent; and 2) high mutual information between observations and latent states. The formulation appears to address two problems with variational auto-encoders: their asymmetric loss, with an approximate encoder, and their tendency to produce pathological results, known as posterior collapse. Preliminary results show that the approach produces excellent models with high mutual information and stable training. We aim to complete the development of the theoretical basis for this model and to test it empirically with high-dimensional image and language data. We then plan to extend it to support semi-supervised image translation tasks, and the learning of conditional and compositional deep representations on large-scale corpora, with applications to few-shot learning.

The goal of cryo-EM is to estimate 3D bio-molecular structure at atomic resolution given 2D images from an electron microscope. Our recent algorithms are now used in a state-of-the-art software pipeline called cryoSPARC. We propose several directions toward the next generation of cryo-EM algorithms: 1) We address a long-standing problem in cryo-EM, namely how to measure the quality of estimated structures in a principled fashion. We propose to develop a new principled formulation as a form of cross-validation from statistical machine learning; 2) We plan to develop new algorithms for heterogeneous particles (vs current methods that assume particles are identical up to a rigid transform). The new method, non-uniform refinement, will allow signal-to-noise levels to vary spatially, yielding improved resolution of estimated structures; 3) We plan to use deep learning to denoise estimated 3D maps, using new techniques that do not require noiseless ground truth data, within a meta-learning framework; 4) We plan to develop algorithms for reconstructing flexible proteins using a combination of normal mode analysis, thermodynamics, and parameterized deformations to learn deep conditional particle dynamics, yielding new algorithms for highly dynamic proteins.

Impact: Unsupervised learning may be the next breakthrough in machine learning, thereby avoiding the need to collect vast amounts of annotated training data. Cryo-EM has been disruptive in molecular biology and drug discovery. The new methods proposed here will maintain our leadership in this exciting field. Finally, training students in learning and vision is essential. Previous highly qualified personnel (HQP) from my group, all residing in Canada, include M Brubaker (Borealis AI), M Norouzi (Google Brain), R Urtasun (Uber ATG), and Leonid Sigal (UBC).
