CAREER: Model-based compression and probabilistic analysis of non-Markovian sequences
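Basic information
- Award number: 2144974
- Principal investigator:
- Amount: $559,500
- Host institution:
- Host institution country: United States
- Award type: Continuing Grant
- Fiscal year: 2022
- Funding country: United States
- Period: 2022-10-01 to 2027-09-30
- Status: Ongoing
- Source:
- Keywords:
Project abstract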
This project aims to develop efficient data-compression and analysis methods for large and complex data based on probabilistic models that will facilitate algorithm design, analysis, and evaluation. The project advances flexible probabilistic models capable of accurately representing such data. These models will be leveraged to design scalable analysis and compression algorithms, establish their fundamental limits, and provide provable performance guarantees. In particular, the project will study data-compression algorithms for removing redundancy in large-scale data-storage systems, where traditional compression methods are computationally infeasible. It will also develop novel estimation and testing algorithms for genomic sequences, where existing probabilistic models are too restrictive to faithfully represent their internal statistical structure. The project considers fundamental problems in information theory and statistical signal processing and has the potential to contribute to public health through more accurate statistical analysis of genomic data. The research results will be incorporated in a range of educational activities, including developing interactive and accessible online courses that will emphasize connections between mathematics, engineering, and science, and promote a principled model-based approach to solving engineering and scientific problems.
The project has two research thrusts, which correspond to two critical settings in which conventional probabilistic models of sequences, most commonly Markov as well as independent and identically distributed (iid) models, and their associated methods, are inapplicable. The first thrust focuses on sequences with long-range redundancy, i.e., with long repeated blocks appearing at large distances, common in terabyte-scale data-storage systems. The project will develop generative data-driven models for sources with approximate repeats, establish information-theoretic bounds on compressing them, and develop and optimize compression algorithms, including compression of distributed sources and universal compression for sources with unknown parameters. The second thrust focuses on evolutionary sources, i.e., those that produce data through consecutive edits, used to model the generation process of genomic data. Problems such as parameter estimation, hypothesis testing, and the prediction of future behavior for evolutionary sources will be addressed by formulating a stochastic approximation framework in which asymptotic and finite-time behavior of sequences are analyzed. The resulting analysis methods and algorithms developed in this thrust will be used to study several problems in bioinformatics.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
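The long-range redundancy named in the first thrust can be illustrated with a toy experiment. The sketch below is not the project's method; it only shows, under assumed sizes, why a repeated block that lies far back in the data is invisible to a compressor with a short history window. The helper names and the block and gap lengths are hypothetical choices made for illustration. It compares Python's `zlib` (DEFLATE, whose window is at most 32 KiB) with `lzma` (whose default dictionary spans several megabytes) on a sequence made of a random block, unrelated filler, and the same block again.

```python
import lzma
import random
import zlib


def random_bytes(rng: random.Random, n: int) -> bytes:
    """Return n pseudo-random (essentially incompressible) bytes."""
    return rng.getrandbits(8 * n).to_bytes(n, "little")


def make_long_range_sequence(block_len: int, gap_len: int, seed: int = 0) -> bytes:
    """Build: block + unrelated filler + the same block again."""
    rng = random.Random(seed)
    block = random_bytes(rng, block_len)
    filler = random_bytes(rng, gap_len)
    return block + filler + block


# Hypothetical sizes: the second copy starts 300,000 bytes after the first,
# far beyond DEFLATE's 32 KiB history window.
data = make_long_range_sequence(block_len=100_000, gap_len=200_000)

# Short-window compressor: cannot reference the first copy of the block.
short_window_size = len(zlib.compress(data, 9))

# Long-window compressor: the default LZMA dictionary covers the whole input.
long_window_size = len(lzma.compress(data))

print(f"input: {len(data)} bytes")
print(f"zlib:  {short_window_size} bytes")
print(f"lzma:  {long_window_size} bytes")
```

In this toy setting the short-window compressor gains nothing from the repeat, while the long-window one saves roughly the length of the repeated block. Enlarging the window is only one possible response, and it becomes costly at terabyte scale, which is the regime the abstract targets.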
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Farzad Farnoud
Constrained Code for Data Storage in DNA via Nanopore Sequencing
- DOI:
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Kallie Whritenour; M. Civelek; Farzad Farnoud
- Corresponding author: Farzad Farnoud
Noise and uncertainty in string-duplication systems
- DOI:
- Year: 2017
- Journal:
- Impact factor: 0
- Authors: Siddhartha Jain; Farzad Farnoud; Moshe Schwartz; Jehoshua Bruck
- Corresponding author: Jehoshua Bruck
A general framework for distributed vote aggregation
- DOI: 10.1109/acc.2013.6580423
- Year: 2012
- Journal:
- Impact factor: 0
- Authors: B. Touri; Farzad Farnoud; A. Nedić; O. Milenkovic
- Corresponding author: O. Milenkovic
A Statistical Analysis of Duplication Errors in the Nanopore Sequencing Channel
- DOI:
- Year: 2023
- Journal:
- Impact factor: 0
- Authors: Sarvin Motamen; Hao Lou; Farzad Farnoud
- Corresponding author: Farzad Farnoud
Small-sample distribution estimation over sticky channels
- DOI:
- Year: 2009
- Journal:
- Impact factor: 0
- Authors: Farzad Farnoud; O. Milenkovic; N. Santhanam
- Corresponding author: N. Santhanam
Other grants by Farzad Farnoud
Collaborative Research: CIF: Small: Versatile Data Synchronization: Novel Codes and Algorithms for Practical Applications
- Award number: 2312871
- Fiscal year: 2023
- Amount: $559,500
- Award type: Standard Grant
CIF: Small: Collaborative Research: Rank Aggregation with Heterogeneous Information Sources: Efficient Algorithms and Fundamental Limits
- Award number: 1908544
- Fiscal year: 2019
- Amount: $559,500
- Award type: Standard Grant
CIF: NSF-BSF: Small: Collaborative Research: Characterization and Mitigation of Noise in a Live DNA Storage Channel
- Award number: 1816409
- Fiscal year: 2018
- Amount: $559,500
- Award type: Standard Grant
CRII: CIF: Model-based Compression of Biological Sequences
- Award number: 1755773
- Fiscal year: 2018
- Amount: $559,500
- Award type: Standard Grant
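Similar National Natural Science Foundation of China (NSFC) grants
Mechanisms and models of face encoding based on brain-inspired vision
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
EMA-based early-warning model of suicidal-ideation risk in lung cancer patients with psychological distress and just-in-time adaptive intervention strategy
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
Research on building a machine-learning-based prediction model for ischemic stroke in acute dizziness/vertigo
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
Construction and application of a diabetes diagnosis and treatment decision model based on intelligent optimization algorithms: integrating clinical data from Grade-A tertiary hospitals in Chongqing
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
Research on intelligent medical detection, diagnosis, and treatment based on small-object detection and the DeepSeek large model
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
Research on a new type of surface-level intelligent parking lot based on intelligent connected technology
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
Research on embodied perception and learning mechanisms for intelligent robots in complex dynamic environments based on multimodal large models
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
Research on customized design of magnesium alloy properties and model visualization based on novel Bayesian optimization
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
Research on predicting nonlinear dynamic structural response based on graph neural networks
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project
Constructing a competency map for applied undergraduate students based on recruitment big data in the context of the digital economy
- Award number:
- Approval year: 2025
- Amount: CNY 0
- Award type: Provincial/municipal project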
Similar overseas grants
CAREER: Efficient Large Language Model Inference Through Codesign: Adaptable Software Partitioning and FPGA-based Distributed Hardware
- Award number: 2339084
- Fiscal year: 2024
- Amount: $559,500
- Award type: Continuing Grant
CAREER: Co-designing a Service Exchange Model for Sustaining Community-based Respite Care
- Award number: 2145049
- Fiscal year: 2023
- Amount: $559,500
- Award type: Continuing Grant
CAREER: Model-based Analysis of Dynamic Networks using Continuous-time Network Models
- Award number: 2318751
- Fiscal year: 2022
- Amount: $559,500
- Award type: Continuing Grant
CAREER: Reconciling Model-Based and Learning-Based Imaging: Theory, Algorithms, and Applications
- Award number: 2043134
- Fiscal year: 2021
- Amount: $559,500
- Award type: Continuing Grant
CAREER: Model-based Analysis of Dynamic Networks using Continuous-time Network Models
- Award number: 2047955
- Fiscal year: 2021
- Amount: $559,500
- Award type: Continuing Grant
Career Formation of Graduates in Rural Ghana: A Follow-up Study Based on a Multi-track Model
- Award number: 21K13531
- Fiscal year: 2021
- Amount: $559,500
- Award type: Grant-in-Aid for Early-Career Scientists
CAREER: A Model-Based Rosetta Stone to Decipher the Stratigraphic Expression of Glacial Isostasy
- Award number: 2046244
- Fiscal year: 2021
- Amount: $559,500
- Award type: Continuing Grant
CAREER: The Science of Measurement-based Stability Assessment and Model Validation for Microgrids
- Award number: 1944689
- Fiscal year: 2020
- Amount: $559,500
- Award type: Continuing Grant
Proposal of education model for social implementation based on analysis of career development effect after graduation by social implementation ability cultivated in science and technology education
- Award number: 20H01751
- Fiscal year: 2020
- Amount: $559,500
- Award type: Grant-in-Aid for Scientific Research (B)
CAREER: Advancing Space Optical Communication Systems Via Hybrid Model-Based and Learning-Based Frameworks
- Award number: 1944828
- Fiscal year: 2020
- Amount: $559,500
- Award type: Continuing Grant