CAREER: Scalable and Adaptable Sparsity-driven Methods for more Efficient AI Systems

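Basic Information

Award Number:
2238291
Principal Investigator:
Gheorghi Guzun
Amount:
approx. $550,300
Institution:
San Jose State University Foundation
Institution Country:
United States
Award Type:
Continuing Grant
Fiscal Year:
2023
Funding Country:
United States
Project Period:
2023-03-01 to 2028-02-29
Project Status:
Ongoing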

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=2238291&HistoricalAwards=false
Keywords:
CAREER Scalable Adaptable Sparsity driven

Project Abstract

Artificial Intelligence (AI) and, in particular, Deep Neural Networks (DNNs) have achieved better-than-human accuracy on many cognitive tasks involving images, natural language processing, and protein structure, among others. Unfortunately, due to high data processing demands, AI systems are typically run on power-hungry specialized computing hardware. Quantization, the approximation of values with lower-precision (fewer-bit) numerical representations, has been used to reduce computing requirements. However, DNNs with a fixed low bit-width may lose accuracy due to quantization errors. Many existing software solutions for quantization are also fixed or limited in their bit-width choices. To address this trade-off and leverage data sparsity, the research team will investigate state-of-the-art methods and develop novel data quantization, encoding, and compression algorithms to integrate with existing AI systems. The methods developed have the potential not only to improve performance but also to reduce power requirements and boost the energy efficiency of AI systems. They will enable AI applications such as DNN inference on small devices, thus reducing the load on cloud infrastructure, improving user experience, providing data privacy, and avoiding security risks. The work proposed in this project has the potential to push the boundaries in many AI applications that run on energy-storage-constrained devices, such as smart sensing, wearable devices, and autonomous driving. The research and educational tools will facilitate and increase student and research community participation in advancing AI research. The work will be conducted at a minority-serving institution, and the funding will support students from underrepresented groups.
The research goal of this project is to investigate quantization and compression methods that can leverage sparsity and improve efficiency in AI systems. The principal investigator (PI) plans to study adaptable quantization and compression methods that leverage sparsity in AI systems while minimizing both the overhead in non-sparse situations and the loss of accuracy. The trade-off between accuracy and performance under the proposed methods will be studied and defined so that accuracy, performance, or energy efficiency can be prioritized through automated tuning. The PI plans to develop a prototype that executes the proposed methods in parallel, making them effective for data centers and advanced hardware architectures. The proposed methods will be packaged into an AI vector primitives library that will be integrated with several popular Deep Learning frameworks as a proof of concept, primarily targeting GPU and CPU systems. An integration API will be developed for frameworks such as PyTorch or TensorFlow to allow easy integration with other vector primitives. Software libraries will be integrated with a web-based learning platform with automated feedback and a motivating environment to encourage student participation in solving AI challenges.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
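To make the bit-width trade-off and the role of sparsity described in the abstract more concrete, the following is a minimal illustrative sketch in plain Python. It is not the project's method or library: it assumes a generic uniform symmetric quantizer and a simple (index, value) encoding of non-zero entries, and every function name (`quantize`, `dequantize`, `mean_abs_error`, `sparse_encode`, `sparse_decode`) is hypothetical.

```python
import random


def quantize(values, bits):
    """Uniform symmetric quantization of floats to signed `bits`-bit codes.

    Returns (codes, scale). An exact zero always maps to code 0, so any
    sparsity in the input survives quantization. (Illustrative assumption,
    not the scheme proposed in the award.)
    """
    qmax = 2 ** (bits - 1) - 1                      # e.g. 7 for 4 bits
    peak = max((abs(v) for v in values), default=0.0)
    scale = peak / qmax if peak > 0 else 1.0        # width of one quantization step
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def mean_abs_error(values, bits):
    """Average absolute quantization error at the given bit-width."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return sum(abs(a - b) for a, b in zip(values, restored)) / len(values)


def sparse_encode(codes):
    """Store only (index, code) pairs for the non-zero entries."""
    return [(i, c) for i, c in enumerate(codes) if c != 0]


def sparse_decode(pairs, length):
    """Rebuild the dense code list from (index, code) pairs."""
    dense = [0] * length
    for i, c in pairs:
        dense[i] = c
    return dense


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic "activations" in which roughly 80% of entries are exactly zero.
    acts = [rng.gauss(0, 1) if rng.random() < 0.2 else 0.0 for _ in range(1000)]
    for bits in (2, 4, 8):
        codes, _ = quantize(acts, bits)
        stored = len(sparse_encode(codes))
        print(f"{bits}-bit: error={mean_abs_error(acts, bits):.5f}, "
              f"non-zero codes stored={stored} of {len(codes)}")
```

Running the script prints, for each bit-width, the reconstruction error and the number of entries a sparse encoding would need to store. Lower bit-widths shrink each stored value but enlarge the error, while the sparse encoding only pays off when most codes are zero, which is the kind of tension an adaptable scheme would have to balance.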
