
Security and compilers for machine learning

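Basic information

Grant number: 2906291
Principal investigator:
Amount: --
Host institution: Imperial College London
Host institution country: United Kingdom
Project category: Studentship
Fiscal year: 2024
Funding country: United Kingdom
Duration: 2024 to (no data)
Project status: Ongoing

Source: https://gtr.ukri.org/projects?ref=studentship-2906291
Keywords: Security compilers machine learning

Project abstract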

Machine learning (ML) is rapidly gaining traction across various industries, promising transformative benefits in diverse fields. However, the increasing reliance on ML systems has brought to light the crucial need for robust security and safety measures. This is due to the inherent vulnerabilities associated with ML models and the potential consequences of their misuse.

One of the primary concerns is the susceptibility of ML models to adversarial attacks, where malicious actors manipulate data, model parameters, or model architecture to exploit the system. These attacks can result in biased, inaccurate, and even dangerous decision-making. Additionally, the complexity of ML models makes it challenging to identify and mitigate vulnerabilities, making them difficult to defend against.

Another significant issue is AI alignment. Alignment refers to the process of ensuring that artificial intelligence (AI) systems behave in ways that align with human values and objectives. It involves developing techniques to guide AI models towards making decisions and taking actions that are beneficial to humanity, while minimizing potential harms. AI alignment is crucial for the responsible development and deployment of AI systems, as it helps ensure that AI technologies align with human interests and are used ethically and beneficially.

This PhD explores various avenues in improving ML security and safety. The initial projects are as follows.

One project is to improve the quality of human preference data used to fine-tune ML models. Alignment relies heavily on the quality of human preferences, but much of the existing data is generated by overworked and underpaid workers with no real incentive to provide good data. This project would experimentally research the effect of paying workers bonuses based on whether their preferences successfully improve the performance of the model on existing benchmarks. The research will also determine whether increasing human motivation in this way increases the performance of the model even in metrics which are not rewarded. If so, this could lead to better performance in metrics for which there are no good benchmarks, such as political bias.

Another project to improve the security of models against adversarial attack is to investigate whether the new push towards self-rewarding language models creates an opportunity for backdoors to be amplified through the inherent feedback loop in these self-rewarding models. This follows from ideas such as Model Collapse, in which training on LLM-generated data can lead to total performance failure, and the existing body of work on data poisoning to insert backdoors in LLMs.

A third project is to investigate various methods for locking machine learning models to specific hardware, such as by using a difficult-to-forge hardware fingerprint (e.g. based on the number of clock cycles required to complete an operation) as an encryption key for the weights of the model, or by optimising models for particular quantisation schemes that only exist on some hardware.

This project aligns with the EPSRC research area "Artificial intelligence technologies".
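
As a rough illustration of the hardware-locking idea in the third project, the sketch below derives a key from a fingerprint and uses it to encrypt a list of weights. It is not the project's method: the function names (`derive_key`, `lock_weights`, `unlock_weights`), the placeholder fingerprint bytes, and the SHA-256 counter-mode keystream are all assumptions made for illustration. A real system would measure the fingerprint on the device and use a vetted authenticated cipher rather than this toy construction.

```python
import hashlib
import struct


def derive_key(fingerprint: bytes) -> bytes:
    """Hash a (hypothetical) hardware fingerprint into a 32-byte key."""
    return hashlib.sha256(fingerprint).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key plus a counter. Illustrative only."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_with_keystream(data: bytes, key: bytes) -> bytes:
    """XOR the data with the keystream; applying it twice restores the data."""
    stream = keystream(key, len(data))
    return bytes(a ^ b for a, b in zip(data, stream))


def lock_weights(weights: list[float], fingerprint: bytes) -> bytes:
    """Serialise weights as little-endian float32 and encrypt them."""
    raw = struct.pack(f"<{len(weights)}f", *weights)
    return xor_with_keystream(raw, derive_key(fingerprint))


def unlock_weights(blob: bytes, fingerprint: bytes) -> list[float]:
    """Decrypt with the key derived from the given fingerprint."""
    raw = xor_with_keystream(blob, derive_key(fingerprint))
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))


# Placeholder fingerprints; in practice these would come from measurements
# such as clock-cycle counts, which this sketch does not attempt.
device_a = b"cycles:1042,1043,1041"
device_b = b"cycles:0998,1001,0999"

weights = [0.5, -1.25, 2.0]
blob = lock_weights(weights, device_a)
restored = unlock_weights(blob, device_a)
garbled = unlock_weights(blob, device_b)
```

With the matching fingerprint the weights are recovered exactly; with a different fingerprint the decryption still succeeds mechanically but yields unrelated values, which is the property a hardware lock of this kind would rely on.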
