
Discovering Individual and Social Preferences through Inverse Reinforcement Learning


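Basic Information

Grant number:
ES/S00176X/1
Principal investigator:
Amir Jahangiri
Amount:
$371,000
Host institution:
University of Essex
Host institution country:
United Kingdom
Project category:
Fellowship
Fiscal year:
2018
Funding country:
United Kingdom
Start and end dates:
2018 to (no data)
Project status:
Completed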

Source:
https://gtr.ukri.org/projects?ref=ES%2FS00176X%2F1
Keywords:
Discovering Individual Social Preferences through

Project Abstract

Organisations that provide services and create products often base their decisions on questionnaires and/or other explicit forms of communication with their user base (e.g. patients, customers, citizens). The aim of this information exchange between providers and users is to uncover the users' "reward function", i.e. what users actually want from their interactions and what issues exist with the current product/service line-up. Explicit forms of information exchange can be cumbersome and expensive for organisations to design and are intrusive to the user. Furthermore, response bias is a well-known problem for survey-based methods, particularly around sensitive topics, where respondents may be unwilling to engage due to social or cultural concerns. Some practical solutions to response bias are provided by indirect questioning methods (item count and randomized response techniques). However, none of these solutions is practical for large-scale and real-time settings.

We postulate that ideally an organisation should try to elicit the reward function of its user base (i.e. what states are preferred by users) by using observational data generated from user activity. Inspired by recent literature in AI research, we propose a three-facet programme that aims to directly attack the problem of what users want by a) trying to infer the user reward function through the collection of behavioural data (e.g. website clicks, traffic behaviour, movie preferences); b) creating short, non-intrusive online questionnaires that will remove any uncertainties; and c) exploiting user preferences in order to improve service and product provision.

The proposed research aims to contribute to developing methods that can be embedded in artificial intelligence systems which must elicit and understand preferences by interacting with humans in order to adapt their behaviour and allow for a more natural experience and interaction.

Through this research we have four key objectives: (a) understand user preferences and develop methods to uncover and learn the reward function through data and behaviours; (b) develop interactive and conversational methods for eliciting responses and interactions from users that allow for a more natural user experience with automatic systems; (c) explore the social limitations of our approach (for instance, to what extent are personal rewards not dictated by individual preferences, but rather by social coercion?); and (d) investigate what steps can be taken to fully automate the procedure of provisioning new services and products through eliciting preferences via the methods developed under (a) and (b).

This Fellowship provides a unique opportunity to bring together artificial intelligence techniques and social science to tackle problems that are faced by a range of businesses and organisations in dealing with clients and customers and attempting to elicit preferences and needs through behaviours and interactions. We will be working closely with our industry partners in this project, British Telecom (BT) and the Essex County Council (ECC), to investigate the issues and challenges of eliciting and understanding preferences as faced in their own contexts, in order to inform and shape the programme of work.
