
Identifying & Classifying Bias in Cultural Heritage Catalogues: Applying Natural Language Processing to University of Edinburgh Archival Descriptions

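Basic information

Grant number:
2356289
Principal investigator:
Amount:
--
Host institution:
University of Edinburgh
Host institution country:
United Kingdom
Project type:
Studentship
Fiscal year:
2020
Funding country:
United Kingdom
Period:
2020 to (no data)
Project status:
Completed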

Source:
https://gtr.ukri.org/projects?ref=studentship-2356289
Keywords:
Identifying Classifying Bias Cultural Heritage

Project abstract

The objective of this project is to develop a context-informed approach to bias detection, executed as a series of case studies beginning with the University of Edinburgh's Archive. Motivated by separate yet related strands of research in the fields of Natural Language Processing (NLP) and Cultural Heritage, the project identifies an opportunity to improve large-scale, automated bias detection. Taking a cross-disciplinary approach, the project applies NLP and data visualisation to archival descriptions. NLP approaches such as topic modelling and sentiment analysis will be used to analyse and classify the language of the Archive's descriptions. Because bias is context-dependent, data visualisation provides a suitable approach to presenting the results of the NLP analysis. Interactive data visualisations will present the results in their associated geographic areas and time periods, enabling people to see the associations that Archive items have with different types of bias. The project will propose a visualisation framework for presenting bias in human language content, which, to the author's knowledge, has yet to be proposed. Rather than eliminate bias, the project seeks to identify and classify it, arguing that bias deserves a place in cultural heritage institutions.
Bias, though problematic when one-sided, is informative when presented transparently. Bias communicates the perspective of specific groups of people during specific time periods in history; recording historical biases informs understandings of societal evolution and the various perspectives that have existed on a topic [1]. Identifying different types of bias helps researchers understand how representative their dataset is: the presence of more types of bias suggests a more representative dataset.
This project seeks to develop techniques for identifying and classifying bias that will bring value to cultural heritage institutions and the public they serve, making bias transparent in human language content anywhere from an archival description to a social media post.
The project seeks to develop bias-detecting technology, beginning with a case study of free-text, human-written archival descriptions. Cataloguers first wrote archival descriptions on paper in the 1930s and then in databases beginning in the 1970s. Explicitly, the language of archival descriptions reflects their historical contexts, using terms considered racist, sexist or otherwise inappropriately biased today. Implicitly, missing information in archival descriptions regarding certain groups of people reflects historical biases. These types of explicit and implicit bias can be found in textual data beyond cultural heritage catalogues, such as in newspapers and social media posts. As a result, while improving the transparency of the Archive's descriptions, the outcomes of this project could also inform research on returning representative search results [5], implementing fair algorithms [2], and identifying bias in social media [3, 4].

References
1. Holterhoff, K. (2017) "From Disclaimer to Critique: Race and the Digital Image Archivist." In: Digital Humanities Quarterly 11.3. URL: http://digitalhumanities.org:8081/dhq/vol/11/3/000324/000324.html
2. IEEE. (2016) Ethically Aligned Design: A Vision for Prioritizing Human Wellbeing with Artificial Intelligence and Autonomous Systems. Version 1. URL: http://standards.ieee.org/develop/indconn/ec/autonomous%20systems.html 12.05.2018
3. Recasens, M., Danescu-Niculescu-Mizil, C., Jurafsky, D. (2013) "Linguistic Models for Analyzing and Detecting Biased Language." In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, 1650-1659.
