Enriching, repairing and merging taxonomies by inducing qualitative spatial representations from the web
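Basic information
- Grant number: EP/K021788/1
- Principal investigator:
- Amount: $126,100
- Host institution:
- Host institution country: United Kingdom
- Project type: Research Grant
- Fiscal year: 2013
- Funding country: United Kingdom
- Duration: 2013 to (no data)
- Project status: Completed
- Source:
- Keywords:
Project abstract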
Taxonomies encode how different terms or concepts from a given domain are related to each other. They are used to standardise vocabularies (e.g. biologists use taxonomies to organise species into broader categories such as family and order), and to categorise content so that it can be searched more easily (e.g. librarians assigning categories from a taxonomy to books). While taxonomies are traditionally the result of a careful and time-consuming manual process, recent developments on the World Wide Web have led to a proliferation of taxonomies of a more informal nature. Online retailers such as Amazon, for instance, organise their products using an ad hoc taxonomy, which reflects how customers use their website rather than any commitment to the semantics of the underlying product categories. Similarly, applications such as Foursquare allow users to contribute to a taxonomy of place types.

While these informal taxonomies are useful for organising online content (e.g. products on Amazon, or venues on Foursquare), they are often of poor quality and difficult to reuse across different applications. Moreover, like traditional taxonomies, they focus on a very limited set of semantic relations; usually only the relation "is a sub-category of" is considered. In practice, however, the semantic relationship between two categories may not be so clear-cut, among other reasons because of the existence of borderline cases (e.g. should a pub which serves food be categorised as a restaurant?). Nonetheless, the widespread availability of taxonomies is of potentially great interest, provided that they can be improved using automated methods. The goal of this project is to study how such an improvement can be realised by statistically analysing meta-data that is available on the web, in particular from so-called Web 2.0 websites such as Flickr, where users describe photos using short textual annotations called tags.

The proposed approach is built on the idea of discovering semantic relationships between categories by statistically analysing such meta-data. On the one hand, these relations will encode information about typicality and similarity. To see why such relations are useful, consider an application which allows a user to search for restaurants in Cardiff. The search engine may rank venues of type "restaurant" by taking into account features such as distance to the city centre and average ratings (if available). However, as another criterion, one would also want to see "normal" restaurants before venues such as breakfast places, coffee houses, or pubs, which may be considered restaurants, broadly speaking, but are not what users would typically be interested in when querying about restaurants. Similarly, when the user's query asks about "Sichuan restaurants in Cardiff", and no such restaurants are known, instances of the most similar categories may be shown instead (e.g. Cantonese restaurants). On the other hand, the relations that are discovered will also encode information that can help us to pinpoint likely errors in existing taxonomies and to merge different taxonomies into a single coherent view of a given domain. In particular, these relations will allow us to detect irregularities in existing taxonomies. For example, given the assumption that similar categories usually have similar properties, and the knowledge that Cantonese and Sichuan restaurants are very similar, a taxonomy in which Cantonese and Sichuan restaurants are both sub-categories of Chinese restaurants will be considered more regular than a taxonomy in which they have different super-categories.

Our approach is unique in the data-driven way it enriches taxonomies with semantic relations for common-sense reasoning, as well as in the proposed methods for repairing and merging existing taxonomies. Regarding applications, the results of this project will form a crucial stepping-stone towards more intelligent search engines.
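
The two ideas in the abstract (falling back to the most similar category, and flagging a taxonomy as irregular when very similar categories have different super-categories) can be illustrated with a minimal sketch. It is not the project's actual method: the tag counts, the cosine similarity measure, the 0.5 threshold, and the names `TAGS`, `most_similar` and `irregular_pairs` are all assumptions made up for illustration.

```python
from collections import Counter
from math import sqrt

# Hypothetical tag counts per category (invented for illustration, not project data).
TAGS = {
    "sichuan restaurant": Counter({"chinese": 8, "spicy": 9, "noodles": 4, "dinner": 5}),
    "cantonese restaurant": Counter({"chinese": 9, "dimsum": 7, "noodles": 5, "dinner": 6}),
    "pub": Counter({"beer": 10, "football": 4, "dinner": 2}),
    "coffee house": Counter({"coffee": 10, "cake": 5, "breakfast": 3}),
}


def cosine(a, b):
    """Cosine similarity between two tag-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def most_similar(category, candidates):
    """Return the candidate category whose tag profile is closest to `category`."""
    return max(candidates, key=lambda c: cosine(TAGS[category], TAGS[c]))


def irregular_pairs(taxonomy, threshold=0.5):
    """List pairs of similar categories that have different super-categories.

    `taxonomy` maps each category to its super-category; the threshold is an
    arbitrary assumption for this sketch.
    """
    cats = sorted(c for c in taxonomy if c in TAGS)
    return [
        (a, b)
        for i, a in enumerate(cats)
        for b in cats[i + 1:]
        if cosine(TAGS[a], TAGS[b]) >= threshold and taxonomy[a] != taxonomy[b]
    ]


regular = {
    "sichuan restaurant": "chinese restaurant",
    "cantonese restaurant": "chinese restaurant",
    "pub": "bar",
    "coffee house": "cafe",
}
# Same taxonomy, but Cantonese restaurants are filed under a different parent.
irregular = dict(regular, **{"cantonese restaurant": "asian restaurant"})

fallback = most_similar("sichuan restaurant", ["cantonese restaurant", "pub", "coffee house"])
print(fallback)
print(irregular_pairs(regular))
print(irregular_pairs(irregular))
```

With these invented counts, a query for Sichuan restaurants falls back to Cantonese restaurants, and only the second taxonomy is flagged, because it places the two similar categories under different super-categories.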
Project outputs
Journal articles (6)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Inducing semantic relations from conceptual spaces: A data-driven approach to plausible reasoning
- DOI: 10.1016/j.artint.2015.07.002
- Published: 2015-11
- Journal:
- Impact factor: 0
- Authors: J. Derrac; Steven Schockaert
- Corresponding authors: J. Derrac; Steven Schockaert
Realizing RCC8 networks using convex regions
- DOI: 10.48550/arxiv.1410.2442
- Published: 2014
- Journal:
- Impact factor: 0
- Authors: Schockaert S
- Corresponding author: Schockaert S
Commonsense reasoning based on betweenness and direction in distributional models
- DOI:
- Published: 2015
- Journal:
- Impact factor: 0
- Authors: Steven Schockaert
- Corresponding author: Steven Schockaert
Characterising Semantic Relatedness using Interpretable Directions in Conceptual Spaces
- DOI: 10.3233/978-1-61499-419-0-243
- Published: 2014-08
- Journal:
- Impact factor: 0
- Authors: J. Derrac; Steven Schockaert
- Corresponding authors: J. Derrac; Steven Schockaert
Other publications by Steven Schockaert
Using social media to find places of interest: a case study
- DOI: 10.1145/2442952.2442954
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: Steven Van Canneyt; O. Laere; Steven Schockaert; B. Dhoedt
- Corresponding author: B. Dhoedt
Cardiff University at SemEval-2020 Task 6: Fine-tuning BERT for Domain-Specific Definition Classification
- DOI:
- Published: 2020
- Journal:
- Impact factor: 0
- Authors: Shelan S. Jeawak; Luis Espinosa Anke; Steven Schockaert
- Corresponding author: Steven Schockaert
Possible and Necessary Answer Sets of Possibilistic Answer Set Programs
- DOI: 10.1109/ictai.2012.117
- Published: 2012
- Journal:
- Impact factor: 0
- Authors: Kim Bauters; Steven Schockaert; M. D. Cock; D. Vermeir
- Corresponding author: D. Vermeir
Modelling Monotonic and Non-Monotonic Attribute Dependencies with Embeddings: A Theoretical Analysis
- DOI: 10.24432/c5gw2z
- Published: 2021-06
- Journal:
- Impact factor: 0
- Authors: Steven Schockaert
- Corresponding author: Steven Schockaert
Sentence Selection Strategies for Distilling Word Embeddings from BERT
- DOI:
- Published: 2022
- Journal:
- Impact factor: 0
- Authors: Yixiao Wang; Zied Bouraoui; Luis Espinosa; Steven Schockaert
- Corresponding author: Steven Schockaert
Other grants held by Steven Schockaert
Reasoning about Structured Story Representations
- Grant number: EP/W003309/1
- Fiscal year: 2022
- Funding amount: $126,100
- Project type: Fellowship
Encyclopedic Lexical Representations for Natural Language Processing
- Grant number: EP/V025961/1
- Fiscal year: 2021
- Funding amount: $126,100
- Project type: Research Grant
Similar overseas grants
Small animal model for evaluating the impacts of cleft lip repairing scar on craniofacial growth and development
- Grant number: 10642519
- Fiscal year: 2023
- Funding amount: $126,100
- Project type:
Storying and repairing water places in Wiradjuri Country
- Grant number: LP220100360
- Fiscal year: 2023
- Funding amount: $126,100
- Project type: Linkage Projects
Harnessing the potential of CNS progenitor cells for repairing progressive multiple sclerosis.
- Grant number: 488468
- Fiscal year: 2023
- Funding amount: $126,100
- Project type: Operating Grants
FMRG Eco: Manufacturing, repairing, and re-using biomineralized infrastructure materials through low-energy biological processes
- Grant number: 2328351
- Fiscal year: 2023
- Funding amount: $126,100
- Project type: Standard Grant
Repairing memory & place: An Indigenous-led approach to urban water design
- Grant number: LP200301587
- Fiscal year: 2023
- Funding amount: $126,100
- Project type: Linkage Projects
Hierarchically-Structured Conduits with Programmed Release of Neurotrophic Factors for Repairing Large Defects in Thick Nerves
- Grant number: 10579569
- Fiscal year: 2023
- Funding amount: $126,100
- Project type:
The role of the inflammatory microenvironment in Acellular Nerve Allografts (ANAs) repairing nerve gaps
- Grant number: 10678384
- Fiscal year: 2023
- Funding amount: $126,100
- Project type:
SaTC: CORE: Small: Hardware-assisted Self-repairing in Decentralized Cloud Storage against Malicious Attacks
- Grant number: 2225424
- Fiscal year: 2022
- Funding amount: $126,100
- Project type: Standard Grant
Insights into the function of senescent osteoblast in fracture repairing.
- Grant number: 22K20960
- Fiscal year: 2022
- Funding amount: $126,100
- Project type: Grant-in-Aid for Research Activity Start-up
Smart Self-Repairing System for Concrete Beams
- Grant number: RGPIN-2019-04285
- Fiscal year: 2022
- Funding amount: $126,100
- Project type: Discovery Grants Program - Individual