
Mathematical methods and algorithms for learning effective embeddings of semi-structured information for anomaly detection problems

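Basic Information

Grant number: 448795504
Principal investigator: Professor Dr. Martin Spindler
Amount: --
Host institution: Russian Foundation for Basic Research
Host institution country: Germany
Project category: Research Grants
Fiscal year:
Funding country: Germany
Duration:
Project status: Ongoing (not yet concluded)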

Source: https://gepris.dfg.de/gepris/projekt/448795504?language=en
Keywords: Mathematical methods algorithms learning effective embeddings

Project Abstract

The rise of digitization has made huge and novel data sets available, and these are often semi-structured. Although analyzing such data sets is challenging, it offers great opportunities for researchers. The goal of the project is to develop models for better anomaly detection based on such semi-structured data. The health care industry poses challenges related to important applications such as fraud detection, recommendation systems and decision support systems, and these challenges can be addressed by learning from collected data. The economic and financial sectors also require outlier and novelty detection as an important first step in processing time series data. In these domains it is vital to detect anomalies and outliers because they are highly relevant. For example, fraudulent claims, which usually differ considerably from regular claims, should be detected. In clinical and medical decision support systems, unusual cases that need special treatment should be filtered out. For economic and financial data, it is very important to perform outlier and change detection automatically.

The goal of this project is to develop Deep Learning and Machine Learning methods for anomaly and outlier detection and to apply them to the tasks mentioned above, namely fraud detection in insurance and outlier detection in financial time series. This is possible because all of these tasks, which relate to important problems in healthcare, economics and finance, share the same type of input data: sequences of variable length, which makes them semi-structured data sets.

The project consists of three parts. First, we will develop efficient deep representations and embeddings of semi-structured information such as graphs and sequences. On this basis, we will construct efficient semantic-level similarity measures that establish what is normal, so that deviations from the norm can be detected as anomalies. Second, we will develop effective end-to-end learnable approaches to anomaly detection and imbalanced classification for semi-structured information. Third, we will develop problem-oriented data mining approaches for fraud detection, outlier detection in (financial) time series, recommendation systems and decision support systems, with applications in health care, insurance, finance and economics.

To sum up, the final goal of this proposal is to enable effective representations of semi-structured information and to develop end-to-end approaches for anomaly detection that are ready to be used to solve real-world applied problems.
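The abstract's central idea is to embed variable-length sequences and then use similarity to known data to establish what is normal. The following is a minimal sketch of that pipeline under stated assumptions, not the project's method: each sequence of hypothetical billing codes is embedded as a normalized bag-of-codes vector with one extra slot for codes never seen before, and a query's anomaly score is its mean cosine distance to its k nearest "normal" reference sequences. The names `embed` and `knn_anomaly_score`, the example claims and `k=3` are all hypothetical; in the setting the project targets, a learned deep embedding would take the place of `embed`.

```python
import math
from collections import Counter

_UNKNOWN = object()  # bucket for codes never seen in the reference data


def embed(sequence, vocabulary):
    """Map a variable-length sequence of codes to a fixed-length unit vector.

    Normalized bag-of-codes counts plus one slot for unseen codes: a simple,
    hypothetical stand-in for a learned embedding (it ignores code order).
    """
    known = set(vocabulary)
    counts = Counter(code if code in known else _UNKNOWN for code in sequence)
    vec = [float(counts[code]) for code in vocabulary] + [float(counts[_UNKNOWN])]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def cosine_distance(u, v):
    # Inputs are unit vectors (or all zeros), so the dot product is the cosine similarity.
    return 1.0 - sum(a * b for a, b in zip(u, v))


def knn_anomaly_score(query, reference, k=3):
    """Mean cosine distance from `query` to its k nearest reference embeddings."""
    nearest = sorted(cosine_distance(query, r) for r in reference)[:k]
    return sum(nearest) / len(nearest)


# Hypothetical "normal" claims, each a variable-length sequence of billing codes.
normal_claims = [
    ["consult", "xray", "consult"],
    ["consult", "lab", "consult", "lab"],
    ["consult", "xray"],
    ["consult", "lab"],
    ["consult", "consult", "xray"],
]
vocab = sorted({code for claim in normal_claims for code in claim})
reference = [embed(claim, vocab) for claim in normal_claims]

typical_score = knn_anomaly_score(embed(["consult", "xray", "xray", "consult"], vocab), reference)
unusual_score = knn_anomaly_score(embed(["surgery", "surgery", "surgery", "lab"], vocab), reference)
print(f"typical: {typical_score:.3f}, unusual: {unusual_score:.3f}")
# → typical: 0.034, unusual: 0.851
```

The scoring step does not depend on how the embeddings are produced, so replacing the order-blind bag-of-codes vector with a learned sequence or graph embedding leaves `knn_anomaly_score` unchanged.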

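For automatic outlier detection in financial time series, a common classical baseline (not the deep learning methods this project proposes) compares each new value with a robust estimate of the recent norm. The sketch below uses the median and median absolute deviation (MAD) of a trailing window. The function name `rolling_robust_outliers`, the window length, the threshold of 3.5 and the return series are all hypothetical choices for illustration.

```python
from statistics import median


def rolling_robust_outliers(series, window=5, threshold=3.5):
    """Return indices whose value deviates strongly from the preceding window.

    Each point is compared with the median of the previous `window` points,
    scaled by the median absolute deviation (MAD). The factor 1.4826 makes the
    MAD comparable to a standard deviation under Gaussian noise.
    """
    outliers = []
    for t in range(window, len(series)):
        history = series[t - window:t]
        center = median(history)
        mad = median([abs(x - center) for x in history])
        deviation = abs(series[t] - center)
        if mad == 0:
            # Flat history: any deviation at all is treated as an outlier.
            if deviation > 0:
                outliers.append(t)
            continue
        if deviation / (1.4826 * mad) > threshold:
            outliers.append(t)
    return outliers


# Hypothetical daily returns with one sharp drop at index 6.
returns = [0.010, -0.020, 0.015, -0.010, 0.005, 0.012, -0.250, 0.008, -0.004, 0.011]
print(rolling_robust_outliers(returns))  # → [6]
```

Once the spike enters the trailing window it barely moves the median and MAD, whereas it would inflate a rolling standard deviation and could mask later outliers.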