Combining Text Mining and Multivariate Time Series Modelling

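Basic information

Grant number:
426470111
Principal investigator:
Professor Dr. Peter Winker
Amount:
--
Host institution:
Lehrstuhl für Statistik und Ökonometrie
Host institution country:
Germany
Project type:
Research Grants
Fiscal year:
2019
Funding country:
Germany
Project period:
2018-12-31 to 2023-12-31
Project status:
Completed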

Source:
https://gepris.dfg.de/gepris/projekt/426470111?language=en
Keywords:
Combining Text Mining Multivariate Time

Project abstract

Collections of texts are considered a valuable source of information for applied economic analysis. Recent developments in access to large sets of documents (e.g., scientific abstracts, articles, news items, social media messages, or statements of different institutions) and in the methods for extracting information from texts have increased interest in this type of data. However, knowledge about the performance of these methods, in particular when combined with standard econometric methods, is still rather limited. Therefore, the objective of the TEXTMOD project is to contribute to the development of methods and to improve the understanding of how information obtained from text mining can be incorporated into econometric models. The focus is on multivariate time series models. The text-based indicators are constructed using models that try to identify relevant themes in large collections of documents without human intervention. One example of a text-based time series that can be of interest in economic research and can add information to classical real economic indicators is a topic trend, which describes how the importance of a given topic (e.g., one related to inflation) changes over time. While a substantial number of methods have been proposed over the last few years for identifying topics and their trends over time, there is little evidence on the statistical properties of these procedures, their relative performance, and their interaction with more traditional modelling approaches. Consequently, a central aim of the project is to investigate the sensitivity to parameter settings, the robustness to variations of the textual sample, and the uncertainty associated with these algorithms. The project will also propose additional methods for comparing topic modelling results across samples or across different methods.
In a further important step, different methods for deriving trends in topics will be considered and finally the consequences of including them in time series models, e.g., the widely used vector autoregressive model, will be studied. Special emphasis will be put on the appropriate interpretation of results, evaluation of additional insights from using text-based data and rigorous measurement of the estimation uncertainty which will be captured by means of joint confidence bands. The methods will be applied to study the relationships between real economic indicators and trends in topics found for scientific corpora in economics from Poland and Germany.
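
As a rough illustration of the pipeline described in the abstract, the sketch below aggregates document-level topic weights into a topic trend and then includes that trend alongside an economic indicator in a first-order vector autoregressive model estimated by ordinary least squares. It is not the project's method: the function names `topic_trend` and `fit_var1`, the averaging rule, the lag order, and all data are assumptions made for illustration, and the document-topic weights are simulated rather than produced by a topic model. Estimation uncertainty (for example, joint confidence bands) is not covered here.

```python
import numpy as np


def topic_trend(doc_topic, periods, n_periods):
    """Mean topic weight per period (rows: periods, columns: topics)."""
    trend = np.zeros((n_periods, doc_topic.shape[1]))
    for t in range(n_periods):
        mask = periods == t
        if mask.any():  # periods without documents keep a weight of zero
            trend[t] = doc_topic[mask].mean(axis=0)
    return trend


def fit_var1(y):
    """OLS fit of y_t = c + A @ y_{t-1} + e_t; returns (c, A)."""
    # Regressors: a constant plus the one-period lag of every series.
    lagged = np.hstack([np.ones((len(y) - 1, 1)), y[:-1]])
    coef, *_ = np.linalg.lstsq(lagged, y[1:], rcond=None)
    # coef[0] holds the intercepts; transpose so A[i, j] is the effect
    # of lagged series j on series i.
    return coef[0], coef[1:].T


# Synthetic stand-ins (assumptions, not project data).
rng = np.random.default_rng(0)
n_docs, n_topics, n_periods = 600, 3, 40
doc_topic = rng.dirichlet(np.ones(n_topics), size=n_docs)  # simulated topic weights
periods = rng.integers(0, n_periods, size=n_docs)          # period index per document
indicator = rng.normal(size=n_periods).cumsum()            # simulated economic indicator

trend = topic_trend(doc_topic, periods, n_periods)
y = np.column_stack([indicator, trend[:, 0]])  # indicator plus the trend of topic 0
c, A = fit_var1(y)
print("intercept:", c)
print("lag matrix:\n", A)
```

In practice the simulated weights would be replaced by the output of a fitted topic model, and the lag order would be chosen by an information criterion rather than fixed at one.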