Effective Computational Optimization in Data Mining and Financial Applications

Basic Information

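Grant number:
RGPIN-2014-03978
Principal investigator:
Li, Yuying
Amount:
$18,900
Host institution:
University of Waterloo
Host institution country:
Canada
Program:
Discovery Grants Program - Individual
Fiscal year:
2014
Funding country:
Canada
Start and end dates:
2014-01-01 to 2015-12-31
Project status:
Completed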

Source:
https://www.nserc-crsng.gc.ca/ase-oro/Details-Detailles_eng.asp?id=566214
Keywords:
Effective Computational Optimization Data Mining

Project Abstract

Over the past five years, two phrases have dominated discussion in the global news: market instability and big data. The main focus of this proposal is to develop computationally efficient and effective methods that make optimal use of available data in order to improve health care, business, and finance, including the detection and prevention of systemic risk in financial markets. After the 2008 financial market collapse, many asked: why were there no warnings of trouble? If warning signs existed, why were they not detected? How can we better detect and prevent such a market collapse in the future? The Dodd-Frank Wall Street Reform and Consumer Protection Act has created a societal mandate to identify risks to financial stability arising from events at financial firms. Although there is no single clear definition of a systemic risk measure, Bisias et al. (2012) propose adopting a robust framework that incorporates a diverse collection of perspectives and processes, so that systemic risk measures can adapt dynamically to changes in financial market structures. Khandani et al. (2010) applied machine learning techniques to customers' bank transaction and credit-bureau data in order to predict consumer credit risk; in particular, they suggest that the proportion of predicted delinquencies can serve as a systemic risk indicator in consumer lending. Härdle et al. (2007) use scores from support vector machines to estimate default probabilities of financial firms. As data continue to accumulate, efficient and effective data mining methods stand to offer solutions to challenging problems in finance, business, and our lives in general. The 2011 McKinsey report on big data estimates that data mining could bring $300 billion in annual value to US health care, €250 billion in annual value to the European public administration sector, and $600 billion in potential consumer surplus.
While there have been major advances in information gathering, turning these estimates into reality requires a commensurate advance in data analytics. The urgency of solving challenging data analysis problems is illustrated by the recent global incentivized competition sponsored by the Heritage Provider Network (HPN). In an effort to identify at-risk individuals earlier and ensure that they receive prompt treatment, the competition asked participants to create algorithms that use patient data to predict hospitalizations. It ran for two years with a grand prize of $3 million and attracted nearly 2000 participants from various disciplines around the world. Together with my PhD student Aditya Tayal* and my colleague Thomas Coleman, I investigated and developed several computational optimization algorithms that ultimately secured us a fourth-place ranking in the competition. A data mining method has three components: minimizing training error, maximizing stability, and a mechanism to balance the trade-off between these two objectives. The remaining challenging optimization problems in data mining are typically nonconvex and large-scale. For example, in many real data analysis problems only very few labels are available. How do we learn a predictive model using partial label information? How do we optimally select, from a collection of available data, the features that are relevant for a particular prediction task? Many practical data mining problems have a rare class and a majority class. How do we develop computationally efficient nonlinear methods for these unbalanced problems? The main goal of the proposed research is to solve these challenging but relevant optimization problems in data mining and to apply the resulting methods to health care, finance, business, and other industries.
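
The three components named in the abstract (training error, stability, and a mechanism to balance the two) can be illustrated with a small sketch. The code below is not the method developed under this grant; it is a standard, convex stand-in chosen for illustration: class-weighted logistic regression with an L2 penalty, fitted by plain gradient descent. The weighted log-loss plays the role of training error, the penalty `lam * ||w||^2` plays the role of stability, and `lam` is the trade-off parameter. Inverse-frequency class weights are one common way to handle a rare class. All function names, parameter values, and the synthetic data are assumptions made for this example.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_weighted_logistic(X, y, lam=0.01, lr=0.1, epochs=500):
    """Minimize class-weighted log-loss + lam * ||w||^2 by gradient descent.

    lam is the trade-off parameter: larger values favor stability
    (small weights) over low training error. Hypothetical helper.
    """
    n, d = len(X), len(X[0])
    n_pos = sum(y)
    n_neg = n - n_pos
    # Inverse-frequency class weights so the rare class is not ignored.
    cw = {1: n / (2.0 * n_pos), 0: n / (2.0 * n_neg)}
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = cw[yi] * (p - yi)  # weighted gradient of the log-loss
            for j in range(d):
                gw[j] += err * xi[j]
            gb += err
        for j in range(d):
            # Data term plus gradient of the L2 penalty.
            w[j] -= lr * (gw[j] / n + 2.0 * lam * w[j])
        b -= lr * gb / n
    return w, b


def predict_proba(w, b, x):
    # Estimated probability that x belongs to the rare class (label 1).
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def make_toy_data(n_major=190, n_rare=10, seed=0):
    # Synthetic unbalanced data: majority class near (0, 0), rare class near (3, 3).
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_major):
        X.append([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])
        y.append(0)
    for _ in range(n_rare):
        X.append([rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)])
        y.append(1)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    for lam in (0.01, 1.0):
        w, b = fit_weighted_logistic(X, y, lam=lam)
        norm = math.sqrt(sum(v * v for v in w))
        print(f"lam={lam}: ||w||={norm:.3f}, "
              f"P(rare | x=(3,3))={predict_proba(w, b, [3.0, 3.0]):.3f}")
```

Raising `lam` shrinks the weight vector, which trades some fit on the training data for a more stable model. The nonconvex, large-scale problems the abstract goes on to describe (partial labels, feature selection, nonlinear models for unbalanced classes) are harder than this convex example and are not addressed by it.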

Project Outputs

Journal articles (0)

Monographs (0)

Research awards (0)

Conference papers (0)

Patents (0)

Other publications by Li, Yuying

Immunotherapy combined with chemotherapy improved clinical outcomes over bevacizumab combined with chemotherapy as first-line therapy in adenocarcinoma patients.

DOI:
10.1002/cam4.5356
Published:
2023-03
Journal:
CANCER MEDICINE
Impact factor:
4
Authors:
Wang, Min;Li, Ji;Xu, Shuhui;Li, Yuying;Li, Jiatong;Yu, Jinming;Tang, Xiaoyong;Zhu, Hui
Corresponding author:
Zhu, Hui

Influence of Atmospheric Phosphorus and Nitrogen Sedimentation on Water Quality in the Middle Route Project of the South-to-North Water Diversion in Henan Province.

DOI:
10.3390/ijerph192114346
Published:
2022-11-02
Journal:
INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH
Impact factor:
0
Authors:
Qiu, Yunlin;Zhang, Yun;Lan, Pengcheng;Liu, Han;Wang, Hongtian;Wang, Wanping;Zhao, Peng;Li, Yuying
Corresponding author:
Li, Yuying

Preparation and Biochemical Characteristics of a New IgG-Type Monoclonal Antibody against K Subgroup Avian Leukosis Virus.

DOI:
10.1021/acsomega.2c06375
Published:
2023-01-10
Journal:
ACS OMEGA
Impact factor:
4.1
Authors:
Zhang, Xiaochen;Li, Hongmei;Wang, Chengcheng;Du, Yixuan;Li, Yuying;Zhang, Liwei;Huang, Mengjie;Qiu, Jianhua;Guo, Huijun
Corresponding author:
Guo, Huijun