Effective Computational Optimization in Data Mining and Financial Applications
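Basic Information
- Grant number: RGPIN-2014-03978
- Principal investigator: Li, Yuying
- Amount: $18,900
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2014
- Funding country: Canada
- Duration: 2014-01-01 to 2015-12-31
- Project status: Completed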
Project Abstract
Over the past five years, two phrases have dominated discussion in the global news: market instability and big data. The main focus of this proposal is to develop computationally efficient and effective methods that make optimal use of available data in order to improve health care, business, and finance, including the detection and prevention of systemic risk in financial markets. After the 2008 financial market collapse, many asked: why were there no warnings of the signs of trouble? If such signs existed, why were they not detected? How can we better detect and prevent a similar market collapse in the future? The Dodd-Frank Wall Street Reform and Consumer Protection Act has created a societal mandate to identify risks to financial stability arising from the activities of financial firms. Although there is no agreed definition of a systemic risk measure, Bisias et al. (2012) recently proposed adopting a robust framework that incorporates a diverse collection of perspectives and processes, so that systemic risk measures adapt dynamically to changes in financial market structure. Khandani et al. (2010) applied machine learning techniques to customers' bank transactions and credit-bureau data to predict consumer credit risk; in particular, they suggest that the proportion of predicted delinquencies can serve as an indicator of systemic risk in consumer lending. Härdle et al. (2007) use scores from support vector machines to estimate the default probabilities of financial firms. As data continue to accumulate, efficient and effective data mining methods may offer solutions to challenging problems in finance, business, and everyday life. The 2011 McKinsey Global Institute report on big data estimates that data mining could bring $300 billion in annual value to US health care, $250 billion in annual value to the European public administration sector, and $600 billion in potential consumer surplus.
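The indicator attributed to Khandani et al. (2010) above, the proportion of predicted delinquencies, reduces to a simple aggregation once a credit-risk model has scored each account. A minimal sketch, assuming a model has already produced a delinquency probability per account per period; the function names, the 0.5 cutoff, and the toy probabilities are hypothetical illustrations, not the authors' code or data:

```python
from typing import Dict, Sequence


def predicted_delinquency_rate(probs: Sequence[float], threshold: float = 0.5) -> float:
    """Share of accounts whose predicted delinquency probability is at least `threshold`."""
    if len(probs) == 0:
        raise ValueError("need at least one account-level prediction")
    flagged = sum(1 for p in probs if p >= threshold)
    return flagged / len(probs)


def indicator_series(
    probs_by_period: Dict[str, Sequence[float]], threshold: float = 0.5
) -> Dict[str, float]:
    """One indicator value per period, in sorted period order."""
    return {
        period: predicted_delinquency_rate(probs, threshold)
        for period, probs in sorted(probs_by_period.items())
    }


# Toy, made-up probabilities for two periods (not real data).
series = indicator_series({
    "2007-Q1": [0.10, 0.20, 0.70, 0.05],
    "2007-Q2": [0.30, 0.60, 0.80, 0.55],
})
print(series)  # {'2007-Q1': 0.25, '2007-Q2': 0.75}
```

A rising series would be read as a warning sign; how the cutoff is chosen, and how a rise is judged significant, are modelling decisions this sketch leaves open.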
While there have been major advances in information gathering, turning these estimates into reality requires a commensurate advance in data analytics. The urgency of solving challenging data analysis problems is illustrated by a recent global incentive competition sponsored by the Heritage Provider Network (HPN). To identify at-risk individuals earlier and ensure they receive prompt treatment, the competition asked participants to create algorithms that use patient data to predict hospitalizations. The competition ran for two years with a grand prize of $3 million and attracted nearly 2,000 participants from various disciplines around the world. Together with my PhD student Aditya Tayal* and my colleague Thomas Coleman, we investigated and developed several computational optimization algorithms that ultimately earned us fourth place in the competition. A data mining method has three components: minimizing training error, maximizing stability, and a mechanism for balancing the trade-off between these two objectives. The optimization problems in data mining that remain challenging are typically nonconvex and large scale. For example, in many real data analysis problems, only very limited labels are available. How can we learn a predictive model using partial label information? How can we optimally select, from a collection of available data, the features that are relevant to a particular prediction task? Many practical data mining problems have a rare class and a majority class. How can we develop computationally efficient nonlinear methods for these unbalanced problems? The main goal of the proposed research is to solve these challenging yet relevant optimization problems in data mining and to apply the resulting methods to health care, finance, business, and other industries.
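The three components named above can be made concrete with a standard regularized objective: a loss term for training error, a penalty term for stability, and a parameter `lam` that trades one against the other. The sketch below is an illustration under stated assumptions, not the method developed in this project: a linear support vector machine trained by full-batch subgradient descent, with hypothetical class weights n / (2 n_k) so that a rare class is not swamped by the majority class. All function names and the toy data are hypothetical.

```python
from typing import List, Sequence, Tuple


def decision_score(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear score w.x + b; its sign is the predicted class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_weighted_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,
    epochs: int = 200,
    lr: float = 0.1,
) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on
        (lam / 2) * ||w||^2 + (1 / n) * sum_i c[y_i] * max(0, 1 - y_i * (w.x_i + b))
    with labels y_i in {-1, +1}. The hinge term measures training error, the
    squared norm rewards stability, and lam balances the two. Class weights
    c[k] = n / (2 * n_k) give the rare class the same total weight as the
    majority class (an assumed weighting rule, chosen for illustration).
    """
    n, d = len(X), len(X[0])
    counts = {k: sum(1 for t in y if t == k) for k in (1, -1)}
    if counts[1] == 0 or counts[-1] == 0:
        raise ValueError("both classes (+1 and -1) must be present")
    c = {k: n / (2.0 * counts[k]) for k in counts}
    w, b = [0.0] * d, 0.0
    for t in range(1, epochs + 1):
        step = lr / t ** 0.5  # diminishing step size
        gw = [lam * wj for wj in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            if yi * decision_score(w, b, xi) < 1.0:  # hinge is active for this point
                for j in range(d):
                    gw[j] -= c[yi] * yi * xi[j] / n
                gb -= c[yi] * yi / n
        w = [wj - step * gj for wj, gj in zip(w, gw)]
        b -= step * gb
    return w, b


# Toy data: six majority-class points (-1) and two rare-class points (+1).
X = [[-2.0], [-1.5], [-1.2], [-1.0], [-0.8], [-1.8], [2.0], [2.5]]
y = [-1, -1, -1, -1, -1, -1, 1, 1]
w, b = train_weighted_linear_svm(X, y, lam=0.01)
print(decision_score(w, b, [5.0]) > 0, decision_score(w, b, [-5.0]) < 0)  # True True
```

Raising `lam` favors a smaller, more stable weight vector at the cost of training error; lowering it does the reverse. The linear model keeps the sketch short, whereas the questions posed above concern nonlinear and often nonconvex formulations.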
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other Publications by Li, Yuying
Immunotherapy combined with chemotherapy improved clinical outcomes over bevacizumab combined with chemotherapy as first-line therapy in adenocarcinoma patients.
- DOI: 10.1002/cam4.5356
- Published: 2023-03
- Impact factor: 4
- Authors: Wang, Min; Li, Ji; Xu, Shuhui; Li, Yuying; Li, Jiatong; Yu, Jinming; Tang, Xiaoyong; Zhu, Hui
- Corresponding author: Zhu, Hui
Influence of Atmospheric Phosphorus and Nitrogen Sedimentation on Water Quality in the Middle Route Project of the South-to-North Water Diversion in Henan Province.
- DOI: 10.3390/ijerph192114346
- Published: 2022-11-02
- Impact factor: 0
- Authors: Qiu, Yunlin; Zhang, Yun; Lan, Pengcheng; Liu, Han; Wang, Hongtian; Wang, Wanping; Zhao, Peng; Li, Yuying
- Corresponding author: Li, Yuying
Preparation and Biochemical Characteristics of a New IgG-Type Monoclonal Antibody against K Subgroup Avian Leukosis Virus.
- DOI: 10.1021/acsomega.2c06375
- Published: 2023-01-10
- Impact factor: 4.1
- Authors: Zhang, Xiaochen; Li, Hongmei; Wang, Chengcheng; Du, Yixuan; Li, Yuying; Zhang, Liwei; Huang, Mengjie; Qiu, Jianhua; Guo, Huijun
- Corresponding author: Guo, Huijun
Phosphate-Functionalized Polyethylene with High Adsorption of Uranium(VI)
- DOI: 10.1021/acsomega.7b00375
- Published: 2017-07-01
- Impact factor: 4.1
- Authors: Shao, Dadong; Li, Yuying; Marwani, Hadi M.
- Corresponding author: Marwani, Hadi M.
Integrated metagenomics and molecular ecological network analysis of bacterial community composition during the phytoremediation of cadmium-contaminated soils by bioenergy crops
- DOI: 10.1016/j.ecoenv.2017.07.019
- Published: 2017-11-01
- Impact factor: 6.8
- Authors: Chen, Zhaojin; Zheng, Yuan; Li, Yuying
- Corresponding author: Li, Yuying
Other Grants Held by Li, Yuying
Methodology of Learning Optimal Decisions from Market Data in Financial Technology
- Grant number: RGPIN-2020-04331
- Fiscal year: 2022
- Amount: $18,900
- Project category: Discovery Grants Program - Individual
Methodology of Learning Optimal Decisions from Market Data in Financial Technology
- Grant number: RGPIN-2020-04331
- Fiscal year: 2021
- Amount: $18,900
- Project category: Discovery Grants Program - Individual
A data driven approach for optimal stochastic control in finance
- Grant number: 530985-2018
- Fiscal year: 2020
- Amount: $18,900
- Project category: Collaborative Research and Development Grants
Methodology of Learning Optimal Decisions from Market Data in Financial Technology
- Grant number: RGPIN-2020-04331
- Fiscal year: 2020
- Amount: $18,900
- Project category: Discovery Grants Program - Individual
Effective Computational Optimization in Data Mining and Financial Applications
- Grant number: RGPIN-2014-03978
- Fiscal year: 2019
- Amount: $18,900
- Project category: Discovery Grants Program - Individual
A data driven approach for optimal stochastic control in finance
- Grant number: 530985-2018
- Fiscal year: 2019
- Amount: $18,900
- Project category: Collaborative Research and Development Grants
A data driven approach for optimal stochastic control in finance
- Grant number: 530985-2018
- Fiscal year: 2018
- Amount: $18,900
- Project category: Collaborative Research and Development Grants
Effective Computational Optimization in Data Mining and Financial Applications
- Grant number: RGPIN-2014-03978
- Fiscal year: 2017
- Amount: $18,900
- Project category: Discovery Grants Program - Individual
Effective Computational Optimization in Data Mining and Financial Applications
- Grant number: RGPIN-2014-03978
- Fiscal year: 2016
- Amount: $18,900
- Project category: Discovery Grants Program - Individual
Effective Computational Optimization in Data Mining and Financial Applications
- Grant number: RGPIN-2014-03978
- Fiscal year: 2015
- Amount: $18,900
- Project category: Discovery Grants Program - Individual
Similar NSFC (National Natural Science Foundation of China) Grants
Computational Methods for Analyzing Toponome Data
- Grant number: 60601030
- Approval year: 2006
- Amount: CNY 170,000
- Project category: Young Scientists Fund
Similar International Grants
Computational Infrastructure for Automated Force Field Development and Optimization
- Grant number: 10699200
- Fiscal year: 2023
- Amount: $18,900
Coupling PDE-Based Computational Inversion and Learning Via Weighted Optimization
- Grant number: 2309802
- Fiscal year: 2023
- Amount: $18,900
- Project category: Standard Grant
Development of a Fast and Accurate Computational Method through Learning-based Iterative Alternating Optimization
- Grant number: 23K16953
- Fiscal year: 2023
- Amount: $18,900
- Project category: Grant-in-Aid for Early-Career Scientists
Development of prodrug-type anticancer drugs using a design of experiment system based on computational chemistry and Bayesian optimization
- Grant number: 23K19424
- Fiscal year: 2023
- Amount: $18,900
- Project category: Grant-in-Aid for Research Activity Start-up
New statistical and computational tools for optimization of planarian behavioral chemical screens
- Grant number: 10658688
- Fiscal year: 2023
- Amount: $18,900
Advanced Colonoscopy Training Developed Through Manikin Sensorization and Computational Optimization Modeling
- Grant number: 10719474
- Fiscal year: 2023
- Amount: $18,900
Computational and experimental assessment of pelvic stability and optimization of technology to guide reconstruction
- Grant number: RGPIN-2022-04993
- Fiscal year: 2022
- Amount: $18,900
- Project category: Discovery Grants Program - Individual
Computational Optimization and Intelligence for Decision Making
- Grant number: DDG-2019-05314
- Fiscal year: 2022
- Amount: $18,900
- Project category: Discovery Development Grant
CAREER: Automated Synthesis of Compound Machines Using Computational Design Optimization
- Grant number: 2311078
- Fiscal year: 2022
- Amount: $18,900
- Project category: Standard Grant
A Computational Framework for Design and Optimization of Dynamic Membrane Processes
- Grant number: 2140946
- Fiscal year: 2022
- Amount: $18,900
- Project category: Standard Grant