Effective Computational Optimization in Data Mining and Financial Applications
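Basic information
- Grant number: RGPIN-2014-03978
- Principal investigator:
- Amount: $18,900
- Host institution:
- Host institution country: Canada
- Project category: Discovery Grants Program - Individual
- Fiscal year: 2019
- Funding country: Canada
- Duration: 2019-01-01 to 2020-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract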
Over the past five years, two phrases have dominated discussion in the global news: market instability and big data. The main focus of this proposal is to develop computationally efficient and effective methods that make optimal use of available data in order to improve health care, business, and finance, including the detection and prevention of systemic risk in financial markets.
After the 2008 financial market collapse, many have asked: why were there no warnings of trouble? If warning signs existed, why were they not detected? How can we better detect and prevent such a market collapse in the future?
The Dodd-Frank Wall Street Reform and Consumer Protection Act has created a societal mandate to identify risks to financial stability arising from events at financial firms. Although there is no clear definition of a systemic risk measure, Bisias et al. (2012) proposed that a robust framework incorporating a diverse collection of perspectives and processes be adopted to dynamically adapt systemic risk measures to changes in financial market structures. Khandani et al. (2010) applied machine learning techniques to bank transaction and credit-bureau data of customers in order to predict consumer credit risk. In particular, they suggest that the proportion of predicted delinquencies can serve as a systemic risk indicator in consumer lending. Härdle et al. (2007) use scores from support vector machines to estimate default probabilities of financial firms.
With the increasing accumulation of data, efficient and effective data mining methods stand to offer solutions to challenging problems in finance, business, and our lives in general. The 2011 McKinsey report on big data estimates that data mining could bring $300 billion in annual value to US health care, €250 billion in annual value to the European public administration sector, and a $600 billion potential consumer surplus. While there have been major advances in information gathering, turning these estimates into reality requires a commensurate advance in data analytics.
The urgency of solving challenging data analysis problems is illustrated by the recent global incentivized competition sponsored by the Heritage Provider Network (HPN). In an effort to identify at-risk individuals earlier and ensure they receive prompt treatment, the competition's objective was to create algorithms that use patient data to predict hospitalizations. The competition ran for two years with a grand prize of $3 million, attracting nearly 2000 participants from various disciplines around the world. Together with my PhD student Aditya Tayal and colleague Thomas Coleman, I investigated and developed several computational optimization algorithms that ultimately secured us a fourth-place ranking in the competition.
A data mining method has three components: minimizing training error, maximizing stability, and a mechanism to balance the trade-off between these two objectives. The remaining challenging optimization problems in data mining are typically nonconvex and large scale. For example, in many real data analysis problems, only very limited labels are available. How do we learn a predictive model using partial label information? How do we optimally select, from a collection of available data, the features that are relevant for a particular prediction task? Many practical data mining problems have a rare class and a majority class. How do we develop computationally efficient nonlinear methods for these unbalanced problems? The main goal of the proposed research is to solve these challenging but relevant optimization problems in data mining and apply the resulting methods to health care, finance, business, and other industries.
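The three components named in the abstract (training error, stability, and a balance between them) can be illustrated with a regularized objective. The following is only a minimal sketch under stated assumptions: logistic loss stands in for training error, the squared L2 norm of the weights stands in for the stability term, and a scalar `lam` sets the balance. These choices, and the names `objective`, `fit`, and `lam`, are illustrative and do not come from the proposal.

```python
import math

def objective(w, b, X, y, lam):
    """Mean logistic loss (training error) plus lam * ||w||^2 (stability proxy)."""
    loss = 0.0
    for xi, yi in zip(X, y):  # labels yi are +1 or -1
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        loss += math.log1p(math.exp(-margin))
    return loss / len(X) + lam * sum(wj * wj for wj in w)

def fit(X, y, lam, lr=0.1, steps=2000):
    """Plain gradient descent on the regularized objective above."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        gw, gb = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            coef = -yi / (1.0 + math.exp(margin))  # derivative of the logistic loss
            for j in range(d):
                gw[j] += coef * xi[j] / n
            gb += coef / n
        # The penalty contributes 2 * lam * w to the gradient; the bias is not penalized.
        w = [wj - lr * (gwj + 2.0 * lam * wj) for wj, gwj in zip(w, gw)]
        b -= lr * gb
    return w, b

# Hypothetical toy data: two points per class, linearly separable.
X = [[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -0.5]]
y = [1, 1, -1, -1]
```

Calling `fit(X, y, lam=0.01)` and `fit(X, y, lam=1.0)` on the toy data shows the trade-off: the larger `lam` yields weights with a smaller norm, at the cost of a higher training loss. The problems the abstract targets (partial labels, feature selection, unbalanced classes) are described as nonconvex and large scale, so this convex toy case only fixes the vocabulary.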
Project outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Li, Yuying
Immunotherapy combined with chemotherapy improved clinical outcomes over bevacizumab combined with chemotherapy as first-line therapy in adenocarcinoma patients.
- DOI: 10.1002/cam4.5356
- Published: 2023-03
- Journal:
- Impact factor: 4
- Authors: Wang, Min; Li, Ji; Xu, Shuhui; Li, Yuying; Li, Jiatong; Yu, Jinming; Tang, Xiaoyong; Zhu, Hui
- Corresponding author: Zhu, Hui
Influence of Atmospheric Phosphorus and Nitrogen Sedimentation on Water Quality in the Middle Route Project of the South-to-North Water Diversion in Henan Province.
- DOI: 10.3390/ijerph192114346
- Published: 2022-11-02
- Journal:
- Impact factor: 0
- Authors: Qiu, Yunlin; Zhang, Yun; Lan, Pengcheng; Liu, Han; Wang, Hongtian; Wang, Wanping; Zhao, Peng; Li, Yuying
- Corresponding author: Li, Yuying
Preparation and Biochemical Characteristics of a New IgG-Type Monoclonal Antibody against K Subgroup Avian Leukosis Virus.
- DOI: 10.1021/acsomega.2c06375
- Published: 2023-01-10
- Journal:
- Impact factor: 4.1
- Authors: Zhang, Xiaochen; Li, Hongmei; Wang, Chengcheng; Du, Yixuan; Li, Yuying; Zhang, Liwei; Huang, Mengjie; Qiu, Jianhua; Guo, Huijun
- Corresponding author: Guo, Huijun
Phosphate-Functionalized Polyethylene with High Adsorption of Uranium(VI)
- DOI: 10.1021/acsomega.7b00375
- Published: 2017-07-01
- Journal:
- Impact factor: 4.1
- Authors: Shao, Dadong; Li, Yuying; Marwani, Hadi M.
- Corresponding author: Marwani, Hadi M.
Integrated metagenomics and molecular ecological network analysis of bacterial community composition during the phytoremediation of cadmium-contaminated soils by bioenergy crops
- DOI: 10.1016/j.ecoenv.2017.07.019
- Published: 2017-11-01
- Journal:
- Impact factor: 6.8
- Authors: Chen, Zhaojin; Zheng, Yuan; Li, Yuying
- Corresponding author: Li, Yuying
Other grants held by Li, Yuying
Methodology of Learning Optimal Decisions from Market Data in Financial Technology
- Grant number: RGPIN-2020-04331
- Fiscal year: 2022
- Funding amount: $18,900
- Project category: Discovery Grants Program - Individual
Methodology of Learning Optimal Decisions from Market Data in Financial Technology
- Grant number: RGPIN-2020-04331
- Fiscal year: 2021
- Funding amount: $18,900
- Project category: Discovery Grants Program - Individual
A data driven approach for optimal stochastic control in finance
- Grant number: 530985-2018
- Fiscal year: 2020
- Funding amount: $18,900
- Project category: Collaborative Research and Development Grants
Methodology of Learning Optimal Decisions from Market Data in Financial Technology
- Grant number: RGPIN-2020-04331
- Fiscal year: 2020
- Funding amount: $18,900
- Project category: Discovery Grants Program - Individual
A data driven approach for optimal stochastic control in finance
- Grant number: 530985-2018
- Fiscal year: 2019
- Funding amount: $18,900
- Project category: Collaborative Research and Development Grants
A data driven approach for optimal stochastic control in finance
- Grant number: 530985-2018
- Fiscal year: 2018
- Funding amount: $18,900
- Project category: Collaborative Research and Development Grants
Effective Computational Optimization in Data Mining and Financial Applications
- Grant number: RGPIN-2014-03978
- Fiscal year: 2017
- Funding amount: $18,900
- Project category: Discovery Grants Program - Individual
Effective Computational Optimization in Data Mining and Financial Applications
- Grant number: RGPIN-2014-03978
- Fiscal year: 2016
- Funding amount: $18,900
- Project category: Discovery Grants Program - Individual
Effective Computational Optimization in Data Mining and Financial Applications
- Grant number: RGPIN-2014-03978
- Fiscal year: 2015
- Funding amount: $18,900
- Project category: Discovery Grants Program - Individual
Effective Computational Optimization in Data Mining and Financial Applications
- Grant number: RGPIN-2014-03978
- Fiscal year: 2014
- Funding amount: $18,900
- Project category: Discovery Grants Program - Individual
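Similar grants from the National Natural Science Foundation of China
Computational Methods for Analyzing Toponome Data
- Grant number: 60601030
- Year approved: 2006
- Funding amount: 170,000 CNY
- Project category: Young Scientists Fund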
Similar overseas grants
Computational Infrastructure for Automated Force Field Development and Optimization
- Grant number: 10699200
- Fiscal year: 2023
- Funding amount: $18,900
- Project category:
Coupling PDE-Based Computational Inversion and Learning Via Weighted Optimization
- Grant number: 2309802
- Fiscal year: 2023
- Funding amount: $18,900
- Project category: Standard Grant
Development of a Fast and Accurate Computational Method through Learning-based Iterative Alternating Optimization
- Grant number: 23K16953
- Fiscal year: 2023
- Funding amount: $18,900
- Project category: Grant-in-Aid for Early-Career Scientists
Development of prodrug-type anticancer drugs using a design of experiment system based on computational chemistry and Bayesian optimization
- Grant number: 23K19424
- Fiscal year: 2023
- Funding amount: $18,900
- Project category: Grant-in-Aid for Research Activity Start-up
New statistical and computational tools for optimization of planarian behavioral chemical screens
- Grant number: 10658688
- Fiscal year: 2023
- Funding amount: $18,900
- Project category:
Advanced Colonoscopy Training Developed Through Manikin Sensorization and Computational Optimization Modeling
- Grant number: 10719474
- Fiscal year: 2023
- Funding amount: $18,900
- Project category:
Computational Optimization and Intelligence for Decision Making
- Grant number: DDG-2019-05314
- Fiscal year: 2022
- Funding amount: $18,900
- Project category: Discovery Development Grant
Computational and experimental assessment of pelvic stability and optimization of technology to guide reconstruction
- Grant number: RGPIN-2022-04993
- Fiscal year: 2022
- Funding amount: $18,900
- Project category: Discovery Grants Program - Individual
A Computational Framework for Design and Optimization of Dynamic Membrane Processes
- Grant number: 2140946
- Fiscal year: 2022
- Funding amount: $18,900
- Project category: Standard Grant
CDS&E: Enabling Quantum Technology Design Optimization Using Large-Scale Quantum Information Preserving Computational Electromagnetics Methods
- Grant number: 2202389
- Fiscal year: 2022
- Funding amount: $18,900
- Project category: Standard Grant













