Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
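Basic information
- Grant number: RGPIN-2017-05042
- Principal investigator:
- Amount: $16,800
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2018
- Funding country: Canada
- Period: 2018-01-01 to 2019-12-31
- Project status: Completed
- Source:
- Keywords:
Project abstract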
As information technology advances, a tremendous amount of data is generated in every industry. There are growing attempts to leverage this data, with the help of high-performance computing, to develop intelligent systems for various applications. Despite their success, the models that turn data into these applications are often black boxes. This has two drawbacks. First, users cannot trust the results because the “how” is hidden. Second, it is difficult for humans to interpret the “why”.
Here we introduce a new paradigm: From Pattern to Knowledge (P2K). It first autonomously discovers strong statistical associations/relations in data. It represents them as patterns, pattern clusters and their associations/co-occurrences, reflecting the “what” and “where” of critical information without explicit reliance on prior knowledge, which is usually unavailable or difficult to obtain. It then supplies the “how”, robust algorithms that conduct analysis and direct further search, to disclose the “why” of the underlying mechanisms in an interpretable and verifiable way. P2K will make existing machine intelligence approaches more robust and reliable while revealing useful and actionable knowledge.
Hence, the objective of this proposal is to develop P2K, targeting three types of data in its initial phase: relational, biosequence and multiple temporal sequence data. We choose biosequences from bioinformatics as a platform to validate the scientific value and effectiveness of P2K. In the last five years we have developed algorithms that a) discover, prune, locate and analyze statistically significant patterns, pattern clusters and their associations/co-occurrences in biosequences, so as to reveal local and distant functional domains and relationships without relying explicitly on prior knowledge or clues; and b) use the discovered patterns as features for predictive analysis. The effectiveness of P2K is backed by strong publications.
In the next five years, for biosequence data, we will develop algorithms to predict binding sites/partners between proteins, between proteins and DNA/RNA, and between proteins and aptamers. This will reduce users' reliance on structures, save them time and budget, and help drug discovery and disease treatment identify small molecules that can inhibit binding. For relational data, we will complete a scalable system to discover and analyze patterns in mixed-mode data, including patterns extracted from business/finance reports by our text mining module Text-P2K in a semi-supervised fashion to assist decision making. For multiple time-series data, we will leverage the discovered temporal associations of pattern clusters to capture a wide range of local relations along and across individual series and use them as features/patterns for interpretation and forecasting. This can help finance firms identify rare movements to control risk, and help factories identify machinery faults in advance.
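As a rough illustration of what discovering "strong statistical associations" in relational data can mean, the Python sketch below flags pairs of attribute values that co-occur more often than independence would predict, using a Pearson (standardized) residual. This is not the P2K algorithm, which the abstract does not specify; the function name `significant_pairs`, the toy records and the 1.96 threshold are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def significant_pairs(records, threshold=1.96):
    """Return (item_a, item_b, residual) for attribute-value pairs that
    co-occur more often than expected under independence.

    Hypothetical helper, not part of P2K. Each record is a dict mapping
    an attribute name to a categorical value.
    """
    n = len(records)
    single = Counter()  # occurrences of each (attribute, value)
    joint = Counter()   # co-occurrences of two (attribute, value) items
    for rec in records:
        items = sorted(rec.items())  # fixed order so each pair has one key
        single.update(items)
        joint.update(combinations(items, 2))

    results = []
    for (a, b), observed in joint.items():
        expected = single[a] * single[b] / n
        # Pearson residual: (observed - expected) / sqrt(expected)
        residual = (observed - expected) / sqrt(expected)
        if residual > threshold:
            results.append((a, b, residual))
    return sorted(results, key=lambda r: -r[2])


# Toy data (assumed): 10 red/round records and 30 blue/square records.
records = (
    [{"color": "red", "shape": "round"}] * 10
    + [{"color": "blue", "shape": "square"}] * 30
)
pairs = significant_pairs(records)
for a, b, residual in pairs:
    print(a, b, round(residual, 2))
```

With a threshold this simple, only the rarer red/round pairing stands out in the toy data; a real system would also need to correct for multiple comparisons and handle higher-order patterns, which this sketch does not attempt.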
Project outcomes
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Wong, Andrew
Text as data: Using text-based features for proteins representation and for computational prediction of their characteristics
- DOI: 10.1016/j.ymeth.2014.10.027
- Published: 2015-03-01
- Journal:
- Impact factor: 4.8
- Authors: Shatkay, Hagit; Brady, Scott; Wong, Andrew
- Corresponding author: Wong, Andrew
The effect of mid-life insulin resistance and type 2 diabetes on older-age cognitive state: the explanatory role of early-life advantage
- DOI: 10.1007/s00125-019-4949-3
- Published: 2019-10-01
- Journal:
- Impact factor: 8.2
- Authors: James, Sarah-Naomi; Wong, Andrew; Richards, Marcus
- Corresponding author: Richards, Marcus
Immunochromatographic diagnostic test analysis using Google Glass.
- DOI: 10.1021/nn500614k
- Published: 2014-03-25
- Journal:
- Impact factor: 17.1
- Authors: Feng, Steve; Caire, Romain; Cortazar, Bingen; Turan, Mehmet; Wong, Andrew; Ozcan, Aydogan
- Corresponding author: Ozcan, Aydogan
Does Bitcoin behave as a currency?: A standard monetary model approach
- DOI: 10.1016/j.irfa.2020.101518
- Published: 2020-07-01
- Journal:
- Impact factor: 8.2
- Authors: Hui, Cho-Hoi; Lo, Chi-Fai; Wong, Andrew
- Corresponding author: Wong, Andrew
Crude oil price dynamics with crash risk under fundamental shocks
- DOI: 10.1016/j.najef.2020.101238
- Published: 2020-11-01
- Journal:
- Impact factor: 3.6
- Authors: Hui, Cho-Hoi; Lo, Chi-Fai; Wong, Andrew
- Corresponding author: Wong, Andrew
Other grants by Wong, Andrew
Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
- Grant number: RGPIN-2017-05042
- Fiscal year: 2021
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
- Grant number: RGPIN-2017-05042
- Fiscal year: 2020
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
- Grant number: RGPIN-2017-05042
- Fiscal year: 2019
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Biomechanical Investigation of Subsynovial Connective Tissue Motion Relating to Carpal Tunnel Syndrome
- Grant number: 543466-2019
- Fiscal year: 2019
- Funding amount: $16,800
- Program: Alexander Graham Bell Canada Graduate Scholarships - Master's
Market Assessment proposal for P2K (pattern to knowledge) deep knowledge discovery artificial intelligence (A.I.) software system
- Grant number: 523278-2018
- Fiscal year: 2018
- Funding amount: $16,800
- Program: Idea to Innovation
Decreasing the moisture content variation of flakes in the drying process of Oriented Strand Board (OSB) manufacturing using multivariate time-series pattern discovery
- Grant number: 529533-2018
- Fiscal year: 2018
- Funding amount: $16,800
- Program: Engage Grants Program
Piloting study of a novel process for recycling spent lithium-ion batteries
- Grant number: 517611-2017
- Fiscal year: 2018
- Funding amount: $16,800
- Program: Experience Awards (previously Industrial Undergraduate Student Research Awards)
The effects of muscle fatigue on shoulder function
- Grant number: 509513-2017
- Fiscal year: 2017
- Funding amount: $16,800
- Program: University Undergraduate Student Research Awards
Bench scale study of a novel process for recycling spent lithium-ion batteries
- Grant number: 508253-2017
- Fiscal year: 2017
- Funding amount: $16,800
- Program: Experience Awards (previously Industrial Undergraduate Student Research Awards)
Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
- Grant number: RGPIN-2017-05042
- Fiscal year: 2017
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Similar overseas grants
Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
- Grant number: RGPIN-2017-05042
- Fiscal year: 2021
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
- Grant number: RGPIN-2017-05042
- Fiscal year: 2020
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
- Grant number: RGPIN-2017-05042
- Fiscal year: 2019
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Market Assessment proposal for P2K (pattern to knowledge) deep knowledge discovery artificial intelligence (A.I.) software system
- Grant number: 523278-2018
- Fiscal year: 2018
- Funding amount: $16,800
- Program: Idea to Innovation
Pattern and Knowledge Discovery on Relational, Biosequence and Multiple Temporal Sequence Data
- Grant number: RGPIN-2017-05042
- Fiscal year: 2017
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Knowledge Discovery from Huge Data Ensemble by an Integration of Automatic Data Selection and Pattern Extraction
- Grant number: 25280085
- Fiscal year: 2013
- Funding amount: $16,800
- Program: Grant-in-Aid for Scientific Research (B)
Study on Pattern Inference based on Positive Examples and its Application to Knowledge Discovery
- Grant number: 19500125
- Fiscal year: 2007
- Funding amount: $16,800
- Program: Grant-in-Aid for Scientific Research (C)
Pattern analysis and knowledge discovery: theory and applications
- Grant number: 8042-1992
- Fiscal year: 1995
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Pattern analysis and knowledge discovery: theory and applications
- Grant number: 8042-1992
- Fiscal year: 1994
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual
Pattern analysis and knowledge discovery: theory and applications
- Grant number: 8042-1992
- Fiscal year: 1993
- Funding amount: $16,800
- Program: Discovery Grants Program - Individual