Record Linkage Across Heterogeneous Data Sources
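Basic Information
- Grant number: RGPIN-2014-05304
- Principal investigator:
- Amount: $16,800
- Host institution:
- Host institution country: Canada
- Program: Discovery Grants Program - Individual
- Fiscal year: 2015
- Funding country: Canada
- Period: 2015-01-01 to 2016-12-31
- Status: Completed
- Source:
- Keywords:
Project Abstract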
In the context of creating longitudinal data from census data, record linkage refers to finding the same person across several censuses. The recent emergence of 100 percent national census collections enables the systematic identification and linking of the same individuals across censuses in order to create a new database of individual life-course information. The main challenge is that unique identifiers do not exist for this historical data, so one must use attributes common to all of the databases and compare their values to determine whether two records refer to the same entity. Other challenges are presented by different database formats, typographical errors, missing data and misreported data (both intentional and inadvertent). Furthermore, not everyone in a census is present in the next one: death and emigration remove people from the population, while births and immigration add new people who were not present in the previous census but who may have characteristics similar to those who were. Finally, processing the cross product of millions of records when linking two census collections presents significant computational challenges.
The overall objective of my research is to create and extract knowledge from historical longitudinal data. The motivation driving my research is to significantly advance the understanding of Canadian society through computational science. As part of this proposal, I will focus on two goals: the short-term goal is the creation of large-scale longitudinal data from historical censuses, a complex yet critical step towards my objective; the long-term goal is the automatic extraction of useful knowledge from the longitudinal data, which would enrich our understanding of the history and economics of Canadian society.
The recent emergence of 100 percent digitized Canadian census collections enables, for the first time, a large-scale, data-driven understanding of key societal changes such as migration, social mobility, labour market adjustments and intergenerational inequality. From a social science perspective, the main challenge is generating individual life-course information at large scale: the linking techniques employed so far are mostly manual, which strongly limits the data available for such studies. A first key impact of my research will be the automatic large-scale generation of longitudinal data from historical censuses. To achieve this, I will significantly advance the state of the art in automated record linkage through five key results that will act as milestones towards my short-term goal: better feature construction; tighter bounds for candidate selection; more accurate classification models; increased linking coverage through the use of family information; and a standardized benchmark for evaluating and validating historical record linkage. I will share the resulting longitudinal data with researchers in history and the social sciences; they have been waiting for longitudinal data of this nature and scale in order to resolve pressing research questions about society, history and economy. A second key impact will be the application of knowledge extraction techniques to the generated longitudinal data to identify interesting patterns in Canadian society. I expect that this result will also require devising new knowledge extraction techniques that are better at identifying patterns in large-scale longitudinal data, which should also make them suitable for related research areas such as social networks. Moreover, by sharing the patterns with social researchers, we will both validate and further advance the understanding of the socio-economic changes in Canadian society.
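The abstract's core ideas (comparing attributes shared by two sources in the absence of unique identifiers, and avoiding the full cross product by restricting candidate pairs) can be illustrated with a minimal Python sketch. This is an illustration only, not the project's method: the records, the field names (`first`, `last`, `birth_year`), the blocking key, the similarity measure and the threshold are all assumptions chosen for the example.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Hypothetical toy records; field names are assumptions, not the project's schema.
CENSUS_A = [
    {"id": "a1", "first": "John", "last": "McDonald", "birth_year": 1850},
    {"id": "a2", "first": "Mary", "last": "Smith", "birth_year": 1862},
]
CENSUS_B = [
    {"id": "b1", "first": "Jon", "last": "MacDonald", "birth_year": 1851},
    {"id": "b2", "first": "Marie", "last": "Smyth", "birth_year": 1862},
    {"id": "b3", "first": "Peter", "last": "Martin", "birth_year": 1850},
]


def block_key(rec):
    """Blocking key (an assumed, deliberately crude choice): surname initial."""
    return rec["last"][0].upper()


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1] from difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def score(ra, rb, max_year_gap=2):
    """Mean name similarity, or 0.0 when the birth years disagree too much."""
    if abs(ra["birth_year"] - rb["birth_year"]) > max_year_gap:
        return 0.0
    first = similarity(ra["first"], rb["first"])
    last = similarity(ra["last"], rb["last"])
    return (first + last) / 2


def link(records_a, records_b, threshold=0.7):
    """Return (id_a, id_b, score) for the best candidate within each block."""
    # Group the second source by blocking key so that only records sharing
    # a key are compared, instead of the full cross product.
    blocks = defaultdict(list)
    for rb in records_b:
        blocks[block_key(rb)].append(rb)

    links = []
    for ra in records_a:
        candidates = blocks.get(block_key(ra), [])
        if not candidates:
            continue
        best = max(candidates, key=lambda rb: score(ra, rb))
        best_score = score(ra, best)
        if best_score >= threshold:
            links.append((ra["id"], best["id"], round(best_score, 3)))
    return links


if __name__ == "__main__":
    for id_a, id_b, s in link(CENSUS_A, CENSUS_B):
        print(id_a, "->", id_b, s)
```

A fixed threshold on averaged string similarity stands in here for the classification models the abstract mentions, and a real system would also need to handle ties, one-to-one constraints and people who are missing from one of the censuses.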
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Antonie, Luiza
Selection Bias Encountered in the Systematic Linking of Historical Census Records
- DOI: 10.1017/ssh.2020.15
- Published: 2020-01-01
- Journal:
- Impact factor: 0.8
- Authors: Antonie, Luiza; Inwood, Kris; Summerfield, Fraser
- Corresponding author: Summerfield, Fraser
Tracking people over time in 19th century Canada for longitudinal analysis
- DOI: 10.1007/s10994-013-5421-0
- Published: 2014-04-01
- Journal:
- Impact factor: 7.5
- Authors: Antonie, Luiza; Inwood, Kris; Ross, J. Andrew
- Corresponding author: Ross, J. Andrew
Full-Time and Part-Time Work and the Gender Wage Gap
- DOI: 10.1007/s11293-020-09677-z
- Published: 2020-08-13
- Journal:
- Impact factor: 0.6
- Authors: Antonie, Luiza; Gatto, Laura; Plesca, Miana
- Corresponding author: Plesca, Miana
Other grants held by Antonie, Luiza
Bias and Representativeness in Linked Data
- Grant number: RGPIN-2020-05948
- Fiscal year: 2022
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Bias and Representativeness in Linked Data
- Grant number: RGPIN-2020-05948
- Fiscal year: 2021
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Bias and Representativeness in Linked Data
- Grant number: RGPIN-2020-05948
- Fiscal year: 2020
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Data unification for customer profile generation
- Grant number: 543346-2019
- Fiscal year: 2019
- Amount: $16,800
- Program: Engage Grants Program
Record Linkage Across Heterogeneous Data Sources
- Grant number: RGPIN-2014-05304
- Fiscal year: 2019
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Record Linkage Across Heterogeneous Data Sources
- Grant number: RGPIN-2014-05304
- Fiscal year: 2018
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Record Linkage Across Heterogeneous Data Sources
- Grant number: RGPIN-2014-05304
- Fiscal year: 2017
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Record Linkage Across Heterogeneous Data Sources
- Grant number: RGPIN-2014-05304
- Fiscal year: 2016
- Amount: $16,800
- Program: Discovery Grants Program - Individual
Record Linkage Across Heterogeneous Data Sources
- Grant number: RGPIN-2014-05304
- Fiscal year: 2014
- Amount: $16,800
- Program: Discovery Grants Program - Individual
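Similar grants from the National Natural Science Foundation of China
Synthesis of novel polymer conjugates and brush copolymers using linkage chemistry
- Grant number: 20974058
- Approval year: 2009
- Amount: 120,000 CNY
- Program: General Program
Application of Linkage Group Selection to the study of phenotype-related genes in Eimeria tenella
- Grant number: 30700601
- Approval year: 2007
- Amount: 170,000 CNY
- Program: Young Scientists Fund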
Similar overseas grants
Informing intervention responses to violent offenders through data linkage
- Grant number: DP240101812
- Fiscal year: 2024
- Amount: $16,800
- Program: Discovery Projects
Healthy Jozi: A Staged Approach to Better Workplace Food Choices and Chronic Disease Screening and Linkage to Care
- Grant number: MR/Z000467/1
- Fiscal year: 2024
- Amount: $16,800
- Program: Research Grant
Linkage of HIV amino acid variants to protective host alleles at CHD1L and HLA class I loci in an African population
- Grant number: 502556
- Fiscal year: 2024
- Amount: $16,800
- Program:
Occupational exposure to ionizing radiation and the impacts on cancer incidence, and mortality: a record linkage cohort study of nearly one million workers in the Canadian National Dose Registry
- Grant number: 480070
- Fiscal year: 2023
- Amount: $16,800
- Program: Operating Grants
REACHing underserved and undiagnosed populations living with STBBIs in Alberta, Saskatchewan and Manitoba: "Test, treat and linkage to culturally appropriate care"
- Grant number: 487552
- Fiscal year: 2023
- Amount: $16,800
- Program: Operating Grants
Linkage Projects - Grant ID: LP200301389
- Grant number: ARC : LP200301389
- Fiscal year: 2023
- Amount: $16,800
- Program: Linkage Projects
Linkage Projects - Grant ID: LP200301540
- Grant number: ARC : LP200301540
- Fiscal year: 2023
- Amount: $16,800
- Program: Linkage Projects
Development and application of high-throughput glycoproteomics using sialic acid linkage specific derivatization
- Grant number: 23K06078
- Fiscal year: 2023
- Amount: $16,800
- Program: Grant-in-Aid for Scientific Research (C)
Elucidation of the pathological control mechanisms of heart failure via a nutritional approach in the Gut-Heart linkage.
- Grant number: 23K16816
- Fiscal year: 2023
- Amount: $16,800
- Program: Grant-in-Aid for Early-Career Scientists
Construction of an evaluation system for feeding and swallowing disorders and sarcopenia after stroke and search for key molecules in the brain-gut-muscle linkage
- Grant number: 23H03263
- Fiscal year: 2023
- Amount: $16,800
- Program: Grant-in-Aid for Scientific Research (B)