Creating a Data Quality Control Framework for Producing New Personnel-Based S&E Indicators
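Basic Information
- Award number: 1917663
- Principal investigator:
- Amount: $462,000
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2019
- Funding country: United States
- Period: 2019-09-01 to 2022-08-31
- Project status: Completed
- Source:
- Keywords: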
Project Abstract
Science and Engineering (S&E) research generates substantial returns in terms of human knowledge, social and economic benefits. Nations around the globe compete for scientific and technological leadership through substantial research funding and focused efforts to develop highly trained workforces. To date, efforts to measure and understand national and international trends in S&E and to assess global strengths and weaknesses, have largely relied on the analysis of documents such as patents and publications using big, growing datasets. But this approach too often misses or mistakenly identifies the people and teams who do productive science and engineering work. Robust indicators of the size, composition, collaboration, and mobility of the S&E workforce within and across nations are largely missing from analysis and reporting. These key aspects of the national and international scientific enterprise are poorly captured by data analysis focused on documents and citations. To address this problem, this project develops person level workforce and collaboration measures that could add granularity to comparisons of international S&E competitiveness and lead to new policy insights for S&E workforce training, hiring, and retention for a nation's future. The prerequisite of such person level indicators is that individual researchers who appear in multiple bibliographic datasets are correctly identified and linked. Effective identification and linkage of authors based on their names is daunting because names are often ambiguous. This is particularly the case for Asian names, which poses a significant problem as Asian researchers play an increasingly important role in many fields of research. This project addresses the challenge of systematically and routinely disambiguating names in big bibliographic datasets using a new Automated and Stratified Entity Disambiguation framework. 
Core datasets for this effort are derived using a new method that relies on multiple data fields and an iterative process to automatically create disambiguated datasets that can be used to train artificial intelligence tools to conduct robust person level analysis. To improve disambiguation accuracy, name instances are stratified into two groups according to name-ethnicity and disambiguated separately to produce optimal models learned on the automatically generated truth data. Based on the disambiguated data, this project develops new person-level S&E indicators that characterize the landscape and trends of the international S&E research workforce across all science and engineering fields. The new big data tools for automatic disambiguation at scale will be documented and released publicly to enable expansion, validation, and reuse by the science community as well as science of science policy researchers. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
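The stratified approach described in the abstract can be illustrated with a minimal Python sketch. This is not the project's framework: the abstract describes models learned on automatically generated truth data, whereas this sketch swaps in a fixed coauthor-overlap (Jaccard) threshold per stratum purely for illustration. The record fields, the stratum labels `group_a` and `group_b`, the sample names, and the threshold values are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def disambiguate(records, thresholds):
    """Cluster name instances separately within each stratum.

    Records are compared only with records that share the same stratum
    and the same name string; a pair is merged when its coauthor overlap
    reaches that stratum's threshold (a stand-in for a learned model).
    """
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Block by (stratum, name) so each stratum is handled independently.
    blocks = defaultdict(list)
    for r in records:
        blocks[(r["stratum"], r["name"])].append(r)

    for (stratum, _name), group in blocks.items():
        threshold = thresholds[stratum]
        for a, b in combinations(group, 2):
            if jaccard(a["coauthors"], b["coauthors"]) >= threshold:
                parent[find(a["id"])] = find(b["id"])

    clusters = defaultdict(set)
    for r in records:
        clusters[find(r["id"])].add(r["id"])
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical name instances; none of these come from the project data.
records = [
    {"id": 1, "name": "kim, j", "stratum": "group_a", "coauthors": {"lee, s"}},
    {"id": 2, "name": "kim, j", "stratum": "group_a", "coauthors": {"lee, s", "park, h"}},
    {"id": 3, "name": "kim, j", "stratum": "group_a", "coauthors": {"chen, y"}},
    {"id": 4, "name": "smith, j", "stratum": "group_b", "coauthors": {"jones, a"}},
    {"id": 5, "name": "smith, j", "stratum": "group_b",
     "coauthors": {"jones, a", "brown, t", "davis, m"}},
]
# Hypothetical per-stratum thresholds; a stricter one for the more ambiguous stratum.
thresholds = {"group_a": 0.5, "group_b": 0.3}

print(disambiguate(records, thresholds))  # [[1, 2], [3], [4, 5]]
```

Keeping a separate threshold (or, in a fuller version, a separate trained model) per stratum lets the more ambiguous name group be matched more conservatively without penalizing the other group.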
Project Outcomes
Journal articles (4)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Model Reuse in Machine Learning for Author Name Disambiguation: An Exploration of Transfer Learning
- DOI: 10.1109/access.2020.3031112
- Published: 2020-10
- Journal:
- Impact factor: 3.9
- Authors: Jinseok Kim; Jason Owen-Smith
- Corresponding authors: Jinseok Kim; Jason Owen-Smith
Effect of Chinese characters on machine learning for Chinese author name disambiguation: A counterfactual evaluation
- DOI: 10.1177/01655515211018171
- Published: 2021-05
- Journal:
- Impact factor: 2.4
- Authors: Jinseok Kim; Jenna Kim; Jinmo Kim
- Corresponding authors: Jinseok Kim; Jenna Kim; Jinmo Kim
ORCID-linked labeled data for evaluating author name disambiguation at scale
- DOI: 10.1007/s11192-020-03826-6
- Published: 2021-02-11
- Journal:
- Impact factor: 3.9
- Authors: Kim, Jinseok; Owen-Smith, Jason
- Corresponding author: Owen-Smith, Jason
Other publications by Jason Owen-Smith
To Patent or Not: Faculty Decisions and Institutional Success at Technology Transfer
- DOI: 10.1023/a:1007892413701
- Published: 2001-01-01
- Journal:
- Impact factor: 4.300
- Authors: Jason Owen-Smith; Walter W. Powell
- Corresponding author: Walter W. Powell
MP5-19 THE IMPACT OF CARE COORDINATION ON RADICAL PROSTATECTOMY OUTCOMES
- DOI: 10.1016/j.juro.2015.02.246
- Published: 2015-04-01
- Journal:
- Impact factor:
- Authors: John M. Hollingsworth; Russell J. Funk; Spencer A. Garrison; Jason Owen-Smith; Samuel R. Kaufman; Bruce E. Landon; James E. Montie; Brahmajee K. Nallamothu
- Corresponding author: Brahmajee K. Nallamothu
PD25-09 CLINICAL INTEGRATION IS ASSOCIATED WITH LOWER COSTS OF CARE AMONG PATIENTS UNDERGOING PROSTATECTOMY
- DOI: 10.1016/j.juro.2016.02.239
- Published: 2016-04-01
- Journal:
- Impact factor:
- Authors: John M. Hollingsworth; Russell Funk; Amy Luckenbaugh; Jason Owen-Smith; Samuel Kaufman; Brahmajee Nallamothu
- Corresponding author: Brahmajee Nallamothu
Other grants by Jason Owen-Smith
Collaborative Research: RUI: HNDS-R: Stepping out of flatland: Complex networks, topological data analysis, and the progress of science
- Award number: 2318170
- Fiscal year: 2023
- Funding amount: $462,000
- Project type: Standard Grant
Collaborative Research: Industries of Ideas: A prototype system for measuring the effects of research investments on regional firms and jobs
- Award number: 2332571
- Fiscal year: 2023
- Funding amount: $462,000
- Project type: Cooperative Agreement
ECR: BCSER: IRM: Building Big Data Capacity for Education and Social Science Research Communities Using Restricted Administrative Data
- Award number: 1937251
- Fiscal year: 2020
- Funding amount: $462,000
- Project type: Standard Grant
Collaborative Research: Impacts of Hard/Soft Skills on STEM Workforce Trajectories
- Award number: 1954981
- Fiscal year: 2020
- Funding amount: $462,000
- Project type: Continuing Grant
Collaborative Research: New Insights into STEM Pathways: The Role of Peers, Networks, and Demand.
- Award number: 1760609
- Fiscal year: 2018
- Funding amount: $462,000
- Project type: Standard Grant
Medical Decision-Making and Network Assembly Mechanisms in Inpatient Surgical Care
- Award number: 1560987
- Fiscal year: 2016
- Funding amount: $462,000
- Project type: Continuing Grant
Collaborative Research: STEM Training, Employment in Industry, and Entrepreneurship
- Award number: 1535370
- Fiscal year: 2015
- Funding amount: $462,000
- Project type: Standard Grant
Building Community and a New Data Infrastructure for Science Policy
- Award number: 1262447
- Fiscal year: 2013
- Funding amount: $462,000
- Project type: Standard Grant
Estimating the Economic and Scientific Impact of Federal R&D Spending by Universities
- Award number: 1158711
- Fiscal year: 2012
- Funding amount: $462,000
- Project type: Standard Grant
From Bank to Bench to Breakthrough: Selection, Access, and Use of Human Stem Cell Research Methods
- Award number: 0949708
- Fiscal year: 2009
- Funding amount: $462,000
- Project type: Standard Grant
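Similar grants from the National Natural Science Foundation of China (NSFC)
Scalable Learning and Optimization: High-dimensional Models and Online Decision-Making Strategies for Big Data Analysis
- Award number:
- Approval year: 2024
- Funding amount:
- Project type: Collaborative Innovation Research Team
Data-driven Recommendation System Construction of an Online Medical Platform Based on the Fusion of Information
- Award number:
- Approval year: 2024
- Funding amount:
- Project type: Research Fund for International Young Scientists
Development of a Linear Stochastic Model for Wind Field Reconstruction from Limited Measurement Data
- Award number:
- Approval year: 2020
- Funding amount: CNY 400,000
- Project type:
Key Technologies for Semantic Interoperability of Web Services Based on Linked Open Data
- Award number: 61373035
- Approval year: 2013
- Funding amount: CNY 770,000
- Project type: General Program
Molecular Interaction Reconstruction of Rheumatoid Arthritis Therapies Using Clinical Data
- Award number: 31070748
- Approval year: 2010
- Funding amount: CNY 340,000
- Project type: General Program
Functional Data Analysis Methods for High-Dimensional Data
- Award number: 11001084
- Approval year: 2010
- Funding amount: CNY 160,000
- Project type: Young Scientists Fund
Role of datA, a Negative Regulator of Chromosome Replication, in the Cell Cycle
- Award number: 31060015
- Approval year: 2010
- Funding amount: CNY 250,000
- Project type: Regional Science Fund
Computational Methods for Analyzing Toponome Data
- Award number: 60601030
- Approval year: 2006
- Funding amount: CNY 170,000
- Project type: Young Scientists Fund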
Similar overseas grants
Collaborative Research: Fusion of Siloed Data for Multistage Manufacturing Systems: Integrative Product Quality and Machine Health Management
- Award number: 2323083
- Fiscal year: 2024
- Funding amount: $462,000
- Project type: Standard Grant
CAREER: Causal Modeling for Data Quality and Bias Mitigation
- Award number: 2340124
- Fiscal year: 2024
- Funding amount: $462,000
- Project type: Continuing Grant
National Edge AI Hub for Real Data: Edge Intelligence for Cyber-disturbances and Data Quality
- Award number: EP/Y028813/1
- Fiscal year: 2024
- Funding amount: $462,000
- Project type: Research Grant
Conference: The 2024 Joint Research Conference on Statistics in Quality, Industry, and Technology (JRC 2024) - Data Science and Statistics for Industrial Innovation
- Award number: 2404998
- Fiscal year: 2024
- Funding amount: $462,000
- Project type: Standard Grant
Collaborative Research: Fusion of Siloed Data for Multistage Manufacturing Systems: Integrative Product Quality and Machine Health Management
- Award number: 2323084
- Fiscal year: 2024
- Funding amount: $462,000
- Project type: Standard Grant
Collaborative Research: Fusion of Siloed Data for Multistage Manufacturing Systems: Integrative Product Quality and Machine Health Management
- Award number: 2323082
- Fiscal year: 2024
- Funding amount: $462,000
- Project type: Standard Grant
EO4AgroClimate: VISualisation and Assessment of water quality using an Open Data Cube FOR the weStern English chAnnel - Vis4Sea.
- Award number: ST/Y003039/1
- Fiscal year: 2023
- Funding amount: $462,000
- Project type: Research Grant
Collaborative Research: Frameworks: Automated Quality Assurance and Quality Control for the StraboSpot Geologic Information System and Observational Data
- Award number: 2311822
- Fiscal year: 2023
- Funding amount: $462,000
- Project type: Standard Grant
Data Quality in Manufacturing Industrial Internet Integration
- Award number: 2331985
- Fiscal year: 2023
- Funding amount: $462,000
- Project type: Standard Grant
Collaborative Research: Frameworks: Automated Quality Assurance and Quality Control for the StraboSpot Geologic Information System and Observational Data
- Award number: 2311821
- Fiscal year: 2023
- Funding amount: $462,000
- Project type: Standard Grant