
Semiparametric Efficient Estimation of Models of Measurement Errors and Missing Data


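Basic Information

Award number:
0452143
Principal investigator:
Han Hong
Amount:
$116.3 thousand
Institution:
Duke University
Institution country:
United States
Award type:
Continuing Grant
Fiscal year:
2005
Funding country:
United States
Project period:
2005-04-15 to 2007-03-31
Project status:
Completed

Source:
https://www.nsf.gov/awardsearch/showAward?AWD_ID=0452143&HistoricalAwards=false
Keywords:
Semiparametric Efficient Estimation Models Measurement

Abstract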

Many empirical studies in economics are complicated by the presence of relevant variables that are not observed, either because they are available only in incomplete or corrupted form or because they are unobservable by nature. Important examples include attrition in panel data analysis and the ubiquitous presence of measurement error, which can be correlated with the true unobserved variables. The program evaluation literature is concerned with the problem that one never observes an individual's outcomes both with and without treatment. In these circumstances, identifying assumptions become necessary to overcome the lack of identification that results from the information missing from what will be referred to as the primary data set. One common solution to this identification problem is to assume that the missing information can be recovered from auxiliary data sources under a conditional independence assumption. The key element of the identification strategy is that the auxiliary data set must provide information about the conditional distribution of the true variables of interest given a set of proxy variables, where the proxy variables are observed in both the primary sample and the auxiliary sample.

This project derives semiparametric efficiency variance bounds for the estimation of parameters defined through nonlinear generalized method of moments (GMM) models, where the sampling information consists of a primary sample and an auxiliary sample. The variables of interest in the moment conditions are not directly observable in the primary data set. The primary data set contains proxy variables that are correlated with the variables of interest. The auxiliary data set, by contrast, contains information about the conditional distribution of the variables of interest given the proxy variables.
Identification is achieved by the assumption that this conditional distribution is the same in both the primary and auxiliary data sets. The results derived in this project are applicable to both the "verify-out-of-sample" case, where the two samples are independent, and the "verify-in-sample" case, where the auxiliary sample is a subset of the primary sample.

Sieve-based semiparametric estimators are developed to achieve the semiparametric efficiency bounds when the propensity score is unknown, when it is known, or when it is assumed to belong to a correctly specified parametric family. These estimators use only one nonparametric estimate, that of a conditional expectation, and do not require separate nonparametric estimates of both the conditional expectation of the moment functions and the propensity score. They require weaker regularity conditions than existing estimators in the literature. They also allow for unbounded support of the conditioning variables and nonsmooth moment conditions, and do not require the strong assumption that the propensity score function is uniformly bounded away from zero and one.

These results will be extended to conditional moment models in which either the dependent variables or the conditioning variables are measured with error. In these cases only a subset of the variables suspected to be measured with error is observable in the auxiliary data set. The estimators currently available require knowledge of the semiparametric efficiency variance bounds. A second extension will consider estimators based on nonparametric maximum likelihood principles that achieve the semiparametric efficiency bound without knowledge of its particular form. Extensive Monte Carlo simulations and an empirical illustration will be used to evaluate the finite-sample efficiency implications of competing estimators.
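
As a rough illustration of the two-sample identification idea described above, and not of the estimators developed in this project, the following Python sketch simulates a verify-out-of-sample setting. Everything in it is assumed for illustration: the function names (`cond_mean`, `simulate`, `two_sample_mean`), the data-generating process, the sample sizes, and the simple moment function m(Y*, theta) = Y* - theta, so that theta is the primary-population mean of Y*. The conditional mean of Y* given the proxy X is fitted on the auxiliary sample with a polynomial series (a basic sieve-type least-squares fit) and then averaged over the proxies observed in the primary sample.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander


def cond_mean(x):
    """Hypothetical E[Y* | X = x]; assumed identical in both samples."""
    return 1.0 + 0.5 * x + 0.25 * x**2


def simulate(n_primary=5000, n_aux=2000, seed=0):
    """Draw independent primary and auxiliary samples (illustrative design)."""
    rng = np.random.default_rng(seed)
    # The proxy X has a different marginal distribution in each sample.
    x_primary = rng.normal(1.0, 1.0, n_primary)
    x_aux = rng.normal(0.0, 1.0, n_aux)
    # The true variable Y* is observed only in the auxiliary sample.
    y_aux = cond_mean(x_aux) + rng.normal(0.0, 1.0, n_aux)
    return x_primary, x_aux, y_aux


def two_sample_mean(x_primary, x_aux, y_aux, degree=3):
    """Plug-in estimate of the primary-population mean of Y*."""
    # Step 1: polynomial series least-squares fit of E[Y* | X] on auxiliary data.
    basis_aux = polyvander(x_aux, degree)
    coef, *_ = np.linalg.lstsq(basis_aux, y_aux, rcond=None)
    # Step 2: average the fitted conditional mean over the primary proxies.
    return float(np.mean(polyvander(x_primary, degree) @ coef))


x_primary, x_aux, y_aux = simulate()
theta_hat = two_sample_mean(x_primary, x_aux, y_aux)
naive = float(np.mean(y_aux))  # ignores the difference in proxy distributions

print(f"two-sample plug-in estimate: {theta_hat:.3f}")
print(f"naive auxiliary-sample mean: {naive:.3f}")
```

In this simulated design the population value of theta is 2.0, while the auxiliary-sample mean of Y* targets 1.25, because the proxy is distributed differently in the two samples. The plug-in estimate can recover the former only because the conditional distribution of Y* given X is the same in both samples by construction.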
The proposed project involves joint work with Professor Xiaohong Chen from New York University and Professor Alessandro Tarozzi from Duke University.

Broader Impact: The results developed in this project will be applicable to a wide variety of models, including nonclassical measurement error models, missing data models, and nonlinear treatment effect models. This proposal is part of a larger research agenda in the econometrics profession to develop methods for estimating models with latent variables. Many empirical studies in economics are complicated by the presence of relevant variables that are not observed, usually because they are available only in incomplete or corrupted form. Important examples include attrition in panel data analysis and the presence of measurement error, which can be correlated with the true unobserved variables. Another example is the program evaluation literature, where the estimation of treatment effects has to overcome the fact that one never observes an individual's outcomes both with and without treatment. In such circumstances, identifying assumptions based on conditional independence relations become necessary to overcome the lack of identification that results from the missing information in the primary data set. The proposed project will also provide useful guidance for the design of survey data sets, which generate the crucial data input for the analysis of econometric models.
