
New Methods to reduce Bias and Mean Square Error of Maximum Likelihood Estimators

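Basic information

Grant number:
8394896
Principal investigator:
PRALAY SENCHAUDHURI
Amount:
$453,600
Host institution:
CYTEL, INC
Host institution country:
United States
Project category:
Fiscal year:
2009
Funding country:
United States
Project period:
2009-07-01 to 2014-07-31
Project status:
Completed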

Project abstract

DESCRIPTION (provided by applicant): Categorical outcomes are ubiquitous in biomedical research, and generalized linear models (GLMs) represent the most widely applied methodology for testing associations between categorical variables and fixed investigative factors. Logistic regression in particular is the most frequently used model for binary data and has widespread applicability in the health, behavioral, and physical sciences. King and Ryan (2002) stated that there were 2,770 research papers published in 1999 in which "logistic regression" was in the title of the paper or among the keywords. King and Zeng (2001) referred to the use of the maximum likelihood method in logistic regression as "the nearly universal method". Maximum likelihood estimates (MLEs) for logistic regression are based on large-sample approximations that are reliable when the sample is large and the proportion of responses is neither too small nor too large. However, it has been known for several years that MLEs are not reliable for small, sparse, or unbalanced datasets, where "unbalanced" refers to a considerable difference between the number of zeros and ones of the response variable. Recent research has suggested a flexible means of correcting MLE bias and improving performance using a penalized likelihood-based approach, but the underlying theory has not been fully applied and implemented for practical use. In this project, we will extend the work begun during Phase 1 with logistic regression by (1) implementing the bias correction approach for a variety of other GLMs, including models for Poisson, multinomial, negative binomial, and censored survival data; (2) providing new diagnostic procedures that identify potential problems with near separability and MLE bias; (3) implementing and evaluating an exact target estimation approach for bias correction in logistic regression; (4) improving the computational algorithms required for Aims 1-3; and (5) additionally implementing the procedures in a SAS PROC.
Given the ubiquity of categorical regression in public health and biomedical research, the final product of this effort will provide a critical intermediate alternative for analyzing data for which standard large-sample methods are unreliable and small-sample exact methods are infeasible. PUBLIC HEALTH RELEVANCE: Generalized linear models (such as logistic regression) for categorical data have widespread applicability in the health sciences. Maximum likelihood, the nearly universal method for computing estimates in generalized linear regression models, has been known to have high bias and mean square error for small, sparse, or unbalanced datasets. We propose to develop commercial software that incorporates several new methods that have lower bias and mean square error in logistic regression, other generalized linear models, and Cox proportional hazards models.
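
The abstract does not say which penalized likelihood method the project uses. As an illustration only, the sketch below assumes the best-known approach of this kind, Firth's bias-reducing penalty, which adds half the log-determinant of the Fisher information to the logistic log-likelihood. The function name `firth_logistic` and its parameters are hypothetical and are not taken from the project or from any Cytel or SAS product.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Firth-penalized logistic regression via Newton iterations.

    Illustrative sketch only (hypothetical helper, not the project's code).
    X must already contain an intercept column if one is wanted.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # binomial variance weights
        info = X.T @ (X * w[:, None])         # Fisher information X'WX
        # Leverages h_i = w_i * x_i' (X'WX)^{-1} x_i (hat-matrix diagonal)
        h = w * np.sum(X * np.linalg.solve(info, X.T).T, axis=1)
        # Firth's modified score: the usual score plus a leverage-based term
        score = X.T @ (y - p + h * (0.5 - p))
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Completely separated data: the ordinary MLE slope diverges to infinity,
# while the penalized estimate stays finite.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([0.0, 0.0, 1.0, 1.0])
X = np.column_stack([np.ones_like(x), x])
print(firth_logistic(X, y))
```

For an intercept-only model this penalty is equivalent to adding half an observation to each outcome category, so the fitted probability is (s + 0.5) / (n + 1) for s events in n trials. A production implementation would normally add step-halving and other convergence safeguards that this sketch omits.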
