Development of statistical scoring system for essay-type tests

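Basic Information

Grant number:
17300088
Principal investigator:
SHIBAYAMA Tadashi
Amount:
$100,100
Host institution:
Tohoku University (2007), Niigata University (2005-2006)
Host institution country:
Japan
Project category:
Grant-in-Aid for Scientific Research (B)
Fiscal year:
2005
Funding country:
Japan
Project period:
2005 to 2007
Project status:
Completed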

Project Abstract

Procedures
1) In the first experiment, a connected scoring design was used. The design matrix consisted of six raters and 290 essays. Six raters rated 260 essays using either the heuristic method or the analytical method, yielding an incomplete data matrix with a total of 1560 scores. In parallel, we developed an automated scoring system using these answers.
2) In the second experiment, 10 raters rated 280 essays, yielding a complete data matrix. These data were used in simulation studies, and the answers were used to improve the accuracy of the automated scoring system.

Results
1) Optimal scoring design. On the basis of simulation studies, we proposed a connected scoring design in which two or three raters rate a common block of essays, but no rater rates all essays. With this design, scores can be obtained more efficiently.
2) Reliability assessment. We proposed procedures for assessing the reliability of the connected scoring design, based on generalizability theory and an ANOVA approach for incomplete data matrices. The intercorrelation coefficient was only 0.293 for the heuristic scoring method, whereas it was 0.498 for the analytical scoring method. Furthermore, when the scores were transformed by the proposed method, the coefficient was 0.586.
3) Development of an automated scoring system. We developed an automated scoring system based on a support vector machine (SVM) and improved it. Given about 200 sample answers, the system can rate essays about as accurately as human raters. This result means that the system can assist human raters in scoring procedures.
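The idea of a connected scoring design can be illustrated with a minimal Python sketch. It is not the project's actual design or software: the cyclic assignment of raters to blocks, the function names `build_connected_design` and `is_connected`, and the block size of 10 essays are all assumptions made for illustration. The sketch assigns each block of essays to two consecutive raters, then checks that every rater is linked to every other rater through shared essays, which is what makes scores from different raters comparable.

```python
from collections import deque


def build_connected_design(n_raters, n_blocks, block_size, raters_per_block=2):
    """Return {essay_id: [rater ids]}; consecutive raters share each block.

    Hypothetical cyclic layout: block b is rated by raters b, b+1, ...
    (modulo n_raters). This is an illustrative assumption only.
    """
    design = {}
    for block in range(n_blocks):
        raters = [(block + k) % n_raters for k in range(raters_per_block)]
        for i in range(block_size):
            design[block * block_size + i] = raters
    return design


def is_connected(design, n_raters):
    """True if all raters are linked, directly or indirectly, by shared essays."""
    neighbours = {r: set() for r in range(n_raters)}
    for raters in design.values():
        for a in raters:
            neighbours[a].update(b for b in raters if b != a)
    # Breadth-first search over the rater graph, starting from rater 0.
    seen, queue = {0}, deque([0])
    while queue:
        for nxt in neighbours[queue.popleft()] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return len(seen) == n_raters


# Illustrative sizes only (not the study's 290-essay design).
design = build_connected_design(n_raters=6, n_blocks=6, block_size=10)
n_scores = sum(len(raters) for raters in design.values())
n_complete = 6 * len(design)  # scores needed if every rater rated every essay
print(is_connected(design, 6), n_scores, n_complete)
```

In this toy layout each essay receives two scores instead of six, so the incomplete matrix needs a third of the ratings of the complete one while still linking all raters.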
