Recognition of Handwritten Words in School Essays Using Conditional Random Fields
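Basic Information
- Award number: 0750876
- Principal investigator:
- Amount: --
- Host institution:
- Host institution country: United States
- Project type: Standard Grant
- Fiscal year: 2007
- Funding country: United States
- Duration: 2007-09-15 to 2009-02-28
- Project status: Completed
- Source:
- Keywords:
Project Abstract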
This research concerns the recognition of words in children's handwritten responses to reading comprehension tests. The approach to word recognition will be based on conditional random fields (CRFs), which are discriminative methods that do not make any assumptions about the underlying data and hence are known to be superior to Hidden Markov Models (HMMs) for sequence labeling problems.
The student response is first segmented into word images using an existing neural-network-based algorithm. Each word image is then over-segmented into a number of small segments such that combinations of segments form character images. Segments are labeled as characters with probabilities evaluated from the CRF model. The total probability that a word image represents an entry from the lexicon is computed using a dynamic programming algorithm that evaluates the optimal combination of segments. A lexicon derived from the reading passage, testing prompt, answer rubric and student responses is used to limit the number of paths to explore.
The state and transition parameters of the CRF model are estimated from handwriting samples. State parameters correspond to features such as position (normalized by length), place (start, middle or end), height, width, distances to prototype, and deviations of height. Transition parameters correspond to features such as the label of the character pair (th, er, qu, etc.), vertical overlap (of pixels of candidate character images), height difference, width difference, aspect ratio difference, and bigram width.
The research test-bed will consist of scored handwritten responses to reading comprehension prompts from Grades 8 and 5 of an inner-city school in Buffalo, New York. There are 300 Grade 8 responses and 200 Grade 5 responses, with about 100-150 words in each response. Training data for parameter estimation will initially consist of 150 student responses and 1,000 half-page writings of adults. These will be supplemented with additional school data as research progresses.
Goal-oriented integration of complex document image analysis, natural language processing and machine learning will drive improved handwriting recognition methods. Children's handwriting recognition has never before been studied in document analysis. Handwriting recognition technology for complex documents is as yet largely unavailable. Success will allow statewide testing to be done later in the school year with results provided sooner, thereby having an impact on improved education.
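The lexicon-driven search over segment combinations described in the abstract can be illustrated with a small sketch. This is not the project's implementation: the function names (`word_score`, `recognize`), the limit of three segments per character, and the toy scoring functions are all assumptions made for illustration. In a real system the state and transition scores would come from a trained CRF's weighted features (height, width, character-pair label, and so on); here they are hand-written stand-ins, and scores are treated as unnormalized log-scores that add along a path.

```python
import math

def word_score(n_segments, word, state_score, transition_score, max_span=3):
    """Best total log-score for assigning all segments, in order, to the
    characters of `word`, each character taking 1..max_span segments.
    Returns -inf if no such assignment exists."""
    if not word:
        return -math.inf
    # best[(start, end)]: best score so far, with the current character
    # covering the half-open segment span [start, end).
    best = {}
    for end in range(1, min(max_span, n_segments) + 1):
        best[(0, end)] = state_score(word[0], 0, end)
    for k in range(1, len(word)):
        nxt = {}
        for (p_start, p_end), score in best.items():
            last = min(p_end + max_span, n_segments)
            for end in range(p_end + 1, last + 1):
                cand = (score
                        + transition_score(word[k - 1], word[k],
                                           (p_start, p_end), (p_end, end))
                        + state_score(word[k], p_end, end))
                if cand > nxt.get((p_end, end), -math.inf):
                    nxt[(p_end, end)] = cand
        best = nxt
    # Only assignments that consume every segment are valid.
    finals = [s for (_, end), s in best.items() if end == n_segments]
    return max(finals, default=-math.inf)

def recognize(n_segments, lexicon, state_score, transition_score):
    """Return the lexicon entry with the highest alignment score."""
    return max(lexicon, key=lambda w: word_score(
        n_segments, w, state_score, transition_score))

# --- Toy stand-ins for CRF scores (hypothetical, for illustration only) ---
# A 5-segment word image whose true reading is "the".
TOY_SPANS = {("t", 0, 2), ("h", 2, 4), ("e", 4, 5)}
TOY_BIGRAMS = {"th", "he", "er", "qu"}

def toy_state_score(ch, start, end):
    # 0.0 when the span looks like the character, a penalty otherwise.
    return 0.0 if (ch, start, end) in TOY_SPANS else -2.0

def toy_transition_score(prev_ch, ch, prev_span, span):
    # Small bonus for common character pairs; spans are unused here but
    # would carry features such as height or width difference.
    return 0.5 if prev_ch + ch in TOY_BIGRAMS else 0.0

if __name__ == "__main__":
    lexicon = ["a", "tho", "the", "then"]
    print(recognize(5, lexicon, toy_state_score, toy_transition_score))
```

Restricting the search to lexicon entries is what keeps the number of paths manageable: each word fixes the label sequence, so the dynamic program only has to choose where the character boundaries fall among the segments.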
Project Outputs
Journal articles (0)
Monographs (0)
Research awards (0)
Conference papers (0)
Patents (0)
Other publications by Sargur Srihari
Large scale address recognition systems Truthing, testing, tools, and other evaluation issues
- DOI: 10.1007/s100320200069
- Publication date: 2002-03-01
- Journal:
- Impact factor: 2.500
- Authors: Srirangaraj Setlur; Alfred Lawson; Venugopal Govindaraju; Sargur Srihari
- Corresponding author: Sargur Srihari
Other grants by Sargur Srihari
Knowledge-Based Document Image Understanding
- Award number: 9014110
- Fiscal year: 1991
- Award amount: --
- Project type: Continuing Grant
Workshop on Syntactic and Structural Pattern Recognition; Murray Hill, N.J.; June 13-15, 1990
- Award number: 8922687
- Fiscal year: 1990
- Award amount: --
- Project type: Standard Grant
Knowledge Based Approaches to Document Image Understanding (Computer and Information Science)
- Award number: 8613361
- Fiscal year: 1987
- Award amount: --
- Project type: Standard Grant
Contextual Algorithms For Text Recognition
- Award number: 8010830
- Fiscal year: 1980
- Award amount: --
- Project type: Standard Grant
Travel to Attend: Fourth International Congress of Cybernetics and Systems; Amsterdam, the Netherlands: August 21-25, 1978
- Award number: 7817654
- Fiscal year: 1978
- Award amount: --
- Project type: Standard Grant
Similar Overseas Grants
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-2002
- Fiscal year: 2006
- Award amount: --
- Project type: Discovery Grants Program - Individual
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-2002
- Fiscal year: 2005
- Award amount: --
- Project type: Discovery Grants Program - Individual
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-2002
- Fiscal year: 2004
- Award amount: --
- Project type: Discovery Grants Program - Individual
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-2002
- Fiscal year: 2003
- Award amount: --
- Project type: Discovery Grants Program - Individual
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-2002
- Fiscal year: 2002
- Award amount: --
- Project type: Discovery Grants Program - Individual
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-1998
- Fiscal year: 2001
- Award amount: --
- Project type: Discovery Grants Program - Individual
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-1998
- Fiscal year: 2000
- Award amount: --
- Project type: Discovery Grants Program - Individual
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-1998
- Fiscal year: 1999
- Award amount: --
- Project type: Discovery Grants Program - Individual
Computer recognition of handwritten characters, words, and symbols
- Award number: 8989-1998
- Fiscal year: 1998
- Award amount: --
- Project type: Discovery Grants Program - Individual


















