CHS: Medium: Collaborative Research: Scalable Integration of Data-Driven and Model-Based Methods for Large Vocabulary Sign Recognition and Search
Basic Information
- Award Number: 1763523
- Principal Investigator:
- Amount: $690,000
- Host Institution:
- Host Institution Country: United States
- Project Type: Standard Grant
- Fiscal Year: 2018
- Funding Country: United States
- Project Period: 2018-08-01 to 2022-07-31
- Project Status: Completed
- Source:
- Keywords:
Project Summary
It is surprisingly difficult to look up an unfamiliar sign in American Sign Language (ASL). Most ASL dictionaries list signs in alphabetical order based on approximate English translations, so a user who does not understand a sign or know its English translation would not know how to find it. ASL lacks a written form or an intuitive "alphabetical sorting" based on such a writing system. Although some dictionaries offer alternative ways to search for a sign, based on explicit specification of various properties, a user must often still look through hundreds of pictures of signs to find a match to the unfamiliar sign (if it is present at all in that dictionary). This research will create a framework that will enable the development of a user-friendly, video-based sign-lookup interface, for use with online ASL video dictionaries and resources, and for facilitation of ASL annotation. Input will consist of either a webcam recording of a sign by the user, or user identification of the start and end frames of a sign from a digital video. To test the efficacy of the new tools in real-world applications, the team will partner with the leading producer of pedagogical materials for ASL instruction in high schools and colleges, which is developing the first multimedia ASL dictionary with video-based ASL definitions for signs. The lookup interface will be used experimentally to search the ASL dictionary in ASL classes at Boston University and RIT. Project outcomes will revolutionize how deaf children, students learning ASL, and families with deaf children search ASL dictionaries. They will accelerate research on ASL linguistics and technology by increasing the efficiency, accuracy, and consistency of annotations of ASL videos through video-based sign lookup. And they will lay the groundwork for future technologies to benefit deaf users, such as search by video example through ASL video collections, or ASL-to-English translation, for which sign recognition is a precursor. The new linguistically annotated video data and software tools will be shared publicly, for use by others in linguistic and computer science research, as well as in education.
Sign recognition from video is still an open and difficult problem because of the nonlinearities involved in recognizing 3D structures from 2D video, and the complex linguistic organization of sign languages. The linguistic parameters relevant to sign production and discrimination include hand configuration and orientation, location relative to the body or in signing space, movement trajectory, and, in some cases, facial expressions/head movements. An additional complication is that signs belonging to different classes have distinct internal structures, and are thus subject to different linguistic constraints and require distinct recognition strategies; yet prior research has generally failed to address these distinctions. The challenges are compounded by inter- and intra-signer variation and, in continuous signing, by co-articulation effects (i.e., influence from adjacent signs) with respect to several of the above parameters. Purely data-driven approaches are ill-suited to sign recognition given the limited quantities of available, consistently annotated data and the complexity of the linguistic structures involved, which are hard to infer. Prior research has, for this reason, generally focused on selected aspects of the problem, often restricting the work to a limited vocabulary, and therefore resulting in methods that are not scalable. More importantly, few if any methods involve 4D (spatio-temporal) modeling and attention to the linguistic properties of specific types of signs. A new approach to computer-based recognition of ASL from video is needed. In this research, the approach will be to build a new hybrid, scalable, computational framework for sign identification from a large vocabulary, which has never before been achieved. This research will strategically combine state-of-the-art computer vision, machine-learning methods, and linguistic modeling. It will leverage the team's existing publicly shared ASL corpora and Sign Bank - linguistically annotated and categorized video recordings produced by native signers - which will be augmented to meet the requirements of this project.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.
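To make the recognition task concrete, below is a minimal, illustrative sketch (in PyTorch) of a skeleton-based isolated-sign classifier of the general kind referenced in the project outcomes. It is not the project's actual system: per-frame 2D pose keypoints (standing in for hand configuration and location in signing space) are aggregated over skeleton joints with a graph convolution, a recurrent layer summarizes the movement trajectory, and a linear head classifies into a large gloss vocabulary. All dimensions, names, and the identity adjacency matrix are placeholders chosen for the example.

```python
# Illustrative sketch only (assumptions noted in comments); not the award's implementation.
import torch
import torch.nn as nn


class GraphConv(nn.Module):
    """One spatial graph convolution over skeleton joints."""

    def __init__(self, in_ch, out_ch, adjacency):
        super().__init__()
        # Normalized adjacency (joints x joints) encoding skeleton topology.
        self.register_buffer("A", adjacency)
        self.linear = nn.Linear(in_ch, out_ch)

    def forward(self, x):
        # x: (batch, frames, joints, channels)
        x = torch.einsum("vw,btwc->btvc", self.A, x)  # aggregate neighboring joints
        return torch.relu(self.linear(x))


class IsolatedSignClassifier(nn.Module):
    """Pose sequence -> gloss label, over a large sign vocabulary."""

    def __init__(self, num_joints, num_glosses, adjacency, hidden=64):
        super().__init__()
        self.gc1 = GraphConv(2, hidden, adjacency)        # input: (x, y) per joint
        self.gc2 = GraphConv(hidden, hidden, adjacency)
        self.temporal = nn.GRU(hidden * num_joints, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_glosses)

    def forward(self, poses):
        # poses: (batch, frames, joints, 2) normalized keypoint coordinates
        h = self.gc2(self.gc1(poses))
        h = h.flatten(2)                                  # (batch, frames, joints*hidden)
        _, last = self.temporal(h)                        # summarize the movement trajectory
        return self.head(last.squeeze(0))                 # gloss logits


if __name__ == "__main__":
    J = 27                                # hypothetical joint count (body + dominant hand)
    A = torch.eye(J)                      # placeholder adjacency; real skeleton edges would go here
    model = IsolatedSignClassifier(num_joints=J, num_glosses=2700, adjacency=A)
    clip = torch.randn(1, 60, J, 2)       # one 60-frame sign clip of extracted keypoints
    print(model(clip).shape)              # torch.Size([1, 2700])
```

A hybrid framework of the sort described above would combine such data-driven components with explicit linguistic modeling (e.g., of sign class and handshape constraints), rather than relying on the classifier alone.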
Project Outcomes
Journal Articles (3)
Monographs (0)
Research Awards (0)
Conference Papers (0)
Patents (0)
Isolated Sign Recognition using ASL Datasets with Consistent Text-based Gloss Labeling and Curriculum Learning
- DOI:
- Publication Date: 2022
- Journal:
- Impact Factor: 0
- Authors: Konstantinos M. Dafnis;Evgenia Chroni;C. Neidle;Dimitris N. Metaxas
- Corresponding Author: Konstantinos M. Dafnis;Evgenia Chroni;C. Neidle;Dimitris N. Metaxas
Bidirectional Skeleton-Based Isolated Sign Recognition using Graph Convolution Networks
- DOI:
- Publication Date: 2022
- Journal:
- Impact Factor: 0
- Authors: Dafnis, Konstantinos M.;Chroni, Evgenia;Neidle, Carol;Metaxas, Dimitris
- Corresponding Author: Metaxas, Dimitris
American Sign Language Video Anonymization to Support Online Participation of Deaf and Hard of Hearing Users
- DOI: 10.1145/3441852.3471200
- Publication Date: 2021-10
- Journal:
- Impact Factor: 0
- Authors: Sooyeon Lee;Abraham Glasser;Becca Dingman;Zhaoyang Xia;Dimitris N. Metaxas;C. Neidle;Matt Huenerfauth
- Corresponding Author: Sooyeon Lee;Abraham Glasser;Becca Dingman;Zhaoyang Xia;Dimitris N. Metaxas;C. Neidle;Matt Huenerfauth
Other Publications by Dimitris Metaxas
A frame-based model for large manufacturing databases
- DOI: 10.1007/bf01471336
- Publication Date: 1991-02-01
- Journal:
- Impact Factor: 7.400
- Authors: Dimitris Metaxas;Timos Sellis
- Corresponding Author: Timos Sellis
Algorithmic issues in modeling motion
- DOI: 10.1145/592642.592647
- Publication Date: 2002
- Journal:
- Impact Factor: 0
- Authors: Pankaj K. Agarwal;Leonidas J. Guibas;H. Edelsbrunner;Jeff Erickson;M. Isard;Sariel Har;J. Hershberger;Christian Jensen;L. Kavraki;Patrice Koehl;Ming Lin;Dinesh Manocha;Dimitris Metaxas;Brian Mirtich;David Mount;S. Muthukrishnan;Dinesh Pai;E. Sacks;J. Snoeyink;Subhash Suri;Ouri E. Wolfson;Merl Mirtich@merl Com
- Corresponding Author: Merl Mirtich@merl Com
A combustion-based technique for fire animation and visualization
- DOI: 10.1007/s00371-007-0162-3
- Publication Date: 2007-06-28
- Journal:
- Impact Factor: 2.900
- Authors: Kyungha Min;Dimitris Metaxas
- Corresponding Author: Dimitris Metaxas
Multi-Stage Feature Fusion Network for Video Super-Resolution
- DOI: 10.1109/tip.2021.3056868
- Publication Date: 2021-02
- Journal:
- Impact Factor: 10.6
- Authors: Huihui Song;Wenjie Xu;Dong Liu;Bo Liu;Qingshan Liu;Dimitris Metaxas
- Corresponding Author: Dimitris Metaxas
The Traffic Calming Effect of Delineated Bicycle Lanes
- DOI:
- Publication Date: 2024
- Journal:
- Impact Factor: 0
- Authors: Hannah Younes;Clinton Andrews;Robert B. Noland;Jiahao Xia;Song Wen;Wenwen Zhang;Dimitris Metaxas;Leigh Ann Von Hagen;Jie Gong
- Corresponding Author: Jie Gong
Other Grants by Dimitris Metaxas
Center: IUCRC Phase II Rutgers University: Center for Accelerated and Real Time Analytics (CARTA)
- Award Number: 2310966
- Fiscal Year: 2023
- Amount: $690,000
- Project Type: Continuing Grant
Collaborative Research: HCC: Medium: Linguistically-Driven Sign Recognition from Continuous Signing for American Sign Language (ASL)
- Award Number: 2212301
- Fiscal Year: 2022
- Amount: $690,000
- Project Type: Standard Grant
NSF Convergence Accelerator Track H: AI-based Tools to Enhance Access and Opportunities for the Deaf
- Award Number: 2235405
- Fiscal Year: 2022
- Amount: $690,000
- Project Type: Standard Grant
NSF Convergence Accelerator Track D: Data & AI Methods for Modeling Facial Expressions in Language with Applications to Privacy for the Deaf, ASL Education & Linguistic Res
- Award Number: 2040638
- Fiscal Year: 2020
- Amount: $690,000
- Project Type: Standard Grant
Phase 1 IUCRC Rutgers-New Brunswick: Center for Accelerated Real Time Analytics (CARTA)
- Award Number: 1747778
- Fiscal Year: 2018
- Amount: $690,000
- Project Type: Continuing Grant
AitF: Collaborative Research: Topological Algorithms for 3D/4D Cardiac Images: Understanding Complex and Dynamic Structures
- Award Number: 1733843
- Fiscal Year: 2017
- Amount: $690,000
- Project Type: Standard Grant
CHS: Medium: Data Driven Biomechanically Accurate Modeling of Human Gait on Unconstrained Terrain
- Award Number: 1703883
- Fiscal Year: 2017
- Amount: $690,000
- Project Type: Standard Grant
EAGER: Collaborative Research: Data Visualizations for Linguistically Annotated, Publicly Shared, Video Corpora for American Sign Language (ASL)
- Award Number: 1748022
- Fiscal Year: 2017
- Amount: $690,000
- Project Type: Standard Grant
CIF: Medium: Collaborative Research: Quickest Change Detection Techniques with Signal Processing Applications
- Award Number: 1513373
- Fiscal Year: 2015
- Amount: $690,000
- Project Type: Continuing Grant
EAGER: Multi-modal human gait experimentation and analysis on unconstrained terrains
- Award Number: 1451292
- Fiscal Year: 2014
- Amount: $690,000
- Project Type: Standard Grant
Similar Overseas Grants
CHS: Medium: Collaborative Research: Augmenting Human Cognition with Collaborative Robots
- Award Number: 2343187
- Fiscal Year: 2023
- Amount: $690,000
- Project Type: Continuing Grant
CHS: Medium: Collaborative Research: Empirically Validated Perceptual Tasks for Data Visualization
- Award Number: 2236644
- Fiscal Year: 2022
- Amount: $690,000
- Project Type: Standard Grant
CHS: Medium: Collaborative Research: Regional Experiments for the Future of Work in America
- Award Number: 2243330
- Fiscal Year: 2021
- Amount: $690,000
- Project Type: Continuing Grant
CHS: Medium: Collaborative Research: From Hobby to Socioeconomic Driver: Innovation Pathways to Professional Making in Asia and the American Midwest
- Award Number: 2224258
- Fiscal Year: 2021
- Amount: $690,000
- Project Type: Continuing Grant
CHS: Medium: Collaborative Research: Discovery and Exploration of Design Trade-Offs
- Award Number: 1954028
- Fiscal Year: 2020
- Amount: $690,000
- Project Type: Continuing Grant
CHS: Medium: Collaborative Research: Computer-Aided Design and Fabrication for General-Purpose Knit Manufacturing
- Award Number: 1955444
- Fiscal Year: 2020
- Amount: $690,000
- Project Type: Standard Grant
CHS: Medium: Collaborative Research: Teachable Activity Trackers for Older Adults
- Award Number: 1955590
- Fiscal Year: 2020
- Amount: $690,000
- Project Type: Standard Grant
CHS: Medium: Collaborative Research: Code demography: Addressing information needs at scale for programming interface users and designers
- Award Number: 1955699
- Fiscal Year: 2020
- Amount: $690,000
- Project Type: Standard Grant
CHS: Medium: Collaborative Research: Bio-behavioral data analytics to enable personalized training of veterans for the future workforce
- Award Number: 1955721
- Fiscal Year: 2020
- Amount: $690,000
- Project Type: Standard Grant
CHS: Medium: Collaborative Research: Fabric-Embedded Dynamic Sensing for Adaptive Exoskeleton Assistance
- Award Number: 1955979
- Fiscal Year: 2020
- Amount: $690,000
- Project Type: Standard Grant