Object State Recognition via Multi-Modal Analysis of Videos and Video Caption Sequences

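Basic Information

Grant number:
22K21296
Principal investigator:
八木拓真
Amount:
$18,300
Host institution:
The University of Tokyo
Host institution country:
Japan
Project category:
Grant-in-Aid for Research Activity Start-up
Fiscal year:
2022
Funding country:
Japan
Project period:
2022-08-31 to 2024-03-31
Project status:
Completed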

Project Summary

In fiscal year 2022, we (i) examined how to collect state-describing captions (Theme A) and (ii) developed a state recognition method based on automatic enumeration of states with a large language model (Theme B).

For (i), several annotators wrote free-form sentences focused on state changes in videos (state description sentences), and we examined the feasibility of this approach. We found that, even before any state description sentence is written, the choice of state words (raw, boiled, etc.) already has a high degree of freedom, which raises the concern that naive supervised learning would fail to train. We also established annotation rules. For example, labeling an object in a video as being in a given state requires intermediate states such as "about to enter this state" and "about to leave this state for another one". These rules organize the annotation so that evaluation stays consistent while still allowing freedom in the choice of state words.

Based on this background, in (ii) we proposed a framework that uses the linguistic knowledge of states contained in large language models (LLMs) trained on large amounts of Internet text. The framework enumerates the state words plausible for a specific object and combines their likelihood of being present with an existing image-language matching model, so that state recognition can be performed for arbitrary state words. Instead of manually preparing in advance the state words an object can take, the framework uses scalable automatic enumeration with an LLM and formulates state recognition as a matching problem between a video and "object name + state word". This is expected to yield a more flexible and practical state recognition model. The results of this research are currently under submission to a domestic research workshop.
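
The matching formulation described in the summary can be illustrated with a minimal sketch. This is not the project's implementation: `enumerate_states` is a hypothetical stand-in for the LLM enumeration step (it returns a canned list), `toy_embed_text` is a hypothetical stand-in for the text encoder of an image-language matching model, and the frame vectors are made-up toy embeddings. Averaging per-frame softmax scores over the video is also an assumption made for illustration.

```python
import math

def enumerate_states(object_name):
    """Hypothetical stand-in for the LLM step: a real system would prompt
    an LLM for plausible state words; here a canned list is returned."""
    canned = {"egg": ["raw", "boiled", "fried", "cracked"]}
    return canned.get(object_name, [])

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def softmax(xs, temperature=0.1):
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def recognize_state(object_name, frame_embeddings, embed_text):
    """Score each 'state word + object name' prompt against every frame
    and average the per-frame softmax scores over the video."""
    states = enumerate_states(object_name)
    if not states or not frame_embeddings:
        return {}
    text_vecs = [embed_text(f"{state} {object_name}") for state in states]
    avg = [0.0] * len(states)
    for frame in frame_embeddings:
        probs = softmax([cosine(frame, t) for t in text_vecs])
        avg = [a + p / len(frame_embeddings) for a, p in zip(avg, probs)]
    return dict(zip(states, avg))

# Toy text embeddings (assumption): one axis per prompt.
TOY_TEXT = {
    "raw egg": [1.0, 0.0, 0.0, 0.0],
    "boiled egg": [0.0, 1.0, 0.0, 0.0],
    "fried egg": [0.0, 0.0, 1.0, 0.0],
    "cracked egg": [0.0, 0.0, 0.0, 1.0],
}

def toy_embed_text(prompt):
    return TOY_TEXT[prompt]

# Toy frame embeddings (assumption): both point mostly along the "boiled" axis.
frames = [[0.1, 0.9, 0.0, 0.1], [0.2, 0.8, 0.1, 0.0]]
scores = recognize_state("egg", frames, toy_embed_text)
best = max(scores, key=scores.get)
print(best, scores)
```

In a real setting the canned list would be replaced by an LLM query and the toy vectors by embeddings from an actual image-language model. The rest of the scoring logic does not depend on which state words are enumerated, which is what allows arbitrary state words to be handled.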