Fine-grained Action Recognition, Captions, and Long Video Question-Answering
Participated in developing a large-scale video dataset of everyday home actions, with a balanced distribution of videos across 100 action categories. Through analysis of the dataset, we characterized it against existing datasets and identified its most challenging action categories. We demonstrated human action recognition on this dataset through experiments with 3D CNN models for video (Python, PyTorch). The dataset is released at https://actions.stair.center. This project is supported by NEDO (New Energy and Industrial Technology Development Organization), Japan.
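The 3D CNN baseline mentioned above can be sketched roughly as follows. This is a minimal illustrative model, not the actual architecture used in the project; the layer sizes, clip dimensions, and class names are assumptions for demonstration only.

```python
import torch
import torch.nn as nn


class Simple3DCNN(nn.Module):
    """Hypothetical minimal 3D CNN for clip-level action classification.

    3D convolutions slide over (time, height, width), so the network can
    capture motion cues across frames as well as spatial appearance.
    """

    def __init__(self, num_classes: int = 100):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),   # RGB in, 16 maps out
            nn.ReLU(inplace=True),
            nn.MaxPool3d(kernel_size=2),                  # halve T, H, W
            nn.Conv3d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                      # global spatio-temporal pool
        )
        self.classifier = nn.Linear(32, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x has shape (batch, channels, frames, height, width)
        x = self.features(x).flatten(1)
        return self.classifier(x)


model = Simple3DCNN(num_classes=100)
clip = torch.randn(2, 3, 16, 112, 112)  # 2 clips of 16 RGB frames, 112x112 px
logits = model(clip)                    # shape (2, 100): one score per category
```

In practice, published 3D CNN baselines for video (e.g. C3D or I3D variants) are deeper and pretrained on large video corpora; this sketch only shows the input layout and the role of 3D convolutions over clips.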