Scene Understanding including Unknown Objects/Events

Summary

When we humans see an unknown object, we can recognize it as some kind of object even if we do not know what it is. A robot, however, can only detect objects that its object detector has been trained on, and it cannot estimate their relationships with other objects. We are researching open-set recognition, which enables robots to detect unknown objects, and open-vocabulary recognition, which enables robots to detect objects specified by new words. Beyond unknown objects, we are also working on recognizing unknown actions and events. (This work was/is supported in part by MEXT (Ministry of Education, Culture, Sports, Science and Technology, Japan) through Grants-in-Aid for Scientific Research under Grants JP21H03519 and JP24H00733.)
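
As a rough illustration of the open-set setting (a minimal sketch, not the method of the publications below), the code keeps low-confidence detections and relabels them "unknown" instead of discarding them, so that their relationships with other objects can still be estimated; the class names, scores, and threshold are hypothetical.

    # Minimal open-set filtering sketch: detections whose best known-class
    # score falls below a threshold are kept and relabeled "unknown"
    # instead of being discarded. Scores are assumed to come from an
    # off-the-shelf object detector (hypothetical data below).

    KNOWN_CLASSES = ["person", "chair", "cup"]
    UNKNOWN_THRESHOLD = 0.5  # assumed value; tuned on validation data in practice

    def open_set_filter(detections):
        """detections: list of (box, {class_name: score}) pairs."""
        results = []
        for box, scores in detections:
            best_class = max(scores, key=scores.get)
            best_score = scores[best_class]
            if best_score >= UNKNOWN_THRESHOLD:
                results.append((box, best_class, best_score))
            else:
                # A closed-set detector would drop this box; an open-set
                # pipeline keeps it so relationships can still be estimated.
                results.append((box, "unknown", best_score))
        return results

    if __name__ == "__main__":
        dets = [((10, 10, 50, 80), {"person": 0.92, "chair": 0.03, "cup": 0.01}),
                ((60, 40, 90, 70), {"person": 0.21, "chair": 0.33, "cup": 0.18})]
        for box, label, score in open_set_filter(dets):
            print(box, label, round(score, 2))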

Publications

  • M. Sonogashira et al., Relationship-Aware Unknown Object Detection for Open-Set Scene Graph Generation, IEEE Access, 2024.
  • T. T. Nguyen et al., One-stage open-vocabulary temporal action detection leveraging temporal multi-scale and action label features, FG2024.
  • T. T. Nguyen et al., Zero-Shot Pill-Prescription Matching With Graph Convolutional Network and Contrastive Learning, IEEE Access, 2024.
  • M. Sonogashira & Y. Kawanishi, Towards Open-Set Scene Graph Generation with Unknown Objects, IEEE Access, 2022.

Scene Understanding including Unknown Objects/Events

Summary

To achieve a higher level of scene understanding, we are focusing on the task of scene graph generation, which involves not only detecting objects but also estimating and describing the relationships between them. We are also applying this approach to applications such as summarizing the content of an image collection and generating captions for it. (This work was/is supported in part by MEXT (Ministry of Education, Culture, Sports, Science and Technology, Japan) through a Grant-in-Aid for Scientific Research under Grant JP21H03519.)
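
For readers unfamiliar with scene graphs, the sketch below shows the underlying data structure: detected objects become nodes and pairwise relationships become labeled edges, i.e., (subject, predicate, object) triples. The labels are made up for illustration and the helper functions are hypothetical.

    # A scene graph as a set of (subject, predicate, object) triples.
    # The labels below are illustrative; in practice the nodes come from
    # an object detector and the predicates from a relationship classifier.

    scene_graph = [
        ("person", "holding", "cup"),
        ("person", "sitting on", "chair"),
        ("cup", "on", "table"),
    ]

    def describe(graph):
        """Turn triples into simple phrases, e.g. as input to captioning."""
        return ["{} {} {}".format(s, p, o) for s, p, o in graph]

    def neighbors(graph, node):
        """All objects directly related to a given node."""
        return ({o for s, p, o in graph if s == node} |
                {s for s, p, o in graph if o == node})

    if __name__ == "__main__":
        print(describe(scene_graph))
        print(neighbors(scene_graph, "person"))  # {'chair', 'cup'} (order may vary)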

Publications

  • I. Phueaksri et al., Toward Visual Storytelling using Scene-Graph Contexts, MIRU2024.
  • I. Phueaksri et al., Image-Collection Summarization using Scene-Graph Generation with External Knowledge, IEEE Access, 2024.
  • I. Phueaksri et al., An Approach to Generate a Caption for an Image Collection using Scene Graph Generation, IEEE Access, 2023.

Multimodal Recognition Framework

Summary

We humans use our eyes and ears to obtain information from our surroundings. We are working on a multimodal recognition framework that recognizes targets based on multiple modalities. In particular, we are addressing the missing-modality problem, in which some of the modalities are not available.
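
As a rough, PyTorch-style sketch of the missing-modality setting (illustrative only, not the fusion architectures of the publications below), the module here encodes each modality separately and fuses only the embeddings of the modalities that are actually present in a given sample; the modality names and dimensions are assumptions.

    # Minimal missing-modality fusion sketch: one encoder per modality,
    # with the fused representation taken as the mean of the embeddings
    # of the modalities available for the current input.
    import torch
    import torch.nn as nn

    class MissingModalityFusion(nn.Module):
        def __init__(self, dims, embed_dim=64, num_classes=10):
            super().__init__()
            # One small encoder per modality, e.g. {"audio": 40, "visual": 512}.
            self.encoders = nn.ModuleDict(
                {name: nn.Linear(dim, embed_dim) for name, dim in dims.items()})
            self.classifier = nn.Linear(embed_dim, num_classes)

        def forward(self, inputs):
            # inputs: dict mapping modality name -> tensor, containing only
            # the modalities available for this batch.
            embeddings = [torch.relu(self.encoders[name](x))
                          for name, x in inputs.items()]
            fused = torch.stack(embeddings, dim=0).mean(dim=0)
            return self.classifier(fused)

    if __name__ == "__main__":
        model = MissingModalityFusion({"audio": 40, "visual": 512})
        both = {"audio": torch.randn(2, 40), "visual": torch.randn(2, 512)}
        audio_only = {"audio": torch.randn(2, 40)}   # visual modality missing
        print(model(both).shape, model(audio_only).shape)  # both (2, 10)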

Publications

  • V. John and Y. Kawanishi, Progressive Learning of a Multimodal Classifier Accounting for Different Modality Combinations, Sensors, 2023.
  • V. John and Y. Kawanishi, A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition, arXiv:2210.10972, 2022.
  • V. John and Y. Kawanishi, Combining Knowledge Distillation and Transfer Learning for Sensor Fusion in Visible and Thermal Camera-based Person Classification, MVA2023.
  • V. John and Y. Kawanishi, Multimodal Cascaded Framework with Metric Learning Robust to Missing Modalities for Person Classification, ACM MMSys 2023.
  • V. John and Y. Kawanishi, Audio-Visual Sensor Fusion Framework using Person Attributes Robust to Missing Visual Modality for Person Recognition, MMM2023.
  • V. John and Y. Kawanishi, A Multimodal Sensor Fusion Framework Robust to Missing Modalities for Person Recognition, ACM MM Asia 2022.
  • V. John and Y. Kawanishi, Audio and Video-Based Emotion Recognition Using Multimodal Transformers, ICPR2022.

Human Recognition using a Low-resolution FIR Sensor

Summary

To watch over the elderly, we need sensing that works even at night while preserving privacy. We are focusing on extremely low-resolution far-infrared (FIR) images: because the resolution is extremely low, privacy is preserved, and because they capture far-infrared radiation, they can be used for nighttime sensing. We are researching human tracking, gesture recognition, behavior recognition, and pose estimation using these images.
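
To give an idea of the data, the sketch below (a hypothetical 16x16 sensor with made-up temperature values, not the methods of the publications below) detects the presence of a warm body in a single extremely low-resolution FIR frame by simple thresholding; the actual works go further, to gestures, actions, and poses.

    # Toy example with an extremely low-resolution thermal frame
    # (hypothetical 16x16 sensor, values in degrees Celsius). At this
    # resolution a person appears only as a small warm blob, which is why
    # privacy is preserved while presence can still be detected.
    import numpy as np

    def detect_warm_blob(frame, temp_threshold=30.0, min_pixels=4):
        """Return (found, centroid) of pixels warmer than the threshold."""
        mask = frame > temp_threshold
        if mask.sum() < min_pixels:
            return False, None
        ys, xs = np.nonzero(mask)
        return True, (float(ys.mean()), float(xs.mean()))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        frame = rng.normal(22.0, 0.5, size=(16, 16))  # room-temperature background
        frame[5:9, 6:9] += 12.0                       # a warm "person" blob
        found, centroid = detect_warm_blob(frame)
        print(found, centroid)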

Publications

  • S. Iwata et al., LFIR2Pose: Pose Estimation from an Extremely Low-Resolution FIR Image Sequence, ICPR2024.
  • Y. Kawanishi et al., Voting-based Hand-Waving Gesture Spotting from a Low-Resolution Far-Infrared Image Sequence, VCIP2018.
  • T. Kawashima et al., Action Recognition from Extremely Low-Resolution Thermal Image Sequence, AVSS2017.

Object Pose Estimation from Depth Images

Summary

When a robot grasps or manipulates an object, it needs to recognize the object's pose correctly. Recently, depth image sensors have been installed in many robots. Therefore, we are working on object pose estimation using depth images.
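
Pose estimation from depth typically starts by back-projecting the depth image into a 3D point cloud with the pinhole camera model; the short sketch below shows only that preprocessing step, with illustrative intrinsics (fx, fy, cx, cy), not the pose estimators of the publications below.

    # Back-project a depth image into a 3D point cloud using the pinhole
    # model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy. The intrinsics
    # below are illustrative; real values come from camera calibration.
    import numpy as np

    def depth_to_point_cloud(depth, fx, fy, cx, cy):
        """depth: (H, W) array of depth values in meters (0 = invalid)."""
        h, w = depth.shape
        u, v = np.meshgrid(np.arange(w), np.arange(h))
        z = depth
        x = (u - cx) * z / fx
        y = (v - cy) * z / fy
        points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
        return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

    if __name__ == "__main__":
        depth = np.full((480, 640), 1.5)   # a flat surface 1.5 m away
        cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
        print(cloud.shape)  # (307200, 3)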

Publications

  • H. Tatemichi et al., Category-level Object Pose Estimation in Heavily Cluttered Scenes by Generalized Two-stage Shape Reconstructor, IEEE Access, 2024.
  • N. M. Z. Hashim et al., Best Next-Viewpoint Recommendation by Selecting Minimum Pose Ambiguity for Category-level Object Pose Estimation, JSPE, 2021.
  • H. Tatemichi et al., Median-Shape Representation Learning for Category-Level Object Pose Estimation in Cluttered Environments, ICPR2020.
  • H. Ninomiya et al., Deep Manifold Embedding for 3D Object Pose Estimation, VISAPP2017.

Gaze Analysis of a Crowd of People

Summary

Knowing what a large number of people are paying attention to is useful for estimating the interests of people watching sports or attending a live music event. Therefore, we are working on estimating, from a video that observes many people simultaneously, the target that most of the people are paying attention to.
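
One simple way to aggregate a crowd's gaze (a rough voting sketch with made-up head positions and gaze vectors, not the method of the workshop paper below) is to cast a ray from each person's head position along their estimated gaze direction, accumulate votes on an image grid, and take the cell crossed by the most rays as the shared attention target.

    # Gaze-target voting sketch: each person casts a ray from their head
    # position along their gaze direction on an image grid; the cell
    # crossed by the most rays is taken as the crowd's attention target.
    import numpy as np

    def vote_gaze_target(heads, directions, grid_size=(48, 64), steps=200):
        votes = np.zeros(grid_size)
        h, w = grid_size
        for (y0, x0), (dy, dx) in zip(heads, directions):
            norm = np.hypot(dy, dx) + 1e-8
            dy, dx = dy / norm, dx / norm
            visited = set()
            for t in range(steps):
                y, x = int(y0 + t * dy), int(x0 + t * dx)
                if 0 <= y < h and 0 <= x < w:
                    visited.add((y, x))
            for y, x in visited:           # one vote per ray per cell
                votes[y, x] += 1
        return np.unravel_index(votes.argmax(), votes.shape), votes

    if __name__ == "__main__":
        heads = [(40, 5), (40, 60), (5, 30)]                  # (row, col) positions
        directions = [(-1.0, 0.7), (-1.0, -1.0), (0.5, 0.0)]  # rough gaze vectors
        target, _ = vote_gaze_target(heads, directions)
        print(target)   # a cell where several rays intersect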

Publications

  • Y. Kodama et al., Localizing the Gaze Target of a Crowd of People, ACCV2018 Workshop.

Action Recognition from a Human Skeleton Sequence

Summary

By focusing on how a person's skeleton changes, we can estimate what the person is doing and what state the person is in. Recently, technologies that can accurately estimate a person's skeleton from images have become available, and we are working on applying the estimated skeleton to various recognition and prediction tasks.
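
As a minimal illustration of how a skeleton sequence can be fed to a recognizer (a generic sketch, not the models in the publications below), each frame's joint coordinates are flattened into a vector and the sequence is passed through a recurrent network; the joint count and class count are assumptions.

    # Generic skeleton-sequence classifier sketch: flatten the (x, y, z)
    # coordinates of all joints per frame and run a GRU over the sequence.
    import torch
    import torch.nn as nn

    NUM_JOINTS = 17      # e.g. a COCO-style skeleton; assumption
    NUM_CLASSES = 5      # e.g. walk / sit / stand / carry / wave; assumption

    class SkeletonGRU(nn.Module):
        def __init__(self, hidden=128):
            super().__init__()
            self.gru = nn.GRU(input_size=NUM_JOINTS * 3, hidden_size=hidden,
                              batch_first=True)
            self.head = nn.Linear(hidden, NUM_CLASSES)

        def forward(self, skeletons):
            # skeletons: (batch, frames, joints, 3) -> flatten joints per frame
            b, t, j, c = skeletons.shape
            x = skeletons.reshape(b, t, j * c)
            _, h_last = self.gru(x)            # h_last: (1, batch, hidden)
            return self.head(h_last.squeeze(0))

    if __name__ == "__main__":
        model = SkeletonGRU()
        clip = torch.randn(4, 30, NUM_JOINTS, 3)   # 4 clips of 30 frames
        print(model(clip).shape)                   # torch.Size([4, 5])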

Publications

  • M. Mizuno et al., Subjective Baggage-Weight Estimation based on Human Walking Behavior, IEEE Access, 2024.
  • M. Mizuno et al., Subjective Baggage-Weight Estimation from Gait ---Can you estimate how heavy the person feels?---, VISAPP2023.
  • T. Fujita, Future Pose Prediction from 3D Human Skeleton Sequence with Surrounding Situation, Sensors, 2023.
  • T. Fujita et al., Human Pose Prediction by Progressive Generation in Multi-scale Frequency Domain, MVA2023.
  • T. Fujita et al., Toward Surroundings-aware Temporal Prediction of 3D Human Skeleton Sequence, ICPR2022 Workshop (T-CAP).
  • N. Nishida et al., SOANets: Encoder-Decoder based Skeleton Orientation Alignment Network for White Cane User Recognition from 2D Human Skeleton Sequence, VISAPP2020.
  • O. Temuroglu et al., Occlusion-Aware Skeleton Trajectory Representation for Abnormal Behavior Detection, IW-FCV2020.

People Tracking

Summary

In applications such as people-flow analysis and lost-child search, it is important to know the trajectories of people walking around a large area. We are working on tracking and re-identifying people within and across camera views, in order to determine who went where and along which paths, in situations where a large area is observed by multiple cameras.
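
A core building block of cross-camera tracking is re-identification by appearance: an embedding of a newly observed person is compared with a gallery of known identities and either matched or registered as a new ID. The sketch below (cosine similarity with an assumed threshold and random embeddings, not the published trackers) illustrates this step.

    # Cross-camera re-identification sketch: match a query appearance
    # embedding against a gallery of known identities by cosine similarity;
    # below the threshold, register a new identity. Embeddings would come
    # from a person re-identification network in practice.
    import numpy as np

    MATCH_THRESHOLD = 0.7  # assumed value; tuned on validation data in practice

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

    def assign_identity(query, gallery):
        """gallery: dict person_id -> embedding. Returns (person_id, gallery)."""
        if gallery:
            best_id = max(gallery, key=lambda pid: cosine(query, gallery[pid]))
            if cosine(query, gallery[best_id]) >= MATCH_THRESHOLD:
                return best_id, gallery
        new_id = len(gallery)                  # simple incremental ID
        gallery[new_id] = query
        return new_id, gallery

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        gallery = {}
        first = rng.normal(size=128)
        pid, gallery = assign_identity(first, gallery)               # new ID 0
        pid2, gallery = assign_identity(first + 0.01 * rng.normal(size=128),
                                        gallery)                     # matches ID 0
        print(pid, pid2)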

Publications

  • Y. Kawanishi, Label-Based Multiple Object Ensemble Tracking with Randomized Frame Dropping, ICPR2022.
  • Y. Kawanishi et al., Trajectory Ensemble: Multiple Persons Consensus Tracking across Non-overlapping Multiple Cameras over Randomly Dropped Camera Networks, CVPR2017 Workshop.

Image Recognition Applications to Other Research Fields

Summary

Image recognition techniques are expected to be used in various fields, such as the life sciences. We are also working on applying image recognition technology in astronomy and archaeology, for example, detecting star-forming regions in astronomy and automatically identifying the place of origin of earthenware in archaeology.

Publications

  • Y. Kawae et al., 3D Survey of the Menkaure Pyramid, Virtual Annual Meeting, American Research Center in Egypt, 2024.
  • Y. Shimajiri et al., Predicting reliable H2 column density maps from molecular line data using machine learning, MNRAS, 2023.
  • S. Fujita et al., Distance determination of molecular clouds in the first quadrant of the Galactic plane using deep learning, Protostars and Planets VII, 2023.
  • S. Fujita et al., Distance determination of molecular clouds in the first quadrant of the Galactic plane using deep learning: I. Method and results, ASJ, 2023.
  • S. Nishimoto et al., Development of a high-speed identification model for infrared-ring structures using deep learning, SPIE Astronomical Telescopes + Instrumentation 2022.
  • S. Ueda et al., Identification of infrared-ring structures by convolutional neural network, SPIE Astronomical Telescopes + Instrumentation 2020.