IROS 2018 Report

Cover Photo (Left: Reza, Data Scientist / Taka, Chief Digital Officer / Haruna, Project Manager)


The 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2018) is the flagship conference in the field of robotics and the biggest international event for researchers, companies, and end users. It was held for the first time in Spain, in the lively capital city of Madrid, on October 1-5, 2018. The conference included plenary and short keynote presentations, contributed paper sessions in a mixed oral/interactive format, workshops and tutorials, numerous robotic challenges, an exhibition with live demos, and several forums.


Since IROS features a rich body of research on 3D modeling and 3D object classification, it helps us identify the state of the art on these topics so that we can apply recent research to improve accuracy and computational efficiency in the logistics industry. In short, our main motivation for attending IROS'18 was to attend sessions on the topics listed below, which are in line with the products we are developing.

  • 3D model reconstruction
  • 3D shape completion in a cluttered environment
  • Depth camera online calibration
  • Grasp configuration using a 3D model
  • 3D point cloud classification using CNNs


  • Learning to Segment Generic Handheld Objects Using Class-Agnostic Deep Comparison and Segmentation Network

Their method segments handheld objects in real time using a class-agnostic deep comparison and segmentation network. The inputs to the network are RGB-D data of a known object template and a search space; the outputs are a pixel-wise label of the object and an objectness score. The score indicates the likelihood that the same object is present in both inputs.

  • A 3D Convolutional Neural Network Towards Real-time Amodal 3D Object Detection

3D object detection aims to predict object locations, dimensions, poses, and categories in the real world. The authors introduce a 3D convolutional neural network that takes a volumetric representation of an indoor scene as input and predicts 3D object bounding boxes, object categories, and orientations. The work uses the NYUv2 RGB-D dataset and the SUN RGB-D dataset. Detection and recognition are treated as a single regression problem in one 3D CNN, and the 3D space is encoded as a volumetric voxel grid that serves as the network input. The method fails to detect small objects: the point cloud data is too coarse to preserve enough features for detection, especially under strong occlusion and tight arrangements.
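
To make the voxel grid input more concrete, here is a minimal sketch of how a point cloud can be turned into a binary occupancy grid. It is our own illustration, not the authors' code: the function name `voxelize`, the grid size, and the 10 cm voxel size are all assumptions chosen for the example.

```python
import numpy as np

def voxelize(points, grid_min, voxel_size, grid_shape):
    """Convert an (N, 3) point cloud into a binary occupancy grid.

    Hypothetical helper for illustration only.
    """
    points = np.asarray(points, dtype=float)
    # Integer voxel index of every point along x, y, z.
    idx = np.floor((points - np.asarray(grid_min)) / voxel_size).astype(int)
    # Keep only the points that fall inside the grid.
    inside = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    idx = idx[inside]
    grid = np.zeros(grid_shape, dtype=np.uint8)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return grid

points = np.array([
    [0.05, 0.05, 0.05],
    [0.06, 0.04, 0.07],  # lands in the same voxel as the first point
    [0.35, 0.15, 0.25],
    [2.00, 0.00, 0.00],  # outside the grid, dropped
])
grid = voxelize(points, grid_min=(0.0, 0.0, 0.0), voxel_size=0.1, grid_shape=(4, 4, 4))
print(grid.shape, int(grid.sum()))
```

The first two points collapse into a single occupied voxel, which gives a feel for why small objects can lose their detail when the grid is coarse.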

  • Detect Globally, Label Locally: Learning Accurate 6-DOF Object Pose Estimation by Joint Segmentation and Coordinate Regression

The underlying idea is to train a system that can regress the 3D coordinates of an object, given an input RGB or RGB-D image and known object geometry, followed by a robust procedure such as RANSAC to optimize the object pose. These coordinate regression-based approaches exhibit state-of-the-art performance by using pixel-level cues to model the probability distribution of object parts within the image. However, they fail to capture global information at the object level to learn accurate foreground/background segmentation. They showed that combining global features for object segmentation and local features for coordinate regression results in pixel-accurate object boundary detections and consequently a substantial reduction in outliers and an increase in overall performance.
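
The "regress coordinates, then run RANSAC" step can be sketched as follows for the RGB-D case, where each pixel has both a predicted object-frame coordinate and a measured camera-frame point. This is a generic RANSAC loop around a Kabsch rigid fit, written under our own assumptions (synthetic data, a 1 cm inlier threshold, hypothetical function names); it is not the pipeline from the paper.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~ src @ R.T + t."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a rotation, not a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = dst_c - R @ src_c
    return R, t

def ransac_pose(obj_pts, cam_pts, iters=200, thresh=0.01, seed=0):
    """Estimate the object pose from noisy 3D-3D correspondences."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(obj_pts), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(obj_pts), size=3, replace=False)
        R, t = kabsch(obj_pts[sample], cam_pts[sample])
        err = np.linalg.norm(obj_pts @ R.T + t - cam_pts, axis=1)
        inliers = err < thresh
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        raise ValueError("RANSAC found no usable inlier set")
    # Final refit on all inliers of the best hypothesis.
    R, t = kabsch(obj_pts[best], cam_pts[best])
    return R, t, best

# Synthetic stand-in for network output: 100 correspondences, 30 of them wrong.
rng = np.random.default_rng(1)
obj_pts = rng.uniform(-0.1, 0.1, size=(100, 3))
a = np.deg2rad(30)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 0.8])
cam_pts = obj_pts @ R_true.T + t_true
cam_pts[:30] += rng.uniform(-0.2, 0.2, size=(30, 3))

R_est, t_est, inliers = ransac_pose(obj_pts, cam_pts)
```

Fewer outliers going into this step means fewer iterations are wasted on bad samples, which is one way to read the paper's point about accurate segmentation reducing outliers.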

  • Learning a Local Feature Descriptor for 3D LiDAR Scans

Robust data association is necessary for virtually every SLAM system and finding corresponding points is typically a preprocessing step for scan alignment algorithms. Traditionally, handcrafted feature descriptors were used for these problems but recently learned descriptors have been shown to perform more robustly. In this work, the authors propose a local feature descriptor for 3D LiDAR scans. The descriptor is learned using a Convolutional Neural Network (CNN).
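
Once each keypoint has a descriptor, finding corresponding points reduces to a nearest-neighbor search in descriptor space. The sketch below uses a mutual nearest-neighbor check, a common generic strategy and not necessarily what the authors use; the random 16-dimensional vectors are only stand-ins for the output of a learned descriptor network.

```python
import numpy as np

def match_descriptors(desc_a, desc_b):
    """Return index pairs (i, j) that are mutual nearest neighbours."""
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    nn_ab = dists.argmin(axis=1)  # best match in b for every a
    nn_ba = dists.argmin(axis=0)  # best match in a for every b
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]

# Scan B contains the same five keypoints as scan A, shuffled and slightly noisy.
rng = np.random.default_rng(0)
desc_a = rng.normal(size=(5, 16))
perm = np.array([2, 0, 4, 1, 3])
desc_b = desc_a[perm] + 0.01 * rng.normal(size=(5, 16))

matches = match_descriptors(desc_a, desc_b)
print(matches)
```

The resulting index pairs are what a scan alignment algorithm would take as its input correspondences.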

  • SOS: Stereo Matching in O(1) with Slanted Support Windows

Despite recent advances in depth from stereo, algorithms usually trade off accuracy for speed. In particular, efficient methods rely on fronto-parallel assumptions to reduce the search space and keep computation low. This research presents SOS (Slanted O(1) Stereo), the first algorithm capable of leveraging slanted support windows without sacrificing speed or accuracy. The authors use an active stereo configuration, where an illuminator textures the scene.
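
To see why the fronto-parallel assumption hurts on tilted surfaces, consider the toy one-dimensional example below. It is our own illustration and has nothing to do with how SOS is implemented: the texture function, the disparity plane d(x) = 0.2x + 4, and the window size are all made up. A fronto-parallel window assumes a single disparity for the whole window, while a slanted window lets the disparity change linearly across it.

```python
import math

def texture(u):
    """Synthetic continuous intensity profile of the right image."""
    return math.sin(0.9 * u) + 0.5 * math.sin(2.3 * u + 1.0)

def true_disparity(x):
    """Disparity of a slanted surface: it changes linearly with x."""
    return 0.2 * x + 4.0

def left_image(x):
    # Left pixel x sees the same scene point as right pixel x - d(x).
    return texture(x - true_disparity(x))

def window_cost(center, half_width, slope, offset):
    """Sum of absolute differences for the hypothesis d(x) = slope * x + offset."""
    cost = 0.0
    for x in range(center - half_width, center + half_width + 1):
        d = slope * x + offset
        cost += abs(left_image(x) - texture(x - d))
    return cost

center, half_width = 20, 5
# Fronto-parallel: one constant disparity, correct only at the window center.
fronto_cost = window_cost(center, half_width, 0.0, true_disparity(center))
# Slanted: the window follows the true disparity plane.
slanted_cost = window_cost(center, half_width, 0.2, 4.0)
print(fronto_cost, slanted_cost)
```

Even with the correct disparity at the center pixel, the fronto-parallel window accumulates matching error toward its edges, whereas the slanted window matches the whole patch.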

  • Fully Convolutional Grasp Detection Network with Oriented Anchor Box

This research presents a real-time approach to predicting multiple grasping poses for a parallel-plate robotic gripper from RGB images. A model with an oriented anchor box mechanism is proposed, and a new matching strategy is used during training. The authors employ an end-to-end fully convolutional neural network that consists of two parts: a feature extractor and a multi-grasp predictor. The feature extractor is a deep convolutional neural network. The multi-grasp predictor regresses grasp rectangles from predefined oriented rectangles, called oriented anchor boxes, and classifies the rectangles as graspable or ungraspable.
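
The idea of oriented anchor boxes can be sketched as below: every cell of the feature map gets one predefined rectangle per orientation, and a ground-truth grasp is assigned to the anchors that sit in the same cell with a similar angle. The stride, box size, number of angles, and the matching rule are assumptions we picked for illustration; the paper's actual settings and matching strategy may differ.

```python
import math

def oriented_anchors(feat_h, feat_w, stride, box_w, box_h, num_angles):
    """One anchor per grid cell and orientation: (cx, cy, w, h, theta)."""
    anchors = []
    for row in range(feat_h):
        for col in range(feat_w):
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for k in range(num_angles):
                # A grasp rectangle is unchanged by a 180 degree rotation,
                # so the angles only need to cover [-90, 90) degrees.
                theta = -math.pi / 2 + k * math.pi / num_angles
                anchors.append((cx, cy, box_w, box_h, theta))
    return anchors

def angle_diff(a, b):
    """Smallest difference between two orientations, modulo 180 degrees."""
    d = (a - b) % math.pi
    return min(d, math.pi - d)

def match_anchors(anchors, gt, stride, max_angle_diff):
    """Hypothetical rule: same cell as the ground truth and a similar angle."""
    gx, gy, _, _, gtheta = gt
    matched = []
    for i, (cx, cy, _, _, theta) in enumerate(anchors):
        same_cell = abs(gx - cx) <= stride / 2 and abs(gy - cy) <= stride / 2
        if same_cell and angle_diff(theta, gtheta) < max_angle_diff:
            matched.append(i)
    return matched

anchors = oriented_anchors(feat_h=10, feat_w=10, stride=32,
                           box_w=54, box_h=54, num_angles=6)
gt = (100.0, 50.0, 60.0, 30.0, math.radians(20))  # a grasp tilted by 20 degrees
matched = match_anchors(anchors, gt, stride=32, max_angle_diff=math.radians(15))
print(len(anchors), matched)
```

With six orientations per cell, the 20 degree grasp is matched to the 30 degree anchor in its cell, so the network only has to regress a small angular offset instead of the full rotation.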

  • Keyframe-based Photometric Online Calibration and Color Correction

Finding the parameters of a vignetting function for a camera currently involves the acquisition of several images in a given scene under very controlled lighting conditions, a cumbersome and error-prone task where the end result can only be confirmed visually. Many computer vision algorithms assume photoconsistency, the constant intensity between scene points in different images, and tend to perform poorly if this assumption is violated. They present a real-time online vignetting and response calibration with additional exposure estimation for global-shutter color cameras. Their method does not require uniformly illuminated surfaces, known texture or specific geometry. The only assumptions are that the camera is moving, the illumination is static and reflections are Lambertian.
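
For readers unfamiliar with vignetting, the sketch below shows what a vignetting function does once its parameters are known. We assume a radial polynomial V(r) = 1 + v1 r^2 + v2 r^4 + v3 r^6, a model commonly used in the photometric calibration literature; whether the authors use exactly this form is our assumption, and the coefficients are invented. The camera response is taken to be linear to keep the example short.

```python
import math

def vignette_factor(r, coeffs):
    """Radial attenuation V(r) for a radius r normalised to [0, 1]."""
    v1, v2, v3 = coeffs
    r2 = r * r
    return 1.0 + v1 * r2 + v2 * r2 ** 2 + v3 * r2 ** 3

def normalised_radius(x, y, width, height):
    """Distance from the image center, scaled so the corners have r = 1."""
    cx, cy = (width - 1) / 2.0, (height - 1) / 2.0
    return math.hypot(x - cx, y - cy) / math.hypot(cx, cy)

def correct_pixel(intensity, x, y, width, height, coeffs):
    """Undo vignetting for one pixel, assuming a linear camera response."""
    r = normalised_radius(x, y, width, height)
    return intensity / vignette_factor(r, coeffs)

coeffs = (-0.3, 0.05, -0.02)  # hypothetical calibration result
width, height = 640, 480

# A scene point of true brightness 100 observed in the image corner.
true_value = 100.0
observed = true_value * vignette_factor(normalised_radius(0, 0, width, height), coeffs)
corrected = correct_pixel(observed, 0, 0, width, height, coeffs)
print(round(observed, 2), round(corrected, 2))
```

The same scene point appears darker in the corner than in the center, which is exactly the kind of photoconsistency violation that the calibration is meant to remove.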

  • Hybrid Bayesian Eigenobjects: Combining Linear Subspace and Deep Network Methods for 3D Robot Vision

They introduce Hybrid Bayesian Eigenobjects (HBEOs), a novel representation for 3D objects designed to allow a robot to jointly estimate the pose, class, and full 3D geometry of a novel object observed from a single viewpoint in a single practical framework. By combining both linear subspace methods and deep convolutional prediction, HBEOs efficiently learn nonlinear object representations without directly regressing into high-dimensional space.
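
The linear subspace half of this idea can be illustrated with plain PCA: flattened 3D shapes are projected onto a few principal directions, so that a whole object is described by a handful of numbers instead of hundreds of voxels. The sketch below is our own simplification with synthetic data and hypothetical function names. It does not include the Bayesian or deep network parts of HBEOs, where, as we understand it, a network predicts the low-dimensional code from a single view.

```python
import numpy as np

def fit_subspace(shapes, num_components):
    """Learn a mean shape and an orthonormal basis from flattened shapes."""
    mean = shapes.mean(axis=0)
    _, _, Vt = np.linalg.svd(shapes - mean, full_matrices=False)
    return mean, Vt[:num_components]

def encode(shape, mean, basis):
    """Project a flattened shape onto the subspace."""
    return basis @ (shape - mean)

def decode(code, mean, basis):
    """Reconstruct the full shape from its low-dimensional code."""
    return mean + basis.T @ code

# Synthetic "objects": 8 x 8 x 8 grids (512 values) driven by 3 hidden factors.
rng = np.random.default_rng(0)
factors = rng.normal(size=(3, 512))
train = rng.normal(size=(20, 3)) @ factors + 0.5

mean, basis = fit_subspace(train, num_components=3)

new_shape = np.array([0.3, -1.2, 0.7]) @ factors + 0.5
code = encode(new_shape, mean, basis)
recon = decode(code, mean, basis)
print(code.shape, float(np.abs(recon - new_shape).max()))
```

Predicting three numbers is a much easier regression target than predicting 512 voxels directly, which is the efficiency argument made in the paper.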

Photo (Taka talking to Robot)
