Paper Link

Summary of Contributions

This paper presents a new method for active learning for scene completion, with the following primary contributions:

  1. The paper formulates an unsupervised learning objective in which the agent must acquire a small number of observations in order to predict the object or scene of interest from all possible viewpoints (see the sketch after this list).

  2. The paper presents a reinforcement learning solution for efficient exploration that outperforms previous baselines.

  3. The exploratory policy learned by their model transfers to unknown tasks and environments.
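A hedged sketch of the objective in item 1, written in my own notation rather than the paper's: let $V$ be the viewpoint grid, $x(\theta_i)$ the ground-truth view from viewpoint $\theta_i$, and $\hat{x}(\theta_i \mid o_1,\dots,o_T)$ the decoder's prediction of that view after the policy $\pi$ has selected $T$ observations. The objective is then roughly

$$
\min_{\pi}\; \mathbb{E}_{\pi}\left[\frac{1}{|V|}\sum_{i \in V} \big\lVert \hat{x}(\theta_i \mid o_1,\dots,o_T) - x(\theta_i) \big\rVert_2^2\right].
$$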

Detailed Comments

In this paper the authors present a new approach for actively looking around with as few observations as possible in order to gain an understanding of the 3D world. The problem is formulated as follows: at every timestep t, the agent is free to choose a viewpoint. The agent proposes an action at each timestep that moves the viewpoint, and it is rewarded at the end of the episode; the maximum horizon is T. At the end of the episode, the agent uses all the images collected from its chosen viewpoints to predict how the object would look from all other viewpoints, which form the viewpoint grid. The reward for the agent is based on the MSE between the images predicted for all viewpoints and the ground-truth target images.

The agent is composed of a recurrent neural network that takes as input the 2D image from the current viewpoint and the agent's azimuth location (proprioceptive information) and maintains a hidden state. This hidden state is provided as input to a decoder module that outputs the images from all possible viewpoints; the decoder is incentivized to produce images with lower MSE at every timestep. The same hidden state from the RNN is passed to a reinforcement learning agent that predicts the action at the current state, and the authors choose REINFORCE for training the RL agent.

The authors also extend this method to the setting of unsupervised policy transfer, where a policy trained without supervision is transferred to target tasks it has never seen.
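To make the architecture and training loop concrete, here is a minimal sketch in PyTorch. All module choices, dimensions, the viewpoint-grid size, and the `env` interface are my own illustrative assumptions, not the paper's actual implementation or hyperparameters; the paper's exact reward shaping may also differ from the simple negative-MSE reward used here.

```python
# Minimal sketch of the recurrent look-around agent described above (assumptions, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

N_VIEWS = 32        # assumed size of the viewpoint grid to reconstruct
IMG_DIM = 32 * 32   # assumed flattened image size per view
PROPRIO_DIM = 2     # assumed proprioceptive input (e.g. sin/cos of azimuth)
HIDDEN = 256
N_ACTIONS = 9       # assumed number of discrete camera motions

class LookAroundAgent(nn.Module):
    def __init__(self):
        super().__init__()
        # Encodes the current view plus proprioception into the RNN input.
        self.encoder = nn.Linear(IMG_DIM + PROPRIO_DIM, HIDDEN)
        # Recurrent core that aggregates observations over the episode.
        self.rnn = nn.GRUCell(HIDDEN, HIDDEN)
        # Decoder: predicts the full viewpoint grid from the hidden state.
        self.decoder = nn.Linear(HIDDEN, N_VIEWS * IMG_DIM)
        # Policy head: scores the next camera motion from the same hidden state.
        self.policy = nn.Linear(HIDDEN, N_ACTIONS)

    def step(self, img, proprio, h):
        z = F.relu(self.encoder(torch.cat([img, proprio], dim=-1)))
        h = self.rnn(z, h)
        recon = self.decoder(h).view(-1, N_VIEWS, IMG_DIM)
        logits = self.policy(h)
        return recon, logits, h

def train_episode(agent, env, optimizer, T=4):
    """One REINFORCE update on a single episode.

    `env` is a hypothetical interface: env.reset() -> (img, proprio),
    env.step(a) -> (img, proprio), env.all_views() -> ground-truth grid.
    """
    img, proprio = env.reset()
    h = torch.zeros(1, HIDDEN)
    log_probs, recon_losses = [], []
    target = env.all_views()                      # shape (1, N_VIEWS, IMG_DIM)

    for t in range(T):
        recon, logits, h = agent.step(img, proprio, h)
        # Per-timestep reconstruction loss on the full viewpoint grid.
        recon_losses.append(F.mse_loss(recon, target))
        dist = torch.distributions.Categorical(logits=logits)
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        img, proprio = env.step(action.item())

    # Episode-end reward: negative final reconstruction error (detached so it
    # only scales the policy-gradient term).
    reward = -recon_losses[-1].detach()
    policy_loss = -(reward * torch.stack(log_probs)).sum()
    recon_loss = torch.stack(recon_losses).mean()

    optimizer.zero_grad()
    (policy_loss + recon_loss).backward()
    optimizer.step()
```

The sketch reflects the two supervision signals described above: the decoder receives a reconstruction loss at every timestep, while the policy head is trained with REINFORCE using only the end-of-episode reward.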

This method is tested on the SUN360 and ModelNet datasets. On SUN360 the agent aims to complete an omnidirectional scene, whereas on ModelNet the agent manipulates an object to complete an image-based model of its shape. The authors compare against self-designed baselines that use a single view, random views, views that are farthest apart, or the most salient views. Their method outperforms these baselines by a significant margin, with larger improvements on the more difficult dataset. In the unsupervised policy transfer setting, their method performs competitively with the Lookahead active RNN approach, which is trained end to end, while beating the other baselines; this is surprising given that their method is trained without supervision for a separate task. The experiments provide a thorough analysis of the method, but I encourage the authors to provide more discussion of the choice of reward function and an ablation showing how the per-timestep loss helps. Overall, this paper is clear to read and presents an interesting analysis of active learning in the setting of object and scene completion.