This paper presents a new approach to unsupervised learning of controllable keypoints, called Transporter, with the following contributions:
The authors present a new method that learns keypoints robust to varying number, size, and motion of objects in a completely unsupervised fashion.
They demonstrate that the learned keypoints can significantly boost the sample efficiency of RL by providing task-relevant features.
They also show that, since the learned keypoints are controllable, an agent can explore in the space of keypoints, leading to much more efficient action exploration.
In this work, the authors present a new approach to learning keypoints in a completely unsupervised way, called "Transporter". Transporter takes as input a pair of images that differ in object position, for example frames obtained as a result of applied control. The network has two heads: a keypoint head that predicts a fixed set of keypoints, and a feature head that produces convolutional feature maps. The training objective works in three steps: (a) in the source feature map, the regions around the keypoints of both the source and the target image are suppressed by setting them to zero; (b) features from the target image are added at the locations of the target keypoints; (c) the resulting feature map is decoded and regressed to the target image with an MSE reconstruction loss. The intuition is that the keypoints must be accurate and provide sufficient (unordered) correspondence so that the source image can be transformed into the target image by removing source features located at any of the keypoints and adding target features at the target keypoints. The idea is very intuitive and I appreciate the simplicity of the method.
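For concreteness, the transport step as I understand it can be sketched roughly as follows. This is a PyTorch-style sketch with illustrative names (`phi_s`, `phi_t` for feature maps, `hm_s`, `hm_t` for Gaussian keypoint heatmaps), not the authors' code.

```python
import torch

def transport(phi_s: torch.Tensor, phi_t: torch.Tensor,
              hm_s: torch.Tensor, hm_t: torch.Tensor) -> torch.Tensor:
    """Suppress source features around source and target keypoints,
    then paste target features at the target keypoint locations.

    phi_s, phi_t: feature maps of shape (B, C, H, W)
    hm_s, hm_t:   keypoint heatmaps of shape (B, K, H, W)
    """
    # Collapse the K per-keypoint heatmaps into a single (B, 1, H, W) mask each.
    mask_s = hm_s.sum(dim=1, keepdim=True)
    mask_t = hm_t.sum(dim=1, keepdim=True)
    # (a) Zero out source features around both sets of keypoints.
    suppressed = phi_s * (1.0 - mask_s) * (1.0 - mask_t)
    # (b) Add target features at the target keypoint locations.
    return suppressed + mask_t * phi_t

# (c) The transported feature map would then be decoded to an image and
# regressed against the target frame, e.g.:
#   loss = ((decoder(transport(phi_s, phi_t, hm_s, hm_t)) - img_t) ** 2).mean()
```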
The extracted keypoints, rendered as a heatmap, can be provided as input to an RL algorithm; the Transporter network is pretrained using random exploration and its weights are kept frozen during RL training. The authors also discuss another application of the keypoints: by designing intrinsic motivation rewards in the form of the displacement of each keypoint, they obtain agents that explore in the space of keypoints. Concretely, they train K x 4 controllers, one for each keypoint and each of 4 possible movement directions. Exploration then follows a particular controller for a number of timesteps before switching to another, yielding temporally extended exploration.
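A minimal sketch of this exploration scheme, under my reading of the paper, is given below. `detect_keypoints` stands in for the frozen pretrained keypoint head, `policies` maps a (keypoint, direction) pair to a hypothetical controller with `act`/`update` methods, and a gym-style environment API is assumed; none of these names come from the paper.

```python
import random
import numpy as np

# Unit displacement vectors in (y, x) for the 4 movement directions.
DIRECTIONS = np.array([[0, 1], [0, -1], [1, 0], [-1, 0]])

def intrinsic_reward(kps_prev: np.ndarray, kps_next: np.ndarray,
                     k: int, d: int) -> float:
    """Reward the (k, d) controller for displacing keypoint k along direction d."""
    delta = kps_next[k] - kps_prev[k]        # change in keypoint k's coordinates
    return float(delta @ DIRECTIONS[d])      # projection onto the chosen direction

def explore(env, policies, detect_keypoints, switch_every=20, total_steps=1000):
    """Follow one of the K*4 controllers for a while, then switch to another."""
    obs = env.reset()
    k, d = random.choice(list(policies.keys()))
    for t in range(total_steps):
        if t % switch_every == 0:
            k, d = random.choice(list(policies.keys()))   # new (keypoint, direction)
        kps_prev = detect_keypoints(obs)
        obs, _, done, _ = env.step(policies[(k, d)].act(obs))
        r_int = intrinsic_reward(kps_prev, detect_keypoints(obs), k, d)
        policies[(k, d)].update(r_int)                    # hypothetical learner update
        if done:
            obs = env.reset()
```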
The authors demonstrate that the keypoints learned by the Transporter network are effective for long-term tracking. They compare precision and recall against baselines that encode a keypoint bottleneck and show that Transporter outperforms them. They also demonstrate the utility of the learned keypoints for sample-efficient RL: by pretraining Transporter and then training an RL agent on Atari environments, they beat state-of-the-art model-based and model-free algorithms by a significant margin. Another experiment demonstrates that controlling keypoints leads to deep exploration behaviour on Atari compared to a random-exploration baseline. I encourage the authors to add further RL baselines that combine representation learning with RL, as opposed to only pure RL baselines. Additionally, I suggest adding exploration baselines such as Planning to Explore and pseudo-count-based exploration for their exploration agent in keypoint space. Overall, this paper presents a very intuitive idea in a clear way and has a number of supporting experiments to validate it.