Summary of Contributions

This paper extends PointNet for 3D data processing, with the following contributions:

The paper presents a new architecture that extends PointNet to address its shortcomings by providing a hierarchical structure that reasons over larger receptive fields and improves generalizability. The authors partition the points into local neighborhoods, then process each neighborhood to produce features for the next level, somewhat akin to a CNN.
The authors demonstrate the utility of their method on point clouds with spatially varying density. Handling such data is an important shortcoming of previous work, and the phenomenon occurs frequently in practice.
PointNet++ is shown to achieve state-of-the-art results on classification and segmentation tasks.

Detailed Comments

This paper presents an extension of PointNet, which formulated a way to represent universal set functions that are invariant to permutation. PointNet provides an efficient way to learn functions on sets, which the authors leverage in this work. PointNet++ is a hierarchical structure with three steps at each level: a. A number of centroids are chosen from the point set so that they cover the entire set; the authors use a heuristic, iterative farthest point sampling, for this. b. The points in local overlapping regions are grouped around their respective centroids, yielding a number of local point sets. c. Each local point set is passed to a PointNet learner that is shared across the layer, producing a compact feature description of that set.
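The three steps above can be sketched in a few lines of plain Python. This is only an illustration of the review's description, not the authors' implementation: the function names are made up, grouping by a fixed radius is an assumption (the text only says "local overlapping regions"), and a coordinate-wise max over centroid-relative offsets stands in for the learned, shared PointNet.

```python
import math

def farthest_point_sampling(points, k, start=0):
    """Pick k indices; each new centroid is the point farthest from those already chosen."""
    chosen = [start]
    # min_dist[i] = distance from point i to its nearest chosen centroid so far
    min_dist = [math.dist(p, points[start]) for p in points]
    while len(chosen) < k:
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        chosen.append(nxt)
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return chosen

def group_by_radius(points, centroid_ids, radius):
    """Assumed grouping rule: all points within `radius` of a centroid (groups may overlap)."""
    return [
        [i for i, p in enumerate(points) if math.dist(p, points[c]) <= radius]
        for c in centroid_ids
    ]

def max_pool_feature(points, group, centroid):
    """Stand-in for the shared PointNet: coordinate-wise max of centroid-relative offsets."""
    offsets = [[a - b for a, b in zip(points[i], centroid)] for i in group]
    return [max(col) for col in zip(*offsets)]

def set_abstraction(points, k, radius):
    """One level: sample centroids, group neighbors, summarize each group."""
    centroid_ids = farthest_point_sampling(points, k)
    groups = group_by_radius(points, centroid_ids, radius)
    feats = [max_pool_feature(points, g, points[c])
             for g, c in zip(groups, centroid_ids)]
    return centroid_ids, groups, feats
```

Because the summary is a max over the group, reordering the points inside a group leaves the feature unchanged, which is the permutation invariance the review attributes to PointNet. Stacking levels would mean feeding the centroids and their features into another call with a larger radius.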

The three steps above downsample the points at each layer and learn a succinct feature representation with an increasing receptive field in deeper layers. An outstanding problem with this architecture is the non-uniform density of point clouds obtained by sensors: some regions can have very low density, and the features learned there can be corrupted. The authors present two solutions: a. MSG (multi-scale grouping), where they concatenate features at multiple resolutions for each local set, although this can be slow as the number of points increases. b. MRG (multi-resolution grouping), where they concatenate two feature vectors: one obtained via PointNet on the local regions and one obtained by concatenating raw features from nearby local sets. These modifications are claimed to make the network robust to non-uniform sampling density. It would be instructive if the authors included an ablation or analysis explaining why the hierarchical structure of PointNet++ cannot by itself learn to be robust to non-uniform density (since features are grouped anyway) and why MSG/MRG is needed.
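As a rough illustration of the MSG idea as described here, the sketch below summarizes the neighborhood of one centroid at several radii and concatenates the results. The names `pooled_feature` and `msg_feature` are hypothetical, and the max-offset summary is again only a stand-in for a learned per-scale PointNet.

```python
import math

def pooled_feature(points, centroid, radius):
    """Hypothetical per-scale summary: max offset per axis among points within `radius`."""
    nbrs = [p for p in points if math.dist(p, centroid) <= radius]
    if not nbrs:
        # No neighbors at this scale: fall back to a zero vector
        return [0.0] * len(centroid)
    return [max(p[d] - centroid[d] for p in nbrs) for d in range(len(centroid))]

def msg_feature(points, centroid, radii):
    """Concatenate the per-scale summaries in the order the radii are given."""
    feat = []
    for r in radii:
        feat.extend(pooled_feature(points, centroid, r))
    return feat
```

In a sparse region the small-radius summary may see little beyond the centroid itself, while the larger radius still captures some context. The cost grows with the number of scales, since every centroid's neighborhood is searched once per radius, which is consistent with the remark that MSG can be slow.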

PointNet++ is evaluated on 4 datasets and achieves competitive or state-of-the-art results on all of them, and the baselines are reasonably diverse. In the classification setting, the authors provide an experiment showing the effect on performance when points are downsampled. It is somewhat unclear from Fig. 4 whether MSG/MRG provide any benefit, since dropout (DP) seems to be crucial in improving robustness with respect to point density. In the segmentation setting, PointNet++ outperforms the baselines by a large margin. However, for the experiments on robustness to point density, I would recommend that the authors also include PointNet+DP, PointNet++ with DP without MSG, and PointNet++ with DP without MRG as baselines, to disambiguate the effect of dropout from that of multi-scale learning on robustness to point density. The authors further show an experiment on a non-Euclidean metric space, where a notion of geodesic distance is needed to infer correctly. PointNet++ can easily accommodate different metric spaces and achieves state-of-the-art results. I appreciate the authors showing the feature representations visually to provide more insight. Overall, I think this is good work that presents a well-reasoned architecture for 3D point cloud processing and produces state-of-the-art results.
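For readers unfamiliar with the DP component discussed above, here is a minimal sketch. It assumes DP denotes random input dropout applied to each training point set, with a drop ratio drawn uniformly from [0, max_ratio]. The function name, the default of 0.95, and the keep-at-least-one-point fallback are assumptions for illustration, not details taken from this review.

```python
import random

def random_input_dropout(points, max_ratio=0.95, rng=random):
    """Drop each point independently with probability theta ~ Uniform(0, max_ratio)."""
    if not points:
        return []
    theta = rng.uniform(0.0, max_ratio)
    kept = [p for p in points if rng.random() >= theta]
    # Assumed safeguard: never return an empty set, keep the first point instead
    return kept if kept else [points[0]]
```

Applying the same function to the inputs of plain PointNet and of PointNet++ with and without MSG/MRG is the kind of controlled comparison the recommended baselines would give, since the only difference between runs would then be the grouping strategy.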