1 School of Computer Science and Engineering, Sun Yat-sen University, China
2 Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
† corresponding author
For 6-DoF grasp detection, simulated data can be scaled up to train more powerful models, but it faces the challenge of a large gap between simulation and the real world. Previous works bridge this gap in a sim-to-real manner. However, this manner explicitly or implicitly forces the simulated data to adapt to the noisy real data when training grasp detectors, where the positional drift and structural distortion caused by camera noise harm grasp learning. In this work, we propose a Real-to-Sim framework for 6-DoF grasp detection, named R2SGrasp, with the key insight of bridging this gap in a real-to-sim manner, which directly bypasses the camera noise during grasp detector training through an inference-time real-to-sim adaptation. To achieve this adaptation, R2SGrasp designs a Real-to-Sim Data Repairer (R2SRepairer) to mitigate the camera noise of real depth maps at the data level, and a Real-to-Sim Feature Enhancer (R2SEnhancer) to enhance real features with precise simulated geometric primitives at the feature level. To endow our framework with generalization ability, we cost-efficiently construct a large-scale simulated dataset to train our grasp detector, comprising 64,000 RGB-D images with 14.4 million grasp annotations. Extensive experiments show that R2SGrasp is powerful and that our real-to-sim perspective is effective. To our knowledge, we are the first to use simulated data to surpass methods trained with real-world data in 6-DoF grasping. Real-world experiments further demonstrate the strong generalization ability of R2SGrasp.
The real-to-sim manner makes it possible to train a robust grasp detector on noiseless simulated data, avoiding the interference of noise in grasp skill learning and thereby achieving stronger grasping capabilities. To transfer the precise grasping ability learned in simulation to the real world, the R2SGrasp framework, comprising a Real-to-Sim Data Repairer (R2SRepairer) and a Real-to-Sim Feature Enhancer (R2SEnhancer), is designed to achieve real-to-sim adaptation at the data level and the feature level, respectively.
In the inference phase, R2SRepairer first repairs the depth map of the RGB-D input; a feature extractor then extracts local features from the single-view point cloud lifted from the repaired depth map. R2SEnhancer subsequently enhances these real features with the stored simulated structural features, and the grasp detector finally predicts the grasp poses. In the training phase, we train R2SRepairer on twin datasets and train the grasp detector with R2SEnhancer on our R2Sim dataset.
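To make this inference flow concrete, below is a minimal Python sketch of the described pipeline. The module interfaces (R2SRepairer, FeatureExtractor, R2SEnhancer, the dummy grasp head) and the pinhole back-projection are illustrative placeholders assumed for the sketch, not the released implementation; it only shows where the data-level and feature-level real-to-sim adaptations sit.

```python
# Sketch of the inference flow, assuming hypothetical module interfaces.
import numpy as np


class R2SRepairer:
    """Data-level adaptation: repairs the noisy real depth map (placeholder)."""
    def repair(self, depth: np.ndarray) -> np.ndarray:
        # A real repairer would correct positional drift and structural
        # distortion; this placeholder returns the input unchanged.
        return depth


class FeatureExtractor:
    """Extracts per-point local features from a single-view point cloud (placeholder)."""
    def __call__(self, points: np.ndarray) -> np.ndarray:
        return np.zeros((points.shape[0], 64), dtype=np.float32)  # dummy features


class R2SEnhancer:
    """Feature-level adaptation: fuses real features with stored simulated
    geometric-primitive features (placeholder)."""
    def __init__(self, sim_primitive_feats: np.ndarray):
        self.sim_primitive_feats = sim_primitive_feats

    def enhance(self, real_feats: np.ndarray) -> np.ndarray:
        # A real enhancer would combine real features with the stored simulated
        # primitives; this placeholder leaves the features unchanged.
        return real_feats


def backproject(depth: np.ndarray, fx, fy, cx, cy) -> np.ndarray:
    """Lift a depth map to a single-view point cloud with pinhole intrinsics."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)


def infer_grasps(depth, intrinsics, repairer, extractor, enhancer, grasp_head):
    repaired = repairer.repair(depth)           # data-level real-to-sim adaptation
    cloud = backproject(repaired, *intrinsics)  # repaired depth -> point cloud
    feats = extractor(cloud)                    # local features of real scene
    feats = enhancer.enhance(feats)             # feature-level real-to-sim adaptation
    return grasp_head(cloud, feats)             # predict 6-DoF grasp poses


if __name__ == "__main__":
    # Toy usage with a flat synthetic depth map and a dummy grasp head.
    depth = np.full((48, 64), 0.5, dtype=np.float32)
    intrinsics = (60.0, 60.0, 32.0, 24.0)              # fx, fy, cx, cy
    sim_feats = np.zeros((128, 64), dtype=np.float32)  # stored simulated primitives
    dummy_head = lambda cloud, feats: np.zeros((10, 8))  # 10 placeholder grasp poses
    grasps = infer_grasps(depth, intrinsics, R2SRepairer(), FeatureExtractor(),
                          R2SEnhancer(sim_feats), dummy_head)
    print(grasps.shape)
```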
If you have any questions, please feel free to contact us: