Mamba Policy: Towards Efficient 3D Diffusion Policy with
Hybrid Selective State Models

1The Hong Kong University of Science and Technology (Guangzhou)
2The Hong Kong University of Science and Technology
3Beijing Innovation Center of Humanoid Robotics 4Center for X-Mechanics, Zhejiang University

✶ indicates equal contribution

Abstract

Diffusion models have been widely employed in 3D manipulation due to their ability to efficiently learn data distributions, enabling precise prediction of action trajectories. However, diffusion models typically rely on UNet backbones with large parameter counts as policy networks, which can be challenging to deploy on resource-constrained devices. Recently, the Mamba model has emerged as a promising solution for efficient modeling, offering low computational complexity and strong performance in sequence modeling. In this work, we propose the Mamba Policy, a lighter but stronger policy that reduces the parameter count by over 80% compared to the original policy network while achieving superior performance. Specifically, we introduce the XMamba Block, which effectively integrates input information with conditional features and leverages a combination of Mamba and Attention mechanisms for deep feature extraction. Extensive experiments demonstrate that the Mamba Policy excels on the Adroit, DexArt, and MetaWorld benchmarks while requiring significantly fewer computational resources. Additionally, we highlight the Mamba Policy's enhanced robustness in long-horizon scenarios compared to baseline methods and explore the performance of various Mamba variants within the Mamba Policy framework.
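
The XMamba Block is described above only at a high level. Below is a minimal, hypothetical sketch of how conditional features could be fused with the input and passed through a Mamba + attention stack, assuming PyTorch and the mamba_ssm package. The module name, the FiLM-style conditioning, and all dimensions are illustrative assumptions, not the authors' released implementation.

# Hypothetical sketch of an XMamba-style block (not the official implementation).
# Assumes: pip install torch mamba-ssm
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # selective state space model layer


class XMambaBlock(nn.Module):
    """Fuses conditional features into the action sequence, then applies Mamba + attention."""

    def __init__(self, d_model: int, d_cond: int, n_heads: int = 4):
        super().__init__()
        # FiLM-style conditioning (assumption): scale/shift predicted from the condition.
        self.cond_proj = nn.Linear(d_cond, 2 * d_model)
        self.norm1 = nn.LayerNorm(d_model)
        self.mamba = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.norm2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # x: (B, T, d_model) noised action features; cond: (B, d_cond) fused condition.
        scale, shift = self.cond_proj(cond).unsqueeze(1).chunk(2, dim=-1)
        h = self.norm1(x) * (1 + scale) + shift          # inject conditional features
        h = x + self.mamba(h)                            # selective SSM for sequence mixing
        a, _ = self.attn(self.norm2(h), self.norm2(h), self.norm2(h))
        return h + a                                     # residual attention refinement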

Model



Overview of Mamba Policy. Our proposed model takes the noised action and the condition as inputs; the condition is composed of three parts: a point cloud perception embedding, a robot state embedding, and a time embedding, each processed by its respective encoder. The X-Mamba UNet then processes these inputs and returns the predicted noise. During training, the model is updated with an MSE loss against the label noise. For validation, the model uses DDIM to reconstruct the original action, which is then used to interact with the environment and execute the different tasks.
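
To make the pipeline in the caption concrete, here is a minimal sketch of the denoising training step and the DDIM rollout, assuming a PyTorch policy network model(noisy_action, t, cond) that predicts noise and the diffusers DDIMScheduler. The network interface, timestep counts, and shapes are assumptions for illustration, not the released code.

# Minimal sketch of diffusion-policy training and DDIM inference (illustrative only).
# Assumes: pip install torch diffusers; `model(noisy_action, t, cond)` predicts noise.
import torch
from diffusers import DDIMScheduler

scheduler = DDIMScheduler(num_train_timesteps=100)

def training_step(model, action, cond):
    # action: (B, T, action_dim) expert trajectory; cond: fused observation embedding.
    noise = torch.randn_like(action)
    t = torch.randint(0, scheduler.config.num_train_timesteps,
                      (action.shape[0],), device=action.device)
    noisy_action = scheduler.add_noise(action, noise, t)    # forward diffusion
    pred_noise = model(noisy_action, t, cond)                # X-Mamba UNet prediction
    return torch.nn.functional.mse_loss(pred_noise, noise)  # MSE against label noise

@torch.no_grad()
def ddim_sample(model, cond, action_shape, num_inference_steps=10):
    # Start from Gaussian noise and iteratively denoise with DDIM.
    action = torch.randn(action_shape, device=cond.device)
    scheduler.set_timesteps(num_inference_steps)
    for t in scheduler.timesteps:
        t_batch = t.expand(action.shape[0]).to(cond.device)
        pred_noise = model(action, t_batch, cond)
        action = scheduler.step(pred_noise, t, action).prev_sample
    return action  # predicted action trajectory to execute in the environment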


Experiment Results


Efficiency Analysis


Efficiency Comparisons. Compared with DP3, our proposed Mamba Policy not only achieves superior results but also saves up to 90% of FLOPs and 80% of parameters.


Main Results


Quantitative comparisons with different baselines in simulation environments. We compare our Mamba Policy with IBC, BCRNN, 3D Diffusion Policy, and Diffusion Policy on the Adroit, MetaWorld, and DexArt benchmarks. † denotes our reproduced results for fair comparison. Our proposed Mamba Policy achieves superior results across all domains.

Ablations on SSM Variants


Mamba Policy with different SSM variants. We include Mamba, Mamba2, bidirectional SSM, and Hydra for comparison, where the V1-based and Hydra-based policies achieve the strongest performance.



Demo

Robotic Manipulation
We perform our experiments on the Adroit, MetaWorld, and DexArt benchmarks. The tasks range from simple pick and push skills to more complex dexterous manipulation, demonstrating the model's effectiveness across a wide range of scenarios.

BibTeX

@article{cao2024mamba,
  title={Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models},
  author={Cao, Jiahang and Zhang, Qiang and Sun, Jingkai and Wang, Jiaxu and Cheng, Hao and Li, Yulin and Ma, Jun and Shao, Yecheng and Zhao, Wen and Han, Gang and others},
  journal={arXiv preprint arXiv:2409.07163},
  year={2024}
}