DM-VLP-Grasp: Diffusion Model-Based Grasp Planning with Visual-Language Pretraining for Unknown Object Manipulation
Abstract
This paper proposes DM-VLP-Grasp, an unknown-object grasping algorithm based on a diffusion model and visual-language pre-training, which aims to improve robotic grasping performance in complex environments. An improved visual-language pre-training model fuses image and text information to accurately extract object grasping features, and a diffusion model then generates reliable grasping strategies through iterative optimization, enabling efficient grasp execution. On a self-built dataset of 8,000 samples, DM-VLP-Grasp achieves a grasping success rate of 93.6% with a per-strategy generation time of 0.78 seconds, demonstrating high stability and computational efficiency. Grasping stability, measured by the root-mean-square (RMS) value of the object's shaking amplitude and by the fluctuation range of the grasping force, is likewise strong on both metrics. The experimental results verify the effectiveness and novelty of the algorithm on the unknown-object grasping task and provide a new solution for automated robotic grasping.
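The abstract does not detail the model architecture, but the two-stage pipeline it describes (vision-language feature fusion followed by diffusion-based iterative strategy generation) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the 7-D pose parameterization, the network shapes, the DDPM-style linear noise schedule, and all names (GraspDenoiser, sample_grasp, cond_dim) are hypothetical and are not the paper's actual implementation.

```python
import torch

# Minimal sketch of a conditional diffusion sampler for grasp poses.
# Assumptions (not from the paper): 7-D pose (position + quaternion),
# 512-D fused vision-language embedding, 50-step linear DDPM schedule.

T = 50                                      # reverse diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)       # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

class GraspDenoiser(torch.nn.Module):
    """Predicts the noise added to a grasp pose, conditioned on a fused
    vision-language embedding and the diffusion timestep."""
    def __init__(self, cond_dim=512, pose_dim=7, hidden=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(pose_dim + cond_dim + 1, hidden),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden, hidden),
            torch.nn.SiLU(),
            torch.nn.Linear(hidden, pose_dim),
        )

    def forward(self, pose, cond, t):
        t_feat = t.float().unsqueeze(-1) / T          # crude timestep encoding
        return self.net(torch.cat([pose, cond, t_feat], dim=-1))

@torch.no_grad()
def sample_grasp(model, cond):
    """DDPM-style reverse process: start from Gaussian noise and iteratively
    refine it into a grasp pose, conditioned on the fused features."""
    pose = torch.randn(cond.shape[0], 7)
    for t in reversed(range(T)):
        t_batch = torch.full((cond.shape[0],), t)
        eps = model(pose, cond, t_batch)              # predicted noise
        a, ab = alphas[t], alpha_bars[t]
        mean = (pose - (1 - a) / torch.sqrt(1 - ab) * eps) / torch.sqrt(a)
        noise = torch.randn_like(pose) if t > 0 else torch.zeros_like(pose)
        pose = mean + torch.sqrt(betas[t]) * noise
    return pose                                       # candidate grasp pose

# Usage: cond would come from the vision-language encoder (assumed 512-D).
model = GraspDenoiser()
fused_features = torch.randn(1, 512)  # placeholder for the fused embedding
grasp = sample_grasp(model, fused_features)
```

The loop in sample_grasp corresponds to the "iterative optimization" the abstract refers to: each of the T reverse steps refines the noisy pose toward a valid grasp, and this loop is where a per-strategy generation time such as the reported 0.78 seconds would be spent.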
DOI: https://doi.org/10.31449/inf.v49i29.9000