Task-Context-Aware Diffusion Policy with Language Guidance for Multi-task Disassembly

Apr 1, 2025

Jeon Ho Kang

Sagar Joshi

Neel Dhanaraj

Satyandra K. Gupta

Image credit: IEEE

Abstract

Diffusion-based policy learning frameworks excel at learning diverse tasks and achieving high success rates. However, in manufacturing settings, success rate alone is insufficient for real-world deployment: tasks must be executed efficiently, minimizing idle time while maintaining precision. In assembly and disassembly settings, a single scene often contains multiple task goals that need to be completed (such as picking up an engine while simultaneously securing a suspension), requiring the robot to reason over multiple objectives within the same observation space. In human-robot collaboration, enabling humans to specify task preferences is crucial for flexible and intuitive interaction. In this paper, we address two key challenges: (1) improving task execution efficiency by structuring tasks into distinct sub-task modes via language, and (2) enabling human operators to select tasks using natural language commands. We also introduce an adaptive parameter selection framework in which the parameters, and the sensory modalities the policy relies on, depend on the current sub-task mode. We evaluate our approach on the NIST Task Board, a representative benchmark of real-world tasks where multiple task goals exist within the same scene. Our method improves execution speed by 57% and shows a 19% improvement in task success rates. Demonstration videos are available on the project website.
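
As a rough illustration of the two ideas in the abstract (selecting a task from a natural language command, and choosing execution parameters and sensory modalities per sub-task mode), a minimal sketch might look like the following. Everything here is an assumption made for illustration: the mode names (`approach`, `contact`, `retreat`), the `ModeParams` fields and their values, the task names, and the simple keyword matching are hypothetical and are not taken from the paper, which may use a learned language encoder and different parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeParams:
    """Hypothetical execution parameters tied to one sub-task mode."""
    speed_scale: float      # fraction of the maximum end-effector speed
    action_horizon: int     # predicted actions executed before replanning
    modalities: tuple       # observation streams the policy conditions on


# Hypothetical sub-task modes: move fast with vision only in free space,
# slow down and add force feedback while in contact.
MODE_PARAMS = {
    "approach": ModeParams(speed_scale=1.0, action_horizon=16, modalities=("vision",)),
    "contact": ModeParams(speed_scale=0.3, action_horizon=4, modalities=("vision", "force")),
    "retreat": ModeParams(speed_scale=1.0, action_horizon=16, modalities=("vision",)),
}

# Hypothetical mapping from command keywords to task goals in one scene.
TASK_KEYWORDS = {
    "gear": "remove_gear",
    "connector": "unplug_connector",
}


def select_task(command: str) -> str:
    """Pick a task goal from an operator command by keyword matching."""
    text = command.lower()
    for keyword, task in TASK_KEYWORDS.items():
        if keyword in text:
            return task
    raise ValueError(f"No known task matches command: {command!r}")


def params_for(mode: str) -> ModeParams:
    """Look up the execution parameters for the given sub-task mode."""
    return MODE_PARAMS[mode]


if __name__ == "__main__":
    task = select_task("Please remove the gear first")
    for mode in ("approach", "contact", "retreat"):
        print(task, mode, params_for(mode))
```

In this sketch the lookup table stands in for whatever mechanism actually switches parameters; the point is only that the mode, rather than the raw observation, decides how fast the robot moves and which sensors it attends to.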

Type

Conference paper

Publication

2025 IEEE 20th International Conference on Automation Science and Engineering (CASE)

Last updated on Apr 8, 2025

Language-Guided Diffusion Policy

Authors

Jeon Ho Kang

Ph.D. Student in Robotics
