Task-Context-Aware Diffusion Policy with Language Guidance for Multi-task Disassembly

Apr 1, 2025 · Jeon Ho Kang, Sagar Joshi, Neel Dhanaraj, Satyandra K. Gupta
Image credit: IEEE
Abstract
Diffusion-based policy learning frameworks excel at learning diverse tasks and achieving high success rates. However, in manufacturing settings, success rate alone is insufficient for real-world deployment: tasks must be executed efficiently, minimizing idle time while maintaining precision. Additionally, in assembly and disassembly settings, a single scene often contains multiple task goals that need to be completed (such as picking up an engine while simultaneously securing a suspension), requiring the robot to reason over multiple objectives within the same observation space. In human-robot collaboration, enabling humans to specify task preferences is crucial for flexible and intuitive interaction. In this paper, we address two key challenges: (1) improving task execution efficiency by structuring tasks into distinct sub-task modes via language, and (2) enabling human operators to select tasks using natural language commands. Additionally, we introduce a framework that adaptively selects parameters and relies on different sensory modalities depending on these sub-task modes. We evaluate our approach on the NIST Task Board, a representative benchmark of real-world tasks where multiple task goals exist within the same scene. Our method improves execution speed by 57% and shows a 19% improvement in task success rate. Demonstration videos are available on the project website.
Publication
2025 IEEE 20th International Conference on Automation Science and Engineering (CASE)