Abstract
A key challenge for service robots is to achieve object grasping in everyday environments by leveraging the powerful generalization capabilities of foundation models while keeping their deployment efficient within robotic control systems. To address the varied application environments and limited hardware resources of household robots, we propose a Three-step Pipeline Grasping Framework (TPGF) for zero-shot object grasping. The framework follows the principle of "object perception - object point cloud extraction - grasping pose determination" and requires no training or fine-tuning. We integrate advanced foundation models into the Object Perception Module (OPM) to maximize zero-shot generalization, and we develop a novel Point Cloud Extraction Method (PCEM) based on Depth Information Suppression (DIS) to enable targeted grasping in complex scenes. Furthermore, to reduce hardware overhead and accelerate deployment, we introduce a Saturated Truncation strategy based on relative information entropy for high-precision quantization, yielding the highly efficient model EntQ-EdgeSAM. Experimental results on public datasets show that the combined foundation models generalize better in detection than task-specific baselines. The proposed Saturated Truncation strategy achieves 3-21% higher quantization accuracy than symmetric uniform quantization, and EntQ-EdgeSAM attains 3.5% model-file compression and 95% faster inference. Grasping experiments confirm that the TPGF achieves robust recognition accuracy and high grasping success rates in zero-shot object grasping tasks within replicated everyday environments, demonstrating its practical value and efficiency for real-world robotic deployment.