Edge guided single depth image super resolution
Jun Xie, Rogerio Schmidt Feris, et al.
ICIP 2014
Real-time, on-device segmentation is critical for latency-sensitive and privacy-aware applications such as smart glasses and IoT devices. We introduce PicoSAM2, a lightweight (1.3M parameters, 336M MACs) promptable visual segmentation model optimized for edge and in-sensor execution, including on the Sony IMX500. It builds on a depthwise separable U-Net, using knowledge distillation and fixed-point prompt encoding to learn from the Segment Anything Model 2 (SAM2). On COCO and LVIS, it achieves 51.9% and 44.9% mIoU, respectively. The quantized model (1.22 MB) runs in 14.3 ms on the IMX500, achieving ~86 MACs/cycle, which makes it the only model meeting both the memory and compute constraints for in-sensor deployment. Distillation boosts LVIS performance by +3.5% mIoU and +5.1% mAP. These results demonstrate that efficient, promptable segmentation is feasible directly on-camera, enabling privacy-preserving vision without cloud or host processing.
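To illustrate why a depthwise separable design helps with a tight MAC budget, the sketch below compares the multiply-accumulate count of a standard convolution with that of a depthwise convolution followed by a 1x1 pointwise convolution. The layer shape is a hypothetical value chosen only for illustration and is not taken from the PicoSAM2 architecture; the function names are likewise assumptions.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs for a standard k x k convolution producing an h x w output map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs for a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer shape, for illustration only (not from the paper).
h, w, c_in, c_out, k = 96, 96, 32, 64, 3

std = standard_conv_macs(h, w, c_in, c_out, k)
sep = depthwise_separable_macs(h, w, c_in, c_out, k)

print(f"standard:  {std:,} MACs")
print(f"separable: {sep:,} MACs")
print(f"reduction: {std / sep:.2f}x")
```

For any layer shape the reduction factor equals 1 / (1/c_out + 1/k^2), so with 3x3 kernels it approaches 9x as the number of output channels grows.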
Ritendra Datta, Jianying Hu, et al.
ICPR 2008
Eugene H. Ratzlaff
ICDAR 2001
Nicholas Mastronarde, Deepak S. Turaga, et al.
ICIP 2006