Abstract: Vision-language models (VLMs) offer flexible object detection through natural language prompts, but their performance varies with how the prompt is phrased. In this paper, we ...
A research team led by Professor Wang Hongzhi from the Hefei Institutes of Physical Science of the Chinese Academy of Sciences has developed a multi-stage, dual-domain, progressive network with ...