  1. Abstract Conditional image synthesis is a crucial task with broad applications, such as artistic creation and virtual reality. However, current generative methods are often task-oriented with …

  2. Abstract Contrastive Language-Image Pre-training (CLIP) [37] has emerged as a pivotal model in computer vision and multi-modal learning, achieving state-of-the-art performance at aligning …

  3. Abstract This paper addresses the task of video question answering (videoQA) via a decomposed multi-stage, modular reasoning framework. Previous modular methods have …

  4. Figure 1. Modular Customization of Diffusion Models. Given a large set of individual concepts (left), the goal of Modular Customization is to enable independent customization (fine-tuning) …

  5. Mod-Squad: Designing Mixtures of Experts As Modular Multi-Task Learners Zitian Chen1, Yikang Shen2, Mingyu Ding3, Zhenfang Chen2, Hengshuang Zhao3, Erik Learned-Miller1, Chuang …

  6. Abstract We propose a simple but effective modular approach MOPA (Modular ObjectNav with PointGoal agents) to systematically investigate the inherent modularity of the object …

  7. Physical reasoning remains a significant challenge for Vision-Language Models (VLMs). This limitation arises from an inability to translate learned knowledge into predictions about …

  8. Modular Blind Video Quality Assessment Wen Wen1, Mu Li2, Yabin Zhang3, Yiting Liao3, Junlin Li3, Li Zhang3, and Kede Ma1*

  9. Figure 1. Our novel modular transfer learning approach for semantic visual navigation learns a general purpose semantic search policy by finding image views sampled randomly in the …

  10. Abstract One of the hallmarks of human intelligence is the ability to compose learned knowledge into novel concepts which can be recognized without a single training example. In contrast, …