Set-of-Vision Prompting

This paper introduces Set-of-Vision (SoV) prompting, which improves emotion recognition in Vision Large Language Models. It uses spatial visual cues such as bounding boxes, numbers, and facial landmarks to pinpoint faces and analyze their expressions while preserving the surrounding image context.


GeoDANO Accepted by EMNLP 2025

Our paper on geometry problem solving with large vision-language models has been accepted by EMNLP Findings 2025. I'm grateful to have collaborated with an amazing team: Seunghyuk, Yang, Youngbin, Seungbeom, and Dongwoo.

LMOD Accepted by NAACL 2025

Our paper on multimodal large models and ophthalmology benchmarks has been accepted by NAACL Findings 2025. I'm grateful to have collaborated with an amazing team: Yu, Dylan, Xuansheng, Ke, Yih-Chung, Ninghao, Xiuzhen, and Qingyu.
