Vision-threatening eye diseases pose a major global health burden, affecting more than 2.2 billion people worldwide. In the United States alone, over 90 million people are at high risk for vision loss, yet many remain undiagnosed or are diagnosed too late for effective treatment. Up to 50% of patients with diabetic retinopathy do not receive timely eye examinations, highlighting critical gaps in screening and management.
While artificial intelligence offers promising solutions through multimodal large language models (MLLMs), a major challenge is the lack of unified, comprehensive benchmarks for ophthalmology. Most existing benchmarks were designed for earlier CNN-based models or focus on text-only tasks, failing to reflect real-world ophthalmic practice, where medical imaging is indispensable.
This work presents LMOD+, a significantly enhanced version of our large-scale multimodal ophthalmology benchmark, comprising 32,633 images with multi-granular annotations across 12 common ophthalmic conditions and 5 imaging modalities. Our key contributions include:
- Comprehensive Dataset: 32,633 high-quality images, including an extensive collection of color fundus photographs covering diverse pathological conditions
- Diverse Tasks: Comprehensive evaluation across 12 binary eye condition diagnosis tasks, multi-class disease diagnosis, severity classification, and demographic prediction to assess potential bias
- Extensive Evaluation: Systematic assessment of 24 state-of-the-art MLLMs, including recent advanced models such as the InternVL, Qwen, and DeepSeek series
- Public Resources: Full dataset release with a dynamic leaderboard and evaluation pipeline to support ongoing benchmarking and model development, as sketched below
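To make the evaluation setup concrete, the sketch below shows how a single binary diagnosis task could be scored against a candidate MLLM. It is a minimal illustration under stated assumptions, not the released pipeline: the file path `lmod_plus/glaucoma_binary.json`, the annotation fields (`image_path`, `question`, `label`), and the `predict_fn` wrapper are hypothetical placeholders for demonstration.

```python
import json
from pathlib import Path


def evaluate_binary_diagnosis(annotation_file: Path, predict_fn) -> float:
    """Score a model on one binary diagnosis task.

    `predict_fn(image_path, question) -> str` wraps whatever MLLM is being
    evaluated and is expected to answer "yes" or "no" (possibly with extra text).
    Annotation format assumed here: a JSON list of records with
    "image_path", "question", and a gold "label" of "yes"/"no".
    """
    records = json.loads(annotation_file.read_text())
    correct = 0
    for record in records:
        answer = predict_fn(record["image_path"], record["question"])
        # Normalize free-form model output to a yes/no label.
        predicted = "yes" if answer.strip().lower().startswith("yes") else "no"
        correct += int(predicted == record["label"])
    return correct / len(records)


if __name__ == "__main__":
    # Trivial stand-in model that always answers "no"; replace with a real
    # MLLM client (e.g., an InternVL or Qwen-VL wrapper) to obtain meaningful scores.
    baseline = lambda image_path, question: "no"
    acc = evaluate_binary_diagnosis(Path("lmod_plus/glaucoma_binary.json"), baseline)
    print(f"Binary diagnosis accuracy: {acc:.3f}")
```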