
Performance comparison of state-of-the-art large vision-language models on the LMOD benchmark, evaluating their capabilities in anatomical recognition and diagnosis analysis. The best-performing model on each metric is highlighted in bold. Finetuned results were obtained by fine-tuning a LLaVA-Med model.