Medical Vision Language Model

Junmo Kim • Team Leader

Keunsoo Heo

[Scholar] [Github]

Soyeon Bak

[CV]

Junmo Kim

[Scholar]

Yubin Jeong

[Github]

About team VLM


● Mission:
Advance multimodal learning in Medical AI by creating VLMs that seamlessly fuse medical imaging, physiological signals, clinical text, and structured knowledge.

● Scope:
Tackle real-world clinical tasks such as diagnosis, prognosis, explanation, question answering, and more.

● Goal:
Deliver robust, clinically deployable AI that supports healthcare professionals and ultimately improves patient care.

Available internship topics

● Self-correcting-based Image-Text Alignment for Radiology Report Generation

● Mitigating Hallucinations in Medical VLMs for Medical VQA

Our research topics

Optimizing medical VLMs via clinical-aware preference alignment

● We improve trustworthiness in medical VLMs by mitigating diverse hallucination types in complex medical reasoning tasks through clinical-aware image-text alignment.

● We develop optimization methods based on Direct Preference Optimization (DPO) and reinforcement learning from human/AI feedback (RLHF/RLAIF) to align model outputs with clinical image-text evidence.
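As a rough illustration of the preference-alignment idea, the standard DPO objective for a single preference pair can be written in a few lines. This is a generic sketch, not our training code: the log-probabilities would come from the medical VLM and a frozen reference model, with the "chosen" report being the clinically faithful one and the "rejected" report the hallucinated one.

```python
import math

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """DPO loss for one preference pair (clinically faithful vs.
    hallucinated report), given summed log-probabilities under the
    trained policy and the frozen reference model."""
    # Implicit reward of each response relative to the reference model
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    # Negative log-sigmoid of the scaled margin difference:
    # minimized when the policy prefers the chosen report
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))
```

The loss shrinks as the policy assigns relatively more probability to the clinically faithful report than the reference model does, without ever training an explicit reward model.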

Related publications

[CVPR'25] DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation

Developing clinical knowledge-guided refinement frameworks with retrieval-augmented generation (RAG)

● We seek to enhance the factual consistency and clinical validity of medical VLM outputs by grounding them in case-based and knowledge-based evidence.

● We develop a retrieval-augmented generation framework that fuses retrieved images and clinical texts using a knowledge-guided refinement module.
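The retrieve-then-ground pattern behind such a framework can be sketched minimally. This toy version uses bag-of-words cosine similarity as a stand-in for the learned retriever, and the function names (`retrieve`, `build_grounded_prompt`) are illustrative, not part of any released codebase.

```python
import math
from collections import Counter

def cosine(a, b):
    # Cosine similarity between two bag-of-words Counters
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, knowledge_base, k=2):
    """Rank knowledge-base entries by similarity to the query finding."""
    q = Counter(query.lower().split())
    scored = sorted(knowledge_base,
                    key=lambda doc: cosine(q, Counter(doc.lower().split())),
                    reverse=True)
    return scored[:k]

def build_grounded_prompt(finding, knowledge_base):
    """Fuse retrieved clinical evidence into the generation prompt."""
    evidence = retrieve(finding, knowledge_base)
    context = "\n".join(f"- {doc}" for doc in evidence)
    return f"Evidence:\n{context}\n\nDraft a report for: {finding}"
```

In the actual framework the retriever would operate over image and text embeddings, and the refinement module would condition generation on the retrieved case- and knowledge-based evidence rather than a plain text prompt.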

Generating pathology-aligned radiology reports

● We aim to generate radiology reports that are visually grounded and free from omissions or hallucinations.

● We design a self-correction pipeline that retrieves pathology-matched reports and iteratively aligns each sentence with the source image to refine the output.

Related publications

[CVPR'25] DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation

Designing interpretable reasoning frameworks for medical VLMs

● We aim to equip VLMs with expert-level reasoning capabilities for complex diagnostic tasks involving multiple images or modalities.

● We develop interpretable modules based on chain-of-thought prompting, cross-image evidence aggregation, and self-verifying rationales to support transparent and auditable inference.
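One of the ideas above, cross-image evidence aggregation, admits a simple majority-vote sketch: a finding is asserted only if enough images support it, and each kept claim carries an auditable rationale. The function and threshold are illustrative assumptions, not a description of a specific module.

```python
from collections import Counter

def aggregate_cross_image_evidence(per_image_findings, min_support=2):
    """Keep only findings supported by at least `min_support` images,
    returning them with a human-readable rationale for each."""
    # Count in how many images each finding appears (at most once per image)
    votes = Counter(f for findings in per_image_findings for f in set(findings))
    supported = [f for f, n in votes.items() if n >= min_support]
    rationale = [f"'{f}' seen in {votes[f]} of {len(per_image_findings)} images"
                 for f in supported]
    return supported, rationale
```

Emitting the rationale alongside the conclusion is what makes the inference auditable: a clinician can check exactly which images supported each claim.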