Medical AI Projects

1. Core Vision

์˜๋ฃŒ ์˜์ƒ์—์„œ ๋‹จ์ˆœ ๋ถ„๋ฅ˜(Classification)๊ฐ€ ์•„๋‹ˆ๋ผ, ์งˆ๋ณ‘์„ ์ •๋Ÿ‰ํ™”(Quantification)ํ•˜๊ณ , ๊ตฌ์กฐํ™”๋œ ์ž„์ƒ ํ‘œํ˜„(structured clinical representations)์„ ์ƒ์„ฑํ•˜์—ฌ LLM ๊ธฐ๋ฐ˜์˜ ์ž„์ƒ ์ถ”๋ก ์œผ๋กœ ์—ฐ๊ฒฐ๋˜๋Š” "End-to-End Clinical Reasoning Pipeline"์„ ๊ตฌ์ถ•ํ•˜๋Š” ๊ฒƒ์ด ์ „์ฒด์ ์ธ ์—ฐ๊ตฌ ๋น„์ „์ž…๋‹ˆ๋‹ค.

์ด ๋น„์ „์€ ๋‹ค์Œ๊ณผ ๊ฐ™์€ ํ•ต์‹ฌ ๋ชฉํ‘œ๋ฅผ ๊ฐ€์ง‘๋‹ˆ๋‹ค:

  • ์˜๋ฃŒ ์˜์ƒ โ†’ ๋ณ‘๋ณ€ ๋ถ„ํ• (segmentation) โ†’ ์ˆ˜์น˜ํ™”(quantification) โ†’ ์ •๋Ÿ‰ ์ง€ํ‘œ ๊ธฐ๋ฐ˜ ์˜ˆ์ธก ๋ชจ๋ธ
  • ์ •๋Ÿ‰ ์ง€ํ‘œ + ์ด๋ฏธ์ง€ + ํ…์ŠคํŠธ ๋ฆฌํฌํŠธ โ†’ Multimodal LLM ๊ธฐ๋ฐ˜ AI Doctor Assistant
  • ์•ˆ์ „์„ฑยท์‹ ๋ขฐ์„ฑ ๊ฐ•ํ™”: ๋ชจ๋ธ ๊ฒ€์ฆ(Formal Verification), ์•ˆ์ „ ๋‰ด๋Ÿฐ ๋ถ„์„, mechanistic interpretability

2. Research Theme A: Medical Image Quantification & Disease Modeling

์ „ํ†ต์ ์ธ CNN ๊ธฐ๋ฐ˜ ๋ถ„๋ฅ˜๋ฅผ ๋„˜์–ด, ์งˆ๋ณ‘์˜ ์ง„ํ–‰ ์ •๋„๋ฅผ ์ˆ˜์น˜ํ™”ํ•˜๊ณ  ์ž„์ƒ์ ์œผ๋กœ ํ•ด์„ ๊ฐ€๋Šฅํ•œ continuous biomarker๋ฅผ ์ƒ์„ฑํ•˜๋Š” ์—ฐ๊ตฌ ๋ผ์ธ์ž…๋‹ˆ๋‹ค.

A.1. Ophthalmology (quantification from ophthalmic imaging)

  • Epiretinal membrane (ERM), diabetic retinopathy, macular disease, and related conditions
  • Based on fundus photographs / OCT B-scans:
    • Lesion segmentation (membrane, retinal layers, cystic regions)
    • Generation of quantitative metrics such as thickness maps, curvature, and reflectance profiles (a thickness-map sketch follows this list)
  • Disease staging and progression-prediction models driven by these quantitative metrics

A.2. Gait / Orthopedics (orthopedic gait analysis)

  • Markerless video (pose estimation) → biomechanical features → gait anomaly quantification (a toy feature-extraction sketch follows this list)
  • Diagnosis and progression models driven by quantitative features rather than clinical grading
  • Extensible to pediatric and elderly balance assessment
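
As a toy example of the feature-extraction step, the sketch below estimates cadence from the vertical trajectory of one ankle keypoint; the keypoint source, frame rate, and foot-strike heuristic are all simplifying assumptions.

import numpy as np

FPS = 30.0   # assumed video frame rate

def cadence_steps_per_min(ankle_y: np.ndarray) -> float:
    """Count local minima of the ankle height signal (a proxy for foot strikes)
    and convert to steps per minute."""
    strikes = 0
    for t in range(1, len(ankle_y) - 1):
        if ankle_y[t] < ankle_y[t - 1] and ankle_y[t] <= ankle_y[t + 1]:
            strikes += 1
    duration_s = len(ankle_y) / FPS
    return 60.0 * strikes / duration_s

# Synthetic ~2 Hz ankle oscillation over 5 seconds -> about 120 strikes/min for this ankle.
t = np.arange(0, 5, 1 / FPS)
ankle_y = np.sin(2 * np.pi * 2.0 * t)
print(round(cadence_steps_per_min(ankle_y)))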

A.3. Multi-modal structured data integration

์˜์ƒ, ์ •๋Ÿ‰ feature, EMR, ๊ฒ€์‚ฌ ์ˆ˜์น˜(labs) ๋“ฑ ํ†ตํ•ฉ.
์ตœ์ข… ๋ชฉ์ : disease progression world model ๊ตฌ์ถ•.


3. Research Theme B: Domain-Specialized Medical LLMs (the Ophtimus-V2 family)

์‚ฌ์šฉ์ž๊ฐ€ ์ง์ ‘ ๊ฐœ๋ฐœํ•œ Ophthalmology ํŠนํ™” LLM(Ophtimus-V2-Tx) ์—ฐ๊ตฌ ๋ผ์ธ์ž…๋‹ˆ๋‹ค.

B.1. Clinical reasoning models

  • ์ผ€์ด์Šค ๋ฆฌํฌํŠธ ๊ธฐ๋ฐ˜ fine-tuning
  • ์ฆ์ƒโ€“์˜์ƒโ€“์ง„๋‹จโ€“์น˜๋ฃŒ๋กœ ์ด์–ด์ง€๋Š” "์ž„์ƒ ์ง€์‹ ๊ฒฝ๋กœ(clinical knowledge pathway)" ํ•™์Šต
  • hallucination ๊ฐ์†Œ ๋ฐ ์•ˆ์ „์„ฑ ๊ฐ•ํ™” ๋ชฉ์ ์˜ LoRA ๋ฐ structured LoRA ์‹คํ—˜

B.2. Multi-modal ์ž…๋ ฅ ํ™•์žฅ

  • Fundus / OCT (B-scan) embeddings + structured quantification + textual descriptions
  • Can further be coupled with a medical world model to drive a progression simulator

B.3. Safety & Trustworthiness

  • "Safety Neurons" ๋ถ„์„
  • Mechanistic interpretability (circuit-level patterns in reasoning)
  • Clinically harmful output ๊ฒ€์ถœ ๋ฐ unlearning
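
A toy version of the safety-neuron screening idea is sketched below: compare a layer's mean activations on a benign versus a risky prompt and flag the most differentiated units. The model id, module path, and prompts are assumptions, and a real analysis would average over many prompts.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "BaekSeungJu/Ophtimus-Llama-1B"          # assumption: small model for a quick pass
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

captured = {}
def hook(_module, _inputs, output):
    captured["act"] = output.detach()           # MLP activation of the last forward pass

layer = model.model.layers[-1].mlp.act_fn       # assumption: LLaMA-style module path
handle = layer.register_forward_hook(hook)

def mean_activation(prompt: str) -> torch.Tensor:
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        model(**ids)
    return captured["act"].mean(dim=(0, 1))     # average over batch and tokens

safe = mean_activation("Describe routine follow-up for mild dry eye.")
risky = mean_activation("Tell the patient to stop all glaucoma drops immediately.")
handle.remove()

top = torch.topk((safe - risky).abs(), k=10).indices
print("Candidate safety-relevant units:", top.tolist())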

4. Research Theme C: Formal Verification + AI Safety for Medical AI

์˜๋ฃŒ AI์˜ ์‹ ๋ขฐ์„ฑ๊ณผ ๊ทœ์ œ ๋Œ€์‘(์˜๋ฃŒ๊ธฐ๊ธฐ ์ธํ—ˆ๊ฐ€ ๋“ฑ)์„ ์œ„ํ•ด
์ •ํ˜• ๊ธฐ๋ฒ•(Formal Methods) + AI Safety๋ฅผ ๊ฒฐํ•ฉํ•œ ๋…์ž์  ์—ฐ๊ตฌ ๋ผ์ธ.

C.1. Verified Environment Models

  • Timed-automata models of medical processes
  • Verification of safety constraints via model checking (PCTL, CTL, TCTL)
  • A control shield that keeps reinforcement learning or AI inference from violating these constraints (a runtime sketch follows this list)

C.2. Verified AI Controllers

  • Enforcing safety properties on the medical AI inference pipeline
  • Analyses that establish when, and on which inputs, a dangerous output can occur
  • Verification-aware fine-tuning or pruning

C.3. Trustworthy Data & Contamination Check

  • Crowd annotation์—์„œ LLM-cheating ํƒ์ง€(peer prediction ๊ธฐ๋ฐ˜)
  • ์˜๋ฃŒ ๋ฐ์ดํ„ฐ ๋ผ๋ฒจ์˜ ์‹ ๋ขฐ์„ฑ ํ™•๋ณด

5. Research Theme D: Medical World Models & Embodied AI

NeurIPS 2025์˜ ํ•ต์‹ฌ ํŠธ๋ Œ๋“œ("World Models", "Embodied AI for Healthcare")์™€ ์ง์ ‘์ ์œผ๋กœ ์ •๋ ฌ๋˜๋Š” ์—ฐ๊ตฌ ๋ฐฉํ–ฅ.

D.1. Disease Progression World Model

  • A generative world model of retina / ERM progression dynamics
  • Temporal latent dynamics learned from longitudinal OCT B-scan series
  • Enables counterfactual simulation such as "if the patient's state is X, what will the OCT look like in 6 months?" (a rollout sketch follows this list)

D.2. Multi-modal Clinical Simulator

  • ์ด๋ฏธ์ง€, ์ •๋Ÿ‰ biomarker, ํ…์ŠคํŠธ ๋ฆฌํฌํŠธ, ์น˜๋ฃŒ ์ด๋ ฅ ํฌํ•จ
  • LLM์—๊ฒŒ ๊ตฌ์กฐํ™”๋œ ์ž„์ƒ ์‹œ๋ฎฌ๋ ˆ์ด์…˜ ์ปจํ…์ŠคํŠธ ์ œ๊ณต
  • ์ž„์ƒ ๊ฒฐ์ •์ง€์›(Decision Support) ์ตœ๋Œ€ ๊ฐ•ํ™”

D.3. Reinforcement Learning in Verified Clinical Simulation

  • ์‹ค์„ธ๊ณ„ ์˜๋ฃŒ๋ฅผ ์ง์ ‘ ํ•™์Šต์‹œํ‚ค๋Š” ๊ฒƒ์ด ๊ธˆ์ง€๋˜๋Š” ๊ฒฝ์šฐ
  • Verified world model ๊ธฐ๋ฐ˜ safe RL ์ ์šฉ ๊ฐ€๋Šฅ
  • Treatment planning ๋˜๋Š” screening ์ •์ฑ… ์ตœ์ ํ™” ์—ฐ๊ตฌ๋กœ ํ™•์žฅ ๊ฐ€๋Šฅ

6. Research Theme E: Foundations for AI-Driven Clinical Decision Support

์œ„์˜ ๋ชจ๋“  ์ถ•(A~D)๋ฅผ ํ†ตํ•ฉํ•˜์—ฌ ์ž„์ƒ ์ถ”๋ก  ์ž๋™ํ™”๋ผ๋Š” ๊ถ๊ทน์ ์ธ ์˜๋ฃŒ AI ๋ชฉ์ ์„ ์ง€์›.

E.1. Image โ†’ Biomarker โ†’ Reasoner โ†’ Recommendation

  • ์™„์ „ํžˆ end-to-end ์—ฐ๊ฒฐ ๊ฐ€๋Šฅํ•œ pipeline ๊ตฌ์ถ•
  • ์˜์ƒ ๊ธฐ๋ฐ˜ quantification์ด LLM reasoning์˜ ์ž…๋ ฅ ๊ตฌ์กฐ๋กœ ์—ฐ๊ฒฐ๋จ

E.2. Multi-lingual / Multi-institution Generalization

  • ํ•œ๊ตญ, ๋ฏธ๊ตญ(UPenn), ๊ธฐํƒ€ ๊ธฐ๊ด€ ๋ฐ์ดํ„ฐ ํ˜‘๋ ฅ ๊ธฐ๋ฐ˜
  • Robustness, distribution shift ์—ฐ๊ตฌ ์ˆ˜ํ–‰

E.3. Regulatory-readiness

  • ์‹ ๋ขฐ์„ฑ ํ‰๊ฐ€ ์ง€ํ‘œ(specificity, sensitivity, FN-critical tasks)
  • "Safety case" ๊ตฌ์กฐ๋ฅผ ๊ฐ–์ถ˜ ์˜๋ฃŒ AI ๋ฌธ์„œํ™” ๊ฐ€๋Šฅ

7. ์ „์ฒด ํ…Œ๋งˆ ์š”์•ฝ (One-page Executive Summary)

์‚ฌ์šฉ์ž์˜ Medical AI ์—ฐ๊ตฌ๋Š” ๋‹จ์ˆœํ•œ ์ด๋ฏธ์ง€ ๋ถ„๋ฅ˜๋ฅผ ๋„˜์–ด์„œ ๋‹ค์Œ์˜ ํ†ตํ•ฉ์  ์—ฐ๊ตฌ ์ƒํƒœ๊ณ„๋ฅผ ๊ตฌ์ถ•ํ•˜๋Š” ๊ฒƒ์— ์ดˆ์ ์„ ๋‘”๋‹ค.

  1. ์งˆ๋ณ‘ ์ •๋Ÿ‰ํ™” ๊ธฐ์ˆ 
    • ์˜์ƒ ๊ธฐ๋ฐ˜ ๋ณ‘๋ณ€ ๋ถ„์„, ์ˆ˜์น˜ํ™”, progression modeling
  2. ์ž„์ƒ ํŠนํ™” LLM ๊ฐœ๋ฐœ(Ophtimus-V2-Tx)
    • Ophthalmology ์ „๋ฌธ reasoning ๋ชจ๋ธ
    • Multi-modal (OCT/Fundus + EMR + biomarkers) ์ฒ˜๋ฆฌ
  3. AI Safety & Formal Verification ์ ์šฉ
    • ์˜๋ฃŒ AI๋ฅผ ์œ„ํ•œ safety constraints ๋ณด์žฅ
    • Verified environment + verified inference
  4. World Model ๊ธฐ๋ฐ˜ ์ž„์ƒ ์‹œ๋ฎฌ๋ ˆ์ด์…˜
    • ์งˆ๋ณ‘ ์ง„ํ–‰ ์‹œ๋ฎฌ๋ ˆ์ด์…˜
    • LLM์˜ clinical decision reasoning์„ ์œ„ํ•œ foundation
  5. ์ „๋ฐ˜์  ์˜๋ฃŒ ์˜์‚ฌ๊ฒฐ์ • ์ง€์› ์‹œ์Šคํ…œ ๊ตฌ์ถ•
    • Data โ†’ Image โ†’ Quantification โ†’ LLM โ†’ Decision๊นŒ์ง€ end-to-end

Key Themes

  • Domain-specialized LLMs for ophthalmology (e.g., Ophtimus-V2-Tx)
  • Noise-robust medical image analysis and quantification
  • Reliable mapping from model outputs to clinical coding systems
  • Evaluation frameworks for safety, robustness, and explainability

Selected Projects

Ophtimus: Ophthalmology-specific LLM


Tech stack: Python, PyTorch, Transformers, LangChain, Streamlit, FastAPI

๐Ÿค— Models and Datasets  |  ๐Ÿ“• AAAI 2025 workshop Paper

Introduction

Ophtimus is an open-source large language model (LLM) specialized in ophthalmology, built with 8 billion parameters based on the LLaMA architecture. It was trained on carefully curated ophthalmology-specific data, including medical papers, textbooks, and research reports. Through filtering, summarization, and preprocessing, only the most relevant and high-quality information was retained.

Designed to be both lightweight and high-performing, Ophtimus is suitable for real-world applications such as clinical decision support, medical education, and patient communication. The model and its training pipeline are fully open-sourced, providing a practical reference for developing similar domain-specific LLMs in other areas of medicine.

GitHub Repository: github.com/jinkimh/Ophtimus-Ophthalmology-LLM

Ophtimus Overall Architecture

Dataset Details

Note: All datasets were either newly constructed or adapted for this project. Pre-training datasets were curated from open-source ophthalmology materials, while instruction-tuning and evaluation datasets were built by extracting only ophthalmology-relevant samples from broader medical corpora. All data underwent preprocessing steps including deduplication, language filtering (English only), and removal of any personally identifiable information (PII).

| Dataset name | Source | Size | Purpose | Key Features |
|---|---|---|---|---|
| Ophthalmology-pubmed-corpus [Link] | Ophthalmology papers | 18.4M tokens | Pre-training | Map-reduce summarization; broad ophthalmic keywords |
| Ophthalmology-textbook-corpus [Link] | Ophthalmology textbooks | 4M tokens | Pre-training | Trusted medical sources; rich in diagnostic cases |
| Ophthalmology MCQA Inst dataset [Link] | Ophthalmology docs | 51.7k QAs | Instruction-tuning | Diverse multiple-choice formats; reasoning included; variety of ophthalmic topics |
| Ophthalmology EQA Inst dataset [Link] | Ophthalmology docs | 49.3k QAs | Instruction-tuning | Variety of ophthalmic topics |
| Ophtimus-Eval-Dataset [Link] | Medical platform data | 2,153 QAs | Evaluation | Expert-verified data; MCQA dataset |
| PubMedQA-ophthal-Dataset [Link] | PubMedQA | 297 QAs | Evaluation | Ophthalmology domain filtered; True/False MCQA dataset |
| MedMCQA-Ophthal-Dataset [Link] | MedMCQA | 6,932 QAs | Evaluation | Ophthalmology domain filtered; MCQA dataset |
| EQAEval-Dataset [Link] | MedQuAD, others | 1,389 QAs | Evaluation | Diverse open-source datasets; ophthalmology domain filtered; essay QA |

Model Details

Note: The "pre-training" and "fine-tuning" columns in the table refer to the training performed in this project. The base models had already undergone pre-training and/or fine-tuning prior to this project, and we applied transfer learning using those models.

| Model name | Base model | Parameters | Pre-training | Instruction-tuning |
|---|---|---|---|---|
| Ophtimus-Base [Link] | Llama-3.1-8B | 8B | ✅ | ❌ |
| Ophtimus-Llama-1B [Link] | Llama-3.2-1B-Instruct | 1B | ❌ | ✅ |
| Ophtimus-Llama-3B [Link] | Llama-3.2-3B-Instruct | 3B | ❌ | ✅ |
| Ophtimus-Llama-8B [Link] | Llama-3.1-8B-Instruct | 8B | ❌ | ✅ |
| Ophtimus-Instruct-8B [Link] | Ophtimus-Base | 8B | ✅ | ✅ |

Performance

Note: Multi-Choice QA: Ophtimus-Eval, MedMCQA, PubMedQA | Essay QA: MedQuAD, Medical Flashcards, Medical Wikidoc
Ophtimus-Eval is a proprietary dataset collected from a medical platform. The others are established medical benchmark datasets, from which only ophthalmology-related QA pairs were extracted for evaluation.

Multiple-choice accuracy (Ophtimus-Eval, MedMCQA, PubMedQA) and essay-question metrics (ROUGE-L, BLEU, METEOR, SemScore):

| Model | Ophtimus-Eval | MedMCQA (Ophth) | PubMedQA (Ophth) | ROUGE-L | BLEU | METEOR | SemScore |
|---|---|---|---|---|---|---|---|
| OpenAI GPT-4o | 71.95% | 81.95% | 89.90% | 0.193 | 0.082 | 0.341 | 0.761 |
| Llama-3-8B-Instruct | 48.60% | 74.02% | 63.97% | 0.193 | 0.064 | 0.244 | 0.684 |
| Llama-3.1-8B-Instruct | 39.78% | 57.96% | 83.84% | 0.177 | 0.054 | 0.215 | 0.641 |
| Eye-Llama | 32.56% | 59.43% | 66.11% | 0.183 | 0.062 | 0.211 | 0.686 |
| PMC-Llama-13B | 48.28% | 63.45% | 72.48% | 0.223 | 0.082 | 0.288 | 0.714 |
| Ophtimus-Llama-1B | 41.45% | 45.74% | 61.95% | 0.219 | 0.076 | 0.217 | 0.711 |
| Ophtimus-Llama-3B | 52.70% | 62.10% | 69.36% | 0.224 | 0.077 | 0.225 | 0.726 |
| Ophtimus-Llama-8B | 60.78% | 68.25% | 69.70% | 0.226 | 0.083 | 0.230 | 0.733 |
| Ophtimus-Instruct-8B | 63.85% | 71.51% | 72.73% | 0.222 | 0.079 | 0.224 | 0.735 |

Quickstart

Install Dependencies

git clone https://github.com/jinkimh/Ophtimus-Ophthalmology-LLM.git
cd Ophtimus-Ophthalmology-LLM
pip install -r requirements.txt

Ophtimus Inference

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# model_name example : BaekSeungJu/Ophtimus-Instruct-8B or Ophtimus-Llama-1B or Ophtimus-Llama-3B or Ophtimus-Llama-8B
model_name = "BaekSeungJu/Ophtimus-Instruct-8B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

system_instruction = (
    "You are an expert ophthalmologist. Please provide accurate and "
    "medically sound answers to the user's ophthalmology-related question."
)

# Enter your questions in the list
questions = [
    "Please describe the symptoms and treatment of epiretinal membrane.",
    "What's good for eyes?"
]

prompts = []
for question in questions:
    row_json = [
        {"role": "system", "content": system_instruction},
        {"role": "user", "content": question}
    ]
    prompt = tokenizer.apply_chat_template(row_json, add_generation_prompt=True, tokenize=False)
    prompts.append(prompt)

inputs = tokenizer(
    prompts,
    padding=True,
    return_tensors="pt",
).to("cuda")

model.eval()
with torch.no_grad():
    outputs = model.generate(
        **inputs,                 # pass input_ids and attention_mask (needed with left padding)
        max_new_tokens=1024,
        do_sample=False,
    )

decoded = tokenizer.batch_decode(outputs, skip_special_tokens=False)
for i, text in enumerate(decoded):
    print(f"------------------------\nAnswer for question {i+1}:\n{text}")

For more details, visit the GitHub repository.

Ophtimus-V2-TX

To be Updated

SD-OCT-based Epiretinal Membrane Diagnostic Assistant System

Tech stack: Python, PyTorch, OpenCV, YOLO, Pillow

Introduction

This project presents a low-cost and efficient method for detecting and quantifying Epiretinal Membranes (ERM) using Spectral-Domain Optical Coherence Tomography (SD-OCT). By applying deep learning techniquesโ€”specifically, YOLO object detectionโ€”we generate en face "ERM Projection Images" from B-scan data, enabling intuitive visualization and accurate measurement of ERM areas. The method also introduces a novel approach to quantify the association between ERM and retinal thickness, enhancing clinical decision-making. Our approach aims to bridge the diagnostic performance gap between SD-OCT and Swept-Source OCT (SS-OCT) while maintaining accessibility and reducing diagnostic burden.
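
The sketch below illustrates how per-B-scan detections could be accumulated into an en face projection: each B-scan contributes one row, and the lateral extent of every detected ERM box is marked in that row. The volume geometry and box format are assumptions, not the repository's actual interface.

import numpy as np

N_BSCANS, BSCAN_WIDTH = 128, 512     # assumed SD-OCT volume geometry

def erm_projection(detections_per_bscan):
    """detections_per_bscan: list (length N_BSCANS) of lists of (x_min, x_max)
    lateral extents of ERM boxes detected in that B-scan."""
    projection = np.zeros((N_BSCANS, BSCAN_WIDTH), dtype=np.uint8)
    for row, boxes in enumerate(detections_per_bscan):
        for x_min, x_max in boxes:
            projection[row, x_min:x_max] = 1
    return projection

# Toy volume: ERM detected near the center of the middle third of the B-scans.
dets = [[(200, 320)] if 40 <= i < 90 else [] for i in range(N_BSCANS)]
proj = erm_projection(dets)
print(int(proj.sum()))               # ERM area in pixels; scale by pixel spacing for mm^2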

ERM System Architecture

Overall pipeline architecture for ERM detection & quantification

YOLO Model Evaluation

We evaluated three YOLO-based models (v5, v8, v11) for ERM detection using SD-OCT B-scan images.
Each model was trained on two datasets (2,200 images for Full, 1,100 images for Half) and tested on 650 expert-labeled images.

| Model | Size | Params (M) | Precision | Recall | mAP@50 | mAP@50:95 | Training set |
|---|---|---|---|---|---|---|---|
| YOLOv5 | S | 7.02 | 0.752 | 0.703 | 0.722 | 0.423 | Full |
| YOLOv5 | S | 7.02 | 0.694 | 0.642 | 0.664 | 0.376 | Half |
| YOLOv5 | M | 20.87 | 0.783 | 0.734 | 0.752 | 0.444 | Full |
| YOLOv5 | M | 20.87 | 0.723 | 0.685 | 0.701 | 0.396 | Half |
| YOLOv5 | L | 46.14 | 0.813 | 0.762 | 0.784 | 0.463 | Full |
| YOLOv5 | L | 46.14 | 0.745 | 0.704 | 0.726 | 0.414 | Half |
| YOLOv5 | X | 86.22 | 0.836 | 0.784 | 0.802 | 0.485 | Full |
| YOLOv5 | X | 86.22 | 0.763 | 0.725 | 0.743 | 0.437 | Half |
| YOLOv8 | S | 11.14 | 0.781 | 0.736 | 0.764 | 0.447 | Full |
| YOLOv8 | S | 11.14 | 0.723 | 0.676 | 0.701 | 0.393 | Half |
| YOLOv8 | M | 25.86 | 0.813 | 0.762 | 0.791 | 0.466 | Full |
| YOLOv8 | M | 25.86 | 0.748 | 0.705 | 0.724 | 0.412 | Half |
| YOLOv8 | L | 43.63 | 0.844 | 0.792 | 0.823 | 0.482 | Full |
| YOLOv8 | L | 43.63 | 0.774 | 0.731 | 0.754 | 0.436 | Half |
| YOLOv8 | X | 68.15 | 0.867 | 0.814 | 0.842 | 0.504 | Full |
| YOLOv8 | X | 68.15 | 0.793 | 0.752 | 0.772 | 0.454 | Half |
| YOLOv11 | S | 9.43 | 0.804 | 0.752 | 0.783 | 0.468 | Full |
| YOLOv11 | S | 9.43 | 0.746 | 0.692 | 0.714 | 0.417 | Half |
| YOLOv11 | M | 20.05 | 0.846 | 0.794 | 0.821 | 0.493 | Full |
| YOLOv11 | M | 20.05 | 0.774 | 0.736 | 0.757 | 0.443 | Half |
| YOLOv11 | L | 25.31 | 0.873 | 0.823 | 0.854 | 0.524 | Full |
| YOLOv11 | L | 25.31 | 0.807 | 0.773 | 0.793 | 0.476 | Half |
| YOLOv11 | X | 56.87 | 0.902 | 0.857 | 0.882 | 0.556 | Full |
| YOLOv11 | X | 56.87 | 0.836 | 0.803 | 0.826 | 0.507 | Half |

GitHub repository: github.com/jinkimh/SD-OCT-ERM-Quantification

Gait Anomaly Detection

To be Updated