The overarching research vision is to build an end-to-end clinical reasoning pipeline for medical imaging: one that goes beyond simple classification to quantify disease, generate structured clinical representations, and connect them to LLM-based clinical reasoning.
This vision comprises the following core goals:
- A research line that moves beyond traditional CNN-based classification to quantify disease severity and produce clinically interpretable continuous biomarkers.
  - Integrates imaging, quantitative features, EMR, and laboratory values (labs).
  - Ultimate goal: building a disease progression world model.
- A research line centered on a self-developed ophthalmology-specialized LLM (Ophtimus-V2-Tx).
- A research line combining formal methods and AI safety to support the trustworthiness of medical AI and regulatory compliance (e.g., medical device approval).
- A research direction directly aligned with core NeurIPS 2025 trends ("World Models", "Embodied AI for Healthcare").
- Integration of all of the above axes (A–D) in support of the ultimate medical-AI goal of automated clinical reasoning.
This Medical AI research thus goes beyond simple image classification and focuses on building the following integrated research ecosystem.
🤗 Models and Datasets | 📄 AAAI 2025 Workshop Paper
Ophtimus is an open-source large language model (LLM) specialized in ophthalmology, built with 8 billion parameters based on the LLaMA architecture. It was trained on carefully curated ophthalmology-specific data, including medical papers, textbooks, and research reports. Through filtering, summarization, and preprocessing, only the most relevant and high-quality information was retained.
Designed to be both lightweight and high-performing, Ophtimus is suitable for real-world applications such as clinical decision support, medical education, and patient communication. The model and its training pipeline are fully open-sourced, providing a practical reference for developing similar domain-specific LLMs in other areas of medicine.
GitHub Repository: github.com/jinkimh/Ophtimus-Ophthalmology-LLM
Note: All datasets were either newly constructed or adapted for this project. Pre-training datasets were curated from open-source ophthalmology materials, while instruction-tuning and evaluation datasets were built by extracting only ophthalmology-relevant samples from broader medical corpora. All data underwent preprocessing steps including deduplication, language filtering (English only), and removal of any personally identifiable information (PII).
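The preprocessing steps above (deduplication, English-only filtering, PII removal) can be sketched as follows. This is an illustrative Python outline, not the project's actual pipeline: the content-hash deduplication, ASCII-ratio language heuristic, and PII regex patterns are all assumptions for demonstration.

```python
import hashlib
import re

def deduplicate(docs):
    """Drop exact duplicates by content hash (stand-in for the dedup step)."""
    seen, out = set(), []
    for d in docs:
        h = hashlib.sha256(d.strip().lower().encode()).hexdigest()
        if h not in seen:
            seen.add(h)
            out.append(d)
    return out

def looks_english(text, threshold=0.9):
    """Crude language filter: fraction of ASCII characters in the text."""
    if not text:
        return False
    return sum(c.isascii() for c in text) / len(text) >= threshold

# Hypothetical PII patterns; a production pipeline would use a dedicated tool
PII_PATTERNS = [
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),        # e-mail addresses
    re.compile(r"\b\d{3}[-.\s]\d{3,4}[-.\s]\d{4}\b"),  # phone-like numbers
]

def scrub_pii(text):
    for pat in PII_PATTERNS:
        text = pat.sub("[REDACTED]", text)
    return text

def preprocess(corpus):
    """Dedup, keep English-looking docs, and redact PII patterns."""
    docs = deduplicate(corpus)
    return [scrub_pii(d) for d in docs if looks_english(d)]
```

Real corpora would of course need fuzzy (near-duplicate) matching and a proper language-ID model rather than these toy heuristics.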
| Dataset name | Source | Size | Purpose | Key Features |
|---|---|---|---|---|
| Ophthalmology-pubmed-corpus [Link] | Ophthalmology papers | 18.4M tokens | Pre-training | • Map-reduce summarization • Broad ophthalmic keywords |
| Ophthalmology-textbook-corpus [Link] | Ophthalmology textbooks | 4M tokens | Pre-training | • Trusted medical sources • Rich in diagnostic cases |
| Ophthalmology MCQA Inst dataset [Link] | Ophthalmology docs | 51.7k QAs | Instruction-tuning | • Diverse multiple-choice formats • Reasoning included • Variety of ophthalmic topics |
| Ophthalmology EQA Inst dataset [Link] | Ophthalmology docs | 49.3k QAs | Instruction-tuning | • Variety of ophthalmic topics |
| Ophtimus-Eval-Dataset [Link] | Medical platform data | 2,153 QAs | Evaluation | • Expert-verified data • MCQA dataset |
| PubMedQA-ophthal-Dataset [Link] | PubMedQA | 297 QAs | Evaluation | • Ophthalmology domain filtered • True/False MCQA dataset |
| MedMCQA-Ophthal-Dataset [Link] | MedMCQA | 6,932 QAs | Evaluation | • Ophthalmology domain filtered • MCQA dataset |
| EQAEval-Dataset [Link] | MedQuAD, others | 1,389 QAs | Evaluation | • Diverse open-source datasets • Ophthalmology domain filtered • Essay QA |
Note: The "Pre-training" and "Instruction-tuning" columns in the table refer only to the training performed in this project. The base models had already undergone pre-training and/or fine-tuning before this project; we applied transfer learning on top of them.
| Model name | Base model | Parameters | Pre-training | Instruction-tuning |
|---|---|---|---|---|
| Ophtimus-Base [Link] | Llama-3.1-8B | 8B | ✅ | ❌ |
| Ophtimus-Llama-1B [Link] | Llama-3.2-1B-Instruct | 1B | ❌ | ✅ |
| Ophtimus-Llama-3B [Link] | Llama-3.2-3B-Instruct | 3B | ❌ | ✅ |
| Ophtimus-Llama-8B [Link] | Llama-3.1-8B-Instruct | 8B | ❌ | ✅ |
| Ophtimus-Instruct-8B [Link] | Ophtimus-Base | 8B | ✅ | ✅ |
Note: Multi-Choice QA: Ophtimus-Eval, MedMCQA, PubMedQA | Essay QA: MedQuAD, Medical Flashcards, Medical Wikidoc
Ophtimus-Eval is a proprietary dataset collected from a medical platform. The others are established medical benchmark datasets, from which only ophthalmology-related QA pairs were extracted for evaluation.
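A minimal sketch of how the "ophthalmology domain filtered" extraction might look, assuming a simple keyword heuristic. The keyword list and the `qa` dict format are hypothetical; the project's actual filtering procedure is not specified here.

```python
# Illustrative keyword filter: keep only QA pairs whose text mentions
# ophthalmic terms. The keyword set below is a small example, not the
# list actually used to build the evaluation datasets.
OPHTH_KEYWORDS = {
    "retina", "retinal", "cornea", "glaucoma", "cataract", "macula",
    "ophthalm", "intraocular", "vitreous", "uveitis",
}

def is_ophthalmology(qa):
    """True if the question or answer mentions an ophthalmic keyword."""
    text = (qa["question"] + " " + qa.get("answer", "")).lower()
    return any(kw in text for kw in OPHTH_KEYWORDS)

def filter_ophthalmology(dataset):
    """Extract the ophthalmology-relevant subset of a broader QA corpus."""
    return [qa for qa in dataset if is_ophthalmology(qa)]
```

Substring matching deliberately catches inflections ("macula" also matches "macular"); a real filter would likely combine such keywords with expert review.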
| Model | Ophtimus Eval (MCQ) | MedMCQA Ophth (MCQ) | PubMedQA Ophth (MCQ) | RougeL | BLEU | METEOR | SemScore |
|---|---|---|---|---|---|---|---|
| OpenAI GPT-4o | 71.95% | 81.95% | 89.90% | 0.193 | 0.082 | 0.341 | 0.761 |
| Llama-3-8B-Instruct | 48.60% | 74.02% | 63.97% | 0.193 | 0.064 | 0.244 | 0.684 |
| Llama-3.1-8B-Instruct | 39.78% | 57.96% | 83.84% | 0.177 | 0.054 | 0.215 | 0.641 |
| Eye-Llama | 32.56% | 59.43% | 66.11% | 0.183 | 0.062 | 0.211 | 0.686 |
| PMC-Llama-13B | 48.28% | 63.45% | 72.48% | 0.223 | 0.082 | 0.288 | 0.714 |
| Ophtimus-Llama-1B | 41.45% | 45.74% | 61.95% | 0.219 | 0.076 | 0.217 | 0.711 |
| Ophtimus-Llama-3B | 52.70% | 62.10% | 69.36% | 0.224 | 0.077 | 0.225 | 0.726 |
| Ophtimus-Llama-8B | 60.78% | 68.25% | 69.70% | 0.226 | 0.083 | 0.230 | 0.733 |
| Ophtimus-Instruct-8B | 63.85% | 71.51% | 72.73% | 0.222 | 0.079 | 0.224 | 0.735 |
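For reference, ROUGE-L (one of the essay-question metrics above) scores the longest common subsequence (LCS) between a reference answer and a model answer. Below is a self-contained sketch of the F-measure variant; real evaluations typically use an established library such as `rouge-score` rather than this toy implementation.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists,
    computed with the standard dynamic-programming table."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            if x == y:
                dp[i + 1][j + 1] = dp[i][j] + 1
            else:
                dp[i + 1][j + 1] = max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l_f1(reference, candidate):
    """ROUGE-L F1 over whitespace tokens (no stemming or normalization)."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(cand), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```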
```shell
git clone https://github.com/jinkimh/Ophtimus-Ophthalmology-LLM.git
cd Ophtimus-Ophthalmology-LLM
pip install -r requirements.txt
```
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Choose one of: BaekSeungJu/Ophtimus-Instruct-8B, Ophtimus-Llama-1B,
# Ophtimus-Llama-3B, or Ophtimus-Llama-8B
model_name = "BaekSeungJu/Ophtimus-Instruct-8B"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to("cuda")

# Left padding keeps the generation prompt flush with the right edge in a batch
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

system_instruction = (
    "You are an expert ophthalmologist. Please provide accurate and "
    "medically sound answers to the user's ophthalmology-related question."
)

# Enter your questions in the list
questions = [
    "Please describe the symptoms and treatment of epiretinal membrane.",
    "What's good for eyes?",
]

# Build a chat-formatted prompt for each question
prompts = []
for question in questions:
    messages = [
        {"role": "system", "content": system_instruction},
        {"role": "user", "content": question},
    ]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=False
    )
    prompts.append(prompt)

# Tokenize the batch; passing the attention mask to generate() is required
# with left padding so padded positions are ignored
inputs = tokenizer(prompts, padding=True, return_tensors="pt").to("cuda")

model.eval()
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=1024,
        do_sample=False,  # greedy decoding for reproducible answers
    )

# skip_special_tokens=False keeps the chat-template tokens visible;
# set it to True to print only clean answer text
decoded = tokenizer.batch_decode(outputs, skip_special_tokens=False)
for i, text in enumerate(decoded):
    print(f"------------------------\nAnswer for question {i+1}:\n{text}")
```
For more details, visit the GitHub repository.
To be Updated
This project presents a low-cost and efficient method for detecting and quantifying Epiretinal Membranes (ERM) using Spectral-Domain Optical Coherence Tomography (SD-OCT). By applying deep learning techniques (specifically, YOLO object detection), we generate en face "ERM Projection Images" from B-scan data, enabling intuitive visualization and accurate measurement of ERM areas. The method also introduces a novel approach to quantifying the association between ERM and retinal thickness, enhancing clinical decision-making. Our approach aims to bridge the diagnostic performance gap between SD-OCT and Swept-Source OCT (SS-OCT) while maintaining accessibility and reducing diagnostic burden.
Overall pipeline architecture for ERM detection & quantification
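As a rough illustration of the en-face projection idea, per-B-scan detections can be rasterized into a 2-D map (one row per B-scan) whose covered fraction estimates the ERM area. The detection format below is hypothetical; converting actual YOLO outputs into these horizontal extents is omitted.

```python
import numpy as np

def erm_projection(detections, n_bscans, width):
    """Build a binary en-face ERM map from per-B-scan detection boxes.

    detections: {bscan_index: [(x_min, x_max), ...]} giving the horizontal
    extent of each detected ERM region in that B-scan (illustrative format).
    Returns an (n_bscans, width) uint8 array: 1 where ERM was detected.
    """
    en_face = np.zeros((n_bscans, width), dtype=np.uint8)
    for row, boxes in detections.items():
        for x_min, x_max in boxes:
            en_face[row, x_min:x_max] = 1
    return en_face

def erm_area_fraction(en_face):
    """Fraction of the scanned en-face area covered by detected ERM."""
    return float(en_face.mean())
```

In practice the map would also be scaled by the scan's physical dimensions to report area in mm² rather than a fraction.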
We evaluated three YOLO-based model families (v5, v8, v11) for ERM detection on SD-OCT B-scan images.
Each model was trained at two dataset sizes (Full: 2,200 images; Half: 1,100 images) and tested on 650 expert-labeled images.
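The mAP@50 and mAP@50:95 columns in the table threshold detections by intersection-over-union (IoU) with the expert labels, at IoU 0.5 and averaged over IoU 0.5 to 0.95 respectively. A minimal IoU helper for axis-aligned boxes (a generic sketch, not code from this project):

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes."""
    # Intersection rectangle (empty if the boxes do not overlap)
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0
```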
| Model | Size | Params (M) | Precision | Recall | mAP@50 | mAP@50:95 | Dataset Size |
|---|---|---|---|---|---|---|---|
| YOLOv5 | S | 7.02 | 0.752 | 0.703 | 0.722 | 0.423 | Full |
| YOLOv5 | S | 7.02 | 0.694 | 0.642 | 0.664 | 0.376 | Half |
| YOLOv5 | M | 20.87 | 0.783 | 0.734 | 0.752 | 0.444 | Full |
| YOLOv5 | M | 20.87 | 0.723 | 0.685 | 0.701 | 0.396 | Half |
| YOLOv5 | L | 46.14 | 0.813 | 0.762 | 0.784 | 0.463 | Full |
| YOLOv5 | L | 46.14 | 0.745 | 0.704 | 0.726 | 0.414 | Half |
| YOLOv5 | X | 86.22 | 0.836 | 0.784 | 0.802 | 0.485 | Full |
| YOLOv5 | X | 86.22 | 0.763 | 0.725 | 0.743 | 0.437 | Half |
| YOLOv8 | S | 11.14 | 0.781 | 0.736 | 0.764 | 0.447 | Full |
| YOLOv8 | S | 11.14 | 0.723 | 0.676 | 0.701 | 0.393 | Half |
| YOLOv8 | M | 25.86 | 0.813 | 0.762 | 0.791 | 0.466 | Full |
| YOLOv8 | M | 25.86 | 0.748 | 0.705 | 0.724 | 0.412 | Half |
| YOLOv8 | L | 43.63 | 0.844 | 0.792 | 0.823 | 0.482 | Full |
| YOLOv8 | L | 43.63 | 0.774 | 0.731 | 0.754 | 0.436 | Half |
| YOLOv8 | X | 68.15 | 0.867 | 0.814 | 0.842 | 0.504 | Full |
| YOLOv8 | X | 68.15 | 0.793 | 0.752 | 0.772 | 0.454 | Half |
| YOLOv11 | S | 9.43 | 0.804 | 0.752 | 0.783 | 0.468 | Full |
| YOLOv11 | S | 9.43 | 0.746 | 0.692 | 0.714 | 0.417 | Half |
| YOLOv11 | M | 20.05 | 0.846 | 0.794 | 0.821 | 0.493 | Full |
| YOLOv11 | M | 20.05 | 0.774 | 0.736 | 0.757 | 0.443 | Half |
| YOLOv11 | L | 25.31 | 0.873 | 0.823 | 0.854 | 0.524 | Full |
| YOLOv11 | L | 25.31 | 0.807 | 0.773 | 0.793 | 0.476 | Half |
| YOLOv11 | X | 56.87 | 0.902 | 0.857 | 0.882 | 0.556 | Full |
| YOLOv11 | X | 56.87 | 0.836 | 0.803 | 0.826 | 0.507 | Half |
GitHub repository: github.com/jinkimh/SD-OCT-ERM-Quantification
To be Updated