CHIL 2026 · Conference on Health, Inference, and Learning

H-AdminSim

A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration

Jun-Min Lee¹, Meong Hi Son², Edward Choi¹
¹KAIST, ²Samsung Medical Center
Data Synthesis: Care level-specific patient profile generation with 194 disease-symptom pairs (Primary Care, Secondary Care, Tertiary Care)
Multi-Agent Simulation: LLM-driven admin staff & outpatient agents with FHIR R5 integration (Patient Intake, Appointment Scheduling, FHIR Integration)
Evaluation: Rubric-based quantitative assessment across 5 diverse LLM backends (GPT-5, Gemini 2.5, Open-Source)

H-AdminSim generates realistic hospital administrative workflows by synthesizing patient data across care levels, simulating multi-agent interactions with FHIR integration, and evaluating LLM performance with detailed rubrics.

Summary

Hospital administrative operations represent one of the most demanding components of modern healthcare infrastructure. Large hospitals must process over 10,000 administrative requests daily — spanning patient intake, appointment scheduling, insurance verification, and inter-departmental coordination — yet existing benchmarks focus narrowly on patient-physician dialogues or isolated subtasks, leaving administrative complexity largely unstudied.

We present H-AdminSim, a multi-agent simulation framework for modeling realistic hospital administrative workflows at scale. H-AdminSim synthesizes care level-specific patient profiles across primary, secondary, and tertiary hospitals using 194 disease-symptom pairs spanning 9 internal medicine departments, then simulates multi-turn interactions between LLM-driven administrative staff agents and first-visit outpatient agents. Optional FHIR R5 integration enables standardized interoperability with real hospital information systems. We evaluate 5 diverse LLM backends using detailed rubric-based scoring, establishing H-AdminSim as a principled testbed for systematic evaluation of LLM-driven administrative automation.

Key Contributions

End-to-End First-Visit Simulation

A complete pipeline covering the full first-visit outpatient journey, from patient data synthesis through multi-agent administrative workflow simulation to quantitative evaluation. It is the first framework to model end-to-end first-visit hospital administrative operations at scale.

Realistic Data Generation

Care level-specific patient profile synthesis using 194 disease-symptom pairs across 9 internal medicine departments and 3 hospital care levels, with configurable patient demographics and preference types.

FHIR Integration & Model Flexibility

Optional FHIR R5-compatible output for deployment in real hospital information systems, with multi-backend LLM support (GPT, Gemini, open-source via vLLM) and concurrent simulation capabilities.

User-Defined Custom Data

Beyond the built-in 194 disease-symptom pairs, users can supply their own patient data and disease profiles, enabling simulation in custom hospital settings and specialty domains outside the default 9 internal medicine departments.

Flexible Simulation Conditions

Users can configure simulation-level parameters such as the simulation period, number of patients, and scheduling constraints, enabling controlled experiments and stress-testing under diverse operational conditions.

Method

1

Care Level-Specific Data Synthesis

H-AdminSim generates realistic patient profiles tailored to each care level of the healthcare institution. A hierarchical synthesis pipeline constructs a virtual hospital setting — including time system, departments, physicians, and patient profiles — using 194 disease-symptom pairs spanning 9 internal medicine departments.

Care Level: Primary Care, Secondary Care, Tertiary Care
Supported Specialties: Gastroenterology, Cardiology, Pulmonology, Endocrinology / Metabolism, Nephrology, Hematology / Oncology, Allergy, Infectious Disease, Rheumatology

In addition to the built-in default dataset, user-defined disease-symptom pairs and department data are also supported, enabling simulation in custom hospital settings or specialty domains.
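To make the idea of user-defined disease-symptom pairs concrete, here is a minimal sketch of sampling patient profiles from a custom list. The class names, field names, and the `synthesize_patients` function are hypothetical and are not the h-adminsim API; the two example pairs are made up for illustration and are not taken from the built-in dataset.

```python
import random
from dataclasses import dataclass

CARE_LEVELS = ("primary", "secondary", "tertiary")
PREFERENCES = ("asap", "physician", "date")  # the three preference types


# Hypothetical schema: field names are illustrative, not the package's format.
@dataclass(frozen=True)
class DiseaseSymptomPair:
    department: str
    disease: str
    symptoms: tuple[str, ...]


@dataclass(frozen=True)
class PatientProfile:
    patient_id: str
    care_level: str
    department: str
    disease: str
    chief_complaint: str
    preference: str


def synthesize_patients(pairs, care_level, n, seed=0):
    """Sample n patient profiles from the given disease-symptom pairs."""
    if care_level not in CARE_LEVELS:
        raise ValueError(f"unknown care level: {care_level}")
    rng = random.Random(seed)  # seeded so a run can be reproduced
    profiles = []
    for i in range(n):
        pair = rng.choice(pairs)
        profiles.append(
            PatientProfile(
                patient_id=f"P{i:04d}",
                care_level=care_level,
                department=pair.department,
                disease=pair.disease,
                chief_complaint=rng.choice(pair.symptoms),
                preference=rng.choice(PREFERENCES),
            )
        )
    return profiles


# Made-up custom pairs, standing in for user-supplied data.
custom_pairs = [
    DiseaseSymptomPair("Cardiology", "Angina", ("chest pain", "shortness of breath")),
    DiseaseSymptomPair("Pulmonology", "Asthma", ("wheezing", "cough")),
]
patients = synthesize_patients(custom_pairs, "primary", n=5, seed=42)
```

Seeding the generator keeps a synthetic cohort reproducible across runs, which matters when the same patients are replayed against several LLM backends.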

2

Administrative Workflow Simulation

The framework simulates two core administrative tasks — patient intake and appointment scheduling — between LLM-driven staff agents and first-visit outpatient agents through dialogue-based interactions. A virtual time flow is implemented to support time-dependent tasks such as scheduling, enabling realistic simulation of temporal workflows.

Task: Details
Patient Intake: Via multi-turn dialogue, the staff agent collects patient demographics, chief complaints, and prior diagnosis history, then recommends the most appropriate treatment department and structures the collected information into a patient record.
Appointment Scheduling: The staff agent schedules appointments according to the patient's preference (ASAP, Physician Preference, or Date Preference). Rescheduling (with waiting-list management) and cancellation are also simulated. The scheduling task can be performed via tool-calling or pure LLM reasoning.
HIS Upload via FHIR: Uses the Practitioner, PractitionerRole, Schedule, Slot, Patient, and Appointment FHIR R5 resources. Patient and Appointment records are created on task completion and updated on cancellation or rescheduling. FHIR integration is fully optional; simulations can run with or without it.
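As an illustration of the FHIR step, the sketch below builds minimal Slot and Appointment resources as plain Python dictionaries and updates them on cancellation. The element names (`status`, `start`, `end`, `slot`, `subject`, `participant`, `schedule`) follow the FHIR R5 resource definitions, but the helper functions, IDs, and timestamps are hypothetical. How H-AdminSim itself maps simulation state to FHIR is not shown here, so treat this as an assumption-laden example.

```python
import json


def make_slot(slot_id, schedule_id, start, end, status="free"):
    """Minimal FHIR R5 Slot; start and end are ISO 8601 instants with a timezone."""
    return {
        "resourceType": "Slot",
        "id": slot_id,
        "schedule": {"reference": f"Schedule/{schedule_id}"},
        "status": status,
        "start": start,
        "end": end,
    }


def book(appointment_id, patient_id, practitioner_id, slot):
    """Return a booked Appointment and a copy of the slot marked busy."""
    appointment = {
        "resourceType": "Appointment",
        "id": appointment_id,
        "status": "booked",
        "start": slot["start"],
        "end": slot["end"],
        "slot": [{"reference": f"Slot/{slot['id']}"}],
        "subject": {"reference": f"Patient/{patient_id}"},
        "participant": [
            {"actor": {"reference": f"Patient/{patient_id}"}, "status": "accepted"},
            {"actor": {"reference": f"Practitioner/{practitioner_id}"}, "status": "accepted"},
        ],
    }
    return appointment, {**slot, "status": "busy"}


def cancel(appointment, slot):
    """Return updated copies: the appointment cancelled, the slot freed again."""
    return {**appointment, "status": "cancelled"}, {**slot, "status": "free"}


# Hypothetical IDs and times, for illustration only.
free_slot = make_slot("slot-1", "sched-1", "2026-03-02T09:00:00+09:00", "2026-03-02T09:30:00+09:00")
appointment, busy_slot = book("appt-1", "pat-1", "prac-1", free_slot)
cancelled, released_slot = cancel(appointment, busy_slot)
payload = json.dumps(cancelled, indent=2)  # JSON body one could send to a FHIR server
```

Returning updated copies instead of mutating in place keeps the pre-cancellation state available, which is convenient when a run is later scored against the schedule as it stood at each step.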
3

LLM Backend Integration

H-AdminSim is model-agnostic and supports multiple LLM backends, enabling systematic cross-model benchmarking. Different models can be assigned per agent role, with support for GPT and Gemini series models as well as open-source models via vLLM.

Evaluated: GPT-5 Mini, GPT-5 Nano, Gemini 2.5 Flash, Llama 3.3 70B, Qwen 3 32B
4

Rubric-Based Evaluation Framework

Each simulation run is evaluated using detailed rubrics that assess LLM performance across key administrative dimensions, enabling reproducible and standardized comparison across models and hospital care levels.

RQ1
Intake Quality

Evaluates the accuracy of the staff agent in department assignment, patient information extraction, and data structuring, as well as the fidelity of the patient agent in faithfully simulating the assigned patient profile.
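A simple way to picture the staff-side part of this rubric is exact-match scoring against the ground-truth patient profile. The sketch below assumes exact-match comparison and uses made-up records. The actual rubric items and matching rules are defined in the paper and may differ, and `intake_scores` is a hypothetical helper.

```python
def intake_scores(predicted, reference, fields):
    """Exact-match department accuracy and per-field extraction accuracy."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two non-empty lists of equal length")
    pairs = list(zip(predicted, reference))
    dept_hits = sum(p.get("department") == r["department"] for p, r in pairs)
    field_hits = sum(p.get(f) == r[f] for p, r in pairs for f in fields)
    return {
        "department_accuracy": dept_hits / len(reference),
        "field_accuracy": field_hits / (len(reference) * len(fields)),
    }


# Made-up records: the second prediction has a wrong department and a wrong complaint.
reference = [
    {"department": "Cardiology", "age": 54, "chief_complaint": "chest pain"},
    {"department": "Nephrology", "age": 37, "chief_complaint": "leg swelling"},
]
predicted = [
    {"department": "Cardiology", "age": 54, "chief_complaint": "chest pain"},
    {"department": "Cardiology", "age": 37, "chief_complaint": "swelling"},
]
scores = intake_scores(predicted, reference, fields=("age", "chief_complaint"))
# department_accuracy is 1/2 and field_accuracy is 3/4 for these records
```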

RQ2
Scheduling Accuracy

Correctness of scheduled appointments relative to patient preference type and available FHIR Slot resources, evaluated separately for tool-calling and pure LLM reasoning-based strategies.
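One possible reading of "correct relative to preference type" is that the chosen slot must be the earliest free slot that satisfies the preference. The sketch below encodes that reading. The rule itself is an assumption, since the paper's rubric may define correctness differently, and the `Slot` class here is a simplified stand-in for a FHIR Slot resource with hypothetical physician names and times.

```python
from dataclasses import dataclass
from datetime import date, datetime


# Simplified stand-in for a FHIR Slot resource (illustrative fields only).
@dataclass(frozen=True)
class Slot:
    slot_id: str
    physician: str
    start: datetime
    free: bool = True


def expected_slot(slots, preference, physician=None, earliest_date=None):
    """Earliest free slot satisfying the preference (assumed rule)."""
    candidates = [s for s in slots if s.free]
    if preference == "physician":
        candidates = [s for s in candidates if s.physician == physician]
    elif preference == "date":
        candidates = [s for s in candidates if s.start.date() >= earliest_date]
    elif preference != "asap":
        raise ValueError(f"unknown preference: {preference}")
    return min(candidates, key=lambda s: s.start, default=None)


def is_correct(chosen_id, slots, preference, **constraints):
    """True if the agent picked exactly the slot the rule above expects."""
    target = expected_slot(slots, preference, **constraints)
    return target is not None and chosen_id == target.slot_id


slots = [
    Slot("S1", "Dr. Kim", datetime(2026, 3, 2, 9, 0), free=False),
    Slot("S2", "Dr. Lee", datetime(2026, 3, 2, 10, 0)),
    Slot("S3", "Dr. Kim", datetime(2026, 3, 3, 9, 0)),
    Slot("S4", "Dr. Lee", datetime(2026, 3, 4, 9, 0)),
]
asap_ok = is_correct("S2", slots, "asap")
physician_ok = is_correct("S3", slots, "physician", physician="Dr. Kim")
date_ok = is_correct("S4", slots, "date", earliest_date=date(2026, 3, 4))
```

The same check applies whether the slot was chosen via tool-calling or via pure LLM reasoning, so the two strategies can be scored against an identical target.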

RQ3
Interaction Quality

Multi-turn dialogue assessed by human evaluators across four criteria: naturalness, response appropriateness, dialogue flow, and overall quality (avg. 4.11 / 5).

5

Ongoing Development

H-AdminSim is under active development. Future releases aim to extend the simulation scope within the MEdWorld platform, including a dedicated patient simulator with configurable personas and support for follow-up visit workflows beyond the current first-visit scope.

Framework Overview

9 Internal Medicine Departments: 194 disease-symptom pairs for realistic patient intake simulation
3 Care Levels Evaluated: Primary (516), secondary (769), and tertiary (5,052) patient profiles
5 LLMs Evaluated: GPT-5 Mini/Nano, Gemini 2.5 Flash, Llama 3.3 70B, Qwen 3 32B
v1.2.2 Current PyPI Release: Actively maintained
Top results: Gemini 2.5 Flash achieves 88.9% on primary care patient intake. Tool-based scheduling reaches near-perfect accuracy (~99.8%). Human evaluators rate interaction quality at 4.11 / 5 — demonstrating the framework's ability to drive rigorous LLM benchmarking for real-world hospital administrative tasks.

Installation

H-AdminSim is available as a Python package on PyPI. Install it with pip to start simulating hospital administrative workflows immediately.

Property: Value
Package: h-adminsim
Version: 1.2.2
Python: ≥3.11, <3.13
License: Apache 2.0
Author: Jun-Min Lee

Install via pip

Install the package and all dependencies with a single command.

pip install h-adminsim

Live Demo

MEdWorld — Hospital Administration Simulator

Try H-AdminSim interactively. Configure staff and patient counts, provide an OpenAI API key, and watch the multi-agent administrative simulation unfold in real time.


Citation

@inproceedings{lee2026hadminsimmultiagentsimulatorrealistic,
  title         = {H-AdminSim: A Multi-Agent Simulator for Realistic Hospital Administrative Workflows with FHIR Integration}, 
  author        = {Jun-Min Lee and Meong Hi Son and Edward Choi},
  booktitle     = {Proceedings of the Conference on Health, Inference, and Learning (CHIL)},
  year          = {2026},
  eprint        = {2602.05407},
  archivePrefix = {arXiv},
  primaryClass  = {cs.AI},
  url           = {https://arxiv.org/abs/2602.05407}, 
}