MedLA: A Multi-Agent Framework for Knowledge-Intensive Medical Reasoning
A scalable agent architecture for MedQA, PubMedQA, MMLU medical subsets, MedDdx, BioASQ-style tasks, and knowledge-grounded cases.
Core idea: independent specialized agents + LogicAgent critique + DecisionMakersAgent arbitration → robust final answers.
Key Features
- Configurable debates: control `--num_agents` and `--num_rounds`.
- Logic layer: `LogicAgent` synthesizes structured logic reports and flags inconsistencies.
- Arbitration: `DecisionMakersAgent` consolidates candidates when consensus fails.
- Baseline mode: `--baseline_bool 1` enables single-agent direct answering.
- Multi-dataset: medqa / medxpert / mmlu / pubmedqa / bioasq / medddx.
- Pluggable backends: OpenAI, DeepSeek, vLLM, ZhipuAI, SiliconFlow, etc.
- Experiment tracking: Weights & Biases metrics and diagnostics.
- Threaded execution: service configuration via environment variables.
Repository Layout
adam_baseline/ # classic single-agent baselines & analysis
adama_main_new_prompt/ # main multi-agent experimental pipeline
main.py # entry point (baseline + multi-agent)
util/ # agents / prompts / orchestration
sweep/ # YAML configs for parameter sweeps
data/ # datasets & knowledge stores
adama_main_publish/ # slimmed release (this project)
docs/ # project page (what you're viewing)
output/, logic_output/, logic_extrac/ # run artifacts & diagnostics
Installation
git clone https://github.com/alexander2618/MedLA.git MedLA
cd MedLA
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# For full reproducibility:
pip install -r requirements_environment.txt
# Optional: local/service backends
export ADAMA_PORT=8000 # vLLM HTTP port
export ADAMA_NUM_THREADS=8 # override --num_threads
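The environment overrides above can be resolved with a small helper like the following. This is an illustrative sketch, not code from the repository; the function name and return shape are assumptions, while `ADAMA_PORT` and `ADAMA_NUM_THREADS` mirror the variables documented here.

```python
import os

def resolve_runtime_config(cli_num_threads: int, default_port: int = 8000) -> dict:
    """Resolve service settings, letting environment variables override
    CLI defaults. Hypothetical helper for illustration only."""
    port = int(os.environ.get("ADAMA_PORT", default_port))
    num_threads = int(os.environ.get("ADAMA_NUM_THREADS", cli_num_threads))
    return {"port": port, "num_threads": num_threads}
```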
Data Preparation
- Place files under `data/<new_dataset>/`.
- Extend `--dataset` choices in `get_args()` within `main.py`.
- Add loader logic in `util/base.py` (parsing and normalization).
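A new loader might look like the sketch below. It assumes a single JSONL file with `question`, `options`, and `answer` fields; the function name, file layout, and normalization rules are illustrative, and a real dataset would get its own parsing in `util/base.py`.

```python
import json
from pathlib import Path

def load_new_dataset(root: str, name: str) -> list:
    """Illustrative loader for a dataset placed under data/<new_dataset>/.

    Normalizes each record to {question, options, answer}; assumes one
    JSONL file named after the dataset (a simplifying assumption).
    """
    records = []
    for line in Path(root, name, f"{name}.jsonl").read_text().splitlines():
        raw = json.loads(line)
        records.append({
            "question": raw["question"].strip(),
            "options": raw.get("options", {}),
            "answer": raw["answer"].strip().upper(),  # normalize letter labels
        })
    return records
```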
Usage
Multi-agent collaboration
python main.py \
--dataset medqa \
--model deepseek \
--llm_name deepseek-chat \
--num_agents 8 \
--num_rounds 3 \
--few_shots 1 \
--tag exp_medqa_v1
Single-agent baseline
python main.py --dataset medqa --model deepseek --llm_name deepseek-chat \
--baseline_bool 1 --tag baseline_v1
vLLM backend
export ADAMA_PORT=8000
python main.py --dataset mmlu --model vllm --llm_name llama3.3_4096 --tag vllm_test
Debug quick run
python main.py --dataset medqa --debug
Selected sweep configs in `sweep/`: test_llama.yaml, test_r1_baseline.yaml.
Main Arguments (main.py)
- `--dataset`: dataset id
- `--model`: backend (deepseek / openai / vllm / siliconflow / zhipuai ...)
- `--llm_name`: concrete model name
- `--num_agents`: number of agents
- `--num_rounds`: debate iterations
- `--few_shots`: number of in-context exemplars
- `--baseline_bool`: 1 = baseline, 0 = multi-agent
- `--num_samples`: sample limit (-1 = all)
- `--num_threads`: parallelism
- `--temp`: sampling temperature
- `--tag`: run label
- `--debug`: lightweight run
Env vars: ADAMA_PORT (vLLM port), ADAMA_NUM_THREADS (override threads).
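The CLI surface above can be sketched with `argparse` as follows. The defaults shown are assumptions for illustration, not the repository's actual values.

```python
import argparse

def get_args(argv=None):
    """Sketch of the main.py CLI described above; defaults are illustrative."""
    p = argparse.ArgumentParser(description="MedLA runner")
    p.add_argument("--dataset",
                   choices=["medqa", "medxpert", "mmlu", "pubmedqa", "bioasq", "medddx"])
    p.add_argument("--model", default="deepseek")
    p.add_argument("--llm_name", default="deepseek-chat")
    p.add_argument("--num_agents", type=int, default=8)
    p.add_argument("--num_rounds", type=int, default=3)
    p.add_argument("--few_shots", type=int, default=1)
    p.add_argument("--baseline_bool", type=int, default=0)  # 1 = single-agent baseline
    p.add_argument("--num_samples", type=int, default=-1)   # -1 = all items
    p.add_argument("--num_threads", type=int, default=8)
    p.add_argument("--temp", type=float, default=0.7)
    p.add_argument("--tag", default="")
    p.add_argument("--debug", action="store_true")
    return p.parse_args(argv)
```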
Architecture Overview
- Instantiate LogicAgent and N MedAgent_Eliminate instances.
- Agents independently produce (answer, reasoning).
- LogicAgent evaluates consistency → structured `logic_report`.
- If disagreement persists, run elimination/refinement cycles.
- Remaining conflict → DecisionMakersAgent consolidates and selects.
- Metrics persisted and visualized via WandB.
Agent roles (from the architecture diagram): logical critique & coherence (LogicAgent), medical reasoning + self-elimination (MedAgent_Eliminate), arbitration & synthesis (DecisionMakersAgent), and data orchestration & pipeline management.
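The consensus loop above can be sketched as follows. All callables here (`logic_check` standing in for LogicAgent, `arbitrate` for DecisionMakersAgent) are placeholders under assumed interfaces, not the repository's actual classes.

```python
from collections import Counter

def run_debate(agents, logic_check, arbitrate, num_rounds=3):
    """Minimal sketch of the debate/consensus loop.

    `agents` are callables returning (answer, reasoning) pairs;
    interfaces are illustrative assumptions.
    """
    for round_idx in range(num_rounds):
        candidates = [agent() for agent in agents]      # independent answers
        answers = [ans for ans, _ in candidates]
        report = logic_check(candidates)                # structured logic report
        top, votes = Counter(answers).most_common(1)[0]
        if votes > len(answers) // 2:                   # majority consensus reached
            return top, report, round_idx + 1
        # otherwise an elimination/refinement cycle would update `agents` here
    return arbitrate(candidates), report, num_rounds    # arbitration fallback
```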
Results & Analysis
JSONL outputs and diagnostics live in output/, logic_output/, and
logic_extrac/. Use adam_baseline/analysis.py for accuracy curves, depth
distributions, and comparative evaluation.
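Run artifacts in those directories can be iterated with a small JSONL reader like this sketch; the function name and the assumption of one record per line are illustrative.

```python
import json
from pathlib import Path

def iter_run_records(run_dir):
    """Yield per-item records from JSONL artifacts in a run directory.

    Illustrative reader; the actual file layout under output/ may differ.
    """
    for path in sorted(Path(run_dir).glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                if line.strip():          # skip blank lines
                    yield json.loads(line)
```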
WandB Metrics
- accuracy: cumulative accuracy
- talkTimes: debate rounds per item
- retry_count / retry_pro: backend/parse retries
- make_decision / make_decision_num: arbitration flags and proportion
- num_right / num_total: correct / total counts
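The headline metrics can be aggregated from per-item records as sketched below. The record field names (`correct`, `talk_times`, `used_arbitration`) are assumptions for illustration, not the repository's actual keys.

```python
def summarize_run(items):
    """Aggregate per-item records into the metrics listed above.

    Field names on each record are illustrative assumptions.
    """
    n = len(items)
    return {
        "accuracy": sum(i["correct"] for i in items) / n,
        "avg_talkTimes": sum(i["talk_times"] for i in items) / n,
        "make_decision_num": sum(i["used_arbitration"] for i in items) / n,
        "num_right": sum(i["correct"] for i in items),
        "num_total": n,
    }
```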
HTML Gallery
Browse all HTML examples in docs/html/html/ using a carousel.
Citation
@inproceedings{AdamaAAAI2026,
title={MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models},
author={Siqi Ma and Jiajie Huang and Fan Zhang and Jinlin Wu and Yue Shen and Guohui Fan and Zhu Zhang and Zelin Zang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026},
note={Oral Presentation}
}
More
- GitHub repository
- Datasets under `data/`; sweep configs in `sweep/`.
- Contributions welcome via Issues and PRs.
Deploy on GitHub Pages
- Push the repository to GitHub.
- Open Settings → Pages.
- Set Source to branch `main` and folder `/docs`, then save.
- Wait a few minutes and open the provided Pages URL.
This site is already placed under docs/ and is Pages-ready.