MedLA: A Multi-Agent Framework for Knowledge-Intensive Medical Reasoning

A scalable agent architecture for MedQA, PubMedQA, MMLU medical subsets, MedDdx, BioASQ-style tasks, and knowledge-grounded cases.

Core idea: independent specialized agents + LogicAgent critique + DecisionMakersAgent arbitration → robust final answers.

Chinese version of this page (中文页面)

Key Features

  • Configurable debates: control --num_agents and --num_rounds.
  • Logic layer: LogicAgent synthesizes structured logic reports and flags inconsistencies.
  • Arbitration: DecisionMakersAgent consolidates candidates when consensus fails.
  • Baseline mode: --baseline_bool 1 for single-agent direct answering.
  • Multi-dataset: medqa / medxpert / mmlu / pubmedqa / bioasq / medddx.
  • Pluggable backends: OpenAI, DeepSeek, vLLM, ZhipuAI, SiliconFlow, etc.
  • Experiment tracking: Weights & Biases metrics and diagnostics.
  • Threaded execution: service configuration via environment variables.

Repository Layout

adam_baseline/             # classic single-agent baselines & analysis
adama_main_new_prompt/     # main multi-agent experimental pipeline
  main.py                  # entry point (baseline + multi-agent)
  util/                    # agents / prompts / orchestration
  sweep/                   # YAML configs for parameter sweeps
  data/                    # datasets & knowledge stores
adama_main_publish/        # slimmed release (this project)
docs/                      # project page (what you're viewing)
output/, logic_output/, logic_extrac/  # run artifacts & diagnostics

Installation

git clone https://github.com/alexander2618/MedLA.git MedLA
cd MedLA
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# For full reproducibility:
pip install -r requirements_environment.txt

# Optional: local/service backends
export ADAMA_PORT=8000        # vLLM HTTP port
export ADAMA_NUM_THREADS=8    # override --num_threads

Data Preparation

  1. Place files under data/<new_dataset>/.
  2. Extend --dataset choices in get_args() within main.py.
  3. Add loader logic in util/base.py (parsing and normalization).
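The loader step can be sketched as follows. This is an illustrative example of the parse-and-normalize pattern, not the actual API of util/base.py; the function name and the normalized field names (question, options, answer) are assumptions.

```python
# Hypothetical loader for a new dataset placed under data/<new_dataset>/.
# Assumes one-question-per-line JSONL files; adapt parsing to your format.
import json
from pathlib import Path

def load_new_dataset(data_dir: str):
    """Parse every *.jsonl file in data_dir into a normalized item list."""
    items = []
    for path in sorted(Path(data_dir).glob("*.jsonl")):
        with open(path, encoding="utf-8") as f:
            for line in f:
                raw = json.loads(line)
                items.append({
                    "question": raw["question"],
                    "options": raw.get("options", {}),  # e.g. {"A": ..., "B": ...}
                    "answer": raw["answer"],            # gold label, e.g. "A"
                })
    return items
```

Register the new id in get_args() so --dataset dispatches to this loader.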

Usage

Multi-agent collaboration

python main.py \
  --dataset medqa \
  --model deepseek \
  --llm_name deepseek-chat \
  --num_agents 8 \
  --num_rounds 3 \
  --few_shots 1 \
  --tag exp_medqa_v1

Single-agent baseline

python main.py --dataset medqa --model deepseek --llm_name deepseek-chat \
  --baseline_bool 1 --tag baseline_v1

vLLM backend

export ADAMA_PORT=8000
python main.py --dataset mmlu --model vllm --llm_name llama3.3_4096 --tag vllm_test

Debug quick run

python main.py --dataset medqa --debug

Selected sweep configs (under sweep/): test_llama.yaml · test_r1_baseline.yaml

Main Arguments (main.py)

  • --dataset: dataset id
  • --model: backend (deepseek/openai/vllm/siliconflow/zhipuai...)
  • --llm_name: concrete model name
  • --num_agents: number of agents
  • --num_rounds: debate iterations
  • --few_shots: number of in-context exemplars
  • --baseline_bool: 1=baseline, 0=multi-agent
  • --num_samples: limit (-1=all)
  • --num_threads: parallelism
  • --temp: sampling temperature
  • --tag: run label
  • --debug: lightweight run

Env vars: ADAMA_PORT (vLLM port), ADAMA_NUM_THREADS (override threads).
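A minimal argparse sketch mirroring the flags listed above, including the ADAMA_NUM_THREADS override. Defaults and choices here are illustrative and may differ from the real get_args() in main.py.

```python
# Illustrative re-creation of the CLI surface described above (not the
# repository's actual get_args); defaults are assumptions.
import argparse
import os

def get_args(argv=None):
    p = argparse.ArgumentParser(description="MedLA runner (sketch)")
    p.add_argument("--dataset", default="medqa",
                   choices=["medqa", "medxpert", "mmlu", "pubmedqa", "bioasq", "medddx"])
    p.add_argument("--model", default="deepseek")       # deepseek/openai/vllm/...
    p.add_argument("--llm_name", default="deepseek-chat")
    p.add_argument("--num_agents", type=int, default=8)
    p.add_argument("--num_rounds", type=int, default=3)
    p.add_argument("--few_shots", type=int, default=1)
    p.add_argument("--baseline_bool", type=int, choices=[0, 1], default=0)
    p.add_argument("--num_samples", type=int, default=-1)  # -1 = all
    p.add_argument("--num_threads", type=int, default=4)
    p.add_argument("--temp", type=float, default=0.7)
    p.add_argument("--tag", default="")
    p.add_argument("--debug", action="store_true")
    args = p.parse_args(argv)
    # ADAMA_NUM_THREADS, when set, overrides --num_threads
    if "ADAMA_NUM_THREADS" in os.environ:
        args.num_threads = int(os.environ["ADAMA_NUM_THREADS"])
    return args
```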

Architecture Overview

  1. Instantiate LogicAgent and N MedAgent_Eliminate instances.
  2. Agents independently produce (answer, reasoning).
  3. LogicAgent evaluates consistency → structured logic_report.
  4. If disagreement persists, run elimination/refinement cycles.
  5. Remaining conflict → DecisionMakersAgent consolidates and selects.
  6. Metrics persisted and visualized via WandB.

Core components (from the architecture diagram):

  • LogicAgent: logical critique & coherence
  • MedAgent_Eliminate: medical reasoning + self-elimination
  • DecisionMakersAgent: arbitration & synthesis
  • base (util/base.py): data orchestration & pipeline
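The six-step flow above can be sketched as a toy loop. The callables here are stand-ins for the LLM-backed LogicAgent, MedAgent_Eliminate, and DecisionMakersAgent classes in util/; the logic_report fields are assumptions.

```python
# Toy debate -> logic check -> arbitration loop (illustrative only).
from collections import Counter

def run_item(agents, logic_check, arbitrate, question, num_rounds):
    """agents: callables returning (answer, reasoning); num_rounds >= 1."""
    for _ in range(num_rounds):
        answers = [agent(question) for agent in agents]     # step 2
        report = logic_check(answers)                       # step 3: logic_report
        votes = Counter(ans for ans, _ in answers)
        top, count = votes.most_common(1)[0]
        if count == len(agents) and report["consistent"]:
            return top                                      # unanimous + coherent
        # step 4: feed the critique back for a refinement cycle
        question = f"{question}\n[logic_report] {report['summary']}"
    return arbitrate(answers)                               # step 5: arbitration
```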

Results & Analysis

JSONL outputs and diagnostics live in output/, logic_output/, and logic_extrac/. Use adam_baseline/analysis.py for accuracy curves, depth distributions, and comparative evaluation.

WandB Metrics

  • accuracy: cumulative accuracy
  • talkTimes: debate rounds per item
  • retry_count / retry_pro: backend/parse retries
  • make_decision / make_decision_num: arbitration flags and proportion
  • num_right / num_total: correct / total counts
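A sketch of how these per-run aggregates could be computed before being passed to wandb.log; the record field names (correct, talkTimes, made_decision) are illustrative, not the pipeline's actual schema.

```python
# Hypothetical metric aggregation mirroring the WandB keys listed above.
def summarize(records):
    """records: dicts with 'correct' (bool), 'talkTimes' (int),
    'made_decision' (bool) -- field names are assumptions."""
    num_total = len(records)
    num_right = sum(r["correct"] for r in records)
    return {
        "accuracy": num_right / num_total,
        "num_right": num_right,
        "num_total": num_total,
        "make_decision_num": sum(r["made_decision"] for r in records) / num_total,
        "avg_talkTimes": sum(r["talkTimes"] for r in records) / num_total,
    }
# per step: wandb.log(summarize(records))
```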

Citation

@inproceedings{AdamaAAAI2026,
  title={MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models},
  author={Siqi Ma and Jiajie Huang and Fan Zhang and Jinlin Wu and Yue Shen and Guohui Fan and Zhu Zhang and Zelin Zang},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2026},
  note={Oral Presentation}
}

Deploy on GitHub Pages

  1. Push the repository to GitHub.
  2. Open Settings → Pages.
  3. Set Source to branch main and folder /docs, then save.
  4. Wait a few minutes and open the provided Pages URL.

This site is already placed under docs/ and is Pages-ready.