MedLA: A Multi-Agent Framework for Knowledge-Intensive Medical Reasoning
A scalable agent architecture for MedQA, PubMedQA, MMLU medical subsets, MedDdx, BioASQ-style tasks, and knowledge-grounded cases.
Core idea: independent specialized agents + LogicAgent critique + DecisionMakersAgent arbitration → robust final answers.
Key Features
- Configurable debates: control `--num_agents` and `--num_rounds`.
- Logic layer: `LogicAgent` synthesizes structured logic reports and flags inconsistencies.
- Arbitration: `DecisionMakersAgent` consolidates candidates when consensus fails.
- Baseline mode: `--baseline_bool 1` enables single-agent direct answering.
- Multi-dataset: medqa / medxpert / mmlu / pubmedqa / bioasq / medddx.
- Pluggable backends: OpenAI, DeepSeek, vLLM, ZhipuAI, SiliconFlow, etc.
- Experiment tracking: Weights & Biases metrics and diagnostics.
- Threaded execution: service configuration via environment variables.
Repository Layout
adam_baseline/ # classic single-agent baselines & analysis
adama_main_new_prompt/ # main multi-agent experimental pipeline
main.py # entry point (baseline + multi-agent)
util/ # agents / prompts / orchestration
sweep/ # YAML configs for parameter sweeps
data/ # datasets & knowledge stores
adama_main_publish/ # slimmed release (this project)
docs/ # project page (what you're viewing)
output/, logic_output/, logic_extrac/ # run artifacts & diagnostics
Installation
git clone https://github.com/alexander2618/MedLA.git MedLA
cd MedLA
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
# For full reproducibility:
pip install -r requirements_environment.txt
# Optional: local/service backends
export ADAMA_PORT=8000 # vLLM HTTP port
export ADAMA_NUM_THREADS=8 # override --num_threads
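The environment overrides above can be resolved with a small helper like the following. This is an illustrative sketch, not code from the repository; the function name and return shape are assumptions, while `ADAMA_PORT` and `ADAMA_NUM_THREADS` mirror the variables documented here.

```python
import os

def resolve_runtime_config(cli_num_threads: int, default_port: int = 8000) -> dict:
    """Resolve service settings, letting environment variables override
    CLI defaults. Hypothetical helper for illustration only."""
    port = int(os.environ.get("ADAMA_PORT", default_port))
    num_threads = int(os.environ.get("ADAMA_NUM_THREADS", cli_num_threads))
    return {"port": port, "num_threads": num_threads}
```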
Data Preparation
- Place files under `data/<new_dataset>/`.
- Extend `--dataset` choices in `get_args()` within `main.py`.
- Add loader logic in `util/base.py` (parsing and normalization).
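A new loader might look like the sketch below. It assumes a single JSONL file with `question`, `options`, and `answer` fields; the function name, file layout, and normalization rules are illustrative, and a real dataset would get its own parsing in `util/base.py`.

```python
import json
from pathlib import Path

def load_new_dataset(root: str, name: str) -> list:
    """Illustrative loader for a dataset placed under data/<new_dataset>/.

    Normalizes each record to {question, options, answer}; assumes one
    JSONL file named after the dataset (a simplifying assumption).
    """
    records = []
    for line in Path(root, name, f"{name}.jsonl").read_text().splitlines():
        raw = json.loads(line)
        records.append({
            "question": raw["question"].strip(),
            "options": raw.get("options", {}),
            "answer": raw["answer"].strip().upper(),  # normalize letter labels
        })
    return records
```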
Usage
Multi-agent collaboration
python main.py \
--dataset medqa \
--model deepseek \
--llm_name deepseek-chat \
--num_agents 8 \
--num_rounds 3 \
--few_shots 1 \
--tag exp_medqa_v1
Single-agent baseline
python main.py --dataset medqa --model deepseek --llm_name deepseek-chat \
--baseline_bool 1 --tag baseline_v1
vLLM backend
export ADAMA_PORT=8000
python main.py --dataset mmlu --model vllm --llm_name llama3.3_4096 --tag vllm_test
Debug quick run
python main.py --dataset medqa --debug
Selected sweep configs in `sweep/`: test_llama.yaml, test_r1_baseline.yaml.
Main Arguments (main.py)
- `--dataset`: dataset id
- `--model`: backend (deepseek / openai / vllm / siliconflow / zhipuai ...)
- `--llm_name`: concrete model name
- `--num_agents`: number of agents
- `--num_rounds`: debate iterations
- `--few_shots`: number of in-context exemplars
- `--baseline_bool`: 1 = baseline, 0 = multi-agent
- `--num_samples`: sample limit (-1 = all)
- `--num_threads`: parallelism
- `--temp`: sampling temperature
- `--tag`: run label
- `--debug`: lightweight run
Env vars: ADAMA_PORT (vLLM port), ADAMA_NUM_THREADS (override threads).
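The CLI surface above can be sketched with `argparse` as follows. The defaults shown are assumptions for illustration, not the repository's actual values.

```python
import argparse

def get_args(argv=None):
    """Sketch of the main.py CLI described above; defaults are illustrative."""
    p = argparse.ArgumentParser(description="MedLA runner")
    p.add_argument("--dataset",
                   choices=["medqa", "medxpert", "mmlu", "pubmedqa", "bioasq", "medddx"])
    p.add_argument("--model", default="deepseek")
    p.add_argument("--llm_name", default="deepseek-chat")
    p.add_argument("--num_agents", type=int, default=8)
    p.add_argument("--num_rounds", type=int, default=3)
    p.add_argument("--few_shots", type=int, default=1)
    p.add_argument("--baseline_bool", type=int, default=0)  # 1 = single-agent baseline
    p.add_argument("--num_samples", type=int, default=-1)   # -1 = all items
    p.add_argument("--num_threads", type=int, default=8)
    p.add_argument("--temp", type=float, default=0.7)
    p.add_argument("--tag", default="")
    p.add_argument("--debug", action="store_true")
    return p.parse_args(argv)
```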
Architecture Overview
- Instantiate LogicAgent and N MedAgent_Eliminate instances.
- Agents independently produce (answer, reasoning).
- LogicAgent evaluates consistency → structured `logic_report`.
- If disagreement persists, run elimination/refinement cycles.
- Remaining conflict → DecisionMakersAgent consolidates and selects.
- Metrics persisted and visualized via WandB.
Agent roles (from the architecture diagram): logical critique & coherence (LogicAgent), medical reasoning + self-elimination (MedAgent_Eliminate), arbitration & synthesis (DecisionMakersAgent), and data orchestration & pipeline management.
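The consensus loop above can be sketched as follows. All callables here (`logic_check` standing in for LogicAgent, `arbitrate` for DecisionMakersAgent) are placeholders under assumed interfaces, not the repository's actual classes.

```python
from collections import Counter

def run_debate(agents, logic_check, arbitrate, num_rounds=3):
    """Minimal sketch of the debate/consensus loop.

    `agents` are callables returning (answer, reasoning) pairs;
    interfaces are illustrative assumptions.
    """
    for round_idx in range(num_rounds):
        candidates = [agent() for agent in agents]      # independent answers
        answers = [ans for ans, _ in candidates]
        report = logic_check(candidates)                # structured logic report
        top, votes = Counter(answers).most_common(1)[0]
        if votes > len(answers) // 2:                   # majority consensus reached
            return top, report, round_idx + 1
        # otherwise an elimination/refinement cycle would update `agents` here
    return arbitrate(candidates), report, num_rounds    # arbitration fallback
```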
Results & Analysis
JSONL outputs and diagnostics live in output/, logic_output/, and
logic_extrac/. Use adam_baseline/analysis.py for accuracy curves, depth
distributions, and comparative evaluation.
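Run artifacts in those directories can be iterated with a small JSONL reader like this sketch; the function name and the assumption of one record per line are illustrative.

```python
import json
from pathlib import Path

def iter_run_records(run_dir):
    """Yield per-item records from JSONL artifacts in a run directory.

    Illustrative reader; the actual file layout under output/ may differ.
    """
    for path in sorted(Path(run_dir).glob("*.jsonl")):
        with path.open() as f:
            for line in f:
                if line.strip():          # skip blank lines
                    yield json.loads(line)
```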
WandB Metrics
- accuracy: cumulative accuracy
- talkTimes: debate rounds per item
- retry_count / retry_pro: backend/parse retries
- make_decision / make_decision_num: arbitration flags and proportion
- num_right / num_total: correct / total counts
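The headline metrics can be aggregated from per-item records as sketched below. The record field names (`correct`, `talk_times`, `used_arbitration`) are assumptions for illustration, not the repository's actual keys.

```python
def summarize_run(items):
    """Aggregate per-item records into the metrics listed above.

    Field names on each record are illustrative assumptions.
    """
    n = len(items)
    return {
        "accuracy": sum(i["correct"] for i in items) / n,
        "avg_talkTimes": sum(i["talk_times"] for i in items) / n,
        "make_decision_num": sum(i["used_arbitration"] for i in items) / n,
        "num_right": sum(i["correct"] for i in items),
        "num_total": n,
    }
```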
HTML Gallery
Browse all HTML examples in docs/html/html/ using a carousel.
Citation
@inproceedings{AdamaAAAI2026,
title={MedLA: A Logic-Driven Multi-Agent Framework for Complex Medical Reasoning with Large Language Models},
author={Siqi Ma and Jiajie Huang and Fan Zhang and Jinlin Wu and Yue Shen and Guohui Fan and Zhu Zhang and Zelin Zang},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2026},
note={Oral Presentation}
}
More
- GitHub repository
- Datasets under `data/`; sweep configs in `sweep/`.
- Contributions welcome via Issues and PRs.
Deploy on GitHub Pages
- Push the repository to GitHub.
- Open Settings → Pages.
- Set Source to branch `main` and folder `/docs`, then save.
- Wait a few minutes and open the provided Pages URL.
This site is already placed under docs/ and is Pages-ready.