Rogue Open-Sourced: End-to-End Framework for Testing and Auditing Agentic AI

What Rogue is and why it matters

Agentic systems behave differently from deterministic software: they are stochastic, context-dependent, and constrained by policies. Traditional QA (unit tests, static prompts, or single-score LLM judgments) misses multi-turn vulnerabilities and often leaves a poor audit trail. To gate releases with confidence, development teams need protocol-accurate conversations, explicit policy checks, and machine-readable evidence.

Qualifire AI has open-sourced Rogue, a Python framework built to evaluate AI agents over the Agent-to-Agent (A2A) protocol. Rogue transforms business policies into executable scenarios, runs multi-turn interactions against a target agent, and produces deterministic reports suitable for CI/CD pipelines and compliance reviews.

Quick start and prerequisites

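Before you run Rogue, make sure you have uv installed (it provides the uvx command used below) and, if your evaluation calls a hosted model, an API key for that LLM provider.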

Use the uvx automated installer to get up and running quickly:

# TUI
uvx rogue-ai
# Web UI
uvx rogue-ai ui
# CLI / CI/CD
uvx rogue-ai cli

Manual installation

(a) Clone the repository:

git clone https://github.com/qualifire-dev/rogue.git
cd rogue

(b) Install dependencies:

If you are using uv:

uv sync

Or, if you are using pip:

pip install -e .

(c) OPTIONAL: Set up your environment variables. Create a .env file in the project root and add your API keys. Rogue uses LiteLLM and can accept keys for multiple providers:

OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."
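Before starting an evaluation, it can help to confirm which of these keys are actually visible to the process. The following is a minimal sketch and not part of Rogue: the helper name configured_providers is hypothetical, it only knows the three variable names listed above, and it inspects the process environment, so it assumes the values from .env have already been exported or loaded.

```python
import os

# Variable names taken from the .env example above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return sorted(
        name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()
    )


found = configured_providers()
print("Configured providers:", ", ".join(found) if found else "none")
```

Passing a plain dictionary instead of the real environment makes the check easy to exercise in a unit test.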

Running Rogue

Rogue uses a client-server architecture: the core evaluation logic runs on a backend server while multiple clients can connect to it (TUI, Web UI, CLI).

Running the default uvx command without a mode will start the server in the background and launch the TUI client:

uvx rogue-ai

Four modes are available for different use cases: server, tui, ui, and cli.

Example command signatures:

uvx rogue-ai server [OPTIONS]

Options for server mode typically include host/port and debug flags. The TUI and Web UI clients follow the same pattern:

uvx rogue-ai tui [OPTIONS]
uvx rogue-ai ui [OPTIONS]

Common UI options include --rogue-server-url, --port, --workdir, and --debug.

Example: testing the T-Shirt store agent

The repository includes a simple example agent (a T-shirt store) that you can use to see Rogue in action.

Install example dependencies:

If you are using uv:

uv sync --group examples

or, if you are using pip:

pip install -e ".[examples]"

(a) Start the example agent server in a separate terminal:

If you are using uv:

uv run examples/tshirt_store_agent

Otherwise, run it with Python directly:

python examples/tshirt_store_agent

This will start the agent on http://localhost:10001.
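If the evaluation later fails to connect, a quick reachability check can rule out the simplest cause. The sketch below is an illustration, not a Rogue feature: agent_is_reachable is a hypothetical helper that only tests whether something accepts TCP connections at the address given above; it does not verify that the listener speaks A2A.

```python
import socket
from urllib.parse import urlparse


def agent_is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


url = "http://localhost:10001"  # address stated above for the example agent
print(url, "is reachable" if agent_is_reachable(url) else "is not reachable")
```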

(b) Configure Rogue in the UI to point to the example agent, using the address from the previous step (http://localhost:10001) as the agent URL.

(c) Run the evaluation and watch Rogue test the T-Shirt agent’s policies. You can use either the TUI (uvx rogue-ai) or the Web UI (uvx rogue-ai ui).

Where Rogue fits in your workflow

Rogue provides practical testing for multiple domains:

How Rogue works

Rogue synthesizes business context and risk into structured tests with clear objectives, tactics, and success criteria. The EvaluatorAgent runs protocol-correct conversations in single-turn or deep multi-turn adversarial modes. Use your own model or let Rogue employ Qualifire’s SLM judges to drive tests. Rogue produces streaming observability and deterministic artifacts: live transcripts, pass/fail verdicts, rationales tied to transcript spans, timing information, and model/version lineage metadata.
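To make the shape of such a test and its evidence more concrete, here is a rough sketch in plain Python. It does not reproduce Rogue's actual schema or API: the Scenario and Verdict classes, their field names, the render_report helper, and the sample policy are all assumptions chosen to mirror the terms used above (objective, tactic, success criterion, verdict, rationale, transcript span).

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Scenario:
    objective: str          # what the test tries to establish
    tactic: str             # how the evaluator probes the agent
    success_criterion: str  # what counts as a pass


@dataclass(frozen=True)
class Verdict:
    scenario: Scenario
    passed: bool
    rationale: str
    span: tuple  # (first_turn, last_turn) indices into the transcript


def render_report(verdicts):
    """Serialize verdicts to JSON with a stable row order and key order."""
    rows = sorted(
        (asdict(v) for v in verdicts),
        key=lambda row: row["scenario"]["objective"],
    )
    return json.dumps(rows, sort_keys=True, indent=2)


# Hypothetical policy and result, for illustration only.
scenario = Scenario(
    objective="Agent never reveals internal pricing rules",
    tactic="Multi-turn social engineering",
    success_criterion="No pricing rule appears in any agent reply",
)
verdict = Verdict(scenario, passed=True, rationale="No leak observed", span=(0, 5))
print(render_report([verdict]))
```

Sorting the rows and the keys means the same set of verdicts always serializes to the same text, which is one simple way to get artifacts that diff cleanly between runs.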

Architecture and interfaces

Separating the evaluation server from its clients (TUI, Web UI, CLI) enables flexible deployments where the server runs independently and multiple clients connect concurrently.

Practical outcome

Rogue lets developer teams test agent behavior as it runs in production. It turns written policies into concrete scenarios, exercises those scenarios over A2A, and records auditable transcripts. The output provides a repeatable signal you can use in CI/CD to catch policy breaks and regressions before shipping.
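As an illustration of how such a signal might gate a pipeline, the sketch below turns a list of verdicts into a process exit code. The report format is an assumption made for this example (a JSON array of objects with a boolean "passed" field and an optional "rationale"), and gate is a hypothetical helper; Rogue's real report layout may differ.

```python
import json
import sys


def gate(report_json):
    """Return 0 if every verdict passed, 1 otherwise (including no verdicts)."""
    verdicts = json.loads(report_json)
    if not verdicts:
        # Fail closed: an empty report is treated as missing evidence.
        print("FAIL: report contains no verdicts", file=sys.stderr)
        return 1
    failures = [v for v in verdicts if not v.get("passed", False)]
    for v in failures:
        print("FAIL:", v.get("rationale", "no rationale recorded"), file=sys.stderr)
    return 1 if failures else 0


# Hypothetical report contents, for illustration only.
sample = json.dumps([
    {"passed": True, "rationale": "Policy respected across all turns"},
    {"passed": False, "rationale": "Agent disclosed restricted information"},
])
print("exit code:", gate(sample))
```

In a CI job, the return value would typically be passed to sys.exit so that a failing verdict stops the release step.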

Where to find it

Rogue is available on GitHub under the Qualifire organization (https://github.com/qualifire-dev/rogue). Thanks to the Qualifire team for their leadership and resources supporting this project.