
Test Runner

The Lamdis test runner (lamdis-runs) is an open-source engine for testing AI assistants and agents. It executes test suites against your chatbots, copilots, RAG systems, or workflow agents.

Overview

The test runner supports:

  • Multi-turn conversations — Exercise complex, multi-step dialogues with your assistant
  • LLM-based judging — Use semantic evaluation to check assistant responses
  • HTTP request steps — Create or verify data via API calls during tests
  • Variable interpolation — Pass data between steps dynamically
  • Multiple execution channels — HTTP chat, OpenAI Chat, or AWS Bedrock
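To make these features concrete, a test definition might look like the following. All field names here are hypothetical illustrations, not the actual schema; they only show how a multi-turn message, a variable placeholder, and a judge rubric could fit together in one file:

```json
{
  "name": "order-status-check",
  "messages": [
    { "user": "Where is my order {{orderId}}?" }
  ],
  "assertions": [
    { "judge": "The assistant states the order's current shipping status." }
  ]
}
```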

Architecture

```
┌─────────────────────────────────────────────────────────┐
│                       Test Suite                        │
│  ┌─────────────────────────────────────────────────┐    │
│  │      Tests (messages, steps, assertions)        │    │
│  └─────────────────────────────────────────────────┘    │
│                          │                              │
│                          ▼                              │
│  ┌─────────────────────────────────────────────────┐    │
│  │              Test Runner Engine                 │    │
│  │   - Step execution                              │    │
│  │   - Variable bag management                     │    │
│  │   - Transcript tracking                         │    │
│  └─────────────────────────────────────────────────┘    │
│            │                          │                 │
│            ▼                          ▼                 │
│  ┌──────────────────┐      ┌──────────────────┐         │
│  │ Target Assistant │      │    LLM Judge     │         │
│  │    (Your AI)     │      │ (OpenAI/Bedrock) │         │
│  └──────────────────┘      └──────────────────┘         │
└─────────────────────────────────────────────────────────┘
```

Storage Providers

The test runner supports three storage modes:

| Mode       | `DB_PROVIDER` | Required Env Var | Best For               |
|------------|---------------|------------------|------------------------|
| Local      | `local`       | (none)           | CLI runs, CI pipelines |
| MongoDB    | `mongo`       | `MONGO_URL`      | Persistent storage     |
| PostgreSQL | `postgres`    | `DATABASE_URL`   | Enterprise setups      |

Auto-Detection

When DB_PROVIDER is not set, the runner automatically detects the storage mode:

  • DATABASE_URL starting with postgres → PostgreSQL
  • MONGO_URL set → MongoDB
  • Otherwise → Local (in-memory/JSON)
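The detection order above can be sketched as a small helper. This is a hypothetical illustration of the documented behavior (function name and structure are not the runner's actual internals): an explicit `DB_PROVIDER` wins, then `DATABASE_URL`, then `MONGO_URL`, and finally the local fallback.

```typescript
type DbProvider = "postgres" | "mongo" | "local";

// Hypothetical sketch of the auto-detection rules described above.
function detectDbProvider(env: Record<string, string | undefined>): DbProvider {
  // An explicitly set DB_PROVIDER takes precedence over auto-detection.
  if (env.DB_PROVIDER === "postgres" || env.DB_PROVIDER === "mongo" || env.DB_PROVIDER === "local") {
    return env.DB_PROVIDER;
  }
  // DATABASE_URL starting with "postgres" selects PostgreSQL.
  if (env.DATABASE_URL?.startsWith("postgres")) return "postgres";
  // Any MONGO_URL selects MongoDB.
  if (env.MONGO_URL) return "mongo";
  // Otherwise fall back to local in-memory/JSON storage.
  return "local";
}

console.log(detectDbProvider({ DATABASE_URL: "postgres://db.example/test" })); // "postgres"
console.log(detectDbProvider({}));                                             // "local"
```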

Execution Channels

The test runner can communicate with your assistant via:

HTTP Chat (http_chat)

Standard HTTP endpoint. The runner POSTs to your assistant’s /chat endpoint:

```json
{
  "message": "User's message",
  "transcript": [...previous messages...],
  "persona": "Optional persona text"
}
```

Expected response:

```json
{ "reply": "Assistant's response" }
```
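A minimal http_chat target only needs to honor the request and response shapes above. The sketch below is an assumption-laden echo stub (the handler name and echo logic are invented for illustration); a real assistant would call an LLM where the comment indicates, and the handler would be wired to a `POST /chat` route:

```typescript
// Shapes taken from the http_chat request/response contract above.
interface ChatTurn { role: string; content: string }
interface ChatRequest { message: string; transcript?: ChatTurn[]; persona?: string }

// Hypothetical echo stub: accepts { message, transcript, persona },
// returns { reply } as the runner expects.
function handleChat(body: ChatRequest): { reply: string } {
  // A real assistant would invoke an LLM here; this stub just echoes.
  const turn = (body.transcript?.length ?? 0) + 1;
  return { reply: `Echo (turn ${turn}): ${body.message}` };
}

console.log(handleChat({ message: "Hello" })); // { reply: 'Echo (turn 1): Hello' }
```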

OpenAI Chat (openai_chat)

Direct OpenAI Chat API integration. Requires OPENAI_API_KEY.

Bedrock Chat (bedrock_chat)

AWS Bedrock integration. Requires AWS credentials and region.

Judge Providers

The LLM judge evaluates assistant responses against your rubrics:

| Provider         | Environment Variable     | Model Variable           |
|------------------|--------------------------|--------------------------|
| OpenAI (default) | `OPENAI_API_KEY`         | `OPENAI_MODEL`           |
| AWS Bedrock      | `JUDGE_PROVIDER=bedrock` | `BEDROCK_JUDGE_MODEL_ID` |

Using Different Models

You can use a faster model for chat simulation and a stronger model for judging:

```shell
export JUDGE_PROVIDER=bedrock
export BEDROCK_CHAT_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0
export BEDROCK_JUDGE_MODEL_ID=anthropic.claude-3-sonnet-20240229-v1:0
```

File Organization

When using JSON-based test definitions:

```
configs/
├── auth/        # Authentication configurations
├── requests/    # Reusable HTTP request definitions
├── personas/    # End-user persona definitions
├── assistants/  # Assistant endpoint configurations
├── tests/       # Individual test files
└── suites/      # Test suite groupings
```

Running Tests

Via CLI (Local Development)

```shell
# Run a single test file
npm run run-file -- tests/my-tests.json

# Run a suite
npm run run-file -- suites/my-suite.json
```

Via API (CI/CD)

```shell
curl -X POST "$LAMDIS_RUNS_URL/internal/runs/start" \
  -H "x-api-token: $LAMDIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "json",
    "suites": ["legal-tests"],
    "webhookUrl": "https://ci.example.com/webhook"
  }'
```
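In a Node-based CI script, the same call can be sketched with `fetch` (Node 18+). The endpoint, headers, and body fields mirror the curl example; the response shape and helper names are assumptions for illustration:

```typescript
// Build the JSON body for POST /internal/runs/start
// (fields mirror the curl example: mode, suites, webhookUrl).
function buildStartRunBody(suites: string[], webhookUrl: string): string {
  return JSON.stringify({ mode: "json", suites, webhookUrl });
}

// Hypothetical CI helper: trigger a run and return the parsed response.
// The response shape is an assumption; only the request is documented above.
async function startRun(suites: string[], webhookUrl: string): Promise<unknown> {
  const res = await fetch(`${process.env.LAMDIS_RUNS_URL}/internal/runs/start`, {
    method: "POST",
    headers: {
      "x-api-token": process.env.LAMDIS_API_TOKEN ?? "",
      "Content-Type": "application/json",
    },
    body: buildStartRunBody(suites, webhookUrl),
  });
  if (!res.ok) throw new Error(`run start failed: HTTP ${res.status}`);
  return res.json();
}
```

The `webhookUrl` lets the runner notify your CI system when the run finishes, so the script does not need to block and poll.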

Environment Variables

Required

| Variable           | Description                               |
|--------------------|-------------------------------------------|
| `LAMDIS_API_TOKEN` | Token to protect `/internal` endpoints    |
| `OPENAI_API_KEY`   | OpenAI API key (when using OpenAI judge)  |

Optional

| Variable           | Description              | Default                      |
|--------------------|--------------------------|------------------------------|
| `PORT`             | HTTP port                | `3101`                       |
| `DB_PROVIDER`      | Storage mode             | auto-detect                  |
| `JUDGE_PROVIDER`   | Judge provider           | `openai`                     |
| `OPENAI_MODEL`     | OpenAI model for judging | `gpt-4o-mini`                |
| `BEDROCK_MODEL_ID` | Bedrock model            | `anthropic.claude-3-haiku-*` |

See the full configuration reference in Environment Variables.
