# Test Runner
The Lamdis test runner (`lamdis-runs`) is an open-source engine for testing AI assistants and agents. It executes test suites against your chatbots, copilots, RAG systems, or workflow agents.
## Overview
The test runner supports:
- **Multi-turn conversations** — Exercise complex, multi-step dialogues with your assistant
- **LLM-based judging** — Use semantic evaluation powered by AWS Bedrock to check assistant responses
- **HTTP request steps** — Create or verify data via API calls during tests
- **Variable interpolation** — Pass data between steps dynamically
- **Multiple execution channels** — HTTP chat or AWS Bedrock
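Variable interpolation can be pictured with a small sketch. The `{{name}}` placeholder syntax and the `interpolate` helper below are illustrative assumptions, not the runner's actual API:

```typescript
// Hypothetical sketch of step-to-step variable interpolation.
// A variable bag carries values captured by earlier steps.
type VariableBag = Record<string, string>;

function interpolate(template: string, bag: VariableBag): string {
  // Replace every {{name}} placeholder with its value from the bag;
  // unknown names are left untouched so failures stay visible.
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in bag ? bag[name] : match
  );
}

// Example: a value captured by an HTTP step feeds a later message.
const bag: VariableBag = { orderId: "ORD-123" };
console.log(interpolate("What is the status of {{orderId}}?", bag));
// → "What is the status of ORD-123?"
```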
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                       Test Suite                        │
│   ┌─────────────────────────────────────────────────┐   │
│   │      Tests (messages, steps, assertions)        │   │
│   └─────────────────────────────────────────────────┘   │
│                          │                              │
│                          ▼                              │
│   ┌─────────────────────────────────────────────────┐   │
│   │              Test Runner Engine                 │   │
│   │              - Step execution                   │   │
│   │              - Variable bag management          │   │
│   │              - Transcript tracking              │   │
│   └─────────────────────────────────────────────────┘   │
│            │                          │                 │
│            ▼                          ▼                 │
│   ┌──────────────────┐      ┌──────────────────┐        │
│   │ Target Assistant │      │    LLM Judge     │        │
│   │    (Your AI)     │      │  (AWS Bedrock)   │        │
│   └──────────────────┘      └──────────────────┘        │
└─────────────────────────────────────────────────────────┘
```

## Storage Providers
The test runner supports three storage modes:
| Mode | `DB_PROVIDER` | Required Env Var | Best For |
|---|---|---|---|
| Local | `local` | (none) | CLI runs, CI pipelines |
| MongoDB | `mongo` | `MONGO_URL` | Persistent storage |
| PostgreSQL | `postgres` | `DATABASE_URL` | Enterprise setups |
### Auto-Detection
When `DB_PROVIDER` is not set, the runner automatically detects the storage mode:

- `DATABASE_URL` starting with `postgres` → PostgreSQL
- `MONGO_URL` set → MongoDB
- Otherwise → Local (in-memory/JSON)
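The detection order above can be sketched as a small resolver. The `resolveProvider` helper is illustrative; the runner's internal code may differ:

```typescript
// Sketch of the documented auto-detection order. Explicit DB_PROVIDER
// wins; otherwise connection-string env vars decide, falling back to
// the in-memory/JSON local mode.
type Provider = "postgres" | "mongo" | "local";

function resolveProvider(env: Record<string, string | undefined>): Provider {
  if (env.DB_PROVIDER) return env.DB_PROVIDER as Provider; // explicit setting wins
  if (env.DATABASE_URL?.startsWith("postgres")) return "postgres";
  if (env.MONGO_URL) return "mongo";
  return "local"; // in-memory/JSON fallback
}

console.log(resolveProvider({ DATABASE_URL: "postgres://db:5432/runs" })); // → "postgres"
console.log(resolveProvider({}));                                          // → "local"
```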
## Execution Channels
The test runner can communicate with your assistant via:
### HTTP Chat (`http_chat`)

Standard HTTP endpoint. The runner POSTs to your assistant's `/chat` endpoint:

```json
{
  "message": "User's message",
  "transcript": [...previous messages...],
  "persona": "Optional persona text"
}
```

Expected response:

```json
{
  "reply": "Assistant's response"
}
```

This is the most common integration method and works with any assistant that exposes an HTTP API.
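A minimal sketch of the assistant's side of this contract, as a plain handler function you would put behind `POST /chat`. The transcript entry shape and the echo logic are assumptions for illustration, not part of the runner's specification:

```typescript
// Hypothetical handler satisfying the http_chat contract: takes the
// payload the runner POSTs, returns the { reply } shape it expects.
interface ChatRequest {
  message: string;
  // Assumed transcript entry shape; the docs only say "previous messages".
  transcript: { role: "user" | "assistant"; content: string }[];
  persona?: string;
}

interface ChatResponse {
  reply: string;
}

function handleChat(req: ChatRequest): ChatResponse {
  // A real assistant would consult the transcript and persona here;
  // this echo stands in for actual model output.
  return { reply: `You said: ${req.message}` };
}

console.log(handleChat({ message: "Hello", transcript: [] }).reply);
// → "You said: Hello"
```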
### Bedrock Chat (`bedrock_chat`)
Direct AWS Bedrock integration for testing Bedrock-hosted models. Requires AWS credentials and region.
## LLM Judge
The LLM judge uses AWS Bedrock (Claude) to evaluate assistant responses against your rubrics.
### Configuration
| Environment Variable | Description | Default |
|---|---|---|
| `BEDROCK_MODEL_ID` | Model for judging | `anthropic.claude-3-haiku-20240307-v1:0` |
| `BEDROCK_JUDGE_MODEL_ID` | Override model specifically for judging | Uses `BEDROCK_MODEL_ID` |
| `BEDROCK_JUDGE_TEMPERATURE` | Temperature for judge calls | `0.3` |
| `AWS_REGION` | AWS region with Bedrock access | Required |
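How these settings compose can be sketched as a small resolver: the judge-specific model ID overrides the general one, and the temperature defaults to 0.3. The `resolveJudgeConfig` helper is illustrative, not the runner's actual code:

```typescript
// Sketch of judge configuration resolution per the table above.
interface JudgeConfig {
  modelId: string;
  temperature: number;
  region: string;
}

function resolveJudgeConfig(env: Record<string, string | undefined>): JudgeConfig {
  // AWS_REGION has no default; fail fast when it is missing.
  if (!env.AWS_REGION) throw new Error("AWS_REGION is required");
  return {
    // Judge-specific override wins, then the general model, then the default.
    modelId:
      env.BEDROCK_JUDGE_MODEL_ID ??
      env.BEDROCK_MODEL_ID ??
      "anthropic.claude-3-haiku-20240307-v1:0",
    temperature: Number(env.BEDROCK_JUDGE_TEMPERATURE ?? "0.3"),
    region: env.AWS_REGION,
  };
}
```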
### Example Configuration

```bash
export AWS_REGION=us-east-1
export BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0
```

## File Organization
When using JSON-based test definitions:
```
configs/
├── auth/        # Authentication configurations
├── requests/    # Reusable HTTP request definitions
├── personas/    # End-user persona definitions
├── assistants/  # Assistant endpoint configurations
├── tests/       # Individual test files
└── suites/      # Test suite groupings
```

## Running Tests
### Via CLI (Local Development)
```bash
# Run a single test file
npm run run-file -- tests/my-tests.json

# Run a suite
npm run run-file -- suites/my-suite.json
```

### Via Dashboard
1. Go to **Testing → Suites**
2. Select a suite and click **Run Now**
3. Monitor progress on the live run page
4. View results in **Testing → Results**
### Via API (CI/CD)
```bash
curl -X POST "$LAMDIS_RUNS_URL/internal/runs/start" \
  -H "x-api-token: $LAMDIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "json",
    "suites": ["my-suite"],
    "webhookUrl": "https://ci.example.com/webhook"
  }'
```

## Environment Variables
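The same call can be made from a Node-based CI script. The endpoint path, headers, and payload fields come from the curl example above; the `buildStartRunRequest` helper is an illustrative wrapper, not part of the runner:

```typescript
// Sketch of triggering a run from CI. Building the request separately
// keeps it easy to inspect and test before sending.
function buildStartRunRequest(
  baseUrl: string,
  token: string,
  suites: string[],
  webhookUrl?: string
) {
  return {
    url: `${baseUrl}/internal/runs/start`,
    init: {
      method: "POST",
      headers: { "x-api-token": token, "Content-Type": "application/json" },
      body: JSON.stringify({ mode: "json", suites, webhookUrl }),
    },
  };
}

// Usage with the global fetch available in Node 18+:
// const { url, init } = buildStartRunRequest(
//   process.env.LAMDIS_RUNS_URL!,
//   process.env.LAMDIS_API_TOKEN!,
//   ["my-suite"],
//   "https://ci.example.com/webhook"
// );
// const res = await fetch(url, init);
```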
### Required
| Variable | Description |
|---|---|
| `LAMDIS_API_TOKEN` | Token to protect `/internal` endpoints |
| `AWS_REGION` | AWS region with Bedrock access |
### Optional
| Variable | Description | Default |
|---|---|---|
| `PORT` | HTTP port | `3101` |
| `DB_PROVIDER` | Storage mode | auto-detect |
| `BEDROCK_MODEL_ID` | Bedrock model for judging | `anthropic.claude-3-haiku-*` |
| `BEDROCK_JUDGE_TEMPERATURE` | Judge temperature | `0.3` |
## Next Steps
- Create tests in the dashboard
- Set up CI/CD integration for automated testing
- Configure connections to your assistant