# Test Runner
The Lamdis test runner (`lamdis-runs`) is an open-source engine for testing AI assistants and agents. It executes test suites against your chatbots, copilots, RAG systems, or workflow agents.
## Overview
The test runner supports:
- **Multi-turn conversations** — Exercise complex, multi-step dialogues with your assistant
- **LLM-based judging** — Use semantic evaluation powered by AWS Bedrock to check assistant responses
- **HTTP request steps** — Create or verify data via API calls during tests
- **Variable interpolation** — Pass data between steps dynamically
- **Multiple execution channels** — HTTP chat or AWS Bedrock
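Variable interpolation can be pictured with a small sketch. The `{{name}}` placeholder syntax and the `interpolate` helper below are illustrative assumptions, not the runner's actual API:

```typescript
// Hypothetical sketch of step-to-step variable interpolation.
// A variable bag carries values captured by earlier steps.
type VariableBag = Record<string, string>;

function interpolate(template: string, bag: VariableBag): string {
  // Replace every {{name}} placeholder with its value from the bag;
  // unknown names are left untouched so failures stay visible.
  return template.replace(/\{\{(\w+)\}\}/g, (match, name) =>
    name in bag ? bag[name] : match
  );
}

// Example: a value captured by an HTTP step feeds a later message.
const bag: VariableBag = { orderId: "ORD-123" };
console.log(interpolate("What is the status of {{orderId}}?", bag));
// → "What is the status of ORD-123?"
```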
## Architecture
```
┌─────────────────────────────────────────────────────────┐
│                       Test Suite                        │
│   ┌─────────────────────────────────────────────────┐   │
│   │      Tests (messages, steps, assertions)        │   │
│   └─────────────────────────────────────────────────┘   │
│                          │                              │
│                          ▼                              │
│   ┌─────────────────────────────────────────────────┐   │
│   │              Test Runner Engine                 │   │
│   │              - Step execution                   │   │
│   │              - Variable bag management          │   │
│   │              - Transcript tracking              │   │
│   └─────────────────────────────────────────────────┘   │
│            │                          │                 │
│            ▼                          ▼                 │
│   ┌──────────────────┐      ┌──────────────────┐        │
│   │ Target Assistant │      │    LLM Judge     │        │
│   │    (Your AI)     │      │  (AWS Bedrock)   │        │
│   └──────────────────┘      └──────────────────┘        │
└─────────────────────────────────────────────────────────┘
```

## Storage Providers
The test runner supports three storage modes:
| Mode | `DB_PROVIDER` | Required Env Var | Best For |
|---|---|---|---|
| Local | `local` | (none) | CLI runs, CI pipelines |
| MongoDB | `mongo` | `MONGO_URL` | Persistent storage |
| PostgreSQL | `postgres` | `DATABASE_URL` | Enterprise setups |
### Auto-Detection
When `DB_PROVIDER` is not set, the runner automatically detects the storage mode:

- `DATABASE_URL` starting with `postgres` → PostgreSQL
- `MONGO_URL` set → MongoDB
- Otherwise → Local (in-memory/JSON)
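The detection order above can be sketched as a small resolver. The `resolveProvider` helper is illustrative; the runner's internal code may differ:

```typescript
// Sketch of the documented auto-detection order. Explicit DB_PROVIDER
// wins; otherwise connection-string env vars decide, falling back to
// the in-memory/JSON local mode.
type Provider = "postgres" | "mongo" | "local";

function resolveProvider(env: Record<string, string | undefined>): Provider {
  if (env.DB_PROVIDER) return env.DB_PROVIDER as Provider; // explicit setting wins
  if (env.DATABASE_URL?.startsWith("postgres")) return "postgres";
  if (env.MONGO_URL) return "mongo";
  return "local"; // in-memory/JSON fallback
}

console.log(resolveProvider({ DATABASE_URL: "postgres://db:5432/runs" })); // → "postgres"
console.log(resolveProvider({}));                                          // → "local"
```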
## Execution Channels
The test runner can communicate with your assistant via:
### HTTP Chat (`http_chat`)

Standard HTTP endpoint. The runner POSTs to your assistant's `/chat` endpoint:

```json
{
  "message": "User's message",
  "transcript": [...previous messages...],
  "persona": "Optional persona text"
}
```

Expected response:

```json
{
  "reply": "Assistant's response"
}
```

This is the most common integration method and works with any assistant that exposes an HTTP API.
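A minimal sketch of the assistant's side of this contract, as a plain handler function you would put behind `POST /chat`. The transcript entry shape and the echo logic are assumptions for illustration, not part of the runner's specification:

```typescript
// Hypothetical handler satisfying the http_chat contract: takes the
// payload the runner POSTs, returns the { reply } shape it expects.
interface ChatRequest {
  message: string;
  // Assumed transcript entry shape; the docs only say "previous messages".
  transcript: { role: "user" | "assistant"; content: string }[];
  persona?: string;
}

interface ChatResponse {
  reply: string;
}

function handleChat(req: ChatRequest): ChatResponse {
  // A real assistant would consult the transcript and persona here;
  // this echo stands in for actual model output.
  return { reply: `You said: ${req.message}` };
}

console.log(handleChat({ message: "Hello", transcript: [] }).reply);
// → "You said: Hello"
```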
### Bedrock Chat (`bedrock_chat`)
Direct AWS Bedrock integration for testing Bedrock-hosted models. Requires AWS credentials and region.
## LLM Judge
The LLM judge uses AWS Bedrock (Claude) to evaluate assistant responses against your rubrics.
### Configuration
| Environment Variable | Description | Default |
|---|---|---|
| `BEDROCK_MODEL_ID` | Model for judging | `anthropic.claude-3-haiku-20240307-v1:0` |
| `BEDROCK_JUDGE_MODEL_ID` | Override model specifically for judging | Uses `BEDROCK_MODEL_ID` |
| `BEDROCK_JUDGE_TEMPERATURE` | Temperature for judge calls | `0.3` |
| `AWS_REGION` | AWS region with Bedrock access | Required |
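How these settings compose can be sketched as a small resolver: the judge-specific model ID overrides the general one, and the temperature defaults to 0.3. The `resolveJudgeConfig` helper is illustrative, not the runner's actual code:

```typescript
// Sketch of judge configuration resolution per the table above.
interface JudgeConfig {
  modelId: string;
  temperature: number;
  region: string;
}

function resolveJudgeConfig(env: Record<string, string | undefined>): JudgeConfig {
  // AWS_REGION has no default; fail fast when it is missing.
  if (!env.AWS_REGION) throw new Error("AWS_REGION is required");
  return {
    // Judge-specific override wins, then the general model, then the default.
    modelId:
      env.BEDROCK_JUDGE_MODEL_ID ??
      env.BEDROCK_MODEL_ID ??
      "anthropic.claude-3-haiku-20240307-v1:0",
    temperature: Number(env.BEDROCK_JUDGE_TEMPERATURE ?? "0.3"),
    region: env.AWS_REGION,
  };
}
```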
### Example Configuration

```bash
export AWS_REGION=us-east-1
export BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0
```

## File Organization
When using JSON-based test definitions:
```
configs/
├── auth/        # Authentication configurations
├── requests/    # Reusable HTTP request definitions
├── personas/    # End-user persona definitions
├── assistants/  # Assistant endpoint configurations
├── tests/       # Individual test files
└── suites/      # Test suite groupings
```

## Running Tests
### Via CLI (Local Development)
```bash
# Run a single test file
npm run run-file -- tests/my-tests.json

# Run a suite
npm run run-file -- suites/my-suite.json
```

### Via Dashboard
1. Go to **Testing → Suites**
2. Select a suite and click **Run Now**
3. Monitor progress on the live run page
4. View results in **Testing → Results**
### Via API (CI/CD)
```bash
curl -X POST "$LAMDIS_RUNS_URL/internal/runs/start" \
  -H "x-api-token: $LAMDIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "json",
    "suites": ["my-suite"],
    "webhookUrl": "https://ci.example.com/webhook"
  }'
```

## Environment Variables
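The same call can be made from a Node-based CI script. The endpoint path, headers, and payload fields come from the curl example above; the `buildStartRunRequest` helper is an illustrative wrapper, not part of the runner:

```typescript
// Sketch of triggering a run from CI. Building the request separately
// keeps it easy to inspect and test before sending.
function buildStartRunRequest(
  baseUrl: string,
  token: string,
  suites: string[],
  webhookUrl?: string
) {
  return {
    url: `${baseUrl}/internal/runs/start`,
    init: {
      method: "POST",
      headers: { "x-api-token": token, "Content-Type": "application/json" },
      body: JSON.stringify({ mode: "json", suites, webhookUrl }),
    },
  };
}

// Usage with the global fetch available in Node 18+:
// const { url, init } = buildStartRunRequest(
//   process.env.LAMDIS_RUNS_URL!,
//   process.env.LAMDIS_API_TOKEN!,
//   ["my-suite"],
//   "https://ci.example.com/webhook"
// );
// const res = await fetch(url, init);
```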
### Required
| Variable | Description |
|---|---|
| `LAMDIS_API_TOKEN` | Token to protect `/internal` endpoints |
| `AWS_REGION` | AWS region with Bedrock access |
### Optional
| Variable | Description | Default |
|---|---|---|
| `PORT` | HTTP port | `3101` |
| `DB_PROVIDER` | Storage mode | auto-detect |
| `BEDROCK_MODEL_ID` | Bedrock model for judging | `anthropic.claude-3-haiku-*` |
| `BEDROCK_JUDGE_TEMPERATURE` | Judge temperature | `0.3` |
## Next Steps
- Create tests in the dashboard
- Set up CI/CD integration for automated testing
- Configure connections to your assistant