
Test Runner

The Lamdis test runner (lamdis-runs) is an open-source engine for testing AI assistants and agents. It executes test suites against your chatbots, copilots, RAG systems, or workflow agents.

Overview

The test runner supports:

  • Multi-turn conversations — Exercise complex, multi-step dialogues with your assistant
  • LLM-based judging — Use semantic evaluation powered by AWS Bedrock to check assistant responses
  • HTTP request steps — Create or verify data via API calls during tests
  • Variable interpolation — Pass data between steps dynamically
  • Multiple execution channels — HTTP chat or AWS Bedrock
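Taken together, a multi-turn test with an LLM-judged assertion might look like the sketch below; the field names (`name`, `messages`, `judge`) are illustrative assumptions, not the runner's documented schema:

```json
{
  "name": "refund-flow",
  "messages": [
    { "user": "Hi, I'd like to return my order." },
    {
      "user": "It arrived damaged.",
      "judge": "The assistant apologizes and explains the refund process."
    }
  ]
}
```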

Architecture

┌─────────────────────────────────────────────────┐
│                   Test Suite                    │
│    ┌───────────────────────────────────────┐    │
│    │  Tests (messages, steps, assertions)  │    │
│    └───────────────────────────────────────┘    │
│                        │                        │
│                        ▼                        │
│    ┌───────────────────────────────────────┐    │
│    │          Test Runner Engine           │    │
│    │  - Step execution                     │    │
│    │  - Variable bag management            │    │
│    │  - Transcript tracking                │    │
│    └───────────────────────────────────────┘    │
│             │                     │             │
│             ▼                     ▼             │
│   ┌──────────────────┐   ┌──────────────────┐   │
│   │ Target Assistant │   │    LLM Judge     │   │
│   │    (Your AI)     │   │  (AWS Bedrock)   │   │
│   └──────────────────┘   └──────────────────┘   │
└─────────────────────────────────────────────────┘

Storage Providers

The test runner supports three storage modes:

| Mode       | DB_PROVIDER | Required Env Var | Best For               |
|------------|-------------|------------------|------------------------|
| Local      | local       | (none)           | CLI runs, CI pipelines |
| MongoDB    | mongo       | MONGO_URL        | Persistent storage     |
| PostgreSQL | postgres    | DATABASE_URL     | Enterprise setups      |
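For example, to pin the runner to MongoDB explicitly rather than relying on auto-detection (the connection string below is a placeholder):

```shell
# Select the MongoDB storage provider explicitly
export DB_PROVIDER=mongo
# Placeholder connection string; point this at your own instance
export MONGO_URL="mongodb://localhost:27017/lamdis"
```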

Auto-Detection

When DB_PROVIDER is not set, the runner automatically detects the storage mode:

  • DATABASE_URL starting with postgres → PostgreSQL
  • MONGO_URL set → MongoDB
  • Otherwise → Local (in-memory/JSON)
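This detection order can be sketched as a small function; the name `detectProvider` and the shape of the logic are illustrative, not the runner's actual code:

```typescript
// Sketch of the storage auto-detection order described above.
// Names are illustrative; the runner's real implementation may differ.
type Provider = "postgres" | "mongo" | "local";

function detectProvider(env: Record<string, string | undefined>): Provider {
  // An explicit DB_PROVIDER always wins over auto-detection.
  if (
    env.DB_PROVIDER === "postgres" ||
    env.DB_PROVIDER === "mongo" ||
    env.DB_PROVIDER === "local"
  ) {
    return env.DB_PROVIDER;
  }
  // Otherwise, fall back to the documented detection order.
  if (env.DATABASE_URL?.startsWith("postgres")) return "postgres";
  if (env.MONGO_URL) return "mongo";
  return "local";
}
```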

Execution Channels

The test runner can communicate with your assistant via:

HTTP Chat (http_chat)

Standard HTTP endpoint. The runner POSTs to your assistant’s /chat endpoint:

{
  "message": "User's message",
  "transcript": [...previous messages...],
  "persona": "Optional persona text"
}

Expected response:

{
  "reply": "Assistant's response"
}

This is the most common integration method and works with any assistant that exposes an HTTP API.
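One turn of this exchange might be driven by a client like the sketch below; the types mirror the request and response shapes shown above, while the `fetch`-based transport and error handling are assumptions:

```typescript
// Sketch of one http_chat turn against an assistant's /chat endpoint.
// Types follow the payload shapes documented above; the transport is assumed.
interface ChatTurn { role: "user" | "assistant"; content: string }
interface ChatRequest { message: string; transcript: ChatTurn[]; persona?: string }

// Validate the assistant's response body and extract the reply text.
function parseReply(body: unknown): string {
  if (
    typeof body !== "object" ||
    body === null ||
    typeof (body as { reply?: unknown }).reply !== "string"
  ) {
    throw new Error('expected a JSON body of the form { "reply": "..." }');
  }
  return (body as { reply: string }).reply;
}

async function sendTurn(baseUrl: string, req: ChatRequest): Promise<string> {
  const res = await fetch(`${baseUrl}/chat`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(req),
  });
  if (!res.ok) throw new Error(`chat endpoint returned HTTP ${res.status}`);
  return parseReply(await res.json());
}
```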

Bedrock Chat (bedrock_chat)

Direct AWS Bedrock integration for testing Bedrock-hosted models. Requires AWS credentials and region.

LLM Judge

The LLM judge uses AWS Bedrock (Claude) to evaluate assistant responses against your rubrics.

Configuration

| Environment Variable      | Description                             | Default                                |
|---------------------------|-----------------------------------------|----------------------------------------|
| BEDROCK_MODEL_ID          | Model for judging                       | anthropic.claude-3-haiku-20240307-v1:0 |
| BEDROCK_JUDGE_MODEL_ID    | Override model specifically for judging | Uses BEDROCK_MODEL_ID                  |
| BEDROCK_JUDGE_TEMPERATURE | Temperature for judge calls             | 0.3                                    |
| AWS_REGION                | AWS region with Bedrock access          | Required                               |
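The fallback behavior in this table can be sketched as follows; `resolveJudgeConfig` is an illustrative name, not the runner's internals:

```typescript
// Sketch of how the judge settings in the table above resolve.
// Illustrative only; the runner's actual code may differ.
interface JudgeConfig { modelId: string; temperature: number; region: string }

function resolveJudgeConfig(env: Record<string, string | undefined>): JudgeConfig {
  const region = env.AWS_REGION;
  if (!region) throw new Error("AWS_REGION is required for the LLM judge");
  return {
    // BEDROCK_JUDGE_MODEL_ID overrides BEDROCK_MODEL_ID, which has a default.
    modelId:
      env.BEDROCK_JUDGE_MODEL_ID ??
      env.BEDROCK_MODEL_ID ??
      "anthropic.claude-3-haiku-20240307-v1:0",
    // Judge temperature defaults to 0.3 when unset.
    temperature:
      env.BEDROCK_JUDGE_TEMPERATURE !== undefined
        ? Number(env.BEDROCK_JUDGE_TEMPERATURE)
        : 0.3,
    region,
  };
}
```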

Example Configuration

export AWS_REGION=us-east-1
export BEDROCK_MODEL_ID=anthropic.claude-3-haiku-20240307-v1:0

File Organization

When using JSON-based test definitions:

configs/
├── auth/        # Authentication configurations
├── requests/    # Reusable HTTP request definitions
├── personas/    # End-user persona definitions
├── assistants/  # Assistant endpoint configurations
├── tests/       # Individual test files
└── suites/      # Test suite groupings

Running Tests

Via CLI (Local Development)

# Run a single test file
npm run run-file -- tests/my-tests.json

# Run a suite
npm run run-file -- suites/my-suite.json

Via Dashboard

  1. Go to Testing → Suites 
  2. Select a suite and click Run Now
  3. Monitor progress on the live run page
  4. View results in Testing → Results 

Via API (CI/CD)

curl -X POST "$LAMDIS_RUNS_URL/internal/runs/start" \
  -H "x-api-token: $LAMDIS_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "json",
    "suites": ["my-suite"],
    "webhookUrl": "https://ci.example.com/webhook"
  }'

Environment Variables

Required

| Variable         | Description                          |
|------------------|--------------------------------------|
| LAMDIS_API_TOKEN | Token to protect /internal endpoints |
| AWS_REGION       | AWS region with Bedrock access       |

Optional

| Variable                  | Description               | Default                    |
|---------------------------|---------------------------|----------------------------|
| PORT                      | HTTP port                 | 3101                       |
| DB_PROVIDER               | Storage mode              | auto-detect                |
| BEDROCK_MODEL_ID          | Bedrock model for judging | anthropic.claude-3-haiku-* |
| BEDROCK_JUDGE_TEMPERATURE | Judge temperature         | 0.3                        |
