PromptOps

A prompt management and evaluation platform for teams shipping AI products.

Stack: Next.js · FastAPI · PostgreSQL · Redis · OpenAI
Status: Live
Year: 2026
Role: Solo Engineer
[Screenshot: PromptOps dashboard showing the prompt versioning and evaluation workflow]

Problem

Changing prompts in production is risky. Small tweaks that fix one edge case break others. Teams need a way to version, test, and compare prompt changes before shipping them, without building custom eval infrastructure.

Approach

Built a version → generate scenarios → replay → compare → promote workflow. Users version their prompts, generate test scenarios with AI, replay real production traces against new prompt versions, and compare results with an LLM-as-judge that scores quality and flags regressions.
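The replay step can be sketched as a small loop that runs each recorded trace through both prompt versions and collects paired outputs for judging. This is an illustrative sketch, not the actual PromptOps code: `render_prompt`, `call_llm`, and the trace shape are assumptions.

```python
def render_prompt(template: str, inputs: dict) -> str:
    """Fill a versioned prompt template with the recorded trace inputs.
    (Hypothetical helper; the real templating may differ.)"""
    return template.format(**inputs)

def replay(traces, old_template, new_template, call_llm):
    """Run each recorded trace through the old and new prompt versions.

    `call_llm` is a stand-in for the model call; returns paired outputs
    ready for the LLM-as-judge comparison step.
    """
    results = []
    for trace in traces:
        results.append({
            "inputs": trace["inputs"],
            "old_output": call_llm(render_prompt(old_template, trace["inputs"])),
            "new_output": call_llm(render_prompt(new_template, trace["inputs"])),
        })
    return results
```

Keeping the old and new outputs side by side in one record makes the downstream compare step a pure function over this list.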

Key Features

  • AI-generated test scenarios from natural language descriptions
  • Time-travel replay: same inputs through new prompts
  • LLM-as-judge with randomized A/B position to prevent bias
  • Interactive judge chat for improvement suggestions
  • Magic link auth, multi-user workspaces
  • API keys for programmatic integration
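The randomized A/B positioning mentioned above can be sketched like this: shuffle which output the judge sees first, then map the verdict back to the original labels so a judge's positional preference cannot systematically favor either version. `ask_judge` is a stand-in for the real judge call, not the PromptOps API.

```python
import random

def judge_pair(output_a: str, output_b: str, ask_judge, rng=random) -> str:
    """Compare two outputs with position randomization.

    `ask_judge(first, second)` (illustrative name) must return "first"
    or "second". Returns "a" or "b", mapped back to the original labels
    regardless of presentation order.
    """
    swapped = rng.random() < 0.5
    first, second = (output_b, output_a) if swapped else (output_a, output_b)
    verdict = ask_judge(first, second)
    if verdict == "first":
        return "b" if swapped else "a"
    return "a" if swapped else "b"
```

Because the swap is decided per comparison, any first-position bias in the judge averages out across a replay run instead of skewing one version's score.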

Architecture

The frontend (Next.js on Vercel) connects to a FastAPI backend (Railway), backed by PostgreSQL for persistence and Redis with Celery for async scenario generation and replay. A five-service Docker Compose setup supports local development.
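The five-service local setup might look roughly like the Compose sketch below. Service names, images, ports, and commands are assumptions for illustration, not the real config.

```yaml
# Illustrative docker-compose sketch; names and values are assumed.
services:
  web:                       # Next.js frontend
    build: ./frontend
    ports: ["3000:3000"]
  api:                       # FastAPI backend
    build: ./backend
    ports: ["8000:8000"]
    depends_on: [db, redis]
  worker:                    # Celery worker: scenario generation, replay
    build: ./backend
    command: celery -A app.worker worker
    depends_on: [db, redis]
  db:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev
  redis:
    image: redis:7
```

Sharing one backend image between `api` and `worker` keeps task definitions and models in sync between the web process and Celery.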

Visit PromptOps