Introduction to Rezmo
Rezmo is your AI-powered SRE teammate. It connects to your observability stack to automatically investigate alerts, hypothesize root causes, and verify them, much as a senior engineer would.
Unlike simple "chatbot" overlays, Rezmo uses a Reasoning Graph to troubleshoot issues iteratively, with the goal of reducing Mean Time To Resolution (MTTR).
Quick Start
You can deploy Rezmo locally using Docker Compose or on Kubernetes via Helm.
Docker Compose
git clone https://github.com/rezmo/antigravity.git
cd antigravity
docker-compose up -d
Helm (Kubernetes)
Self-Hosted Installation
For enterprise environments requiring full data sovereignty, Rezmo can be deployed in a self-hosted configuration within your own Kubernetes cluster.
1. Authentication
Accessing the private Rezmo Helm registry requires an OCI-compliant login. Use your provided organization credentials:
export REZMO_TOKEN="your-access-token"
echo "$REZMO_TOKEN" | helm registry login ghcr.io -u your-username --password-stdin
2. Installation
Deploy the RCA system using the versioned OCI package. You can customize the deployment for local or cloud infrastructure via --set flags.
# Basic installation
helm install rezmo oci://ghcr.io/your-org/charts/rca-system --version 1.1.x \
--namespace rca-system --create-namespace
3. Configuration Profiles
Rezmo supports highly flexible storage and database configurations suited for different enterprise needs:
- Storage: Toggle between automated StorageClass provisioning for cloud or HostPath for on-premise VM clusters.
- Database: Use the internal high-performance MySQL pod or connect to an external managed database (e.g., Azure MySQL, AWS RDS).
Architecture
Rezmo consists of two primary components:
- AI Backend (Brain): Powered by LangGraph, this component handles the reasoning logic, state management, and LLM interactions.
- Frontend (Dashboard): A clean interface for effective visualization of Incident Reports and system status.
Automated RCA
Rezmo isn't just a passive observer. When an alert arrives (e.g., from Prometheus or Slack), Rezmo triggers an Automated Root Cause Analysis workflow.
The Reasoning Graph
At the core of Rezmo is the LangGraphRCASystem. This directed cyclic graph models the investigative process:
- Analyze Alert: Understand the incoming signal.
- Generate Hypothesis: Brainstorm possible causes based on topology.
- Verify: Execute queries against OpenSearch/Prometheus.
- Conclusion: Generate a final report.
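The four steps above can be read as a small state machine. The following is a minimal plain-Python sketch of that idea; it is not Rezmo's LangGraphRCASystem and does not use LangGraph. The state fields, the node functions, the candidate causes, and the edge from Verify back to Generate Hypothesis are all hypothetical, chosen only to illustrate how a graph with a cycle can keep iterating until a hypothesis is confirmed or the candidates run out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RCAState:
    """Hypothetical investigation state passed between nodes."""
    alert: str
    candidates: List[str] = field(default_factory=list)  # untested causes
    hypothesis: Optional[str] = None
    confirmed: bool = False
    report: Optional[str] = None
    visited: List[str] = field(default_factory=list)  # trace of nodes run


def analyze_alert(state: RCAState) -> str:
    # Stand-in for alert analysis: seed a fixed list of candidate causes.
    state.candidates = ["network partition", "db connection pool exhausted"]
    return "generate_hypothesis"


def generate_hypothesis(state: RCAState) -> str:
    # Take the next untested candidate; conclude when none remain.
    if not state.candidates:
        state.hypothesis = None
        return "conclusion"
    state.hypothesis = state.candidates.pop(0)
    return "verify"


def verify(state: RCAState) -> str:
    # Stand-in for a real log or metric query: a hard-coded check.
    state.confirmed = state.hypothesis == "db connection pool exhausted"
    # Rejected hypotheses loop back; this edge is what makes the graph cyclic.
    return "conclusion" if state.confirmed else "generate_hypothesis"


def conclusion(state: RCAState) -> str:
    cause = state.hypothesis if state.confirmed else "undetermined"
    state.report = f"Alert '{state.alert}': root cause {cause}"
    return "end"


# Each node returns the name of the next node to run.
NODES: Dict[str, Callable[[RCAState], str]] = {
    "analyze_alert": analyze_alert,
    "generate_hypothesis": generate_hypothesis,
    "verify": verify,
    "conclusion": conclusion,
}


def run(alert: str, max_steps: int = 20) -> RCAState:
    state = RCAState(alert=alert)
    node = "analyze_alert"
    for _ in range(max_steps):  # guard against endless cycling
        if node == "end":
            break
        state.visited.append(node)
        node = NODES[node](state)
    return state
```

Calling `run("alert-123")` walks the first hypothesis through Verify, rejects it, loops back for the second, and then reaches Conclusion. The `max_steps` bound is a design choice of this sketch: any graph with a cycle needs some termination guard.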
Dynamic Prompting
To ensure high accuracy, Rezmo uses dynamic prompt engineering. Context-aware prompts are injected into the LLM based on the specific type of incident (Database vs. Network vs. Application), ensuring the AI focuses on relevant metrics.
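As an illustration of the idea, the sketch below selects prompt guidance by incident type. It is an assumption-laden example, not Rezmo's prompt code: the template wording, the signal lists, and the `build_prompt` function are hypothetical, and only the three categories (Database, Network, Application) come from the text above.

```python
from string import Template

# Hypothetical per-category guidance; the signal names are illustrative only.
FOCUS_BY_INCIDENT_TYPE = {
    "database": "connection counts, slow queries, replication lag",
    "network": "packet loss, latency, DNS errors",
    "application": "error rates, restarts, memory usage",
}

# Hypothetical prompt skeleton shared by all incident types.
PROMPT = Template(
    "You are investigating a $incident_type incident.\n"
    "Alert: $alert\n"
    "Focus on these signals: $focus"
)


def build_prompt(incident_type: str, alert: str) -> str:
    """Return a prompt tailored to the given incident type."""
    key = incident_type.lower()
    if key not in FOCUS_BY_INCIDENT_TYPE:
        raise ValueError(f"unknown incident type: {incident_type!r}")
    return PROMPT.substitute(
        incident_type=key,
        alert=alert,
        focus=FOCUS_BY_INCIDENT_TYPE[key],
    )
```

For example, `build_prompt("Database", "alert-123")` yields a prompt that names the database signals and omits the network and application ones, which is the narrowing effect the paragraph describes.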
Integrations
Rezmo integrates with the tools already in your stack.
Data Sources
- Prometheus: for time-series metrics.
- OpenSearch / Elasticsearch: for log aggregation.
- PostgreSQL: for stored configuration and history.
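To make the metrics integration concrete, here is a minimal sketch of an instant query against the Prometheus HTTP API (`/api/v1/query`) using only the Python standard library. How Rezmo itself talks to Prometheus is not described here, so the function names, the base URL, and the timeout are assumptions for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for a Prometheus instant query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_vector(payload: dict) -> dict:
    """Map each series' sorted label pairs to its float sample value."""
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload.get('error')}")
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected a vector, got {data['resultType']}")
    # Each sample is [timestamp, "value-as-string"].
    return {
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in data["result"]
    }


def instant_query(base_url: str, promql: str, timeout: float = 10.0) -> dict:
    """Run the query over HTTP and return the parsed vector."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_vector(json.load(resp))
```

With a Prometheus server reachable at an assumed address such as `http://localhost:9090`, `instant_query("http://localhost:9090", "up")` would return one entry per scraped target.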
Slack Bot
Add the Rezmo Bot to your incident channels. You can trigger investigations directly from Slack:
@Rezmo investigate alert-123
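For readers building tooling around this command, the sketch below shows one way such a message could be parsed. It is not the Rezmo Bot's implementation: the regular expression, the `parse_investigate` function, and the assumed alert ID format (letters, digits, underscores, hyphens) are hypothetical. It accepts both the literal `@Rezmo` shown above and the `<@USERID>` form in which Slack delivers mentions in raw message text.

```python
import re
from typing import Optional

# Matches "@Rezmo investigate <id>" or "<@USERID> investigate <id>".
# The alert ID character set is an assumption for this sketch.
COMMAND_RE = re.compile(
    r"^\s*(?:@Rezmo|<@[A-Z0-9]+>)\s+investigate\s+(?P<alert_id>[\w-]+)\s*$",
    re.IGNORECASE,
)


def parse_investigate(text: str) -> Optional[str]:
    """Return the alert ID from an investigate command, or None."""
    match = COMMAND_RE.match(text)
    return match.group("alert_id") if match else None
```

Returning `None` for anything that is not an investigate command lets a caller ignore unrelated channel messages without raising.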