Introduction to Rezmo

Rezmo is your AI-powered SRE teammate. It connects to your observability stack to investigate alerts automatically, form hypotheses about their root causes, and verify those hypotheses against your data, much as a senior engineer would.

Unlike simple "chatbot" overlays, Rezmo uses a Reasoning Graph to troubleshoot issues iteratively, with the aim of reducing Mean Time To Resolution (MTTR).

Quick Start

You can deploy Rezmo locally using Docker Compose or on Kubernetes via Helm.

Docker Compose

git clone https://github.com/rezmo/antigravity.git
cd antigravity
docker-compose up -d

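Helm (Kubernetes)

On Kubernetes, Rezmo is installed from a private Helm chart. The steps are covered under Self-Hosted Installation below.
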
Self-Hosted Installation

For enterprise environments requiring full data sovereignty, Rezmo can be deployed in a self-hosted configuration within your own Kubernetes cluster.

1. Authentication

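The Rezmo Helm chart is hosted in a private OCI registry, so you must log in with helm registry login before installing (OCI support is generally available from Helm 3.8). Use the credentials provided to your organization: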

export REZMO_TOKEN="your-access-token"
echo "$REZMO_TOKEN" | helm registry login ghcr.io -u your-username --password-stdin

2. Installation

Install the RCA system from the versioned OCI chart. To adapt the deployment to local or cloud infrastructure, override chart values with --set flags (or a values file passed with -f):

# Basic installation
helm install rezmo oci://ghcr.io/your-org/charts/rca-system --version 1.1.x \
  --namespace rca-system --create-namespace

3. Configuration Profiles

Rezmo supports several storage and database configurations to suit different environments:

Storage: Use dynamic provisioning through a StorageClass (typical for cloud clusters) or hostPath volumes (for on-premises VM clusters).
Database: Use the bundled in-cluster MySQL pod, or connect to an external managed database (e.g., Azure Database for MySQL, Amazon RDS).
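These profiles are selected through Helm values. The Python sketch below assembles a helm install command line for each storage/database combination. Every --set key in it (persistence.storageClass, persistence.hostPath.*, mysql.enabled, externalDatabase.host) is a hypothetical placeholder, because the chart's real value names are not documented here. Run helm show values against the chart to find the actual keys before adapting it.

```python
import shlex
from typing import Dict, List, Optional

# Chart reference taken from the Basic installation command.
CHART = "oci://ghcr.io/your-org/charts/rca-system"

# HYPOTHETICAL value keys: confirm the real names with `helm show values`.
STORAGE_PROFILES: Dict[str, Dict[str, str]] = {
    # Cloud: let a StorageClass provision volumes dynamically.
    "cloud": {"persistence.storageClass": "standard"},
    # On-premises VM clusters: mount a node directory via hostPath.
    "on-prem": {
        "persistence.hostPath.enabled": "true",
        "persistence.hostPath.path": "/data/rezmo",
    },
}


def helm_install_args(storage: str, external_db_host: Optional[str] = None,
                      version: str = "1.1.x") -> List[str]:
    """Return the argv for `helm install` with the chosen storage/database profile."""
    values = dict(STORAGE_PROFILES[storage])
    if external_db_host:
        # HYPOTHETICAL keys: disable the in-cluster MySQL pod, use a managed DB instead.
        values["mysql.enabled"] = "false"
        values["externalDatabase.host"] = external_db_host
    args = ["helm", "install", "rezmo", CHART, "--version", version,
            "--namespace", "rca-system", "--create-namespace"]
    for key, value in values.items():
        args += ["--set", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Shell-quoted command for an on-prem cluster backed by an external MySQL host.
    print(shlex.join(helm_install_args("on-prem", external_db_host="mydb.example.com")))
```

Building the arguments as a list and not as one concatenated string keeps values with spaces or special characters intact. shlex.join is used only to quote them for display.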

Architecture

Rezmo consists of two primary components:

AI Backend (Brain): Powered by LangGraph, this component handles the reasoning logic, state management, and LLM interactions.
Frontend (Dashboard): An interface for viewing Incident Reports and monitoring system status.

Automated RCA

Rezmo isn't just a passive observer. When an alert arrives (e.g., from Prometheus or Slack), Rezmo triggers an Automated Root Cause Analysis workflow.

Feature Highlight: Rezmo doesn't just "guess". It formulates a hypothesis (e.g., "Latency is due to DB lock") and then queries your metrics to verify it.
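As an illustration of the verify step, this Python sketch turns the "DB lock" hypothesis into a Prometheus instant query and checks the result. The /api/v1/query endpoint and the response shape follow the Prometheus HTTP API. The metric name db_lock_wait_seconds_total, the 0.5 threshold, and both helper functions are hypothetical and do not come from Rezmo's implementation.

```python
from urllib.parse import urlencode

# HYPOTHETICAL metric: substitute the lock-wait metric your database exporter exposes.
PROMQL = "max(rate(db_lock_wait_seconds_total[5m]))"


def build_query_url(base_url: str, promql: str) -> str:
    """Build a Prometheus instant-query URL (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def hypothesis_holds(response: dict, threshold: float) -> bool:
    """True if any series in an instant-vector result exceeds the threshold."""
    if response.get("status") != "success":
        return False  # a failed query counts as "not verified", never as confirmed
    data = response["data"]
    if data["resultType"] != "vector":
        raise ValueError(f"expected an instant vector, got {data['resultType']}")
    # Each sample's "value" is [unix_timestamp, "value_as_string"].
    return any(float(sample["value"][1]) > threshold for sample in data["result"])


if __name__ == "__main__":
    print(build_query_url("http://prometheus:9090", PROMQL))
    # A response in the documented shape, with an illustrative value.
    sample = {
        "status": "success",
        "data": {
            "resultType": "vector",
            "result": [{"metric": {}, "value": [1714564800.0, "0.8"]}],
        },
    }
    print(hypothesis_holds(sample, threshold=0.5))  # True, since 0.8 > 0.5
```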

The Reasoning Graph

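At the core of Rezmo is the LangGraphRCASystem, a directed graph that models the investigative process. Unlike a DAG (directed acyclic graph), it may contain cycles, so the investigation can loop back, for example from verification to a new hypothesis, until it reaches a conclusion: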

Analyze Alert: Understand the incoming signal.
Generate Hypothesis: Brainstorm possible causes based on topology.
Verify: Execute queries against OpenSearch/Prometheus.
Conclusion: Generate a final report.
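The four steps above can be sketched as a small state machine in plain Python. This is not the LangGraphRCASystem code. The node functions, the RCAState fields, and the max_steps budget are assumptions, and the verify check is a stub standing in for the OpenSearch/Prometheus queries. The sketch shows the cycle: a rejected hypothesis sends control back to hypothesis generation and does not end the run.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RCAState:
    alert: str
    candidates: List[str]            # assumed: causes suggested from topology
    current: Optional[str] = None    # hypothesis under test
    rejected: List[str] = field(default_factory=list)
    root_cause: Optional[str] = None
    report: str = ""


Node = Callable[[RCAState], str]  # a node updates state and returns the next node name


def analyze_alert(state: RCAState) -> str:
    # A real system would classify the alert here; the sketch just moves on.
    return "generate_hypothesis"


def generate_hypothesis(state: RCAState) -> str:
    untested = [c for c in state.candidates if c not in state.rejected]
    if not untested:
        return "conclusion"  # nothing left to test
    state.current = untested[0]
    return "verify"


def make_verify(check: Callable[[str], bool]) -> Node:
    """Wrap a data-source check (e.g. a metrics query) as the verify node."""
    def verify(state: RCAState) -> str:
        if check(state.current):
            state.root_cause = state.current
            return "conclusion"
        state.rejected.append(state.current)
        return "generate_hypothesis"  # the cycle: back to hypothesis generation
    return verify


def conclusion(state: RCAState) -> str:
    cause = state.root_cause or "undetermined"
    state.report = f"alert={state.alert} root_cause={cause} rejected={state.rejected}"
    return "END"


def run(state: RCAState, check: Callable[[str], bool], max_steps: int = 20) -> RCAState:
    nodes: Dict[str, Node] = {
        "analyze_alert": analyze_alert,
        "generate_hypothesis": generate_hypothesis,
        "verify": make_verify(check),
        "conclusion": conclusion,
    }
    name, steps = "analyze_alert", 0
    while name != "END":
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")  # guards against endless cycles
        name = nodes[name](state)
        steps += 1
    return state


if __name__ == "__main__":
    result = run(RCAState("p99 latency > 2s", ["network saturation", "DB lock"]),
                 check=lambda h: h == "DB lock")
    print(result.report)
```

In LangGraph itself this kind of routing is usually expressed with StateGraph and add_conditional_edges. The step budget here plays the same role as LangGraph's recursion limit: it keeps a cyclic graph from looping forever.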

Dynamic Prompting

To improve accuracy, Rezmo uses dynamic prompt engineering: it injects context-aware prompts tailored to the incident type (Database, Network, or Application), so the LLM focuses on the relevant metrics.
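The following is a minimal Python sketch of selecting a prompt by incident type. The three categories come from the paragraph above. The keyword rules, the prompt wording, and the function names are hypothetical. A production classifier would more likely rely on alert labels or on the LLM itself.

```python
from string import Template

# HYPOTHETICAL prompt templates, one per incident type named in the docs.
PROMPTS = {
    "database": Template("You are investigating a database incident: $alert. "
                         "Focus on query latency, lock waits, connection pool usage "
                         "and replication lag."),
    "network": Template("You are investigating a network incident: $alert. "
                        "Focus on packet loss, DNS errors, TCP retransmits "
                        "and load balancer health."),
    "application": Template("You are investigating an application incident: $alert. "
                            "Focus on error rates, recent deploys and per-endpoint latency."),
}

# HYPOTHETICAL keyword rules, checked in order; anything unmatched is "application".
KEYWORDS = {
    "database": ("mysql", "postgres", "deadlock", "lock wait", "replication"),
    "network": ("dns", "packet", "tcp", "ingress"),
}


def classify(alert: str) -> str:
    text = alert.lower()
    for category, words in KEYWORDS.items():
        if any(word in text for word in words):
            return category
    return "application"


def build_prompt(alert: str) -> str:
    """Return the system prompt for the alert's incident type."""
    return PROMPTS[classify(alert)].substitute(alert=alert)


if __name__ == "__main__":
    print(build_prompt("MySQL deadlock detected on orders"))
```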

Integrations

Rezmo "plays nice" with your existing stack.

Data Sources

Prometheus: for time-series metrics.
OpenSearch / Elasticsearch: for log aggregation.
PostgreSQL: for stored configuration and history.
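On the log side of verification, a query against OpenSearch (or Elasticsearch) typically narrows the search to the affected service and a time window around the alert. The sketch below builds such a request body from standard query DSL clauses (bool, filter, term, terms, range). The field names @timestamp, service.name and log.level are assumptions about your index mapping, and the 15-minute window on each side is arbitrary.

```python
import json
from datetime import datetime, timedelta, timezone


def logs_around_alert(service: str, fired_at: datetime, window_minutes: int = 15,
                      size: int = 200) -> dict:
    """Search body for error/warn logs from one service around an alert time."""
    start = fired_at - timedelta(minutes=window_minutes)
    end = fired_at + timedelta(minutes=window_minutes)
    return {
        "size": size,
        "sort": [{"@timestamp": {"order": "asc"}}],
        "query": {
            "bool": {
                # filter clauses skip relevance scoring, which suits exact matches
                "filter": [
                    {"term": {"service.name": service}},           # assumed field
                    {"terms": {"log.level": ["error", "warn"]}},   # assumed field
                    {"range": {"@timestamp": {"gte": start.isoformat(),
                                              "lte": end.isoformat()}}},
                ]
            }
        },
    }


if __name__ == "__main__":
    fired = datetime(2024, 5, 1, 12, 0, tzinfo=timezone.utc)
    # Send this body with POST /<your-log-index>/_search
    print(json.dumps(logs_around_alert("checkout", fired), indent=2))
```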

Slack Bot

Add the Rezmo Bot to your incident channels. You can trigger investigations directly from Slack:

@Rezmo investigate alert-123
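When a bot is mentioned in a channel, the Slack Events API delivers an app_mention event. Its text contains the mention as <@USER_ID>, not the literal "@Rezmo". The parser below is a hypothetical example for the command above. The regexes and function name are illustrative, and Rezmo's actual command grammar may accept other verbs or options.

```python
import re
from typing import Optional

# Slack encodes a user mention in message text as <@USERID> (optionally <@USERID|name>).
MENTION = re.compile(r"^\s*<@[A-Z0-9]+(?:\|[^>]*)?>\s*")
# HYPOTHETICAL grammar: "investigate <alert-id>".
COMMAND = re.compile(r"^investigate\s+(?P<alert_id>[\w.-]+)\s*$", re.IGNORECASE)


def parse_command(text: str) -> Optional[str]:
    """Return the alert id from '<@BOT> investigate <id>', or None if it doesn't match."""
    body = MENTION.sub("", text, count=1)
    match = COMMAND.match(body)
    return match.group("alert_id") if match else None


if __name__ == "__main__":
    event = {"type": "app_mention", "text": "<@U024BE7LH> investigate alert-123"}
    print(parse_command(event["text"]))  # alert-123
```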