rahulbsw/streamforge


StreamForge

Selective replication for Kafka, with Redpanda as a compatibility target. Filter, transform, redact, and route data between topics and clusters without Kafka Connect.


StreamForge helps data teams move only the records and fields downstream systems actually need. Instead of mirroring whole topics, StreamForge lets you filter, reshape, redact, and route messages before they land in analytics, lake, or lower-trust environments.

Why Teams Use StreamForge

  • Replicate only analytics-safe fields instead of whole topics
  • Split one source topic into multiple downstream topics
  • Hash or drop PII before data crosses trust boundaries
  • Keep the deployment surface small with a single binary, operator, and Helm chart
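As a conceptual illustration of "replicate only analytics-safe fields" and "hash PII before data crosses trust boundaries", the following Python sketch projects a record onto an allow-list and replaces selected fields with salted SHA-256 digests. It is not StreamForge's implementation or configuration syntax: the function name `shape_record`, the field names, and the salt are all hypothetical.

```python
import hashlib
import json


def shape_record(record: dict, keep: list[str], hash_fields: list[str], salt: str = "") -> dict:
    """Return a new dict with only the `keep` fields; fields in `hash_fields` are hashed."""
    shaped = {}
    for name in keep:
        if name not in record:
            continue  # fields absent from the source record are skipped, not defaulted
        value = record[name]
        if name in hash_fields:
            # Salted SHA-256 hex digest stands in for whatever hashing a real pipeline uses.
            shaped[name] = hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()
        else:
            shaped[name] = value
    return shaped


if __name__ == "__main__":
    # Hypothetical order record; card_number is dropped because it is not in `keep`.
    order = {
        "order_id": "o-1001",
        "email": "ada@example.com",
        "total": 42.5,
        "card_number": "4111111111111111",
    }
    shaped = shape_record(order, keep=["order_id", "email", "total"],
                          hash_fields=["email"], salt="demo-salt")
    print(json.dumps(shaped, indent=2))
```

Anything outside the allow-list never reaches the downstream topic, and hashed fields stay joinable across records because the same input and salt always yield the same digest. A salt matters here: unsalted hashes of low-entropy values such as email addresses can be reversed with a dictionary lookup.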

Watch StreamForge Deploy on Minikube

This UI-driven demo walks through the full deployment path: install with Helm, create a pipeline in the browser, review the generated YAML, deploy the CRD, then verify the transformed output on Kafka.

UI Demo | 5-Minute CLI Demo | Examples | Compatibility | Documentation Index


When to Use StreamForge

Use StreamForge when you need:

  • selective replication to analytics or data lake pipelines
  • PII-safe replication across environments
  • topic fan-out with payload shaping
  • a smaller operational footprint than Kafka Connect

StreamForge is not intended as:

  • a full replacement for MirrorMaker 2 active-active or offset-sync workflows
  • a general-purpose stateful stream processor

For concrete usage patterns and configs, see docs/USAGE.md and examples/README.md.

5-Minute Demo

  1. Start the local Redpanda demo broker:
    docker compose -f examples/redpanda/docker-compose.yml up -d
  2. Validate the selective replication config:
    cargo run --quiet --bin streamforge-validate -- examples/redpanda/selective-replication.yaml
  3. Run StreamForge with the same config:
    CONFIG_FILE=examples/redpanda/selective-replication.yaml \
      cargo run --release --bin streamforge
    Leave StreamForge running in this terminal.
  4. Open a second terminal and follow docs/QUICKSTART.md to create the demo topics, produce a sample order, and inspect analytics-orders and pii-safe-orders.

If you want the Kubernetes + UI path instead of the local CLI path, use docs/UI_MINIKUBE_DEMO.md.

Production Trust Signals

  • At-least-once delivery with retry and DLQ support
  • Native Prometheus metrics and lag monitoring
  • Kubernetes operator, Helm chart, and web UI
  • Kafka-first examples for standalone configs and Kubernetes pipelines
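The "at-least-once delivery with retry and DLQ support" guarantee can be pictured with a small Python sketch. This is an assumption about the general pattern, not StreamForge's actual code: `deliver_with_retry`, `send`, `send_to_dlq`, and the linear backoff are hypothetical names and choices made for illustration.

```python
import time
from typing import Any, Callable


def deliver_with_retry(
    record: Any,
    send: Callable[[Any], None],
    send_to_dlq: Callable[[Any, Exception], None],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> bool:
    """Try `send(record)` up to `max_attempts` times, then fall back to the DLQ.

    Returns True if the destination accepted the record, False if it was dead-lettered.
    """
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(1, max_attempts + 1):
        try:
            send(record)
            return True
        except Exception as exc:  # a real pipeline would separate retryable from fatal errors
            last_error = exc
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)  # simple linear backoff between attempts
    # Retries exhausted: park the record with its last error instead of dropping it.
    send_to_dlq(record, last_error)
    return False
```

In a pattern like this, the source offset would be committed only after the function returns, so a crash mid-delivery causes a replay rather than a loss. That is why at-least-once delivery can produce duplicates and downstream consumers should be idempotent.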

Compatibility

StreamForge is built for Kafka-compatible brokers. Kafka is the primary target in the current docs and examples. Redpanda is a documented compatibility target for the selective replication workflows covered in this repo.


Core Capabilities

  • Content-based filtering across payload, key, headers, and timestamps
  • Field extraction, reshaping, and PII hashing before downstream delivery
  • Topic fan-out from one source topic to multiple destination topics
  • At-least-once delivery with retry, DLQ handling, and observability hooks
  • Standalone binary and Kubernetes operator deployment modes
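To make content-based filtering and topic fan-out concrete, here is a minimal Python sketch in which each destination topic carries a predicate over the message payload, key, headers, or timestamp. It only illustrates the idea and does not reflect StreamForge's filter syntax. The topic names `analytics-orders` and `pii-safe-orders` come from the demo above; `recent-orders`, the `Message` and `Route` types, and every predicate are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Message:
    key: str
    payload: dict
    headers: dict = field(default_factory=dict)
    timestamp_ms: int = 0


@dataclass
class Route:
    topic: str
    matches: Callable[[Message], bool]  # predicate deciding whether this topic gets the message


def fan_out(message: Message, routes: list[Route]) -> list[str]:
    """Return every destination topic whose predicate accepts the message, in route order."""
    return [route.topic for route in routes if route.matches(message)]


# Hypothetical routing table: one predicate on payload, one on key and headers, one on timestamp.
ROUTES = [
    Route("analytics-orders", lambda m: m.payload.get("status") == "paid"),
    Route("pii-safe-orders",
          lambda m: m.key.startswith("order-") and m.headers.get("region") == "eu"),
    Route("recent-orders", lambda m: m.timestamp_ms >= 1_700_000_000_000),
]

if __name__ == "__main__":
    msg = Message(key="order-2", payload={"status": "pending"},
                  headers={"region": "eu"}, timestamp_ms=1_800_000_000_000)
    print(fan_out(msg, ROUTES))
```

A message that matches no predicate is routed nowhere, which is the "selective" part of selective replication. A message that matches several predicates is copied to each matching topic.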

Example Pipelines

Deploy and Operate

Learn More

Contributing

Contribution and development setup are documented in docs/CONTRIBUTING.md.

License

Apache License 2.0. See LICENSE for details.