
Emit live calibration telemetry for dataset development dashboard #16

@MaxGhenis

Goal

Microplex should emit live calibration and build telemetry to a shared store so dataset development can be inspected in real time, whether the run is local, in CI, or on a remote worker.
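
A minimal sketch of what "inspected in real time" could mean for build stages, assuming a plain `emit(table, event)` callback (a hypothetical hook, not an existing microplex API). The context manager emits a `running` event as soon as a stage starts and a `completed` or `failed` event when it ends, so a dashboard sees the stage before it finishes. Field names follow the `run_stages` table proposed below; `rss_mb` is left out because measuring it portably needs a platform-specific call.

```python
import time
from contextlib import contextmanager
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Iterator

# Hypothetical hook: emit(table_name, event_dict); a telemetry writer would back it.
Emit = Callable[[str, Dict[str, Any]], None]


@contextmanager
def stage(emit: Emit, run_id: str, name: str) -> Iterator[None]:
    """Emit a 'running' event on entry and 'completed' or 'failed' on exit."""
    started_at = datetime.now(timezone.utc).isoformat()
    t0 = time.perf_counter()
    emit("run_stages", {"run_id": run_id, "stage": name,
                        "status": "running", "started_at": started_at})
    status = "failed"
    try:
        yield
        status = "completed"
    finally:
        # Runs on success and on exceptions; an exception still propagates afterwards.
        emit("run_stages", {
            "run_id": run_id,
            "stage": name,
            "status": status,
            "started_at": started_at,
            "completed_at": datetime.now(timezone.utc).isoformat(),
            "elapsed_seconds": round(time.perf_counter() - t0, 3),
        })


if __name__ == "__main__":
    events = []
    with stage(lambda table, event: events.append(event), "run-1", "calibrate"):
        time.sleep(0.01)
    print([e["status"] for e in events])
```

Because the start event is emitted before any work happens, a crashed or hung stage still shows up on the dashboard as `running` (or `failed` if Python got to unwind).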

This belongs in core microplex: the writer abstraction, event schema, local/Supabase sinks, and incognito behavior are generic. Country packs such as microplex-us should add domain-specific target metadata, target-family classifiers, and PolicyEngine-specific adapters.
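
One way the core/country-pack split could look, as a sketch: core keeps a registry of target-metadata enrichers and calls them when building target rows, while a country pack registers its own classifier. `register_target_enricher`, `enrich_target`, the `"us"` key and the `soi/` target-name prefix are all hypothetical names used only for illustration.

```python
from typing import Callable, Dict, List

# An enricher receives a generic target-diagnostic row and returns extra metadata fields.
Enricher = Callable[[dict], dict]

_ENRICHERS: Dict[str, List[Enricher]] = {}


def register_target_enricher(country: str, enricher: Enricher) -> None:
    """Called by a country pack (e.g. microplex-us) when it is imported."""
    _ENRICHERS.setdefault(country, []).append(enricher)


def enrich_target(country: str, row: dict) -> dict:
    """Called by core telemetry code, which knows nothing country-specific."""
    enriched = dict(row)
    for enricher in _ENRICHERS.get(country, []):
        # setdefault: enrichers can add missing fields but never overwrite existing ones.
        for key, value in enricher(enriched).items():
            enriched.setdefault(key, value)
    return enriched


# In a country pack (hypothetical classifier keyed on a target-name prefix):
def us_target_family(row: dict) -> dict:
    if row["target_name"].startswith("soi/"):
        return {"family": "irs_soi", "source": "IRS SOI"}
    return {"family": "other"}


register_target_enricher("us", us_target_family)
```

With this shape, `enrich_target("us", {"target_name": "soi/agi_total", "estimate": 1.0})` returns the row with `family` and `source` added, and a country with no registered enrichers gets its row back unchanged, so the generic machinery never imports country code.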

The old Streamlit repo PolicyEngine/policyengine-us-calibration-validation has been archived. The replacement should be a live Microplex/Arch dashboard pipeline, not a stale standalone app.

Desired behavior

  • Each build gets a stable run_id and writes progress snapshots as it runs.
  • Local, CI, and remote/cloud runs write the same telemetry schema.
  • A dashboard can show:
    • current stage and stage timings
    • calibration epoch progress
    • objective/loss by epoch
    • target-level estimates, target values, relative errors, and weighted loss terms
    • target metadata: source, geography, target family, split (train/holdout), in_loss_function, support status
    • solver config: backend, epochs/max_iter, L0/L2 settings, gates on/off, learning rate/optimizer where relevant
    • dataset/build manifest metadata: spines vs donors, row counts, nonzero weights, ESS, output artifact references
  • Default behavior should upload to a central PolicyEngine store, likely Supabase.
  • There must be an incognito / opt-out mode for purely local experiments where telemetry is written only locally and is not uploaded.
  • No secrets or restricted source microdata should be uploaded; emit aggregates, metadata, and artifact pointers only.
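
A sketch of how a run could pick its telemetry mode and run_id from the environment. The variable names `MICROPLEX_TELEMETRY` and `MICROPLEX_RUN_ID` are assumptions, not existing settings. Upload is the default, unknown values fail loudly rather than silently uploading, and reading the run_id from the environment would let CI and remote workers share one id; otherwise a fresh, sortable id is generated.

```python
import os
import uuid
from datetime import datetime, timezone
from enum import Enum
from typing import Mapping, Optional


class TelemetryMode(str, Enum):
    UPLOAD = "upload"        # default: local files plus the central store
    INCOGNITO = "incognito"  # local files only, never uploaded
    DISABLED = "disabled"    # nothing written at all


def resolve_mode(env: Optional[Mapping[str, str]] = None) -> TelemetryMode:
    """Read the hypothetical MICROPLEX_TELEMETRY variable; upload is the default."""
    env = os.environ if env is None else env
    raw = env.get("MICROPLEX_TELEMETRY", TelemetryMode.UPLOAD.value)
    # TelemetryMode(...) raises ValueError on unknown values instead of guessing.
    return TelemetryMode(raw.strip().lower())


def new_run_id(env: Optional[Mapping[str, str]] = None,
               now: Optional[datetime] = None) -> str:
    """Reuse a run_id handed down by CI/orchestration, else mint a sortable one."""
    env = os.environ if env is None else env
    if env.get("MICROPLEX_RUN_ID"):
        return env["MICROPLEX_RUN_ID"]
    now = now or datetime.now(timezone.utc)
    return f"{now:%Y%m%dT%H%M%SZ}-{uuid.uuid4().hex[:8]}"
```

The run_id would be computed once at startup and passed to every writer, which is what keeps it stable across all snapshots of a build.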

Suggested core architecture

  • Add a small telemetry writer abstraction in core microplex, for example:
    • LocalTelemetryWriter: writes JSONL/Parquet snapshots under the artifact root.
    • SupabaseTelemetryWriter: upserts the same events/tables to Supabase.
    • NullTelemetryWriter: a no-op writer for disabled runs, and the stand-in for the remote sink in incognito runs (which still write locally).
  • Define generic event dataclasses/types for:
    • run lifecycle
    • stage lifecycle
    • calibration epoch metrics
    • target diagnostics
    • artifact references
  • Wire the writer into generic calibration loops and build/stage utilities.
  • Make country packs register optional enrichers for domain-specific target metadata.
  • Use append-only events for progress, plus upserted summary rows for current dashboard state.
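
A minimal sketch of the local and null writers under those assumptions: progress events go to append-only JSONL files, and current dashboard state goes to a summary file that is rewritten atomically. The method names `append_event`/`upsert_summary` and the `telemetry/<run_id>/` layout are hypothetical; a `SupabaseTelemetryWriter` would implement the same two methods against remote tables. Every record carries the `incognito` flag so the mode is visible in local artifacts.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


class TelemetryWriter:
    """Hypothetical interface: append-only events plus upserted summary rows."""

    def append_event(self, table: str, event: Dict[str, Any]) -> None:
        raise NotImplementedError

    def upsert_summary(self, table: str, key: str, row: Dict[str, Any]) -> None:
        raise NotImplementedError


class NullTelemetryWriter(TelemetryWriter):
    """Disabled runs: every call is a no-op."""

    def append_event(self, table: str, event: Dict[str, Any]) -> None:
        return None

    def upsert_summary(self, table: str, key: str, row: Dict[str, Any]) -> None:
        return None


class LocalTelemetryWriter(TelemetryWriter):
    """Writes under <artifact_root>/telemetry/<run_id>/ (hypothetical layout)."""

    def __init__(self, artifact_root: Path, run_id: str, incognito: bool = False):
        self.root = Path(artifact_root) / "telemetry" / run_id
        self.root.mkdir(parents=True, exist_ok=True)
        self.run_id = run_id
        self.incognito = incognito

    def _stamp(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # Every record carries the run_id and the incognito flag.
        return {"run_id": self.run_id, "incognito": self.incognito, **data}

    def append_event(self, table: str, event: Dict[str, Any]) -> None:
        # One JSON object per line; appending keeps earlier progress if the run dies.
        with open(self.root / f"{table}.jsonl", "a", encoding="utf-8") as fh:
            fh.write(json.dumps(self._stamp(event), default=str) + "\n")

    def upsert_summary(self, table: str, key: str, row: Dict[str, Any]) -> None:
        path = self.root / f"{table}.summary.json"
        current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
        current[key] = self._stamp(row)  # replace this key's row, keep the others
        # Write a temp file then rename, so a reader never sees a half-written file.
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(current, default=str, indent=2), encoding="utf-8")
        tmp.replace(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        writer = LocalTelemetryWriter(Path(tmpdir), run_id="demo", incognito=True)
        writer.append_event("calibration_epochs", {"epoch": 1, "objective": 0.5})
        writer.upsert_summary("runs", "demo", {"status": "running"})
        print(sorted(p.name for p in writer.root.iterdir()))
```

Local and remote sinks could then be combined by a small fan-out writer that calls each one in turn; in incognito mode the fan-out would simply omit the remote sink.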

Minimal event/table shape

  • runs
    • run_id, build_id, engine, period, created_at, code_ref, config_hash, incognito, status
  • run_stages
    • run_id, stage, status, started_at, completed_at, elapsed_seconds, rss_mb, notes
  • calibration_epochs
    • run_id, calibration_id, epoch, objective, data_loss, l0_penalty, l2_penalty, nonzero_weights, ess, timestamp
  • calibration_targets
    • run_id, calibration_id, epoch_or_final, target_name, family, split, source, geography, target_value, estimate, relative_error, weighted_term, in_loss_function, support_status
  • artifacts
    • run_id, artifact_kind, path_or_uri, sha256, size_bytes, created_at
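
As a sketch, the two calibration tables as frozen dataclasses, plus helpers that fill their computed columns from aggregates only. Three definitions here are assumptions rather than decisions in this issue: ESS is taken as Kish's effective sample size, (sum w)^2 / sum w^2; `objective` is the plain sum of data loss and the L0/L2 penalties; and relative error is signed and divided by |target|, left empty when the target is zero.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Optional, Sequence


@dataclass(frozen=True)
class CalibrationEpoch:
    run_id: str
    calibration_id: str
    epoch: int
    objective: float
    data_loss: float
    l0_penalty: float
    l2_penalty: float
    nonzero_weights: int
    ess: float
    timestamp: str


@dataclass(frozen=True)
class CalibrationTarget:
    run_id: str
    calibration_id: str
    epoch_or_final: str  # e.g. "12" or "final"
    target_name: str
    target_value: float
    estimate: float
    relative_error: Optional[float]
    weighted_term: Optional[float]
    in_loss_function: bool
    # Metadata below would typically be filled in by country-pack enrichers.
    family: Optional[str] = None
    split: Optional[str] = None
    source: Optional[str] = None
    geography: Optional[str] = None
    support_status: Optional[str] = None


def kish_ess(weights: Sequence[float]) -> float:
    """Kish effective sample size (assumed definition of ESS)."""
    total = sum(weights)
    squares = sum(w * w for w in weights)
    return total * total / squares if squares > 0 else 0.0


def relative_error(estimate: float, target: float) -> Optional[float]:
    """Signed error relative to |target|; None for zero targets."""
    return None if target == 0 else (estimate - target) / abs(target)


def epoch_event(run_id: str, calibration_id: str, epoch: int, data_loss: float,
                l0_penalty: float, l2_penalty: float,
                weights: Sequence[float]) -> dict:
    """Build one calibration_epochs row; the weights themselves are not stored."""
    return asdict(CalibrationEpoch(
        run_id=run_id,
        calibration_id=calibration_id,
        epoch=epoch,
        objective=data_loss + l0_penalty + l2_penalty,  # assumed objective form
        data_loss=data_loss,
        l0_penalty=l0_penalty,
        l2_penalty=l2_penalty,
        nonzero_weights=sum(1 for w in weights if w != 0),
        ess=kish_ess(weights),
        timestamp=datetime.now(timezone.utc).isoformat(),
    ))
```

Reducing the weight vector to `nonzero_weights` and `ess` inside the helper keeps per-record weights out of every emitted row by construction.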

Country-pack responsibilities

For microplex-us, separately wire:

Acceptance criteria

  • A local build can emit telemetry locally with upload disabled.
  • A CI run can upload progress to Supabase without relying on a user's personal secrets and without printing any secret to the logs.
  • The dashboard can display progress while calibration is still running, not only after final dataset export.
  • Incognito mode prevents remote writes, and incognito status is clearly recorded in local artifacts.
  • Unit tests cover local writer behavior and assert restricted row-level source data is not emitted.
  • A smoke/integration path exists for Supabase upserts, gated behind env vars.
  • At least one country-pack integration, likely microplex-us, demonstrates target metadata enrichment without putting the generic telemetry machinery in the country pack.
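
One way to back the "no restricted row-level data" criterion with a unit test: a guard that writers could run on every payload before emitting it, rejecting array-like values and long sequences. `MAX_SEQUENCE_LEN`, `RestrictedDataError` and `assert_aggregate_only` are hypothetical names, and a length heuristic is only a backstop; an explicit allowlist of emitted fields would be stricter.

```python
from typing import Any, Mapping

# Hypothetical cap: sequences longer than this look like per-record data, not aggregates.
MAX_SEQUENCE_LEN = 64


class RestrictedDataError(ValueError):
    """Raised when a telemetry payload appears to contain row-level data."""


def assert_aggregate_only(payload: Mapping[str, Any], path: str = "") -> None:
    """Walk a payload recursively and reject values that look like microdata."""
    for key, value in payload.items():
        where = f"{path}.{key}" if path else str(key)
        if isinstance(value, Mapping):
            assert_aggregate_only(value, where)
        elif isinstance(value, (list, tuple, set)):
            if len(value) > MAX_SEQUENCE_LEN:
                raise RestrictedDataError(f"{where}: {len(value)} items")
        elif getattr(value, "ndim", 0) > 0:
            # numpy arrays and pandas Series/DataFrames expose ndim >= 1.
            raise RestrictedDataError(f"{where}: array-like value")


def test_rejects_row_level_weights() -> None:
    try:
        assert_aggregate_only({"calibration": {"weights": [1.0] * 10_000}})
    except RestrictedDataError:
        return
    raise AssertionError("row-level weights were not rejected")


def test_allows_aggregates() -> None:
    assert_aggregate_only({"objective": 0.42, "nonzero_weights": 1200,
                           "artifacts": ["s3://bucket/out.h5"]})
```

The two `test_` functions run unchanged under pytest or when called directly.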

Related
