diff --git a/AGENTS.md b/AGENTS.md
index 764ed25..2e9ddac 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -58,8 +58,8 @@ When reviewing recent changes here, check:
For a quick review, read:
-1. [`/Users/maxghenis/CosilicoAI/microplex/AGENTS.md`](/Users/maxghenis/CosilicoAI/microplex/AGENTS.md)
-2. [`/Users/maxghenis/CosilicoAI/microplex/_WORKSPACE.md`](/Users/maxghenis/CosilicoAI/microplex/_WORKSPACE.md)
-3. [`/Users/maxghenis/CosilicoAI/microplex/_BUILD_LOG.md`](/Users/maxghenis/CosilicoAI/microplex/_BUILD_LOG.md)
+1. [`/Users/maxghenis/PolicyEngine/microplex/AGENTS.md`](/Users/maxghenis/PolicyEngine/microplex/AGENTS.md)
+2. [`/Users/maxghenis/PolicyEngine/microplex/_WORKSPACE.md`](/Users/maxghenis/PolicyEngine/microplex/_WORKSPACE.md)
+3. [`/Users/maxghenis/PolicyEngine/microplex/_BUILD_LOG.md`](/Users/maxghenis/PolicyEngine/microplex/_BUILD_LOG.md)
Then inspect changed files and return findings first.
diff --git a/LICENSE b/LICENSE
index 93a7b5d..0bc753e 100644
--- a/LICENSE
+++ b/LICENSE
@@ -1,6 +1,6 @@
MIT License
-Copyright (c) 2024 Cosilico
+Copyright (c) 2024 PolicyEngine
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
diff --git a/QRF_BENCHMARK_SUMMARY.md b/QRF_BENCHMARK_SUMMARY.md
index a38be4c..67f85b7 100644
--- a/QRF_BENCHMARK_SUMMARY.md
+++ b/QRF_BENCHMARK_SUMMARY.md
@@ -29,7 +29,7 @@ This benchmark compares **microplex** (normalizing flows with two-stage zero-inf
### Recommendation
-**Transition from Sequential QRF to microplex for PolicyEngine/Cosilico production use.**
+**Transition from Sequential QRF to microplex for PolicyEngine production use.**
microplex provides superior statistical fidelity while being significantly faster, making it ideal for:
- CPS/ACS income imputation
@@ -186,7 +186,7 @@ microplex provides superior statistical fidelity while being significantly faste
### Run the Benchmark
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
# Install dependencies
-pip install scikit-learn>=1.3 matplotlib seaborn
+pip install "scikit-learn>=1.3" matplotlib seaborn
@@ -218,7 +218,7 @@ Results saved to `benchmarks/results/`:
## Conclusion
-**microplex is the superior choice for PolicyEngine/Cosilico microdata enhancement.**
+**microplex is the superior choice for PolicyEngine microdata enhancement.**
The benchmarks demonstrate:
1. ✅ **5.5x better marginal fidelity** - Critical for accurate policy estimates
diff --git a/README.md b/README.md
index 0068a28..403a8e8 100644
--- a/README.md
+++ b/README.md
@@ -3,8 +3,8 @@
Multi-source microdata synthesis and survey reweighting.
[](https://pypi.org/project/microplex/)
-[](https://github.com/CosilicoAI/microplex/actions/workflows/test.yml)
-[](https://cosilicoai.github.io/microplex)
+[](https://github.com/PolicyEngine/microplex/actions/workflows/test.yml)
+[](https://policyengine.github.io/microplex)
## Overview
@@ -145,11 +145,11 @@ print(f"Using {stats['n_nonzero']} of {stats['n_records']} records")
## Documentation
-Full documentation at [cosilicoai.github.io/microplex](https://cosilicoai.github.io/microplex)
+Full documentation at [policyengine.github.io/microplex](https://policyengine.github.io/microplex)
-- [Tutorial](https://cosilicoai.github.io/microplex/tutorial.html)
-- [API Reference](https://cosilicoai.github.io/microplex/api.html)
-- [Benchmarks](https://cosilicoai.github.io/microplex/benchmarks.html)
+- [Tutorial](https://policyengine.github.io/microplex/tutorial.html)
+- [API Reference](https://policyengine.github.io/microplex/api.html)
+- [Benchmarks](https://policyengine.github.io/microplex/benchmarks.html)
## Benchmarks
@@ -166,7 +166,7 @@ See [benchmarks/](benchmarks/) for synthesis method comparisons:
author = {Ghenis, Max},
title = {microplex: Multi-source microdata synthesis and survey reweighting},
year = {2025},
- url = {https://github.com/CosilicoAI/microplex}
+ url = {https://github.com/PolicyEngine/microplex}
}
```
diff --git a/_WORKSPACE.md b/_WORKSPACE.md
index d1d1407..57ae980 100644
--- a/_WORKSPACE.md
+++ b/_WORKSPACE.md
@@ -8,8 +8,8 @@ This file is the durable local context for `microplex` core.
Sibling repos:
-- [`/Users/maxghenis/CosilicoAI/microplex-us`](/Users/maxghenis/CosilicoAI/microplex-us)
-- [`/Users/maxghenis/CosilicoAI/microplex-uk`](/Users/maxghenis/CosilicoAI/microplex-uk)
+- [`/Users/maxghenis/PolicyEngine/microplex-us`](/Users/maxghenis/PolicyEngine/microplex-us)
+- [`/Users/maxghenis/PolicyEngine/microplex-uk`](/Users/maxghenis/PolicyEngine/microplex-uk)
## Current shared seams
diff --git a/benchmarks/DISTRIBUTIONAL_METRICS.md b/benchmarks/DISTRIBUTIONAL_METRICS.md
index b6a79cb..84506c2 100644
--- a/benchmarks/DISTRIBUTIONAL_METRICS.md
+++ b/benchmarks/DISTRIBUTIONAL_METRICS.md
@@ -201,32 +201,32 @@ predictions += np.random.normal(0, fixed_noise_scale) # Fixed noise
## Files Created
### Core Metrics Module
-- `/Users/maxghenis/CosilicoAI/micro/benchmarks/metrics.py`
+- `/Users/maxghenis/PolicyEngine/micro/benchmarks/metrics.py`
- All distributional quality metrics
- Comprehensive evaluation function
- Report generation utilities
### Benchmark Scripts
-- `/Users/maxghenis/CosilicoAI/micro/benchmarks/run_distributional_benchmark.py`
+- `/Users/maxghenis/PolicyEngine/micro/benchmarks/run_distributional_benchmark.py`
- Runs full distributional quality benchmark
- Generates visualizations and report
- Compares QRF vs microplex
### Results
-- `/Users/maxghenis/CosilicoAI/micro/benchmarks/results/distributional_quality.md`
+- `/Users/maxghenis/PolicyEngine/micro/benchmarks/results/distributional_quality.md`
- Full analysis report
-- `/Users/maxghenis/CosilicoAI/micro/benchmarks/results/distributional_metrics.json`
+- `/Users/maxghenis/PolicyEngine/micro/benchmarks/results/distributional_metrics.json`
- Raw metrics in JSON format
-- `/Users/maxghenis/CosilicoAI/micro/benchmarks/results/distributional_*.png`
+- `/Users/maxghenis/PolicyEngine/micro/benchmarks/results/distributional_*.png`
- 5 visualization files
## Usage
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
# Run full distributional benchmark
-/Users/maxghenis/CosilicoAI/micro/.venv/bin/python benchmarks/run_distributional_benchmark.py
+/Users/maxghenis/PolicyEngine/micro/.venv/bin/python benchmarks/run_distributional_benchmark.py
# Results will be saved to benchmarks/results/
```
@@ -262,4 +262,4 @@ cd /Users/maxghenis/CosilicoAI/micro
---
**Created:** December 25, 2024
-**Location:** `/Users/maxghenis/CosilicoAI/micro/benchmarks/DISTRIBUTIONAL_METRICS.md`
+**Location:** `/Users/maxghenis/PolicyEngine/micro/benchmarks/DISTRIBUTIONAL_METRICS.md`
diff --git a/benchmarks/MULTIVARIATE_METRICS.md b/benchmarks/MULTIVARIATE_METRICS.md
index 449a80b..0698c3d 100644
--- a/benchmarks/MULTIVARIATE_METRICS.md
+++ b/benchmarks/MULTIVARIATE_METRICS.md
@@ -183,7 +183,7 @@ This tests the full joint distribution, not just target variables.
### Running the Benchmark
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
source .venv/bin/activate
python benchmarks/run_multivariate_benchmark.py
```
diff --git a/benchmarks/README.md b/benchmarks/README.md
index da9c154..349bc0c 100644
--- a/benchmarks/README.md
+++ b/benchmarks/README.md
@@ -56,6 +56,6 @@ benchmarks/results/
author = {Ghenis, Max},
title = {microplex: Multi-source microdata synthesis and survey reweighting},
year = {2025},
- url = {https://github.com/CosilicoAI/microplex}
+ url = {https://github.com/PolicyEngine/microplex}
}
```
diff --git a/benchmarks/compare_policyengine.py b/benchmarks/compare_policyengine.py
index 3f95a94..dd978fc 100644
--- a/benchmarks/compare_policyengine.py
+++ b/benchmarks/compare_policyengine.py
@@ -945,7 +945,7 @@ def generate_report(
## Reproducibility
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
source .venv/bin/activate
python benchmarks/compare_policyengine.py
```
diff --git a/benchmarks/results/BENCHMARK_REPORT.md b/benchmarks/results/BENCHMARK_REPORT.md
index 5a9e3da..f4519bd 100644
--- a/benchmarks/results/BENCHMARK_REPORT.md
+++ b/benchmarks/results/BENCHMARK_REPORT.md
@@ -295,7 +295,7 @@ For PolicyEngine and economic microsimulation applications, **microplex is the r
## Files Generated
-All benchmark artifacts saved to `/Users/maxghenis/CosilicoAI/micro/benchmarks/results/`:
+All benchmark artifacts saved to `/Users/maxghenis/PolicyEngine/micro/benchmarks/results/`:
- `results.csv` - Summary metrics table
- `results.md` - Markdown results
@@ -312,7 +312,7 @@ All benchmark artifacts saved to `/Users/maxghenis/CosilicoAI/micro/benchmarks/r
To reproduce these benchmarks:
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_benchmarks.py
```
diff --git a/benchmarks/results/INDEX.md b/benchmarks/results/INDEX.md
index beb01a4..3271881 100644
--- a/benchmarks/results/INDEX.md
+++ b/benchmarks/results/INDEX.md
@@ -200,7 +200,7 @@ If using these results:
```bibtex
@misc{microplex_benchmarks_2024,
title={microplex Benchmark Results: Comparison Against CT-GAN, TVAE, and Copula},
- author={Cosilico},
+ author={PolicyEngine},
year={2024},
note={Results show 3.3x better marginal fidelity, 1.7x better correlation
preservation, and 2.5x better zero-inflation handling}
@@ -212,7 +212,7 @@ If using these results:
All results are fully reproducible:
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_benchmarks.py
```
@@ -224,7 +224,7 @@ python benchmarks/run_benchmarks.py
## Contact
For questions about these benchmarks:
-- Open an issue at github.com/CosilicoAI/microplex
+- Open an issue at github.com/PolicyEngine/microplex
- See BENCHMARK_REPORT.md for technical details
- See ISSUES_FOUND.md for known limitations
diff --git a/benchmarks/results/ISSUES_FOUND.md b/benchmarks/results/ISSUES_FOUND.md
index 8771531..b13799c 100644
--- a/benchmarks/results/ISSUES_FOUND.md
+++ b/benchmarks/results/ISSUES_FOUND.md
@@ -311,6 +311,6 @@ result.median_dcr = np.median(distances)
The opportunities identified above would make the benchmarks more comprehensive and demonstrate microplex's value even more clearly, but the current results already show:
- **Clear superiority** across all fidelity metrics
- **Practical performance** for real-world use
-- **Ready for production** deployment in PolicyEngine/Cosilico
+- **Ready for production** deployment in PolicyEngine
Main next step: **Test on real microdata** (CPS, ACS) to validate performance claims.
diff --git a/benchmarks/results/README.md b/benchmarks/results/README.md
index fa270ef..d62dc77 100644
--- a/benchmarks/results/README.md
+++ b/benchmarks/results/README.md
@@ -205,22 +205,22 @@ See **ISSUES_FOUND.md** for detailed improvement opportunities:
### Original Benchmarks (vs CT-GAN, TVAE, Copula)
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_benchmarks.py
```
### QRF Comparison Benchmarks
```bash
-cd /Users/maxghenis/CosilicoAI/micro
-/Users/maxghenis/CosilicoAI/micro/.venv/bin/python benchmarks/run_qrf_benchmark.py
+cd /Users/maxghenis/PolicyEngine/micro
+/Users/maxghenis/PolicyEngine/micro/.venv/bin/python benchmarks/run_qrf_benchmark.py
```
### Distributional Quality Benchmarks (NEW)
```bash
-cd /Users/maxghenis/CosilicoAI/micro
-/Users/maxghenis/CosilicoAI/micro/.venv/bin/python benchmarks/run_distributional_benchmark.py
+cd /Users/maxghenis/PolicyEngine/micro
+/Users/maxghenis/PolicyEngine/micro/.venv/bin/python benchmarks/run_distributional_benchmark.py
```
All results are deterministic (fixed random seed = 42).
diff --git a/benchmarks/results/cps_benchmark.md b/benchmarks/results/cps_benchmark.md
index d05cee7..a3a75ea 100644
--- a/benchmarks/results/cps_benchmark.md
+++ b/benchmarks/results/cps_benchmark.md
@@ -151,7 +151,7 @@ While QRF+ZI won this benchmark, microplex may still be preferred when:
## Reproducibility
```bash
-cd /Users/maxghenis/CosilicoAI/microplex
+cd /Users/maxghenis/PolicyEngine/microplex
.venv/bin/python benchmarks/run_cps_benchmark.py
```
diff --git a/benchmarks/results/distributional_quality.md b/benchmarks/results/distributional_quality.md
index 93719ef..d225b6d 100644
--- a/benchmarks/results/distributional_quality.md
+++ b/benchmarks/results/distributional_quality.md
@@ -385,7 +385,7 @@ This benchmark tests whether synthetic data methods capture the **full condition
## Reproducibility
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_distributional_benchmark.py
```
diff --git a/benchmarks/results/policyengine_comparison.md b/benchmarks/results/policyengine_comparison.md
index c0262e0..a14baa4 100644
--- a/benchmarks/results/policyengine_comparison.md
+++ b/benchmarks/results/policyengine_comparison.md
@@ -113,7 +113,7 @@ Based on these benchmarks:
## Reproducibility
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
source .venv/bin/activate
python benchmarks/compare_policyengine.py
```
diff --git a/benchmarks/results/qrf_comparison.md b/benchmarks/results/qrf_comparison.md
index 973a3a0..f4cc97f 100644
--- a/benchmarks/results/qrf_comparison.md
+++ b/benchmarks/results/qrf_comparison.md
@@ -146,7 +146,7 @@
- Joint distribution quality matters (policy analysis, microsimulation)
- You need conditional relationships preserved
- Zero-inflated economic variables are present
-- You're doing production deployment (PolicyEngine/Cosilico)
+- You're doing production deployment (PolicyEngine)
## Recommendations for PolicyEngine
@@ -189,7 +189,7 @@ All visualizations saved to `benchmarks/results/`:
## Reproducibility
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_qrf_benchmark.py
```
diff --git a/benchmarks/results/tabpfn_comparison.md b/benchmarks/results/tabpfn_comparison.md
index 108ad8a..960a5dd 100644
--- a/benchmarks/results/tabpfn_comparison.md
+++ b/benchmarks/results/tabpfn_comparison.md
@@ -118,7 +118,7 @@ TabPFN+ZI performs best on zero-inflated variables (assets, debt), while micropl
### Recommendations
-1. **For PolicyEngine/Cosilico production**: Continue using microplex
+1. **For PolicyEngine production**: Continue using microplex
- Better correlation preservation is critical for policy simulation
- Generation speed matters for interactive applications
- Scales to full CPS/ACS datasets
@@ -135,7 +135,7 @@ TabPFN+ZI performs best on zero-inflated variables (assets, debt), while micropl
## Reproducibility
```bash
-cd /Users/maxghenis/CosilicoAI/micro
+cd /Users/maxghenis/PolicyEngine/micro
source .venv/bin/activate
pip install tabpfn==0.1.11 # Must use v0.1.11 (later versions are gated)
python benchmarks/run_tabpfn_benchmark.py
@@ -153,4 +153,4 @@ Results are deterministic with random seed = 42.
---
**Generated:** December 26, 2024
-**Location:** /Users/maxghenis/CosilicoAI/micro/benchmarks/results/
+**Location:** /Users/maxghenis/PolicyEngine/micro/benchmarks/results/
diff --git a/benchmarks/run_cps_benchmark.py b/benchmarks/run_cps_benchmark.py
index 538c9ef..4bbc2ca 100644
--- a/benchmarks/run_cps_benchmark.py
+++ b/benchmarks/run_cps_benchmark.py
@@ -1108,7 +1108,7 @@ def generate_cps_markdown_report(
f.write("## Reproducibility\n\n")
f.write("```bash\n")
- f.write("cd /Users/maxghenis/CosilicoAI/microplex\n")
+ f.write("cd /Users/maxghenis/PolicyEngine/microplex\n")
f.write("python benchmarks/run_cps_benchmark.py\n")
f.write("```\n\n")
diff --git a/benchmarks/run_distributional_benchmark.py b/benchmarks/run_distributional_benchmark.py
index 4818575..87cce36 100644
--- a/benchmarks/run_distributional_benchmark.py
+++ b/benchmarks/run_distributional_benchmark.py
@@ -425,7 +425,7 @@ def generate_distributional_markdown_report(
f.write("## Reproducibility\n\n")
f.write("```bash\n")
- f.write("cd /Users/maxghenis/CosilicoAI/micro\n")
+ f.write("cd /Users/maxghenis/PolicyEngine/micro\n")
f.write("python benchmarks/run_distributional_benchmark.py\n")
f.write("```\n\n")
diff --git a/benchmarks/run_qrf_benchmark.py b/benchmarks/run_qrf_benchmark.py
index 11d724c..733f18e 100644
--- a/benchmarks/run_qrf_benchmark.py
+++ b/benchmarks/run_qrf_benchmark.py
@@ -420,7 +420,7 @@ def generate_qrf_markdown_report(
f.write("- Joint distribution quality matters (policy analysis, microsimulation)\n")
f.write("- You need conditional relationships preserved\n")
f.write("- Zero-inflated economic variables are present\n")
- f.write("- You're doing production deployment (PolicyEngine/Cosilico)\n\n")
+ f.write("- You're doing production deployment (PolicyEngine)\n\n")
f.write("## Recommendations for PolicyEngine\n\n")
@@ -466,7 +466,7 @@ def generate_qrf_markdown_report(
f.write("## Reproducibility\n\n")
f.write("```bash\n")
- f.write("cd /Users/maxghenis/CosilicoAI/micro\n")
+ f.write("cd /Users/maxghenis/PolicyEngine/micro\n")
f.write("python benchmarks/run_qrf_benchmark.py\n")
f.write("```\n\n")
diff --git a/benchmarks/vs_policyengine.py b/benchmarks/vs_policyengine.py
index 153f4e2..6f3b26a 100644
--- a/benchmarks/vs_policyengine.py
+++ b/benchmarks/vs_policyengine.py
@@ -1,8 +1,8 @@
-"""Performance Benchmark: Cosilico vs PolicyEngine
+"""Performance Benchmark: RuleSpec runtime vs PolicyEngine
This benchmark compares microsimulation performance between:
1. PolicyEngine-US: Established microsimulation framework
-2. Cosilico: New vectorized DSL-based approach
+2. RuleSpec runtime: New vectorized DSL-based approach
Benchmarks:
1. Microsimulation speed - Calculate taxes/benefits for N households
@@ -22,20 +22,20 @@
import gc
import json
-import os
import sys
import time
import tracemalloc
from dataclasses import dataclass, field
from pathlib import Path
-from typing import Optional
import numpy as np
import pandas as pd
# Add paths for local imports
sys.path.insert(0, str(Path(__file__).parents[1] / "src"))
-sys.path.insert(0, str(Path(__file__).parents[2] / "cosilico-engine" / "src"))
+rulespec_engine_path = Path.home() / "TheAxiomFoundation" / "axiom-rules-engine" / "python"
+if rulespec_engine_path.exists():
+ sys.path.insert(0, str(rulespec_engine_path))
@dataclass
@@ -48,7 +48,7 @@ class BenchmarkResult:
memory_peak_mb: float
throughput_records_per_sec: float
success: bool = True
- error: Optional[str] = None
+ error: str | None = None
details: dict = field(default_factory=dict)
def to_dict(self) -> dict:
@@ -136,7 +136,6 @@ def benchmark_policyengine_startup() -> BenchmarkResult:
if "policyengine" in mod_name:
del sys.modules[mod_name]
- from policyengine_us import Simulation
elapsed = time.perf_counter() - start
current, peak = tracemalloc.get_traced_memory()
@@ -165,8 +164,8 @@ def benchmark_policyengine_startup() -> BenchmarkResult:
)
-def benchmark_cosilico_startup() -> BenchmarkResult:
- """Measure Cosilico startup/import time."""
+def benchmark_rulespec_startup() -> BenchmarkResult:
+ """Measure RuleSpec runtime startup/import time."""
gc.collect()
tracemalloc.start()
@@ -174,11 +173,10 @@ def benchmark_cosilico_startup() -> BenchmarkResult:
try:
# Force fresh import
for mod_name in list(sys.modules.keys()):
- if "cosilico" in mod_name:
+ if mod_name.startswith("axiom_rules_engine"):
del sys.modules[mod_name]
- from cosilico.vectorized_executor import VectorizedExecutor
- from cosilico.dsl_parser import parse_dsl
+ import axiom_rules_engine # noqa: F401
elapsed = time.perf_counter() - start
current, peak = tracemalloc.get_traced_memory()
@@ -186,7 +184,7 @@ def benchmark_cosilico_startup() -> BenchmarkResult:
return BenchmarkResult(
name="startup",
- framework="cosilico",
+ framework="rulespec",
n_records=0,
execution_time_ms=elapsed * 1000,
memory_peak_mb=peak / 1024 / 1024,
@@ -197,7 +195,7 @@ def benchmark_cosilico_startup() -> BenchmarkResult:
tracemalloc.stop()
return BenchmarkResult(
name="startup",
- framework="cosilico",
+ framework="rulespec",
n_records=0,
execution_time_ms=0,
memory_peak_mb=0,
@@ -314,7 +312,6 @@ def benchmark_policyengine_batch(
This uses PolicyEngine's built-in vectorization where possible.
"""
from policyengine_us import Simulation
- from policyengine_core.simulations import Simulation as CoreSimulation
n_records = len(data)
gc.collect()
@@ -386,25 +383,26 @@ def benchmark_policyengine_batch(
)
-def benchmark_cosilico_microsim(
+def benchmark_rulespec_microsim(
data: pd.DataFrame,
dsl_code: str,
output_variables: list[str],
name: str = "microsim"
) -> BenchmarkResult:
- """Benchmark Cosilico vectorized execution.
+ """Benchmark RuleSpec runtime vectorized execution.
- Cosilico processes all records in parallel using NumPy vectorization.
+ The old vectorized benchmark API no longer exists; the replacement
+ RuleSpec runtime is tracked separately under Axiom.
"""
- from cosilico.vectorized_executor import VectorizedExecutor
-
n_records = len(data)
gc.collect()
tracemalloc.start()
start = time.perf_counter()
try:
- # Convert data to numpy arrays (Cosilico's native format)
+ import axiom_rules_engine # noqa: F401
+
+ # Convert data to numpy arrays (RuleSpec runtime's intended format).
inputs = {
"wages": data["wages"].values,
"salaries": np.zeros(n_records),
@@ -420,48 +418,26 @@ def benchmark_cosilico_microsim(
"earned_income": data["wages"].values + data["self_employment_income"].values,
}
- # Create executor with default parameters
- executor = VectorizedExecutor(
- parameters={
- # EITC 2024 parameters
- "phase_in_rate": {0: 0.0765, 1: 0.34, 2: 0.40, 3: 0.45},
- "earned_income_amount": {0: 7840, 1: 11750, 2: 16510, 3: 16510},
- "max_credit": {0: 600, 1: 3995, 2: 6604, 3: 7430},
- "phaseout_rate": {0: 0.0765, 1: 0.1598, 2: 0.2106, 3: 0.2106},
- "phaseout_start_single": {0: 9800, 1: 21560, 2: 21560, 3: 21560},
- "phaseout_start_joint": {0: 16370, 1: 28120, 2: 28120, 3: 28120},
- }
- )
-
- # Execute vectorized computation
- results = executor.execute(
- code=dsl_code,
- inputs=inputs,
- output_variables=output_variables,
- )
-
elapsed = time.perf_counter() - start
- current, peak = tracemalloc.get_traced_memory()
+ _current, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()
return BenchmarkResult(
name=name,
- framework="cosilico",
+ framework="rulespec",
n_records=n_records,
execution_time_ms=elapsed * 1000,
memory_peak_mb=peak / 1024 / 1024,
- throughput_records_per_sec=n_records / elapsed,
- details={
- "memory_current_mb": current / 1024 / 1024,
- "variables_computed": output_variables,
- "sample_results": {k: float(v[0]) if len(v) > 0 else None for k, v in results.items()}
- }
+ throughput_records_per_sec=0,
+ success=False,
+ error="RuleSpec vectorized execution is not wired into this legacy benchmark after the Axiom migration.",
+ details={"variables_requested": output_variables, "inputs": sorted(inputs)}
)
except Exception as e:
tracemalloc.stop()
return BenchmarkResult(
name=name,
- framework="cosilico",
+ framework="rulespec",
n_records=n_records,
execution_time_ms=0,
memory_peak_mb=0,
@@ -583,7 +559,7 @@ def run_all_benchmarks(sizes: list[str] = None) -> list[BenchmarkResult]:
}
print("=" * 60)
- print("PERFORMANCE BENCHMARK: Cosilico vs PolicyEngine")
+ print("PERFORMANCE BENCHMARK: RuleSpec runtime vs PolicyEngine")
print("=" * 60)
print()
@@ -591,10 +567,10 @@ def run_all_benchmarks(sizes: list[str] = None) -> list[BenchmarkResult]:
print("1. STARTUP TIME BENCHMARKS")
print("-" * 40)
- print(" Benchmarking Cosilico startup...")
- cosilico_startup = benchmark_cosilico_startup()
- results.append(cosilico_startup)
- print(f" Cosilico: {cosilico_startup.execution_time_ms:.1f}ms, {cosilico_startup.memory_peak_mb:.1f}MB")
+ print(" Benchmarking RuleSpec runtime startup...")
+ rulespec_startup = benchmark_rulespec_startup()
+ results.append(rulespec_startup)
+ print(f" RuleSpec: {rulespec_startup.execution_time_ms:.1f}ms, {rulespec_startup.memory_peak_mb:.1f}MB")
print(" Benchmarking PolicyEngine startup...")
pe_startup = benchmark_policyengine_startup()
@@ -603,9 +579,6 @@ def run_all_benchmarks(sizes: list[str] = None) -> list[BenchmarkResult]:
print()
# Import once for subsequent benchmarks
- from policyengine_us import Simulation
- from cosilico.vectorized_executor import VectorizedExecutor
-
for size_name in sizes:
if size_name not in size_config:
print(f"Unknown size: {size_name}")
@@ -619,33 +592,33 @@ def run_all_benchmarks(sizes: list[str] = None) -> list[BenchmarkResult]:
print(f" Generating {n_records:,} synthetic records...")
data = create_synthetic_data(n_records)
- # Cosilico: AGI calculation
- print(" [Cosilico] AGI calculation...")
- cosilico_agi = benchmark_cosilico_microsim(
+ # RuleSpec runtime: AGI calculation
+ print(" [RuleSpec] AGI calculation...")
+ rulespec_agi = benchmark_rulespec_microsim(
data, AGI_DSL, ["adjusted_gross_income"],
name=f"agi_{size_name}"
)
- results.append(cosilico_agi)
- if cosilico_agi.success:
- print(f" Time: {cosilico_agi.execution_time_ms:.1f}ms")
- print(f" Throughput: {cosilico_agi.throughput_records_per_sec:,.0f} records/sec")
- print(f" Memory: {cosilico_agi.memory_peak_mb:.1f}MB")
+ results.append(rulespec_agi)
+ if rulespec_agi.success:
+ print(f" Time: {rulespec_agi.execution_time_ms:.1f}ms")
+ print(f" Throughput: {rulespec_agi.throughput_records_per_sec:,.0f} records/sec")
+ print(f" Memory: {rulespec_agi.memory_peak_mb:.1f}MB")
else:
- print(f" Error: {cosilico_agi.error}")
+ print(f" Error: {rulespec_agi.error}")
- # Cosilico: EITC calculation
- print(" [Cosilico] EITC calculation...")
- cosilico_eitc = benchmark_cosilico_microsim(
+ # RuleSpec runtime: EITC calculation
+ print(" [RuleSpec] EITC calculation...")
+ rulespec_eitc = benchmark_rulespec_microsim(
data, FULL_DSL, ["adjusted_gross_income", "earned_income_credit"],
name=f"eitc_{size_name}"
)
- results.append(cosilico_eitc)
- if cosilico_eitc.success:
- print(f" Time: {cosilico_eitc.execution_time_ms:.1f}ms")
- print(f" Throughput: {cosilico_eitc.throughput_records_per_sec:,.0f} records/sec")
- print(f" Memory: {cosilico_eitc.memory_peak_mb:.1f}MB")
+ results.append(rulespec_eitc)
+ if rulespec_eitc.success:
+ print(f" Time: {rulespec_eitc.execution_time_ms:.1f}ms")
+ print(f" Throughput: {rulespec_eitc.throughput_records_per_sec:,.0f} records/sec")
+ print(f" Memory: {rulespec_eitc.memory_peak_mb:.1f}MB")
else:
- print(f" Error: {cosilico_eitc.error}")
+ print(f" Error: {rulespec_eitc.error}")
# PolicyEngine benchmarks (limit to smaller sizes due to memory)
if n_records <= 10_000:
@@ -705,7 +678,7 @@ def generate_report(results: list[BenchmarkResult], output_path: Path) -> str:
by_name[key][r.framework] = r
report = []
- report.append("# Performance Benchmark: Cosilico vs PolicyEngine")
+ report.append("# Performance Benchmark: RuleSpec runtime vs PolicyEngine")
report.append("")
report.append(f"Generated: {pd.Timestamp.now().strftime('%Y-%m-%d %H:%M:%S')}")
report.append("")
@@ -713,33 +686,33 @@ def generate_report(results: list[BenchmarkResult], output_path: Path) -> str:
report.append("## Executive Summary")
report.append("")
report.append("This benchmark compares microsimulation performance between:")
- report.append("- **Cosilico**: Vectorized DSL-based approach with NumPy backend")
+ report.append("- **RuleSpec runtime**: Vectorized DSL-based approach with NumPy backend")
report.append("- **PolicyEngine-US**: Established OpenFisca-based framework")
report.append("")
# Startup comparison
if "startup" in by_name:
- cos = by_name["startup"].get("cosilico")
+ cos = by_name["startup"].get("rulespec")
pe = by_name["startup"].get("policyengine")
if cos and pe and cos.success and pe.success:
speedup = pe.execution_time_ms / cos.execution_time_ms
report.append("### Startup Time")
report.append("")
- report.append(f"| Metric | Cosilico | PolicyEngine | Speedup |")
- report.append(f"|--------|----------|--------------|---------|")
+ report.append("| Metric | RuleSpec | PolicyEngine | Speedup |")
+ report.append("|--------|----------|--------------|---------|")
report.append(f"| Load Time | {cos.execution_time_ms:.0f}ms | {pe.execution_time_ms:.0f}ms | {speedup:.1f}x |")
report.append(f"| Memory | {cos.memory_peak_mb:.1f}MB | {pe.memory_peak_mb:.1f}MB | {pe.memory_peak_mb/cos.memory_peak_mb:.1f}x |")
report.append("")
report.append("### Microsimulation Performance")
report.append("")
- report.append("| Benchmark | Records | Cosilico (ms) | PolicyEngine (ms) | Speedup |")
+ report.append("| Benchmark | Records | RuleSpec (ms) | PolicyEngine (ms) | Speedup |")
report.append("|-----------|---------|---------------|-------------------|---------|")
for name in sorted(by_name.keys()):
if name == "startup":
continue
- cos = by_name[name].get("cosilico")
+ cos = by_name[name].get("rulespec")
pe = by_name[name].get("policyengine")
if cos and cos.success:
@@ -767,13 +740,13 @@ def generate_report(results: list[BenchmarkResult], output_path: Path) -> str:
report.append("## Throughput Comparison")
report.append("")
- report.append("| Benchmark | Cosilico (records/sec) | PolicyEngine (records/sec) |")
+ report.append("| Benchmark | RuleSpec (records/sec) | PolicyEngine (records/sec) |")
report.append("|-----------|------------------------|----------------------------|")
for name in sorted(by_name.keys()):
if name == "startup":
continue
- cos = by_name[name].get("cosilico")
+ cos = by_name[name].get("rulespec")
pe = by_name[name].get("policyengine")
cos_tp = f"{cos.throughput_records_per_sec:,.0f}" if cos and cos.success else "N/A"
@@ -785,11 +758,11 @@ def generate_report(results: list[BenchmarkResult], output_path: Path) -> str:
report.append("## Memory Usage")
report.append("")
- report.append("| Benchmark | Cosilico (MB) | PolicyEngine (MB) |")
+ report.append("| Benchmark | RuleSpec (MB) | PolicyEngine (MB) |")
report.append("|-----------|---------------|-------------------|")
for name in sorted(by_name.keys()):
- cos = by_name[name].get("cosilico")
+ cos = by_name[name].get("rulespec")
pe = by_name[name].get("policyengine")
cos_mem = f"{cos.memory_peak_mb:.1f}" if cos and cos.success else "N/A"
@@ -801,9 +774,9 @@ def generate_report(results: list[BenchmarkResult], output_path: Path) -> str:
report.append("## Key Findings")
report.append("")
- report.append("### Cosilico Advantages")
+ report.append("### RuleSpec Advantages")
report.append("")
- report.append("1. **Faster Startup**: Cosilico's minimal dependencies result in faster import times")
+ report.append("1. **Faster Startup**: RuleSpec's minimal dependencies can result in faster import times")
report.append("2. **Pure NumPy Vectorization**: Operations compile to efficient NumPy operations")
report.append("3. **Lower Memory Footprint**: No object overhead per entity")
report.append("4. **Scales to Large Datasets**: Can handle 100k+ records efficiently")
@@ -818,7 +791,7 @@ def generate_report(results: list[BenchmarkResult], output_path: Path) -> str:
report.append("## Technical Notes")
report.append("")
- report.append("- Cosilico uses pure NumPy arrays for all calculations")
+ report.append("- RuleSpec runtime uses pure NumPy arrays for dense calculations")
report.append("- PolicyEngine creates Python objects per entity, which adds overhead")
report.append("- Memory measurements use Python's tracemalloc module")
report.append("- Timing uses time.perf_counter() for high-resolution measurements")
@@ -848,7 +821,6 @@ def generate_visualizations(results: list[BenchmarkResult], output_dir: Path):
"""Generate visualization plots from benchmark results."""
try:
import matplotlib.pyplot as plt
- import seaborn as sns
except ImportError:
- print("matplotlib/seaborn not available - skipping visualizations")
+ print("matplotlib not available - skipping visualizations")
return
@@ -873,7 +845,7 @@ def generate_visualizations(results: list[BenchmarkResult], output_dir: Path):
ax.set_ylabel("Records per Second")
ax.set_xlabel("Benchmark")
- ax.set_title("Microsimulation Throughput: Cosilico vs PolicyEngine")
+ ax.set_title("Microsimulation Throughput: RuleSpec vs PolicyEngine")
ax.legend(title="Framework")
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha="right")
@@ -893,7 +865,7 @@ def generate_visualizations(results: list[BenchmarkResult], output_dir: Path):
ax.set_ylabel("Peak Memory (MB)")
ax.set_xlabel("Benchmark")
- ax.set_title("Memory Usage: Cosilico vs PolicyEngine")
+ ax.set_title("Memory Usage: RuleSpec vs PolicyEngine")
ax.legend(title="Framework")
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha="right")
@@ -909,7 +881,7 @@ def generate_visualizations(results: list[BenchmarkResult], output_dir: Path):
ax.set_ylabel("Execution Time (ms, log scale)")
ax.set_xlabel("Benchmark")
- ax.set_title("Execution Time: Cosilico vs PolicyEngine")
+ ax.set_title("Execution Time: RuleSpec vs PolicyEngine")
ax.legend(title="Framework")
ax.set_xticklabels(ax.get_xticklabels(), rotation=45, ha="right")
@@ -948,7 +920,7 @@ def generate_visualizations(results: list[BenchmarkResult], output_dir: Path):
def main():
import argparse
- parser = argparse.ArgumentParser(description="Benchmark Cosilico vs PolicyEngine")
+ parser = argparse.ArgumentParser(description="Benchmark RuleSpec runtime vs PolicyEngine")
parser.add_argument(
"--size",
choices=["small", "medium", "large", "all"],
@@ -990,7 +962,7 @@ def main():
# Generate report
report_path = args.output / "policyengine_comparison.md"
- report = generate_report(results, report_path)
+ generate_report(results, report_path)
print(f"Report saved to {report_path}")
# Generate visualizations if requested
diff --git a/dashboard/pe_comparison.md b/dashboard/pe_comparison.md
index 2d3c545..2b266f0 100644
--- a/dashboard/pe_comparison.md
+++ b/dashboard/pe_comparison.md
@@ -7,11 +7,11 @@ _Generated: 2025-12-27 22:03_
| Dataset | Records | Weighted Population |
|---------|---------|---------------------|
| PE Enhanced CPS | 144,265 | 324,365,066 |
-| Cosilico CPS | 142,125 | 337,689,642 |
+| PolicyEngine CPS | 142,125 | 337,689,642 |
## Income Aggregates Comparison
-| Variable | PE Enhanced CPS | Cosilico CPS | Ratio | Status |
+| Variable | PE Enhanced CPS | PolicyEngine CPS | Ratio | Status |
|----------|-----------------|--------------|-------|--------|
| Employment Income | 11,595B | 12,072B | 1.04 | ✅ |
| Self-Employment | 617B | 532B | 0.86 | ⚠️ |
@@ -25,7 +25,7 @@ _Generated: 2025-12-27 22:03_
## Demographics Comparison
-| Age Group | PE Enhanced CPS | Cosilico CPS | Ratio |
+| Age Group | PE Enhanced CPS | PolicyEngine CPS | Ratio |
|-----------|-----------------|--------------|-------|
| Under 18 | 69.2M | 73.0M | 1.05 |
| 18-64 | 194.9M | 203.2M | 1.04 |
@@ -45,9 +45,9 @@ _Generated: 2025-12-27 22:03_
- **Self-Employment**: 0.86 ratio - PE has higher totals (~15% more)
- **Rental Income**: 0.89 ratio - PE slightly higher
-- **Unemployment**: 1.14 ratio - Cosilico slightly higher
+- **Unemployment**: 1.14 ratio - PolicyEngine slightly higher
-### Missing in Cosilico CPS
+### Missing in PolicyEngine CPS
- Capital gains (PE has 273B total)
- Separate taxable/tax-exempt breakdowns
@@ -55,5 +55,5 @@ _Generated: 2025-12-27 22:03_
### Weight Differences
-PE Enhanced CPS weights to 324M population while Cosilico CPS weights to 338M.
+PE Enhanced CPS weights to 324M population while PolicyEngine CPS weights to 338M.
This 4% difference affects all aggregate comparisons.
diff --git a/dashboard/policyengine_comparison.py b/dashboard/policyengine_comparison.py
index 727a630..8604210 100644
--- a/dashboard/policyengine_comparison.py
+++ b/dashboard/policyengine_comparison.py
@@ -13,7 +13,7 @@
# Paths
PE_US_DATA = Path("/Users/maxghenis/PolicyEngine/policyengine-us-data")
-COSILICO_DATA = Path("/Users/maxghenis/CosilicoAI/cosilico-data-sources")
+POLICYENGINE_DATA = Path("/Users/maxghenis/PolicyEngine/arch-data")
# Key variables to compare
@@ -66,7 +66,7 @@ def load_enhanced_cps(year: int = 2024) -> pd.DataFrame:
return EnhancedCPS.load(year)
except Exception:
# Fallback to local parquet
- path = COSILICO_DATA / f"micro/us/cps_{year}.parquet"
+ path = POLICYENGINE_DATA / f"micro/us/cps_{year}.parquet"
if path.exists():
return pd.read_parquet(path)
raise FileNotFoundError(f"Could not load Enhanced CPS for {year}")
@@ -240,7 +240,7 @@ def run_policyengine_comparison(
microplex = pd.read_parquet(microplex_path)
else:
# Use CPS as proxy
- cps_path = COSILICO_DATA / "micro/us/cps_2024.parquet"
+ cps_path = POLICYENGINE_DATA / "micro/us/cps_2024.parquet"
if cps_path.exists():
microplex = pd.read_parquet(cps_path)
print(f" Using CPS as proxy: {len(microplex):,} records")
diff --git a/dashboard/tracking.py b/dashboard/tracking.py
index 865ca38..d61f8ff 100644
--- a/dashboard/tracking.py
+++ b/dashboard/tracking.py
@@ -21,7 +21,7 @@
from datetime import datetime
# Data source paths
-COSILICO_DATA = Path("/Users/maxghenis/CosilicoAI/cosilico-data-sources")
+POLICYENGINE_DATA = Path("/Users/maxghenis/PolicyEngine/arch-data")
PE_US_DATA = Path("/Users/maxghenis/PolicyEngine/policyengine-us-data")
@@ -430,7 +430,7 @@ def load_policyengine_enhanced_cps(year: int = 2024) -> pd.DataFrame:
return ecps.load()
except ImportError:
# Fallback to local parquet
- path = COSILICO_DATA / f"micro/us/cps_{year}.parquet"
+ path = POLICYENGINE_DATA / f"micro/us/cps_{year}.parquet"
if path.exists():
return pd.read_parquet(path)
raise FileNotFoundError(f"Could not load Enhanced CPS for {year}")
@@ -463,7 +463,7 @@ def run_dashboard(
print(f" Loaded {len(microplex):,} records from {microplex_path}")
else:
# Load CPS as proxy
- cps_path = COSILICO_DATA / "micro/us/cps_2024.parquet"
+ cps_path = POLICYENGINE_DATA / "micro/us/cps_2024.parquet"
if cps_path.exists():
microplex = pd.read_parquet(cps_path)
print(f" Using CPS as proxy: {len(microplex):,} records")
diff --git a/docs/_config.yml b/docs/_config.yml
index 88efb42..3dd41fa 100644
--- a/docs/_config.yml
+++ b/docs/_config.yml
@@ -1,5 +1,5 @@
title: microplex
-author: Cosilico
+author: PolicyEngine
logo: ""
execute:
@@ -7,7 +7,7 @@ execute:
timeout: 300
repository:
- url: https://github.com/CosilicoAI/microplex
+ url: https://github.com/PolicyEngine/microplex
path_to_book: docs
branch: main
@@ -18,7 +18,7 @@ sphinx:
config:
html_theme: sphinx_book_theme
html_theme_options:
- repository_url: https://github.com/CosilicoAI/micro
+ repository_url: https://github.com/PolicyEngine/micro
use_repository_button: true
use_issues_button: true
autodoc_member_order: bysource
diff --git a/experiments/tracking/dashboard_data.json b/experiments/tracking/dashboard_data.json
index 3c3ca2c..4013d3a 100644
--- a/experiments/tracking/dashboard_data.json
+++ b/experiments/tracking/dashboard_data.json
@@ -288,9 +288,9 @@
]
},
"data_paths": {
- "synthetic": "/Users/maxghenis/CosilicoAI/microplex/experiments/tracking/data/exp_20260104_234612_synthetic.parquet",
- "holdout_coverage": "/Users/maxghenis/CosilicoAI/microplex/experiments/tracking/data/exp_20260104_234612_holdout_coverage.parquet",
- "model": "/Users/maxghenis/CosilicoAI/microplex/experiments/tracking/data/exp_20260104_234612_model.pt"
+ "synthetic": "/Users/maxghenis/PolicyEngine/microplex/experiments/tracking/data/exp_20260104_234612_synthetic.parquet",
+ "holdout_coverage": "/Users/maxghenis/PolicyEngine/microplex/experiments/tracking/data/exp_20260104_234612_holdout_coverage.parquet",
+ "model": "/Users/maxghenis/PolicyEngine/microplex/experiments/tracking/data/exp_20260104_234612_model.pt"
}
}
],
diff --git a/experiments/tracking/experiments/exp_20260104_234612.json b/experiments/tracking/experiments/exp_20260104_234612.json
index 841e57f..c1c38b0 100644
--- a/experiments/tracking/experiments/exp_20260104_234612.json
+++ b/experiments/tracking/experiments/exp_20260104_234612.json
@@ -295,7 +295,7 @@
],
"overall_coverage_median": 8.188886828928913e-07,
"overall_coverage_mean": 0.00010412254367491332,
- "synthetic_data_path": "/Users/maxghenis/CosilicoAI/microplex/experiments/tracking/data/exp_20260104_234612_synthetic.parquet",
- "holdout_coverage_path": "/Users/maxghenis/CosilicoAI/microplex/experiments/tracking/data/exp_20260104_234612_holdout_coverage.parquet",
- "model_path": "/Users/maxghenis/CosilicoAI/microplex/experiments/tracking/data/exp_20260104_234612_model.pt"
+ "synthetic_data_path": "/Users/maxghenis/PolicyEngine/microplex/experiments/tracking/data/exp_20260104_234612_synthetic.parquet",
+ "holdout_coverage_path": "/Users/maxghenis/PolicyEngine/microplex/experiments/tracking/data/exp_20260104_234612_holdout_coverage.parquet",
+ "model_path": "/Users/maxghenis/PolicyEngine/microplex/experiments/tracking/data/exp_20260104_234612_model.pt"
}
\ No newline at end of file
diff --git a/paper/index.md b/paper/index.md
index d57cc95..1e89153 100644
--- a/paper/index.md
+++ b/paper/index.md
@@ -8,7 +8,7 @@ kernelspec:
**Max Ghenis**
-max@cosilico.ai | Cosilico
+max@policyengine.org | PolicyEngine
```{code-cell} python
:tags: [remove-cell]
@@ -18,7 +18,7 @@ from paper_results import r
## Abstract
-Government surveys observe different slices of the same population: the Current Population Survey (CPS) captures employment and income, the Survey of Income and Program Participation (SIPP) tracks employment dynamics, and the Panel Study of Income Dynamics (PSID) follows families longitudinally. No single survey observes all variables for all people. I present microplex, a framework for learning per-variable conditional distributions $P(v \mid V_{\text{shared}})$ from multiple surveys and generating synthetic records with complete variable coverage. Because each variable is modeled conditionally on shared demographics (age, sex), the resulting synthetic data preserves within-source marginals but does not learn cross-source correlations — a limitation I discuss. I compare six synthesis methods — quantile regression forests (QRF), quantile deep neural networks (QDNN), and masked autoregressive flows (MAF), each with and without zero-inflation (ZI) handling — using Precision, Density, and Coverage metrics adapted from {cite:t}`naeem2020reliable`, evaluated against holdouts from each source survey across {eval}`r.n_seeds` random seeds. ZI-QRF achieves the highest SIPP coverage ({eval}`r.zi_qrf.sipp_pct`) while ZI-MAF achieves the highest CPS coverage ({eval}`r.zi_maf.cps_pct`), but the key finding is architectural: zero-inflation handling lifts MAF coverage by {eval}`r.zi_maf_vs_maf_lift` and QDNN by {eval}`r.zi_qdnn_vs_qdnn_lift`, while barely affecting QRF ({eval}`r.zi_qrf_vs_qrf_lift`). I also compare five calibration methods for reweighting synthetic populations, finding that entropy balancing achieves the lowest mean relative error ({eval}`r.rw_entropy.mean_error_pct`). Code is available at [github.com/CosilicoAI/microplex](https://github.com/CosilicoAI/microplex).
+Government surveys observe different slices of the same population: the Current Population Survey (CPS) captures employment and income, the Survey of Income and Program Participation (SIPP) tracks employment dynamics, and the Panel Study of Income Dynamics (PSID) follows families longitudinally. No single survey observes all variables for all people. I present microplex, a framework for learning per-variable conditional distributions $P(v \mid V_{\text{shared}})$ from multiple surveys and generating synthetic records with complete variable coverage. Because each variable is modeled conditionally on shared demographics (age, sex), the resulting synthetic data preserves within-source marginals but does not learn cross-source correlations — a limitation I discuss. I compare six synthesis methods — quantile regression forests (QRF), quantile deep neural networks (QDNN), and masked autoregressive flows (MAF), each with and without zero-inflation (ZI) handling — using Precision, Density, and Coverage metrics adapted from {cite:t}`naeem2020reliable`, evaluated against holdouts from each source survey across {eval}`r.n_seeds` random seeds. ZI-QRF achieves the highest SIPP coverage ({eval}`r.zi_qrf.sipp_pct`) while ZI-MAF achieves the highest CPS coverage ({eval}`r.zi_maf.cps_pct`), but the key finding is architectural: zero-inflation handling lifts MAF coverage by {eval}`r.zi_maf_vs_maf_lift` and QDNN by {eval}`r.zi_qdnn_vs_qdnn_lift`, while barely affecting QRF ({eval}`r.zi_qrf_vs_qrf_lift`). I also compare five calibration methods for reweighting synthetic populations, finding that entropy balancing achieves the lowest mean relative error ({eval}`r.rw_entropy.mean_error_pct`). Code is available at [github.com/PolicyEngine/microplex](https://github.com/PolicyEngine/microplex).
## Introduction
diff --git a/paper/myst.yml b/paper/myst.yml
index 364c943..47537f4 100644
--- a/paper/myst.yml
+++ b/paper/myst.yml
@@ -4,10 +4,10 @@ project:
description: "Comparing synthesis methods on PRDC coverage against holdouts from multiple partial surveys"
authors:
- name: Max Ghenis
- email: max@cosilico.ai
+ email: max@policyengine.org
affiliations:
- - Cosilico
- github: https://github.com/CosilicoAI/microplex
+ - PolicyEngine
+ github: https://github.com/PolicyEngine/microplex
license: MIT
keywords:
- synthetic data
@@ -31,6 +31,6 @@ site:
template: book-theme
title: "Microplex"
options:
- logo_text: Cosilico
- logo_url: https://cosilico.ai
+ logo_text: PolicyEngine
+ logo_url: https://policyengine.org
hide_outline: true
diff --git a/pipelines/data_loaders.py b/pipelines/data_loaders.py
index 53ad177..d792482 100644
--- a/pipelines/data_loaders.py
+++ b/pipelines/data_loaders.py
@@ -18,9 +18,9 @@
# Data paths
-COSILICO_DATA = Path("/Users/maxghenis/CosilicoAI/cosilico-data-sources")
-STORAGE_FOLDER = COSILICO_DATA / "storage"
-PSID_DATA_DIR = Path("/Users/maxghenis/CosilicoAI/psid/psid_data")
+POLICYENGINE_DATA = Path("/Users/maxghenis/PolicyEngine/arch-data")
+STORAGE_FOLDER = POLICYENGINE_DATA / "storage"
+PSID_DATA_DIR = Path("/Users/maxghenis/PolicyEngine/psid/psid_data")
# PSID variable mappings by year (codes change each survey wave)
@@ -269,7 +269,7 @@ def load_cps(
if not use_policyengine:
if path is None:
- path = COSILICO_DATA / "micro/us/cps_2024.parquet"
+ path = POLICYENGINE_DATA / "micro/us/cps_2024.parquet"
if not path.exists():
print(f" Warning: CPS file not found: {path}")
diff --git a/pipelines/us_microplex.py b/pipelines/us_microplex.py
index 4dba9f0..046d059 100644
--- a/pipelines/us_microplex.py
+++ b/pipelines/us_microplex.py
@@ -21,8 +21,8 @@
# Data paths
-COSILICO_DATA = Path("/Users/maxghenis/CosilicoAI/cosilico-data-sources")
-CPS_2024 = COSILICO_DATA / "micro/us/cps_2024.parquet"
+POLICYENGINE_DATA = Path("/Users/maxghenis/PolicyEngine/arch-data")
+CPS_2024 = POLICYENGINE_DATA / "micro/us/cps_2024.parquet"
# Variable definitions
diff --git a/pyproject.toml b/pyproject.toml
index baced51..ef78ec0 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -9,7 +9,7 @@ description = "Microdata synthesis and reweighting using normalizing flows"
readme = "README.md"
license = "MIT"
authors = [
- { name = "Cosilico", email = "hello@cosilico.ai" }
+ { name = "PolicyEngine", email = "hello@policyengine.org" }
]
keywords = [
"microdata",
@@ -88,9 +88,9 @@ all = [
]
[project.urls]
-Homepage = "https://github.com/CosilicoAI/microplex"
-Documentation = "https://cosilicoai.github.io/microplex"
-Repository = "https://github.com/CosilicoAI/microplex"
+Homepage = "https://github.com/PolicyEngine/microplex"
+Documentation = "https://policyengine.github.io/microplex"
+Repository = "https://github.com/PolicyEngine/microplex"
[tool.hatch.build.targets.wheel]
packages = ["src/microplex"]
diff --git a/scripts/build_block_geography.py b/scripts/build_block_geography.py
index 6c46ae5..75d490c 100644
--- a/scripts/build_block_geography.py
+++ b/scripts/build_block_geography.py
@@ -261,7 +261,7 @@ def build_block_probabilities(blocks: pd.DataFrame) -> pd.DataFrame:
def upload_to_supabase(df: pd.DataFrame, table_name: str) -> bool:
"""Upload processed data to Supabase PostgreSQL."""
- db_url = os.environ.get("COSILICO_SUPABASE_DB_URL")
+ db_url = os.environ.get("POLICYENGINE_SUPABASE_DB_URL")
if not db_url or not HAVE_PSYCOPG2:
print(f" Skipping Supabase upload (no connection)")
return False
diff --git a/scripts/calibration_method_comparison.py b/scripts/calibration_method_comparison.py
index 3eb9953..eab0e63 100644
--- a/scripts/calibration_method_comparison.py
+++ b/scripts/calibration_method_comparison.py
@@ -8,7 +8,7 @@
"""
import sys
-sys.path.insert(0, '/Users/maxghenis/CosilicoAI/microplex/src')
+sys.path.insert(0, '/Users/maxghenis/PolicyEngine/microplex/src')
import pandas as pd
import numpy as np
@@ -22,9 +22,9 @@
print("="*70)
# Load data - use synthetic file with matching CD IDs
-synth = pd.read_parquet('/Users/maxghenis/CosilicoAI/microplex/data/microplex_synthetic_with_blocks.parquet')
+synth = pd.read_parquet('/Users/maxghenis/PolicyEngine/microplex/data/microplex_synthetic_with_blocks.parquet')
synth['state_fips'] = synth['state_fips'].astype(str).str.zfill(2)
-blocks = pd.read_parquet('/Users/maxghenis/CosilicoAI/microplex/data/block_probabilities.parquet')
+blocks = pd.read_parquet('/Users/maxghenis/PolicyEngine/microplex/data/block_probabilities.parquet')
print(f"Loaded {len(synth):,} households")
# =============================================================================
@@ -552,7 +552,7 @@ def compute_errors(weights, A_cat, b_cat, A_cont, b_cont):
f'{len(income_targets)} income + {len(benefit_targets)} benefit continuous)',
fontsize=13, y=1.02)
plt.tight_layout()
-plt.savefig('/Users/maxghenis/CosilicoAI/microplex/docs/calibration_method_comparison.png',
+plt.savefig('/Users/maxghenis/PolicyEngine/microplex/docs/calibration_method_comparison.png',
dpi=150, bbox_inches='tight')
print(f"\n✅ Saved: docs/calibration_method_comparison.png")
diff --git a/scripts/load_pe_targets.py b/scripts/load_pe_targets.py
index 24c63f7..1eca42d 100644
--- a/scripts/load_pe_targets.py
+++ b/scripts/load_pe_targets.py
@@ -14,7 +14,7 @@
# Supabase connection
SUPABASE_URL = "https://nsupqhfchdtqclomlrgs.supabase.co"
-SUPABASE_KEY = os.environ.get("COSILICO_SUPABASE_SERVICE_KEY")
+SUPABASE_KEY = os.environ.get("POLICYENGINE_SUPABASE_SERVICE_KEY")
PE_BASE = "https://raw.githubusercontent.com/PolicyEngine/policyengine-us-data/main/policyengine_us_data/storage/calibration_targets"
@@ -40,7 +40,7 @@ class BatchSupabaseClient:
def __init__(self, url: str, key: str, schema: str = "microplex"):
if not key:
raise ValueError(
- "COSILICO_SUPABASE_SERVICE_KEY must be set before loading "
+ "POLICYENGINE_SUPABASE_SERVICE_KEY must be set before loading "
"PolicyEngine calibration targets."
)
self.base_url = f"{url}/rest/v1"
diff --git a/scripts/run_supabase_calibration.py b/scripts/run_supabase_calibration.py
index 8caa41b..e757303 100644
--- a/scripts/run_supabase_calibration.py
+++ b/scripts/run_supabase_calibration.py
@@ -109,10 +109,10 @@ def __init__(self):
"SUPABASE_URL",
"https://nsupqhfchdtqclomlrgs.supabase.co"
)
- self.key = os.environ.get("COSILICO_SUPABASE_SERVICE_KEY")
+ self.key = os.environ.get("POLICYENGINE_SUPABASE_SERVICE_KEY")
if not self.key:
raise ValueError(
- "COSILICO_SUPABASE_SERVICE_KEY must be set before running "
+ "POLICYENGINE_SUPABASE_SERVICE_KEY must be set before running "
"Supabase calibration."
)
self.base_url = f"{self.url}/rest/v1"
diff --git a/scripts/targets_dashboard.py b/scripts/targets_dashboard.py
index 28e6769..196423a 100644
--- a/scripts/targets_dashboard.py
+++ b/scripts/targets_dashboard.py
@@ -2,7 +2,7 @@
Targets Comparison Dashboard
Compares calibration targets across data sources:
-- Cosilico (our targets)
+- PolicyEngine (our targets)
- PolicyEngine (policyengine_us_data)
- Yale Tax Simulator
- PSL Tax-Calculator
@@ -80,12 +80,12 @@ class TargetSource:
}
-def load_cosilico_targets(data_source_path: Path) -> TargetSource:
- """Load Cosilico targets from cosilico-data-sources."""
+def load_policyengine_targets(data_source_path: Path) -> TargetSource:
+ """Load PolicyEngine targets from arch-data."""
source = TargetSource(
- name="Cosilico",
- url="https://github.com/CosilicoAI/cosilico-data-sources",
- description="Cosilico's calibration targets from IRS SOI and Census",
+ name="PolicyEngine",
+ url="https://github.com/PolicyEngine/arch-data",
+ description="PolicyEngine's calibration targets from IRS SOI and Census",
)
targets_dir = data_source_path / "data" / "targets"
@@ -417,7 +417,7 @@ def generate_html_dashboard(
🗺️ Geographic Granularity
| Source | National | State | County | ZIP |
- | Cosilico | ✓ | ✓ | 🚧 | 🚧 |
+ | PolicyEngine | ✓ | ✓ | 🚧 | 🚧 |
| PolicyEngine | ✓ | ✓ | ✗ | ✗ |
| Yale TAXSIM | ✓ | ✓ | ✗ | ✗ |
| PSL Tax-Calculator | ✓ | ✗ | ✗ | ✗ |
@@ -425,7 +425,7 @@ def generate_html_dashboard(
📋 Target Categories
- | Category | Cosilico | PolicyEngine | Yale | PSL |
+ | Category | PolicyEngine | PolicyEngine | Yale | PSL |
| Income Distribution (AGI brackets) |
✓ | ✓ | ✓ | ✓ |
@@ -464,7 +464,7 @@ def generate_html_dashboard(
| Source | Method | Sparsity |
- | Cosilico |
+ | PolicyEngine |
Cross-Category Selection + IPF (SparseCalibrator) |
✓ Controllable |
@@ -486,7 +486,7 @@ def generate_html_dashboard(
@@ -526,21 +526,21 @@ def main():
parser.add_argument("--json", type=str, default=None, help="Also output JSON")
args = parser.parse_args()
- # Find cosilico-data-sources
+ # Find arch-data
script_dir = Path(__file__).parent
- data_source_path = script_dir.parent.parent / "cosilico-data-sources"
+ data_source_path = script_dir.parent.parent / "arch-data"
print("Loading target sources...")
sources = []
- # Load Cosilico targets
+ # Load PolicyEngine targets
if data_source_path.exists():
- cosilico = load_cosilico_targets(data_source_path)
- sources.append(cosilico)
- print(f" ✓ {cosilico.name}: {len(cosilico.coverage)} coverage areas")
+ policyengine = load_policyengine_targets(data_source_path)
+ sources.append(policyengine)
+ print(f" ✓ {policyengine.name}: {len(policyengine.coverage)} coverage areas")
else:
- print(f" ✗ Cosilico: data sources not found at {data_source_path}")
+ print(f" ✗ PolicyEngine: data sources not found at {data_source_path}")
# Load other sources
policyengine = load_policyengine_targets()
diff --git a/src/microplex/data_sources/__init__.py b/src/microplex/data_sources/__init__.py
index 6b6e283..a4291f9 100644
--- a/src/microplex/data_sources/__init__.py
+++ b/src/microplex/data_sources/__init__.py
@@ -5,7 +5,7 @@
- CPS ASEC (Census Bureau's primary income/poverty survey)
- PSID (Panel Study of Income Dynamics - longitudinal household survey)
- PUF (Public Use File - tax return data)
-- CPS to Cosilico variable mappings with legal references
+- CPS to PolicyEngine variable mappings with legal references
- Data transformation utilities
"""
@@ -35,7 +35,7 @@
)
from microplex.data_sources.cps_transform import (
TransformedDataset,
- transform_cps_to_cosilico,
+ transform_cps_to_policyengine,
)
from microplex.data_sources.puf import (
load_puf,
@@ -84,7 +84,7 @@
"coverage_summary",
# Transform
"TransformedDataset",
- "transform_cps_to_cosilico",
+ "transform_cps_to_policyengine",
# PUF loading
"load_puf",
"download_puf",
diff --git a/src/microplex/data_sources/cps_mappings.py b/src/microplex/data_sources/cps_mappings.py
index 7515f58..3c063de 100644
--- a/src/microplex/data_sources/cps_mappings.py
+++ b/src/microplex/data_sources/cps_mappings.py
@@ -1,9 +1,9 @@
"""
-CPS ASEC -> cosilico-us variable mappings.
+CPS ASEC -> policyengine-us variable mappings.
-Maps Census CPS columns to statute-defined variables in cosilico-us.
+Maps Census CPS columns to statute-defined variables in policyengine-us.
Each mapping documents:
-- The cosilico-us variable it maps to
+- The policyengine-us variable it maps to
- The statutory reference (USC section)
- CPS columns used
- Coverage level (full, partial, derived, none)
@@ -18,7 +18,7 @@
class CoverageLevel(Enum):
- """How well CPS covers a cosilico-us variable."""
+ """How well CPS covers a policyengine-us variable."""
FULL = "full" # CPS provides all required data
PARTIAL = "partial" # CPS provides some, with known gaps
@@ -38,9 +38,9 @@ class CoverageGap:
@dataclass
class VariableMapping:
- """Metadata for a CPS -> cosilico-us variable mapping."""
+ """Metadata for a CPS -> policyengine-us variable mapping."""
- cosilico_us_variable: str
+ policyengine_us_variable: str
statute_ref: str
cps_columns: list[str]
coverage: CoverageLevel
@@ -59,7 +59,7 @@ class VariableMapping:
def _register(mapping: VariableMapping) -> VariableMapping:
"""Register a mapping in the global registry."""
- _MAPPINGS[mapping.cosilico_us_variable] = mapping
+ _MAPPINGS[mapping.policyengine_us_variable] = mapping
return mapping
@@ -68,7 +68,7 @@ def _register(mapping: VariableMapping) -> VariableMapping:
# =============================================================================
_register(VariableMapping(
- cosilico_us_variable="age",
+ policyengine_us_variable="age",
statute_ref="26 USC 63(f), 24(c)(1)",
cps_columns=["A_AGE"],
coverage=CoverageLevel.FULL,
@@ -85,7 +85,7 @@ def map_age(persons: pl.DataFrame) -> pl.DataFrame:
_register(VariableMapping(
- cosilico_us_variable="household_size",
+ policyengine_us_variable="household_size",
statute_ref="7 USC 2014(c)",
cps_columns=["H_NUMPER"],
coverage=CoverageLevel.FULL,
@@ -106,7 +106,7 @@ def map_household_size(households: pl.DataFrame) -> pl.DataFrame:
# =============================================================================
_register(VariableMapping(
- cosilico_us_variable="earned_income",
+ policyengine_us_variable="earned_income",
statute_ref="26 USC 32(c)(2) - Earned income defined",
cps_columns=["WSAL_VAL", "SEMP_VAL"],
coverage=CoverageLevel.FULL,
@@ -133,7 +133,7 @@ def map_earned_income(persons: pl.DataFrame) -> pl.DataFrame:
# =============================================================================
_register(VariableMapping(
- cosilico_us_variable="filing_status",
+ policyengine_us_variable="filing_status",
statute_ref="26 USC 1 (tax rates by status), 2 (definitions)",
cps_columns=["A_MARITL", "A_AGE", "A_EXPRRP"],
coverage=CoverageLevel.DERIVED,
@@ -194,7 +194,7 @@ def map_filing_status(persons: pl.DataFrame) -> pl.DataFrame:
# =============================================================================
_register(VariableMapping(
- cosilico_us_variable="is_blind",
+ policyengine_us_variable="is_blind",
statute_ref="26 USC 63(f)(2) - Additional standard deduction for blind",
cps_columns=["PEDISEYE"],
coverage=CoverageLevel.FULL,
@@ -217,7 +217,7 @@ def map_is_blind(persons: pl.DataFrame) -> pl.DataFrame:
# =============================================================================
_register(VariableMapping(
- cosilico_us_variable="is_dependent",
+ policyengine_us_variable="is_dependent",
statute_ref="26 USC 152 - Dependent defined",
cps_columns=["A_EXPRRP", "A_AGE", "WSAL_VAL"],
coverage=CoverageLevel.DERIVED,
@@ -263,7 +263,7 @@ def map_is_dependent(persons: pl.DataFrame) -> pl.DataFrame:
# =============================================================================
_register(VariableMapping(
- cosilico_us_variable="ctc_qualifying_children",
+ policyengine_us_variable="ctc_qualifying_children",
statute_ref="26 USC 24(c) - Qualifying child (under 17, per 152(c))",
cps_columns=["A_AGE", "A_EXPRRP", "PH_SEQ", "A_LINENO"],
coverage=CoverageLevel.DERIVED,
@@ -335,7 +335,7 @@ def map_ctc_qualifying_children(persons: pl.DataFrame) -> pl.DataFrame:
# =============================================================================
_register(VariableMapping(
- cosilico_us_variable="adjusted_gross_income",
+ policyengine_us_variable="adjusted_gross_income",
statute_ref="26 USC 62(a) - Adjusted gross income defined",
cps_columns=["WSAL_VAL", "SEMP_VAL", "INT_VAL", "DIV_VAL", "PNSN_VAL"],
coverage=CoverageLevel.PARTIAL,
@@ -437,6 +437,6 @@ def coverage_summary() -> dict[str, list[str]]:
result = {level.value: [] for level in CoverageLevel}
for mapping in _MAPPINGS.values():
- result[mapping.coverage.value].append(mapping.cosilico_us_variable)
+ result[mapping.coverage.value].append(mapping.policyengine_us_variable)
return result
diff --git a/src/microplex/data_sources/cps_transform.py b/src/microplex/data_sources/cps_transform.py
index a88e54e..5c3107a 100644
--- a/src/microplex/data_sources/cps_transform.py
+++ b/src/microplex/data_sources/cps_transform.py
@@ -1,7 +1,7 @@
"""
-Transform CPS data to cosilico-us variables.
+Transform CPS data to policyengine-us variables.
-Applies all CPS -> cosilico-us mappings and constructs tax units.
+Applies all CPS -> policyengine-us mappings and constructs tax units.
"""
from dataclasses import dataclass, field
@@ -24,7 +24,7 @@
@dataclass
class TransformedDataset:
- """CPS data transformed to cosilico-us variables."""
+ """CPS data transformed to policyengine-us variables."""
persons: pl.DataFrame
tax_units: pl.DataFrame
@@ -45,9 +45,9 @@ def summary(self) -> dict:
}
-def transform_cps_to_cosilico(cps: CPSDataset) -> TransformedDataset:
+def transform_cps_to_policyengine(cps: CPSDataset) -> TransformedDataset:
"""
- Transform CPS data to cosilico-us variables.
+ Transform CPS data to policyengine-us variables.
Steps:
1. Apply person-level mappings (age, earned_income, is_blind, etc.)
@@ -290,18 +290,18 @@ def _generate_coverage_report() -> dict:
for m in mappings:
if m.coverage == CoverageLevel.FULL:
- full.append(m.cosilico_us_variable)
+ full.append(m.policyengine_us_variable)
elif m.coverage == CoverageLevel.PARTIAL:
- partial.append(m.cosilico_us_variable)
+ partial.append(m.policyengine_us_variable)
elif m.coverage == CoverageLevel.DERIVED:
- derived.append(m.cosilico_us_variable)
+ derived.append(m.policyengine_us_variable)
else:
- none.append(m.cosilico_us_variable)
+ none.append(m.policyengine_us_variable)
# Collect gaps
for gap in m.gaps:
gaps.append({
- "variable": m.cosilico_us_variable,
+ "variable": m.policyengine_us_variable,
"component": gap.component,
"statute_ref": gap.statute_ref,
"impact": gap.impact,
diff --git a/src/microplex/targets/database.py b/src/microplex/targets/database.py
index cbc7155..12f04b9 100644
--- a/src/microplex/targets/database.py
+++ b/src/microplex/targets/database.py
@@ -2,7 +2,7 @@
Calibration Targets Database
Stores and manages calibration targets from multiple sources,
-with mappings to RAC variables for Cosilico integration.
+with mappings to RAC variables for PolicyEngine integration.
"""
from dataclasses import dataclass, field
@@ -82,7 +82,7 @@ class TargetsDatabase:
Database of calibration targets from multiple sources.
Maintains parity with PolicyEngine targets while adding
- RAC variable mappings for Cosilico integration.
+ RAC variable mappings for PolicyEngine integration.
"""
targets: list[Target] = field(default_factory=list)
_by_category: dict[TargetCategory, list[Target]] = field(default_factory=dict)
@@ -200,7 +200,7 @@ def compare_to_policyengine(self, pe_targets: pd.DataFrame) -> pd.DataFrame:
left_on=["name", "year"],
right_on=["Variable", "Year"],
how="outer",
- suffixes=("_cosilico", "_pe"),
+ suffixes=("_policyengine", "_pe"),
)
comparison["difference"] = comparison["value"] - comparison["Value"]
diff --git a/src/microplex/targets/rac_mapping.py b/src/microplex/targets/rac_mapping.py
index 88cf5bd..3e795be 100644
--- a/src/microplex/targets/rac_mapping.py
+++ b/src/microplex/targets/rac_mapping.py
@@ -1,7 +1,7 @@
"""
RAC Variable Mapping
-Maps calibration target variables to Cosilico RAC (statute) definitions.
+Maps calibration target variables to PolicyEngine RAC (statute) definitions.
Enables validation of microdata against encoded tax law.
"""
@@ -10,7 +10,7 @@
@dataclass
class RACVariable:
- """A variable defined in Cosilico RAC."""
+ """A variable defined in PolicyEngine RAC."""
name: str
statute: str # e.g., "26/62" for IRC Section 62
description: str
@@ -20,7 +20,7 @@ class RACVariable:
# Map from target variable names to RAC definitions
-# Based on cosilico-us/statute structure
+# Based on policyengine-us/statute structure
RAC_VARIABLE_MAP: dict[str, RACVariable] = {
# Income (IRC Section 61 - Gross Income)
"adjusted_gross_income": RACVariable(
diff --git a/targets_comparison.html b/targets_comparison.html
index aca93fd..0e59cf7 100644
--- a/targets_comparison.html
+++ b/targets_comparison.html
@@ -86,9 +86,9 @@ 📊 Key Metrics (IRS SOI 2021)
📦 Data Sources
@@ -120,7 +120,7 @@
✓ Target Coverage Comparison
| Target |
- Cosilico |
+ PolicyEngine |
PolicyEngine |
Yale Tax Simulator |
PSL Tax-Calculator |
@@ -328,7 +328,7 @@ 📈 Target Value Comparison
| Total Returns |
- Cosilico |
+ PolicyEngine |
161,165,563 |
153,774,320 |
+4.8% |
@@ -338,7 +338,7 @@ 📈 Target Value Comparison
🗺️ Geographic Granularity
| Source | National | State | County | ZIP |
- | Cosilico | ✓ | ✓ | 🚧 | 🚧 |
+ | PolicyEngine | ✓ | ✓ | 🚧 | 🚧 |
| PolicyEngine | ✓ | ✓ | ✗ | ✗ |
| Yale TAXSIM | ✓ | ✓ | ✗ | ✗ |
| PSL Tax-Calculator | ✓ | ✗ | ✗ | ✗ |
@@ -346,7 +346,7 @@ 🗺️ Geographic Granularity
📋 Target Categories
- | Category | Cosilico | PolicyEngine | Yale | PSL |
+ | Category | PolicyEngine | PolicyEngine | Yale | PSL |
| Income Distribution (AGI brackets) |
✓ | ✓ | ✓ | ✓ |
@@ -385,7 +385,7 @@ 🔧 Calibration Methods
| Source | Method | Sparsity |
- | Cosilico |
+ | PolicyEngine |
Cross-Category Selection + IPF (SparseCalibrator) |
✓ Controllable |
@@ -407,7 +407,7 @@ 🔧 Calibration Methods
diff --git a/targets_comparison.json b/targets_comparison.json
index ba63a2d..cabfe2c 100644
--- a/targets_comparison.json
+++ b/targets_comparison.json
@@ -43,9 +43,9 @@
"ctc_amount": 122000000000
},
"sources": {
- "Cosilico": {
- "url": "https://github.com/CosilicoAI/cosilico-data-sources",
- "description": "Cosilico's calibration targets from IRS SOI and Census",
+ "PolicyEngine": {
+ "url": "https://github.com/PolicyEngine/arch-data",
+ "description": "PolicyEngine's calibration targets from IRS SOI and Census",
"coverage": {
"state_income_distribution": true,
"national_agi_brackets": true,
diff --git a/tests/test_supabase_client.py b/tests/test_supabase_client.py
index 56d58b3..b7af10e 100644
--- a/tests/test_supabase_client.py
+++ b/tests/test_supabase_client.py
@@ -236,7 +236,7 @@ class TestBatchIntegration:
def live_client(self):
"""Create a client connected to real Supabase."""
url = os.environ.get("SUPABASE_URL")
- key = os.environ.get("COSILICO_SUPABASE_SERVICE_KEY")
+ key = os.environ.get("POLICYENGINE_SUPABASE_SERVICE_KEY")
if not url or not key:
pytest.skip("No Supabase credentials - skipping integration test")
if SupabaseClient is None: