Merged
6 changes: 3 additions & 3 deletions AGENTS.md
Original file line number Diff line number Diff line change
@@ -58,8 +58,8 @@ When reviewing recent changes here, check:

For a quick review, read:

1. [`/Users/maxghenis/CosilicoAI/microplex/AGENTS.md`](/Users/maxghenis/CosilicoAI/microplex/AGENTS.md)
2. [`/Users/maxghenis/CosilicoAI/microplex/_WORKSPACE.md`](/Users/maxghenis/CosilicoAI/microplex/_WORKSPACE.md)
3. [`/Users/maxghenis/CosilicoAI/microplex/_BUILD_LOG.md`](/Users/maxghenis/CosilicoAI/microplex/_BUILD_LOG.md)
1. [`/Users/maxghenis/PolicyEngine/microplex/AGENTS.md`](/Users/maxghenis/PolicyEngine/microplex/AGENTS.md)
2. [`/Users/maxghenis/PolicyEngine/microplex/_WORKSPACE.md`](/Users/maxghenis/PolicyEngine/microplex/_WORKSPACE.md)
3. [`/Users/maxghenis/PolicyEngine/microplex/_BUILD_LOG.md`](/Users/maxghenis/PolicyEngine/microplex/_BUILD_LOG.md)

Then inspect changed files and return findings first.
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,6 +1,6 @@
MIT License

Copyright (c) 2024 Cosilico
Copyright (c) 2024 PolicyEngine

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
6 changes: 3 additions & 3 deletions QRF_BENCHMARK_SUMMARY.md
@@ -29,7 +29,7 @@ This benchmark compares **microplex** (normalizing flows with two-stage zero-inf

### Recommendation

**Transition from Sequential QRF to microplex for PolicyEngine/Cosilico production use.**
**Transition from Sequential QRF to microplex for PolicyEngine/PolicyEngine production use.**

microplex provides superior statistical fidelity while being significantly faster, making it ideal for:
- CPS/ACS income imputation
@@ -186,7 +186,7 @@ microplex provides superior statistical fidelity while being significantly faste
### Run the Benchmark

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro

# Install dependencies
pip install "scikit-learn>=1.3" matplotlib seaborn
@@ -218,7 +218,7 @@ Results saved to `benchmarks/results/`:

## Conclusion

**microplex is the superior choice for PolicyEngine/Cosilico microdata enhancement.**
**microplex is the superior choice for PolicyEngine/PolicyEngine microdata enhancement.**

The benchmarks demonstrate:
1. ✅ **5.5x better marginal fidelity** - Critical for accurate policy estimates
14 changes: 7 additions & 7 deletions README.md
@@ -3,8 +3,8 @@
Multi-source microdata synthesis and survey reweighting.

[![PyPI](https://img.shields.io/pypi/v/microplex.svg)](https://pypi.org/project/microplex/)
[![Tests](https://github.com/CosilicoAI/microplex/actions/workflows/test.yml/badge.svg)](https://github.com/CosilicoAI/microplex/actions/workflows/test.yml)
[![Docs](https://github.com/CosilicoAI/microplex/actions/workflows/docs.yml/badge.svg)](https://cosilicoai.github.io/microplex)
[![Tests](https://github.com/PolicyEngine/microplex/actions/workflows/test.yml/badge.svg)](https://github.com/PolicyEngine/microplex/actions/workflows/test.yml)
[![Docs](https://github.com/PolicyEngine/microplex/actions/workflows/docs.yml/badge.svg)](https://policyengine.github.io/microplex)

## Overview

@@ -145,11 +145,11 @@ print(f"Using {stats['n_nonzero']} of {stats['n_records']} records")

## Documentation

Full documentation at [cosilicoai.github.io/microplex](https://cosilicoai.github.io/microplex)
Full documentation at [policyengine.github.io/microplex](https://policyengine.github.io/microplex)

- [Tutorial](https://cosilicoai.github.io/microplex/tutorial.html)
- [API Reference](https://cosilicoai.github.io/microplex/api.html)
- [Benchmarks](https://cosilicoai.github.io/microplex/benchmarks.html)
- [Tutorial](https://policyengine.github.io/microplex/tutorial.html)
- [API Reference](https://policyengine.github.io/microplex/api.html)
- [Benchmarks](https://policyengine.github.io/microplex/benchmarks.html)

## Benchmarks

@@ -166,7 +166,7 @@ See [benchmarks/](benchmarks/) for synthesis method comparisons:
author = {Ghenis, Max},
title = {microplex: Multi-source microdata synthesis and survey reweighting},
year = {2025},
url = {https://github.com/CosilicoAI/microplex}
url = {https://github.com/PolicyEngine/microplex}
}
```

4 changes: 2 additions & 2 deletions _WORKSPACE.md
@@ -8,8 +8,8 @@ This file is the durable local context for `microplex` core.

Sibling repos:

- [`/Users/maxghenis/CosilicoAI/microplex-us`](/Users/maxghenis/CosilicoAI/microplex-us)
- [`/Users/maxghenis/CosilicoAI/microplex-uk`](/Users/maxghenis/CosilicoAI/microplex-uk)
- [`/Users/maxghenis/PolicyEngine/microplex-us`](/Users/maxghenis/PolicyEngine/microplex-us)
- [`/Users/maxghenis/PolicyEngine/microplex-uk`](/Users/maxghenis/PolicyEngine/microplex-uk)

## Current shared seams

16 changes: 8 additions & 8 deletions benchmarks/DISTRIBUTIONAL_METRICS.md
@@ -201,32 +201,32 @@ predictions += np.random.normal(0, fixed_noise_scale) # Fixed noise
## Files Created

### Core Metrics Module
- `/Users/maxghenis/CosilicoAI/micro/benchmarks/metrics.py`
- `/Users/maxghenis/PolicyEngine/micro/benchmarks/metrics.py`
- All distributional quality metrics
- Comprehensive evaluation function
- Report generation utilities

### Benchmark Scripts
- `/Users/maxghenis/CosilicoAI/micro/benchmarks/run_distributional_benchmark.py`
- `/Users/maxghenis/PolicyEngine/micro/benchmarks/run_distributional_benchmark.py`
- Runs full distributional quality benchmark
- Generates visualizations and report
- Compares QRF vs microplex

### Results
- `/Users/maxghenis/CosilicoAI/micro/benchmarks/results/distributional_quality.md`
- `/Users/maxghenis/PolicyEngine/micro/benchmarks/results/distributional_quality.md`
- Full analysis report
- `/Users/maxghenis/CosilicoAI/micro/benchmarks/results/distributional_metrics.json`
- `/Users/maxghenis/PolicyEngine/micro/benchmarks/results/distributional_metrics.json`
- Raw metrics in JSON format
- `/Users/maxghenis/CosilicoAI/micro/benchmarks/results/distributional_*.png`
- `/Users/maxghenis/PolicyEngine/micro/benchmarks/results/distributional_*.png`
- 5 visualization files

## Usage

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro

# Run full distributional benchmark
/Users/maxghenis/CosilicoAI/micro/.venv/bin/python benchmarks/run_distributional_benchmark.py
/Users/maxghenis/PolicyEngine/micro/.venv/bin/python benchmarks/run_distributional_benchmark.py

# Results will be saved to benchmarks/results/
```
@@ -262,4 +262,4 @@ cd /Users/maxghenis/CosilicoAI/micro
---

**Created:** December 25, 2024
**Location:** `/Users/maxghenis/CosilicoAI/micro/benchmarks/DISTRIBUTIONAL_METRICS.md`
**Location:** `/Users/maxghenis/PolicyEngine/micro/benchmarks/DISTRIBUTIONAL_METRICS.md`
2 changes: 1 addition & 1 deletion benchmarks/MULTIVARIATE_METRICS.md
@@ -183,7 +183,7 @@ This tests the full joint distribution, not just target variables.
### Running the Benchmark

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
source .venv/bin/activate
python benchmarks/run_multivariate_benchmark.py
```
2 changes: 1 addition & 1 deletion benchmarks/README.md
@@ -56,6 +56,6 @@ benchmarks/results/
author = {Ghenis, Max},
title = {microplex: Multi-source microdata synthesis and survey reweighting},
year = {2025},
url = {https://github.com/CosilicoAI/microplex}
url = {https://github.com/PolicyEngine/microplex}
}
```
2 changes: 1 addition & 1 deletion benchmarks/compare_policyengine.py
@@ -945,7 +945,7 @@ def generate_report(
## Reproducibility

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
source .venv/bin/activate
python benchmarks/compare_policyengine.py
```
4 changes: 2 additions & 2 deletions benchmarks/results/BENCHMARK_REPORT.md
@@ -295,7 +295,7 @@ For PolicyEngine and economic microsimulation applications, **microplex is the r

## Files Generated

All benchmark artifacts saved to `/Users/maxghenis/CosilicoAI/micro/benchmarks/results/`:
All benchmark artifacts saved to `/Users/maxghenis/PolicyEngine/micro/benchmarks/results/`:

- `results.csv` - Summary metrics table
- `results.md` - Markdown results
@@ -312,7 +312,7 @@ All benchmark artifacts saved to `/Users/maxghenis/CosilicoAI/micro/benchmarks/r
To reproduce these benchmarks:

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_benchmarks.py
```

6 changes: 3 additions & 3 deletions benchmarks/results/INDEX.md
@@ -200,7 +200,7 @@ If using these results:
```bibtex
@misc{microplex_benchmarks_2024,
title={microplex Benchmark Results: Comparison Against CT-GAN, TVAE, and Copula},
author={Cosilico},
author={PolicyEngine},
year={2024},
note={Results show 3.3x better marginal fidelity, 1.7x better correlation
preservation, and 2.5x better zero-inflation handling}
@@ -212,7 +212,7 @@ If using these results:
All results are fully reproducible:

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_benchmarks.py
```

@@ -224,7 +224,7 @@ python benchmarks/run_benchmarks.py
## Contact

For questions about these benchmarks:
- Open an issue at github.com/CosilicoAI/microplex
- Open an issue at github.com/PolicyEngine/microplex
- See BENCHMARK_REPORT.md for technical details
- See ISSUES_FOUND.md for known limitations

2 changes: 1 addition & 1 deletion benchmarks/results/ISSUES_FOUND.md
@@ -311,6 +311,6 @@ result.median_dcr = np.median(distances)
The opportunities identified above would make the benchmarks more comprehensive and demonstrate microplex's value even more clearly, but the current results already show:
- **Clear superiority** across all fidelity metrics
- **Practical performance** for real-world use
- **Ready for production** deployment in PolicyEngine/Cosilico
- **Ready for production** deployment in PolicyEngine/PolicyEngine

Main next step: **Test on real microdata** (CPS, ACS) to validate performance claims.
10 changes: 5 additions & 5 deletions benchmarks/results/README.md
@@ -205,22 +205,22 @@ See **ISSUES_FOUND.md** for detailed improvement opportunities:
### Original Benchmarks (vs CT-GAN, TVAE, Copula)

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_benchmarks.py
```

### QRF Comparison Benchmarks

```bash
cd /Users/maxghenis/CosilicoAI/micro
/Users/maxghenis/CosilicoAI/micro/.venv/bin/python benchmarks/run_qrf_benchmark.py
cd /Users/maxghenis/PolicyEngine/micro
/Users/maxghenis/PolicyEngine/micro/.venv/bin/python benchmarks/run_qrf_benchmark.py
```

### Distributional Quality Benchmarks (NEW)

```bash
cd /Users/maxghenis/CosilicoAI/micro
/Users/maxghenis/CosilicoAI/micro/.venv/bin/python benchmarks/run_distributional_benchmark.py
cd /Users/maxghenis/PolicyEngine/micro
/Users/maxghenis/PolicyEngine/micro/.venv/bin/python benchmarks/run_distributional_benchmark.py
```

All results are deterministic (fixed random seed = 42).
2 changes: 1 addition & 1 deletion benchmarks/results/cps_benchmark.md
@@ -151,7 +151,7 @@ While QRF+ZI won this benchmark, microplex may still be preferred when:
## Reproducibility

```bash
cd /Users/maxghenis/CosilicoAI/microplex
cd /Users/maxghenis/PolicyEngine/microplex
.venv/bin/python benchmarks/run_cps_benchmark.py
```

2 changes: 1 addition & 1 deletion benchmarks/results/distributional_quality.md
@@ -385,7 +385,7 @@ This benchmark tests whether synthetic data methods capture the **full condition
## Reproducibility

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_distributional_benchmark.py
```

2 changes: 1 addition & 1 deletion benchmarks/results/policyengine_comparison.md
@@ -113,7 +113,7 @@ Based on these benchmarks:
## Reproducibility

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
source .venv/bin/activate
python benchmarks/compare_policyengine.py
```
4 changes: 2 additions & 2 deletions benchmarks/results/qrf_comparison.md
@@ -146,7 +146,7 @@
- Joint distribution quality matters (policy analysis, microsimulation)
- You need conditional relationships preserved
- Zero-inflated economic variables are present
- You're doing production deployment (PolicyEngine/Cosilico)
- You're doing production deployment (PolicyEngine/PolicyEngine)

## Recommendations for PolicyEngine

@@ -189,7 +189,7 @@ All visualizations saved to `benchmarks/results/`:
## Reproducibility

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
python benchmarks/run_qrf_benchmark.py
```

6 changes: 3 additions & 3 deletions benchmarks/results/tabpfn_comparison.md
@@ -118,7 +118,7 @@ TabPFN+ZI performs best on zero-inflated variables (assets, debt), while micropl

### Recommendations

1. **For PolicyEngine/Cosilico production**: Continue using microplex
1. **For PolicyEngine/PolicyEngine production**: Continue using microplex
- Better correlation preservation is critical for policy simulation
- Generation speed matters for interactive applications
- Scales to full CPS/ACS datasets
@@ -135,7 +135,7 @@ TabPFN+ZI performs best on zero-inflated variables (assets, debt), while micropl
## Reproducibility

```bash
cd /Users/maxghenis/CosilicoAI/micro
cd /Users/maxghenis/PolicyEngine/micro
source .venv/bin/activate
pip install tabpfn==0.1.11 # Must use v0.1.11 (later versions are gated)
python benchmarks/run_tabpfn_benchmark.py
@@ -153,4 +153,4 @@ Results are deterministic with random seed = 42.
---

**Generated:** December 26, 2024
**Location:** /Users/maxghenis/CosilicoAI/micro/benchmarks/results/
**Location:** /Users/maxghenis/PolicyEngine/micro/benchmarks/results/
2 changes: 1 addition & 1 deletion benchmarks/run_cps_benchmark.py
@@ -1108,7 +1108,7 @@ def generate_cps_markdown_report(

f.write("## Reproducibility\n\n")
f.write("```bash\n")
f.write("cd /Users/maxghenis/CosilicoAI/microplex\n")
f.write("cd /Users/maxghenis/PolicyEngine/microplex\n")
f.write("python benchmarks/run_cps_benchmark.py\n")
f.write("```\n\n")

2 changes: 1 addition & 1 deletion benchmarks/run_distributional_benchmark.py
@@ -425,7 +425,7 @@ def generate_distributional_markdown_report(

f.write("## Reproducibility\n\n")
f.write("```bash\n")
f.write("cd /Users/maxghenis/CosilicoAI/micro\n")
f.write("cd /Users/maxghenis/PolicyEngine/micro\n")
f.write("python benchmarks/run_distributional_benchmark.py\n")
f.write("```\n\n")

4 changes: 2 additions & 2 deletions benchmarks/run_qrf_benchmark.py
@@ -420,7 +420,7 @@ def generate_qrf_markdown_report(
f.write("- Joint distribution quality matters (policy analysis, microsimulation)\n")
f.write("- You need conditional relationships preserved\n")
f.write("- Zero-inflated economic variables are present\n")
f.write("- You're doing production deployment (PolicyEngine/Cosilico)\n\n")
f.write("- You're doing production deployment (PolicyEngine/PolicyEngine)\n\n")

f.write("## Recommendations for PolicyEngine\n\n")

@@ -466,7 +466,7 @@ def generate_qrf_markdown_report(

f.write("## Reproducibility\n\n")
f.write("```bash\n")
f.write("cd /Users/maxghenis/CosilicoAI/micro\n")
f.write("cd /Users/maxghenis/PolicyEngine/micro\n")
f.write("python benchmarks/run_qrf_benchmark.py\n")
f.write("```\n\n")
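
The report generators in this diff write a machine-specific `cd /Users/...` line into each Reproducibility section, which is why an organization rename touches them at all. The following is a minimal sketch, not code from the microplex repository, of emitting a repo-relative command instead. The helper name `write_reproducibility` and the `/repo` paths are hypothetical and used only for illustration.

```python
import io
from pathlib import Path

FENCE = "`" * 3  # Markdown code fence, built here to avoid a literal fence in source


def write_reproducibility(f, script_path, repo_root):
    """Write a Reproducibility section that does not embed an absolute path.

    Hypothetical helper for illustration; not part of the microplex codebase.
    """
    # Express the script location relative to the repository root.
    # relative_to raises ValueError if script_path is not under repo_root.
    script = Path(script_path).relative_to(repo_root)
    f.write("## Reproducibility\n\n")
    f.write(f"{FENCE}bash\n")
    f.write("# Run from the root of your local checkout\n")
    f.write(f"python {script.as_posix()}\n")
    f.write(f"{FENCE}\n\n")


# Usage with an in-memory buffer and made-up paths.
buf = io.StringIO()
write_reproducibility(buf, "/repo/benchmarks/run_qrf_benchmark.py", "/repo")
report = buf.getvalue()
```

With this shape the generated report contains only `python benchmarks/run_qrf_benchmark.py`, so moving the checkout or renaming the organization leaves the generated text unchanged.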
