Match v1 API: distribute YEAR-keyed inputs across months by hua7450 · Pull Request #1490 · PolicyEngine/policyengine-household-api

hua7450 · 2026-04-29T16:14:38Z

Summary

household.api.policyengine.org was silently dropping YEAR-keyed inputs on MONTH-defined variables (issue #1489), so partner inputs like snap_earned_income: {"2026": 31932} read as 0 in the engine and returned plausible-but-wrong benefit numbers. This PR aligns the household API with the hosted v1 API (api.policyengine.org) and OpenFisca's long-standing set_input_divide_by_period semantic (in production since 2015).

For numeric MONTH-defined variables, the year value is treated as the annual total: explicit monthly inputs are subtracted, and the remainder splits evenly across the unset months as a raw float — even when the remainder is negative. Boolean / string / enum values broadcast unchanged across the unset months while explicit monthly values still override. The API rejects with a 400 ("Inconsistent input") only when every month of the year is explicit AND those monthlies don't sum to the annual total — partial monthly overrides (0 to 11 explicit months) are silently accepted, mirroring v1's exact rule. Negative annual values are accepted (some MONTH-defined numeric variables can legitimately be negative). Output-request null slots don't count as inputs.

The user's original payload is preserved end-to-end: the response echoes back exactly what was sent; resolution and expansion happen on a deep copy passed to the engine.

This PR also adds a partner-facing warnings array on responses for the most common silent-data-loss shape — single-month input on a MONTH-defined variable paired with an annual output request (with no year-key fallback for that variable) — so partners get a heads-up that the unset months will read the engine's fallback. The numeric output is unchanged (v1 has no warning for this shape; the field is purely additive diagnostic).

Closes #1489

What changed

`policyengine_household_api/country.py`

_normalize_period_keys(household, system) — deep-copies the household, then for each MONTH-defined variable's period map runs _expand_year_keys_in_place:
- Numeric: _distribute_numeric_year_value — preserves explicit monthly inputs, splits V − sum(explicit) evenly across unset months as a raw float (negatives included, matching v1).
- Bool / str / enum: _broadcast_year_value — broadcasts to unset months; explicit monthlies still override.
- Null (output request): year key left alone so the engine returns the annual sum.
validate_period_budgets(household, system) — raises ValueError("Inconsistent input") only when all 12 months are explicit and don't sum to the annual; the endpoint converts to a 400. Mirrors policyengine-core's exact condition.
detect_period_warnings(household, system) -> list[PartialMonthlyInputWarning] — returns structured warnings for partial-monthly + annual-output combinations, suppressed when the same period map already has a non-null year-key (because the year fills the unset months from the remainder).
Period parsing uses policyengine_core.periods.period(value) (the canonical parser) instead of regex.

`policyengine_household_api/endpoints/household.py`

Calls validate_period_budgets alongside the existing payload + axes validators (raises convert to 400). Calls detect_period_warnings and attaches body["warnings"] (serialized to strings) when non-empty.

`README.md`

New "Period-key conventions" section: year/month key shapes, recommended pattern (stay consistent), and the year+month overlap rule (year is the annual total; explicit monthlies override; remainder splits across unset months; full-year inconsistency is the only rejection).

Reproduction (silent-drop bug, before any fix)

WA, single 34-year-old, snap_earned_income: {"2026": 31932}, request snap: {"2026": null}:

API	`snap`	`snap_gross_income`
v1 hosted	$0/yr	$31,932/yr
v2 (pre-fix)	$3596/yr	$0/yr
v2 (post-fix)	$0/yr	$31,932/yr

Mixed shape examples

// Annual $1200 with June pinned to $600 — remaining $600 splits across the
// other 11 months as raw float ≈ $54.55/mo. Same numbers v1 produces.
"snap_earned_income": {"2026": 1200, "2026-06": 600}

// Partial monthlies above annual — silently distributed (negative remainder),
// matching v1's silent-distribute-with-negative behavior.
"snap_earned_income": {"2026": 1200, "2026-06": 999, "2026-07": 999}

// Full year explicit AND mismatched → 400 ("Inconsistent input").
"snap_earned_income": {"2026": 1200, "2026-01": 50, ..., "2026-12": 50}

// Negative annual accepted (engine sees -$100/mo).
"snap_earned_income": {"2026": -1200}

Tests

216 unit tests pass (162 pre-existing + 54 new in test_normalize_period_keys.py and test_period_warnings.py).

TestNumericMonthDefinedVariable, TestNonNumericMonthDefinedVariable — year-only / monthly-only / null-output / year+partial-months / year+all-twelve-months / mixed bool / enum override.
TestNormalizerEdges — year-defined left alone, unknown variable, axes skip, non-string period key, original-not-mutated.
TestValidatePeriodBudgets — partial below/equal/above annual all accepted; all-12-explicit + mismatch (above and below) raises; negative year-only accepted; bool / year-defined / unknown variable skipped.
TestSnapInputOutputMatrix — 9-row end-to-end matrix pinned against hosted v1 API parity, including the mixed-coherent case.
TestDetectPeriodWarnings — partial-monthly + annual-output detection, deterministic ordering, sample truncation, multi-variable, year-defined-output false-positive guard, year-input-suppresses-partial-monthly suppression.
TestEndpointAttachesWarnings — wire-format string serialization, no warnings key on clean requests, 400 on full-year inconsistent input.

Test plan

Local repro shows v2 silently dropping the annual key (returning ~$300/mo SNAP); fix produces $0 matching v1.
Mixed year+month case reproduces v1's redistribute number ($3,321.88 for {"2026": 3600, "2026-06": 1800}).
Full-year inconsistent shape returns 400 ("Inconsistent input"), matching v1.
Partial monthlies above annual are silently accepted (negative remainder), matching v1.
Negative year-only accepted, matching v1.
Bool / enum broadcast verified end-to-end against is_incarcerated and vt_ccfap_age_group.
All 216 unit tests pass; ruff format clean.
CI passes.

🤖 Generated with Claude Code

The household API's `Simulation(situation=...)` path silently dropped a YEAR-period assignment on a MONTH-defined variable, so partner inputs like `snap_earned_income: {"2026": 31932}` read as 0 in the engine and returned plausible-but-wrong benefit numbers. The hosted v1 API (api.policyengine.org) divides the annual value across the year, so partners hit two different answers from the same payload. Normalize the situation before handing it to the engine: split numeric annual values as V/12 per month, broadcast booleans / strings / enums unchanged, and leave null output requests alone so annual sums still work. The original household dict is preserved so the response echoes back the user's keys exactly as sent. Closes #1489

…enum tests Add `detect_period_warnings` and a `warnings` field on `/calculate` responses for the most common silent-data-loss shape: a MONTH-defined variable keyed for only some months of a year, paired with an annual output request. The engine defaults the missing months to 0, so the annual sum looks like a year of benefits even though only the keyed month was specified. The warning lists which months were set, how many are missing, and points partners to either a yearly key or all 12 monthly keys. Also document the period-key conventions in README.md so partners can self-serve the recommended pattern (annual everywhere by default; mix to month-only when they need per-month variation). Add MONTH-defined boolean and enum coverage to the normalizer tests to lock in that those values broadcast unchanged across months — `bool` is a subclass of `int` in Python, so the explicit `isinstance(value, bool)` guard is the only thing keeping `True` from being coerced to 0.0833.

Two refinements after probing both kinds of MONTH-defined enums (with and without formulas). 1. `_expand_year_keys_in_place` was using `setdefault`, which left an existing None slot (a monthly output request) untouched. So `{"2026": "SUA", "2026-06": null}` never propagated SUA into June for the engine — June stayed as None and the engine fell back to the variable's default. Now overwrite slots that are missing or None while preserving existing non-null monthly inputs. 2. The partial-monthly warning said "the remaining months will default to 0", which is misleading for with-formula variables (e.g. `vt_ccfap_age_group` has a formula derived from `age` that returns TODDLER, not 0). Reword to "engine's fallback value (often 0, sometimes a formula-derived value)". Add a regression test that locks in (1) using the enum case from the v1 vs v2 divergence we found.

anth-volk

Thanks for this Ziming. The biggest concern I have here is behavior around situations where a household contains an aggregated year key, plus a month key in that year.

Also, just to flag, I don't think there's validation that confirms that user-provided months add up to less than or equal to a user-provided year value for the same var.

…ity helpers Anthony's review surfaced several real correctness and style issues; this commit addresses them together. Annual-input + explicit-monthly semantics (#4, #5): Previously, `{"2026": 1200, "2026-06": 999}` produced an inconsistent budget — the annual was split as $100/mo and June stayed at $999, so the engine saw $2099 in 2026, contradicting the stated $1200 total. Now the year value is the authoritative annual total. Explicit monthly values consume part of it; the remainder is split evenly across the unset months (rounded to two decimals). When the explicit monthlies exceed the annual total, the request is rejected with a 400 carrying a precise message — `validate_period_budgets` raises `ValueError` and the endpoint converts it to a 400 alongside the existing payload and axes validators. The semantic is numeric-only; bool / str / enum keep the prior broadcast-with-override behavior. Period-key parsing (#1): `policyengine_core.periods.period(value)` is the canonical parser and already validates `2026-15` style garbage. Replace the local regex with `_parsed_period`, `_is_year_key`, and `_month_key_year` helpers wrapping that parser. Warnings as standalone structures (#2): Introduce a `PartialMonthlyInputWarning` dataclass that knows how to render itself; `detect_period_warnings` is now a thin wrapper around `_collect_period_warnings` returning `[w.message for w in ...]`. Tightened warning detector (api-reviewer false-positive): Previously, `annual_output_years` collected any year string with a null output, so a YEAR-defined-output null (e.g. `state_name: {"2026": null}`) falsely armed the trigger for unrelated MONTH-input partial-month cases. Filter to MONTH-defined output variables only. Non-mutating normalizer (#3): `_normalize_period_keys` now deep-copies internally and returns a fresh dict instead of mutating its argument. The call site in `country.calculate` is correspondingly simpler. Test refactor (#6): Rewrite `test_normalize_period_keys.py` around the new public-ish entry point with real variables from the policyengine-us system. Drops the dated `# Expected values are the v1 hosted-API responses on 2026-04-29` phrasing in favor of "pinned against parity with the hosted v1 API." Adds budget-distribution coverage, the year-input + monthly-output collision case, and a non-mutation assertion. Add a YEAR-defined-output false-positive lock-in test in `test_period_warnings.py`. 210 unit tests pass.

Refinements after a second review pass. Detector now returns dataclasses, endpoint serializes: `detect_period_warnings` now returns `list[PartialMonthlyInputWarning]` so future callers can read the structured fields (variable, year, months_set). The endpoint serializes via `[w.message for w in ...]` at the wire boundary; the body["warnings"] contract is unchanged. Match v1's unrounded float distribution: The earlier `round(remainder/unset_count, 2)` produced $54.55 instead of v1's $54.5454...; for the $1200 + $600 June case that's a $0.05 budget-drift the engine sums back differently. Drop the rounding; the engine now sees exactly the same per-month floats v1 emits. Defense-in-depth in `_distribute_numeric_year_value`: `validate_period_budgets` is the canonical 400 path, but the normalizer now also raises if `remainder < 0` so unit tests or future callers that bypass validation can't silently distribute a negative per-month value. Mixed-coherent v1 parity row in `_SNAP_MATRIX`: Adds `({"2026": 3600, "2026-06": 1800}, "2026", {"2026": 3321.8796})` covering the case Anthony flagged in review. Tolerance loosened to ±$0.05 to absorb cross-version float drift from numpy/pandas; expected values still pinned against hosted v1 API parity. Tests reshaped: `_wa_household` now takes a full income map. Order-determinism locked in: `test__partial_months_three_set__warning_lists_them_in_order` asserts `months_set == (1, 3, 6)` and the rendered "(2026-01, 2026-03, 2026-06)" sample. Drop the unused `auth_headers` attribute. New tests assert structured warning fields (variable / year / months_set). README: "Period-key conventions" section gets a cleaner column-aligned table and an explicit "year-with-pinned-month" example for the mixed case. Wording avoids implying that pinned months are excluded from the budget.

…ant in test - Drop unused `_messages(warnings)` helper from `test_period_warnings.py` (every test inspects dataclass fields directly now). - Promote the negative-remainder guard in `_distribute_numeric_year_value` from `ValueError` to `AssertionError`. The canonical 400 path is `validate_period_budgets` at the endpoint; this raise is purely defense-in-depth and should signal a code-flow bug ("validator was bypassed") rather than a user-input error. - Add `test__bypassing_validator_with_overrun_raises_assertion_error` so the invariant has a regression test even though it's unreachable from the public endpoint.

…s, round-trip test Replace the year-as-budget semantic with a simpler conflict rule: when a variable receives both a year key and same-year monthly keys with non-null values, keep whichever group appears last in the JSON object and drop the other. Surface the resolution as an `OverlappingPeriodWarning` so partners see what was kept and what was ignored. Output-request `null` slots are exempt — they're requests, not inputs. This intentionally diverges from v1's redistribute behavior for the mixed shape; v1 parity is preserved for year-only and month-only. Wording sharpened so partners don't misread "later" as chronological: both the warning message and the README phrase the rule as "appears last in the JSON object." Warning also documents the null-output exemption so `{"2026": V, "2026-MM": null}` doesn't look like a hazard. Code cleanups while here: - `_year_overlap_resolutions` is now O(n) per period_map (position- lookup dict) and is computed once per variable, threaded into the in-place applier and the detector. Drop `_surviving_monthly_inputs`; fold the dropped-key check inline. - `_distribute_numeric_year_value` and `_broadcast_year_value` simplify back to plain V/12 / broadcast since overlap resolution ran first — no more "subtract explicit and distribute remainder." - Drop `validate_period_budgets`, `_check_year_budget`, and the AssertionError defense — overruns no longer error out, they're resolved by last-wins. Tests: - New `TestYearMonthOverlapResolution` (7 tests) covering both directions, multi-month, null-output exemption, different-year non-collision, and the former overrun shape. - New `TestOverlappingPeriodWarning` (7 tests) including single-input baselines (year-only, month-only) so "one input is fine" is the documented baseline, and message-wording assertions on both the "appears last in the JSON" phrasing and the null-exemption sentence. - New end-to-end `test__overlap_warning_round_trips_through_flask_client` posts a year+month payload through the Flask client and asserts the response warning lists the monthly key as kept — locks in that JSON insertion order survives the pydantic / Flask pipeline. - Drop the obsolete `TestValidatePeriodBudgets` class and the abandoned Mixed-coherent v1-parity matrix row. 215 unit tests pass. README rewrites the "Sending both annual and monthly inputs for the same variable" section in JSON-order terms with both directions and the null-exemption note. Changelog entry notes the last-wins divergence from v1 for the mixed shape.

…t-wins After more discussion, the team prefers full v1 API parity over the household-API-specific last-wins resolution. v1 / OpenFisca's `set_input_divide_by_period` has been treating the year value as the annual total since 2015 — explicit monthly inputs are subtracted, the remainder is split evenly across the unset months, and a budget overrun raises `ValueError("Inconsistent input...")`. This commit brings the household API back in line with that long-standing semantic across the year+month overlap case. Restored: - `_distribute_numeric_year_value` does the subtract-and-distribute again (preserves explicit monthlies, splits remainder as raw float). - `_broadcast_year_value` preserves explicit monthly overrides for bool/str/enum. - `validate_period_budgets` raises `ValueError` when explicit monthlies exceed the annual total; endpoint converts to 400. - `TestValidatePeriodBudgets` and the Mixed-coherent matrix row pinned at v1's $3321.88. Removed: - `OverlappingPeriodWarning` dataclass and all related code/tests/docs. - `_year_overlap_resolutions` and `_apply_year_overlap_resolutions_in_place`. - `TestYearMonthOverlapResolution` test class and the round-trip Flask test that pinned last-wins behavior. Kept (purely additive on top of v1): - `PartialMonthlyInputWarning` — partner-friendly heads-up when a partial monthly input is paired with an annual output for the same year. Doesn't change any number; v1 has no such warning. README, changelog entry, and PR description rewritten to match. 212 unit tests pass.

…press partial-monthly warning when year input fills the gap Three tightening fixes after the previous v1-parity revert. 1. Validator was rejecting partial-monthly overruns that v1 silently accepts (e.g. `{"2026": 1200, "2026-06": 999, "2026-07": 999}`): v1 / policyengine-core's `set_input_divide_by_period` only raises "Inconsistent input" when *every* month of the year is explicit and the sum doesn't match the annual total. Partial monthlies above the annual are silently distributed to a negative remainder. Match that exact condition. 2. Validator was also rejecting valid negative year-only values: `{"2026": -1200}` has explicit_sum = 0, and `0 > -1200` was triggering the old check. Negative annual values are legitimate for some MONTH-defined numeric variables; v1 accepts them. Drop the comparison entirely now that we only fire on the all-12-months mismatch case. 3. Detector was firing PartialMonthlyInputWarning for `{"2026": V, "2026-06": V'}` even though the normalizer fills the unset months from the year-key remainder — no missing-month hazard exists in that shape. Track which (variable, entity, year) tuples already have a non-null year-key input and suppress the warning for those. Also drop the AssertionError defense in `_distribute_numeric_year_value` — v1 silently produces negative per-month values when the partial monthlies overrun the annual, so we should too. README, changelog entry, and PR description rewritten to describe the precise rule. Issue #1489 comment edited in place. New tests: - TestValidatePeriodBudgets: partial-above-annual accepted; all-twelve below-or-above mismatched raises; negative year-only accepted. - TestDetectPeriodWarnings: year-input-with-partial-months suppresses the partial-monthly warning. - TestEndpointAttachesWarnings: 400 path now exercises the all-twelve- inconsistent shape (the only one v1 rejects). 216 unit tests pass.

anth-volk

Considering the importance of these changes, I haven't heavily reviewed for code quality. That said, I flagged a case where we might want explicit failure behavior (probably 400 BAD REQUEST response). Also, if you have the time, I'd recommend more clearly separating validation & warning setting from monthly key expansion.

anth-volk · 2026-04-29T22:37:12Z

+
+        normalized = _normalize_period_keys(household, us_system)
+
+        assert normalized["spm_units"]["spm_unit_1"]["snap_earned_income"] == {


Shouldn't this throw? We won't be able to process this, anyway

Broader recommendation: you might want to handle validation and key expansion as two separate steps

Adds `validate_period_keys` so requests with unparseable period keys (e.g. `not-a-period`, `2026/13`) get a 400 with the offending key, variable, and entity instead of a silent zero from the engine. Refactors the four passes (key validation, budget validation, warning detection, normalization) onto a shared `_walk_variable_slots` generator, removing the four-deep nested loop duplicated in each function. Addresses Anthony's review on PR #1490. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Calls /us/calculate twice with year-keyed employment_income set to $30k and $0. Asserts the resulting annual SNAP differs. If a future change silently drops year-to-month distribution again, both responses collapse to the same fallback value and this test fails — independent of the specific dollar amounts produced by current policyengine-us rules, so it survives parameter updates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

anth-volk

LGTM

hua7450 added 2 commits April 29, 2026 12:14

Drop unused parametrize column in matrix test

2fb6099

hua7450 marked this pull request as draft April 29, 2026 16:33

hua7450 added 2 commits April 29, 2026 12:39

hua7450 marked this pull request as ready for review April 29, 2026 17:23

hua7450 requested a review from anth-volk April 29, 2026 17:31

anth-volk requested changes Apr 29, 2026

View reviewed changes

hua7450 added 3 commits April 29, 2026 14:49

hua7450 changed the title ~~Fix silent-drop of YEAR-keyed input on MONTH-defined variables~~ Resolve year+month input overlap (last wins) and fix YEAR-keyed silent-drop Apr 29, 2026

hua7450 mentioned this pull request Apr 29, 2026

Silently drops MONTH-defined variable inputs keyed annually #1489

Closed

hua7450 changed the title ~~Resolve year+month input overlap (last wins) and fix YEAR-keyed silent-drop~~ Match v1 API: distribute YEAR-keyed inputs across months, reject inconsistent budgets Apr 29, 2026

hua7450 changed the title ~~Match v1 API: distribute YEAR-keyed inputs across months, reject inconsistent budgets~~ Match v1 API: distribute YEAR-keyed inputs across months Apr 29, 2026

anth-volk reviewed Apr 29, 2026

View reviewed changes

hua7450 mentioned this pull request Apr 30, 2026

Add partner integration docs (MkDocs Material site) #1492

Closed

6 tasks

anth-volk approved these changes Apr 30, 2026

View reviewed changes

hua7450 merged commit eb78e1e into main Apr 30, 2026
3 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Match v1 API: distribute YEAR-keyed inputs across months#1490

Match v1 API: distribute YEAR-keyed inputs across months#1490
hua7450 merged 12 commits into
mainfrom
fix/normalize-annual-keys-on-month-vars

hua7450 commented Apr 29, 2026 •

edited

Loading

Uh oh!

anth-volk left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

anth-volk left a comment

Uh oh!

anth-volk Apr 29, 2026

Uh oh!

anth-volk left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants


		normalized = _normalize_period_keys(household, us_system)

		assert normalized["spm_units"]["spm_unit_1"]["snap_earned_income"] == {

Conversation

hua7450 commented Apr 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

What changed

policyengine_household_api/country.py

policyengine_household_api/endpoints/household.py

README.md

Reproduction (silent-drop bug, before any fix)

Mixed shape examples

Tests

Test plan

Uh oh!

anth-volk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

anth-volk left a comment

Choose a reason for hiding this comment

Uh oh!

anth-volk Apr 29, 2026

Choose a reason for hiding this comment

Uh oh!

anth-volk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

hua7450 commented Apr 29, 2026 •

edited

Loading

`policyengine_household_api/country.py`

`policyengine_household_api/endpoints/household.py`

`README.md`