Add main residence value to LA calibration #371

Closed
vahid-ahmadi wants to merge 11 commits into main from la-land-value-targets

@vahid-ahmadi
Collaborator

@vahid-ahmadi vahid-ahmadi commented Apr 20, 2026

What this PR does

Adds a derived proxy LA-level main-residence-value calibration target to datasets/local_areas/local_authorities/loss.py. Per-LA target is a constructed product, not a directly observed total:

y_la = avg_house_price_la × ownership_share_la × n_households_la
matrix col = main_residence_value (per household, from policyengine-uk)

Same multiplicative shape as the existing private-rent target (median_rent × renter_share × n_households). LAs missing any input (Wales / Scotland / NI — EHS is England-only) fall through to the national_property × la_household_share fallback, identical to how the tenure target handles missing LAs.
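
A minimal sketch of the product-plus-fallback construction described above, using pandas and NumPy. The column names, the toy figures, and the `national_property` value are illustrative assumptions and are not taken from `loss.py`.

```python
import numpy as np
import pandas as pd

# Toy inputs; every figure here is made up for illustration.
las = pd.DataFrame(
    {
        "code": ["E_A", "E_B", "W_C"],
        "avg_house_price": [400_000.0, 150_000.0, np.nan],
        "ownership_share": [0.30, 0.60, np.nan],  # missing outside England
        "households": [80_000, 60_000, 60_000],
    }
)
national_property = 6.0e10  # hypothetical national total, in GBP

# Observed-input product: price x ownership share x household count.
target = las["avg_house_price"] * las["ownership_share"] * las["households"]

# Fallback: the LA's share of all households times the national total.
la_household_share = las["households"] / las["households"].sum()
has_property = target.notna()
y = np.where(has_property, target, national_property * la_household_share)
```

In this toy case the two LAs with complete inputs keep their product, and the third receives its household share of the national figure.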

Lineage caveat (flagged in review by @MaxGhenis)

This is a derived/proxy target, not a direct benchmark:

  • Matrix col main_residence_value (policyengine-uk) is WAS-imputed stock wealth, regionally uprated via property-wealth intensity ratios.
  • Target value uses HMLR UK HPI "Average Price" — a transaction-weighted geography-period price index, not an observed stock total of owner-occupied main residences.
  • The product avg_price × ownership × n_households is a defensible identity ("if every owner-occupied dwelling were valued at the LA HPI average, the total would be £X") but the two sides of the calibration constraint reference different price concepts.

A separate policy question — whether derived/proxy targets like this should sit at full training weight alongside directly observed targets (HMRC SPI, ONS pop, DWP UC, VOA dwellings), or be soft-weighted — is being tracked separately and is not blocking this PR.

Closes #370.

Files

New

  • policyengine_uk_data/storage/la_land_values.csv — 360 rows: code, name, households, avg_house_price. avg_house_price from HM Land Registry UK HPI Dec 2025 with name-based fallback for re-allocated codes (Sheffield E08000019 → E08000039), NI country-level fallback for missing LGD months, national-avg fallback for the Isles of Scilly.
  • policyengine_uk_data/targets/sources/la_land.py: load_la_avg_prices(), _compute_la_targets() (observed-input product, no national-total apportionment), get_targets() returning Target objects named housing/main_residence_value/{code} with source=hmlr, geographic_level=LOCAL_AUTHORITY.
  • 18 unit tests in tests/test_la_land_value_targets.py, 8 in tests/test_la_loss_land_value.py.
  • changelog.d/370.md.

Modified

  • datasets/local_areas/local_authorities/loss.py — adds the housing/main_residence_value column following the rent-block pattern: merge avg_house_price into tenure_merged, compute target inline, apply np.where(has_property, target, national * la_household_share) fallback. Same shape as the surrounding tenure / rent / ONS-income blocks.

Tests

26 new tests cover:

  • CSV data quality — row count, schema, value bounds, no missing values, IoS regression fixture (households in [500, 5_000]).
  • Target computation — every per-LA target equals avg_price × ownership_share × n_households exactly; all-positive; English LAs covered (Wales/Scotland/NI fall through to the loss.py national-share fallback by design — same behaviour as the existing tenure target, which only has EHS England data).
  • Sanity ordering — Kensington & Chelsea total exceeds Blackpool; sum of London LA targets exceeds North-East total by ≥3×.
  • Registry integration — 291 Target objects produced (one per English LA), all tagged local_authority and source=hmlr.
  • Loss-matrix wiring: housing/main_residence_value column present in both matrix and y; per-LA y equals the observed-input product for covered LAs; matrix column equals sim.calculate("main_residence_value") (gated on enhanced-FRS fixture).

Full run including adjacent suites (regional land, target DB, target registry, release manifest): 72 passed, 15 skipped (FRS-fixture-gated), no regressions.

Sanity check

| LA | avg HMLR price | ownership share | households (Census) | target |
|---|---|---|---|---|
| Kensington & Chelsea | £1,178k | ~29% | ~80k | £27.8bn |
| Blackpool | £130k | ~62% | ~65k | £5.1bn |

Both ordering (K&C ≫ Blackpool) and absolute level (£10s of bn per LA) look right.

Sources (constructed inputs, not direct LA totals)

  • HM Land Registry UK HPI — Dec 2025 — avg transaction prices.
  • English Housing Survey via la_tenure.xlsx — ownership shares, England only.
  • Census 2021 via la_count_households.xlsx — household counts.

Related

Generalises targets/sources/mhclg_regional_land.py to local-authority
level. Each LA's share of national household land is proportional to
households x avg_house_price, scaled to the ONS National Balance Sheet
household-land series.

Inputs (all already used elsewhere in the repo):
- storage/la_land_values.csv: 360 LAs with households (from the existing
  local_authority_weights.h5 matrix) and avg_house_price (HM Land
  Registry UK HPI Dec 2025).
- _land.HOUSEHOLD_LAND_VALUES for the national anchor.

Tests cover CSV data quality, share/target aggregation, sensible
ordering (K&C > Blackpool by >3x, London boroughs in top quintile),
and registry integration.

Updates test_regional_land_value_targets.py to filter by
GeographicLevel.REGION now that LA targets share the same name prefix.

Closes #370

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi self-assigned this Apr 20, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@MaxGhenis
Contributor

Blocker: data bug in la_land_values.csv — Isles of Scilly (E06000053) has 2,492,115 households. The real figure is ~1,000 (pop ~2,000, 1,115 households per ONS mid-2023 estimate).

Impact: IoS alone absorbs 8.6 % of the national household share (2.49M / 29.0M), which the methodology then multiplies by the national household-land value — £65 bn of UK household land 'lives' in Scilly under this target, depressing every other LA's share by that amount. London LAs take the biggest hit because their share-of-average-price is highest.

Quick verification:

$ awk -F',' 'NR>1 {sum+=$3} END {print sum/1e6}' la_land_values.csv
31.5   # with IoS bug
$ awk -F',' 'NR>1 && $2 != "Isles of Scilly" {sum+=$3} END {print sum/1e6}' la_land_values.csv
29.0   # without — matches ONS ~29.4M

Looks like a UK-HPI 'national-total-as-fallback' path leaked into one LA row. Likely two lines to fix:

  1. Correct la_land_values.csv row to E06000053,Isles of Scilly,1115,308582 (or whatever the canonical source gives).
  2. Add a test that bounds households per LA, e.g. assert (df.households.between(500, 500_000)).all(). None of the current 18 tests catch a 1000x outlier.
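
A minimal sketch of the suggested bounds check, assuming the CSV is loaded into a pandas DataFrame with a `households` column. The helper name and the second row are hypothetical; only the Isles of Scilly figures come from this thread.

```python
import pandas as pd


def households_out_of_range(df, low=500, high=500_000):
    """Return the rows whose household count lies outside [low, high]."""
    return df.loc[~df["households"].between(low, high)]


buggy = pd.DataFrame(
    {
        "code": ["E06000053", "E_HYPOTHETICAL"],
        "name": ["Isles of Scilly", "Example LA"],  # second row is made up
        "households": [2_492_115, 64_000],
        "avg_house_price": [308_582, 130_000],
    }
)
fixed = buggy.assign(households=[1_115, 64_000])

bad_before = households_out_of_range(buggy)  # flags the Isles of Scilly row
bad_after = households_out_of_range(fixed)   # empty once the row is corrected
```

A pytest version would simply assert that the returned frame is empty.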

Happy to approve once that's in. The methodology itself is sound — mirrors mhclg_regional_land.py::_compute_regional_shares correctly and the target shape aligns with the regional targets.

The E06000053 row carried households=2,492,115 — roughly the South
West region total — from an upstream fallback that fired during CSV
generation. Real IoS has ~1,115 households per ONS mid-2023. With
the bug, IoS absorbed 7.85% of the national property-wealth share,
understating every other LA's 2024 target; the fix lifts each by
~8.5% (e.g. K&C moved from £42.6bn to £46.2bn).

Two new tests prevent the regression:
- test_households_within_plausible_range: bounds every LA to
  [500, 500_000] so any future 10x+ outlier fails immediately.
- test_isles_of_scilly_households_are_thousands_not_millions: tight
  [500, 5_000] bound on the specific row that leaked.

Methodology unchanged; LA targets still sum to the ONS national
household-land series within 1e-6.
@vahid-ahmadi
Collaborator Author

@MaxGhenis thanks — fixed in 3ed729c.

Data fix

  • Patched E06000053,Isles of Scilly from households=2,492,115 to 1,115 (ONS mid-2023 estimate). avg_house_price=308,582 kept as-is.
  • Post-fix UK household total across the CSV drops to 29.73M, in line with the ~29.4M ONS figure you quoted (was 31.5M pre-fix, matching your awk).

Quantified impact of the fix

  • IoS share of national property-wealth share: 7.85% → 0.0038%.
  • K&C 2024 target: £42.6bn → £46.2bn (+8.5%). Every non-IoS LA gets the same ~8.5% uplift; London LAs were the most suppressed.
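
The uplift can be reproduced with a few lines of arithmetic, assuming each LA target is its share of a fixed national total so that every non-IoS share is rescaled by the same factor. The variable names are illustrative; the percentages and the K&C figure are the ones quoted above.

```python
ios_share_before = 0.0785    # 7.85%
ios_share_after = 0.000038   # 0.0038%

# Every other LA's share scales by (1 - after) / (1 - before).
uplift = (1 - ios_share_after) / (1 - ios_share_before) - 1

kc_before_bn = 42.6
kc_after_bn = kc_before_bn * (1 + uplift)
```

Under that assumption the uplift is about 8.5% and the K&C target comes out near £46.2bn, consistent with the figures reported.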

Tests added

  • test_households_within_plausible_range — bounds every LA to [500, 500_000] per your suggestion. A future fallback leak of this class fails immediately.
  • test_isles_of_scilly_households_are_thousands_not_millions — explicit [500, 5,000] bound on E06000053 so the specific row that leaked has a named regression test.

Full suite: 20/20 pass locally via uv run pytest policyengine_uk_data/tests/test_la_land_value_targets.py.

Generation-path note: the 2,492,115 figure matches the South West regional household total, so the fallback that fired during CSV generation was a regional sum, not "national-avg" as the PR body suggested. I'll correct the PR description; worth flagging for whoever regenerates the CSV next.

@vahid-ahmadi vahid-ahmadi requested a review from MaxGhenis April 23, 2026 13:34
The targets added in the previous commits were registered but inert —
datasets/local_areas/local_authorities/loss.py never built a column for
them, so the LA reweighter could not see them. This adds the
ons/household_land_value column to the LA target matrix:

- matrix entry: per-household household_land_value (from policyengine-uk).
- y entry: 360-vector of per-LA targets at the calibration year, taken
  from la_land._compute_la_targets and reordered to match
  local_authorities_2021.csv so the country mask and target indices
  agree at every position.

The year is selected from time_period; if it is outside
HOUSEHOLD_LAND_VALUES (defined for 2021–2026) the latest known year is
used as a fallback.
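
A minimal sketch of the two mechanics described in this commit message: picking the calibration year with a latest-year fallback, and reordering a per-LA target dict to match a reference code list. The function names and toy values are hypothetical and do not reproduce the code in `loss.py` or `la_land.py`.

```python
import numpy as np


def pick_year(time_period, known_years):
    """Use the requested year if it is known, else the latest known year."""
    known_years = list(known_years)
    return time_period if time_period in known_years else max(known_years)


def ordered_targets(targets_by_code, la_codes):
    """Return targets as a vector in exactly the order given by la_codes."""
    return np.array([targets_by_code[code] for code in la_codes])


known = range(2021, 2027)  # 2021 to 2026 inclusive, as stated above
year_in_range = pick_year(2024, known)
year_fallback = pick_year(2030, known)

# Toy dict in a different order from the reference list.
targets = {"E_B": 5.0, "E_A": 9.0}
vector = ordered_targets(targets, ["E_A", "E_B"])
```

Indexing by the reference list, rather than relying on dict order, is what keeps the country mask and target indices aligned at every position.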

New tests in test_la_loss_land_value.py cover both layers:
- target dict ↔ la_codes ordering, finite-positive vector, sum-to-
  national for 2024/2025/2026 (no Microsimulation needed).
- full create_local_authority_target_matrix build (gated on the
  enhanced FRS fixture): column presence, length 360, sum-to-national
  for the calibration year, ordering matches la_codes, all positive,
  and matrix column equals sim.calculate("household_land_value").

Closes the "out of scope" follow-up flagged in the original PR body.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi changed the title Add LA-level household land value calibration targets Add LA-level household land value targets and calibrate on them Apr 27, 2026
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi
Collaborator Author

vahid-ahmadi commented Apr 29, 2026

@MaxGhenis — for context, here is the full set of LA-level targets the reweighter trains on after this PR (from datasets/local_areas/local_authorities/loss.py):

| Family | Columns | Quantity | Source |
|---|---|---|---|
| HMRC income | hmrc/{var}/amount, hmrc/{var}/count for each variable in INCOME_VARIABLES (employment, self-emp, pension, dividends, …), filtered to in-SPI frame (income_tax > 0) | £ totals + counts | HMRC SPI table 3.15, scaled to national projections |
| Age | age/{lower}_{upper} (one column per age band) | Population count | ONS mid-year estimates, scaled to 90% of UK total |
| Universal Credit | uc_households | Benunit count on UC | DWP Stat-Xplore |
| ONS small-area income | ons/equiv_net_income_bhc, ons/equiv_net_income_ahc, ons/equiv_housing_costs | £ aggregates of equivalised HBAI net income (BHC/AHC) and implied housing costs | ONS small area income estimates × Census household counts, with national-share fallback for missing LAs |
| Tenure mix | tenure/owned_outright, tenure/owned_mortgage, tenure/private_rent, tenure/social_rent | Household counts by tenure | English Housing Survey × Census household counts, with national-share fallback |
| Private rent £ | rent/private_rent | Aggregate annual private rent | VOA/ONS median rent × EHS renter share × Census household counts |
| Main residence value (this PR) | housing/main_residence_value | £ aggregate main-residence value of owner-occupiers | HMLR UK HPI avg price × EHS ownership share × Census household counts |

All targets follow the same shape — observed per-unit value × observed share × observed count, with a national × la_household_share fallback for LAs missing any input. Plus a country mask so e.g. an English household never contributes to a Welsh LA's targets.
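
A minimal sketch of the country mask idea, assuming one weight per (LA, household) pair. The array names and toy values are illustrative and are not the repo's data structures.

```python
import numpy as np

la_country = np.array(["ENGLAND", "WALES"])
household_country = np.array(["ENGLAND", "ENGLAND", "WALES"])

# mask[i, j] is True only when household j is in the same country as LA i.
mask = la_country[:, None] == household_country[None, :]

weights = np.ones((2, 3))                 # toy (LA x household) weights
values = np.array([100.0, 200.0, 50.0])   # toy per-household quantity

# Masked weights zero out cross-country contributions before aggregating.
la_totals = (weights * mask) @ values
```

Here the English LA aggregates only the two English households and the Welsh LA only the Welsh one.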

…onment)

Replaces the imputed land-value target with a main-residence-value
target built from observed LA-level inputs, mirroring the existing
private-rent block:

    target_la = avg_house_price_la × ownership_share_la × n_households_la
                (HMLR HPI)        × (English Housing Survey) × (Census)

Per @MaxGhenis's standup note (28 Apr): minimise target manipulation by
calibrating on observable LA-level housing indicators rather than
apportioning a national ONS land-value total across LAs. The new
target uses the same shape as the rent target (median × share × count),
including the national-share fallback for LAs missing any input.

Changes:
- la_land.py: drop HOUSEHOLD_LAND_VALUES dependency; new
  load_la_avg_prices() helper; _compute_la_targets() returns
  observed-product £ per LA; targets renamed
  housing/main_residence_value/{code}, source=hmlr.
- loss.py: replace the apportionment block with the rent-style
  inline pattern (merge avg_price into tenure_merged, target =
  price × ownership × households, na-fallback to
  national_property × la_household_share).
- Tests: drop "sums to ONS national" assertions; assert per-LA
  target equals observed product exactly. Layer-2 FRS-gated tests
  updated to use main_residence_value column.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi changed the title Add LA-level household land value targets and calibrate on them Add LA-level main residence value targets from observed housing indicators Apr 29, 2026
@vahid-ahmadi
Collaborator Author

vahid-ahmadi commented Apr 29, 2026

The LA target now uses directly observed housing indicators (avg_house_price × ownership_share × n_households) instead of apportioning a national ONS land-value total. The shape now mirrors the existing private-rent block exactly:

y["rent/private_rent"]            = median_rent × renter_share × n_households
y["housing/main_residence_value"] = avg_price  × ownership_share × n_households   ← this PR

Both go through the same np.where(has_data, target, national_share_fallback) pattern. Wales/Scotland/NI fall through to the national-share fallback because the EHS only covers England — same behaviour as the existing tenure target.

Matrix variable changed from household_land_value (an imputed regional-intensity rescaling of property wealth) to main_residence_value (a direct FRS observable). Target name changed from ons/household_land_value/{code} to housing/main_residence_value/{code}.

72 passed, no regressions. Sanity check: K&C target £27.8bn vs Blackpool £5.1bn, ordering and level both look right.

vahid-ahmadi and others added 2 commits April 29, 2026 11:40
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four FRS-fixture-gated tests exercising properties the optimiser
relies on:

- y has no NaN entries (NaN would propagate silently through the
  optimiser).
- Non-English LAs use the national-share fallback (positive,
  non-NaN values), since EHS coverage is England-only.
- matrix column has non-zero variance, so the new target carries
  calibration signal rather than being inert.
- Sum of English LA targets is in the same order of magnitude
  (0.5x-3x) as the implied initial English main-residence-value,
  so the calibrator can plausibly reach the target via reweighting
  rather than 100x weight inflation.
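
The four properties listed above could be checked along these lines. This is a sketch on toy arrays with hypothetical names; the real tests are gated on the enhanced FRS fixture and are not reproduced here.

```python
import numpy as np


def check_target(y, la_is_english, values, weights, hh_is_english):
    """Return the four pass/fail properties for one target column."""
    implied_english = (weights * values)[hh_is_english].sum()
    ratio = y[la_is_english].sum() / implied_english
    return {
        "no_nan": not np.isnan(y).any(),
        "fallback_positive": bool((y[~la_is_english] > 0).all()),
        "has_variance": bool(values.var() > 0),
        "ratio_in_range": bool(0.5 <= ratio <= 3.0),
    }


# Toy data: three LAs (two English), four households (three English).
y = np.array([9.6e9, 5.4e9, 1.8e10])
la_is_english = np.array([True, True, False])
values = np.array([300_000.0, 0.0, 200_000.0, 250_000.0])
weights = np.array([20_000.0, 30_000.0, 25_000.0, 10_000.0])
hh_is_english = np.array([True, True, True, False])

report = check_target(y, la_is_english, values, weights, hh_is_english)
```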

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi vahid-ahmadi changed the title Add LA-level main residence value targets from observed housing indicators Add main residence value to LA calibration Apr 29, 2026
…age caveat

Per @MaxGhenis PR review: the target value is a constructed proxy
(avg HMLR price × EHS ownership share × Census households), not a
directly observed LA total of main residence value. The earlier PR
description and code comments overstated this.

Substantive lineage gap that the docs now flag explicitly:
- Matrix col main_residence_value (policyengine-uk) is WAS-imputed
  household stock wealth, regionally uprated.
- Target uses HMLR UK HPI 'Average Price' — a transaction-weighted
  geography-period price index, not an observed stock total of
  owner-occupied residences.
- Two different price concepts on the two sides of the constraint.
  The product is a defensible identity, but it is a derived proxy,
  not a direct benchmark.

Behaviour unchanged. This commit only updates the docstring in
la_land.py and the comment in loss.py to call the target
"derived proxy" rather than "directly observed".

A separate policy question (whether derived proxy targets should
sit at full training weight alongside direct VOA/HMRC/ONS/DWP
targets, or be soft-weighted) is being tracked separately.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vahid-ahmadi
Collaborator Author

vahid-ahmadi commented Apr 29, 2026

Addressed the framing issue flagged in the review (commit 717c9cd):

  • PR description, la_land.py docstring, and loss.py comment now call this target a derived proxy, not "directly observed". I had been overselling it.

  • Lineage caveat is now documented explicitly: PE main_residence_value is WAS-imputed stock wealth; HMLR HPI "Average Price" is a transaction-weighted geography-period price index, not an observed stock total of owner-occupied residences. The two sides of the constraint reference different price concepts. The product is a defensible identity, but it's a derived proxy.

No behaviour change in this commit — only documentation/labelling, since you said "#371 is not necessarily wrong, but it should be explicitly treated as a derived/proxy calibration target, not described as direct."

Open policy question that I'm not solving here: "If the standard is 'only calibrate direct official targets,' #371 should not be a hard training target as written." This applies repo-wide — tenure/*, rent/private_rent, ons/equiv_*, and now housing/main_residence_value are all derived proxies sitting at full training weight alongside direct VOA/HMRC/ONS/DWP targets. Worth a separate PR / decision. Happy to open a tracking issue if you want to formalise the policy.

@MaxGhenis
Contributor

Follow-up on the direct-target discussion: I pushed c330b44 to make housing/main_residence_value validation-only by default instead of a training target. The target is still useful as a proxy diagnostic, but HMLR average price x EHS ownership share x Census households is not a direct LA stock-value target and crosses source/concept boundaries. This also fixes the docstring so it no longer claims soft weighting unless the optimizer actually implements it.

Verification: uv run pytest policyengine_uk_data/tests/test_la_land_value_targets.py policyengine_uk_data/tests/test_la_loss_land_value.py -q; ruff check/format on touched files.

@vahid-ahmadi
Collaborator Author

vahid-ahmadi commented Apr 30, 2026

@MaxGhenis — nit on the test side.

The new test_main_residence_value_is_validation_target_by_default only asserts that "housing/main_residence_value" is a member of the VALIDATION_TARGETS constant. That's tautological — it locks in the constant's value, not the calibrator's behaviour. Someone could later remove excluded_training_targets=LA_VALIDATION_TARGETS from create_datasets.py and the test would still pass.

A behaviourally meaningful test would run a small calibration (or use the toy calibrator already in test_calibrate_save.py) and assert the post-fit residual on housing/main_residence_value is not driven near zero — i.e., the cell really wasn't trained on. That's the property the constant is supposed to encode, and it's the property a future refactor could silently break.
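
A rough sketch of what such a behavioural test could look like. It uses a plain least-squares fit as a stand-in for the real calibrator, so `fit_weights`, its `excluded` parameter, and the toy matrix are all assumptions rather than the repo's API or the toy calibrator in test_calibrate_save.py.

```python
import numpy as np
import pandas as pd


def fit_weights(matrix, y, excluded=()):
    """Least-squares weights fitted only to columns not in `excluded`."""
    train = [c for c in matrix.columns if c not in excluded]
    a = matrix[train].to_numpy().T  # shape: (n_targets, n_households)
    w, *_ = np.linalg.lstsq(a, y[train].to_numpy(), rcond=None)
    return w


def relative_residuals(matrix, y, w):
    """(estimate - target) / target for every column, trained or not."""
    estimate = pd.Series(matrix.to_numpy().T @ w, index=matrix.columns)
    return (estimate - y) / y


# Toy problem: three households, two trained targets, one held out.
matrix = pd.DataFrame(
    {
        "households": [1.0, 1.0, 1.0],
        "uc_households": [1.0, 0.0, 0.0],
        "housing/main_residence_value": [0.0, 0.0, 1.0],
    }
)
y = pd.Series(
    {
        "households": 30.0,
        "uc_households": 10.0,
        "housing/main_residence_value": 100.0,
    }
)

w = fit_weights(matrix, y, excluded=("housing/main_residence_value",))
res = relative_residuals(matrix, y, w)
```

The assertion of interest is that the trained residuals are near zero while the held-out residual stays large; if the exclusion were silently dropped, the held-out residual would collapse and the test would fail.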

Happy to take a swing at it as a tiny follow-up if you want. Not blocking.

@MaxGhenis
Contributor

Updated based on Max's target-standard call: nonconforming/proxy quantities should not live in the targets database or calibration target matrix, even as validation-only targets.

Pushed ecfd6c3, which removes the LA housing/main_residence_value target source, LA loss-matrix column, validation-target exclusion, CSV fixture, changelog, and target tests added by this PR. The PR now has no changed files relative to the base.

Recommendation: close this PR rather than merge it. If we want a property-value diagnostic later, it should live outside the calibration target registry/matrix and be labelled as diagnostics, not as a target.

@vahid-ahmadi
Collaborator Author

@MaxGhenis you removed all the changes in this PR in ecfd6c3 — the diff against main is now empty. Do you want to close this PR?

@MaxGhenis
Contributor

Closing per target-standard decision: this PR now has an empty diff after removing the nonconforming/proxy LA property-value target from the target registry and calibration matrix. If we add property-value diagnostics later, they should live outside the targets database/matrix.

@MaxGhenis MaxGhenis closed this May 1, 2026
MaxGhenis added a commit that referenced this pull request May 1, 2026
…/net (#374)

* Add LA-level council tax calibration targets

Two families of LA-level targets, covering all 360 LAs in
local_authorities_2021.csv, built from four public sources:

- `ons/council_tax_band_d/{code}` (350 targets): average Band D
  council tax inclusive of all precepts per billing authority.
  Sources: MHCLG *Council Tax levels set by local authorities in
  England 2026-27*, Welsh Government *Council Tax levels April 2026
  to March 2027*, Scottish Government *Council Tax Assumptions 2025*.
  All 296 English + 22 Welsh + 32 Scottish LAs covered.
- `ons/council_tax_band_count/{code}/{band}` (2,541 targets): number
  of dwellings per band A-H per LA. Source: VOA *Council Tax: Stock
  of Properties, 2025*. Covers England + Wales (318 LAs × ~8 bands,
  minus City of London Band A which is VOA-suppressed).

NI is excluded: domestic rates, not council tax. Scotland band
counts are not in VOA; Scottish Assessors publishes them separately
and is a follow-up.

Files
-----

- `storage/la_council_tax.csv` (31 KB, 360 rows): canonical CSV
  joining DLUHC Table 10 column 17, Welsh Table 1 "Overall average
  band D", Scottish Gov "CT by Band 2025-26" Band D column, and VOA
  CTSOP1.0 bands A-H onto the reference LA list.
  - Post-2023 South Yorkshire E-codes (E08000038/39) re-mapped to
    pre-2023 codes (E08000016/19) to match the reference list.
  - Scottish ampersand/double-space naming normalised
    ("Argyll & Bute" → "Argyll and Bute", etc.).
- `targets/sources/la_council_tax.py`: reads the CSV, emits Target
  objects at geographic_level=LOCAL_AUTHORITY with per-country year
  tagging and per-country reference URL.

Testing
-------

22 hermetic tests (no network access, no baseline fixture needed):

Structure
- Row count matches local_authorities_2021.csv.
- Every expected column present.
- Four UK country codes represented.
- Every LA code matches the reference list.

Value plausibility (the #371 lesson)
- Band D amount in [£900, £3,500] for every row with a value.
- Total dwellings in [200, 800,000] for every row with a value.
- Explicit Isles of Scilly regression test: total dwellings in
  [500, 5,000], not the 2.49M outlier that slipped into #371.
- Band A-H counts sum to total dwellings within 20-property slack
  (VOA 10-property suppression allowance).
- Every band-count target value ≤ 500k (largest LA stock).

Coverage expectations
- Every English, Welsh and Scottish LA has a Band D value.
- Northern Ireland has no council tax flagged (has_council_tax=False).

Spot-checks of published facts
- Wandsworth (E09000032) and Westminster (E09000033) are the two
  lowest-Band-D English LAs (catches row-swap bugs).
- Scottish average Band D is £500+ below English average.

Target-API invariants
- get_targets() returns a non-empty list without network access.
- Band D target count matches the CSV's non-null Band D count.
- Band count target count matches Σ non-null band columns.
- Every target carries geographic_level=LOCAL_AUTHORITY and a
  geo_code.
- Band D targets use Unit.GBP; band count targets use Unit.COUNT
  with is_count=True.
- Every target has at least one year of values.

Sources
-------

- MHCLG (England 2026-27):
  https://www.gov.uk/government/statistics/council-tax-levels-set-by-local-authorities-in-england-2026-to-2027
- Welsh Government (Wales 2026-27):
  https://www.gov.wales/council-tax-levels-april-2026-march-2027-html
- Scottish Government (Scotland 2025-26):
  https://www.gov.scot/publications/council-tax-datasets/
- VOA (England + Wales 2025):
  https://www.gov.uk/government/statistics/council-tax-stock-of-properties-2025

Out of scope for this PR (follow-ups)
-------------------------------------

- Wiring these targets into
  datasets/local_areas/local_authorities/loss.py so the LA
  reweighting actually calibrates on them. Planned follow-up PR.
- Scottish Assessors per-LA chargeable-dwellings to fill the Scotland
  band-count gap.
- Council Tax Support caseload per LA (DWP StatXplore).
- Single Person Discount rate per LA (CIPFA).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review: add Welsh Band I, source totals from VOA, tidy module

Review points addressed:

- Add count_band_I column to la_council_tax.csv, populated for all 22
  Welsh LAs (Wales revalued in 2005 and introduced a 9th band). Cardiff
  1480, Monmouthshire 670, Vale of Glamorgan 1060, etc. English rows
  keep Band I null; VOA marks it [z] (not applicable).
- Re-source total_dwellings from VOA "All properties" column instead
  of deriving it as the sum of A-H. Previously Σ(A..H) was used for
  both sides of test_band_counts_sum_to_total, making the test
  self-referential; now it validates against the published total with
  a 20-property slack for VOA rounding.
- Rename count columns symmetrically: band_A..band_H + band_D_count →
  count_band_A..count_band_I. Removes the lopsided band_D_count name
  that existed only to avoid clashing with band_d_amount.
- Align band-count target names with voa_council_tax.py:
  voa/council_tax/{code}/{band} (was ons/council_tax_band_count/...);
  variable="council_tax_band" (was council_tax_band_count, which is
  not a real PolicyEngine-UK variable); drop breakdown_variable to
  match the regional VOA module.
- Cache the CSV read with @lru_cache(maxsize=1), matching voa_council_tax.
- Update module docstring: "A-H in England/Scotland, A-I in Wales".

Tests:
- New: test_welsh_las_have_band_i (all 22 Welsh LAs populated).
- New: test_english_las_have_no_band_i (guard against spurious fills).
- New: test_cardiff_band_i_matches_published_figure (~1,480 per VOA 2025).

Final target counts:
- 350 Band D amount targets (unchanged).
- 2,563 band-count targets, up from 2,541: +22 Welsh Band I plus two
  band-H rows that were null due to the earlier truncation.

* Satisfy ruff format on la_council_tax.py

* Wire LA council-tax band-count targets into the calibration loss matrix

The targets registered in la_council_tax.py were inert — the LA target
matrix had no columns for them, so the reweighter could not see them.
This wires the eight VOA Council Tax Stock-of-Properties band-count
targets (A-H) into the LA loss matrix:

- matrix entry: per-household indicator 1[council_tax_band == B] from
  policyengine-uk.
- y entry: 360-vector of per-LA dwelling counts from
  storage/la_council_tax.csv. For LAs without VOA data — Scottish LAs
  (the VOA summary tables don't cover Scotland) and Northern Irish LAs
  (no council tax) — the value falls back to
  national_count × la_household_share, matching the existing tenure
  block's fallback pattern.
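
A minimal sketch of the per-household indicator columns described above, built with pandas. The matrix column names and the toy band assignments are hypothetical.

```python
import numpy as np
import pandas as pd

BANDS = list("ABCDEFGH")

# Toy per-household bands; None stands for a household with no band.
household_band = pd.Series(["A", "D", "D", None, "H"])

# One 0/1 indicator column per band: 1[council_tax_band == B].
matrix = pd.DataFrame(
    {f"band_{b}": (household_band == b).astype(float) for b in BANDS}
)

# Weighted column sums give the estimated dwelling count per band.
weights = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
estimated_counts = pd.Series(weights @ matrix.to_numpy(), index=matrix.columns)
```

Each row sums to at most one, because a household sits in at most one band.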

Two targets are deliberately not wired in this pass:

- Band I — Wales-only and mostly null in the CSV.
- The Band D £ amount (ons/council_tax_band_d/{code}) — a per-rate
  quantity that does not fit the linear matrix-times-weights
  aggregation. Wiring it as total council-tax revenue would need
  Scotland-specific band ratios (different from England/Wales after
  2017) and is worth a separate PR.

New tests in test_la_loss_council_tax.py cover both layers:

- Light: CSV joins to every LA code, the eight count_band_{X} columns
  exist, E/W rows are populated, Scotland is null as documented, and
  NI has has_council_tax=False.
- Full build (gated on enhanced FRS fixture): all eight columns present
  in matrix and y; y vectors length 360, finite and positive; matrix
  entries are 0/1 indicators with rows summing to ≤1; y matches the
  CSV verbatim for an English LA (Hartlepool); Scotland and NI LAs
  receive a positive fallback rather than NaN or zero.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add LA-level net council tax £ target alongside band counts

Wires the second FRS data point into the LA reweighter, addressing
the 28 Apr standup ALIGNED decision: "calibrate the two FRS data
points as the council tax information is provided after deductions."

Both sides of the new constraint are net of CTR:
- matrix col = council_tax_less_benefit (gross − CTR benefit)
- y = directly observed net council tax requirement per LA

Sources (no national-total apportionment, all directly published):
- England (296 LAs): MHCLG Council Taxbase 2025, Table 1.35 "Tax base
  after allowance for council tax support" × Band D amount.
  Sums to £47.4bn, within 3.4% of the MHCLG Table 1 published England
  Council Tax Requirement of £45.86bn (small gap from year mismatch:
  2025 taxbase × 2026-27 Band D).
- Wales (22 LAs): Welsh Government "Council Tax Levels April 2026
  to March 2027" Table 3 "Council tax income (£m)". Sums to £2.45bn.
- Scotland (32) and NI (10): no source wired; loss.py routes through
  the existing national × la_household_share fallback, same pattern
  as the band-count target and the rent target.

Mirrors the rent block in loss.py: load CSV → merge into ct_merged →
matrix col / y assignment / has_data mask / national-share fallback.

Files:
- storage/la_council_tax.csv: new column total_council_tax_net.
- targets/sources/la_council_tax.py: load_la_net_council_tax() +
  Target objects named housing/council_tax_net/{code}.
- datasets/local_areas/local_authorities/loss.py: housing/council_tax_net
  block immediately after the band-count block.
- tests/test_la_loss_council_tax.py: 11 new tests (4 layer-1 +
  7 layer-2) covering CSV column presence, country coverage, value
  range, England-total ballpark vs MHCLG, matrix-col correctness,
  na-fallback behaviour, calibratability sanity check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix gross/net mismatch in OBR national council tax compute

OBR EFO Table 4.1 reports "Total net council tax receipts" — net of
council tax reduction (CTR). The matching household-level signal is
council_tax_less_benefit (= gross council tax − CTR award), not
council_tax (which is the gross liability before CTR per its
docstring "Gross amount spent on Council Tax, before discounts").

Calibrating gross household values against a net national target
systematically pulls weights down to fit (Σ w × gross > Σ w × net),
leaking bias into adjacent national targets that share the weight
vector.

Order-of-magnitude sanity (UK 2024-25):
  Σ w × council_tax (gross)              ≈ £55bn
  Σ w × council_tax_less_benefit (net)   ≈ £47bn
  OBR Table 4.1 "Total net council tax"  ≈ £44bn
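
As a back-of-envelope illustration of the figures above, assume (purely for illustration) that the optimiser could only rescale all weights uniformly to hit the OBR target. The variable names are hypothetical; the three totals are the order-of-magnitude values just quoted.

```python
gross_total_bn = 55.0      # sum of w x council_tax
net_total_bn = 47.0        # sum of w x council_tax_less_benefit
obr_net_target_bn = 44.0   # OBR Table 4.1 net receipts

# Uniform weight scale needed to hit the target with each variable.
scale_if_gross = obr_net_target_bn / gross_total_bn
scale_if_net = obr_net_target_bn / net_total_bn
```

Matching the net target with the gross variable needs a scale of 0.8, against roughly 0.94 with the net variable, which is the extra downward pull on weights that the commit describes.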

After the fix, the council tax constraint is internally consistent
(both sides net) and aligns with Max's 28 Apr standup decision on
FRS-net-of-CTR alignment. Pairs naturally with the LA-level
housing/council_tax_net target this PR adds — both use the same net
variable.

Adds three regression tests pinning the net-variable contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Zero NI council tax targets instead of fabricating fallbacks

Northern Ireland uses domestic rates, not council tax. The CSV's
has_council_tax flag has been False for NI from the original commit,
but loss.py was ignoring it and assigning national × la_household_share
to NI LAs for both band counts and the new net £ column.

Effect: the optimiser was being told "NI households should pay this
much council tax" with a positive target, while every NI household
has council_tax_band == None and council_tax_less_benefit == 0 — an
unsatisfiable constraint that wastes loss the optimiser cannot drive
to zero. Reported by @MaxGhenis in PR review.

Fix: read has_council_tax from the CSV, gate the np.where so NI LAs
get y == 0 for all 9 council-tax columns. Direct-value and fallback
paths unchanged for E/W/S.
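
A minimal sketch of the gating described above, assuming three toy LAs: one with a published value, one without a source, and one in Northern Ireland. The array names and figures are illustrative and are not taken from loss.py.

```python
import numpy as np

direct = np.array([42_000.0, np.nan, np.nan])    # NaN where no source exists
household_share = np.array([0.5, 0.3, 0.2])
has_council_tax = np.array([True, True, False])  # False for the NI LA
national_count = 100_000.0

fallback = national_count * household_share

# Inner where: direct value if present, else fallback.
# Outer where: force zero wherever council tax does not apply.
y = np.where(
    has_council_tax,
    np.where(np.isnan(direct), fallback, direct),
    0.0,
)
```

The first LA keeps its published value, the second takes the household-share fallback, and the third is zero.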

Updates two tests that previously asserted positive fallback for NI;
adds explicit zero-NI assertion for housing/council_tax_net.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Document derived/proxy nature + lineage drift for #374 CT targets

Per @MaxGhenis PR review: both council-tax LA targets are derived
proxies, not direct matches for the matrix-side variables. The PR
description and code comments earlier overstated this.

voa/council_tax/{A..H}: target counts VOA dwellings (E&W only,
includes exempt/empty/second homes); matrix counts policyengine-uk
households. Banding ratios differ in Scotland post-2017 and Wales
has Band I.

housing/council_tax_net: target value is MHCLG taxbase × Band D
(taxbase = Band D equivalent dwellings adjusted for ~7 discount/
premium/exemption classes); matrix col is FRS-reported
council_tax_less_benefit (household-reported gross less reported
CTB). Same intent, different construction paths.

Documentation only — no code, data, or test behaviour change.
The la_council_tax.py docstring now has an explicit "Lineage
caveats" section, and loss.py block comments label both targets
as derived/proxy with cross-reference.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Mask unavailable LA council tax targets

* Remove redundant council tax availability gate

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: Max Ghenis <mghenis@gmail.com>

Development

Successfully merging this pull request may close these issues.

Add LA-level household land value calibration targets

2 participants