Use valid FSE dictionary costs in optimal parser #4694

Open
Rohithmatham12 wants to merge 1 commit into facebook:dev from Rohithmatham12:fix-opt-parser-tiny-dict-fse

Conversation

@Rohithmatham12

Fixes #4681.

ZSTD_loadCEntropy() already marks dictionary FSE tables that contain zero-probability symbols as FSE_repeat_check, because such tables cannot be assumed valid for the full symbol range. ZSTD_rescaleFreqs() ignored those per-stream repeat modes and entered the dictionary-derived FSE pricing path whenever the Huffman table was HUF_repeat_valid.

With a tiny dictionary and an optimal-parser strategy, FSE_getMaxNbBits() can then return its documented fake cost for zero-frequency symbols (tableLog + 1). With scaleLog = 10, that cost can equal scaleLog (for example, when tableLog is 9), which trips the assert(bitCost < scaleLog) in the optimal parser.
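The arithmetic behind the failing assert can be sketched as follows. This is a minimal standalone illustration, not zstd code: the helper names are hypothetical, tableLog = 9 is an assumed example value, and the rescaling formula is an assumed simplification of what the optimal parser does with the cost.

```c
#include <assert.h>

/* Hypothetical stand-in for the fake cost described above: for a
 * zero-frequency symbol the reported cost is tableLog + 1 bits. */
static unsigned fake_cost_for_zero_freq(unsigned tableLog)
{
    return tableLog + 1;
}

/* Returns 1 when a cost satisfies the precondition bitCost < scaleLog. */
static int cost_is_in_range(unsigned bitCost, unsigned scaleLog)
{
    return bitCost < scaleLog;
}

/* Assumed simplification of the rescaling step: a cost of b bits is
 * turned into a pseudo-frequency of 2^(scaleLog - b). The shift only
 * makes sense while bitCost < scaleLog, hence the assert. */
static unsigned scaled_freq(unsigned bitCost, unsigned scaleLog)
{
    assert(bitCost < scaleLog);
    return 1u << (scaleLog - bitCost);
}
```

Under these assumptions, a table with tableLog = 9 yields a fake cost of 10, which is not strictly below scaleLog = 10, so the precondition fails; a symbol with a real cost of 5 bits would map to a pseudo-frequency of 32.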

This changes first-block optimal-parser pricing to decide per entropy stream:

  • use dictionary HUF costs only when HUF repeat mode is valid,
  • use dictionary FSE costs only when that specific FSE repeat mode is valid,
  • otherwise fall back to the existing baseline stats for that stream.

That preserves dictionary-derived pricing when the table is known complete, while avoiding invalid costs from partial FSE tables.
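The per-stream decision described above might look roughly like the sketch below. It is not the patch itself: the enum, struct, and function names are hypothetical, defined locally to mirror the idea of a repeat mode per entropy stream, and the assumed streams are literals (HUF) plus literal lengths, match lengths, and offset codes (FSE).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical local mirror of a repeat mode; not zstd's own types. */
typedef enum { REPEAT_NONE, REPEAT_CHECK, REPEAT_VALID } repeat_mode_e;

/* Where first-block prices for one stream come from. */
typedef enum { PRICE_FROM_BASELINE, PRICE_FROM_DICT } price_source_e;

/* One repeat mode per entropy stream (assumed set of streams). */
typedef struct {
    repeat_mode_e huf;          /* literals (Huffman) */
    repeat_mode_e litLength;    /* literal lengths (FSE) */
    repeat_mode_e matchLength;  /* match lengths (FSE) */
    repeat_mode_e offCode;      /* offset codes (FSE) */
} dict_repeat_modes_t;

typedef struct {
    price_source_e literals;
    price_source_e litLength;
    price_source_e matchLength;
    price_source_e offCode;
} price_plan_t;

/* Dictionary costs are trusted only when the table is known valid;
 * "check" and "none" both fall back to baseline statistics. */
static price_source_e choose_source(repeat_mode_e mode)
{
    return (mode == REPEAT_VALID) ? PRICE_FROM_DICT : PRICE_FROM_BASELINE;
}

/* Decide each stream independently instead of gating all FSE streams
 * on the Huffman table's repeat mode. */
static price_plan_t plan_first_block_pricing(const dict_repeat_modes_t* modes)
{
    price_plan_t plan;
    assert(modes != NULL);
    plan.literals    = choose_source(modes->huf);
    plan.litLength   = choose_source(modes->litLength);
    plan.matchLength = choose_source(modes->matchLength);
    plan.offCode     = choose_source(modes->offCode);
    return plan;
}
```

With a valid Huffman table and a literal-length table marked as "check", this sketch prices literals, match lengths, and offset codes from the dictionary and falls back to baseline statistics for literal lengths only.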

Test plan:

  • make -j4 fuzzer
  • ./fuzzer -i1 -s1 -v
  • git diff --check

@meta-cla meta-cla Bot added the CLA Signed label Jun 10, 2026

Development

Successfully merging this pull request may close these issues.

Reachable assertion in ZSTD_rescaleFreqs (bitCost < scaleLog) when CDict built with optimal-parser strategy + tiny dictionary
