Feature/issue 1274 notebook format check#1299
Conversation
Add check/nbformat, a new bash script that checks the format of all tracked Jupyter notebook (.ipynb) files using the TensorFlow Docs notebook format checker (tensorflow_docs.tools.nbformat). The script: - Discovers all .ipynb files tracked by git via git ls-files - Runs python -m tensorflow_docs.tools.nbformat on each notebook - Defaults to check-only mode (non-destructive) - Accepts --fix to reformat notebooks in place - Accepts --help to print usage information - Exits non-zero if any notebook fails the format check Wire check/nbformat into: - check/all: called unconditionally alongside check/mypy and check/shellcheck, since notebook format is structural and not related to which Python files changed - check/shellcheck: added to the required_shell_scripts list (in sorted order) so shellcheck verifies the new script is present Add tensorflow-docs to dev_tools/requirements/deps/pytest.txt alongside the existing nbformat dep, since both are needed for notebook-related checks and belong in the same env grouping. Add a notebook-checks CI job to .github/workflows/ci.yaml that triggers when .ipynb files change and runs check/nbformat using the pytest.env.txt environment (which includes tensorflow-docs). Also add the new job to report-results so it is a required check.
Run pip-compile to update pytest.env.txt and dev.env.txt with the pinned version of tensorflow-docs (==2025.2.19.33219), which was added to deps/pytest.txt for use by check/nbformat. Also picks up its transitive deps protobuf and pyyaml.
…tion Regenerated pytest.env.txt and dev.env.txt with pip-compile after accepting upstream version bumps during merge conflict resolution. tensorflow-docs==2025.2.19.33219 is now correctly pinned alongside the updated package versions from main.
There was a problem hiding this comment.
Code Review
This pull request introduces a new script, check/nbformat, to validate and fix Jupyter notebook formatting using tensorflow_docs.tools.nbformat, and updates the project's dependency requirements and lockfiles. The review highlights two critical issues: the inefficiency of invoking the format checker in a loop rather than passing all files at once, and the breaking of environment compatibility for older Python versions caused by regenerating lockfiles with Python 3.13 instead of the project's minimum supported version.
|
Hey @mhucka any feedback on this also? No stress, just checking in. Thanks. |
|
@rosspeili Thank you for the contribution, and sorry for the long delay in getting back to this. The merge conflicts are because files in the repo changed in the meantime, which is my fault for not reviewing this sooner, so I will take the time to fix it in a moment and push an updated version to your branch. (I hope you don't mind – it will save everyone time.) A bigger problem is that this does not work. How did you test this? This is what happens when I run check/nbformat: |
| declare -a errors=() | ||
| for notebook in "${notebooks[@]}"; do | ||
| echo " Checking: ${notebook}" | ||
| if ! python -m tensorflow_docs.tools.nbformat "${tf_args[@]}" "${notebook}"; then |
There was a problem hiding this comment.
tensorflow_docs.tools does not have an nbformat.
This could not have worked. How was this tested, exactly?
|
Hey @mhucka no stress at all. Appreciate the review and comments as always. As disclosed also in previews PRs, the PR description indeed is formatted with AI, based on a dump, but will make it more concise or just drop the original blurp. Code is hand written and passed through AI for review before pr. I used existing tests with py 3.13, but will check again and come back to you asap. Please do feel free to push to my branch if available. Or suggest and I'll approve. 🙏 |
Co-authored-by: Michael Hucka <mhucka@google.com>
Closes #1274
What was solved based on the 1274 issue
OpenFermion has 8 Jupyter notebook tutorials in
docs/tutorials/, but none of the scripts incheck/ever looked at them. This PR adds a newcheck/nbformatscript that validates all tracked.ipynbfiles using the https://github.com/tensorflow/docs notebook format checker, and wires it into the rest of the CI pipeline.Changes
check/nbformat(new file) which is a bash script that finds all.ipynbfiles tracked by git and runspython -m tensorflow_docs.tools.nbformaton each one. Defaults to check-only mode, pass--fixto reformat notebooks in place,--helpfor usage info. Follows the same structure and conventions as the other scripts incheck/.check/all, addedrun check/nbformatso it runs as part of master check.check/shellcheck, also addedcheck/nbformatto the required scripts list (in sorted order) so shellcheck verifies itself the new script is sound.dev_tools/requirements/deps/pytest.txt, addedtensorflow-docsalongside the existingnbformatdep (explaining decision below).dev_tools/requirements/envs/pytest.env.txt/dev.env.txt, regenerated pinned lockfiles to includetensorflow-docs==2025.2.19.33219and its transitive deps (protobuf,pyyaml)..github/workflows/ci.yaml, added anotebook-checksCI job that triggers when.ipynbfiles change and runscheck/nbformat. Also added toreport-resultsso it's required.Rational behind changes
1. Check-only vs. auto-fix?
The script supports both. By default it's read-only and exits non-zero if any notebook fails, which is what we want in CI. The
--fixflag is there for devs who want to fix their notebooks locally without manually figuring out what's wrong. This matches the pattern found incheck/format-incremental(which also has an--applyflag).2. Should
check/nbformatrun unconditionally incheck/all?I think yes. Unlike Python linting or formatting, notebook format issues aren't tied to which files changed in a given commit, but they're property of the repo as a whole. So it runs unconditionally alongside
check/mypyandcheck/shellcheck, rather than only in the changed files branch.3. Where to put the
tensorflow-docsdependency?tensorflow-docsalready had its own orphaneddeps/tensorflow-docs.txtfile that wasn't referenced by any compiled environment. Rather than creating yet another new file, I added it todeps/pytest.txt, which containednbformatalready (the package for reading notebooks). Both are notebook-related checking tools used outside of normal test runs, so I thought they belong together. This keeps the number of dep files as low as possible and the grouping easy to grasp.Let me know how this looks, or if there's something I missed. <3