gh-144884: Restructure re.sub docs, clarify aspects of repl notation #144891

Open
cben wants to merge 1 commit into python:main from cben:doc-re-sub

cben commented Feb 16, 2026

This proposes a "maximalist" re-structuring of re.sub docs for #144884, putting ALL discussion of how-we-match parameters before what-we-do-with-matches.

👉 re.sub in preview

Moving flags next to pattern made it easy to say that flags is only relevant when pattern is a string.
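
A minimal sketch of that point, using made-up sample text rather than anything from the PR: with a string pattern, `flags` goes to `re.sub` directly, while a compiled pattern already carries its flags, and current CPython raises `ValueError` if non-zero flags are passed alongside it.

```python
import re

text = "Spam, spam and SPAM"  # illustrative input, not from the PR

# String pattern: flags are passed to re.sub directly.
from_string = re.sub(r"spam", "eggs", text, flags=re.IGNORECASE)

# Compiled pattern: flags were fixed at re.compile time.
compiled = re.compile(r"spam", re.IGNORECASE)
from_compiled = compiled.sub("eggs", text)

# Non-zero flags together with a compiled pattern are rejected.
try:
    re.sub(compiled, "eggs", text, flags=re.IGNORECASE)
    rejected = False
except ValueError:
    rejected = True

print(from_string)    # eggs, eggs and eggs
print(from_compiled)  # eggs, eggs and eggs
print(rejected)       # True
```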

Consolidated repl notation from two far-apart paragraphs into one place.

  • Starting from \1 and \g which are the whole purpose of dealing with backslashes!
  • Briefly mention \octal wart, 99 limit, and \g<100> avoiding them.
  • Draw attention to \\ for getting a literal backslash.
  • Clarify that most escapes are supported but \x\u\U\N aren't.
  • Move "Unknown escapes of ASCII letters" after listing all the known ones.
  • Added a note promoting raw string notation for repl too.
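
The notation points listed above can be sketched as follows. The patterns and inputs are invented for illustration; the behaviour shown is what current CPython does, and raw strings are used for `repl` throughout, as the new note recommends.

```python
import re

# \2 and \g<first> copy captured groups into the result.
swapped = re.sub(r"(?P<first>\w+) (\w+)", r"\2 \g<first>", "hello world")

# \\ in the template yields one literal backslash.
backslashed = re.sub(r"/", r"\\", "a/b")

# Common escapes such as \n are converted to the character they name.
with_newline = re.sub(r",", r"\n", "a,b")

# \x is not part of the template notation and is rejected.
try:
    re.sub(r"a", r"\x41", "a")
    x_rejected = False
except re.error:
    x_rejected = True

print(repr(swapped))       # 'world hello'
print(repr(backslashed))   # 'a\\b'
print(repr(with_newline))  # 'a\nb'
print(x_rejected)          # True
```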

Further, I swapped order to explain callback before backslash notation because it's much shorter — but also to promote it. IMHO, people fear it as a "last-resort escape hatch" while it's actually conceptually simpler than backslashes (#128138 is one example). YMMV 😜
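
As a rough illustration of why the callback form can be simpler (the function names and inputs here are hypothetical, not taken from the PR or from #128138): a function `repl` receives the match object and returns the replacement text, which is inserted as-is with no backslash processing.

```python
import re


def double(match):
    """Return the matched number doubled, as text."""
    return str(int(match.group()) * 2)


def to_backslash(match):
    """Return one literal backslash; no template escaping is needed."""
    return "\\"


doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
converted = re.sub(r"/", to_backslash, "usr/local/bin")

print(doubled)    # 6 apples and 20 pears
print(converted)  # usr\local\bin
```

Compare `to_backslash` with the string form, where the same result needs `r"\\"` as `repl`.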


The downside is that backslash notation is covered at the end, after rarely important details like adjacent empty matches...

  • I tried to mitigate that by a better example early on. Pulled out the "beans and spam" example which was out of place anyway (under "if repl is a function" but this repl is a string!), expanded to demonstrate \1 templating, and flags usage with both string & re.compile.

  • A more drastic alternative could be to split sub & subn from "Functions" into a new "Search and replace" section, and explain the notation there at top level, even before the functions, as we do for RE syntax. That could work even better if #135992 (gh-105636: Add re.Pattern.compile_template()) lands.
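
For readers without the preview open, here is a sketch in the spirit of the expanded "beans and spam" example. The first call mirrors the example already in the re docs; the second, with `\1` and `\2` templating on a compiled pattern, is an assumed variant and not necessarily the wording used in the PR.

```python
import re

text = "Baked Beans And Spam"

# String pattern with flags passed to re.sub.
simple = re.sub(r"\sAND\s", " & ", text, flags=re.IGNORECASE)

# Compiled pattern carrying the flag, plus group templating (assumed variant).
pattern = re.compile(r"(\w+)\s+and\s+(\w+)", re.IGNORECASE)
templated = pattern.sub(r"\2 & \1", text)

print(simple)     # Baked Beans & Spam
print(templated)  # Baked Spam & Beans
```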


📚 Documentation preview 📚: https://cpython-previews--144891.org.readthedocs.build/

- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Extended "beans and spam" example to demonstrate both string & re.compile
  flags usage, `\1` templating, and moved it close to start.
- Discuss all how-we-match parameters before what-we-do-with-matches.
  TODO: Is important info close enough to start?

- Explain callback before backslash notation because it's shorter but also
  to promote it. IMHO, people fear it as a "last-resort escape hatch"
  while it's actually *simpler* than backslashes (python#128138 is one example).
  TODO: Will this order make sense after python#135992 ?

- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
  - Added a note promoting raw string notation for `repl` too.
- ``\g<name>`` are replaced by the substring matched by named ``(?P<name>...)``
groups.
- ``\g<number>`` is another way to refer to numbered groups.
``\g<2>0`` inserts group 2 followed by the literal character ``'0'``,

I feel we're using too many different verbs for the same idea: "are replaced by", "[back]refer[ence]", "uses", "inserts", "converted to", "substitutes in"...
Any advice welcome!

  • Technically, describing these as "backreference"s is inexact: backreferences in an RE assert equality while matching, whereas here we copy the captured text into the replacement. And the syntaxes are somewhat different (see "Ways to reference it" table).
    However, I suspect readers do think of them as flip sides of the same idea, and the doc uses "refer" in both senses widely...
    And for the inability of \20 to express the \g<20> vs \g<2>0 distinction 👇, I had trouble phrasing it as well in other ways (perhaps because "reference" doubles as a noun).

  • Dropping "substitutes" would be a shame because that's the one place we show what the function name stands for 😐 Ideally we'd use it in opening sentence, but that sounds clumsy in my head.
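
The `\20` versus `\g<2>0` distinction mentioned above can be checked with a small sketch. The two-group and hundred-group patterns are invented for illustration; the behaviour is that of current CPython.

```python
import re

# \g<2>0 inserts group 2 followed by the literal character '0'.
unambiguous = re.sub(r"(a)(b)", r"\g<2>0", "ab")

# \20 is read as a reference to group 20, which this pattern lacks.
try:
    re.sub(r"(a)(b)", r"\20", "ab")
    ambiguous_rejected = False
except re.error:
    ambiguous_rejected = True

# \g<100> reaches a group beyond the two-digit \number form.
hundred_groups = "(.)" * 100
last_char = re.sub(hundred_groups, r"\g<100>", "a" * 99 + "z")

print(unambiguous)         # b0
print(ambiguous_rejected)  # True
print(last_char)           # z
```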

Labels: awaiting review, docs (Documentation in the Doc dir), skip news
