gh-144884: Restructure re.sub docs, clarify aspects of repl notation #144891
cben wants to merge 1 commit into python:main
- `flags` are only relevant when `pattern` is a string (followup to python#119960).
- Extended "beans and spam" example to demonstrate both string & re.compile flags usage, `\1` templating, and moved it close to start.
- Discuss all how-we-match parameters before what-we-do-with-matches. TODO: Is important info close enough to start?
- Explain callback before backslash notation because it's shorter but also to promote it. IMHO, people fear it as a "last-resort escape hatch" while it's actually *simpler* than backslashes (python#128138 is one example). TODO: Will this order make sense after python#135992 ?
- Consolidated `repl` notation from two far-away paragraphs to one place.
  - Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
  - Briefly mention `\octal` wart, 99 limit and `\g<100>` avoiding them.
  - Draw attention to `\\` for getting a literal backslash.
  - Clarify that *most* escapes are supported but `\x\u\U\N` aren't.
  - Move "Unknown escapes of ASCII letters" *after* listing all the known ones.
  - Added a note promoting raw string notation for `repl` too.
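To make the escape rules in the list above concrete, here is a minimal sketch of how `repl` strings behave. It is an illustration only, not text from the PR; the sample strings and variable names are made up.

```python
import re

# With a raw-string repl, "\\" in the template yields one literal backslash.
windows_path = re.sub(r"/", r"\\", "usr/local/bin")
print(windows_path)  # usr\local\bin

# Common escapes such as \n are processed inside repl.
two_lines = re.sub(r",", r"\n", "a,b")

# \x (like \u, \U and \N) is not a supported repl escape and raises re.error.
try:
    re.sub(r"a", r"\x41", "a")
    hex_escape_error = None
except re.error as exc:
    hex_escape_error = exc
print(hex_escape_error)
```

Writing `repl` as a raw string keeps the backslashes visible to `re.sub` instead of having Python's own string-literal parsing consume them first.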
- ``\g<name>`` are replaced by the substring matched by named ``(?P<name>...)``
  groups.
- ``\g<number>`` is another way to refer to numbered groups.
  ``\g<2>0`` inserts group 2 followed by the literal character ``'0'``,
I feel we're using too many different verbs for the same idea: "are replaced by", "[back]refer[ence]", "uses", "inserts", "converted to", "substitutes in"...
Any advice welcome!
- Technically, describing these as "backreference"s is inexact: backreferences in RE assert equality while matching, whereas here we copy the captured text into the replacement. And the syntaxes are somewhat different (see "Ways to reference it" table).
  However, I suspect readers do think of them as flip sides of the same idea, and the doc uses "refer" in both senses widely...
  And for the inability of `\20` to express the `\g<20>` vs `\g<2>0` distinction 👇, I had trouble phrasing it in other ways as well (perhaps because "reference" doubles as a noun).
- Dropping "substitutes" would be a shame because that's the one place we show what the function name stands for 😐 Ideally we'd use it in the opening sentence, but that sounds clumsy in my head.
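For readers following the terminology question, a small sketch of the two senses of "refer" and of the `\20` ambiguity may help. The patterns and inputs are illustrative assumptions, not taken from the docs diff.

```python
import re

# In a pattern, \1 is a backreference: it matches only if the same text repeats.
doubled = re.search(r"(\w)\1", "hello")
print(doubled.group())  # ll

# In repl, \1 copies the captured text into the replacement.
copied = re.sub(r"(\w)", r"\1\1", "abc")
print(copied)  # aabbcc

two_groups = r"(\w+)-(\w+)"

# \g<2>0 means group 2 followed by a literal "0".
group2_then_zero = re.sub(two_groups, r"\g<2>0", "ab-cd")
print(group2_then_zero)  # cd0

# \20 is read as a reference to group 20, which this pattern does not have.
try:
    re.sub(two_groups, r"\20", "ab-cd")
    group20_error = None
except re.error as exc:
    group20_error = exc
print(group20_error)
```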
This proposes a "maximalist" re-structuring of re.sub docs for #144884, putting ALL discussion of how-we-match parameters before what-we-do-with-matches.
👉 re.sub in preview
Moving flags with pattern made it easy to say it's only relevant for string patterns.
Consolidated repl notation from two far-away paragraphs to one place.
- Starting from `\1` and `\g` which are the whole purpose of dealing with backslashes!
- Briefly mention `\octal` wart, 99 limit, and `\g<100>` avoiding them.
- Draw attention to `\\` for getting a literal backslash.
- Clarify that most escapes are supported but `\x\u\U\N` aren't.
- Added a note promoting raw string notation for `repl` too.

Further, I swapped order to explain callback before backslash notation because it's much shorter, but also to promote it. IMHO, people fear it as a "last-resort escape hatch" while it's actually conceptually simpler than backslashes (#128138 is one example). YMMV 😜
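As a rough illustration of why the callback form can be the simpler one, here is a sketch that does the same swap both ways. The `swap` helper and the sample input are hypothetical.

```python
import re

pattern = r"(\w+)-(\w+)"

def swap(match):
    # The callback receives a Match object and returns the replacement text.
    return f"{match.group(2)}-{match.group(1)}"

via_callback = re.sub(pattern, swap, "ab-cd ef-gh")
via_template = re.sub(pattern, r"\2-\1", "ab-cd ef-gh")
print(via_callback, "|", via_template)  # cd-ab gh-ef | cd-ab gh-ef

# A callback's return value is used as-is, with no backslash processing,
# so a single backslash character is simply returned as a string.
literal_backslash = re.sub(r"/", lambda match: "\\", "a/b")
print(literal_backslash)  # a\b
```

With the callback there is no second layer of escaping to reason about: whatever string the function returns is what lands in the result.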
The downside is that backslash notation is covered at the end, after rarely important details like adjacent empty matches...
I tried to mitigate that by a better example early on. Pulled out the "beans and spam" example which was out of place anyway (under "if repl is a function" but this repl is a string!), expanded to demonstrate `\1` templating, and `flags` usage with both string & `re.compile`.

A more drastic alternative could be to split sub & subn from "Functions" into a new "Search and replace" section, and explain the notation there at top level, even before the functions, like we do for RE syntax. That could work even better if gh-105636: Add re.Pattern.compile_template() #135992 lands.
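For reference, a sketch of what an expanded "beans and spam" example could look like. This is my own guess at the shape, assuming the existing `'Baked Beans And Spam'` doc example as the starting point; the exact wording in the PR may differ.

```python
import re

text = "Baked Beans And Spam"

# String pattern: flags are passed to re.sub itself.
with_flags = re.sub(r"\sAND\s", " & ", text, flags=re.IGNORECASE)
print(with_flags)  # Baked Beans & Spam

# Compiled pattern: flags were already given to re.compile.
and_pattern = re.compile(r"\sAND\s", re.IGNORECASE)
with_compiled = and_pattern.sub(" & ", text)
print(with_compiled)  # Baked Beans & Spam

# \1 and \2 templating: swap the words on either side of "And".
templated = re.sub(r"(\w+)\sAND\s(\w+)", r"\2 & \1", text, flags=re.IGNORECASE)
print(templated)  # Baked Spam & Beans
```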
📚 Documentation preview 📚: https://cpython-previews--144891.org.readthedocs.build/