Commit d9c8da9

timsaucer and claude committed
docs(skill): point at spark __all__ instead of enumerating
Enumerating spark functions in the user-facing skill duplicates the __all__ list in python/datafusion/functions/spark.py and will drift the moment a new function lands or is renamed. Replace the per-function listing with a category summary and a discovery snippet that queries the actual __all__ at runtime, which is the authoritative source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 310df18 commit d9c8da9

1 file changed

Lines changed: 12 additions & 19 deletions

skills/datafusion_python/SKILL.md

@@ -782,25 +782,18 @@ ctx.enable_spark_functions() # makes Spark UDFs visible to SQL
 ctx.sql("SELECT sha2('hello', 256)").show()
 ```
 
-Coverage spans aggregate (`avg`, `try_sum`, `collect_list`, `collect_set`),
-array (`array`, `array_contains`, `array_repeat`, `shuffle`, `slice`,
-`size`), bitmap, bitwise (`shiftleft`, `shiftright`, `shiftrightunsigned`,
-`bit_get`, `bit_count`, `bitwise_not`), datetime (`add_months`,
-`date_add`, `date_sub`, `date_diff`, `date_trunc`, `time_trunc`, `trunc`,
-`next_day`, `from_utc_timestamp`, `to_utc_timestamp`, `unix_date`,
-`unix_micros`/`millis`/`seconds`, `make_interval`, `make_dt_interval`),
-hash (`crc32`, `sha1`, `sha2`, `xxhash64`), JSON (`json_tuple`),
-map (`map_from_arrays`, `map_from_entries`, `str_to_map`), math
-(`abs`, `ceil`, `floor`, `round`, `expm1`, `factorial`, `hex`,
-`modulus`/`pmod`, `rint`, `unhex`, `width_bucket`, `csc`/`sec`,
-`negative`, `bin`), string (`ascii`, `base64`/`unbase64`, `char`,
-`concat`, `elt`, `like`/`ilike`, `length`, `luhn_check`, `format_string`,
-`space`, `substring`, `soundex`, `is_valid_utf8`/`make_valid_utf8`),
-URL (`parse_url`/`try_parse_url`, `url_decode`/`url_encode`,
-`try_url_decode`), and conditional (`if_`, `spark_cast`).
-
-The full list is in the API reference; see
-`python/datafusion/functions/spark.py`.
+Coverage spans aggregate, array, bitmap, bitwise, datetime, hash, JSON,
+map, math, string, URL, and conditional categories. The authoritative
+list of what is currently exposed is the `__all__` in
+`python/datafusion/functions/spark.py`:
+
+```bash
+python -c "from datafusion.functions import spark; print(sorted(spark.__all__))"
+```
+
+When you need to know whether a specific pyspark function is available,
+check `__all__` rather than this skill — the list there moves with the
+code; any enumeration here would drift.
 
 **Semantic divergences vs the default namespace.** Functions that exist in
 both `functions` and `functions.spark` may behave differently:
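
The added skill text tells readers to check `__all__` for a specific function rather than rely on an enumeration. A minimal sketch of that check in Python, rather than the one-line shell snippet, might look like the following. The helper names `exported_names` and `is_exported` are hypothetical and not part of datafusion. The demo call uses the standard-library `json` module so the sketch runs without datafusion installed.

```python
import importlib


def exported_names(module_name: str) -> list[str]:
    """Return the sorted names a module declares in its __all__."""
    module = importlib.import_module(module_name)
    # A module without __all__ yields an empty list instead of raising.
    return sorted(getattr(module, "__all__", ()))


def is_exported(module_name: str, name: str) -> bool:
    """Report whether `name` appears in the module's __all__."""
    return name in exported_names(module_name)


if __name__ == "__main__":
    # Demonstrated on a stdlib module; json declares dumps in its __all__.
    print(is_exported("json", "dumps"))
    # With datafusion installed, the equivalent check would be
    # (assumption: sha2 is exported by the spark namespace):
    #   is_exported("datafusion.functions.spark", "sha2")
```

Because the lookup reads `__all__` at call time, the answer follows whatever version of the package is installed, which is the property the commit message relies on.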
