GH-49351: [C++] Check TZDIR environment variable in vendored date library #49353
canassa wants to merge 2 commits into apache:main
…te library

The vendored Howard Hinnant date library hardcodes /usr/share/zoneinfo as the timezone database path. This adds a TZDIR check in discover_tz_dir() before falling back to platform-specific defaults, consistent with common Unix convention. This fixes timezone operations on non-FHS Linux distributions (e.g. NixOS) where zoneinfo lives under a non-standard path.
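The lookup order described above can be sketched roughly as follows. This is an illustrative approximation, not the actual patch: the names `is_directory` and `discover_tz_dir_sketch` are hypothetical, only one fallback path is shown, and returning an empty string on failure is an assumption (the vendored library may handle that case differently).

```cpp
#include <cstdlib>     // std::getenv
#include <string>
#include <sys/stat.h>  // stat(), S_ISDIR (POSIX)

// Hypothetical helper: true if `path` exists and is a directory.
static bool is_directory(const std::string& path) {
    struct stat sb;
    return stat(path.c_str(), &sb) == 0 && S_ISDIR(sb.st_mode);
}

// Hypothetical stand-in for the vendored discover_tz_dir().
std::string discover_tz_dir_sketch() {
    // 1. Prefer TZDIR when it is set, non-empty, and names a directory.
    if (const char* tzdir = std::getenv("TZDIR")) {
        if (*tzdir != '\0' && is_directory(tzdir)) {
            return tzdir;
        }
    }
    // 2. Otherwise fall back to the hardcoded default path.
    const std::string fallback = "/usr/share/zoneinfo";
    if (is_directory(fallback)) {
        return fallback;
    }
    // Assumption: signal "not found" with an empty string in this sketch.
    return "";
}
```

Validating TZDIR with `stat()` before using it means a stale or mistyped value falls through to the default instead of breaking every timezone lookup.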
Thanks for opening this @canassa! Is this an issue you're personally seeing on NixOS?
Yes! I noticed this when working on a Python project that uses PyArrow on my NixOS desktop. I could work around it by creating a symlink, but I figured it would be nice if Arrow supported the environment variable.
Great to hear!
Sure, I will take a look!
Thanks, I pushed the linting fixes. I also investigated other approaches, but the vendored library change seems to be unavoidable. The
Rationale for This Change

The vendored Howard Hinnant date library hardcodes /usr/share/zoneinfo as the timezone database path in discover_tz_dir(). It does not check the TZDIR environment variable, which is the conventional mechanism for overriding this path on Unix-like systems.

This causes timezone operations to fail on non-FHS Linux distributions such as NixOS, where zoneinfo resides under a non-standard path like /nix/store/.../share/zoneinfo.

The upstream library also lacks TZDIR support.

What Changes Are Included in This PR?

- cpp/src/arrow/vendored/datetime/tz.cpp: Check the TZDIR environment variable in discover_tz_dir() before falling back to platform-specific hardcoded paths. Uses stat() and S_ISDIR() for validation, matching the existing pattern in the function.
- cpp/src/arrow/vendored/datetime/README.md: Document the patch.
- cpp/src/arrow/public_api_test.cc: Add a non-Windows test that sets TZDIR and verifies timezone resolution succeeds through Arrow's compute API.
- python/pyarrow/conftest.py: Respect TZDIR in the emscripten timezone_data test marker.

Are These Changes Tested?

Yes. A new Misc.TZDIREnvironmentVariable test sets TZDIR to a valid zoneinfo directory and casts a UTC timestamp to America/New_York, verifying the code path works end to end.

Are There Any User-Facing Changes?

Arrow now respects the TZDIR environment variable on non-Windows platforms, enabling timezone operations on systems without /usr/share/zoneinfo.

Fixes #49351