-
Notifications
You must be signed in to change notification settings - Fork 505
ORC-1481: [C++] Better error message when TZDB is unavailable #1587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
c++/src/Timezone.cc
Outdated
| if (!std::filesystem::exists(std::filesystem::path(filename))) { | ||
| std::stringstream ss; | ||
| ss << "Time zone file " << filename << " does not exist." | ||
| << " Please install IANA time zone database and set TZDIR env properly" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a suggestion: Can rephrase as
"Please install IANA time zone database and set TZDIR env property to point to /usr/share/zoneinfo" or something like that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this: Please install IANA time zone database and set TZDIR env if it is not installed at /usr/share/zoneinfo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit:
... set TZDIR env to point to correct location, if not installed....
I am just wearing a user's hat and trying to see if the message helps me resolve things. Please note that English is not my 1st language
@wgtmac Thanks :)
c++/src/Timezone.cc
Outdated
| std::stringstream ss; | ||
| ss << "Time zone file " << filename << " does not exist." | ||
| << " Please install IANA time zone database and set TZDIR env" | ||
| << " if it is not installed at /usr/share/zoneinfo"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default location /usr/share/zoneinfo does not apply to Windows. I think the error message works for Windows, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's not applied to Windows OS because the path style is invalid.
This PR might lead a user to try to check the location of invalid path information by mapping from /usr/share/zoneinfo to C:\\usr\share\zoneinfo. Although the user ends up to conclude that the condition statement (if it is not installed at /usr/share/zoneinfo) is false, this message might confuse him by suggesting him to install it into C:\\usr\share\zoneinfo.
When the error message happens, I'm wondering if we don't need the message (if it is not installed at /usr/share/zoneinfo) at all.
The first two lines look sufficient to me. WDYT?
ss << "Time zone file " << filename << " does not exist."
<< " Please install IANA time zone database and set TZDIR env.";There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you, let me change it.
dongjoon-hyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM (except one comment, https://github.com/apache/orc/pull/1587/files#r1298940081).
| if (itr != timezoneCache.end()) { | ||
| return *(itr->second).get(); | ||
| } | ||
| if (!std::filesystem::exists(std::filesystem::path(filename))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1, LGTM! @wgtmac However, the logic of determining whether a file exists can be integrated into the readLocalFile function by throwing a FileNotFound exception. This exception can be caught in the getTimezoneByFilename function and further thrown as a Timezone exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I just want to check proactively before reading it so the error message is clear. It may fail reading due to other reasons.
|
I've merged it. Thanks @mystic-lama @dongjoon-hyun @coderex2522! |
|
Thank you so much, @wgtmac and all. |
|
Sorry for writing in a closed thread but I think the question fits here. I downloaded the database from https://data.iana.org/time-zones/releases/tzdata2023c.tar.gz unpacked it, set the env variable TZDIR to the unpacked directory and still can't get ORC to work as it looks for a file called "UTC" but files in the database are mostly continent names like "africa", "asia", etc. |
You still need to build the package, while that has data is is still a source distribution. For example, see how conda-forge does it here: https://github.com/conda-forge/tzdata-feedstock/blob/main/recipe/build.sh Once you've built it, you should see the expected files that can be used for |
### What changes were proposed in this pull request? Check the availability of TZDB and throw helpful error message before it crashes. ### Why are the changes needed? When local IANA TZDB is unavailable or TZDIR env is not properly set, getTimezoneByName() simply fails without much helpful guidance on what to do. ### How was this patch tested? Added a test case TestTimezone.testMissingTZDB to make sure error message is as expected. Closes apache#1587 from wgtmac/ORC-1481. Authored-by: Gang Wu <ustcwg@gmail.com> Signed-off-by: Gang Wu <ustcwg@gmail.com>
### What changes were proposed in this pull request? Find tzdb without having to set `TZDIR` when in a conda-environment (where `tzdata` [has](https://conda-metadata-app.streamlit.app/?q=conda-forge%2Fnoarch%2Ftzdata-2024a-h0c530f3_0.conda) a uniform location of `$CONDA_PREFIX/share/zoneinfo` across all platforms). ### Why are the changes needed? This is due to issues in arrow (see apache/arrow#36026) that cannot really be fixed there, as it assumes that orc >=2.0 knows how to find the tzdb. Having to set `TZDIR` in all user environments is an intrusive change that should be avoided, and since the cost here is checking a single environment variable, it's hopefully not too onerous for consideration. ### How was this patch tested? CI here ### Was this patch authored or co-authored using generative AI tooling? No CC wgtmac See also: #1587 Closes #1882 from h-vetinari/tzdb. Authored-by: H. Vetinari <h.vetinari@gmx.com> Signed-off-by: Gang Wu <ustcwg@gmail.com>
### What changes were proposed in this pull request? Find tzdb without having to set `TZDIR` when in a conda-environment (where `tzdata` [has](https://conda-metadata-app.streamlit.app/?q=conda-forge%2Fnoarch%2Ftzdata-2024a-h0c530f3_0.conda) a uniform location of `$CONDA_PREFIX/share/zoneinfo` across all platforms). ### Why are the changes needed? This is due to issues in arrow (see apache/arrow#36026) that cannot really be fixed there, as it assumes that orc >=2.0 knows how to find the tzdb. Having to set `TZDIR` in all user environments is an intrusive change that should be avoided, and since the cost here is checking a single environment variable, it's hopefully not too onerous for consideration. ### How was this patch tested? CI here ### Was this patch authored or co-authored using generative AI tooling? No CC wgtmac See also: #1587 Closes #1882 from h-vetinari/tzdb. Authored-by: H. Vetinari <h.vetinari@gmx.com> Signed-off-by: Gang Wu <ustcwg@gmail.com> (cherry picked from commit e89ca33) Signed-off-by: Gang Wu <ustcwg@gmail.com>
What changes were proposed in this pull request?
Check the availability of TZDB and throw helpful error message before it crashes.
Why are the changes needed?
When local IANA TZDB is unavailable or TZDIR env is not properly set, getTimezoneByName() simply fails without much helpful guidance on what to do.
How was this patch tested?
Added a test case TestTimezone.testMissingTZDB to make sure error message is as expected.