Member

@wgtmac wgtmac commented Aug 16, 2023

What changes were proposed in this pull request?

Check the availability of the TZDB and throw a helpful error message before it crashes.

Why are the changes needed?

When the local IANA TZDB is unavailable or the TZDIR environment variable is not set properly, getTimezoneByName() simply fails without helpful guidance on what to do.

How was this patch tested?

Added a test case, TestTimezone.testMissingTZDB, to make sure the error message is as expected.

if (!std::filesystem::exists(std::filesystem::path(filename))) {
  std::stringstream ss;
  ss << "Time zone file " << filename << " does not exist."
     << " Please install IANA time zone database and set TZDIR env properly"
Contributor

Just a suggestion: Can rephrase as

"Please install IANA time zone database and set TZDIR env property to point to /usr/share/zoneinfo" or something like that

Member Author

What about this: Please install IANA time zone database and set TZDIR env if it is not installed at /usr/share/zoneinfo

Contributor

minor nit:
... set TZDIR env to point to correct location, if not installed....

I am just wearing a user's hat and trying to see if the message helps me resolve things. Please note that English is not my 1st language
@wgtmac Thanks :)

std::stringstream ss;
ss << "Time zone file " << filename << " does not exist."
   << " Please install IANA time zone database and set TZDIR env"
   << " if it is not installed at /usr/share/zoneinfo";
Member

@dongjoon-hyun dongjoon-hyun Aug 18, 2023

Thank you for investigating and making a PR, @wgtmac .

Just a question: The original issue report (#1577) also mentioned Windows. What happens on Windows? Is this an actionable message on Windows?

Specifically on Windows, the DEFAULT_TZDIR location is an invalid path,

Member Author

The default location /usr/share/zoneinfo does not apply to Windows. I think the error message works for Windows, too.

Member

@dongjoon-hyun dongjoon-hyun Aug 19, 2023

Yes, it does not apply to Windows because the path style is invalid there.

This PR might lead a user to look for the invalid path by mapping /usr/share/zoneinfo to C:\\usr\share\zoneinfo. Although the user would eventually conclude that the condition (if it is not installed at /usr/share/zoneinfo) is false, the message might confuse them by suggesting that they install the database into C:\\usr\share\zoneinfo.

I'm wondering whether we need the clause (if it is not installed at /usr/share/zoneinfo) in the error message at all.

The first two lines look sufficient to me. WDYT?

ss << "Time zone file " << filename << " does not exist."
   << " Please install IANA time zone database and set TZDIR env.";

Member Author

I agree with you, let me change it.

@dongjoon-hyun dongjoon-hyun added this to the 2.0.0 milestone Aug 19, 2023
Member

@dongjoon-hyun dongjoon-hyun left a comment

if (itr != timezoneCache.end()) {
  return *(itr->second).get();
}
if (!std::filesystem::exists(std::filesystem::path(filename))) {
Contributor

@coderex2522 coderex2522 Aug 22, 2023

+1, LGTM! @wgtmac However, the logic of determining whether a file exists can be integrated into the readLocalFile function by throwing a FileNotFound exception. This exception can be caught in the getTimezoneByFilename function and further thrown as a Timezone exception.
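For readers following the suggestion above, here is a minimal sketch of that exception-translation pattern. It is not the ORC implementation: the exception types and the functions `readLocalFileSketch` and `loadTimezoneFileSketch` are hypothetical stand-ins for `readLocalFile` and `getTimezoneByFilename`, and the sketch treats any failure to open the file as "not found".

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical exception types for illustration; not ORC's actual classes.
struct FileNotFound : std::runtime_error {
  using std::runtime_error::runtime_error;
};
struct TimezoneError : std::runtime_error {
  using std::runtime_error::runtime_error;
};

// Hypothetical low-level reader: signals a file that cannot be opened
// with its own exception type.
std::vector<char> readLocalFileSketch(const std::string& filename) {
  std::ifstream in(filename, std::ios::binary);
  if (!in) {
    throw FileNotFound("Cannot open " + filename);
  }
  return std::vector<char>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

// Hypothetical caller: catches the low-level failure and rethrows it as a
// time zone error that tells the user what to do.
std::vector<char> loadTimezoneFileSketch(const std::string& filename) {
  try {
    return readLocalFileSketch(filename);
  } catch (const FileNotFound&) {
    throw TimezoneError("Time zone file " + filename + " does not exist." +
                        " Please install IANA time zone database and set TZDIR env.");
  }
}
```

The trade-off in this design is that the reader owns the existence check, so every caller gets it, but the actionable wording has to be added one level up where the time zone context is known.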

Member Author

Yes, I just want to check proactively before reading the file so that the error message is clear. Reading may also fail for other reasons.
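A rough sketch of the proactive check described here, assuming C++17 and using `std::runtime_error` in place of ORC's own exception type. The helper name `checkTimezoneFile` and the wording of the second message are hypothetical; the point is only that a missing file and an unreadable file produce different errors.

```cpp
#include <filesystem>
#include <fstream>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: report a missing file separately from other read failures.
void checkTimezoneFile(const std::string& filename) {
  // Step 1: existence check, with guidance on how to fix the setup.
  if (!std::filesystem::exists(std::filesystem::path(filename))) {
    std::stringstream ss;
    ss << "Time zone file " << filename << " does not exist."
       << " Please install IANA time zone database and set TZDIR env.";
    throw std::runtime_error(ss.str());
  }
  // Step 2: the file exists but may still be unreadable (for example, permissions).
  std::ifstream in(filename, std::ios::binary);
  if (!in) {
    throw std::runtime_error("Failed to read time zone file " + filename);
  }
}
```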

@wgtmac wgtmac closed this in 873d00c Aug 23, 2023
Member Author

wgtmac commented Aug 23, 2023

I've merged it. Thanks @mystic-lama @dongjoon-hyun @coderex2522!

@dongjoon-hyun
Member

Thank you so much, @wgtmac and all.


rafal-c commented Nov 9, 2023

Sorry for writing in a closed thread but I think the question fits here. I downloaded the database from https://data.iana.org/time-zones/releases/tzdata2023c.tar.gz unpacked it, set the env variable TZDIR to the unpacked directory and still can't get ORC to work as it looks for a file called "UTC" but files in the database are mostly continent names like "africa", "asia", etc.
What am I missing?


scw commented Nov 10, 2023

Sorry for writing in a closed thread but I think the question fits here. I downloaded the database from https://data.iana.org/time-zones/releases/tzdata2023c.tar.gz unpacked it, set the env variable TZDIR to the unpacked directory and still can't get ORC to work as it looks for a file called "UTC" but files in the database are mostly continent names like "africa", "asia", etc. What am I missing?

You still need to build the package; although that archive contains the data, it is still a source distribution. For example, see how conda-forge does it here: https://github.com/conda-forge/tzdata-feedstock/blob/main/recipe/build.sh

Once you've built it, you should see the expected files that can be used for TZDIR.
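One way to tell a built database from the unpacked source tarball is to look at the file header: compiled zone files use the TZif format and start with the four bytes "TZif", while source files such as "africa" are plain text. The sketch below assumes C++17; the helper names `looksLikeCompiledZone` and `tzdirHasZone` are made up for illustration and are not part of ORC.

```cpp
#include <cstring>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Returns true if `path` is a regular file that starts with the "TZif" magic.
bool looksLikeCompiledZone(const std::filesystem::path& path) {
  std::error_code ec;
  if (!std::filesystem::is_regular_file(path, ec)) {
    return false;  // missing, a directory, or not accessible
  }
  std::ifstream in(path, std::ios::binary);
  char magic[4] = {0, 0, 0, 0};
  in.read(magic, 4);
  return in.gcount() == 4 && std::memcmp(magic, "TZif", 4) == 0;
}

// Hypothetical helper: does `tzdir` contain a compiled file for `zone` (e.g. "UTC")?
bool tzdirHasZone(const std::filesystem::path& tzdir, const std::string& zone) {
  return looksLikeCompiledZone(tzdir / zone);
}
```

Calling `tzdirHasZone(dir, "UTC")` on the directory that TZDIR points to returns false for an unbuilt source tree, since that tree has no compiled "UTC" file.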

cxzl25 pushed a commit to cxzl25/orc that referenced this pull request Jan 11, 2024
### What changes were proposed in this pull request?
Check the availability of TZDB and throw helpful error message before it crashes.

### Why are the changes needed?
When local IANA TZDB is unavailable or TZDIR env is not properly set, getTimezoneByName() simply fails without much helpful guidance on what to do.

### How was this patch tested?
Added a test case TestTimezone.testMissingTZDB to make sure error message is as expected.

Closes apache#1587 from wgtmac/ORC-1481.

Authored-by: Gang Wu <ustcwg@gmail.com>
Signed-off-by: Gang Wu <ustcwg@gmail.com>
@wgtmac wgtmac deleted the ORC-1481 branch February 27, 2024 02:28
wgtmac pushed a commit that referenced this pull request Apr 12, 2024
### What changes were proposed in this pull request?

Find tzdb without having to set `TZDIR` when in a conda-environment (where `tzdata` [has](https://conda-metadata-app.streamlit.app/?q=conda-forge%2Fnoarch%2Ftzdata-2024a-h0c530f3_0.conda) a uniform location of `$CONDA_PREFIX/share/zoneinfo` across all platforms).

### Why are the changes needed?

This is due to issues in arrow (see apache/arrow#36026) that cannot really be fixed there, as it assumes that orc >=2.0 knows how to find the tzdb. Having to set `TZDIR` in all user environments is an intrusive change that should be avoided, and since the cost here is checking a single environment variable, it's hopefully not too onerous for consideration.

### How was this patch tested?

CI here

### Was this patch authored or co-authored using generative AI tooling?

No

CC wgtmac

See also: #1587

Closes #1882 from h-vetinari/tzdb.

Authored-by: H. Vetinari <h.vetinari@gmx.com>
Signed-off-by: Gang Wu <ustcwg@gmail.com>
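The lookup described in this commit message could be sketched roughly as follows. This is not the merged code: the function name `resolveTzdirSketch` is hypothetical, and the precedence (TZDIR first, then `$CONDA_PREFIX/share/zoneinfo`, then /usr/share/zoneinfo) is an assumption based on the description above.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical resolver; the precedence order is an assumption, not ORC's code.
std::string resolveTzdirSketch() {
  // 1. An explicit TZDIR setting wins.
  if (const char* tzdir = std::getenv("TZDIR")) {
    if (*tzdir != '\0') {
      return tzdir;
    }
  }
  // 2. Inside a conda environment, tzdata lives under $CONDA_PREFIX/share/zoneinfo.
  if (const char* conda = std::getenv("CONDA_PREFIX")) {
    if (*conda != '\0') {
      return std::string(conda) + "/share/zoneinfo";
    }
  }
  // 3. Fall back to the conventional Unix location.
  return "/usr/share/zoneinfo";
}
```

The extra cost over the previous behavior is a single environment variable read, which matches the motivation given in the commit message.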
wgtmac pushed a commit that referenced this pull request Apr 12, 2024
(cherry picked from commit e89ca33)
Signed-off-by: Gang Wu <ustcwg@gmail.com>