- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
How to handle out of range timestamps for impala with parquet format?
- Labels:
-
Apache Hive
-
Apache Impala
Created on ‎07-23-2015 02:35 AM - edited ‎09-16-2022 02:35 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
I tried creating a table with the following command:
create table foo stored as parquet as select cast('0001-01-01 00:00:00' as timestamp) ts;
If I launch such command in Impala, the timestamp becomes a NULL value and from that moment on both Hive and Impala see the content as NULL. However, if I create the table using Hive the timestamp is preserved and I can see it correctly within Hive, but Impala shows the following error:
Year is out of valid range: 1400..10000
How can I handle this issue? I need both parquet format and the Impala connection.
Created ‎07-24-2015 12:15 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Do you really need dates before the year 1400 or after 10000?
Impala has a different supported date range than Hive due to how timestamps are handled internally (Impala uses Boost, Hive uses the Java built-ins)
Created ‎07-24-2015 09:31 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Created ‎07-29-2015 11:47 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Since the differences in the two systems are due to their implementation, I'd say you have the following options:
1. Use a differnet type, e.g., STRING. When concerting from STRING to TIMESTAMP you will encounter the same issues though.
2. Change your ingestion pipeline to enforce a timestamp range that is valid in both systems. This assumes that your a date with year 0000 would be considered "garbage" by your application.
3. Live with the fact that a NULL timestamp could mean it is out of range.
May I ask what exactly is causing the headache? The fact that both systems return different results or the fact that for your application the year 0000 is a meaningful date?
