Created on 07-23-2015 02:35 AM - edited 09-16-2022 02:35 AM
I tried creating a table with the following command:
create table foo stored as parquet as select cast('0001-01-01 00:00:00' as timestamp) ts;
If I launch such command in Impala, the timestamp becomes a NULL value and from that moment on both Hive and Impala see the content as NULL. However, if I create the table using Hive the timestamp is preserved and I can see it correctly within Hive, but Impala shows the following error:
Year is out of valid range: 1400..10000
How can I handle this issue? I need both parquet format and the Impala connection.
Created 07-24-2015 12:15 AM
Do you really need dates before the year 1400 or after 10000?
Impala has a different supported date range than Hive due to how timestamps are handled internally (Impala uses Boost, Hive uses the Java built-ins)
Created 07-24-2015 09:31 AM
Created 07-29-2015 11:47 AM
Since the differences in the two systems are due to their implementation, I'd say you have the following options:
1. Use a differnet type, e.g., STRING. When concerting from STRING to TIMESTAMP you will encounter the same issues though.
2. Change your ingestion pipeline to enforce a timestamp range that is valid in both systems. This assumes that your a date with year 0000 would be considered "garbage" by your application.
3. Live with the fact that a NULL timestamp could mean it is out of range.
May I ask what exactly is causing the headache? The fact that both systems return different results or the fact that for your application the year 0000 is a meaningful date?