
How to handle out-of-range timestamps in Impala with Parquet format?

New Contributor

I tried creating a table with the following command:

 

create table foo
stored as parquet
as
select cast('0001-01-01 00:00:00' as timestamp) ts;

If I run this command in Impala, the timestamp becomes NULL, and from then on both Hive and Impala see the value as NULL. If I create the table in Hive instead, the timestamp is preserved and Hive displays it correctly, but Impala raises the following error:

 

Year is out of valid range: 1400..10000

How can I handle this? I need both the Parquet format and the Impala connection.

 

3 REPLIES


Do you really need dates before the year 1400 or after 10000?

Impala has a different supported date range than Hive because of how each engine handles timestamps internally (Impala uses Boost; Hive uses the Java built-ins).
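
You can see the boundary directly, assuming the behavior described above where Impala turns an out-of-range cast into NULL rather than raising an error:

-- Year 1400 is inside Impala's supported range, so the cast succeeds:
select cast('1400-01-01 00:00:00' as timestamp);

-- One second earlier the year drops below the lower bound,
-- and Impala returns NULL instead of a timestamp:
select cast('1399-12-31 23:59:59' as timestamp);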

New Contributor
We have an Avro file, imported with Sqoop, that contains some dates with year 0000 and others with year 9999. Apparently the upper bound is not a problem and is shared between the two systems, but the lower bound is giving us a headache. What is the standard practice for dealing with this?


Since the differences between the two systems come from their implementations, I'd say you have the following options:

 

1. Use a different type, e.g., STRING. When converting from STRING to TIMESTAMP you will run into the same issues, though (see the sketch after this list).

2. Change your ingestion pipeline to enforce a timestamp range that is valid in both systems (see the sketch after this list). This assumes that a date with year 0000 would be considered "garbage" by your application.

3. Live with the fact that a NULL timestamp may mean the original value was out of range.
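
Here is a minimal sketch of options 1 and 2 combined, to be adapted to your schema. The table and column names (events_raw, raw_ts, events_clean) are placeholders, and it assumes the Sqoop import left the value as an ISO-style 'YYYY-MM-DD HH:MM:SS' string:

create table events_clean
stored as parquet
as
select
  raw_ts,  -- option 1: keep the original value as a STRING
  case     -- option 2: only cast values inside the range both engines support
    when cast(substr(raw_ts, 1, 4) as int) between 1400 and 9999
      then cast(raw_ts as timestamp)
    else null  -- option 3: NULL marks an out-of-range original
  end as ts
from events_raw;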

 

May I ask what exactly is causing the headache: the fact that the two systems return different results, or the fact that year 0000 is a meaningful date for your application?