Support Questions

lucaluca · ‎07-23-2015

I tried creating a table with the following command:

create table foo
stored as parquet
as
select cast('0001-01-01 00:00:00' as timestamp) ts;

If I launch such command in Impala, the timestamp becomes a NULL value and from that moment on both Hive and Impala see the content as NULL. However, if I create the table using Hive the timestamp is preserved and I can see it correctly within Hive, but Impala shows the following error:

Year is out of valid range: 1400..10000

How can I handle this issue? I need both parquet format and the Impala connection.

alex.behm · ‎07-24-2015

Do you really need dates before the year 1400 or after 10000?

Impala has a different supported date range than Hive due to how timestamps are handled internally (Impala uses Boost, Hive uses the Java built-ins)

lucaluca · ‎07-24-2015

We have an AVRO file that we imported with sqoop that contains some dates with "0000" year and some other dates with "9999". Apparently the upper bound is not a problem and is common between the two systems, but the lower bound is giving us a headache. What is the standard practice to deal with this?

alex.behm · ‎07-29-2015

Since the differences in the two systems are due to their implementation, I'd say you have the following options:

1. Use a differnet type, e.g., STRING. When concerting from STRING to TIMESTAMP you will encounter the same issues though.

2. Change your ingestion pipeline to enforce a timestamp range that is valid in both systems. This assumes that your a date with year 0000 would be considered "garbage" by your application.

3. Live with the fact that a NULL timestamp could mean it is out of range.

May I ask what exactly is causing the headache? The fact that both systems return different results or the fact that for your application the year 0000 is a meaningful date?

Cloudera Community

Support Questions

How to handle out of range timestamps for impala with parquet format?