10-27-2017 11:51 AM
We have a MapReduce job that uses ParquetOutputFormat to write Parquet files that are subsequently queried by Impala. We would like to add a column that is a complex type:
I'm wondering whether anyone has done this and whether there is example code I could look at.
I tried googling, but was unsuccessful. I know it can be done, because Hive does it. Thanks in advance.
10-27-2017 02:02 PM
Please correct me if I misunderstood your question.
The Impala docs should tell you how to create and query a table with complex types in Impala. Please be aware that Impala can read complex types only from Parquet files, and it cannot write complex types itself, so the files have to be produced by another tool.
10-27-2017 02:33 PM
Yes. That is my question.
We currently use parquet.hadoop.ParquetOutputFormat with parquet.example.data.Group (a somewhat suspicious package name). If another mechanism is better, I'd be happy to hear about it.
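For reference, here is a rough sketch of how a complex column might be declared in the Parquet schema that the Group-based write path consumes. The column name "tags" and the surrounding "record" and "id" names are hypothetical, chosen only for illustration, and an array of strings is assumed here because the actual type is not shown above. The layout follows the standard three-level LIST structure from the Parquet format specification:

```
message record {
  required int64 id;
  optional group tags (LIST) {
    repeated group list {
      optional binary element (UTF8);
    }
  }
}
```

Assuming the Group API, a schema string like this can be parsed with MessageTypeParser.parseMessageType and registered with GroupWriteSupport.setSchema. For each record, addGroup("tags") would be called once, and then addGroup("list").append("element", value) once per array element on the group it returns. The classes live under parquet.* in older releases and under org.apache.parquet.* in newer ones. On the Impala side the column would presumably be declared as ARRAY<STRING>; it is worth checking with a small test file that Impala resolves the list layout as expected.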