We have a MapReduce job that uses ParquetOutputFormat to write Parquet files that are subsequently queried from Impala. We would like to add a column that is a complex type:
I'm wondering if anyone has done that and if there is some example code that I can see.
I tried googling, but was unsuccessful. I know it can be done, because Hive will do it. Thanks in advance.
Please correct me if I misunderstood your question.
The Impala docs should tell you how to create and query a table with complex types in Impala. Please be aware that Impala can read complex types only from Parquet files, and it cannot write complex types at all.
Or is your question on how to write a MapReduce job to produce a complex type using the ParquetOutputFormat?
Yes. That is my question.
We currently use parquet.hadoop.ParquetOutputFormat with parquet.example.data.Group (a somewhat suspicious package name). If another mechanism is better, I'd be happy to hear about it.
Sorry, I don't know how to do that with MR.
Hopefully somebody more knowledgeable can chime in and help you.
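For anyone landing on this thread, here is a rough starting point rather than a confirmed answer. With the Group-based example classes, a complex column is mostly a matter of the schema text, so the sketch below only assembles that text for an array-of-strings column using the three-level LIST layout from the Parquet format spec. The class ComplexSchemaSketch, its method names, and the column names "id" and "tags" are hypothetical and not part of Parquet; the code uses only the Java standard library and has not been checked against any particular Impala version.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Parquet): it only builds schema text.
final class ComplexSchemaSketch {

    // Schema text for an optional list-of-strings column, using the
    // three-level LIST layout (outer group, repeated "list", "element").
    static String stringListColumn(String name) {
        return String.join("\n",
            "  optional group " + name + " (LIST) {",
            "    repeated group list {",
            "      optional binary element (UTF8);",
            "    }",
            "  }");
    }

    // Wraps already-formatted field definitions in a top-level message.
    static String message(String messageName, List<String> fields) {
        return "message " + messageName + " {\n"
            + String.join("\n", fields)
            + "\n}";
    }

    public static void main(String[] args) {
        // "id" and "tags" are made-up column names for illustration.
        String schema = message("record", Arrays.asList(
            "  required int64 id;",
            stringListColumn("tags")));
        System.out.println(schema);
    }
}
```

With the package names used in this thread, the resulting text would presumably be parsed with parquet.schema.MessageTypeParser.parseMessageType and registered through parquet.hadoop.example.GroupWriteSupport.setSchema. When filling a row, the nested values should then be addable with Group.addGroup: call row.addGroup("tags") once, then tags.addGroup("list").append("element", value) for each array element. Treat that as an assumption to verify against your Parquet version, and check that Impala reads the resulting file as an ARRAY column before relying on it.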