Writing complex types using ParquetOutputFormat


We have a MapReduce job that uses ParquetOutputFormat to write Parquet files, which are then queried with Impala. We would like to add a column with a complex type:

widgets ARRAY<STRUCT<c1:STRING,c2:STRING,c3:INT,c4:INT>>  


I'm wondering whether anyone has done this and whether there is example code I could look at.


I tried Googling but came up empty. I know it can be done, because Hive does it. Thanks in advance!
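For reference, here is a sketch of how that column could be declared in Parquet's own schema language. It assumes the three-level LIST layout described in the Parquet format specification, with STRING mapped to UTF8-annotated binary and INT to int32. The message name `widgets_record` is a placeholder. Parquet writers have historically used more than one list layout, so it is worth checking the Impala docs for how your Impala version resolves Parquet arrays.

```java
// Sketch: Parquet schema text for the Impala/Hive column
//   widgets ARRAY<STRUCT<c1:STRING,c2:STRING,c3:INT,c4:INT>>
// using the three-level LIST layout (widgets group, repeated "list", "element").
// The message name "widgets_record" is hypothetical; a real job would declare
// its other columns in the same message.
class WidgetsSchema {

    // STRING is written as binary annotated UTF8; INT is written as int32.
    static final String TEXT =
          "message widgets_record {\n"
        + "  optional group widgets (LIST) {\n"
        + "    repeated group list {\n"
        + "      optional group element {\n"
        + "        optional binary c1 (UTF8);\n"
        + "        optional binary c2 (UTF8);\n"
        + "        optional int32 c3;\n"
        + "        optional int32 c4;\n"
        + "      }\n"
        + "    }\n"
        + "  }\n"
        + "}\n";

    public static void main(String[] args) {
        // With parquet-mr on the classpath, this text could be turned into a
        // MessageType via MessageTypeParser.parseMessageType(TEXT).
        System.out.print(TEXT);
    }
}
```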


Master Collaborator

Please correct me if I misunderstood your question.


The Impala docs describe how to create and query tables with complex types. Be aware that Impala can read complex types only from Parquet files, and it cannot write complex types at all.






Master Collaborator

Or is your question about how to write a MapReduce job that produces a complex type using ParquetOutputFormat?


Yes, that is my question.


We currently use parquet.hadoop.ParquetOutputFormat (a somewhat suspicious package name). If another mechanism would work better, I'd be happy to hear about it.
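A minimal sketch of the record-writing side, assuming the usual parquet-mr route: a custom `WriteSupport<T>` (implementing `init`, `prepareForWrite`, and `write`) registered with `ParquetOutputFormat.setWriteSupportClass(job, ...)`. In newer parquet-mr releases these classes live under `org.apache.parquet` rather than the bare `parquet` package. Inside `write(...)`, the nested value has to be emitted as a sequence of start/end calls on the `RecordConsumer` that follows the three-level LIST layout: the `widgets` group, then the repeated `list` group, then the `element` struct. So that the sketch compiles with only the JDK, `RecordSink` is a hypothetical stand-in whose method names mirror `RecordConsumer`, and `Widget`, `WidgetsWriter`, and `TraceSink` are illustrative names. Real code would pass `Binary.fromString(...)` to `addBinary`, and the field indexes must match the positions in your actual schema.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in mirroring the method names of parquet-mr's RecordConsumer.
// Assumption: addBinary takes a String here instead of org.apache.parquet.io.api.Binary.
interface RecordSink {
    void startMessage();
    void endMessage();
    void startField(String field, int index);
    void endField(String field, int index);
    void startGroup();
    void endGroup();
    void addBinary(String value);
    void addInteger(int value);
}

// One element of the ARRAY<STRUCT<c1,c2,c3,c4>> column (illustrative type).
final class Widget {
    final String c1, c2;
    final int c3, c4;
    Widget(String c1, String c2, int c3, int c4) {
        this.c1 = c1; this.c2 = c2; this.c3 = c3; this.c4 = c4;
    }
}

final class WidgetsWriter {
    // Emits one record whose only column is "widgets" (assumed field index 0).
    static void write(RecordSink out, List<Widget> widgets) {
        out.startMessage();
        if (widgets != null) {                 // NULL array: omit the field entirely
            out.startField("widgets", 0);
            out.startGroup();                  // the LIST-annotated group
            if (!widgets.isEmpty()) {          // empty array: group with no "list" field
                out.startField("list", 0);     // repeated field: one start/end around all entries
                for (Widget w : widgets) {
                    out.startGroup();          // one repetition of "list"
                    out.startField("element", 0);
                    out.startGroup();          // the struct itself
                    writeString(out, "c1", 0, w.c1);
                    writeString(out, "c2", 1, w.c2);
                    writeInt(out, "c3", 2, w.c3);
                    writeInt(out, "c4", 3, w.c4);
                    out.endGroup();
                    out.endField("element", 0);
                    out.endGroup();
                }
                out.endField("list", 0);
            }
            out.endGroup();
            out.endField("widgets", 0);
        }
        out.endMessage();
    }

    private static void writeString(RecordSink out, String name, int idx, String v) {
        if (v == null) return;                 // optional field left unset (NULL)
        out.startField(name, idx);
        out.addBinary(v);
        out.endField(name, idx);
    }

    // int columns are always set in this sketch; use Integer plus a null check for nullable INTs.
    private static void writeInt(RecordSink out, String name, int idx, int v) {
        out.startField(name, idx);
        out.addInteger(v);
        out.endField(name, idx);
    }
}

// Records the call sequence so it can be inspected or printed.
final class TraceSink implements RecordSink {
    final List<String> events = new ArrayList<>();
    public void startMessage() { events.add("startMessage"); }
    public void endMessage() { events.add("endMessage"); }
    public void startField(String f, int i) { events.add("startField:" + f); }
    public void endField(String f, int i) { events.add("endField:" + f); }
    public void startGroup() { events.add("startGroup"); }
    public void endGroup() { events.add("endGroup"); }
    public void addBinary(String v) { events.add("addBinary:" + v); }
    public void addInteger(int v) { events.add("addInteger:" + v); }

    public static void main(String[] args) {
        TraceSink sink = new TraceSink();
        WidgetsWriter.write(sink, Arrays.asList(new Widget("a", "b", 1, 2)));
        sink.events.forEach(System.out::println);
    }
}
```

The sketch treats two cases differently on purpose: a null array skips the `widgets` field entirely, while an empty array writes the `widgets` group with no `list` entries. That is how the Parquet format distinguishes a NULL array from an empty one.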



Master Collaborator

Sorry, I don't know how to do that with MR.


Hopefully somebody more knowledgeable can chime in and help you.
