Explorer
Posts: 10
Registered: 10-16-2014

Writing complex types using ParquetOutputFormat

We have a MapReduce job that uses ParquetOutputFormat to write Parquet files that are later queried with Impala. We would like to add a column with a complex type:

widgets ARRAY<STRUCT<c1:STRING,c2:STRING,c3:INT,c4:INT>>  
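For reference, here is a minimal sketch of how such a column could look as a Parquet schema. It assumes the standard three-level LIST layout described in the Parquet format specification, with Hive STRING mapped to binary annotated UTF8 and INT mapped to int32. The message name hypothetical_table, the optional repetitions, and the inner list/element names are assumptions. Hive's own Parquet writer has used other inner names (bag/array_element), so it is worth checking which layout your Impala version reads correctly.

```java
/**
 * Hypothetical Parquet schema text for the column
 * widgets ARRAY<STRUCT<c1:STRING,c2:STRING,c3:INT,c4:INT>>.
 * Assumes the three-level LIST layout from the Parquet format spec.
 * Existing columns of the table would sit alongside widgets in the same message.
 */
public class WidgetsSchema {

    // STRING -> binary (UTF8), INT -> int32.
    // Outer group = the array itself (absent => NULL array),
    // repeated "list" group = one array entry,
    // "element" group = the struct stored in that entry.
    public static final String SCHEMA =
          "message hypothetical_table {\n"
        + "  optional group widgets (LIST) {\n"
        + "    repeated group list {\n"
        + "      optional group element {\n"
        + "        optional binary c1 (UTF8);\n"
        + "        optional binary c2 (UTF8);\n"
        + "        optional int32 c3;\n"
        + "        optional int32 c4;\n"
        + "      }\n"
        + "    }\n"
        + "  }\n"
        + "}\n";

    public static void main(String[] args) {
        // With parquet-mr on the classpath, this text can be turned into a
        // MessageType via MessageTypeParser.parseMessageType(SCHEMA).
        System.out.print(SCHEMA);
    }
}
```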

 

I'm wondering whether anyone has done this, and whether there is example code I could look at.

I tried Googling without success. I know it can be done, because Hive does it. Thanks in advance.

Cloudera Employee
Posts: 307
Registered: 10-16-2013

Please correct me if I misunderstood your question.

 

The Impala docs explain how to create and query a table with complex types in Impala. Note that Impala can read complex types only from Parquet tables, and it cannot write complex types at all.

Documentation:

https://www.cloudera.com/documentation/enterprise/latest/topics/impala_complex_types.html

 

Examples:

https://blog.cloudera.com/blog/2015/11/new-in-cloudera-enterprise-5-5-support-for-complex-types-in-i...

 

Cloudera Employee
Posts: 307
Registered: 10-16-2013

Or is your question how to write a MapReduce job that produces a complex type with ParquetOutputFormat?

Explorer
Posts: 10
Registered: 10-16-2014

Yes.  That is my question.  

 

We currently use parquet.hadoop.ParquetOutputFormat with parquet.example.data.Group (a somewhat suspicious package name).  If another mechanism is better, I'd be happy to hear about it.
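A minimal sketch of building such rows with the Group API follows. It assumes the newer org.apache.parquet package names (older releases like the one mentioned above ship the same classes under parquet.example.data, parquet.hadoop.example and parquet.schema), and it assumes a hypothetical three-level LIST schema for the widgets column. The idea is that every level of nesting is a group: addGroup("widgets") creates the array once per row, and each array entry is one repeated list group wrapping an element struct. WidgetsRecords, configure, buildRecord, addWidget and the sample values are illustrative names only.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class WidgetsRecords {

    // Hypothetical schema: widgets ARRAY<STRUCT<c1:STRING,c2:STRING,c3:INT,c4:INT>>
    // in the three-level LIST layout.
    public static final MessageType SCHEMA = MessageTypeParser.parseMessageType(
          "message hypothetical_table {\n"
        + "  optional group widgets (LIST) {\n"
        + "    repeated group list {\n"
        + "      optional group element {\n"
        + "        optional binary c1 (UTF8);\n"
        + "        optional binary c2 (UTF8);\n"
        + "        optional int32 c3;\n"
        + "        optional int32 c4;\n"
        + "      }\n"
        + "    }\n"
        + "  }\n"
        + "}");

    /** Driver side: store the schema where GroupWriteSupport reads it. */
    public static void configure(Configuration conf) {
        GroupWriteSupport.setSchema(SCHEMA, conf);
    }

    /** Builds one row whose widgets array holds two structs (sample values). */
    public static Group buildRecord() {
        SimpleGroupFactory factory = new SimpleGroupFactory(SCHEMA);
        Group record = factory.newGroup();

        // The outer LIST group is added once per row; skipping it means a NULL array.
        Group widgets = record.addGroup("widgets");

        addWidget(widgets, "a", "b", 1, 2);
        addWidget(widgets, "c", "d", 3, 4);
        return record;
    }

    /** Appends one array entry: a repeated "list" group holding an "element" struct. */
    private static void addWidget(Group widgets, String c1, String c2, int c3, int c4) {
        Group element = widgets.addGroup("list").addGroup("element");
        element.append("c1", c1)
               .append("c2", c2)
               .append("c3", c3)
               .append("c4", c4);
    }
}
```

Assuming the job already uses GroupWriteSupport as its write support class, the driver would call configure(job.getConfiguration()) before submitting, and the mapper or reducer would emit records as before, e.g. context.write(null, record) with a Void key. An empty (rather than NULL) array would be written by adding the widgets group without any list entries.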

 

 

Cloudera Employee
Posts: 307
Registered: 10-16-2013

Sorry, I don't know how to do that with MR.

 

Hopefully someone more knowledgeable can chime in and help you.
