Will Impala support an XML data type?

Expert Contributor

Does anybody have experience processing XML data (imported from MSSQL), and storing and analyzing it in Impala?

Thanks

Tomas

1 ACCEPTED SOLUTION

Guru

I got this from one of our engineers:

Impala doesn't support XML natively. Instead, you can use Hive to convert the XML data into one of the supported formats [1] and then work with it from Impala.
You can use Hive to create an XML-based table with an XML SerDe [2], and then use Hive to convert the data into an Avro-based table with "insert overwrite table avro_table select * from xml_table". Just make sure you create avro_table with the Avro SerDe; Hive's INSERT OVERWRITE takes care of the format conversion. By the way, this XML SerDe [2] is a third-party package that we haven't tested with CDH, but you can give it a try.



[1] http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_file_formats.htm...
[2] https://github.com/dvasilen/Hive-XML-SerDe
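A rough, untested sketch of the steps above. The XML layout (repeated `<record id="...">` elements with `<name>` and `<amount>` children), the column names, the jar path, and the HDFS location are all hypothetical placeholders. The SerDe class names and the `column.xpath.*`, `xmlinput.start` and `xmlinput.end` properties follow the Hive-XML-SerDe README [2], so check them against the version you download. `STORED AS AVRO` needs Hive 0.14 or later, and depending on your Impala version you may also need to give the Avro table an explicit schema (`avro.schema.literal` or `avro.schema.url`).

```sql
-- ===== Run in Hive (hive CLI or beeline) =====
-- Hypothetical path: point this at the Hive-XML-SerDe jar you downloaded.
ADD JAR /path/to/hivexmlserde.jar;

-- Staging table that reads the raw XML files through the third-party SerDe.
-- Each <record ...>...</record> element becomes one row; the XPath
-- expressions map an attribute and two child elements onto columns.
CREATE EXTERNAL TABLE xml_table (
  id     STRING,
  name   STRING,
  amount DOUBLE
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.id"     = "/record/@id",
  "column.xpath.name"   = "/record/name/text()",
  "column.xpath.amount" = "/record/amount/text()"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/user/hypothetical/xml_export'   -- hypothetical HDFS directory
TBLPROPERTIES (
  "xmlinput.start" = "<record",
  "xmlinput.end"   = "</record>"
);

-- Avro-backed table with the same (scalar) columns, which Impala can read.
CREATE TABLE avro_table (
  id     STRING,
  name   STRING,
  amount DOUBLE
)
STORED AS AVRO;

-- Hive reads through the XML SerDe and writes Avro data files.
INSERT OVERWRITE TABLE avro_table SELECT * FROM xml_table;

-- ===== Then run in impala-shell =====
-- Make Impala load metadata for the table Hive just created.
INVALIDATE METADATA avro_table;
SELECT name, SUM(amount) AS total FROM avro_table GROUP BY name;
```

Hive's INSERT OVERWRITE form requires the TABLE keyword. The Avro table sticks to scalar columns because Impala's Avro support does not cover complex types such as maps.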
