Will impala support xml data type?

Does anybody have experience processing XML data (imported from MSSQL) and storing and analyzing it in Impala?

Thanks

Tomas

 

Accepted Solution

Re: Will impala support xml data type?

I got this from one of our engineers:

 

Impala doesn't support XML natively. Instead, you can convert the XML data into one of the supported file formats [1] using Hive and then work with it from Impala.
You can probably use Hive to create XML-based tables using the XML SerDe [2], and then use Hive to convert the data into an Avro-based table with "INSERT OVERWRITE TABLE avro_table SELECT * FROM xml_table". Just make sure you create avro_table using the Avro SerDe; Hive's INSERT OVERWRITE then takes care of the format conversion. Note that this XML SerDe [2] is a third-party package that we haven't tested with CDH, so treat it as something to try rather than a supported path.
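The workflow described above could look roughly like the following sketch. The table names xml_table and avro_table come from the answer; everything else is an assumption for illustration: the jar path, the HDFS location, the two columns, and the <record> element layout are hypothetical, and the SerDe class names and property keys are written as I recall them from the Hive-XML-SerDe README [2], so check them against the project before use. STORED AS AVRO is Hive's shorthand for the Avro SerDe and needs Hive 0.14 or later.

```sql
-- Run in Hive. Jar path is hypothetical; point it at the SerDe jar you built or downloaded.
ADD JAR /path/to/hivexmlserde.jar;

-- Expose the raw XML files as a Hive table. Each column.xpath.<column>
-- property maps one column to an XPath inside a record (hypothetical layout).
CREATE EXTERNAL TABLE xml_table (
  id   STRING,
  name STRING
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.id"   = "/record/@id",
  "column.xpath.name" = "/record/name/text()"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/user/hive/xml_export'          -- hypothetical HDFS directory
TBLPROPERTIES (
  "xmlinput.start" = "<record",           -- where one XML record begins
  "xmlinput.end"   = "</record>"          -- where it ends
);

-- Target table in a format Impala can read.
CREATE TABLE avro_table (
  id   STRING,
  name STRING
)
STORED AS AVRO;

-- Hive reads through the XML SerDe and writes Avro.
INSERT OVERWRITE TABLE avro_table SELECT * FROM xml_table;

-- Run in Impala (impala-shell): pick up the table created in Hive, then query it.
INVALIDATE METADATA avro_table;
SELECT COUNT(*) FROM avro_table;
```

Only avro_table needs to be visible to Impala; xml_table stays a Hive-only staging table because Impala cannot load the third-party SerDe.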



[1] http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_file_formats.htm...
[2] https://github.com/dvasilen/Hive-XML-SerDe
