Created on 01-06-2015 06:21 AM - edited 09-16-2022 02:17 AM
Does anybody have experience processing XML data (imported from MSSQL), and storing and analyzing it in Impala?
Thanks
Tomas
Created 02-19-2015 02:41 PM
I got this from one of our engineers:
Impala doesn't support XML natively. Instead, you can convert the XML data into one of the supported formats [1] using Hive, and then work with it from Impala.
You can probably use Hive to create XML-based tables with an XML SerDe [2], and then use Hive to convert the data to an Avro-based table with "insert overwrite table avro_table select * from xml_table". Just make sure you create avro_table with the Avro SerDe; Hive's insert overwrite takes care of the format conversion. Btw, this XML SerDe [2] is a third-party package that we haven't tested with CDH, but you can give it a try.
[1] http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_file_formats.htm...
[2] https://github.com/dvasilen/Hive-XML-SerDe
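A minimal Python sketch of the Hive steps above, not something from the original answer: it only builds and prints the HiveQL (XML table, Avro table, then the format-converting insert), which you would run in Hive yourself. The table names, columns, XPath expressions, HDFS location and record tags are all hypothetical. The XML SerDe class names and the `column.xpath.*` / `xmlinput.*` property keys are taken from the Hive-XML-SerDe README as an assumption, so check them against the version you install. Building both tables from one column list keeps their column order identical, which `select *` relies on.

```python
import json
from typing import List, Tuple

# (column name, Hive type, XPath for the XML SerDe). Hypothetical example schema.
Column = Tuple[str, str, str]

# Hive type -> Avro primitive type, for the few scalar types used in this sketch.
AVRO_TYPES = {"STRING": "string", "INT": "int", "BIGINT": "long",
              "DOUBLE": "double", "BOOLEAN": "boolean"}


def xml_table_ddl(table: str, location: str, columns: List[Column],
                  start_tag: str, end_tag: str) -> str:
    """External table over raw XML files, read through the third-party XML SerDe."""
    cols = ",\n  ".join(f"{name} {htype}" for name, htype, _ in columns)
    xpaths = ",\n  ".join(
        f'"column.xpath.{name}"="{xpath}"' for name, _, xpath in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} (\n  {cols}\n)\n"
        "ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'\n"
        f"WITH SERDEPROPERTIES (\n  {xpaths}\n)\n"
        "STORED AS\n"
        "  INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'\n"
        "  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'\n"
        f"LOCATION '{location}'\n"
        f'TBLPROPERTIES ("xmlinput.start"="{start_tag}", "xmlinput.end"="{end_tag}");'
    )


def avro_table_ddl(table: str, columns: List[Column]) -> str:
    """Table declared with the Avro SerDe; Hive takes its columns from the schema."""
    schema = {
        "type": "record",
        "name": table,
        "fields": [
            # Nullable fields, since an XML record may lack some elements.
            {"name": name, "type": ["null", AVRO_TYPES[htype.upper()]], "default": None}
            for name, htype, _ in columns
        ],
    }
    return (
        f"CREATE TABLE {table}\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'\n"
        "STORED AS\n"
        "  INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'\n"
        "  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'\n"
        f"TBLPROPERTIES ('avro.schema.literal'='{json.dumps(schema)}');"
    )


def conversion_script(xml_table: str, avro_table: str, location: str,
                      columns: List[Column], start_tag: str, end_tag: str) -> str:
    """Hive statements: XML table, Avro table, then the format-converting insert."""
    return "\n\n".join([
        xml_table_ddl(xml_table, location, columns, start_tag, end_tag),
        avro_table_ddl(avro_table, columns),
        # Hive reads rows through the XML SerDe and writes them through the Avro SerDe.
        f"INSERT OVERWRITE TABLE {avro_table} SELECT * FROM {xml_table};",
    ])


if __name__ == "__main__":
    # Hypothetical <order id="..."><customer>...</customer></order> records.
    cols = [("order_id", "BIGINT", "/order/@id"),
            ("customer", "STRING", "/order/customer/text()")]
    print(conversion_script("orders_xml", "orders_avro", "/user/tomas/orders_xml",
                            cols, "<order ", "</order>"))
```

After running the printed statements in Hive, Impala also has to pick up the new table (for example with `INVALIDATE METADATA orders_avro;` in impala-shell) before you can query it.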