Will Impala support the XML data type?
- Labels: Apache Impala
Created on ‎01-06-2015 06:21 AM - edited ‎09-16-2022 02:17 AM
Does anybody have experience processing XML data (imported from MSSQL), and storing and analyzing it in Impala?
Thanks
Tomas
Created ‎02-19-2015 02:41 PM
I got this from one of our engineers:
Impala doesn't support XML natively. Instead, you can convert the XML data into one of the supported formats [1] using Hive and work with it from Impala.
You can probably use Hive to create XML-based tables using the XML SerDe [2], and then use Hive to convert the data into an Avro-based table using "insert overwrite table avro_table select * from xml_table". Just make sure you create avro_table using the Avro SerDe; Hive's insert overwrite then takes care of the format conversion. By the way, this XML SerDe [2] is a third-party package that we didn't test with CDH. You can probably give it a try.
[1] http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/impala_file_formats.htm...
[2] https://github.com/dvasilen/Hive-XML-SerDe
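For illustration, the steps above might look roughly like the sketch below. This is untested and rests on several assumptions: the column names, XPath expressions, record tag, jar path, and HDFS location are hypothetical placeholders, and the SerDe and input format class names are taken from the README of [2], so check them against the version you download. `STORED AS AVRO` needs Hive 0.14 or later; on older Hive versions the Avro table has to be declared with the Avro SerDe and its input/output formats explicitly.

```sql
-- Run in Hive, not Impala. Everything named here is a placeholder.
-- Make the third-party SerDe jar from [2] available first (hypothetical path):
-- ADD JAR /path/to/hivexmlserde.jar;

-- 1. XML-backed table: each <record>...</record> element becomes one row.
CREATE EXTERNAL TABLE xml_table (
  customer_id STRING,
  income      STRING
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.customer_id" = "/record/@customer_id",
  "column.xpath.income"      = "/record/income/text()"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/user/hive/xml_input'
TBLPROPERTIES (
  "xmlinput.start" = "<record",
  "xmlinput.end"   = "</record>"
);

-- 2. Avro-backed table with the same columns (Hive 0.14+ shorthand).
CREATE TABLE avro_table (
  customer_id STRING,
  income      STRING
)
STORED AS AVRO;

-- 3. Hive reads through the XML SerDe and writes Avro files.
INSERT OVERWRITE TABLE avro_table
SELECT * FROM xml_table;

-- Then, in impala-shell: pick up the table created by Hive and query it.
-- INVALIDATE METADATA avro_table;
-- SELECT COUNT(*) FROM avro_table;
```

Only the Avro table is meant to be queried from Impala. The XML table stays Hive-only, since Impala cannot load the custom SerDe.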
