Member since
06-23-2017
8
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| 4346 | 01-03-2019 03:04 PM |
01-03-2019
03:04 PM
I figured out that if I use regex_replace, I can work around the data issue. select
xpath_string(regex_replace(xml_column, '\\?\<', ''), '//Root [@name="OfferId"]/@value')
from table;
... View more
01-02-2019
10:57 PM
We have a Hive External table where a couple of columns have XML data in them. The XML structure is valid but some of the records contain "?<" or "?>" in the data of an element. As you can see, the XML itself is valid as the value is quoted but Hive xpath is unhappy. Trying to figure out if there's a better way to escape out the data elements or have xpath ignore the extra "<". Example: <Root><data1><data2 id="4" value="622" name="CoID" /><data2 id="9999" value="Company XYZ" name="Company Name" /><data2 id="9999" value="2222345=0000?<" name="OfferId" /></data1></Root> Query: select
xpath_string(xml_column, '//Root [@name="OfferId"]/@value')
from table; Here is the Error that we receive: Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 284; The value of attribute "value" associated with an element type "Root" must not contain the '<' character.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:90)
... 35 more
... View more
Labels:
- Labels:
-
Apache Hive