Member since
06-23-2017
8
Posts
0
Kudos Received
1
Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
3312 | 01-03-2019 03:04 PM |
01-03-2019
03:04 PM
I figured out that I can work around the data issue with regexp_replace, stripping the "?<" sequence before the XML reaches xpath_string:
select
xpath_string(regexp_replace(xml_column, '\\?\<', ''), '//Root [@name="OfferId"]/@value')
from table;
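
For anyone who wants to keep the original value rather than drop characters from it, a variation on the same idea is to escape the stray "<" as &lt; so the parser accepts the attribute and hands back the unescaped text. Below is a minimal Python sketch of that idea outside Hive, using the sample record from my question. The helper names (escape_stray_lt, offer_id) are made up for illustration, and it assumes "?<" only ever appears inside attribute values in this data.

```python
import xml.etree.ElementTree as ET

# Sample record from the question; the last data2 has a raw "<" in its value.
RAW = (
    '<Root><data1>'
    '<data2 id="4" value="622" name="CoID" />'
    '<data2 id="9999" value="Company XYZ" name="Company Name" />'
    '<data2 id="9999" value="2222345=0000?<" name="OfferId" />'
    '</data1></Root>'
)


def escape_stray_lt(xml_text):
    """Escape the "?<" sequence so the "<" no longer breaks the attribute.

    Assumption: "?<" never occurs as real markup in this data.
    """
    return xml_text.replace('?<', '?&lt;')


def offer_id(xml_text):
    """Return the value attribute of the data2 element named OfferId."""
    root = ET.fromstring(xml_text)
    node = root.find('.//data2[@name="OfferId"]')
    return None if node is None else node.get('value')


if __name__ == '__main__':
    try:
        offer_id(RAW)
    except ET.ParseError as err:
        # The unescaped "<" makes the record not well-formed.
        print('raw record rejected:', err)

    # After escaping, the parser accepts the record and unescapes the value.
    print(offer_id(escape_stray_lt(RAW)))
```

The trade-off versus stripping is that the extracted value still contains the "?<" that was in the source data, so anything downstream sees the record as it was stored.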
01-02-2019
10:57 PM
We have a Hive external table where a couple of columns have XML data in them. The XML structure is valid but some of the records contain "?<" or "?>" in the data of an element. As you can see, the XML itself is valid as the value is quoted, but Hive xpath is unhappy. I'm trying to figure out if there's a better way to escape the data elements or have xpath ignore the extra "<".

Example:
<Root><data1><data2 id="4" value="622" name="CoID" /><data2 id="9999" value="Company XYZ" name="Company Name" /><data2 id="9999" value="2222345=0000?<" name="OfferId" /></data1></Root>

Query:
select
xpath_string(xml_column, '//Root [@name="OfferId"]/@value')
from table;

Here is the error that we receive:
Caused by: org.xml.sax.SAXParseException; lineNumber: 1; columnNumber: 284; The value of attribute "value" associated with an element type "Root" must not contain the '<' character.
at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
at org.apache.hadoop.hive.ql.udf.xml.UDFXPathUtil.eval(UDFXPathUtil.java:90)
... 35 more
Labels:
Apache Hive
04-02-2018
03:57 PM
Thank you @Alessio Palma, this was the solution we decided to use... pretty straightforward!
03-12-2018
04:55 AM
@Chris Parker, You have me waiting in anticipation, sir...
03-08-2018
05:08 PM
I have a data flow that I'd like to enrich with some data, using an MS SQL datasource for the lookup. I've attempted to follow @Matt Burgess' post, SQL in NiFi with ExecuteScript, but unfortunately I can't seem to get my head wrapped around how to accomplish this. I've also read through the forum topic How to get DBCP service inside ScriptedLookupService and tried to accomplish the same thing through the LookupRecord processor, but I can't seem to get anything to work. I know it's because I don't have the correct objects or methods defined. Unfortunately, I'm not a programmer by trade, so some concepts do elude me. Usually I'm pretty good at figuring things out by example, but this one has me stumped, so I'm reaching out for some help.

I think using the LookupRecord method would be best, as it seems easier to inject into the current flow that's already set up. But I can't get the ScriptedLookupService in the controller services to work following the code examples from the posts linked above. I'm fine with not doing the name lookup for the DBCPConnectionPool service; I'll use the Id if it makes the code simpler. And I think I get the concept that the ScriptedLookupService doesn't include objects that, say, the ExecuteScript processor does.

If someone has done something similar to what I'm trying to accomplish, and would not mind providing some more detailed examples or pushing me in a more correct direction (even if it's off a cliff), I would be very grateful!
Labels:
Apache NiFi
01-29-2018
04:29 PM
In our case it was a resource allocation issue. Our queues were not set up optimally, and we also changed some memory configurations to allow the initial query process to run properly. Basically, the default queue didn't have any resources, and we noticed that when a query is first submitted, there's a process that runs in the default queue before the job is submitted to the queue specified.
10-19-2017
02:47 AM
I'm experiencing this same issue, btw. I'm wondering if we're running out of resources for Tez. I have a support case open with Hortonworks and will let you know what comes of it. We usually have to restart Hive in order for more Tez/Hive jobs to run.