Using HDP 2.3.2 sandbox. This is the second error I got trying to get this working. The first error and "solution" can be found here.
I have an external table defined over a folder of XML documents. The table has a single column, which holds each document's raw XML as a string.
I am trying to create a view on top of the XML data using xpaths. So for example,
CREATE VIEW myview (column1, ..., columnN) AS SELECT xpath_string(rawxml, '/my/xpath/to/value'), xpath_string(rawxml, '/another/xpath') FROM myxmltable;
The XML documents have 400+ xpaths that I want to grab and put into the view. I can get to about 60 columns' worth of xpaths before I hit this error:
FAILED: Hive Internal Error: com.sun.jersey.api.client.ClientHandlerException(java.io.IOException: java.io.IOException: Error writing to server)
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: java.io.IOException: Error writing to server
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
My cursory research suggests that the query string is too long and is breaking something. I am writing these queries in the Hive CLI, so I'm not sure how else I can fix this. I also tried beeline and get the same error.
Can you write it as a script and execute the script instead of running it as one query? You can pass a script to beeline or the hive shell, or use the source command within the CLI. @kevin vasko
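For example (the file name and JDBC URL here are placeholders, not from the original post):

```shell
# Run the statements from a file with the Hive CLI
hive -f create_view.hql

# Or through beeline
beeline -u jdbc:hive2://localhost:10000 -f create_view.hql

# Or from inside an already-running Hive CLI session:
#   source /path/to/create_view.hql;
```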
Thanks, but same issue. How can I increase the maximum query string length Hive will accept?
I created a simple UDF that takes the XML string as input, does all the xpath parsing on that file, and returns a map type. I was hoping that getting rid of all the xpath calls would eliminate the issue, but it didn't. I can now do SELECT m["key"] FROM (SELECT myfunc(xmldata) FROM xmlSource). But when I do SELECT m["key1"], ..., m["key400"] FROM (...) I'm back at the "FULL head" issue for some reason.
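As a side note, Hive needs an alias on both the derived column and the subquery, so the working form of the single-key query above would look roughly like this (myfunc, xmldata, and xmlSource are the names from the post; the aliases t and m are added here):

```sql
-- Parse the XML once per row via the UDF, then pull individual
-- values out of the returned map in the outer query.
SELECT t.m['key']
FROM (
  SELECT myfunc(xmldata) AS m
  FROM xmlSource
) t;
```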
Since there isn't really any hard limit, and 400 columns shouldn't be enough to cause out-of-memory issues, I'm not quite sure what else to do. To me this looks purely like a configuration issue or a bug in Hive or one of its dependencies. I posted this issue on the user mailing list but I haven't heard anything. Any suggestions?
@Neeraj Sabharwal There are two errors I've been fighting with while trying to access all of these columns in the same query. The second one I *thought* I had a workaround for by disabling security (unchecking the security box in Ambari for Hive), but it keeps coming back. Here is the defect which I think I'm running into for the "FULL head" issue.
UPDATE: I'm about 99.99% sure I figured out the problem! I started looking further into the error logs. This line, "at org.apache.atlas.security.SecureClientUtils$1$1.run(SecureClientUtils.java:103)", tipped me off that Atlas was somehow being interacted with. I disabled Atlas by turning off the Atlas service and removing
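The truncated step above presumably refers to the Atlas Hive hook. In HDP the Atlas bridge is wired into Hive as a post-execution hook in hive-site.xml (managed through Ambari); treat the exact property value below as an assumption about this setup:

```xml
<!-- hive-site.xml: Atlas is invoked via a post-execution hook.
     Removing the HiveHook entry (or the whole property, if Atlas
     is its only value) stops Hive from calling Atlas per query. -->
<property>
  <name>hive.exec.post.hooks</name>
  <value>org.apache.atlas.hive.hook.HiveHook</value>
</property>
```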
I ran my entire query and it worked without issue! I would venture to say that this is an issue with ATLAS not being able to handle really long queries.
The two errors, for reference:
1. Error writing to server
2. FULL head