Created 02-18-2016 05:47 PM
Using the HDP 2.3.2 sandbox. This is the second error I've hit trying to get this working; the first error and its "solution" can be found here.
I have an external table defined over a folder that contains XML documents. The table has a single column, which holds each document's data as a string.
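For reference, the table looks roughly like this (a minimal sketch; the table and column names match my view query below, but the location is a placeholder, and I'm omitting the input-format details that put one document per row):

CREATE EXTERNAL TABLE myxmltable (rawxml STRING)
LOCATION '/data/xml';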
I am trying to create a view on top of the XML data with xpaths. So for example,
CREATE VIEW myview (column1, ..., columnN) AS SELECT xpath_string(rawxml, '/my/xpath/to/value'), xpath_string(rawxml, '/another/xpath') FROM myxmltable;
The XML document has 400+ xpaths that I want to extract into the view. I can get through about 60 columns' worth of xpaths before I hit this error.
FAILED: Hive Internal Error: com.sun.jersey.api.client.ClientHandlerException(java.io.IOException: java.io.IOException: Error writing to server)
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: java.io.IOException: Error writing to server
    at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
    at com.sun.jersey.api.client.Client.handle(Client.java:648)
My cursory research suggests that the query string is too long and is breaking something. I am writing these queries in the Hive CLI, so I'm not sure how else to fix this. I also tried beeline and got the same error.
Created 02-20-2016 05:51 PM
Can you write it as a script and execute the script instead of running it as one query? You can pass a script to beeline or the hive shell, or use the source command within the CLI. @kevin vasko
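For example (file path and JDBC URL are placeholders):

hive -f /tmp/create_myview.hql
beeline -u jdbc:hive2://localhost:10000 -f /tmp/create_myview.hql

or, from inside the Hive CLI:

source /tmp/create_myview.hql;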
Created 02-23-2016 02:05 PM
I have not tried. I'll try it and see.
Created 02-24-2016 10:45 AM
Thanks, but same issue. How can I increase the length of query string Hive will accept?
Created 02-24-2016 11:46 PM
I created a SimpleUDF that takes the XML string as input, does all the xpath parsing on that file, and returns a map type. I was hoping that getting rid of all the xpath calls would eliminate the issue, but it didn't. I can now do SELECT m["key"] FROM (SELECT myfunc(xmldata) FROM xmlSource). But when I do SELECT m["key1"], ..., m["key400"] FROM (...), I'm back at the "FULL head" issue for some reason.
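Spelled out, the pattern looks roughly like this (a sketch using the names from my query above; note that Hive requires an alias on both the subquery and the map column):

SELECT t.m["key1"], t.m["key2"]
FROM (SELECT myfunc(xmldata) AS m FROM xmlSource) t;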
Created 02-25-2016 12:55 AM
Since there isn't really any hard limit, and 400 columns shouldn't be enough to cause OOM issues, I'm not quite sure what else to do. To me this looks purely like a configuration issue or a bug in Hive or its dependencies. I posted this issue on the user mailing list but haven't heard anything. Any suggestions?
Created 02-25-2016 02:21 AM
@Kevin Vasko I need more information on this.
Please share the relevant log output around this exception:
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: java.io.IOException: Error writing to server
Created 02-25-2016 02:12 PM
@Neeraj Sabharwal There are two errors I've been fighting with while trying to access all of these columns in the same query (links to both at the bottom of this post). The second one I *thought* I had a workaround for by disabling security (unchecking the security box in Ambari for Hive), but it keeps showing back up. The defect I think I'm running into for the FULL head issue is linked below.
UPDATE: I'm about 99.99% sure I figured out the problem! I started looking further into the ERROR logs. This line, "at org.apache.atlas.security.SecureClientUtils$1$1.run(SecureClientUtils.java:103)", tipped me off that Atlas was being interacted with in some way. I disabled Atlas by turning off the Atlas service and removing this property:
hive.exec.failure.hooks=org.apache.atlas.hive.hook.HiveHook
I then ran my entire query and it worked without issue! I would venture to say that this is an issue with Atlas not being able to handle really long queries.
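If turning Atlas off entirely isn't an option, a session-level workaround might be to clear the hook before running the long statement (a sketch; I haven't verified that this property can be overridden per session on every HDP build):

SET hive.exec.failure.hooks=;
-- then run the long CREATE VIEW in the same session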
1. Error writing to server: https://gist.github.com/kur1j/513e5a1499eef6c727a1
2. FULL head: https://issues.apache.org/jira/browse/HIVE-11720