Archives of Support Questions (Read Only)

This is an archived board for historical reference. Information and links may no longer be available or relevant.
Announcements
This board is archived and read-only for historical reference. To ask a new question, please post a new topic on the appropriate active board.

Hive fails with IOException: Error writing to server on really long query

Expert Contributor

I'm using the HDP 2.3.2 sandbox. This is the second error I've hit trying to get this working. The first error and its "solution" can be found here:

https://community.hortonworks.com/questions/18007/hive-fails-with-hive-internal-error-message-full-h...

I have an external table defined over a folder that contains XML documents. The table has a single column, which holds each document's data as a string.

I am trying to create a view on top of the XML data with xpaths. So for example,

CREATE VIEW myview (column1, ..., columnN) AS SELECT xpath_string(rawxml, '/my/xpath/to/value'), xpath_string(rawxml, '/another/xpath') FROM myxmltable;

The XML document has 400+ XPaths that I want to grab and put into the view. I can do about 60 columns' worth of XPaths before I get this error.

FAILED: Hive Internal Error: com.sun.jersey.api.client.ClientHandlerException(java.io.IOException: java.io.IOException: Error writing to server)
com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: java.io.IOException: Error writing to server
  at com.sun.jersey.client.urlconnection.URLConnectionClientHandler.handle(URLConnectionClientHandler.java:149)
  at com.sun.jersey.api.client.Client.handle(Client.java:648)

My cursory research seems to indicate that the query string is too long and is breaking something. I am writing these queries in the Hive CLI, so I'm not sure how else to fix this. I also tried Beeline and got the same error.

8 REPLIES

Master Mentor

Can you write it as a script and execute the script, instead of running it as one query? You can pass a script to Beeline or the Hive shell, or use the source command within the CLI. @Kevin Vasko
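The suggestion above can be sketched as follows (a minimal sketch; the file name, view, and XPaths are placeholders taken from the question, not the poster's actual DDL):

```shell
# Put the long CREATE VIEW statement in a .sql file instead of pasting it
# into the interactive prompt (file name is a placeholder).
cat > create_myview.sql <<'EOF'
CREATE VIEW myview (column1, column2) AS
SELECT xpath_string(rawxml, '/my/xpath/to/value'),
       xpath_string(rawxml, '/another/xpath')
FROM myxmltable;
EOF

# Then run it non-interactively with either CLI:
#   hive -f create_myview.sql
#   beeline -u jdbc:hive2://localhost:10000 -f create_myview.sql
# Or from inside the Hive CLI prompt:
#   hive> source create_myview.sql;
```

This sidesteps any limits on the interactive prompt itself, though (as the thread later shows) it does not help if the limit is downstream of query submission.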

Expert Contributor

I have not tried. I'll try it and see.


Expert Contributor

Thanks, but same issue. How can I increase the maximum length of the query string Hive will accept?

I created a simple UDF that takes the XML string as input, does all the xpath parsing on that file, and returns a map type. I was hoping that getting rid of all the xpath calls would eliminate the issue, but it didn't. I can now do SELECT m["key"] FROM (SELECT myfunc(xmldata) FROM xmlSource). But when I do SELECT m["key1"], ..., m["key400"] FROM ...(...), I'm back at the "full HEAD" issue for some reason.
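Typing 400 map lookups by hand is also error-prone; one option (a sketch, where the key names key1..key400, the UDF name myfunc, and the table names are hypothetical placeholders from the post above) is to generate the long SELECT list with a small script:

```shell
# Generate a SELECT list of 400 map lookups instead of writing it by hand.
# Key names, UDF name, and table names are placeholders, not real objects.
{
  echo "SELECT"
  for i in $(seq 1 400); do
    sep=","
    [ "$i" -eq 400 ] && sep=""
    echo "  m[\"key$i\"]$sep"
  done
  echo "FROM (SELECT myfunc(xmldata) AS m FROM xmlSource) t;"
} > select_all_keys.sql

# Inspect the first few generated lines:
head -n 3 select_all_keys.sql
```

This doesn't make the query any shorter, of course, so it wouldn't avoid a length limit on the server side, but it makes the giant statement reproducible.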


Expert Contributor

Since there isn't really any hard limit, and 400 columns shouldn't be enough to cause OOM issues, I'm not quite sure what else to do. To me this looks purely like a configuration issue or a bug in Hive or its dependencies. I posted this issue on the user mailing list but haven't heard anything. Any suggestions?

Master Mentor

@Kevin Vasko I need more information on this.

Please share more information from the logs:

com.sun.jersey.api.client.ClientHandlerException: java.io.IOException: java.io.IOException:Error writing to server

Expert Contributor

@Neeraj Sabharwal There are two errors I've been fighting with while trying to get access to all of these columns in the same query. The second one I *thought* I had a workaround for by disabling security (unchecking the security box in Ambari for Hive), but it keeps showing back up. Here is the defect I think I'm running into for the "full HEAD" issue.

UPDATE: I'm about 99.99% sure I figured out the problem! I started looking further into the error logs. This line, "at org.apache.atlas.security.SecureClientUtils$1$1.run(SecureClientUtils.java:103)", tipped me off that Atlas was being interacted with in some way. I disabled Atlas by turning off the Atlas service and removing

hive.exec.failure.hooks=org.apache.atlas.hive.hook.HiveHook

I ran my entire query and it worked without issue! I would venture to say that this is an issue with Atlas not being able to handle really long queries.
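To double-check that the hook really is gone after making the change, you can look for the property in hive-site.xml (a sketch; the real file typically lives under /etc/hive/conf/, so a sample copy is created here to keep the example self-contained):

```shell
# Check whether the Atlas failure hook is still configured in hive-site.xml.
# A sample config file is created here as a stand-in for the real one,
# whose path varies by install (commonly /etc/hive/conf/hive-site.xml).
cat > sample-hive-site.xml <<'EOF'
<configuration>
  <property>
    <name>hive.exec.failure.hooks</name>
    <value>org.apache.atlas.hive.hook.HiveHook</value>
  </property>
</configuration>
EOF

if grep -q "org.apache.atlas.hive.hook.HiveHook" sample-hive-site.xml; then
  echo "Atlas hook still configured"
else
  echo "Atlas hook removed"
fi
```

If the value still appears after the change in Ambari, the config may not have been pushed out to the node yet (Ambari usually requires a service restart for config changes to take effect).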

https://issues.apache.org/jira/browse/HIVE-11720

1. Error writing to server

https://gist.github.com/kur1j/513e5a1499eef6c727a1

2. FULL head

https://gist.github.com/kur1j/217eae2065c7953d9cf7