XML and Hive parsing error with Serde.

I am trying to ingest a simple XML file into a Hive table. The table is created, but running a SELECT query against it fails with the error below:

org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: FAILED: RuntimeException java.lang.ClassNotFoundException: com.ibm.spss.hive.serde2.xml.XmlInputFormat

I followed this article: https://community.hortonworks.com/content/kbentry/972/hive-and-xml-pasring.html

I also changed the following setting:

hive.tez.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat

Its previous value was:

hive.tez.input.format=org.apache.hadoop.hive.ql.io.HiveInputFormat

Any help resolving this issue would be greatly appreciated.

Thank you.

1 ACCEPTED SOLUTION

avatar

Hello @Marshal Tito!
Could you check the following?
1 - Does the hive user have permission to read the jar? Try chmod 777 on the jar.
2 - Did you add the jar in the same session before running each query that needs it?
(The add jar command only applies to the current session, so you need to add the jar again in every session that queries the table.)
3 - As a test, add the jar to hive.aux.jars.path (you will need to restart Hive afterwards for this to take effect).

Lastly, if none of that works, I'd enable debug logging in the Hive CLI, repeat the same steps, and watch the console for anything that shows up in the logs:

hive --hiveconf hive.root.logger=DEBUG,console

PS: I followed the article as well, and it worked fine for me.
Hope this helps!
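The permission check in step 1 can be sketched as a small shell helper. This is a minimal sketch, not part of the original answer: the function name jar_world_readable is hypothetical, and it only tests whether every user can read the file, which is sufficient for the hive user but stricter than strictly necessary.

```shell
#!/bin/sh
# Hypothetical helper (not a Hive command): report whether a jar file
# exists and is readable by all users, based on the mode string of ls -l.
jar_world_readable() {
    jar_file="$1"
    if [ ! -f "$jar_file" ]; then
        echo "missing"
        return 1
    fi
    # ls -l starts with a mode string such as -rw-r--r--.
    # Character 8 of that string is the read bit for "other" users.
    other_read=$(ls -l "$jar_file" | cut -c8)
    if [ "$other_read" = "r" ]; then
        echo "readable"
    else
        echo "not-readable"
        return 2
    fi
}
```

For example, jar_world_readable /tmp/hivexmlserde-1.0.5.3.jar prints one of the three words above. If it prints not-readable, chmod 644 on the jar grants read access to everyone and is a narrower change than chmod 777.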

avatar
Contributor

@Neeraj Sabharwal, would you please look into this? I am getting an error while following your article "xml and HIVE parsing".

To check that the jar is loaded, I ran:

hive> list jars;

/tmp/hivexmlserde-1.0.5.3.jar

I also ran this command to confirm that the jar contains the class:

[root@sandbox-hdp tmp]# jar -tf hivexmlserde-1.0.5.3.jar | grep -i com.ibm.spss.hive.serde2.xml.XmlInputFormat
com/ibm/spss/hive/serde2/xml/XmlInputFormat$XmlRecordReader.class
com/ibm/spss/hive/serde2/xml/XmlInputFormat.class

avatar
Contributor

Hi @Vinicius Higa Murakami,

Thank you so much for your detailed reply. I tried the first three steps, but not the last one (debug logging). Now any SELECT statement I run fails with the error below:

Failed to fetch next batch for the Resultset
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.NullPointerException

avatar

Hello @Marshal Tito!
What message do you see when you enable DEBUG logging?
Also, just to confirm, do you see your jar listed in the aux jars path?

hive -e "set hive.aux.jars.path;"

Hope this helps.

avatar
Contributor

Hi @Vinicius Higa Murakami,

Thanks a lot. It finally worked. The last problem was caused by ” (a curly quote) being used instead of " (a straight quote) in the XML file, which prevented the SerDe from reading it. Thank you, I really appreciate your help.
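The curly-quote problem described above can be detected and repaired before loading the file. The following is a minimal sketch, assuming the XML file is UTF-8 encoded; the function names count_curly_lines and normalize_quotes are hypothetical and are not part of Hive or the SerDe.

```shell
#!/bin/sh
# Hypothetical helpers for spotting and fixing typographic double quotes
# in an XML file before handing it to the SerDe.

# Print the number of lines that contain a curly double quote.
count_curly_lines() {
    grep -c -e '“' -e '”' "$1"
}

# Print a copy of the file with curly double quotes replaced by ASCII ones.
# The input file is left unchanged; redirect the output to a new file.
normalize_quotes() {
    sed -e 's/“/"/g' -e 's/”/"/g' "$1"
}
```

A possible workflow is to run count_curly_lines on the file first and, if it reports any lines, write the output of normalize_quotes to a new file and load that one into the table location instead.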

avatar
Contributor

Hi @Vinicius Higa Murakami,

I have added the jar to hive-site.xml under the property hive.aux.jars.path, as shown below. Even so, from Ambari I still have to add the jar every time before I can run a query. I want to run queries without "add jar" each time. How can I do that?

<property>
  <name>hive.aux.jars.path</name>
  <value>/tmp/hivexmlserde-1.0.5.3.jar</value>
</property>

avatar

Hi @Marshal Tito!
I'm glad you got it working 🙂
Regarding the aux jar, did you set this through Ambari? If so, you should be able to use the jar without running add jar.

Otherwise, try the steps described here:

https://community.hortonworks.com/content/supportkb/48734/how-to-permanently-add-custom-jar-files-to...

Hope this helps.

avatar
Contributor

Hi @Vinicius Higa Murakami,

Thanks for your response. It worked after simply running the statement below in the Hive View in Ambari.

add jar hdfs:///tmp/hivexmlserde-1.0.5.3.jar;

Thank you so much.. 🙂
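For Hive CLI sessions outside the Ambari Hive View, the same add jar statement could be kept in an initialization file so it does not have to be typed each time. This is only a sketch under assumptions: the helper name write_hive_init and the file name /tmp/xmlserde-init.hql are hypothetical, and it relies on the Hive CLI option -i, which runs an initialization SQL file at startup. It does not change how the Ambari Hive View behaves.

```shell
#!/bin/sh
# Hypothetical helper: write a Hive init file containing a single
# ADD JAR statement for the given jar URI.
write_hive_init() {
    init_file="$1"
    jar_uri="$2"
    printf 'ADD JAR %s;\n' "$jar_uri" > "$init_file"
}

# Example usage (assumes the hive CLI is installed and the jar is at this HDFS path):
#   write_hive_init /tmp/xmlserde-init.hql hdfs:///tmp/hivexmlserde-1.0.5.3.jar
#   hive -i /tmp/xmlserde-init.hql
```

Every CLI session started with that init file registers the jar before the first query runs, which mirrors what the manual statement does in a single session.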