XMLSerDe not working when writing data to another table

New Contributor

I have created an XML table with the XML SerDe in Hive (I'm using Hortonworks 2.6.1):

CREATE EXTERNAL TABLE xml_factura(id STRING, version STRING, ambiente STRING, tipoEmision STRING, razonSocial STRING, 
                                  nombreComercial STRING, ruc STRING,claveAcceso STRING,codDoc STRING,estab STRING,
                                  ptoEmi STRING,secuencial STRING,dirMatriz STRING,fechaEmision STRING,dirEstablecimiento STRING,
                                  contribuyenteEspecial STRING, obligadoContabilidad STRING,tipoIdentificacionComprador STRING,
                                  razonSocialComprador STRING,identificacionComprador STRING,totalSinImpuestos STRING,
                                  totalDescuento STRING,totalImpuesto ARRAY <map<STRING,STRING>>,propina STRING,
                                  importeTotal STRING,moneda STRING,detalle ARRAY <map<STRING,STRING>>,
                                  infoAdicional STRUCT<nombre:STRING,campoAdicional:STRING>) 
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe' 
WITH SERDEPROPERTIES ( "column.xpath.id"="/factura/@id", 
                      "column.xpath.version"="/factura/@version", 
                      "column.xpath.ambiente"="/factura/infoTributaria/ambiente/text()", 
                      "column.xpath.tipoEmision"="/factura/infoTributaria/tipoEmision/text()", 
                      "column.xpath.razonSocial"="/factura/infoTributaria/razonSocial/text()", 
                      "column.xpath.nombreComercial"="/factura/infoTributaria/nombreComercial/text()", 
                      "column.xpath.ruc"="/factura/infoTributaria/ruc/text()", 
                      "column.xpath.claveAcceso"="/factura/infoTributaria/claveAcceso/text()", 
                      "column.xpath.codDoc"="/factura/infoTributaria/codDoc/text()", 
                      "column.xpath.estab"="/factura/infoTributaria/estab/text()", 
                      "column.xpath.ptoEmi"="/factura/infoTributaria/ptoEmi/text()", 
                      "column.xpath.secuencial"="/factura/infoTributaria/secuencial/text()", 
                      "column.xpath.dirMatriz"="/factura/infoTributaria/dirMatriz/text()", 
                      "column.xpath.fechaEmision"="/factura/infoFactura/fechaEmision/text()", 
                      "column.xpath.dirEstablecimiento"="/factura/infoFactura/dirEstablecimiento/text()", 
                      "column.xpath.contribuyenteEspecial"="/factura/infoFactura/contribuyenteEspecial/text()", 
                      "column.xpath.obligadoContabilidad"="/factura/infoFactura/obligadoContabilidad/text()",
                      "column.xpath.tipoIdentificacionComprador"="/factura/infoFactura/tipoIdentificacionComprador/text()", 
                      "column.xpath.razonSocialComprador"="/factura/infoFactura/razonSocialComprador/text()", 
                      "column.xpath.identificacionComprador"="/factura/infoFactura/identificacionComprador/text()", 
                      "column.xpath.totalSinImpuestos"="/factura/infoFactura/totalSinImpuestos/text()", 
                      "column.xpath.totalDescuento"="/factura/infoFactura/totalDescuento/text()", 
                      "column.xpath.totalImpuesto"="/factura/infoFactura/totalConImpuestos/totalImpuesto", 
                      "column.xpath.propina"="/factura/infoFactura/propina/text()", 
                      "column.xpath.importeTotal"="/factura/infoFactura/importeTotal/text()", 
                      "column.xpath.moneda"="/factura/infoFactura/moneda/text()", 
                      "column.xpath.detalle"="/factura/detalles/detalle", 
                      "column.xpath.infoAdicional"="/factura/infoAdicional/campoAdicional" ) 
STORED AS INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat' 
LOCATION '/user/mamv141114/xmlfacturas' 
TBLPROPERTIES ( "xmlinput.start"="<factura id", "xmlinput.end"="</factura>" ) 
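
For reference, the XPath mappings above imply input documents shaped roughly like this (a hypothetical minimal fragment with placeholder values, showing only a few of the mapped fields):

```xml
<factura id="comprobante" version="1.0.0">
  <infoTributaria>
    <ambiente>1</ambiente>
    <ruc>9999999999001</ruc>
    <razonSocial>EMPRESA EJEMPLO S.A.</razonSocial>
    <!-- ...claveAcceso, codDoc, estab, ptoEmi, secuencial, dirMatriz... -->
  </infoTributaria>
  <infoFactura>
    <fechaEmision>01/01/2017</fechaEmision>
    <totalConImpuestos>
      <totalImpuesto><codigo>2</codigo><valor>1.20</valor></totalImpuesto>
    </totalConImpuestos>
    <!-- ...remaining infoFactura fields... -->
  </infoFactura>
  <detalles>
    <detalle><descripcion>Item</descripcion><cantidad>1</cantidad></detalle>
  </detalles>
  <infoAdicional>
    <campoAdicional nombre="email">cliente@example.com</campoAdicional>
  </infoAdicional>
</factura>
```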

I have uploaded three XML files and I'm able to query the table (SELECT *, SELECT COUNT(*), etc.). Everything works fine.

Then I created the following ORC table:

CREATE TABLE xml_factura_orc(id STRING, version STRING, ambiente STRING, tipoEmision STRING, razonSocial STRING,
                             nombreComercial STRING, ruc STRING, claveAcceso STRING, codDoc STRING, estab STRING,
                             ptoEmi STRING, secuencial STRING, dirMatriz STRING, fechaEmision STRING, dirEstablecimiento STRING,
                             contribuyenteEspecial STRING, obligadoContabilidad STRING, tipoIdentificacionComprador STRING,
                             razonSocialComprador STRING, identificacionComprador STRING, totalSinImpuestos STRING,
                             totalDescuento STRING, totalImpuesto ARRAY<map<STRING,STRING>>, propina STRING,
                             importeTotal STRING, moneda STRING, detalle ARRAY<map<STRING,STRING>>,
                             infoAdicional STRUCT<nombre:STRING,campoAdicional:STRING>)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

When I try to insert all the data from the XML table into the ORC table with this statement:

INSERT OVERWRITE TABLE facturas.xml_factura_orc 
SELECT id, version, ambiente, tipoEmision, razonSocial, nombreComercial, ruc, claveAcceso, codDoc, estab, ptoEmi, secuencial, dirMatriz, fechaEmision, dirEstablecimiento, contribuyenteEspecial, obligadoContabilidad, tipoIdentificacionComprador, razonSocialComprador, identificacionComprador, totalSinImpuestos, totalDescuento, totalImpuesto, propina, importeTotal, moneda, detalle, infoAdicional FROM facturas.xml_factura;

I get the following error:

Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1506550557761_0005_1_00, diagnostics=[Vertex vertex_1506550557761_0005_1_00 [Map 1] killed/failed due to:INIT_FAILURE, Fail to create InputInitializerManager, org.apache.tez.dag.api.TezReflectionException: Unable to instantiate class with 1 arguments: org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator
    at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:70)
    at org.apache.tez.common.ReflectionUtils.createClazzInstance(ReflectionUtils.java:89)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:151)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$1.run(RootInputInitializerManager.java:148)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.createInitializer(RootInputInitializerManager.java:148)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager.runInputInitializers(RootInputInitializerManager.java:121)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.setupInputInitializerManager(VertexImpl.java:4620)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.access$4400(VertexImpl.java:202)
    at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.handleInitEvent(VertexImpl.java:3436)
    at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3385)
    at org.apache.tez.dag.app.dag.impl.VertexImpl$InitTransition.transition(VertexImpl.java:3366)
    at org.apache.hadoop.yarn.state.StateMachineFactory$MultipleInternalArc.doTransition(StateMachineFactory.java:385)
    at org.apache.hadoop.yarn.state.StateMachineFactory.doTransition(StateMachineFactory.java:302)
    at org.apache.hadoop.yarn.state.StateMachineFactory.access$300(StateMachineFactory.java:46)
    at org.apache.hadoop.yarn.state.StateMachineFactory$InternalStateMachine.doTransition(StateMachineFactory.java:448)
    at org.apache.tez.state.StateMachineTez.doTransition(StateMachineTez.java:57)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:1938)
    at org.apache.tez.dag.app.dag.impl.VertexImpl.handle(VertexImpl.java:201)
    at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2080)
    at org.apache.tez.dag.app.DAGAppMaster$VertexEventDispatcher.handle(DAGAppMaster.java:2066)
    at org.apache.tez.common.AsyncDispatcher.dispatch(AsyncDispatcher.java:184)
    at org.apache.tez.common.AsyncDispatcher$1.run(AsyncDispatcher.java:115)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.tez.common.ReflectionUtils.getNewInstance(ReflectionUtils.java:68)
    ... 25 more
Caused by: java.lang.RuntimeException: Failed to load plan: hdfs://ndatos01.sri.ad:8020/tmp/hive/mamv141114/1762793d-62ac-45d8-ae09-2900aa4851f9/hive_2017-09-28_08-14-12_349_3611498623986481792-11/mamv141114/_tez_scratch_dir/7bed50d7-f0f0-424d-ab95-5d9797635eba/map.xml: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.ibm.spss.hive.serde2.xml.XmlInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:479)
    at org.apache.hadoop.hive.ql.exec.Utilities.getMapWork(Utilities.java:318)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.<init>(HiveSplitGenerator.java:101)
    ... 30 more
Caused by: org.apache.hive.com.esotericsoftware.kryo.KryoException: Unable to find class: com.ibm.spss.hive.serde2.xml.XmlInputFormat
Serialization trace:
inputFileFormatClass (org.apache.hadoop.hive.ql.plan.PartitionDesc)
aliasToPartnInfo (org.apache.hadoop.hive.ql.plan.MapWork)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:656)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:238)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.DefaultSerializers$ClassSerializer.read(DefaultSerializers.java:226)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObjectOrNull(Kryo.java:745)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:113)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:776)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:139)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.MapSerializer.read(MapSerializer.java:17)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:694)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.ObjectField.read(ObjectField.java:106)
    at org.apache.hive.com.esotericsoftware.kryo.serializers.FieldSerializer.read(FieldSerializer.java:507)
    at org.apache.hive.com.esotericsoftware.kryo.Kryo.readObject(Kryo.java:672)
    at org.apache.hadoop.hive.ql.exec.Utilities.deserializeObjectByKryo(Utilities.java:1182)
    at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:1069)
    at org.apache.hadoop.hive.ql.exec.Utilities.deserializePlan(Utilities.java:1083)
    at org.apache.hadoop.hive.ql.exec.Utilities.getBaseWork(Utilities.java:439)
    ... 32 more
Caused by: java.lang.ClassNotFoundException: com.ibm.spss.hive.serde2.xml.XmlInputFormat
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hive.com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:136)
    ... 50 more
]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0 (state=08S01,code=2)

Where do I have to put hivexmlserde.jar so this INSERT works? Has anyone faced the same problem? I have copied the jar into several lib directories with no success, so I think there's somewhere else I need to put it.
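
For context, the `ClassNotFoundException` comes from the Tez task containers, not from HiveServer2, so the jar has to be visible to the distributed job, not just the client. A common approach (sketch only; the HDFS path below is illustrative, not from the original post) is to register the jar in the session before running the INSERT:

```sql
-- Upload hivexmlserde.jar to HDFS first, then register it for this session.
-- ADD JAR ships the jar to Tez/MR task containers for queries in this session.
ADD JAR hdfs:///user/hive/jars/hivexmlserde.jar;

-- The INSERT ... SELECT from the XML table can then run in the same session.
```

Alternatively, the jar can be made available permanently by adding it to `hive.aux.jars.path` (or, on HDP, dropping it into the Hive client's auxlib directory and restarting HiveServer2); the exact directory varies by HDP version.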

Is there another way to read a batch of XML files and insert them into an ORC table or another Hive table?

I'd really appreciate it if anyone can help me.

4 REPLIES

Re: XMLSerDe not working when writing data to another table

New Contributor

Were you able to figure it out? I am facing the same issue. @Miguel Marquez, could you please let me know how you got it to work?

Re: XMLSerDe not working when writing data to another table

New Contributor

Actually, I was waiting for an answer myself. My problem was never solved; I switched to another solution that doesn't involve XML files. Sorry, @Anji Raju.


Re: XMLSerDe not working when writing data to another table

New Contributor

@Shu, any idea?

Re: XMLSerDe not working when writing data to another table

New Contributor

I am facing the same issue. Were you able to resolve this, @Anji Raju?