Support Questions
Find answers, ask questions, and share your expertise
Announcements
Alert: Welcome to the Unified Cloudera Community. Former HCC members be sure to read and learn how to activate your account here.

Null Pointer exception when loading XML to Hive

Highlighted

Null Pointer exception when loading XML to Hive

Contributor

Hi all

I'm currently facing some issues when loading an xml file into Hive using hive-serde. I followed the tips here but i'm still receiving a null pointer exception when trying to read the data that had been loading into hive via an xml file.

Here is the xml with some dummy values

<?xml version="1.0"?><History-Group-Comm-CommB-DT-RBB-Work xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<item id="HISTORY_6_GMT">
<pxAddedByID>HUBINT</pxAddedByID>
<pxAddedBySystem>CD</pxAddedBySystem>
<pxHistoryForReference>GR</pxHistoryForReference>
<pxInsName>GMT</pxInsName>
<pxObjClass>Work</pxObjClass>
<pxTimeCreated>2017-02-13T13:08:28.776Z</pxTimeCreated>
<pyFlowKey>RULE-OBJ</pyFlowKey>
<pyFlowName>pyStartCase</pyFlowName>
<pyFlowType>pyStartCase</pyFlowType>
<pyMessageKey>ItemCreated</pyMessageKey>
<pyPerformer>HUB</pyPerformer>
<pzInsKey>776 GMT</pzInsKey>
</item>


Here is the sql loading the xml

add jar hdfs://DEVHDPVM01HA:8020/HADOOP/DASD_ACQ/common/lib/hivexmlserde-1.0.5.3.jar;


DROP TABLE IF EXISTS test.test_tbl_stg;
CREATE EXTERNAL TABLE test.test_tbl_stg  (
        ADDED_BY_ID STRING COMMENT 'pxAddedByID',
        ADDED_BY_SYSTEM STRING COMMENT 'pxAddedBySystem',
        HISTORY_FOR_REFERENCE STRING COMMENT '',
        INSERT_NAME STRING COMMENT '',
        OBJECT_CLASS STRING COMMENT '',
        TIME_CREATED STRING COMMENT '',
        FLOW_KEY STRING COMMENT '',
        FLOW_NAME STRING COMMENT '',
        FLOW_TYPE FLOAT COMMENT '',
        MESSAGE STRING COMMENT '',
        PERFORMER STRING COMMENT '',
        INSERT_KEY STRING COMMENT '' ) COMMENT 'Optional Table Comment'
        ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
        WITH SERDEPROPERTIES (
        "column.xpath.ADDED_BY_ID"="/item/pxAddedByID/text()",
        "column.xpath.ADDED_BY_SYSTEM"="/item/pxAddedBySystem/text()",
        "column.xpath.HISTORY_FOR_REFERENCE"="/item/pxHistoryForReference/text()",
        "column.xpath.INSERT_NAME"="/item/pxInsName/text()",
        "column.xpath.OBJECT_CLASS"="/item/pxObjClass/text()",
        "column.xpath.TIME_CREATED"="/item/pxTimeCreated/text()",
        "column.xpath.FLOW_KEY"="/item/pyFlowKey/text()",
        "column.xpath.FLOW_NAME"="/item/pyFlowName/text()",
        "column.xpath.FLOW_TYPE"="/item/pyFlowType/text()",
        "column.xpath.MESSAGE"="/item/pyMessageKey/text()",
        "column.xpath.PERFORMER"="/item/pyPerformer/text()",
        "column.xpath.INSERT_KEY"="/item/pzInsKey/text()")
        STORED AS
        INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
        LOCATION '${stagingFolderPath}'
        TBLPROPERTIES ("xmlinput.start"="<item id=","xmlinput.end"="</item>");

Any suggestions is greatly appreciated

Don't have an account?
Coming from Hortonworks? Activate your account here