Community Articles

Find and share helpful community-sourced technical articles.
Announcements
Celebrating as our community reaches 100,000 members! Thank you!
Labels (1)
avatar
Master Mentor
XML:
[hdfs@sandbox ~]$ hdfs dfs -mkdir -p /test/sequences 
[hdfs@sandbox ~]$ 
[hdfs@sandbox ~]$ vi sample.xml 
<sequences period=”5″>
             <moment game-clock=”300.00″ time=”1433732215737″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,96.95182,1.98648,5.75987;9,173004,45.54661,17.35545,0;9,338365,24.04722,25.67399,0;9,457611,46.95292,27.93478,0;9,468895,48.59834,33.96586,0;9,552336,33.73381,24.05929,0;5,214152,59.26872,24.12006,0;5,253997,45.71551,17.41071,0;5,457186,48.59834,33.96586,0;5,531447,78.09629,34.24688,0;5,552806,47.79678,22.8155,0″/> 
             <moment game-clock=”300.00″ time=”1433732215794″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,97.79683,.89407,3.67626;9,173004,45.62283,17.34854,0;9,338365,24.04248,25.6784,0;9,457611,46.84978,27.8463,0;9,468895,48.52017,33.89189,0;9,552336,33.74064,24.03493,0;5,214152,59.27282,24.07895,0;5,253997,45.87101,17.38532,0;5,457186,48.52017,33.89189,0;5,531447,78.06394,34.2155,0;5,552806,47.8269,22.81393,0″/> 
             <moment game-clock=”300.00″ time=”1433732215829″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,97.78946,.91006,3.68332;9,173004,45.61995,17.35703,0;9,338365,24.03815,25.68402,0;9,457611,46.71077,27.71191,0;9,468895,48.37095,33.77756,0;9,552336,33.74769,24.00829,0;5,214152,59.27627,24.06055,0;5,253997,46.00077,17.36555,0;5,457186,48.37095,33.77756,0;5,531447,78.0439,34.20521,0;5,552806,47.84297,22.83292,0″/> 
             <moment game-clock=”300.00″ time=”1433732215856″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,97.73786,1.02206,3.73271;9,173004,45.57851,17.34979,0;9,338365,24.04207,25.61049,0;9,457611,46.63871,27.56226,0;9,468895,48.2033,33.7142,0;9,552336,33.75497,23.97935,0;5,214152,59.27906,24.06485,0;5,253997,46.10481,17.35141,0;5,457186,48.29748,33.63262,0;5,531447,78.03618,34.216,0;5,552806,47.84498,22.87247,0″/> 
             <moment game-clock=”300.00″ time=”1433732215905″ game-event-id=”” shot-clock=”24.00″ locations=”-1,-1,97.59781,1.32606,3.8668;9,173004,45.57865,17.34643,0;9,338365,24.04224,25.61058,0;9,457611,46.56615,27.44014,0;9,468895,48.01722,33.7018,0;9,552336,33.76247,23.94813,0;5,214152,59.27976,24.07223,0;5,253997,46.26668,17.38672,0;5,457186,48.29974,33.45708,0;5,531447,78.02931,34.2208,0;5,552806,47.86752,22.85019,0″/>
</sequences>
[hdfs@sandbox ~]$ 
hdfs dfs -put sample.xml /test/sequences/
[hdfs@sandbox ~]$ 
[hdfs@sandbox ~]$ wget http://search.maven.org/remotecontent?filepath=com/ibm/spss/hive/serde2/xml/hivexmlserde/1.0.5.3/hiv... 
[hdfs@sandbox ~]$ mv remotecontent?filepath=com%2Fibm%2Fspss%2Fhive%2Fserde2%2Fxml%2Fhivexmlserde%2F1.0.5.3%2Fhivexmlserde-1.0.5.3.jar hivexmlserde-1.0.5.3.jar
[
hdfs@sandbox ~]$ pwd 
/home/hdfs 
[hdfs@sandbox ~]$ mv hivexmlserde-1.0.5.3.jar /tmp 
hive 
hive> add jar /tmp/hivexmlserde-1.0.5.3.jar; 
Added [/tmp/hivexmlserde-1.0.5.3.jar] to class path 
Added resources: [/tmp/hivexmlserde-1.0.5.3.jar] 
hive> drop table sequences; 
OK 
Time taken: 3.717 seconds 
CREATE EXTERNAL TABLE sequences(
       gameclock double,
       time bigint,
       gameeventid string,
       shotclock double,
       locations string
     )
     ROW FORMAT SERDE ‘com.ibm.spss.hive.serde2.xml.XmlSerDe’
     WITH SERDEPROPERTIES (
     “column.xpath.gameclock”=”/moment/@game-clock”,
     “column.xpath.time”=”/moment/@time”,
     “column.xpath.gameeventid”=”/moment/@game-event-id”,
     “column.xpath.shotclock”=”/moment/@shot-clock”,
     “column.xpath.locations”=”/moment/@locations”
     )
     STORED AS
     INPUTFORMAT ‘com.ibm.spss.hive.serde2.xml.XmlInputFormat’
     OUTPUTFORMAT ‘org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat’
     LOCATION ‘/test/sequences’
     TBLPROPERTIES (
     “xmlinput.start”=”<moment “,
     “xmlinput.end”=”/”
     )
     ;
OK 
Time taken: 2.871 seconds
hive> 
hive> select * from sequences; 
OK 
300.01433732215737  24.0 -1,-1,96.95182,1.98648,5.75987;9,173004,45.54661,17.35545,0;9,338365,24.04722,25.67399,0;9,457611,46.95292,27.93478,0;9,468895,48.59834,33.96586,0;9,552336,33.73381,24.05929,0;5,214152,59.26872,24.12006,0;5,253997,45.71551,17.41071,0;5,457186,48.59834,33.96586,0;5,531447,78.09629,34.24688,0;5,552806,47.79678,22.8155,0
300.01433732215794  24.0 -1,-1,97.79683,.89407,3.67626;9,173004,45.62283,17.34854,0;9,338365,24.04248,25.6784,0;9,457611,46.84978,27.8463,0;9,468895,48.52017,33.89189,0;9,552336,33.74064,24.03493,0;5,214152,59.27282,24.07895,0;5,253997,45.87101,17.38532,0;5,457186,48.52017,33.89189,0;5,531447,78.06394,34.2155,0;5,552806,47.8269,22.81393,0
300.01433732215829  24.0 -1,-1,97.78946,.91006,3.68332;9,173004,45.61995,17.35703,0;9,338365,24.03815,25.68402,0;9,457611,46.71077,27.71191,0;9,468895,48.37095,33.77756,0;9,552336,33.74769,24.00829,0;5,214152,59.27627,24.06055,0;5,253997,46.00077,17.36555,0;5,457186,48.37095,33.77756,0;5,531447,78.0439,34.20521,0;5,552806,47.84297,22.83292,0
300.01433732215856  24.0 -1,-1,97.73786,1.02206,3.73271;9,173004,45.57851,17.34979,0;9,338365,24.04207,25.61049,0;9,457611,46.63871,27.56226,0;9,468895,48.2033,33.7142,0;9,552336,33.75497,23.97935,0;5,214152,59.27906,24.06485,0;5,253997,46.10481,17.35141,0;5,457186,48.29748,33.63262,0;5,531447,78.03618,34.216,0;5,552806,47.84498,22.87247,0
300.01433732215905  24.0 -1,-1,97.59781,1.32606,3.8668;9,173004,45.57865,17.34643,0;9,338365,24.04224,25.61058,0;9,457611,46.56615,27.44014,0;9,468895,48.01722,33.7018,0;9,552336,33.76247,23.94813,0;5,214152,59.27976,24.07223,0;5,253997,46.26668,17.38672,0;5,457186,48.29974,33.45708,0;5,531447,78.02931,34.2208,0;5,552806,47.86752,22.85019,0 
Time taken: 2.8 seconds, 
Fetched: 5 row(s) 
hive> 
    > select gameclock, time,gameeventid, shotclock, concat(substr(locations,1,10), ‘ …’) as sequence_truncated_for_email_to_anoop from sequences;
OK
300.01433732215737  24.0 -1,-1,96.9 …
300.01433732215794  24.0 -1,-1,97.7 …
300.01433732215829  24.0 -1,-1,97.7 …
300.01433732215856  24.0 -1,-1,97.7 …
300.01433732215905  24.0 -1,-1,97.5 … 
Time taken: 0.898 seconds, Fetched: 5 row(s)
hive> 
31,304 Views
Comments
avatar
New Contributor

I think there is a small typo in the CREATE TABLE command:

Line 22 should be:

"xmlinput.end"="/>"

The "greater than" arrow symbol is missing.

avatar
Contributor

Hi Neeraj, I have tried this approach on a xml which does not have an attritbute value.. but the query returns a null value. can you pls guide

https://community.hortonworks.com/questions/40979/hive-xml-parising-null-value-returned.html.

avatar
Expert Contributor

where to get this jar file com.ibm.spss.hive.serde2.xml.XmlSerDe?

avatar

@vamsi valiveti the jar file is from the XML SerDe created by the community and is available on github: https://github.com/dvasilen/Hive-XML-SerDe

avatar
New Contributor
It helped me a lot. thanks bro:)
avatar
New Contributor

I'm using hive 3.1 and encountered below error. Any help is appreciated.

Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/hive/serde2/SerDe (state=08S01,code=1)

avatar

@torafca5 Did you solve this problem? Having the same issue right now.

Failing with "ClassNotFoundException: org.apache.hadoop.hive.serde2.SerDeException"

Hive 3.1.0

avatar
Cloudera Employee

@torafca5  

Could you please try downloading the jar from the below link,

 

http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar

 

Once the jar is downloaded, move the jar to the location /usr/hdp/3.0.1.0-187/hive/lib.

Please place the jar on all the nodes hosting Hive services.
Also, please make sure you are not using LLAP(HiveserverInteractive) to connect to the hive.
add jar command does not work with LLAP.

 

implementing the above recommendation should help overcome this issue.

Version history
Last update:
‎10-06-2015 01:17 PM
Updated by:
Contributors