xml parsing

New Contributor

I need some guidance on parsing weather data in XML (attached: example.xml). I have tried the Databricks spark-xml package without success. I am using Spark 2.3.

Re: xml parsing

New Contributor

It might help to show the content inline, since that link doesn't render well in a browser:

<?xml version="1.0" encoding="UTF-8"?>
<wmo-bulletin category-subcode="37" header-time="261500" category-code="SA" region="IR" originator="OIII" wmo-header="SAIR37 OIII 261500" leads-receipt-time="2018-11-26T15:04:24Z"><![CDATA[SAIR37 OIII 261500
METAR OIBL 261500Z 00000KT 9999 SCT030 22/20 Q1016=
METAR OIBQ 261500Z 29016KT 9999 SCT020 20/16 Q1019=
METAR OIIK 261500Z 03002KT 9999 FEW040 BKN090 07/01 Q1017=
METAR OIMC 261500Z AUTO 08004KT //// // ////// 06/05 Q1018=
METAR OIMD 261500Z NIL=
METAR OIMQ 261500Z AUTO 24004KT //// // ////// 09/09 Q1015=
METAR OINE 261500Z 00000KT 4000 -RA BR BKN015 OVC080 08/08 Q1019=
METAR OITK 261500Z NIL=
METAR OITM 261500Z 30002KT 9999 FEW037 05/02 Q1019=
]]></wmo-bulletin>
<?xml version="1.0" encoding="UTF-8"?>
<wmo-bulletin afos-header="LSRLOT" afos-category="LSR" category-subcode="53" header-time="261503" category-code="NW" region="US" originator="KLOT" wmo-header="NWUS53 KLOT 261503" afos-designator="LOT" leads-receipt-time="2018-11-26T15:04:19Z"><![CDATA[NWUS53 KLOT 261503
LSRLOT
PRELIMINARY LOCAL STORM REPORT
NATIONAL WEATHER SERVICE CHICAGO IL
903 AM CST MON NOV 26 2018
..TIME... ...EVENT... ...CITY LOCATION... ...LAT.LON...
..DATE... ....MAG.... ..COUNTY LOCATION..ST.. ...SOURCE....
..REMARKS..
0700 AM HEAVY SNOW 1 WSW HARVARD 42.42N 88.63W
11/26/2018 M9.0 INCH MCHENRY IL CO-OP OBSERVER
&&
]]></wmo-bulletin>
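The first bulletin's CDATA payload is plain text rather than XML, so even after the XML itself is parsed, the individual reports still have to be split out of one text value. A minimal sketch of that step, assuming every report follows the `METAR <station> <ddhhmmZ> ... =` shape seen above. `parse_metars` is a hypothetical helper; it only splits reports and does not decode wind, visibility, or cloud groups. A body of `NIL` is treated here as a missing report.

```python
import re

# Assumed report shape: "METAR <4-char station> <ddhhmmZ> <groups...>=".
REPORT = re.compile(
    r"METAR\s+(?P<station>[A-Z0-9]{4})\s+(?P<time>\d{6}Z)\s+(?P<body>[^=]*)="
)


def parse_metars(payload: str) -> list[dict]:
    """Hypothetical helper: split a METAR bulletin's text into one dict per report."""
    reports = []
    for m in REPORT.finditer(payload):  # the bulletin header line has no "METAR", so it is skipped
        body = m.group("body").strip()
        reports.append({
            "station": m.group("station"),
            "time": m.group("time"),
            "missing": body == "NIL",  # "NIL" = no observation received
            "body": body,              # raw, undecoded report groups
        })
    return reports
```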

Re: xml parsing

New Contributor

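Note that there are two big CDATA blocks, not just one: the sample is really two complete XML documents concatenated, each with its own <?xml ...?> declaration and <wmo-bulletin> root element.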
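Because the file concatenates documents, a standard XML parser will typically reject it once it reaches the second `<?xml ...?>` declaration. A minimal sketch of one workaround, assuming the declaration never appears inside a CDATA block: split the file on the declarations with a regular expression, then parse each chunk on its own with Python's `xml.etree.ElementTree`, which returns CDATA content as ordinary element text. `parse_bulletins` is a hypothetical name.

```python
import re
import xml.etree.ElementTree as ET

# Matches an XML declaration such as <?xml version="1.0" encoding="UTF-8"?>
DECL = re.compile(r"<\?xml[^>]*\?>")


def parse_bulletins(raw: str) -> list[dict]:
    """Hypothetical helper: parse a file of concatenated <wmo-bulletin> documents."""
    bulletins = []
    # re.split drops the declarations; the empty chunk before the first one is skipped.
    for chunk in DECL.split(raw):
        chunk = chunk.strip()
        if not chunk:
            continue
        root = ET.fromstring(chunk)          # one <wmo-bulletin> root per document
        record = dict(root.attrib)           # header attributes, e.g. "wmo-header"
        record["text"] = (root.text or "").strip()  # the CDATA payload
        bulletins.append(record)
    return bulletins
```

Each resulting dict could then be loaded as one row (for example into a Spark DataFrame) with the CDATA text kept in a single column for further parsing.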

Re: xml parsing

New Contributor

You could try this NiFi Groovy processor, which converts XML files to CSV or Avro:

https://github.com/maxbback/nifi-xml
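For comparison outside NiFi, here is a stand-alone sketch of the same XML-to-CSV idea using only Python's standard library. It is not the linked processor and makes its own assumptions: each input string is one complete `<wmo-bulletin>` document (for example, one chunk after splitting the file on its `<?xml ...?>` declarations), each bulletin becomes one row, and the hypothetical `bulletins_to_csv` function uses the union of root attributes as columns plus a `text` column for the CDATA payload.

```python
import csv
import io
import xml.etree.ElementTree as ET


def bulletins_to_csv(documents: list[str]) -> str:
    """Hypothetical helper: flatten <wmo-bulletin> documents into CSV, one row each."""
    rows = []
    for doc in documents:
        root = ET.fromstring(doc)
        row = dict(root.attrib)                  # header attributes become columns
        row["text"] = (root.text or "").strip()  # CDATA payload in one column
        rows.append(row)

    # Attribute sets differ between bulletins (only the US one carries afos-*),
    # so build a sorted header from every attribute seen; gaps are left empty.
    attr_names = sorted({k for row in rows for k in row if k != "text"})
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=attr_names + ["text"], restval="")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

The `csv` module quotes the multi-line `text` field automatically, so the output reads back cleanly with `csv.DictReader`.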