Member since: 08-16-2017
Posts: 7
Kudos Received: 0
Solutions: 0
12-20-2018 01:30 PM
I need some guidance on parsing weather data in XML. I have tried the Databricks XML package to no avail. A sample file is attached (example.xml). I am using Spark 2.3.
Labels: Apache Spark
09-15-2017 04:24 PM
Perfect, thanks!
09-12-2017 08:18 PM
I am running this query through beeline, and it successfully converts my string value to a timestamp:

select cast(regexp_replace(createdatetime,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})Z','$1-$2-$3 $4:$5:$6.$7') as timestamp) as thetime from L2_view where load_dt='20170908' and createdatetime is not null;

When I run the same query through the Spark sqlContext, I get nulls:

sqlContext.sql("select cast(regexp_replace(createdatetime,'(\\d{4})-(\\d{2})-(\\d{2})T(\\d{2}):(\\d{2}):(\\d{2}).(\\d{3})Z','$1-$2-$3 $4:$5:$6.$7') as timestamp) as thetime from L2_view where load_dt='20170908' and createdatetime is not null").show

+-------+
|thetime|
+-------+
+-------+

Can you explain why this happens? I am running Spark 1.6.
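One possible explanation, not confirmed anywhere in this thread, is that the query text passes through one more layer of backslash processing when it is written as a Scala string literal than when it is typed into beeline. The Python sketch below models that idea with a deliberately simplified unescape step; `sql_unescape`, the shortened pattern, and the sample value are hypothetical stand-ins rather than Spark or Hive code.

```python
import re


def sql_unescape(literal: str) -> str:
    """Simplified model of SQL string-literal unescaping (an assumption):
    a backslash followed by any character yields just that character."""
    return re.sub(r"\\(.)", r"\1", literal)


# Text as typed into beeline: the SQL parser sees two backslashes before each d.
beeline_text = r"(\\d{4})-(\\d{2})"

# Text as typed inside a normal double-quoted string in a host language.
# Python, like Scala, collapses "\\d" to a single backslash plus d first.
host_string_text = "(\\d{4})-(\\d{2})"

regex_via_beeline = sql_unescape(beeline_text)          # (\d{4})-(\d{2})
regex_via_host_string = sql_unescape(host_string_text)  # (d{4})-(d{2})

sample = "2017-09"  # hypothetical shortened value, for illustration only

print(regex_via_beeline, bool(re.fullmatch(regex_via_beeline, sample)))
print(regex_via_host_string, bool(re.fullmatch(regex_via_host_string, sample)))
```

If this is what is happening, the pattern that reaches the regex engine no longer contains `\d`, so nothing is replaced and the cast sees the original string. Doubling the backslashes once more in the Scala string, or using a triple-quoted string, would be the thing to try.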
Labels: Apache Spark
08-18-2017 08:28 PM
I went back and checked my uploaded file, and I can parse it too. I suspect there are hidden characters somewhere in my full file, which is too big to upload. Is there a way to load the file as an RDD, run replaceAll on it, and pass the result to the Databricks parser?
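As a rough way to test the hidden-character theory before involving the parser, the sketch below scans text for control and format characters and reports where they occur. It is plain Python using only the standard library; the function names and the sample string are hypothetical, and the choice of Unicode categories `Cc` and `Cf` as "hidden" is an assumption about what might be in the file.

```python
import unicodedata

# Whitespace that is normal in XML and should not be reported.
ALLOWED = {"\n", "\r", "\t"}


def find_hidden_chars(text):
    """Return (offset, code point) pairs for control/format characters."""
    hits = []
    for offset, ch in enumerate(text):
        if ch in ALLOWED:
            continue
        if unicodedata.category(ch) in ("Cc", "Cf"):
            hits.append((offset, "U+%04X" % ord(ch)))
    return hits


def remove_hidden_chars(text):
    """Drop the same characters that find_hidden_chars reports."""
    return "".join(
        ch for ch in text
        if ch in ALLOWED or unicodedata.category(ch) not in ("Cc", "Cf")
    )


# Hypothetical sample: a byte-order mark, a zero-width space, and a NUL byte.
sample = "\ufeff<tag1>\u200bvalue\x00</tag1>"

print(find_hidden_chars(sample))
print(remove_hidden_chars(sample))
```

The same per-string cleanup could in principle be applied to each record of an RDD with a map step, so the file never has to be rewritten to disk.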
08-17-2017 06:14 PM
Yes, I realize I could write it to disk, but I was trying to avoid that if possible. I am not using Zeppelin.
08-16-2017 07:17 PM
I need to process some XML data in Spark (1.6) using the Databricks XML jar. My problem is that the data source adds "xmlns: /data/path/d" to the root element tag, and this extra attribute prevents the Databricks XML parser from parsing a node. If I remove it and leave a plain tag like <tag1>, the parser works fine. I would like to load the file into an RDD, run replaceAll to strip the attribute, and then pass the RDD to the Databricks XML parser to create a DataFrame. My main question is how to feed an RDD to the Databricks XML jar; I only see examples that load files.
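The replaceAll step described above can be sketched on its own, independent of Spark. The Python below removes namespace declarations with a regular expression and checks the result with the standard-library XML parser. It assumes the declaration has the usual `xmlns="..."` or `xmlns:prefix="..."` form, which the post does not spell out; `strip_xmlns` and the sample document are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches xmlns="..." and xmlns:prefix="..." with single or double quotes.
XMLNS_ATTR = re.compile(r"""\s+xmlns(:\w+)?\s*=\s*("[^"]*"|'[^']*')""")


def strip_xmlns(xml_text: str) -> str:
    """Remove namespace declarations so element tags stay plain."""
    return XMLNS_ATTR.sub("", xml_text)


# Hypothetical sample shaped like the description in the post.
raw = '<tag1 xmlns="/data/path/d"><row><temp>21</temp></row></tag1>'
cleaned = strip_xmlns(raw)

print(cleaned)
print(ET.fromstring(raw).tag)      # tag is qualified by the namespace
print(ET.fromstring(cleaned).tag)  # plain tag after stripping
```

In Spark, the same substitution could be applied to each string of an RDD before parsing. Whether the installed version of the Databricks XML package can read from an RDD of strings rather than a path should be checked against that version's documentation; some releases appear to offer an `XmlReader` with an RDD-based entry point, but that is an assumption here.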
Labels: Apache Spark