Member since: 11-14-2018
Posts: 9
Kudos Received: 0
Solutions: 0
05-31-2019
10:32 AM
It's a problem with permissions: you need to let Spark know about the local dir. The following code then works:

import time
from pyspark.sql import SparkSession


def xmlConvert(spark):
    etl_time = time.time()
    df = spark.read.format('com.databricks.spark.xml').options(rowTag='HistoricalTextData').load(
        '/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/dataset/train/')
    df = df.withColumn("TimeStamp", df["TimeStamp"].cast("timestamp")).groupBy("TimeStamp").pivot("TagName").sum(
        "TagValue").na.fill(0)
    df.repartition(1).write.csv(
        path="/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/result/",
        mode="overwrite",
        header=True,
        sep=",")
    print("Time taken to do xml transformation: --- %s seconds ---" % (time.time() - etl_time))


if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName('XML ETL') \
        .master("local[*]") \
        .config('job.local.dir', '/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance') \
        .config('spark.driver.memory', '64g') \
        .config('spark.debug.maxToStringFields', '200') \
        .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
        .getOrCreate()

    print('Session created')

    try:
        xmlConvert(spark)
    finally:
        spark.stop()
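As a quick sanity check (a minimal sketch, not part of the original post), the written result can be read back with the same session, before spark.stop() runs; the path is the result directory used above:

# Sketch: read the pivoted CSV back and inspect the schema and a few rows.
result = spark.read.csv(
    '/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/result/',
    header=True, inferSchema=True)
result.printSchema()
result.show(5)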
05-31-2019
09:29 AM
And I found a solution by pointing job.local.dir to the directory with the code:

spark = SparkSession \
    .builder \
    .appName('XML ETL') \
    .master("local[*]") \
    .config('job.local.dir', 'file:/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance') \
    .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
    .getOrCreate()

Now everything works.
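A minimal follow-up sketch (not from the original post): while the session above is still live, the standard spark.conf.get API can confirm both settings were actually applied:

# Sketch: echo the configs back from the running session.
print(spark.conf.get('job.local.dir'))
print(spark.conf.get('spark.jars.packages'))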
05-31-2019
09:24 AM
My simple ETL code:

import time
from pyspark.sql import SparkSession


def xmlConvert(spark):
    etl_time = time.time()
    df = spark.read.format('com.databricks.spark.xml').options(rowTag='HistoricalTextData').load(
        'file:///home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/dataset/data_train')
    df = df.withColumn("TimeStamp", df["TimeStamp"].cast("timestamp")).groupBy("TimeStamp").pivot("TagName").sum(
        "TagValue").na.fill(0)
    df.repartition(1).write.csv(
        path="file:///proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/dataset/",
        mode="overwrite",
        header=True,
        sep=",")
    print("Time taken to do xml transformation: --- %s seconds ---" % (time.time() - etl_time))


if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName('XML ETL') \
        .master("local[*]") \
        .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
        .getOrCreate()

    print('Session created')

    try:
        xmlConvert(spark)
    finally:
        spark.stop()

It is still throwing the issue reported above.
02-03-2019
04:37 PM
Hi, I am not an expert at administering Cloudera, but starting from an existing Express docker image I found, I upgraded to the latest 5.*. I created this docker image as a kind of base for new project development: https://cloud.docker.com/u/archenroot/repository/docker/archenroot/cloudera-cdap-jdk8 What would be the right steps to upgrade this image to 6.*? @maziyar - if you are willing to help, I can add you as a collaborator on dockerhub...
11-14-2018
09:04 PM
I am happy you fixed the issue, but next time you might consider writing some details about how you got out of that trouble, as others might be in the same situation as well 🙂