Member since: 11-14-2018 · 9 Posts · 0 Kudos Received · 0 Solutions
05-31-2019
10:32 AM
It's a problem with permissions: you need to let Spark know about the local directory. The following code then works:

import time

from pyspark.sql import SparkSession


def xmlConvert(spark):
    etl_time = time.time()
    df = spark.read.format('com.databricks.spark.xml') \
        .options(rowTag='HistoricalTextData') \
        .load('/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/dataset/train/')
    df = df.withColumn("TimeStamp", df["TimeStamp"].cast("timestamp")) \
        .groupBy("TimeStamp").pivot("TagName").sum("TagValue").na.fill(0)
    df.repartition(1).write.csv(
        path="/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/result/",
        mode="overwrite", header=True, sep=",")
    print("Time taken to do xml transformation: --- %s seconds ---" % (time.time() - etl_time))


if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName('XML ETL') \
        .master("local[*]") \
        .config('job.local.dir', '/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance') \
        .config('spark.driver.memory', '64g') \
        .config('spark.debug.maxToStringFields', '200') \
        .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
        .getOrCreate()
    print('Session created')
    try:
        xmlConvert(spark)
    finally:
        spark.stop()
05-31-2019
09:29 AM
And I found a solution by pointing job.local.dir to the directory with the code:

spark = SparkSession \
    .builder \
    .appName('XML ETL') \
    .master("local[*]") \
    .config('job.local.dir', 'file:/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance') \
    .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
    .getOrCreate()

Now everything works.
02-03-2019
04:37 PM
Hi, I am not an expert at administering Cloudera, but starting from an existing Express docker image I found, I upgraded to the latest 5.*. I created this docker image as a kind of base for new project development: https://cloud.docker.com/u/archenroot/repository/docker/archenroot/cloudera-cdap-jdk8 What would be the right steps to upgrade this image to 6.*? @maziyar - if you are willing to help, I can add you as a collaborator on Docker Hub...
11-16-2018
07:25 AM
2 Kudos
This was an issue with that consumer group in __consumer_offsets, and these were the steps we took to fix it.

1) On a single broker, generate a DumpLogSegments command for every __consumer_offsets log segment:

find /kafka/data -name "*.log" | grep -i consumer | awk '{a=$1; b="kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files "a; print b}'

2) Run each generated command on every broker to see which log file contains the consumer group "prod-abc-events", for example:

kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log | grep -i 'prod-abc-events'

Do the steps above on all the brokers and make a list of all the files that reference 'prod-abc-events'. In our instance we found three files that referenced this group:

broker1: /kafka/data/sda/__consumer_offsets-24/00000000000000000000.log
broker2: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log
broker3: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log

We noticed that the .log file on broker1 differed in size and content from the other two. We backed up the file from broker1 and then replaced it with the one from broker2, and that resolved the issue. Most likely this happened when we ran kafka-reassign-partitions, the drives reached 99% full, and something broke in __consumer_offsets.
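Steps 1 and 2 above can be wrapped in a small helper so the same scan is easy to repeat on each broker. This is only a sketch based on the commands in this post; the data directory and group name are arguments, and it just prints the commands to run rather than executing them:

```shell
# scan_offsets: for every __consumer_offsets .log segment under a Kafka
# data directory, print the DumpLogSegments | grep command to run for a
# given consumer group. Copy this to each broker and run the output.
scan_offsets() {
  data_dir="$1"   # e.g. /kafka/data
  group="$2"      # e.g. prod-abc-events
  find "$data_dir" -type f -name '*.log' | grep -i consumer | while read -r f; do
    echo "kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files $f | grep -i '$group'"
  done
}

# Example (prints one command per segment found):
# scan_offsets /kafka/data prod-abc-events
```

Any segment whose printed command produces grep output references the group; collect those paths per broker before comparing file sizes and contents.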