Member since: 11-14-2018 · 9 Posts · 0 Kudos Received · 0 Solutions
05-31-2019
10:32 AM
It's a problem with permissions: you need to let Spark know about the local directory. The following code then works:

import time

from pyspark.sql import SparkSession


def xmlConvert(spark):
    etl_time = time.time()
    df = spark.read.format('com.databricks.spark.xml') \
        .options(rowTag='HistoricalTextData') \
        .load('/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/dataset/train/')
    df = df.withColumn("TimeStamp", df["TimeStamp"].cast("timestamp")) \
        .groupBy("TimeStamp").pivot("TagName").sum("TagValue").na.fill(0)
    df.repartition(1).write.csv(
        path="/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance/result/",
        mode="overwrite", header=True, sep=",")
    print("Time taken to do xml transformation: --- %s seconds ---" % (time.time() - etl_time))


if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName('XML ETL') \
        .master("local[*]") \
        .config('job.local.dir', '/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance') \
        .config('spark.driver.memory', '64g') \
        .config('spark.debug.maxToStringFields', '200') \
        .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
        .getOrCreate()
    print('Session created')
    try:
        xmlConvert(spark)
    finally:
        spark.stop()
05-31-2019
09:29 AM
And I found a solution by pointing job.local.dir to the directory with the code:

spark = SparkSession \
    .builder \
    .appName('XML ETL') \
    .master("local[*]") \
    .config('job.local.dir', 'file:/home/zangetsu/proj/prometheus-core/demo/demo-1-iot-predictive-maintainance') \
    .config('spark.jars.packages', 'com.databricks:spark-xml_2.11:0.5.0') \
    .getOrCreate()

Now everything works.
02-03-2019
04:37 PM
Hi, I am not an expert at administering Cloudera, but starting from an existing Express docker image I found, I upgraded to the latest 5.*. I created this docker image as a kind of base for new project development: https://cloud.docker.com/u/archenroot/repository/docker/archenroot/cloudera-cdap-jdk8 What would be the right steps to upgrade this image to 6.*? @maziyar - if you are willing to help, I can add you as a collaborator on Docker Hub...
11-16-2018
07:25 AM
2 Kudos
This was an issue with that consumer group in __consumer_offsets, and these were the steps we took to fix it.

1) On a single broker, generate a DumpLogSegments command for every __consumer_offsets log segment:

find /kafka/data -name "*.log" | grep -i consumer | awk '{a=$1; b="kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files "a; print b}'

2) Run each generated command on every broker to see which log file contains the consumer group "prod-abc-events", for example:

kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log | grep -i 'prod-abc-events'

Do the steps above on all the brokers and make a list of all the files that reference 'prod-abc-events'. In our instance we found three files that referenced this group:

broker1: /kafka/data/sda/__consumer_offsets-24/00000000000000000000.log
broker2: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log
broker3: /kafka/data/sdc/__consumer_offsets-24/00000000000000000000.log

We noticed that the .log file on broker1 differed in size and content from the other two. We backed up the file from broker1 and then replaced it with the one from broker2, and that resolved the issue. Most likely this happened when we ran kafka-reassign-partitions, the drives reached 99% full, and something broke in __consumer_offsets.
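Steps 1 and 2 above can be wrapped in a small helper so the same scan is easy to repeat on each broker. This is only a sketch based on the commands in this post; the data directory and group name are arguments, and it just prints the commands to run rather than executing them:

```shell
# scan_offsets: for every __consumer_offsets .log segment under a Kafka
# data directory, print the DumpLogSegments | grep command to run for a
# given consumer group. Copy this to each broker and run the output.
scan_offsets() {
  data_dir="$1"   # e.g. /kafka/data
  group="$2"      # e.g. prod-abc-events
  find "$data_dir" -type f -name '*.log' | grep -i consumer | while read -r f; do
    echo "kafka-run-class kafka.tools.DumpLogSegments --deep-iteration --print-data-log -files $f | grep -i '$group'"
  done
}

# Example (prints one command per segment found):
# scan_offsets /kafka/data prod-abc-events
```

Any segment whose printed command produces grep output references the group; collect those paths per broker before comparing file sizes and contents.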