Member since: 04-08-2016
Posts: 4
Kudos Received: 0
Solutions: 0
04-25-2016 10:38 PM
Hi Benjamin, I tested this both locally and in the Hortonworks sandbox. In both places I get the expected behavior: the partition count is based on the split size. I think it is something related to WASB. Thanks
04-25-2016 05:01 PM
I am seeing strange behavior. I have a 1 GB file stored in Azure WASB. When I create an RDD using the statement below, it creates only two partitions. I was under the impression that the partition count should be based on the HDFS block size, which is 128 MB in our environment.

val fileRDD = sc.textFile("/user/aahmed/file.csv")

It seems to create one partition per 500 MB. I tried it with one large file (28 GB) and got 56 partitions. It is supposed to be based on the HDFS block size, not on 500 MB.
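For reference, sc.textFile delegates split calculation to Hadoop's FileInputFormat, which uses the block size the filesystem reports, and the WASB driver reports its own block size (fs.azure.block.size, which defaults to 512 MB) rather than your cluster's HDFS 128 MB setting. A minimal sketch of that split formula, with the sizes from this post (the formula is from FileInputFormat; the numbers are just this post's examples):

```scala
// FileInputFormat computes: splitSize = max(minSize, min(maxSize, blockSize)),
// then roughly ceil(fileSize / splitSize) splits, i.e. one partition per split.
def splitSize(blockSize: Long, minSize: Long = 1L, maxSize: Long = Long.MaxValue): Long =
  math.max(minSize, math.min(maxSize, blockSize))

def numSplits(fileSize: Long, blockSize: Long): Long = {
  val split = splitSize(blockSize)
  (fileSize + split - 1) / split // ceiling division
}

val wasbBlock = 512L * 1024 * 1024 // fs.azure.block.size default

println(numSplits(1L << 30, wasbBlock))  // 1 GB file  -> 2 partitions
println(numSplits(28L << 30, wasbBlock)) // 28 GB file -> 56 partitions
```

To get more parallelism without changing the block size, you can pass the minPartitions argument, e.g. sc.textFile("/user/aahmed/file.csv", 8).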
Labels:
Apache Spark
04-08-2016 03:24 PM
When I submit a Spark job using the command below, it only prints these messages on the console. I want to see org.apache.spark INFO-level messages. How and where can I configure this?

spark-submit --num-executors 10 --executor-cores 5 --executor-memory 2G --master yarn-cluster --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true --class com.example.SparkJob target/scala-2.10/spark-poc-assembly-0.1.jar 10.0.201.6 hdfs:///user/aahmed/example.csv

16/04/08 15:09:50 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
16/04/08 15:09:51 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
16/04/08 15:09:52 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
16/04/08 15:09:53 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
16/04/08 15:09:54 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
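One thing worth noting: in yarn-cluster mode the driver runs inside a YARN container, so the console only shows the client's polling loop; the application's own org.apache.spark logs land in the YARN container logs (retrievable with yarn logs -applicationId &lt;appId&gt;). A common way to control the level is to ship a custom log4j.properties, a sketch assuming the stock Spark 1.x log4j setup:

```properties
# log4j.properties -- shipped to driver and executors
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# level for Spark's own classes
log4j.logger.org.apache.spark=INFO
```

Then point the JVMs at it via spark-submit, e.g. --files log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties".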
Labels:
Apache Spark