Member since: 04-08-2016
Posts: 4
Kudos Received: 0
Solutions: 0
04-25-2016 10:38 PM
Hi Benjamin, I tested this both locally and in the Hortonworks sandbox. In both places I get the expected behavior: the partition count is based on the split size. I think it is something related to WASB. Thanks
04-25-2016 05:01 PM
I am seeing strange behavior. I have a 1 GB file stored in Azure WASB. When I create an RDD using the statement below, it creates only two partitions. I was under the impression that the partition count should be based on the HDFS block size, which is 128 MB in our environment.

val fileRDD = sc.textFile("/user/aahmed/file.csv")

It seems to create one partition per 500 MB. I tried it with one large file (28 GB) and got 56 partitions. It is supposed to be based on the HDFS block size, not on 500 MB.
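For reference, sc.textFile delegates split calculation to Hadoop's FileInputFormat, which uses the block size the filesystem reports, and the WASB driver reports its own block size (fs.azure.block.size, which defaults to 512 MB) rather than your cluster's HDFS 128 MB setting. A minimal sketch of that split formula, with the sizes from this post (the formula is from FileInputFormat; the numbers are just this post's examples):

```scala
// FileInputFormat computes: splitSize = max(minSize, min(maxSize, blockSize)),
// then roughly ceil(fileSize / splitSize) splits, i.e. one partition per split.
def splitSize(blockSize: Long, minSize: Long = 1L, maxSize: Long = Long.MaxValue): Long =
  math.max(minSize, math.min(maxSize, blockSize))

def numSplits(fileSize: Long, blockSize: Long): Long = {
  val split = splitSize(blockSize)
  (fileSize + split - 1) / split // ceiling division
}

val wasbBlock = 512L * 1024 * 1024 // fs.azure.block.size default

println(numSplits(1L << 30, wasbBlock))  // 1 GB file  -> 2 partitions
println(numSplits(28L << 30, wasbBlock)) // 28 GB file -> 56 partitions
```

To get more parallelism without changing the block size, you can pass the minPartitions argument, e.g. sc.textFile("/user/aahmed/file.csv", 8).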
Labels:
Apache Spark
04-08-2016 03:24 PM
When I submit a Spark job using the command below, it only prints these messages on the console. I want to see org.apache.spark INFO-level messages. How and where can I configure this?

spark-submit --num-executors 10 --executor-cores 5 --executor-memory 2G --master yarn-cluster --conf spark.driver.userClassPathFirst=true --conf spark.executor.userClassPathFirst=true --class com.example.SparkJob target/scala-2.10/spark-poc-assembly-0.1.jar 10.0.201.6 hdfs:///user/aahmed/example.csv

16/04/08 15:09:50 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
16/04/08 15:09:51 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
16/04/08 15:09:52 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
16/04/08 15:09:53 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
16/04/08 15:09:54 INFO Client: Application report for application_1460098549233_0013 (state: RUNNING)
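One thing worth noting: in yarn-cluster mode the driver runs inside a YARN container, so the console only shows the client's polling loop; the application's own org.apache.spark logs land in the YARN container logs (retrievable with yarn logs -applicationId &lt;appId&gt;). A common way to control the level is to ship a custom log4j.properties, a sketch assuming the stock Spark 1.x log4j setup:

```properties
# log4j.properties -- shipped to driver and executors
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# level for Spark's own classes
log4j.logger.org.apache.spark=INFO
```

Then point the JVMs at it via spark-submit, e.g. --files log4j.properties --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties".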
Labels:
Apache Spark