Created 09-24-2016 10:50 PM
Hello,
I ran a small script on Spark (installed alongside the HDP suite through Ambari). The script ran for more than 5 hours and eventually failed with "IOError: [Errno 28] No space left on device".
Whenever I start the pyspark shell or run a script with spark-submit, a couple of warnings are printed that I suspect may point to the cause of the problem: it looks as if Spark may be using resources only from the host where I run the client and ignoring the rest of the cluster.
16/09/23 23:26:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/09/23 23:26:52 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
Thanks in advance,
Created 10-02-2016 03:48 AM
Those warnings have nothing to do with your issue, but it is good to fix them anyway.
1. If you don't want your job to use all the resources of the cluster, define a separate YARN queue for Spark jobs and submit the job to that queue (I assume you already submit Spark jobs via YARN). Your job will still max out the resources of that Spark queue, but the resources not assigned to the queue can still be used by others. Your problem remains, but other users can keep executing their jobs. See the sketch after this list for an example of submitting to a dedicated queue.
2. Look at your job and determine why it is using so many resources; redesign it, tune it, break it into smaller pieces, etc. If the job is already well tuned, then your cluster simply does not have enough resources. Check resource use during the execution of the job to determine the bottleneck (RAM, CPU, disk, etc.).
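As a minimal sketch, submitting to a dedicated queue could look like the following. The queue name "spark", the script name my_script.py, and the resource numbers are only placeholders; pick values that match your cluster and the queue you define in the YARN capacity scheduler.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --queue spark \
  --num-executors 4 \
  --executor-memory 4G \
  --executor-cores 2 \
  my_script.py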
Created 09-25-2016 06:31 PM
How are you launching the spark job?
FYI:
The warnings are just letting you know that the native Hadoop libraries can't be found because the environment variables haven't been set up. You can fix that by checking your environment settings and correcting them.
I'm pretty sure something like the following would remove the warnings.
export LD_LIBRARY_PATH=/usr/local/hadoop/lib/native/:$LD_LIBRARY_PATH
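To check whether the native library is actually being picked up afterwards, something like the sketch below could be used. The path above is just an example; on an HDP install the native libraries typically live under /usr/hdp/current/hadoop-client/lib/native, so adjust the path to wherever libhadoop.so actually is on your node.
# assumes an HDP-style layout; change the path to the real location of libhadoop.so
export LD_LIBRARY_PATH=/usr/hdp/current/hadoop-client/lib/native:$LD_LIBRARY_PATH
# lists the native libraries Hadoop can load; "hadoop: true" means libhadoop was found
hadoop checknative -a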