I am making some Boto3 calls in my PySpark application. It works fine with master=local, but when I switch to master=yarn I get "NoCredentialsError: Unable to locate credentials", and I cannot work out why. I have been running this application fine on Mesos and EMR. Does HDP PySpark2 run within a virtualenv or something?
Where do you store the credentials for boto? (As far as I understand, boto3 by default looks in the user's home directory.)
Since it works with master=local, I guess you've added the credentials for your own user in ~/.aws/credentials.
With master=yarn the Python code is executed on multiple machines in parallel, so the credentials need to be available on each machine and readable by the user that runs the Spark job. Depending on your cluster setup, the jobs started by YARN may run under a different user than the one you submit with.
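If the credentials are exported as environment variables on the machine you submit from (an assumption; this does not help if they only live in a credentials file), one possible way to make them visible in the YARN containers is to forward them through Spark's `spark.executorEnv.*` and `spark.yarn.appMasterEnv.*` settings. The helper name `executor_env_conf` below is made up for illustration; treat this as a rough sketch, not a recommended way to handle secrets.

```python
import os

# Environment variable names that boto3 checks for credentials.
AWS_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def executor_env_conf(environ=None):
    """Build Spark conf entries that copy AWS variables into YARN containers.

    Hypothetical helper: only variables that are actually set are forwarded.
    """
    environ = os.environ if environ is None else environ
    conf = {}
    for name in AWS_ENV_VARS:
        value = environ.get(name)
        if value:
            # Environment of the executor processes.
            conf["spark.executorEnv." + name] = value
            # Environment of the YARN application master (cluster deploy mode).
            conf["spark.yarn.appMasterEnv." + name] = value
    return conf


# Example with made-up placeholder values instead of the real environment:
example = executor_env_conf({"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE",
                             "AWS_SECRET_ACCESS_KEY": "example-secret"})
print(sorted(example))
```

Each resulting key/value pair could then be passed to `SparkSession.builder.config(key, value)` before the session is created.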
So I would check in the YARN ResourceManager UI (accessible via the Ambari quick link) which user the job runs under, and then add the credentials on each machine in ~/.aws/credentials of that user.
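To double-check this from inside the job, something like the following could report which user the Python workers run as and whether that user has a credentials file. The function name `credentials_report` is made up for this sketch, and the commented Spark call assumes you already have a SparkContext named `sc`.

```python
import getpass
import os
import socket


def credentials_report(_=None):
    """Return (host, user, credentials path, file exists) for this process."""
    try:
        user = getpass.getuser()
    except Exception:
        # Some containers have no passwd entry for the current uid.
        user = "unknown"
    # "~" expands to the home directory of the user running this process.
    path = os.path.join(os.path.expanduser("~"), ".aws", "credentials")
    return (socket.gethostname(), user, path, os.path.isfile(path))


# Run locally on the driver:
print(credentials_report())

# On the cluster (assumes an existing SparkContext named `sc`):
# reports = sc.parallelize(range(100), 20).map(credentials_report).distinct().collect()
# for report in reports:
#     print(report)
```

If the user or the "file exists" flag differs between the driver and the executors, that would be consistent with the explanation above.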