
Unable to run spark-shell command with k8s as master

Contributor

Hello all - I'm trying to run the spark-shell command below from the bin directory of my extracted Spark 3.4.3 installation. I specified my Kubernetes cluster as the master because I'd like the executors to run on k8s.

I created a service account with all necessary permissions.

# kubectl auth can-i create pod --as=system:serviceaccount:my-namespace:spark-sa -n my-namespace                                                         

yes
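(The spark commands below target the dynx-center-resources namespace, so the equivalent check for that namespace would be:)

# kubectl auth can-i create pod --as=system:serviceaccount:dynx-center-resources:spark-sa -n dynx-center-resources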

Command:

# export K8S_TOKEN=$(kubectl get secrets -o jsonpath="{.items[?(@.metadata.annotations['kubernetes\.io/service-account\.name']=='spark-sa')].data.token}"|base64 --decode)

# ./bin/spark-shell \
--master k8s://https://my-k8s-cluster:6443 \
--deploy-mode client \
--name spark-shell-poc \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=my-docker-hub/spark_poc:v1.4 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.namespace=dynx-center-resources \
--conf spark.driver.pod.name=dynx-spark-driver \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--conf spark.kubernetes.authenticate.submission.oauthToken=$K8S_TOKEN
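(Note: on Kubernetes 1.24 and later, long-lived token Secrets are no longer created automatically for service accounts, so the jsonpath lookup above may return an empty token. In that case a short-lived token can be requested directly; the namespace is taken from the commands above:)

# export K8S_TOKEN=$(kubectl create token spark-sa -n dynx-center-resources)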

Even though I specified the service account to use and its token, Spark always ends up using the 'system:anonymous' user to create pods in my k8s environment, and because of that I get the error below (a snippet from a much larger stack trace).

25/02/06 14:36:32 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying snapshot subscriber.
io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://my-k8s-cluster:6443/api/v1/namespaces/dynx-center-resources/pods. Message: pods is forbidden: User "system:anonymous" cannot create resource "pods" in API group "" in the namespace "dynx-center-resources". Received status: Status(apiVersion=v1, code=403, details=StatusDetails(causes=[], group=null, kind=pods, name=null, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=pods is forbidden: User "system:anonymous" cannot create resource "pods" in API group "" in the namespace "dynx-center-resources", metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Forbidden, status=Failure, additionalProperties={}).
at io.fabric8.kubernetes.client.KubernetesClientException.copyAsCause(KubernetesClientException.java:238)
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.waitForResult(OperationSupport.java:538)
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleResponse(OperationSupport.java:558)
at io.fabric8.kubernetes.client.dsl.internal.OperationSupport.handleCreate(OperationSupport.java:349)
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:711)
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.handleCreate(BaseOperation.java:93)
at io.fabric8.kubernetes.client.dsl.internal.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:42)
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:1113)
at io.fabric8.kubernetes.client.dsl.internal.BaseOperation.create(BaseOperation.java:93)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:440)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:417)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36(ExecutorPodsAllocator.scala:370)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$36$adapted(ExecutorPodsAllocator.scala:363)
at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:363)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:134)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:134)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:143)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:131)
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.$anonfun$addSubscriber$1(ExecutorPodsSnapshotsStoreImpl.scala:85)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
at java.base/java.lang.Thread.run(Thread.java:840)

As part of troubleshooting, I ran the curl command below using the same service account token and got a valid response.

curl -X GET https://my-k8s-cluster:6443/api --header "Authorization: Bearer eyJhbGciOiJSUzI1NiIsImtpZCI6IjdWOXgwTjdIeUdCTGx2eEItOXZ3eDlSV1I1UXd1d0MtTXJENFBhXzNDTTgifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJkeW54LWNlbnRlci1yZXNvdXJjZXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlY3JldC5uYW1lIjoic3Bhcmstc2EiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoic3Bhcmstc2EiLCJrdWJlcm5ldGVzxxxxxxxxxxxxxxxxxxxxxmMS03NzI5LTQ5OTAtYWZkOC1mYjZiNzU4ZDg5YzAiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6ZHlueC1jZW50ZXItcmVzb3VyY2VzOnNwYXJrLXNhIn0.TWAQYmu_N-N1gnZ1hYYn_wvavs9f9w33v0P0Kgchd1eETO8TpHlYS_JSt8jzWlX6C4JF293Q8VRk8p1Nx3zRdqjZnYWmMvJYCaq5mBAyvXAW8fXW_ZtQD7HJPUEUb2ZDXUz3b2XLgvJoWui8vhqZBYUev67YgHHRspgkwDbLrRIB1oRPbx_2osYMQW3tPxoThyzUqdvyBij3hjW-syrsp_sR1ir-78XzIZpkV2OBFds7u8vd0IqoWLOtmnZwdq1RKCKtFk292VfWSbN0HYJUs_aJUeaqLpekopZLfDM2U_GT0ImwBUOL2EILpb-K1xdWr4-Jv4qPsFBLFh31S2OMAg" --insecure
{
 "kind": "APIVersions",
 "versions": [
   "v1"
 ],
 "serverAddressByClientCIDRs": [
   {
     "clientCIDR": "0.0.0.0/0",
     "serverAddress": "10.14.3.19:6443"
   }
 ]
}
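Since /api is often readable even by anonymous users, a stronger check is to hit the same pods endpoint the driver is failing on (path taken from the error message above), using the same token:

# curl -X GET https://my-k8s-cluster:6443/api/v1/namespaces/dynx-center-resources/pods --header "Authorization: Bearer $K8S_TOKEN" --insecure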

However, if I run the spark-submit command below in cluster deploy mode, it runs without any issue and produces the desired output.

# ./bin/spark-submit \
--master k8s://https://my-k8s-cluster:6443 \
--deploy-mode cluster \
--name spark-poc \
--conf spark.executor.instances=2 \
--conf spark.kubernetes.container.image=my-docker-hub/spark_poc:v1.4 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.namespace=dynx-center-resources \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--conf spark.kubernetes.authenticate.submission.oauthToken=$K8S_TOKEN \
--class org.apache.spark.examples.SparkPi \
local:///opt/spark/examples/jars/spark-examples_2.12-3.4.3.jar 1000

Not sure what I'm missing. Appreciate any help on this.

 

1 REPLY

Master Mentor

@spserd 
Looking at your issue with Spark on Kubernetes, the difference between the client and cluster deploy modes is what's causing the "system:anonymous" authentication problem. When you run spark-shell in client mode, Spark tries to create the executor pods as "system:anonymous" instead of using your service account "spark-sa", even though you provided the token.

Possible Solution
For client mode, you need to add a specific configuration to tell Spark which service account to use when creating executor pods:

--conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa

So your updated command should look like this:

./bin/spark-shell \
--master k8s://https://my-k8s-cluster:6443 \
--deploy-mode client \
--name spark-shell-poc \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.container.image=my-docker-hub/spark_poc:v1.4 \
--conf spark.kubernetes.container.image.pullPolicy=IfNotPresent \
--conf spark.kubernetes.namespace=dynx-center-resources \
--conf spark.driver.pod.name=dynx-spark-driver \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-sa \
--conf spark.kubernetes.authenticate.executor.serviceAccountName=spark-sa \
--conf spark.kubernetes.authenticate.submission.oauthToken=$K8S_TOKEN

The key is that in client mode, you need to explicitly configure the executor authentication because the driver is running outside the cluster and needs to delegate this permission.
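If the driver's Kubernetes client still shows up as system:anonymous after that, you can also try handing the token (and, if needed, the cluster CA certificate) to the driver-side client directly. These are the client-mode authentication properties documented for Spark on Kubernetes; please verify them against the docs for your Spark version, and the CA path here is only a placeholder:

--conf spark.kubernetes.authenticate.oauthToken=$K8S_TOKEN \
--conf spark.kubernetes.authenticate.caCertFile=/path/to/ca.crt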

If this still doesn't work, ensure your service account has appropriate ClusterRole bindings that allow it to create and manage pods in the specified namespace.
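A minimal sketch of that, using the built-in edit ClusterRole bound inside the namespace (the binding name is just an example):

kubectl create rolebinding spark-sa-edit --clusterrole=edit --serviceaccount=dynx-center-resources:spark-sa -n dynx-center-resources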

Happy hadooping