
Spark thrift got general security issue


Hi,

On the first day, we started the Spark thrift server with a keytab file and principal; we could use beeline to connect to the database and query tables.

[spark@xxx] $SPARK_HOME/sbin/start-thriftserver.sh --master yarn-client \
  --keytab /keytab/spark_thrift.keytab \
  --principal thriftuser/thrift.server.org@THRIFT.REALMS.ORG \
  --hiveconf hive.server2.thrift.port=10102 \
  --conf spark.hadoop.fs.hdfs.impl.disable.cache=true \
  --hiveconf hive.server2.authentication.kerberos.principal=thriftuser/thrift.server.org@THRIFT.REALMS.ORG \
  --hiveconf hive.server2.authentication.kerberos.keytab=/keytab/spark_thrift.keytab \
  --hiveconf hive.server2.logging.operation.enabled=true

We renewed the ticket for the principal every 18 hours with a background loop:

while true; do
  kinit -kt /keytab/spark_thrift.keytab thriftuser/thrift.server.org@THRIFT.REALMS.ORG
  sleep 18h
done &
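A slightly more defensive version of this renewal loop would at least make a failed renewal visible in a log. The sketch below shows one iteration; `do_kinit` is a hypothetical wrapper standing in for the real `kinit -kt ...` call so the skeleton can be run anywhere:

```shell
#!/bin/sh
# Hypothetical stand-in for the real renewal command; in production, replace
# the echo with:
#   kinit -kt /keytab/spark_thrift.keytab thriftuser/thrift.server.org@THRIFT.REALMS.ORG
do_kinit() {
  echo "renewed ticket at $(date)"
}

# One iteration of the loop; in production wrap this body in
# `while true; do ...; sleep 18h; done &` as above.
if do_kinit; then
  echo "renewal ok"
else
  echo "renewal FAILED" >&2
fi
```

With a log of failed renewals, an expired ticket cache can be ruled out quickly when queries start failing.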

On that first day, beeline worked normally:

[spark@xxx] beeline
beeline> !connect jdbc:hive2://hive.server.org:10102/database;principal=thriftuser/thrift.server.org@THRIFT.REALMS.ORG

beeline> select count(1) from table;

### This query returned the count as expected.

But about a day later, the same query threw errors like:

java.lang.ClassCastException: org.apache.hadoop.security.authentication.client.AuthenticationException cannot be cast to java.security.GeneralSecurityException
        at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:189)
        at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
        at org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1381)
        at org.apache.hadoop.hdfs.DFSClient.createWrappedInputStream(DFSClient.java:1451)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:305)
        at org.apache.hadoop.hdfs.DistributedFileSystem$3.doCall(DistributedFileSystem.java:299)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:312)
        at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:769)
        at org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:109)
        at org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:67)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.liftedTree1$1(HadoopRDD.scala:252)
        at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:251)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:211)
        at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:102)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
        at org.apache.spark.scheduler.Task.run(Task.scala:99)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)

What we've checked:

1. We confirmed the ticket for the principal was being renewed every 18 hours, as above.

2. Checked the Spark thrift server log; the credentials stored in HDFS were refreshed just under every 16 hours (per the interval in the log below).

18/09/19 16:45:58 INFO Client: Credentials file set to: credentials-xxxxx
18/09/19 16:45:59 INFO Client: To enable the AM to login from keytab, credentials are being copied over to the AM via the YARN secure Distributed Cache.
18/09/19 16:46:10 INFO CredentialUpdater: Scheduling credentials refresh from HDFS in 57588753ms.
18/09/20 08:45:58 INFO CredentialUpdater: Reading new credentials from hdfs://cluster/user/thriftuser/.sparkStaging/application_xxx/credentials-xxyyx
18/09/20 08:45:58 INFO CredentialUpdater: Credentials updated from credentials files.
18/09/20 08:45:58 INFO CredentialUpdater: Scheduling credentials refresh from HDFS in 57588700ms.
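For reference, the refresh interval logged above converts to just under 16 hours, so the thrift server's credential refresh cycle is slightly shorter than the 18-hour kinit loop:

```shell
# Convert the CredentialUpdater interval from the log (milliseconds) to h/m.
interval_ms=57588753
hours=$(( interval_ms / 3600000 ))
mins=$(( (interval_ms % 3600000) / 60000 ))
echo "refresh interval: ${hours}h ${mins}m"   # → refresh interval: 15h 59m
```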

3. Checked the Ranger KMS access log; the decrypt request was rejected with HTTP 403.

xxx.xxx.xxx.xxx - - [20/Sep/2018:10:57:50 +0800] "POST /kms/v1/keyversion/thriftuser_key%400/_eek?eek_op=decrypt HTTP/1.1 403 410"
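To confirm it is the decrypt calls specifically that fail, the HTTP status can be pulled out of access-log lines like the one above. The sed pattern below is just a sketch against that exact line; the log path and field layout may differ per deployment:

```shell
# The KMS access-log line quoted above, verbatim.
line='xxx.xxx.xxx.xxx - - [20/Sep/2018:10:57:50 +0800] "POST /kms/v1/keyversion/thriftuser_key%400/_eek?eek_op=decrypt HTTP/1.1 403 410"'

# Extract the first 3-digit field after "HTTP/1.1" (the HTTP status code).
status=$(printf '%s\n' "$line" | sed -n 's/.*HTTP\/1\.1[" ]*\([0-9]\{3\}\).*/\1/p')
echo "KMS decrypt returned HTTP $status"   # → KMS decrypt returned HTTP 403
```

Grepping the full log for `eek_op=decrypt` and piping through the same sed expression shows whether every decrypt call gets 403 or only those after the credential refresh.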

4. But when we read the encrypted data directory (where the Hive table data is stored) directly, with the principal active, the data could be read successfully:

[spark@xxx] hdfs dfs -cat /user/thriftuser/test.txt

test!
1 REPLY

Re: Spark thrift got general security issue


thank you!
