Member since 11-30-2017 · 6 Posts · 0 Kudos Received · 0 Solutions
03-12-2018 05:31 PM
For every 5-minute batch I use UserGroupInformation.loginUserFromKeytabAndReturnUGI to renew the login. Do you have any reference?
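For reference, this is roughly what I do at the start of each batch — a minimal sketch, where principal, keytabPath and doWork are placeholders for my real values:

import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// At the start of every 5-minute batch: fresh login from the keytab,
// then run the batch work as that user.
val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabPath)
ugi.doAs(new PrivilegedExceptionAction[Unit]() {
  override def run(): Unit = doWork()   // placeholder for the actual batch logic
})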
03-12-2018 12:28 PM
I am working on a Spark Streaming job in which the incoming stream is joined with an existing Hive table. I have created a singleton HiveContext. When the HiveContext fetches the table data from Hive, Spark logs a warning, and after a few days the warning turns into an error.
18/03/10 15:55:28 INFO parquet.ParquetRelation$anonfun$buildInternalScan$1$anon$1: Input split: ParquetInputSplit{part: hdfs://nameservice1/user/hive/warehouse/iot.db/iotdevice/part-r-00000-931d1d81-af03-41a4-b659-81a883131289.gz.parquet start: 0 end: 5695 length: 5695 hosts: []}
18/03/10 15:55:28 WARN security.UserGroupInformation: PriviledgedActionException as:svc-ra-iotloaddev (auth:SIMPLE) cause:org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
18/03/10 15:55:28 WARN kms.LoadBalancingKMSClientProvider: KMS provider at [https://iotserver9009.kd.iotserver.com:16000/kms/v1/] threw an IOException [org.apache.hadoop.security.authentication.client.AuthenticationException: GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]!!
It stops the job after a few days. Here is the code that creates the HiveContext:

import java.security.PrivilegedExceptionAction
import java.util.Properties
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation
import org.apache.hadoop.security.UserGroupInformation.AuthenticationMethod
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast
import org.apache.spark.sql.hive.HiveContext

@transient private var instance: HiveContext = _

def getHiveContext(sparkContext: SparkContext, propertiesBroadcast: Broadcast[Properties]): HiveContext = {
  synchronized {
    val configuration = new Configuration
    configuration.addResource("/etc/hadoop/conf/hdfs-site.xml")
    UserGroupInformation.setConfiguration(configuration)
    UserGroupInformation.getCurrentUser.setAuthenticationMethod(AuthenticationMethod.KERBEROS)
    val secure = propertiesBroadcast.value.getProperty("kerberosSecurity").toBoolean

    if (instance == null) {
      // First call: log in from the keytab and build the singleton HiveContext inside doAs
      UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        propertiesBroadcast.value.getProperty("hadoop.kerberos.principal"),
        sparkContext.getConf.get("spark.yarn.keytab"))
        .doAs(new PrivilegedExceptionAction[HiveContext]() {
          override def run(): HiveContext = {
            System.setProperty("hive.metastore.uris", propertiesBroadcast.value.getProperty("hive.metastore.uris"))
            if (secure) {
              System.setProperty("hive.metastore.sasl.enabled", "true")
              System.setProperty("hive.metastore.kerberos.keytab.file", sparkContext.getConf.get("spark.yarn.keytab"))
              System.setProperty("hive.security.authorization.enabled", "false")
              System.setProperty("hive.metastore.kerberos.principal", propertiesBroadcast.value.getProperty("hive.metastore.kerberos.principal"))
              System.setProperty("hive.metastore.execute.setugi", "true")
            }
            instance = new HiveContext(sparkContext)
            instance.setConf("spark.sql.parquet.writeLegacyFormat", "true")
            instance.sparkContext.hadoopConfiguration.set("parquet.enable.summary-metadata", "false")
            instance.setConf("hive.exec.dynamic.partition", "true")
            instance.setConf("hive.exec.dynamic.partition.mode", "nonstrict")
            instance
          }
        })
    }

    // Subsequent calls: log in again and return the existing HiveContext
    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      propertiesBroadcast.value.getProperty("hadoop.kerberos.principal"),
      sparkContext.getConf.get("spark.yarn.keytab"))
      .doAs(new PrivilegedExceptionAction[HiveContext]() {
        override def run(): HiveContext = instance
      })
  }
}
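For completeness, each micro-batch uses this helper roughly like this — a sketch, where the stream, the join key deviceId and the output table are placeholders rather than my exact code:

stream.foreachRDD { rdd =>
  // Re-use the singleton HiveContext for every micro-batch
  val hiveContext = getHiveContext(rdd.sparkContext, propertiesBroadcast)
  val existing = hiveContext.table("iot.iotdevice")   // the Hive table seen in the log above
  val incoming = hiveContext.read.json(rdd)           // placeholder: however the batch becomes a DataFrame
  incoming.join(existing, "deviceId")                 // placeholder join key
    .write.mode("append").saveAsTable("iot.joined")   // placeholder output table
}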
Labels: Apache Hadoop, Apache Hive, Apache Spark
02-19-2018 11:08 PM
I have already provided the principal and keytab:

UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  properties.getProperty("hadoop.kerberos.principal"),
  sparkContext.getConf.get("spark.yarn.keytab"))
02-19-2018 10:42 PM
Hi, I am working on a Spark Streaming project. My Hadoop cluster has Kerberos enabled. I set some properties while creating the HiveContext:

@transient private var instance: HiveContext = _

def getHiveContext(sparkContext: SparkContext, properties: Properties): HiveContext = {
  synchronized {
    val configuration = new Configuration
    configuration.addResource("/etc/hadoop/conf/hdfs-site.xml")
    UserGroupInformation.setConfiguration(configuration)
    UserGroupInformation.getCurrentUser.setAuthenticationMethod(AuthenticationMethod.KERBEROS)

    if (instance == null) {
      System.setProperty("hive.metastore.uris", properties.getProperty("hive.metastore.uris"))
      if (properties.getProperty("kerberosSecurity").toBoolean) {
        System.setProperty("hive.metastore.sasl.enabled", "true")
        System.setProperty("hive.metastore.kerberos.keytab.file", sparkContext.getConf.get("spark.yarn.keytab"))
        System.setProperty("hive.security.authorization.enabled", "false")
        System.setProperty("hive.metastore.kerberos.principal", properties.getProperty("hive.metastore.kerberos.principal"))
        System.setProperty("hive.metastore.execute.setugi", "true")
      }
      UserGroupInformation.loginUserFromKeytabAndReturnUGI(
        properties.getProperty("hadoop.kerberos.principal"),
        sparkContext.getConf.get("spark.yarn.keytab"))
        .doAs(new PrivilegedExceptionAction[HiveContext]() {
          override def run(): HiveContext = {
            instance = new HiveContext(sparkContext)
            instance
          }
        })
    }

    UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      properties.getProperty("hadoop.kerberos.principal"),
      sparkContext.getConf.get("spark.yarn.keytab"))
      .doAs(new PrivilegedExceptionAction[HiveContext]() {
        override def run(): HiveContext = instance
      })
  }
}

The program runs fine: Spark can access Hive and saves the data every 5 minutes. But after about 24 to 30 hours the job fails with a Kerberos authentication error:

Caused by: java.io.IOException: org.apache.hadoop.security.authentication.client.AuthenticationException: org.apache.hadoop.security.token.SecretManager$InvalidToken: token (kms-dt owner=svc-dev, renewer=yarn, realUser=, issueDate=1518875523811, maxDate=1519480323811, sequenceNumber=85825, masterKeyId=1062) can't be found in cache
    at org.apache.hadoop.crypto.key.kms.LoadBalancingKMSClientProvider.decryptEncryptedKey(LoadBalancingKMSClientProvider.java:216)
    at org.apache.hadoop.crypto.key.KeyProviderCryptoExtension.decryptEncryptedKey(KeyProviderCryptoExtension.java:388)
    at org.apache.hadoop.hdfs.DFSClient.decryptEncryptedDataEncryptionKey(DFSClient.java:1440)
    at org.apache.hadoop.hdfs.DFSClient.createWrappedInputStream(DFSClient.java:1510)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:328)
    at org.apache.hadoop.hdfs.DistributedFileSystem$4.doCall(DistributedFileSystem.java:322)
    at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
    at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:322)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:783)
    at parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:407)
    at parquet.hadoop.ParquetFileReader$2.call(ParquetFileReader.java:238)
    ... 5 more

Can anybody help me out with this problem?
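In case it matters, the SparkContext is built roughly like this on my side — a sketch with placeholder names and paths — which is where spark.yarn.keytab in the code above comes from:

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("streaming-job")                          // placeholder app name
  .set("spark.yarn.principal", "svc-dev@EXAMPLE.COM")   // placeholder principal
  .set("spark.yarn.keytab", "/path/to/svc-dev.keytab")  // placeholder keytab path
val sparkContext = new SparkContext(conf)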
Labels: Apache Hive, Apache Spark, Kerberos
11-30-2017 04:30 AM
I have to create a partitioned table in Hive. The data has 100 customers, each with millions of records, so I created a partitioned table:
create table ptable (
  foo String,
  bar String)
PARTITIONED BY (customer_name String, studio_name String, ack_name String)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/lucy';
The table is created successfully. Then I insert the data:
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

INSERT OVERWRITE TABLE ptable PARTITION(customer_name, studio_name, ack_name)
SELECT foo, bar, customer_name, studio_name, ack_name FROM stable;
It fails with this error:

ERROR : Status: Failed
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
INFO  : Completed executing command(queryId=hive_20171130115757_fcbf193e-9b07-4946-b051-61005b7d0cbf); Time taken: 91.358 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3)
I have given more memory to the insert query:

SET mapreduce.map.memory.mb=7000;
SET mapreduce.map.java.opts=-Xmx7200m;
SET mapreduce.reduce.memory.mb=7000;
SET mapreduce.reduce.java.opts=-Xmx7200m;

I still get the same error.
When I try the query with a LIMIT, it works fine:

INSERT OVERWRITE TABLE ptable PARTITION(customer_name, studio_name, ack_name)
SELECT foo, bar, customer_name, studio_name, ack_name FROM stable LIMIT 100;
Are there any properties I should set in Hive?
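For context, these are the dynamic-partition settings I have been looking at — the values below are only examples I am trying, not recommendations I was given:

SET hive.optimize.sort.dynamic.partition=true;       -- sort rows by the partition columns so each task keeps only one writer open at a time
SET hive.exec.max.dynamic.partitions=10000;          -- example cap on total dynamic partitions per insert
SET hive.exec.max.dynamic.partitions.pernode=1000;   -- example cap per node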
Labels: Apache Hive, Apache Spark