Member since
01-11-2016
34
Posts
8
Kudos Received
5
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
| 5088 | 06-21-2017 06:09 AM
| 34097 | 06-16-2017 03:52 PM
| 19319 | 12-12-2016 02:30 PM
| 2969 | 09-01-2016 08:40 AM
| 2275 | 05-09-2016 05:01 PM
03-26-2018
02:56 PM
Hi
I am building a dashboard based on streaming data.
I use Cloudera 5.10, Spark 2.1 and Kafka 0.10.
I wanted to give my app a little boost, and one of the optimizations was to move the checkpoint dir to SSD drives.
I have added fresh, empty nodes (4 of them) with volumes prefixed as [SSD] (256 GB each), all assigned to the same rack.
I have set the ALL_SSD policy on hdfs:///spark/checkpoints/myapp:
hdfs storagepolicies -setStoragePolicy -path hdfs:///spark/checkpoints/myapp -policy ALL_SSD
I have changed the replication factor for this dir from the default 3 to 2 with the command
hdfs dfs -setrep -w 2 hdfs:///spark/checkpoints/myapp
And after I start my app,
hdfs fsck hdfs:///spark/checkpoints/myapp -files -blocks -locations
says that the blocks have 3 replicas (2 on SSD and 1 on DISK).
Problem 1:
2 SSD and 1 DISK instead of the expected 3 SSD (if replication factor = 3). I suspect a problem with racks: according to the replica placement algorithm (2 copies in the same rack and the 3rd on a node in another rack), the fallback storage for replication (DISK) of the ALL_SSD policy could have been used here. I will try to reorganize node placement and spread them over 2 racks. Does anyone have other ideas why I got 1 DISK replica?
Problem 2:
Why do I have 3 replicas even though I manually ran setrep -w 2 on the parent directory?
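One likely explanation for Problem 2: setrep only rewrites the replication factor of files that already exist; files the app creates later are created with the writing client's dfs.replication (default 3). A sketch of how the checkpoint writer could be told to use 2 replicas (the app jar name is a placeholder, not from this thread):

```shell
# setrep -w 2 changes EXISTING files only; new checkpoint files use the
# client-side dfs.replication, so pass it to the Spark app itself.
spark2-submit \
  --conf spark.hadoop.dfs.replication=2 \
  myapp.jar

# Check the replication of files created after the change:
hdfs fsck hdfs:///spark/checkpoints/myapp -files -blocks | grep repl=
```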
Best Regards
Artur
Labels:
- Apache Kafka
- Apache Spark
- HDFS
11-25-2017
01:25 AM
2 Kudos
OK, the problem is that for some reason spark-sql-kafka-0-10_2.11-2.1.0.cloudera1.jar is not loaded, so lookupDataSource does not see KafkaSourceProvider as an implementation of the DataSourceRegister trait with its override def shortName(): String = "kafka". When the short name is not found, Spark appends .DefaultSource to the given data source provider name, and that is why I got ClassNotFoundException: kafka.DefaultSource. The solution is to:
1. Get the jar spark-sql-kafka-0-10_2.11-2.1.0.cloudera1.jar, either:
a) manually download it from the Cloudera repo: https://repository.cloudera.com/cloudera/cloudera-repos/org/apache/spark/spark-sql-kafka-0-10_2.11/2.1.0.cloudera1/spark-sql-kafka-0-10_2.11-2.1.0.cloudera1.jar, or
b) run spark2-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.1.0, as Jacek Laskowski explained: https://github.com/jaceklaskowski/spark-structured-streaming-book/blob/master/spark-sql-streaming-KafkaSource.adoc
2. Run your app via spark2-submit --jars ~/.ivy2/cache/org.apache.spark/spark-sql-kafka-0-10_2.11/jars/spark-sql-kafka-0-10_2.11-2.1.0.cloudera1.jar
Very important: remember to set the proper Kafka version (in this case 0.10) via export or in the Spark2 service configuration in Cloudera Manager: https://www.cloudera.com/documentation/spark2/latest/topics/spark2_kafka.html
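Putting it together, the full submit might look like this sketch (the class name myetl.Main comes from the stack trace in this thread; the assembly jar name is a placeholder):

```shell
# Use the Kafka 0.10 client and put the Kafka source jar on the classpath.
export SPARK_KAFKA_VERSION=0.10
spark2-submit \
  --jars ~/.ivy2/cache/org.apache.spark/spark-sql-kafka-0-10_2.11/jars/spark-sql-kafka-0-10_2.11-2.1.0.cloudera1.jar \
  --class myetl.Main \
  myetl-assembly.jar
```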
11-25-2017
01:02 AM
Hi, I had the same problem, but I copied the solution from another issue, where I was setting up kerberized Kafka with Sentry. To run a Spark program with kerberized Kafka, you have to switch Spark to use Kafka 0.10. You can do this in 2 ways:
1. Before spark2-submit, export the Kafka version:
$ export SPARK_KAFKA_VERSION=0.10
$ spark2-submit ...
2. Set the default Kafka version in the Spark2 service configuration in Cloudera Manager.
I also ran a structured streaming ETL and had the same problem as you with version 0.9. After I set the export, my program started properly. https://www.cloudera.com/documentation/spark2/latest/topics/spark2_kafka.html
11-23-2017
08:43 AM
Hi
I use CDH 5.10.1, Spark 2.1 and Kafka 2.1
When I try simple program for ETL:
val mystream = spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "mybroker:9092")
  .option("subscribe", "mytopic")
  .load()
With dependencies in build.sbt:
scalaVersion := "2.11.8"

lazy val spark = Seq(
  "spark-core",
  "spark-hive",
  "spark-streaming",
  "spark-sql",
  "spark-streaming-kafka-0-10"
).map("org.apache.spark" %% _ % "2.1.0.cloudera1" % "provided")

libraryDependencies += "org.apache.spark" %% "spark-sql-kafka-0-10" % "2.1.0.cloudera1"
libraryDependencies += "org.apache.kafka" % "kafka-clients" % "0.10.0-kafka-2.1.0"
libraryDependencies += "org.apache.kafka" %% "kafka" % "0.10.0-kafka-2.1.0"
libraryDependencies ++= spark

resolvers ++= Seq("Cloudera Repository" at "https://repository.cloudera.com/artifactory/cloudera-repos/")
When submitting my app, I get this error:
Exception in thread "main" java.lang.ClassNotFoundException: Failed to find data source: kafka. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:594)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass$lzycompute(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.providingClass(DataSource.scala:86)
at org.apache.spark.sql.execution.datasources.DataSource.sourceSchema(DataSource.scala:197)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo$lzycompute(DataSource.scala:87)
at org.apache.spark.sql.execution.datasources.DataSource.sourceInfo(DataSource.scala:87)
at org.apache.spark.sql.execution.streaming.StreamingRelation$.apply(StreamingRelation.scala:30)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:124)
at myetl.Main$.main(Main.scala:20)
at myetl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: kafka.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:579)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25$$anonfun$apply$13.apply(DataSource.scala:579)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:579)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$25.apply(DataSource.scala:579)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:579)
... 18 more
Please help...
Labels:
- Apache Hive
- Apache Kafka
- Apache Spark
08-11-2017
03:16 AM
So, summarizing: when starting Spark with --deploy-mode client, one must distribute the JAAS file and keytab to every machine. When starting Spark with --deploy-mode cluster, one must distribute the JAAS file and keytab with --files and/or --keytab.
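A sketch of what the cluster-mode variant could look like (the jaas.conf and keytab file names are from this thread; the app jar is a placeholder):

```shell
# --files ships jaas.conf and the keytab into each container's working
# directory, so the relative keytab path inside jaas.conf keeps working.
spark2-submit \
  --deploy-mode cluster \
  --files jaas.conf,isegrim.keytab \
  --conf "spark.driver.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  myapp.jar
```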
06-21-2017
06:09 AM
Well, as far as I can read the code I've cited, there is a problem when Kafka wants to list roles, which it does when caching of Sentry privileges is enabled. When I:
1. In the Kafka configuration, disabled sentry.kafka.caching.enable
2. In the Sentry configuration, deleted the kafka group from sentry.service.admin.group, so the configuration is:
root@node1:~# cd `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1`
root@node1:/var/run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER# grep -A 1 -E "sentry.service.(allow.connect|admin.group)" sentry-site.xml
<name>sentry.service.admin.group</name>
<value>hive,impala,hue,sudo</value>
--
<name>sentry.service.allow.connect</name>
<value>hive,impala,hue,hdfs,solr,kafka</value>
root@node1:/var/run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER#
3. Deployed the client configuration and restarted dependent services.
After these steps Kafka started properly, so turning off caching of Sentry privileges was a workaround for me to start Kafka without errors. Though I have problems when using the kafka-sentry tool:
root@node1:~# kinit isegrim
Password for isegrim@TEST.COM:
root@node1:~# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: isegrim@TEST.COM
Valid starting Expires Service principal
21/06/2017 15:01 22/06/2017 01:01 krbtgt/TEST.COM@TEST.COM
renew until 28/06/2017 15:01
root@node1:~# kafka-sentry --config `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1` -lp -r myrole
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2017-06-21 15:02:16,992] ERROR Config key sentry.service.client.server.rpc-address is required (org.apache.sentry.provider.db.generic.tools.SentryShellKafka)
java.lang.NullPointerException: Config key sentry.service.client.server.rpc-address is required
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.<init>(SentryGenericServiceClientDefaultImpl.java:123)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientFactory.create(SentryGenericServiceClientFactory.java:31)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.run(SentryShellKafka.java:51)
at org.apache.sentry.provider.db.tools.SentryShellCommon.executeShell(SentryShellCommon.java:241)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.main(SentryShellKafka.java:96)
The operation failed. Message: Config key sentry.service.client.server.rpc-address is required
root@node1:~#
I can't see this configuration option in CM. I see that rpc-address is configured in the CDH 5.10 Sentry service configuration, but without an explanation of what exactly the address should be (or I am missing it): https://www.cloudera.com/documentation/enterprise/5-10-x/topics/sg_sentry_service_config.html
Besides that, I have that address set and working:
root@node1:~# grep -A 1 rpc `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1`/sentry-site.xml
<name>sentry.service.server.rpc-address</name>
<value>node1</value>
--
<name>sentry.service.server.rpc-port</name>
<value>8038</value>
root@node1:~#
root@node1:~# ps -ef | grep `netstat -anpt | grep LISTEN | grep ':8038' | awk '{print $7}' | awk -F '/' '{print $1}'`
sentry 4599 2654 0 14:35 ? 00:00:20 /usr/lib/jvm/java-8-oracle/jre/bin/java -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xms268435456 -Xmx268435456 -XX:OnOutOfMemoryError=/usr/lib/cmf/service/common/killparent.sh -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/sentry/lib/sentry-core-common-1.5.1-cdh5.10.0.jar org.apache.sentry.SentryMain --command service --log4jConf /run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER/sentry-log4j.properties -conffile /run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER/sentry-site.xml
sentry 4616 4599 0 14:35 ? 00:00:00 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-redactor /usr/lib/cmf/service/sentry/sentry.sh
root@node1:~#
One more update: when I run the kafka-sentry command while logged in to Kerberos as the kafka user, it gives me the same error as before disabling Sentry privilege caching in Kafka:
root@node2:~# cd `ls -dt /var/run/cloudera-scm-agent/process/*kafka* | head -n1`
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER# kinit -kt kafka.keytab kafka/node2@TEST.COM
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER# kafka-sentry -lp -r zto
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/06/21 17:18:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/21 17:18:10 ERROR tools.SentryShellKafka: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at org.apache.sentry.service.thrift.Status.throwIfNotOk(Status.java:113)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.listPrivilegesByRoleName(SentryGenericServiceClientDefaultImpl.java:484)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.listPrivilegesByRoleName(SentryGenericServiceClientDefaultImpl.java:494)
at org.apache.sentry.provider.db.generic.tools.command.ListPrivilegesByRoleCmd.execute(ListPrivilegesByRoleCmd.java:45)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.run(SentryShellKafka.java:83)
at org.apache.sentry.provider.db.tools.SentryShellCommon.executeShell(SentryShellCommon.java:241)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.main(SentryShellKafka.java:96)
The operation failed. Message: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER#
When I add the kafka group to the Sentry admin groups (sentry.service.admin.group), it looks like everything is working, but only for the Kerberos-logged-in user kafka:
root@node2:~# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: kafka/node2@TEST.COM
Valid starting    Expires           Service principal
21/06/2017 17:16  22/06/2017 03:16  krbtgt/TEST.COM@TEST.COM
        renew until 28/06/2017 17:16
root@node2:~# kafka-sentry -lp -r myrole
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/06/21 17:31:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
root@node2:~#
06-21-2017
03:35 AM
Hi, I've checked the code and made some tests, and it looks like Sentry, besides the sentry.service.allow.connect setting, also needs the kafka group added as an admin group in the sentry.service.admin.group setting, which can be a security problem, because anyone in the kafka group will be able to do anything on the cluster. The handle() method from org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor (from the Kafka and Sentry stack trace) requires that the user be in one of the Sentry admin groups, and after I added the kafka group, the Kafka brokers started properly without errors: https://github.com/cloudera/sentry/blob/cdh5-1.5.1_5.10.0/sentry-provider/sentry-provider-db/src/main/java/org/apache/sentry/provider/db/generic/service/thrift/SentryGenericPolicyProcessor.java
public Response<Set<TSentryRole>> handle() throws Exception {
validateClientVersion(request.getProtocol_version());
Set<String> groups = getRequestorGroups(conf, request.getRequestorUserName());
if (AccessConstants.ALL.equalsIgnoreCase(request.getGroupName())) {
//check all groups which requestorUserName belongs to
} else {
boolean admin = inAdminGroups(groups);
//Only admin users can list all roles in the system ( groupname = null)
//Non admin users are only allowed to list only groups which they belong to
if(!admin && (request.getGroupName() == null || !groups.contains(request.getGroupName()))) {
throw new SentryAccessDeniedException(ACCESS_DENIAL_MESSAGE + request.getRequestorUserName());
}
groups.clear();
groups.add(request.getGroupName());
}
I don't see the sense of adding the kafka group to sentry.service.admin.group, because as the Sentry documentation says (https://cwiki.apache.org/confluence/display/SENTRY/Sentry+Service+Configuration): sentry.service.admin.group - list of groups allowed to make policy updates. And since the only member we want in the kafka group is the kafka service, which should only check permissions against Sentry, there is no need for the write permission to make policy changes, because Kafka itself should not make any policy changes. I've double-checked for typos but found none. I've tried copy/paste as well as typing "kafka" by hand into every field in CM, restarting the cluster and deploying the client configuration, but the above result was the only one.
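For reference, the usual role bookkeeping with the kafka-sentry tool looks roughly like this (role and group names are placeholders; it must be run as a user in a Sentry admin group):

```shell
kafka-sentry -cr -r myrole              # create a role
kafka-sentry -arg -r myrole -g mygroup  # assign the role to a group
kafka-sentry -lr                        # list roles to verify
```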
06-20-2017
01:18 PM
Hi pdvorak! Thank you for your fast reply. Yes, for me this is also very strange; that is why I've decided to turn to the community. I have 4 nodes, where node1 has the Sentry Server, and node{2..4} are worker nodes (HDFS DataNode + YARN NodeManager).
id kafka:
isegrim@node1:~$ for i in {1..4}; do ssh node$i "id kafka"; done
uid=998(kafka) gid=999(kafka) groups=999(kafka)
uid=998(kafka) gid=999(kafka) groups=999(kafka)
uid=998(kafka) gid=999(kafka) groups=999(kafka)
uid=998(kafka) gid=999(kafka) groups=999(kafka)
isegrim@node1:~$
Stack trace:
isegrim@node1:~$ grep '2017-06-20 16:01:30.* ERROR' -A 16 /var/log/sentry/hadoop-cmf-sentry-SENTRY_SERVER-node1.log.out
2017-06-20 16:01:30,840 ERROR org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor: Access denied to kafka
org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$9.handle(SentryGenericPolicyProcessor.java:575)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_roles_by_group(SentryGenericPolicyProcessor.java:563)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_roles_by_group.getResult(SentryGenericPolicyService.java:957)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_roles_by_group.getResult(SentryGenericPolicyService.java:942)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
2017-06-20 16:01:39,091 ERROR org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor: Access denied to kafka
org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
isegrim@node1:~$
Confirmation of the sentry.service.allow.connect setting:
root@node1:~# cd `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | tail -n1`
root@node1:/var/run/cloudera-scm-agent/process/1365-sentry-SENTRY_SERVER# grep -A 1 sentry.service.allow.connect sentry-site.xml
<name>sentry.service.allow.connect</name>
<value>hive,impala,hue,hdfs,solr,kafka</value>
root@node1:/var/run/cloudera-scm-agent/process/1365-sentry-SENTRY_SERVER#
[Screenshots: Kafka configuration for Sentry (in CM); Kafka configuration - users (in CM); Sentry configuration - sentry.service.allow.connect (in CM)]
I have no idea what I have done wrong or failed to do. Please help. Best Regards
06-20-2017
08:35 AM
Hello, I have CDH 5.10.1 and KAFKA 2.1 (0.10), all of them kerberized. I wanted to use Sentry with Kafka as this procedure says: https://www.cloudera.com/documentation/kafka/latest/topics/kafka_security.html#using_kafka_with_sentry
As the procedure says, I completed all 5 points, then deployed the client configuration and restarted dependent services, and then Kafka did not start, giving this error:
2017-06-20 16:01:30,892 ERROR org.apache.sentry.kafka.binding.KafkaAuthBindingSingleton: Unable to create KafkaAuthBinding
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.sentry.kafka.binding.KafkaAuthBinding.createAuthProvider(KafkaAuthBinding.java:194)
at org.apache.sentry.kafka.binding.KafkaAuthBinding.<init>(KafkaAuthBinding.java:97)
at org.apache.sentry.kafka.binding.KafkaAuthBindingSingleton.configure(KafkaAuthBindingSingleton.java:63)
at org.apache.sentry.kafka.authorizer.SentryKafkaAuthorizer.configure(SentryKafkaAuthorizer.java:120)
at kafka.server.KafkaServer$$anonfun$startup$3.apply(KafkaServer.scala:211)
at kafka.server.KafkaServer$$anonfun$startup$3.apply(KafkaServer.scala:209)
at scala.Option.map(Option.scala:146)
at kafka.server.KafkaServer.startup(KafkaServer.scala:209)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
at kafka.Kafka$.main(Kafka.scala:67)
at com.cloudera.kafka.wrap.Kafka$.main(Kafka.scala:76)
at com.cloudera.kafka.wrap.Kafka.main(Kafka.scala)
Caused by: java.lang.RuntimeException: Failed to get privileges from Sentry to build cache.
at org.apache.sentry.provider.db.generic.SentryGenericProviderBackend.initialize(SentryGenericProviderBackend.java:89)
at org.apache.sentry.policy.kafka.SimpleKafkaPolicyEngine.<init>(SimpleKafkaPolicyEngine.java:44)
... 16 more
Caused by: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$9.handle(SentryGenericPolicyProcessor.java:575)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
The same messages are in the Sentry log. I've checked the Sentry configuration, and the kafka user is configured as one of the service users in the sentry.service.allow.connect setting. kafka is a local (not LDAP) user that has the same uid and gid in Linux on every cluster node. Can someone tell me what else I should do to allow the kafka user to query Sentry? Best Regards
Labels:
- Apache Sentry
- Kerberos
06-17-2017
02:41 AM
No, no, jaas.conf before distributing it to worker nodes was:
KafkaClient {
com.sun.security.auth.module.Krb5LoginModule required
useKeyTab=true
keyTab="./isegrim.keytab"
principal="isegrim@TEST.COM";
};
And after I figured out that the KafkaConsumer constructor looks for jaas.conf and the keytab on the executor container's local filesystem instead of the application's HDFS CWD, I distributed the keytab to the Linux FS and changed the keytab path from ./isegrim.keytab (I thought it would be resolved relative to the Spark application's HDFS CWD) to /home/isegrim/isegrim.keytab. I agree that it should not work like this. Now I have to take care of distributing and securing the keytab and jaas.conf files. I think it must be something in the KafkaConsumer constructor, which doesn't know that those files are distributed with the application in its CWD on HDFS, and so tries to find them on the local machine's filesystem. If I have more time, I'll try to check the code of those constructors to figure it out.