Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1836 | 06-19-2019 07:50 AM
 | 2900 | 05-01-2019 08:07 AM
 | 2949 | 04-10-2019 08:49 AM
 | 2908 | 03-20-2019 09:30 AM
 | 2454 | 01-23-2019 10:58 AM
12-28-2017
03:21 AM
Had to go with Sentry and HDFS. Sentry is tightly coupled with HDFS and has a mandatory "HDFS Service" config, so you need to have HDFS. You can configure both HDFS and Sentry, then stop HDFS once Sentry is completely configured.
11-09-2017
05:06 AM
teravaidya By default, the Flume agent plugins reside in these directories: /usr/lib/flume-ng/plugins.d and /var/lib/flume-ng/plugins.d
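For reference, Flume expects each custom plugin in its own subdirectory of plugins.d, with lib, libext, and native subdirectories. A sketch of the layout (the plugin name here is hypothetical):

/var/lib/flume-ng/plugins.d/my-custom-source/lib/      # the plugin's own jar(s)
/var/lib/flume-ng/plugins.d/my-custom-source/libext/   # its dependency jar(s)
/var/lib/flume-ng/plugins.d/my-custom-source/native/   # any native libraries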
09-20-2017
12:30 PM
You shouldn't have to reindex the whole set of documents unless you need that new field added to the existing documents. New documents that have the field will be searchable by it, but older documents will not be returned. Reindexing would consist of removing the existing documents from the Solr collection and re-running your indexing application (MRIT, SolrJ, etc.) to index all the original documents again. Alternatively, you could have a SolrJ application that reads the old documents and adds a value for the newly created field to each one. Of course, you should test this in a QA environment to confirm the desired behavior.
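As a sketch of that second approach, Solr's atomic updates can set just the new field on an existing document without resending the rest (note this requires the other fields to be stored). Shown here with the plain HTTP update API rather than SolrJ; the collection, document id, and field name are hypothetical:

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/my_collection/update?commit=true' \
  -d '[{"id":"doc1","new_field":{"set":"backfilled value"}}]'

-pd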
09-20-2017
12:24 PM
Thanks for the clarification; the original comments said your Flume file channel was running out of space. Regarding the HDFS sink: once Flume delivers to the HDFS sink, it no longer controls those files. Whatever post-processing you are doing with those files should be responsible for cleaning up those folders. There is no functionality within the Flume sink to clean up old folders or expire data that has already been delivered. You could run a simple cron job that removes directories in HDFS older than a month, or run an Oozie job that does the same.
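A rough sketch of such a cron job, assuming the sink writes under a hypothetical /data/flume path with a 30-day retention; it parses the modification-date column of hdfs dfs -ls output, so test it in QA before trusting it:

#!/bin/bash
# Remove Flume output directories in HDFS older than 30 days.
cutoff=$(date -d "30 days ago" +%Y-%m-%d)
hdfs dfs -ls /data/flume | while read -r perms repl owner group size fdate ftime fpath; do
  # The empty-path check skips the "Found N items" header line;
  # YYYY-MM-DD dates compare correctly as strings.
  if [[ -n "$fpath" && "$fdate" < "$cutoff" ]]; then
    hdfs dfs -rm -r -skipTrash "$fpath"
  fi
done

HTH -pd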
09-19-2017
12:16 PM
As I stated in my recent comment, the Flume Kafka client was upgraded as part of the CDH 5.8 release to use the new consumer API, which supports secure communication with Kerberos. Versions prior to CDH 5.8 use the old API, which doesn't support Kerberos or SSL. You will have to upgrade to get this new functionality, or run Flume outside of Cloudera Manager using tarballs or RPMs.
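For illustration, a Kerberos-enabled Kafka source in a CDH 5.8+ Flume agent config looks roughly like this; the agent, source, broker, and topic names are hypothetical, the JAAS/keytab setup is omitted, and you should check the Cloudera docs for the exact property names in your release:

tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.kafka.bootstrap.servers = broker1:9092
tier1.sources.source1.kafka.topics = my-topic
tier1.sources.source1.kafka.consumer.security.protocol = SASL_PLAINTEXT
tier1.sources.source1.kafka.consumer.sasl.kerberos.service.name = kafka

-pd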
09-08-2017
08:51 AM
1 Kudo
Your best bet would be to use Sentry to provide the authorization, with Kerberos and AD. You can use sssd on the Linux nodes to make the AD users and groups available to Kafka: https://www.cloudera.com/documentation/enterprise/latest/topics/sg_auth_overview.html https://www.cloudera.com/documentation/kafka/latest/topics/kafka_security.html
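Once sssd exposes the AD groups on the brokers, you map those groups to Sentry roles and privileges with the kafka-sentry tool, along these lines (the role, group, and topic names here are hypothetical):

kafka-sentry -cr -r readers
kafka-sentry -arg -r readers -g ad-reader-group
kafka-sentry -gpr -r readers -p "Host=*->Topic=test-topic->action=read"

-pd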
09-07-2017
11:22 AM
1 Kudo
You are correct: SASL_PLAINTEXT only provides authentication, not encryption. You'll want SASL_SSL if you need encrypted traffic as well. You can set security.inter.broker.protocol to a different value if you'd like to encrypt only client/server traffic, but if you leave it as "inferred" in CM, it will use whatever your listener value is set to.
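For example, a client talking to a SASL_SSL listener would use properties along these lines (the host, port, and truststore paths are hypothetical):

# client.properties
security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=changeit

kafka-console-producer --broker-list broker1:9093 --topic test --producer.config client.properties

-pd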
08-01-2017
12:35 AM
The following blog helped me to set up disaster recovery for Solr: https://blog.cloudera.com/blog/2017/05/how-to-backup-and-disaster-recovery-for-apache-solr-part-i/
07-25-2017
07:43 AM
The hbase-indexer morphlines.conf is managed by CM and is automatically distributed to each node in the /var/run/cloudera-scm-agent/process directory when hbase-indexer starts. You'll want to specify a relative path name in the morphline-hbase-mapper.xml, and it will be picked up from the process directory: https://www.cloudera.com/documentation/enterprise/latest/topics/search_hbase_batch_indexer.html#concept_q3l_2tb_4r
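For example, the mapper file references the morphline by file name only (the table name and morphline id here are hypothetical), and the indexer resolves it from the process directory:

<indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
  <param name="morphlineFile" value="morphlines.conf"/>
  <param name="morphlineId" value="morphline1"/>
</indexer>

-pd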
06-21-2017
06:09 AM
Well, as far as I can tell from the code I've cited, there is a problem when Kafka wants to list roles, which it does when caching of Sentry privileges is enabled. What I did:
1. In the Kafka configuration, disabled sentry.kafka.caching.enable.
2. In the Sentry configuration, removed the kafka group from sentry.service.admin.group, so the configuration is:
root@node1:~# cd `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1`
root@node1:/var/run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER# grep -A 1 -E "sentry.service.(allow.connect|admin.group)" sentry-site.xml
<name>sentry.service.admin.group</name>
<value>hive,impala,hue,sudo</value>
--
<name>sentry.service.allow.connect</name>
<value>hive,impala,hue,hdfs,solr,kafka</value>
root@node1:/var/run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER#
3. Deployed the client configuration and restarted dependent services.
After these steps Kafka started properly, so turning off caching of Sentry privileges was a workaround that let me start Kafka without errors. However, I still have problems when using the kafka-sentry tool:
root@node1:~# kinit isegrim
Password for isegrim@TEST.COM:
root@node1:~# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: isegrim@TEST.COM
Valid starting Expires Service principal
21/06/2017 15:01 22/06/2017 01:01 krbtgt/TEST.COM@TEST.COM
renew until 28/06/2017 15:01
root@node1:~# kafka-sentry --config `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1` -lp -r myrole
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2017-06-21 15:02:16,992] ERROR Config key sentry.service.client.server.rpc-address is required (org.apache.sentry.provider.db.generic.tools.SentryShellKafka)
java.lang.NullPointerException: Config key sentry.service.client.server.rpc-address is required
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.<init>(SentryGenericServiceClientDefaultImpl.java:123)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientFactory.create(SentryGenericServiceClientFactory.java:31)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.run(SentryShellKafka.java:51)
at org.apache.sentry.provider.db.tools.SentryShellCommon.executeShell(SentryShellCommon.java:241)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.main(SentryShellKafka.java:96)
The operation failed. Message: Config key sentry.service.client.server.rpc-address is required
root@node1:~#
I can't see this configuration option in CM. I do see that rpc-address is configured in the CDH 5.10 Sentry service configuration, but without an explanation of exactly which address it should be (or I'm missing it): https://www.cloudera.com/documentation/enterprise/5-10-x/topics/sg_sentry_service_config.html Besides that, I have the address set and working:
root@node1:~# grep -A 1 rpc `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1`/sentry-site.xml
<name>sentry.service.server.rpc-address</name>
<value>node1</value>
--
<name>sentry.service.server.rpc-port</name>
<value>8038</value>
root@node1:~#
root@node1:~# ps -ef | grep `netstat -anpt | grep LISTEN | grep ':8038' | awk '{print $7}' | awk -F '/' '{print $1}'`
sentry 4599 2654 0 14:35 ? 00:00:20 /usr/lib/jvm/java-8-oracle/jre/bin/java -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xms268435456 -Xmx268435456 -XX:OnOutOfMemoryError=/usr/lib/cmf/service/common/killparent.sh -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/sentry/lib/sentry-core-common-1.5.1-cdh5.10.0.jar org.apache.sentry.SentryMain --command service --log4jConf /run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER/sentry-log4j.properties -conffile /run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER/sentry-site.xml
sentry 4616 4599 0 14:35 ? 00:00:00 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-redactor /usr/lib/cmf/service/sentry/sentry.sh
root@node1:~#
One more update: when I run the kafka-sentry command while logged in to Kerberos as the kafka user, it gives me the same error as before I disabled Sentry privilege caching in Kafka:
root@node2:~# cd `ls -dt /var/run/cloudera-scm-agent/process/*kafka* | head -n1`
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER# kinit -kt kafka.keytab kafka/node2@TEST.COM
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER# kafka-sentry -lp -r zto
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/06/21 17:18:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/21 17:18:10 ERROR tools.SentryShellKafka: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at org.apache.sentry.service.thrift.Status.throwIfNotOk(Status.java:113)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.listPrivilegesByRoleName(SentryGenericServiceClientDefaultImpl.java:484)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.listPrivilegesByRoleName(SentryGenericServiceClientDefaultImpl.java:494)
at org.apache.sentry.provider.db.generic.tools.command.ListPrivilegesByRoleCmd.execute(ListPrivilegesByRoleCmd.java:45)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.run(SentryShellKafka.java:83)
at org.apache.sentry.provider.db.tools.SentryShellCommon.executeShell(SentryShellCommon.java:241)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.main(SentryShellKafka.java:96)
The operation failed. Message: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER#
When I add the kafka group to the Sentry admin groups (sentry.service.admin.group), everything looks like it's working, but only for the Kerberos-authenticated kafka user:
root@node2:~# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: kafka/node2@TEST.COM
Valid starting Expires Service principal
21/06/2017 17:16 22/06/2017 03:16 krbtgt/TEST.COM@TEST.COM
renew until 28/06/2017 17:16
root@node2:~# kafka-sentry -lp -r myrole
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/06/21 17:31:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
root@node2:~#