Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1836 | 06-19-2019 07:50 AM
 | 2900 | 05-01-2019 08:07 AM
 | 2949 | 04-10-2019 08:49 AM
 | 2908 | 03-20-2019 09:30 AM
 | 2454 | 01-23-2019 10:58 AM
12-28-2017
03:21 AM
Had to go with Sentry and HDFS. Sentry is tightly coupled with HDFS and has a mandatory "HDFS Service" config, so you need to have HDFS. You can configure both HDFS and Sentry, then stop HDFS once Sentry is completely configured.
11-09-2017
05:06 AM
teravaidya By default, the Flume agent plugins reside in these directories: /usr/lib/flume-ng/plugins.d and /var/lib/flume-ng/plugins.d
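For reference, Flume expects each custom plugin in its own subdirectory of plugins.d, with lib, libext, and native subdirectories. A sketch of the layout (the plugin name here is hypothetical):

/var/lib/flume-ng/plugins.d/my-custom-source/lib/      # the plugin's own jar(s)
/var/lib/flume-ng/plugins.d/my-custom-source/libext/   # its dependency jar(s)
/var/lib/flume-ng/plugins.d/my-custom-source/native/   # any native libraries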
09-20-2017
12:30 PM
You shouldn't have to reindex the whole set of documents unless you need that new field added to the existing documents. New documents that have the field will be searchable by it, but older documents will not be returned. Reindexing would consist of removing the existing documents from the Solr collection and re-running your indexing application (MRIT, SolrJ, etc.) to index all the original documents again. Alternatively, you could have a SolrJ application that reads the old documents and adds a value for the newly created field to each one. Of course, you should test this in a QA environment to confirm the desired behavior.
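As a sketch of that second approach, Solr's atomic updates can set just the new field on an existing document without resending the rest (note this requires the other fields to be stored). Shown here with the plain HTTP update API rather than SolrJ; the collection, document id, and field name are hypothetical:

curl -X POST -H 'Content-Type: application/json' \
  'http://localhost:8983/solr/my_collection/update?commit=true' \
  -d '[{"id":"doc1","new_field":{"set":"backfilled value"}}]'

-pd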
09-20-2017
12:24 PM
Thanks for the clarification; the original comments said your Flume file channel was running out of space. Regarding the HDFS sink: once Flume delivers to the HDFS sink, it no longer controls those files. Whatever post-processing you are doing with those files should be responsible for cleaning up those folders. There is no functionality within the Flume sink to clean up old folders or expire data that has already been delivered. You could run a simple cron job that removes directories in HDFS older than a month, or run an Oozie job that does the same.
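A rough sketch of such a cron job, assuming the sink writes under a hypothetical /data/flume path with a 30-day retention; it parses the modification-date column of hdfs dfs -ls output, so test it in QA before trusting it:

#!/bin/bash
# Remove Flume output directories in HDFS older than 30 days.
cutoff=$(date -d "30 days ago" +%Y-%m-%d)
hdfs dfs -ls /data/flume | while read -r perms repl owner group size fdate ftime fpath; do
  # The empty-path check skips the "Found N items" header line;
  # YYYY-MM-DD dates compare correctly as strings.
  if [[ -n "$fpath" && "$fdate" < "$cutoff" ]]; then
    hdfs dfs -rm -r -skipTrash "$fpath"
  fi
done

HTH -pd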
09-19-2017
12:16 PM
As I stated in my recent comment, the Flume Kafka client was upgraded as part of the CDH 5.8 release to use the new consumer API, which supports secure communication with Kerberos. Versions prior to CDH 5.8 use the old API, which doesn't support Kerberos or SSL. You will have to upgrade to get this new functionality, or run Flume outside of Cloudera Manager using tarballs or RPMs.
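For illustration, a Kerberos-enabled Kafka source in a CDH 5.8+ Flume agent config looks roughly like this; the agent, source, broker, and topic names are hypothetical, the JAAS/keytab setup is omitted, and you should check the Cloudera docs for the exact property names in your release:

tier1.sources.source1.type = org.apache.flume.source.kafka.KafkaSource
tier1.sources.source1.kafka.bootstrap.servers = broker1:9092
tier1.sources.source1.kafka.topics = my-topic
tier1.sources.source1.kafka.consumer.security.protocol = SASL_PLAINTEXT
tier1.sources.source1.kafka.consumer.sasl.kerberos.service.name = kafka

-pd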
09-08-2017
08:51 AM
1 Kudo
Your best bet would be to use Sentry to provide the authorization, with Kerberos and AD. You can use sssd on the Linux nodes to make the AD users and groups available to Kafka: https://www.cloudera.com/documentation/enterprise/latest/topics/sg_auth_overview.html https://www.cloudera.com/documentation/kafka/latest/topics/kafka_security.html
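Once sssd exposes the AD groups on the brokers, you map those groups to Sentry roles and privileges with the kafka-sentry tool, along these lines (the role, group, and topic names here are hypothetical):

kafka-sentry -cr -r readers
kafka-sentry -arg -r readers -g ad-reader-group
kafka-sentry -gpr -r readers -p "Host=*->Topic=test-topic->action=read"

-pd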
09-07-2017
11:22 AM
1 Kudo
You are correct: SASL_PLAINTEXT only provides authentication, not encryption. You'll want SASL_SSL if you need encrypted traffic as well. You can set security.inter.broker.protocol to a different value if you'd like to encrypt only client/server traffic, but if you leave it as "inferred" in CM, it will use whatever your listener value is set to.
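For example, a client talking to a SASL_SSL listener would use properties along these lines (the host, port, and truststore paths are hypothetical):

# client.properties
security.protocol=SASL_SSL
sasl.kerberos.service.name=kafka
ssl.truststore.location=/path/to/truststore.jks
ssl.truststore.password=changeit

kafka-console-producer --broker-list broker1:9093 --topic test --producer.config client.properties

-pd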
08-01-2017
12:35 AM
The following blog helped me to set up disaster recovery for Solr: https://blog.cloudera.com/blog/2017/05/how-to-backup-and-disaster-recovery-for-apache-solr-part-i/
07-25-2017
07:43 AM
The hbase-indexer morphlines.conf is managed by CM and is automatically distributed to each node in the /var/run/cloudera-scm-agent/process directory when hbase-indexer starts. You'll want to specify a relative path name in the morphline-hbase-mapper.xml, and it will be picked up from the process directory: https://www.cloudera.com/documentation/enterprise/latest/topics/search_hbase_batch_indexer.html#concept_q3l_2tb_4r
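For example, the mapper file references the morphline by file name only (the table name and morphline id here are hypothetical), and the indexer resolves it from the process directory:

<indexer table="record" mapper="com.ngdata.hbaseindexer.morphline.MorphlineResultToSolrMapper">
  <param name="morphlineFile" value="morphlines.conf"/>
  <param name="morphlineId" value="morphline1"/>
</indexer>

-pd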
06-21-2017
06:09 AM
Well, as far as I can tell from the code I've cited, there is a problem when Kafka wants to list roles, which it does when caching of Sentry privileges is enabled. What I did:
1. In the Kafka configuration, disabled sentry.kafka.caching.enable.
2. In the Sentry configuration, removed the kafka group from sentry.service.admin.group, so the configuration is:
root@node1:~# cd `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1`
root@node1:/var/run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER# grep -A 1 -E "sentry.service.(allow.connect|admin.group)" sentry-site.xml
<name>sentry.service.admin.group</name>
<value>hive,impala,hue,sudo</value>
--
<name>sentry.service.allow.connect</name>
<value>hive,impala,hue,hdfs,solr,kafka</value>
root@node1:/var/run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER#
3. Deployed the client configuration and restarted dependent services.
After these steps Kafka started properly, so turning off caching of Sentry privileges was a workaround that let me start Kafka without errors. However, I still have problems when using the kafka-sentry tool:
root@node1:~# kinit isegrim
Password for isegrim@TEST.COM:
root@node1:~# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: isegrim@TEST.COM
Valid starting Expires Service principal
21/06/2017 15:01 22/06/2017 01:01 krbtgt/TEST.COM@TEST.COM
renew until 28/06/2017 15:01
root@node1:~# kafka-sentry --config `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1` -lp -r myrole
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
[2017-06-21 15:02:16,992] ERROR Config key sentry.service.client.server.rpc-address is required (org.apache.sentry.provider.db.generic.tools.SentryShellKafka)
java.lang.NullPointerException: Config key sentry.service.client.server.rpc-address is required
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:229)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.<init>(SentryGenericServiceClientDefaultImpl.java:123)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientFactory.create(SentryGenericServiceClientFactory.java:31)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.run(SentryShellKafka.java:51)
at org.apache.sentry.provider.db.tools.SentryShellCommon.executeShell(SentryShellCommon.java:241)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.main(SentryShellKafka.java:96)
The operation failed. Message: Config key sentry.service.client.server.rpc-address is required
root@node1:~#
I can't see this configuration option in CM. I do see that rpc-address is configured in the CDH 5.10 Sentry service configuration, but without an explanation of exactly which address it should be (or I'm missing it): https://www.cloudera.com/documentation/enterprise/5-10-x/topics/sg_sentry_service_config.html Besides that, I have the address set and working:
root@node1:~# grep -A 1 rpc `ls -dt /var/run/cloudera-scm-agent/process/*sentry* | head -n1`/sentry-site.xml
<name>sentry.service.server.rpc-address</name>
<value>node1</value>
--
<name>sentry.service.server.rpc-port</name>
<value>8038</value>
root@node1:~#
root@node1:~# ps -ef | grep `netstat -anpt | grep LISTEN | grep ':8038' | awk '{print $7}' | awk -F '/' '{print $1}'`
sentry 4599 2654 0 14:35 ? 00:00:20 /usr/lib/jvm/java-8-oracle/jre/bin/java -Xmx1000m -Dhadoop.log.dir=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/logs -Dhadoop.log.file=hadoop.log -Dhadoop.home.dir=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop -Dhadoop.id.str= -Dhadoop.root.logger=INFO,console -Djava.library.path=/opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/hadoop/lib/native -Dhadoop.policy.file=hadoop-policy.xml -Djava.net.preferIPv4Stack=true -Xms268435456 -Xmx268435456 -XX:OnOutOfMemoryError=/usr/lib/cmf/service/common/killparent.sh -Dhadoop.security.logger=INFO,NullAppender org.apache.hadoop.util.RunJar /opt/cloudera/parcels/CDH-5.10.0-1.cdh5.10.0.p0.41/lib/sentry/lib/sentry-core-common-1.5.1-cdh5.10.0.jar org.apache.sentry.SentryMain --command service --log4jConf /run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER/sentry-log4j.properties -conffile /run/cloudera-scm-agent/process/1442-sentry-SENTRY_SERVER/sentry-site.xml
sentry 4616 4599 0 14:35 ? 00:00:00 python2.7 /usr/lib/cmf/agent/build/env/bin/cmf-redactor /usr/lib/cmf/service/sentry/sentry.sh
root@node1:~#
One more update: when I run the kafka-sentry command while logged in to Kerberos as the kafka user, it gives me the same error as before I disabled Sentry privilege caching in Kafka:
root@node2:~# cd `ls -dt /var/run/cloudera-scm-agent/process/*kafka* | head -n1`
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER# kinit -kt kafka.keytab kafka/node2@TEST.COM
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER# kafka-sentry -lp -r zto
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/06/21 17:18:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/06/21 17:18:10 ERROR tools.SentryShellKafka: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at org.apache.sentry.service.thrift.Status.throwIfNotOk(Status.java:113)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.listPrivilegesByRoleName(SentryGenericServiceClientDefaultImpl.java:484)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericServiceClientDefaultImpl.listPrivilegesByRoleName(SentryGenericServiceClientDefaultImpl.java:494)
at org.apache.sentry.provider.db.generic.tools.command.ListPrivilegesByRoleCmd.execute(ListPrivilegesByRoleCmd.java:45)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.run(SentryShellKafka.java:83)
at org.apache.sentry.provider.db.tools.SentryShellCommon.executeShell(SentryShellCommon.java:241)
at org.apache.sentry.provider.db.generic.tools.SentryShellKafka.main(SentryShellKafka.java:96)
The operation failed. Message: Access denied to kafka. Server Stacktrace: org.apache.sentry.provider.db.SentryAccessDeniedException: Access denied to kafka
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor$10.handle(SentryGenericPolicyProcessor.java:607)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.requestHandle(SentryGenericPolicyProcessor.java:201)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessor.list_sentry_privileges_by_role(SentryGenericPolicyProcessor.java:599)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:977)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyService$Processor$list_sentry_privileges_by_role.getResult(SentryGenericPolicyService.java:962)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.sentry.provider.db.generic.service.thrift.SentryGenericPolicyProcessorWrapper.process(SentryGenericPolicyProcessorWrapper.java:37)
at org.apache.thrift.TMultiplexedProcessor.process(TMultiplexedProcessor.java:123)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
root@node2:/var/run/cloudera-scm-agent/process/1445-kafka-KAFKA_BROKER#
When I add the kafka group to the Sentry admin groups (sentry.service.admin.group), everything looks like it's working, but only for the Kerberos-authenticated kafka user:
root@node2:~# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: kafka/node2@TEST.COM
Valid starting Expires Service principal
21/06/2017 17:16 22/06/2017 03:16 krbtgt/TEST.COM@TEST.COM
renew until 28/06/2017 17:16
root@node2:~# kafka-sentry -lp -r myrole
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/cloudera/parcels/KAFKA-2.1.1-1.2.1.1.p0.18/lib/kafka/libs/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/06/21 17:31:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
root@node2:~#