Member since: 02-24-2016
Posts: 175
Kudos Received: 56
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
| --- | --- | --- |
| | 1278 | 06-16-2017 10:40 AM |
| | 10978 | 05-27-2016 04:06 PM |
| | 1281 | 03-17-2016 01:29 PM |
09-15-2016
11:23 AM
1 Kudo
Guys, we have set up a Kerberized cluster (HDP 2.4.x) with the Kafka broker (0.9.x) configured for SASL (Kerberos). What steps are required for a third-party tool (producer/publisher) to connect to Kafka? Going through this link: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_secure-kafka-ambari/content/ch_secure-kafka-config-options.html What I understand is that the tool needs access to a JAAS conf file. For now I've copied /usr/hdp/current/kafka-broker/config/kafka_client_jaas.conf, shared it with the third-party tool, and kept it on the classpath. Do we need anything else in place? Regards, SS
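For reference, a minimal client-side JAAS file for a SASL/Kerberos Kafka 0.9 client typically looks like the sketch below. The keytab path and principal are placeholders, not values from this cluster; besides the JAAS file, the client host generally also needs a working krb5.conf, a keytab (or a valid ticket cache), and the client itself must be told to use the SASL security protocol.

```shell
# Sketch of a client JAAS file for a Kerberized Kafka client.
# keyTab and principal below are placeholder values -- substitute the
# third-party tool's own credentials.
cat > /tmp/kafka_client_jaas.conf <<'EOF'
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="/etc/security/keytabs/client.keytab"
  principal="client@EXAMPLE.COM"
  serviceName="kafka";
};
EOF

# The client JVM is then pointed at this file via a system property:
#   -Djava.security.auth.login.config=/tmp/kafka_client_jaas.conf
grep -c 'KafkaClient' /tmp/kafka_client_jaas.conf
```

A ticket-cache variant (`useTicketCache=true` instead of the keytab lines) is also common when the tool runs under a user that has already done `kinit`.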
Labels:
- Apache Kafka
09-08-2016
02:14 PM
Thank you for the nice cheat sheet. I configured according to it on a secure HDP 2.4.2 + Ambari 2.2 cluster. I could send messages using the console producer:

<Broker_home>/bin/kafka-console-producer.sh --broker-list <KAFKA_BROKER>:6667 --topic test --security-protocol SASL_PLAINTEXT

When I try to consume the messages (on the same machine) I get an error. I start the consumer like this:

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh --zookeeper ZK1:2181,ZK-2:2181,ZK-3:2181 --topic test --from-beginning --security-protocol SASL_PLAINTEXT

The error stack is:

[2016-09-08 13:59:26,167] WARN [console-consumer-39119_HOST_NAME-1473343165849-9f1b8f0d-leader-finder-thread], Failed to find leader for Set([test,0], [test,1]) (kafka.consumer.ConsumerFetcherManager$LeaderFinderThread)
kafka.common.BrokerEndPointNotAvailableException: End point PLAINTEXT not found for broker 0
at kafka.cluster.Broker.getBrokerEndPoint(Broker.scala:141)
at kafka.utils.ZkUtils$$anonfun$getAllBrokerEndPointsForChannel$1.apply(ZkUtils.scala:180)
at kafka.utils.ZkUtils$$anonfun$getAllBrokerEndPointsForChannel$1.apply(ZkUtils.scala:180)
What do you think has gone wrong? Regards, SS PS: Do we have another wiki page for best practices around Kafka? @Sriharsha Chintalapani, @Andrew Grande, @Vadim Vaks, @Predrag Minovic, @
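One thing worth checking (an assumption on my part, since the HDP build differs from Apache Kafka here): HDP's Kafka 0.9 registers its SASL endpoint in ZooKeeper as PLAINTEXTSASL rather than SASL_PLAINTEXT, and the old ZooKeeper-based console consumer may fall back to looking up a PLAINTEXT endpoint when the protocol name doesn't match, which would explain "End point PLAINTEXT not found for broker 0". A sketch of the adjusted command:

```shell
# Hypothetical fix: use HDP's protocol name PLAINTEXTSASL with the
# ZooKeeper-based console consumer. Verify the actual protocol name
# against the listeners setting in the broker's server.properties.
/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
  --zookeeper ZK1:2181,ZK-2:2181,ZK-3:2181 \
  --topic test --from-beginning \
  --security-protocol PLAINTEXTSASL
```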
09-07-2016
06:45 PM
1 Kudo
Hi there, my question is quite similar to https://community.hortonworks.com/questions/40457/kafka-producer-giving-error-when-running-from-a-di.html and https://community.hortonworks.com/questions/23775/unable-to-produce-message.html, but this fails at the very first step: trying to send messages to a Kafka topic. I am using the command below to start the producer:

bin/kafka-console-producer.sh --broker-list <HOST_FQDN>:6667 --topic test

This FQDN is copied from `hostname -f`, and I verified that <HOST_FQDN> matches the one in <KAFKA_BROKER_HOME>/config/server.properties:

advertised.listeners=PLAINTEXTSASL://<HOST_FQDN_SAME_AS_HOSTNAME_F>:6667

Note: test is a valid topic created previously. Now when I start the producer:

bin/kafka-console-producer.sh --broker-list <BROKER_FQDN>:6667 --topic test

I see:
[2016-09-07 18:10:15,713] WARN Fetching topic metadata with correlation id 0 for topics [Set(test)] from broker [BrokerEndPoint(0,<BROKER_FQDN>,6667)] failed (kafka.client.ClientUtils$)
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:83)
at kafka.network.BlockingChannel.readCompletely(BlockingChannel.scala:140)
at kafka.network.BlockingChannel.receive(BlockingChannel.scala:131)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:79)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:76)
at kafka.producer.SyncProducer.send(SyncProducer.scala:121)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59)
at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
at kafka.producer.async.DefaultEventHandler$$anonfun$handle$1.apply$mcV$sp(DefaultEventHandler.scala:68)
at kafka.utils.CoreUtils$.swallow(CoreUtils.scala:79)
at kafka.utils.Logging$class.swallowError(Logging.scala:106)
at kafka.utils.CoreUtils$.swallowError(CoreUtils.scala:51)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:68)
at kafka.producer.async.ProducerSendThread.tryToHandle(ProducerSendThread.scala:105)
at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:88)
at kafka.producer.async.ProducerSendThread$$anonfun$processEvents$3.apply(ProducerSendThread.scala:68)
at scala.collection.immutable.Stream.foreach(Stream.scala:547)
at kafka.producer.async.ProducerSendThread.processEvents(ProducerSendThread.scala:67)
at kafka.producer.async.ProducerSendThread.run(ProducerSendThread.scala:45)
[2016-09-07 18:10:15,716] ERROR fetching topic metadata for topics [Set(test)] from broker [ArrayBuffer(BrokerEndPoint(0,<BROKER_FQDN>,6667))] failed (kafka.utils.CoreUtils$)

I also checked the kafka.out and server.log files; they do not show any errors/exceptions. It would be really helpful if someone could help me understand the missing bit. Thanks, SS
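A possible cause, offered as an assumption: an EOFException during the metadata fetch often means the producer spoke plain PLAINTEXT to a listener that expected something else, and since advertised.listeners here is PLAINTEXTSASL, the broker would drop the unauthenticated connection without any broker-side error. A sketch of the command with the protocol passed explicitly:

```shell
# Sketch (assumption): match the broker's PLAINTEXTSASL listener by
# passing the security protocol explicitly. The console producer's
# default is PLAINTEXT, which a SASL-only listener will disconnect,
# surfacing on the client as java.io.EOFException.
bin/kafka-console-producer.sh \
  --broker-list <BROKER_FQDN>:6667 \
  --topic test \
  --security-protocol PLAINTEXTSASL
```

This also requires a valid Kerberos ticket (`kinit`) and the client JAAS file on the producer side.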
Labels:
- Apache Kafka
08-23-2016
01:31 PM
Guys, I have a few questions related to the Spark cache and would like your input on them.

1) How much cache memory is available to each of the executor nodes? Is there a way to control it?
2) We want to restrict developers from persisting any data to disk. Is there any configuration we can change to disable non-memory caching? This is to make sure no secure data is spilled to disk by mistake.
3) If point #2 cannot be achieved, is there a way to make sure that spillage (in case developers use the MEMORY_AND_DISK option) happens only to a secure directory and the data is encrypted?
4) For streaming data processed with Spark, how secure is it? Can encryption be applied to data in flight?
5) If developers decide to cache streaming RDDs, how secure is that? Same concern as point #2 above.

Thanks, SS
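On points #1 and #3, as I understand Spark 1.6's unified memory manager: storage memory per executor is governed by `spark.memory.fraction` and `spark.memory.storageFraction`, and all shuffle/spill files land under `spark.local.dir`. I'm not aware of a cluster-level switch that forbids MEMORY_AND_DISK (the StorageLevel is chosen in application code), but spills can at least be directed to a locked-down, OS-encrypted directory. A hedged spark-defaults.conf sketch, with illustrative placeholder values:

```
# spark-defaults.conf sketch -- all values are illustrative placeholders
spark.memory.fraction                    0.6     # heap share for execution + storage
spark.memory.storageFraction             0.5     # portion of that protected for cached blocks
spark.local.dir                          /secure/encrypted/spark-tmp  # spill/shuffle dir
spark.authenticate                       true
spark.authenticate.enableSaslEncryption  true    # SASL encryption for shuffle transfers in flight
```

Whether these options behave exactly this way on your HDP build is worth verifying against the Spark 1.6 configuration reference.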
Labels:
- Apache Spark
08-18-2016
12:37 PM
1 Kudo
Guys, I was going through articles on Spark ML and found references suggesting that netlib-java is needed when setting up Spark MLlib for ML applications in Java/Scala. Other posts/articles suggest installing the Anaconda libraries for using Spark with Python. I ran simple programs and used Spark SQL without Anaconda, so I was wondering: do we really need the Anaconda packages to use MLlib from Python? It would be great if someone could comment on the netlib-java and Anaconda dependencies with respect to Spark and Spark MLlib use cases. Thanks, SS
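My understanding, for what it's worth: netlib-java affects speed, not functionality; without native BLAS, MLlib falls back to a pure-JVM implementation and merely logs a warning. On the Python side, the MLlib guide lists NumPy as the actual prerequisite; Anaconda is just a convenient way to get it. A quick check that NumPy alone (no Anaconda) is importable:

```shell
# NumPy, not the full Anaconda distribution, is what PySpark MLlib
# requires on the Python side (per the MLlib dependency notes).
# This confirms a standalone NumPy install is usable:
python3 -c 'import numpy; print("numpy OK")'
```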
Labels:
- Apache Spark
08-17-2016
04:00 PM
Thanks @Michael Young. By any chance do you know the timeline for the security components integration, or when it is on the roadmap? BTW, I was checking the tech preview of HDP 2.5, and I heard it is due some time in late August or early September. I'd like to know if we have a list of features/fixes coming in HDP 2.5 for Zeppelin. Thanks again. SS
08-17-2016
09:46 AM
2 Kudos
Hi guys, we are planning to set up Zeppelin for interactive usage with Spark. I see that we can configure it as an Ambari service (described at http://hortonworks.com/hadoop-tutorial/apache-zeppelin). However, I wonder whether the integration is mature enough to be used in production in a Kerberized environment, AD-integrated, with all the other HDP components in place. Or is this integration still in tech preview (as mentioned in the Hortonworks blog)? Thanks, SS
Labels:
- Apache Ambari
- Apache Zeppelin
08-16-2016
03:06 PM
1 Kudo
Hi all, we have an HDP 2.4.2 cluster configured with Spark. I ran smoke tests (Spark Pi, shell, Spark SQL) for various components. I am now looking for a few smoke tests to prove that Spark has been configured with the ML libraries. Moreover, how can we make sure the Spark ML configuration is optimized? I was planning to run a couple of samples from https://spark.apache.org/docs/1.6.1/mllib-guide.html to make sure the ML libs are configured. Is that enough? Thanks, SS
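Running a guide sample should be sufficient as a smoke test. One low-effort variant, sketched below with paths assumed from the usual HDP layout (verify them on your cluster): pipe a tiny KMeans job into spark-shell, which exercises MLlib's clustering and linear algebra end to end.

```shell
# Sketch: train a tiny KMeans model in the Spark shell to confirm the
# MLlib classes load and training runs. The spark-client path is an
# assumption -- adjust for your install.
echo 'import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors
val data = sc.parallelize(Seq(
  Vectors.dense(0.0, 0.0), Vectors.dense(1.0, 1.0),
  Vectors.dense(9.0, 8.0), Vectors.dense(8.0, 9.0)))
val model = KMeans.train(data, 2, 10)
println("centers: " + model.clusterCenters.mkString(", "))' \
  | /usr/hdp/current/spark-client/bin/spark-shell --master yarn-client
```

If the two printed centers land near (0.5, 0.5) and (8.5, 8.5), MLlib is working; any netlib-java BLAS warning in the output only indicates a slower JVM fallback, not a misconfiguration.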
08-15-2016
08:37 AM
Thanks for getting back, @Alex Miller. Here is the curl command used to connect to the Knox server:

curl -i -k -u admin:P@ssword 'https://<Knox_SERVER_Hostname>:<KNOX_PORT>/gateway/default/templeton/v1/status'

RHEL: Oracle Linux Server release 6.7
curl version: 7.19.7
JDK: openjdk version "1.8.0_71", OpenJDK Runtime Environment (build 1.8.0_71-b15)
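In case it helps narrow things down (a debugging sketch, not a known fix): curl 7.19.7 on RHEL 6 is old enough that TLS negotiation with Knox can itself be the problem, so separating the SSL handshake from the gateway/auth layer is a reasonable first step.

```shell
# Check the TLS handshake directly, independent of Knox routing/auth:
openssl s_client -connect <Knox_SERVER_Hostname>:<KNOX_PORT> </dev/null

# Then a verbose curl shows exactly where the request dies
# (handshake, redirect, or 401 from the gateway):
curl -ikv -u admin:P@ssword \
  'https://<Knox_SERVER_Hostname>:<KNOX_PORT>/gateway/default/templeton/v1/status'
```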