Member since 09-23-2015

70 Posts | 87 Kudos Received | 7 Solutions
        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 5420 | 09-20-2016 09:11 AM |
|  | 4976 | 05-17-2016 11:58 AM |
|  | 3263 | 04-18-2016 07:27 PM |
|  | 3207 | 04-14-2016 08:25 AM |
|  | 3156 | 03-24-2016 07:16 PM |
			
    
	
		
		
09-26-2016 03:05 PM
3 Kudos
Controlling the environment of an application is vital for its functionality and stability. Especially in a distributed environment it is important for developers to have control over the versions of dependencies. In such a scenario it is a critical task to ensure that possibly conflicting requirements of multiple applications do not disturb each other. That is why frameworks like YARN ensure that each application is executed in a self-contained environment - typically in a Linux container or Docker container - that is controlled by the developer. In this post we show what this means for Python environments used by Spark.

YARN Application Deployment

As mentioned earlier, YARN executes each application in a self-contained environment on each host. This ensures execution in a controlled environment managed by the individual developer. The way this works, in a nutshell, is that the dependencies of an application are distributed to each node, typically via HDFS. (Figure omitted: it simplifies the fact that HDFS is actually being used to distribute the application; see the HDFS distributed cache for reference.)

The files are uploaded to a staging folder /user/${username}/.${application} of the submitting user in HDFS. Because of the distributed architecture of HDFS it is ensured that multiple nodes have local copies of the files. In fact, to ensure that a large fraction of the cluster has a local copy of the application files and does not need to download them over the network, the HDFS replication factor for these files is set much higher than 3; often a value between 10 and 20 is chosen.

During the preparation of the container on a node you will notice commands similar to the example below being executed in the logs:

ln -sf "/hadoop/yarn/local/usercache/vagrant/filecache/72/pyspark.zip" "pyspark.zip"

The folder /hadoop/yarn/local/ is the configured location on each node where YARN stores its needed files and logs locally. Creating a symbolic link like this inside the container makes the content of the zip file available; it is referenced as "pyspark.zip".
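A quick way to observe this localization from inside a running PySpark job is to list the container's working directory on an executor, where the symlinked archives show up. A minimal sketch of such a job, assuming it is submitted like any other PySpark application (the script and app name are illustrative):

import os
from pyspark import SparkContext

def list_container_dir(_):
    # YARN sets PWD to the container's working directory; localized archives
    # such as "pyspark.zip" show up here as symbolic links.
    return sorted(os.listdir(os.environ.get('PWD', '.')))

if __name__ == "__main__":
    sc = SparkContext(appName="list-container-dir")
    print(sc.parallelize([0], 1).map(list_container_dir).collect())
    sc.stop()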
Using Conda Env

For application developers this means that they can package and ship their controlled environment with each application. Other solutions like NFS or Amazon EFS shares are not needed; shared folders make for a bad architecture that is not designed to scale well and makes the development of applications less agile.

The following example demonstrates the use of conda env to transport a Python environment with a PySpark application that needs to be executed. This sample application uses the NLTK package, with the additional requirement of making tokenizer and tagger resources available to the application as well.

Our sample application:

import os
import sys
from pyspark import SparkContext
from pyspark import SparkConf
conf = SparkConf()
conf.setAppName("spark-ntlk-env")
sc = SparkContext(conf=conf)
data = sc.textFile('hdfs:///user/vagrant/1970-Nixon.txt')
def word_tokenize(x):
    import nltk
    return nltk.word_tokenize(x)
def pos_tag(x):
    import nltk
    return nltk.pos_tag([x])
words = data.flatMap(word_tokenize)
words.saveAsTextFile('hdfs:///user/vagrant/nixon_tokens')
pos_word = words.map(pos_tag)
pos_word.saveAsTextFile('hdfs:///user/vagrant/nixon_token_pos')

Preparing the sample input data

For our example we use the samples provided by NLTK (http://www.nltk.org/nltk_data/) and upload them to HDFS:

(nltk_env)$ python -m nltk.downloader -d nltk_data all
(nltk_env)$ hdfs dfs -put nltk_data/corpora/state_union/1970-Nixon.txt /user/vagrant/

No Hard (Absolute) Links!

Before we actually go and create our environment, let's first take a quick moment to recap how an environment is typically composed. On a machine the environment is made up of variables linking to different target folders containing executables or other resource files. If you execute a command, it is referenced from your PATH, PYTHON_LIBRARY, or another defined variable. These variables link to files in directories like /usr/bin, /usr/local/bin, or other referenced locations. They are called hard links or absolute references, as they start from the root /.

Environments using hard links are not easily transportable, as they make strict assumptions about the overall execution environment (your OS, for example) they are being used in. Therefore it is necessary to use relative links in a transportable/relocatable environment.

This is especially true for conda env, as it creates hard links by default. By making the conda env relocatable it can be used in an application by referencing it from the application root . (the current directory) instead of the overall root /. By using the --copy option during the creation of the environment, packages are copied instead of linked.

Creating our relocatable environment together with nltk and numpy:

conda create -n nltk_env --copy -y -q python=3 nltk numpy
Fetching package metadata .......
Solving package specifications: ..........
Package plan for installation in environment /home/datalab_user01/anaconda2/envs/nltk_env:
The following packages will be downloaded:
    package                    |            build
    ---------------------------|-----------------
    python-3.5.2               |                0        17.2 MB
    nltk-3.2.1                 |           py35_0         1.8 MB
    numpy-1.11.1               |           py35_0         6.1 MB
    setuptools-23.0.0          |           py35_0         460 KB
    wheel-0.29.0               |           py35_0          82 KB
    pip-8.1.2                  |           py35_0         1.6 MB
    ------------------------------------------------------------
                                           Total:        27.2 MB
The following NEW packages will be INSTALLED:
    mkl:        11.3.3-0      (copy)
    nltk:       3.2.1-py35_0  (copy)
    numpy:      1.11.1-py35_0 (copy)
    openssl:    1.0.2h-1      (copy)
    pip:        8.1.2-py35_0  (copy)
    python:     3.5.2-0       (copy)
    readline:   6.2-2         (copy)
    setuptools: 23.0.0-py35_0 (copy)
    sqlite:     3.13.0-0      (copy)
    tk:         8.5.18-0      (copy)
    wheel:      0.29.0-py35_0 (copy)
    xz:         5.2.2-0       (copy)
    zlib:       1.2.8-3       (copy)
#
# To activate this environment, use:
# $ source activate nltk_env
#
# To deactivate this environment, use:
# $ source deactivate
#
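Before zipping the environment it can be worth double-checking that it really is self-contained. A small sketch (an illustration, assuming the environment lives under ~/anaconda2/envs/nltk_env as above) that scans for symbolic links pointing outside the environment:

import os

ENV_ROOT = os.path.realpath(os.path.expanduser('~/anaconda2/envs/nltk_env'))

def external_links(root):
    # Yield entries inside the env that are symlinks resolving outside of it.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) and not os.path.realpath(path).startswith(root):
                yield path, os.path.realpath(path)

for link, target in external_links(ENV_ROOT):
    print(link, '->', target)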
 
This also works for different Python versions, 3.x or 2.x!

Zip it and Ship it!

Now that we have our relocatable environment all set, we are able to package it and ship it as part of our sample PySpark job.

$ cd ~/anaconda2/envs/
$ zip -r nltk_env.zip nltk_env

To make this available during the execution of our application in a YARN container, we have to, for one, distribute the package and, secondly, change the default Python environment of Spark to our location. The variable controlling the Python environment for Python applications in Spark is named PYSPARK_PYTHON.

PYSPARK_PYTHON=./NLTK/nltk_env/bin/python spark-submit --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./NLTK/nltk_env/bin/python --master yarn-cluster --archives nltk_env.zip#NLTK spark_nltk_sample.py

Our virtual environment is linked as NLTK, which is why the path in PYSPARK_PYTHON points to ./NLTK/content/of/zip/... . The exact command being executed during container creation is something like this:

ln -sf "/hadoop/yarn/local/usercache/vagrant/filecache/71/nltk_env.zip" "NLTK"

Shipping additional resources with an application is controlled by the --files and --archives options, as shown here. The options being used here are documented in the Spark YARN Configuration and Spark Environment Variables documentation for reference.
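A simple way to verify that the executors really pick up the shipped interpreter is to report sys.executable from a few tasks. A minimal sketch of such a check, submitted the same way as the sample job (script and app name are illustrative):

from pyspark import SparkContext

def which_python(_):
    import socket
    import sys
    # Report which host the task ran on and which Python binary it used.
    return (socket.gethostname(), sys.executable)

if __name__ == "__main__":
    sc = SparkContext(appName="which-python")
    print(sc.parallelize(range(4), 4).map(which_python).distinct().collect())
    sc.stop()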
Packaging tokenizer and taggers

Doing just the above will unfortunately fail, because using the NLTK parser in the way we are using it in the example program has some additional dependencies. If you have followed the above steps, submitting it to your YARN cluster will result in the following exception at the container level:

Caused by: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
...
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource 'taggers/averaged_perceptron_tagger/averaged_perceptron
  _tagger.pickle' not found.  Please use the NLTK Downloader to
  obtain the resource:  >>> nltk.download()
  Searched in:
    - '/home/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************
 at org.apache.spark.api.python.PythonRunner$anon$1.read(PythonRDD.scala:166)
 at org.apache.spark.api.python.PythonRunner$anon$1.<init>(PythonRDD.scala:207)
 at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:125)
 at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:70)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
 at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:300)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:264)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
 at org.apache.spark.scheduler.Task.run(Task.scala:88)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 ... 1 more

The problem is that NLTK expects the resource tokenizers/punkt/english.pickle to be available in one of the following locations:

Searched in:
    - '/home/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'

The good thing about this is that by now we should know how we can ship the required dependency to our application. We can do it the same way we did with our Python environment.

Again it is important to make sure how the resource is going to be referenced. NLTK expects it by default in the current location under tokenizers/punkt/english.pickle, which is why we navigate into the folder for packaging and reference the zip file with tokenizers.zip#tokenizers.

(nltk_env)$ cd nltk_data/tokenizers/
(nltk_env)$ zip -r ../../tokenizers.zip *
(nltk_env)$ cd ../../
(nltk_env)$ cd nltk_data/taggers/
(nltk_env)$ zip -r ../../taggers.zip *
(nltk_env)$ cd ../../

At a later point our program will also expect a tagger, packaged in the same fashion as demonstrated in the snippet above.

Using YARN Locations

We can ship those zip resources the same way we shipped our conda env. In addition, environment variables can be used to control resource discovery and allocation.

For NLTK you can use the environment variable NLTK_DATA to control the path. Setting this in Spark can be done similarly to the way we set PYSPARK_PYTHON:

--conf spark.yarn.appMasterEnv.NLTK_DATA=./

Additionally, YARN exposes the container path via the environment variable PWD. This can be used in NLTK to add it to the search path as follows:

def word_tokenize(x):
    import nltk
    nltk.data.path.append(os.environ.get('PWD'))
    return nltk.word_tokenize(x)
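The tagger lookup can be handled the same way; a minimal sketch of the pos_tag function with the container path appended:

def pos_tag(x):
    import nltk
    # Make the localized taggers.zip content discoverable as well.
    nltk.data.path.append(os.environ.get('PWD'))
    return nltk.pos_tag([x])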
The submission of your application now becomes:

PYSPARK_PYTHON=./NLTK/nltk_env/bin/python spark-submit --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./NLTK/nltk_env/bin/python --conf spark.yarn.appMasterEnv.NLTK_DATA=./ --master yarn-cluster --archives nltk_env.zip#NLTK,tokenizers.zip#tokenizers,taggers.zip#taggers spark_nltk_sample.py

The expected result should be something like the following:

(nltk_env)$ hdfs dfs -cat /user/datalab_user01/nixon_tokens/* | head -n 20
Annual
Message
to
the
Congress
on
the
State
of
the
Union
.
January
22
,
1970
Mr.
Speaker
,
Mr.

And:

(nltk_env)$ hdfs dfs -cat /user/datalab_user01/nixon_token_pos/* | head -n 20
[(u'Annual', 'JJ')]
[(u'Message', 'NN')]
[(u'to', 'TO')]
[(u'the', 'DT')]
[(u'Congress', 'NNP')]
[(u'on', 'IN')]
[(u'the', 'DT')]
[(u'State', 'NNP')]
[(u'of', 'IN')]
[(u'the', 'DT')]
[(u'Union', 'NN')]
[(u'.', '.')]
[(u'January', 'NNP')]
[(u'22', 'CD')]
[(u',', ',')]
[(u'1970', 'CD')]
[(u'Mr.', 'NNP')]
[(u'Speaker', 'NN')]
[(u',', ',')]
[(u'Mr.', 'NNP')]

Further Readings

- PySpark Internals
- Spark NLTK Example
- NLTK Data
- Virtualenv in Hadoop Streaming
- Conda Intro
- Conda Env with Spark
- Python Env support in Spark (SPARK-13587)

Post was first published here: http://henning.kropponline.de/2016/09/24/running-pyspark-with-conda-env/
09-20-2016 09:52 AM
Well, you would have the login, but not the Kerberos init. You would still have two realms with user credentials, the KRB5 realm and the LDAP realm, depending on your setup. Actually, the KRB5 realm can be included inside LDAP, or put differently, Kerberos can be configured to use LDAP as its user DB, which would give you the possibility to combine both. This essentially is what FreeIPA is.
09-20-2016 09:49 AM
Yes, with pam_ldap integration: http://www.tldp.org/HOWTO/archived/LDAP-Implementation-HOWTO/pamnss.html
09-20-2016 09:11 AM
A list of recommended tools:

- SSSD: https://fedorahosted.org/sssd/ / https://help.ubuntu.com/lts/serverguide/sssd-ad.html
- FreeIPA (introduces an additional AD and the need to establish trust between the two): https://www.freeipa.org/page/Main_Page
- Winutils
- Centrify (commercial): https://www.centrify.com/
- VAS / Quest (commercial): https://software.dell.com/products/authentication-services/
- ...

Please check the material of this workshop for reference:

https://community.hortonworks.com/articles/1143/cheatsheet-on-configuring-authentication-authoriza.html
https://community.hortonworks.com/repos/4465/workshops-on-how-to-setup-security-on-hadoop-using.html
09-19-2016 02:29 PM
1 Kudo
This does not sound like a good idea. Edge nodes by definition typically just hold client programs, not services like the DataNode or NodeManager. YARN manages the resource allocation based on the data and utilization of the nodes, which is why it is often also not a good idea to run NodeManagers without DataNodes on one machine.

Concerning your "But can I .. bring the data to the edge .. run the task if all other nodes are busy?": YARN does the resource negotiation and scheduling for distributed frameworks like MR and Spark. I would advise not to do this manually but to let YARN do it for you.

I hope this helps?
09-15-2016 05:43 PM
Currently Spark does not support deployment to YARN from a SparkContext. Use spark-submit instead. For unit testing it is recommended to use the [local] runner.

The problem is that you cannot set the Hadoop conf from outside the SparkContext; it is read from the *-site.xml config under HADOOP_HOME during spark-submit. So you cannot point to your remote cluster in Eclipse unless you set up the correct *-site.xml on your laptop and use spark-submit.

SparkSubmit is available as a Java class, but I doubt that you will achieve what you are looking for with it. But you would be able to launch a Spark job from Eclipse to a remote cluster, if that is sufficient for you. Have a look at the Oozie Spark launcher as an example.

SparkContext is changing dramatically in Spark 2, in favor, I think, of SparkClient to support multiple SparkContexts. I am not sure what the situation is with that.
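A minimal sketch of the [local] approach for a unit test, assuming PySpark is available on the test machine:

from pyspark import SparkConf, SparkContext

# Run the job logic against a local, in-process "cluster" for unit testing.
conf = SparkConf().setMaster("local[2]").setAppName("unit-test")
sc = SparkContext(conf=conf)
try:
    assert sc.parallelize([1, 2, 3]).map(lambda x: x * 2).sum() == 12
finally:
    sc.stop()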
						
					
09-15-2016 11:48 AM
1 Kudo
@Smart Solutions, could you please check if this article is of any help to you: https://community.hortonworks.com/content/kbentry/56704/secure-kafka-java-producer-with-kerberos.html
09-15-2016 11:33 AM
6 Kudos
The most recent release of Kafka, 0.9, with its comprehensive security implementation has reached an important milestone. In his blog post Kafka Security 101, Ismael from Confluent describes the security features that are part of the release very well.

As part II of the previously published post about Kafka Security with Kerberos, this post discusses a sample implementation of a Java Kafka producer with authentication. It is part of a mini series of posts discussing secure HDP clients, connecting services to a secured cluster, and kerberizing the HDP Sandbox (Download HDP Sandbox). At the end of this post we will also create a Kafka Servlet to publish messages to a secured broker.

Kafka provides SSL and Kerberos authentication. Only Kerberos is discussed here.

Kafka from now on supports four different communication protocols between Consumers, Producers, and Brokers. Each protocol considers different security aspects, while PLAINTEXT is the old insecure communication protocol.

- PLAINTEXT (non-authenticated, non-encrypted)
- SSL (SSL authentication, encrypted)
- PLAINTEXT+SASL (authentication, non-encrypted)
- SSL+SASL (encrypted authentication, encrypted transport)

A Kafka client needs to be configured to use the protocol of the corresponding broker. This tells the client to use authentication for communication with the broker:

Properties props = new Properties();
props.put("security.protocol", "PLAINTEXTSASL");  Making use of Kerberos authentication in Java is provided by the Java Authentication and Authorization Service (JAAS)
 which is a pluggable authentication method similar to PAM supporting 
multiple authentication methods. In this case the authentication method 
being used is GSS-API for Kerberos.  Demo Setup  For
 JAAS a proper configuration of GSS would be needed in addition to being
 in possession of proper credentials, obviously. Some credentials can be
 created with MIT Kerberos like this:  (as root)
$ kadmin.local -q "addprinc -pw hadoop kafka-user" 
$ kadmin.local -q "xst -k /home/kafka-user/kafka-user.keytab kafka-user@MYCORP.NET"
(Creating a keytab will make the existing password invalid. To change your password back to hadoop use as root:)
$ kadmin.local -q "cpw -pw hadoop hdfs-user"  The last line is 
not necessarily needed as it creates us a so called keytab - basically 
an encrypted password of the user - that can be used for password less 
authentication for example for automated services. We will make use of 
that here as well.  First we need to prepare a test topic to publish messages with proper privileges for our kafka-user:  # Become Kafka admin
$ kinit -kt /etc/security/keytabs/kafka.service.keytab kafka/one.hdp@MYCORP.NET
# Set privileges for kafka-user
$ /usr/hdp/current/kafka-broker/bin/kafka-acls.sh --add --allow-principals user:kafka-user --operation ALL --topic test --authorizer-properties zookeeper.connect=one.hdp:2181
Adding following acls for resource: Topic:test 
  user:kafka-user has Allow permission for operations: All from hosts: * 
Following is list of acls for resource: Topic:test 
  user:kafka-user has Allow permission for operations: All from hosts: *

As a sample producer we will use this:

package hdp.sample;
import java.util.Date;
import java.util.Properties;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
public class KafkaProducer {
    public static void main(String... args) {
        String topic = args[1];
        Properties props = new Properties();
        props.put("metadata.broker.list", args[0]);
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        props.put("security.protocol", "PLAINTEXTSASL");
        ProducerConfig config = new ProducerConfig(props);
        Producer<String, String> producer = new Producer<String, String>(config);
        for (int i = 0; i < 10; i++){
            producer.send(new KeyedMessage<String, String>(topic, "Test Date: " + new Date()));
        }
    }
}

With this setup we can go ahead and demonstrate two ways to use a JAAS context to authenticate with the Kafka broker. First we will configure a context to use the existing privileges possessed by the executing user. Next we use a so-called keytab to demonstrate a password-less login for automated producer processes. At last we will look at a Servlet implementation provided here.

Authentication with User Login

We configure a JAAS config with useKeyTab set to false and useTicketCache set to true, so that the privileges of the current user are being used:

KafkaClient {
 com.sun.security.auth.module.Krb5LoginModule required
 useKeyTab=false
 useTicketCache=true
 serviceName="kafka";
};

We store this in a file under /home/kafka-user/kafka-jaas.conf and execute the producer like this:

# list current user context
$ klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: kafka-user@MYCORP.NET
Valid starting       Expires              Service principal
21.02.2016 16:13:13  22.02.2016 16:13:13  krbtgt/MYCORP.NET@MYCORP.NET
# execute java producer
$ java -Djava.security.auth.login.config=/home/kafka-user/kafka-jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=false -cp hdp-kafka-sample-1.0-SNAPSHOT.jar:/usr/hdp/current/kafka-broker/libs/* hdp.sample.KafkaProducer one.hdp:6667 test
# consume sample messages for test
$ /usr/hdp/current/kafka-broker/bin/kafka-simple-consumer-shell.sh --broker-list one.hdp:6667 --topic test --security-protocol PLAINTEXTSASL --partition 0
{metadata.broker.list=one.hdp:6667, request.timeout.ms=1000, client.id=SimpleConsumerShell, security.protocol=PLAINTEXTSASL}
Test Date: Sun Feb 21 16:12:05 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016
Test Date: Sun Feb 21 16:12:06 UTC 2016

Using Keytab to Login

Next we will configure the JAAS context to use a generated keytab file instead of the security context of the executing user. Before we can do this we need to create the keytab, storing it again under /home/kafka-user/kafka-user.keytab.

$ kadmin.local -q "xst -k /home/kafka-user/kafka-user.keytab kafka-user@MYCORP.NET"
Authenticating as principal kafka-user/admin@MYCORP.NET with password.
Entry for principal kafka-user@MYCORP.NET with kvno 2, encryption type aes256-cts-hmac-sha1-96 added to keytab WRFILE:/home/kafka-user/kafka-user.keytab.
Entry for principal kafka-user@MYCORP.NET with kvno 2, encryption type aes128-cts-hmac-sha1-96 added to keytab WRFILE:/home/kafka-user/kafka-user.keytab.
Entry for principal kafka-user@MYCORP.NET with kvno 2, encryption type des3-cbc-sha1 added to keytab WRFILE:/home/kafka-user/kafka-user.keytab.
Entry for principal kafka-user@MYCORP.NET with kvno 2, encryption type arcfour-hmac added to keytab WRFILE:/home/kafka-user/kafka-user.keytab.
Entry for principal kafka-user@MYCORP.NET with kvno 2, encryption type des-hmac-sha1 added to keytab WRFILE:/home/kafka-user/kafka-user.keytab.
Entry for principal kafka-user@MYCORP.NET with kvno 2, encryption type des-cbc-md5 added to keytab WRFILE:/home/kafka-user/kafka-user.keytab.
$ chown kafka-user. /home/kafka-user/kafka-user.keytab

The JAAS configuration can now be changed to look like this:

KafkaClient {
 com.sun.security.auth.module.Krb5LoginModule required
 doNotPrompt=true
 useTicketCache=true
 principal="kafka-user@MYCORP.NET"
 useKeyTab=true
 serviceName="kafka"
 keyTab="/home/kafka-user/kafka-user.keytab"
 client=true;
};

This will use the keytab stored under /home/kafka-user/kafka-user.keytab, while the user executing the producer does not need to be logged in to any security controller:

$ klist
klist: Credentials cache file '/tmp/krb5cc_0' not found
$ java -Djava.security.auth.login.config=/home/kafka-user/kafka-jaas.conf -Djava.security.krb5.conf=/etc/krb5.conf -Djavax.security.auth.useSubjectCredsOnly=true -cp hdp-kafka-sample-1.0-SNAPSHOT.jar:/usr/hdp/current/kafka-broker/libs/* hdp.sample.KafkaProducer one.hdp:6667 test

Kafka Producer Servlet

In a last example we will add a Kafka Servlet to the hdp-web-sample project previously described in this post. Our Servlet will get the topic and message as GET parameters. The Servlet looks as follows:

package hdp.webapp;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.Properties;
import javax.servlet.Servlet;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import kafka.javaapi.producer.Producer;
import kafka.producer.KeyedMessage;
import kafka.producer.ProducerConfig;
public class KafkaServlet extends HttpServlet implements Servlet {
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        String topic = request.getParameter("topic");
        String msg = request.getParameter("msg");
        Properties props = new Properties();
        props.put("metadata.broker.list", "one.hdp:6667");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("request.required.acks", "1");
        props.put("security.protocol", "PLAINTEXTSASL");
        ProducerConfig config = new ProducerConfig(props);
        Producer<String, String> producer = new Producer<String, String>(config);
        producer.send(new KeyedMessage<String, String>(topic, msg));
        PrintWriter out = response.getWriter();
        out.println("<html>");
        out.println("<head><title>Write to topic: "+ topic +"</title></head>");
        out.println("<body><h1>/"+ msg +"</h1>");
        out.println("</html>");
        out.close();
    }
}

Again we are changing the JAAS config of the Tomcat service to be able to make use of the previously generated keytab. The jaas.conf of Tomcat will now contain this:

KafkaClient {
 com.sun.security.auth.module.Krb5LoginModule required
 doNotPrompt=true
 useTicketCache=true
 principal="kafka-user@MYCORP.NET"
 useKeyTab=true
 serviceName="kafka"
 keyTab="/home/kafka-user/kafka-user.keytab"
 client=true;
};
com.sun.security.jgss.krb5.initiate {
    com.sun.security.auth.module.Krb5LoginModule required
    doNotPrompt=true
    principal="tomcat/one.hdp@MYCORP.NET"
    useKeyTab=true
    keyTab="/etc/tomcat/tomcat.keytab"
    storeKey=true;
};

After deploying the web app and restarting Tomcat with this newly adapted JAAS config, you should be able to publish messages to a secured broker by triggering the following GET address from a browser: http://one.hdp:8099/hdp-web/kafka?topic=test&msg=Test1 . The response should be a 200 OK. (Screenshot of the response omitted.)

You might be having some issues, and in particular seeing this exception:

SEVERE: Servlet.service() for servlet [KafkaServlet] in context with path [/hdp-web] threw exception [Servlet execution threw an exception] with root cause
javax.security.auth.login.LoginException: Unable to obtain password from user
 at com.sun.security.auth.module.Krb5LoginModule.promptForPass(Krb5LoginModule.java:897)
 at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:760)
 at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:497)
 at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755)
 at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682)
 at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680)
 at javax.security.auth.login.LoginContext.login(LoginContext.java:587)
 at org.apache.kafka.common.security.kerberos.Login.login(Login.java:298)
 at org.apache.kafka.common.security.kerberos.Login.<init>(Login.java:104)
 at kafka.common.security.LoginManager$.init(LoginManager.scala:36)
 at kafka.producer.Producer.<init>(Producer.scala:50)
 at kafka.producer.Producer.<init>(Producer.scala:73)
 at kafka.javaapi.producer.Producer.<init>(Producer.scala:26)
 at hdp.webapp.KafkaServlet.doGet(KafkaServlet.java:33)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:620)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
 at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
 at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
 at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1040)
 at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
 at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:314)
 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
 at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
 at java.lang.Thread.run(Thread.java:745)

If you are seeing the message javax.security.auth.login.LoginException: Unable to obtain password from user, it likely refers to your keytab file, which acts as the user's password. So make sure that the tomcat user is able to read that file, stored under /home/kafka-user/kafka-user.keytab in this example.

Further Readings

- Kafka Security 101
- Kafka Security
- Kafka Sasl/Kerberos and SSL Implementation
- Oracle Doc: JAAS Authentication
- Krb5LoginModule
- Flume with kerberized Kafka
- JAAS Login Configuration File

This article was first published under: http://henning.kropponline.de/2016/02/21/secure-kafka-java-producer-with-kerberos/
05-17-2016 11:58 AM
2 Kudos
Hi @sarfarazkhan pathan,

If I am not mistaken this is just a warning. You can typically install the cluster with users already present from previous installs. In some cases this obviously can cause some issues, but in general it doesn't. Also, if you add and remove users frequently across multiple installs, you are increasing the uid count of your system.

So it should be safe to ignore the warning. You can read here about cleaning up nodes from previous installs:

http://henning.kropponline.de/2016/04/24/uninstalling-and-cleaning-a-hdp-node/

EDIT: BTW, to be safe, a restart of the Ambari agent and simply re-running the checks should be enough. No restart of the node or similar should be required.

Regards
04-22-2016 07:49 PM
2 Kudos
To me it looks like in the Ranger admin config your ambari_ranger_admin and ambari_ranger_password are empty. Could you please check if they exist and are not blank?

Please note that the ambari_ranger_admin is different from the Ranger admin_user. But both are defined in the Ranger configuration in Ambari.