Secure Webhdfs in Hadoop Hortonworks Cluster

Contributor

Dear community

 

I have installed a Hadoop cluster on 8 servers using Hortonworks Ambari.

I am able to access WebHDFS using the IP address and the default port 50070, without authentication.

 

How can I secure WebHDFS?

P.S. I did not enable Kerberos via Ambari > Enable Kerberos; should I do it?

 

Any suggestions would be appreciated.

Thanks

Asma

1 ACCEPTED SOLUTION

Master Mentor

@asmarz 

Good to know that your original issue is resolved. However, for any subsequent, slightly different issue it is always better to open a new community thread, so that readers of this thread can easily find one error/issue with one solution. Multiple issues in a single thread can confuse readers.

.

If your question is answered, please make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking the thumbs-up button.

 


10 REPLIES

Master Mentor

@asmarz 

Please refer to the following docs to learn how to enable SPNEGO authentication. Once you have enabled Kerberos for your cluster, you can also enable SPNEGO authentication; the docs below explain how to configure HTTP authentication for Hadoop components in a Kerberos environment.

 

By default, access to the HTTP-based services and UIs for the cluster is not configured to require authentication (a quick way to verify this after enabling SPNEGO is sketched below the links).

1. https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/authentication-with-kerberos/content/authe_spn...

2. https://docs.cloudera.com/HDPDocuments/Ambari-2.7.5.0/managing-and-monitoring-ambari/content/amb_sta...
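As a quick sanity check (a minimal sketch, assuming you kept the default HTTP port 50070 and with <namenode_fqdn> as a placeholder for your NameNode host), WebHDFS should only accept requests that carry a Kerberos ticket once SPNEGO is enabled:

# kinit your_user@YOUR.REALM
# curl --negotiate -u : "http://<namenode_fqdn>:50070/webhdfs/v1/tmp?op=LISTSTATUS"

Without --negotiate (or without a valid ticket), the same request should now be rejected with an HTTP 401 instead of returning the listing.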

Contributor

Thank you for your help.

I tried to restart the Ambari server, but in vain.

I got this error:


2020-01-30 18:20:21,866 INFO [main] KerberosChecker:64 - Checking Ambari Server Kerberos credentials.
2020-01-30 18:20:22,052 ERROR [main] KerberosChecker:120 - Client not found in Kerberos database (6)
2020-01-30 18:20:22,052 ERROR [main] AmbariServer:1119 - Failed to run the Ambari Server
org.apache.ambari.server.AmbariException: Ambari Server Kerberos credentials check failed.
Check KDC availability and JAAS configuration in /etc/ambari-server/conf/krb5JAASLogin.conf
at org.apache.ambari.server.controller.utilities.KerberosChecker.checkJaasConfiguration(KerberosChecker.java:121)
at org.apache.ambari.server.controller.AmbariServer.main(AmbariServer.java:1110)

 

The krb5JAASLogin.conf is configured like this:

 

com.sun.security.jgss.krb5.initiate {
com.sun.security.auth.module.Krb5LoginModule required
renewTGT=false
doNotPrompt=true
useKeyTab=true
keyTab="/etc/security/ambariservername.keytab"
principal="ambariservername@REALM.COM"
storeKey=true
useTicketCache=false;
};

 

I tried to follow these links:

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/authentication-with-kerberos/content/kerberos_...

https://docs.cloudera.com/HDPDocuments/HDP3/HDP-3.1.5/authentication-with-kerberos/content/set_up_ke...

 

Any suggestions, please?

😞

Thanks

Master Mentor

@asmarz 

As we can see, the error is:

 Failed to run the Ambari Server
org.apache.ambari.server.AmbariException: Ambari Server Kerberos credentials check failed.

Check KDC availability and JAAS configuration in /etc/ambari-server/conf/krb5JAASLogin.conf

.

1. Can you please let us know how you enabled Kerberos for the Ambari server: via the Ambari Enable Kerberos wizard, or manually?

2. Do you have ambari-agent installed on the Ambari server host? And do you have the Kerberos clients installed on the Ambari server host?

# yum info krb5-libs 
# yum info krb5-workstation


3. Do you have the correct KDC/AD address defined inside the krb5.conf file? (A minimal example of the relevant sections is sketched after these steps.)

# ps -ef | grep AmbariServer | grep --color krb5.conf

# cat /etc/krb5.conf

.

4. Are you able to run "kinit" to get a valid Kerberos ticket using the same details mentioned in the file "/etc/ambari-server/conf/krb5JAASLogin.conf"?

# kinit -kt /etc/security/ambariservername.keytab ambariservername@REALM.COM
# klist


.
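For reference, here is a minimal sketch of the sections to check in /etc/krb5.conf. REALM.COM is taken from your krb5JAASLogin.conf, and ad-kdc.realm.com is only a placeholder for your actual AD/KDC host:

[libdefaults]
  default_realm = REALM.COM

[realms]
  REALM.COM = {
    kdc = ad-kdc.realm.com
    admin_server = ad-kdc.realm.com
  }

[domain_realm]
  .realm.com = REALM.COM
  realm.com = REALM.COM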

 

 

 

 

Contributor

Thanks a lot 🙂

 

I have configured the cluster with Kerberos using Active Directory,

but I got some issues when connecting.

 

[root@server keytabs]# hdfs dfs -ls /
20/01/31 16:31:19 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
ls: DestHost:destPort namenode:8020 , LocalHost:localPort ambari/ip:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

 

Any idea, please?

It looks like port 8020 is also blocked.

 

Thanks

Asma

Master Mentor

@asmarz 

To clarify the port access, please check from the Ambari host whether the NameNode address and port are accessible:

# nc -v $ACTIVE_NAMENODE_FQDN 8020
(OR)
# telnet $ACTIVE_NAMENODE_FQDN 8020


The error you posted usually indicates that you did not get a valid Kerberos ticket with the "kinit" command before running the mentioned HDFS command.

20/01/31 16:31:19 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]

.
Most likely cause of the above warning:

If the port is accessible, please check whether you are able to run the same hdfs command after getting a valid Kerberos ticket:

# klist -kte /etc/security/ambariservername.keytab
# kinit -kt /etc/security/ambariservername.keytab ambariservername@REALM.COM
# klist
# export HADOOP_ROOT_LOGGER=DEBUG,console
# hdfs dfs -ls /

.


Then try the same command using the "hdfs" headless keytab:

# kdestroy
# klist -kte /etc/security/keytabs/hdfs.headless.keytab
# kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs-ker1latest@EXAMPLE.COM
# klist
# export HADOOP_ROOT_LOGGER=DEBUG,console
# hdfs dfs -ls /

*NOTE:* The "hdfs-ker1latest@EXAMPLE.COM" principal name may be different in your case, so replace it with your own hdfs keytab principal.

Please share the output of the above commands.
Also verify that all your cluster nodes have the correct FQDN (a quick check is sketched below).
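As a minimal sketch of such a check (standard OS commands; adjust to your environment), run the following on every node:

# hostname -f
# cat /etc/hosts
# python -c "import socket; print(socket.getfqdn())"

"hostname -f" should print the node's fully qualified domain name, each node's IP in /etc/hosts should map to its FQDN, and the python call shows the name that Ambari and Hadoop will resolve. If any node reports a short or mismatched name, fix /etc/hosts (or DNS) before retrying.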

 

.

.

Contributor

Thanks a lot

 

Now the problem with HDFS is fixed; however, when I try to launch a script from an edge node, I am getting the same issue:

/usr/hdp/3.1.4.0-315/spark2/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://edgenode.servername:7077 --num-executors 4 --driver-memory 512m --executor-memory 512m --executor-cores 1 /usr/hdp/3.1.4.0-315/spark2/examples/jars/spark-examples_2.11-2.3.2.3.1.4.0-315.jar

 

Results:

20/02/03 15:13:41 INFO StandaloneAppClient$ClientEndpoint: Executor updated: app-20200203151341-0000/79 is now RUNNING
20/02/03 15:13:41 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@69cac930{/metrics/json,null,AVAILABLE,@Spark}
20/02/03 15:13:42 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
20/02/03 15:13:42 ERROR SparkContext: Error initializing SparkContext.
java.io.IOException: DestHost:destPort namenode.servername:8020 , LocalHost:localPort edgenodeaddress:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:806)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1502)
at org.apache.hadoop.ipc.Client.call(Client.java:1444)
at org.apache.hadoop.ipc.Client.call(Client.java:1354)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
at com.sun.proxy.$Proxy11.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:900)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy12.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1660)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1577)
at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1574)
at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1589)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:100)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:522)
at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2498)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:934)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:925)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:925)
at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:31)
at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:758)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:721)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:814)
at org.apache.hadoop.ipc.Client$Connection.access$3600(Client.java:411)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1559)
at org.apache.hadoop.ipc.Client.call(Client.java:1390)

Contributor

Actually, for more details:

On my Ambari server machine I have this ticket:

[root@ambariserver ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: spark-analytics_hadoop@REALM.COM

Valid starting Expires Service principal
02/03/2020 13:31:21 02/03/2020 23:31:21 krbtgt/REALM.COM@REALM.COM
renew until 02/10/2020 13:31:21

 

When I connect with the spark user:

HADOOP_ROOT_LOGGER=DEBUG,console /usr/hdp/3.1.4.0-315/spark2/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://Edgenode:7077 --num-executors 4 --driver-memory 512m --executor-memory 512m --executor-cores 1 /usr/hdp/3.1.4.0-315/spark2/examples/jars/spark-examples_2.11-2.3.2.3.1.4.0-315.jar

 

=> OK

 

Now, if I connect from the edge node:

[root@EdgeNode~]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: spark/EdgeNode@REALM.COM

Valid starting Expires Service principal
02/03/2020 16:52:12 02/04/2020 02:52:12 krbtgt/REALM.COM@REALM.COM
renew until 02/10/2020 16:52:12

But when I connect as the spark user and run:

 

HADOOP_ROOT_LOGGER=DEBUG,console /usr/hdp/3.1.4.0-315/spark2/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://Edgenode:7077 --num-executors 4 --driver-memory 512m --executor-memory 512m --executor-cores 1 /usr/hdp/3.1.4.0-315/spark2/examples/jars/spark-examples_2.11-2.3.2.3.1.4.0-315.jar

 

=> I got an error:

 

20/02/03 17:53:01 INFO ContextHandler: Started o.s.j.s.ServletContextHandler@69cac930{/metrics/json,null,AVAILABLE,@Spark}
20/02/03 17:53:01 WARN Client: Exception encountered while connecting to the server : org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
20/02/03 17:53:01 ERROR SparkContext: Error initializing SparkContext.
java.io.IOException: DestHost:destPort NameNode:8020 , LocalHost:localPort EdgeNode/10.48.142.32:0. Failed on local exception: java.io.IOException: org.apache.hadoop.security.AccessControlException: Client cannot authenticate via:[TOKEN, KERBEROS]
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:4

 

 

Did I miss something please?

 

Users launch these commands from their laptops:

 

cluster = RxSpark(sshHostname = "EdgeNode", sshUsername = "username")  # Spark compute context via SSH to the edge node
rxSetComputeContext(cluster)                                           # make it the active compute context
source = c("~/AirlineDemoSmall.csv")
dest_file = "/share"

rxHadoopMakeDir(dest_file)                                             # create the target directory in HDFS

They are getting the same issue.

On all cluster nodes, hdfs dfs -ls / works well.

 

Please advise

Thanks

Asma

 

Contributor

Should I create a principal for each user in the AD?

We are using Active Directory users.

If yes, how should I do it?

 

Many thanks

Asma

Master Mentor

@asmarz 

Good to know that your original issue is resolved. However, for any subsequent, slightly different issue it is always better to open a new community thread, so that readers of this thread can easily find one error/issue with one solution. Multiple issues in a single thread can confuse readers.

.

If your question is answered, please make sure to mark the answer as the accepted solution.
If you find a reply useful, say thanks by clicking the thumbs-up button.