Member since
06-09-2016
529
Posts
129
Kudos Received
104
Solutions
04-19-2018
01:05 PM
@Anshuman Mehta Log in to a shell on the Ranger Admin host and, as root, run:
# ps -ef | grep rangeradmin
# netstat -nap | grep <pid>
# grep -C2 https /data1/hdp/2.6.4.0-91/ranger-admin/conf/ranger-admin-site.xml
Paste or attach the results here. Also, please run an extended file listing of the ews directory:
# ls -l /data1/hdp/2.6.4.0-91/ranger-admin/ews
04-19-2018
12:39 PM
@Yew Hoong Chia The value 120 is a retry count. It is not a time in milliseconds or seconds; it is the number of times the operation will be retried. From org.apache.hadoop.fs.CommonConfigurationKeys:
/** number of zookeeper operation retry times in ActiveStandbyElector */
public static final String HA_FC_ELECTOR_ZK_OP_RETRIES_KEY = "ha.failover-controller.active-standby-elector.zk.op.retries";
public static final int HA_FC_ELECTOR_ZK_OP_RETRIES_DEFAULT = 3;
HTH
04-19-2018
12:30 PM
@Sudheer Velagapudi Response status 401 usually means a problem with authentication. I recommend checking xa_portal.log, catalina.out and access.log under /var/log/ranger/admin on the Ranger Admin UI host for more information. Also make sure ranger.tagsync.kerberos.keytab and ranger.tagsync.kerberos.principal are set correctly, and that you can successfully run kinit from a shell with that keytab file and principal.
04-19-2018
12:21 PM
@Vinay K
1. Usually Knox is configured with LDAP, but there are other options as well. Here is the link to the list of supported authentication providers for Knox: link (LDAP, PAM, SPNEGO and Anonymous)
2. The list of supported Hadoop services and web UIs is here: link
3. No, you don't need to remove SPNEGO.
HTH
03-23-2018
08:14 PM
@Yuval Smp What value do you get from:
$ cat /proc/sys/vm/swappiness
If it is greater than 0, could you try setting swappiness to 0 and test again?
03-23-2018
07:46 PM
@Sanjay Gurnani In my experience this works fine on Spark 2.2 using a DataFrame. In Scala I would do it like this:
scala> df.write.format("orc").save("top10Salaries-Repartition1")
Then I can see it was created in the user's HDFS home directory:
[root@falbani-hdpmgmt bin]# hdfs dfs -ls top10Salaries-Repartition1/
Found 2 items
-rw-r--r-- 3 falbani falbani top10Salaries-Repartition1/_SUCCESS
-rw-r--r-- 3 falbani falbani top10Salaries-Repartition1/part-00000-445ad80a-8d18-4ba5-859e-cebdb8443e62.snappy.orc
Maybe you can use the DataFrame API. HTH
02-27-2018
06:41 PM
7 Kudos
Alfredo Sauce - Hadoop HTTP, Kerberos and SPNEGO
Kerberos SPNEGO authentication for HTTP has been part of Hadoop for some time now. On a secure cluster, many services use it to authenticate HTTP APIs and web UIs. Setup and configuration can become a challenge because it involves many aspects, including Kerberos principals, keytabs, network and load balancers, and remote users accessing via different browsers and operating systems. In this article I will share how Kerberos SPNEGO authentication for HTTP works in Hadoop.
Introduction
Kerberos SPNEGO authentication for HTTP was introduced to Hadoop via HADOOP-7119. The implementation is based on a servlet filter that is configured to front all incoming HTTP requests to the application. If no valid hadoop.auth cookie is found, the servlet filter calls the KerberosAuthenticationHandler to perform Kerberos authentication for the UserAgent request. Upon successful Kerberos authentication, the servlet filter adds a signed cookie to the response so that subsequent requests, as long as the cookie is valid, are authenticated only via the cookie and not via the Kerberos API.
Configuration
As far as configuration goes, most Hadoop services support the following properties with similar names:
authentication.kerberos.keytab=/etc/security/keytabs/spnego.service.keytab
Points to the location of the SPNEGO keytab file.
authentication.kerberos.principal=HTTP/_HOST@REALM.COM
Contains the principal name.
authentication.kerberos.name.rules=<auth_to_local rules>
Contains the auth_to_local rules.
Implementation details
Kerberos SPNEGO authentication often requires more than one interaction until authentication is successful and a valid cookie is issued. Here is the sequence diagram for a successful authentication.
Note: In the sequence diagram, HadoopAuthenticationFilter is actually an interface implemented by many Hadoop services. For simplicity, instead of using a class name specific to any Hadoop service, I have kept the interface name.
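To make the request handling described in the introduction more concrete, here is a minimal sketch of the per-request decision: accept a valid cookie, run Kerberos when a Negotiate header is present, otherwise challenge with 401. The class SpnegoFlowSketch, the Decision enum and the decide method are hypothetical names made up for illustration; this is not the Hadoop filter code, and the cookie signature check is reduced to a boolean supplied by the caller.

```java
import java.util.Collections;
import java.util.Map;

// Illustrative only: models the decision described in the article,
// not the actual Hadoop AuthenticationFilter implementation.
public class SpnegoFlowSketch {

    // The three outcomes the article describes for an incoming request.
    enum Decision { CHALLENGE_401, RUN_KERBEROS, ACCEPT_COOKIE }

    // headers: request headers; hadoopAuthCookie: value of the hadoop.auth
    // cookie or null; cookieValid: whether signature and expiry checked out
    // (assumed to be verified elsewhere).
    static Decision decide(Map<String, String> headers,
                           String hadoopAuthCookie,
                           boolean cookieValid) {
        // A valid, non-empty cookie short-circuits Kerberos entirely.
        if (hadoopAuthCookie != null && !hadoopAuthCookie.isEmpty() && cookieValid) {
            return Decision.ACCEPT_COOKIE;
        }
        // No usable cookie: Kerberos needs the client token from the header.
        String authorization = headers.get("Authorization");
        if (authorization != null && authorization.startsWith("Negotiate ")) {
            return Decision.RUN_KERBEROS;
        }
        // Neither cookie nor token: reply 401 with WWW-Authenticate: Negotiate.
        return Decision.CHALLENGE_401;
    }

    public static void main(String[] args) {
        Map<String, String> none = Collections.emptyMap();
        Map<String, String> negotiate =
                Collections.singletonMap("Authorization", "Negotiate YII...");

        System.out.println("first request:   " + decide(none, null, false));
        System.out.println("second request:  " + decide(negotiate, null, false));
        System.out.println("later requests:  " + decide(none, "u=falbani", true));
    }
}
```

The three calls in main correspond to the first request without credentials, the retry with the Negotiate header, and later requests that carry the cookie.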
Diagram steps:
Step (1): The first interaction any UserAgent makes doesn't contain a valid hadoop.auth cookie. It also doesn't contain the HTTP Authorization: Negotiate header, which is required to perform Kerberos authentication. Hence the KerberosAuthenticationHandler quickly responds with HTTP/1.1 401 Unauthorized.
Step (2): The second interaction the UserAgent makes contains the HTTP Authorization: Negotiate header. This header value is base64 encoded and contains the client Kerberos token. At this point KerberosAuthenticationHandler performs the following two key steps:
Step (3): Finding the right service principal to use is key to authenticating with Kerberos. Here are the steps involved in finding the right service principal:
1. At initialization time KerberosAuthenticationHandler reads the principals from the SPNEGO keytab (authentication.kerberos.keytab).
2. Those principals are usually in the form HTTP/_HOST@REALM.COM, where _HOST matches the FQDN of the server where the service is running and/or the Load Balancer FQDN.
3. Based on the incoming request, the KerberosAuthenticationHandler computes the serverName. Here is the exact way in which serverName is computed:
final String serverName = InetAddress.getByName(request.getServerName()).getCanonicalHostName();
Let's break this down:
request.getServerName(): Returns the host name of the server to which the request was sent. It is the value of the part before ":" in the Host header value, if any, or the resolved server name, or the server IP address (ref: ServletRequest API)
InetAddress.getByName(_): Determines the IP address of a host, given the host's name (ref: InetAddress API)
getCanonicalHostName(): Gets the fully qualified domain name for this IP address. This is a best-effort method, meaning it may not be able to return the FQDN depending on the underlying system configuration (ref: InetAddress API)
It's important that DNS reverse resolution is configured appropriately so that the three calls above result in a valid FQDN.
4. serverName is used to search the hashmap loaded at initialization time for the right service principal name. If a service principal is found, this step completes successfully. You can see the following TRACE message in the logs:
TRACE KerberosAuthenticationHandler:422 - SPNEGO with server principals:[HTTP/serverName@REALM.COM] for serverName
If no principal is found you will see the following (notice the empty brackets):
TRACE KerberosAuthenticationHandler:422 - SPNEGO with server principals:[] for serverName
Step (4): KerberosAuthenticationHandler authenticates using the Kerberos API. Upon successful authentication it creates a valid authentication token. HadoopAuthenticationFilter receives the token, creates a valid hadoop.auth cookie, and allows the request to continue to the requested resource. If trace is enabled, the logs will show:
TRACE KerberosAuthenticationHandler:467 - SPNEGO initiated with server principal [HTTP/fqdn_of_server@REALM.COM]
TRACE KerberosAuthenticationHandler:494 - SPNEGO completed for client principal [user@REALM.COM]
Step (5): Subsequent requests made by the UserAgent contain a valid hadoop.auth cookie. While the cookie remains valid, no Kerberos authentication is performed.
Advanced Setup with Load Balancer
Here is the list of things to check when configuring a LB (Load Balancer) with Kerberos SPNEGO authentication for HTTP:
1. Sticky/persistent sessions are required in the LB configuration.
2. A new Kerberos service principal needs to be created for the LB, HTTP/<YOUR_LOAD_BALANCER_FQDN>@REALM.COM, and its keytab added to the authentication.kerberos.keytab file on all the application service nodes.
3. The configuration property authentication.kerberos.principal must be set to a wildcard so that the LB service principal is also loaded when KerberosAuthenticationHandler initializes:
authentication.kerberos.principal=*
4. The Load Balancer's FQDN may resolve to multiple different IP addresses. From the service application host, reverse DNS lookup for these IP addresses must resolve back to the Load Balancer FQDN. Here is an example:
Load balancer FQDN: elb.example.com
elb.example.com is mapped to 2 different internal IP addresses -> 192.168.1.10 and 192.168.1.15
Note: Issuing the ping command multiple times helps to find out which IP addresses the FQDN resolves to.
ping elb.example.com
PING elb.example.com (192.168.1.10) 56(84) bytes of data.
ping elb.example.com
PING elb.example.com (192.168.1.15) 56(84) bytes of data.
Reverse resolution of IP 192.168.1.10 must be elb.example.com
Reverse resolution of IP 192.168.1.15 must be elb.example.com
You can use the following Java code to find out exactly how the serverName is computed starting from Host:
import java.net.InetAddress;

public class GetServerName
{
    public static void main(String[] args) throws Exception
    {
        if (args.length != 1)
        {
            System.out.println("ERROR: Missing argument <Host>");
            System.out.println("Use GetServerName <Host>.");
        }
        else {
            final String serverName = InetAddress.getByName(args[0]).getCanonicalHostName();
            System.out.format("Server name for %s is %s\n", args[0], serverName);
        }
    }
}
1. Create a file named GetServerName.java with the above content
2. Run javac GetServerName.java
3. Run java GetServerName <Host>
Remote Users - Browser Configuration
You should try to answer the following questions when configuring remote UserAgents:
Do you have a valid Kerberos ticket? Remote client users and services must acquire a valid Kerberos ticket. While this task could be automated, sometimes it has to be done manually. In either case you can check which ticket you have by running the klist command.
What is the REALM of the principal being used, and what is the REALM of the service principal you're trying to connect to? If the realms don't match, you should perform the necessary configuration to establish trust between the REALMs. Use klist to see which REALM your principal is using. On the service side you can also use klist -kt to list the contents of the keytab and find the REALM the service is using.
Is your browser configured to perform SPNEGO correctly? There are several articles on the web that cover how to perform this configuration for the most popular browsers. Make sure you follow the steps for your browser.
Troubleshooting and DEBUG
Server Side
Your service log files are the place to check. To debug, I recommend adding the following to your log4j configuration:
log4j.logger.org.apache.hadoop.security.authentication.server=TRACE
For Kerberos DEBUG you can also add the Java argument -Dsun.security.krb5.debug=true
Client Side
I find it very helpful to use the curl command like this:
curl -iv --negotiate -u : -X GET 'http://URL'
With these options curl will display each interaction and the headers involved. Here is an example:
Note: The greater-than sign ( > ) indicates a request from the UserAgent to the application. The less-than sign ( < ) indicates a response from the application to the UserAgent.
curl -iv --negotiate -u : -X GET 'http://oozielb.example.com:11000/oozie/'
> GET /oozie/ HTTP/1.1
> Host: oozielb.example.com:11000
> User-Agent: curl/7.54.0
> Accept: */*
< HTTP/1.1 401 Unauthorized
< Date: Wed, 21 Feb 2018 17:29:15 GMT
< Content-Type: text/html;charset=utf-8
< Content-Length: 997
< Connection: keep-alive
< Server: Apache-Coyote/1.1
< WWW-Authenticate: Negotiate
< Set-Cookie: hadoop.auth=; Path=/; HttpOnly
> GET /oozie/ HTTP/1.1
> Host: oozielb.example.com:11000
> Authorization: Negotiate YII....................This is the client Kerberos token
> User-Agent: curl/7.54.0
> Accept: */*
< HTTP/1.1 200 OK
< Date: Wed, 21 Feb 2018 17:29:15 GMT
< Content-Type: text/html
< Content-Length: 3754
< Connection: keep-alive
< Server: Apache-Coyote/1.1
< Set-Cookie: hadoop.auth="u=falbani&p=falbani@EXAMPLE.COM&t=kerberos&e=1519270155204&s=6RmPzEYJR0nsF2i7TFk4S+lNydc="; Path=/; HttpOnly
< Set-Cookie: JSESSIONID=254F8AA4060810E7545DEE95F2E6AB83; Path=/oozie
< Continuation you will see the HTML WEB PAGE content
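When reading a trace like the one above, it can help to split the hadoop.auth cookie into its fields. The sketch below does that for the cookie value from the example response. The class name HadoopAuthCookieFields and its parse method are hypothetical helpers written for this article, and the field meanings (u = user name, p = principal, t = authentication type, e = expiry in milliseconds since the epoch, s = signature) are stated as I understand the Hadoop token format rather than as a formal specification. The sketch only splits the string; it does not verify the signature.

```java
import java.time.Instant;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative helper: splits a hadoop.auth cookie value into its fields.
public class HadoopAuthCookieFields {

    static Map<String, String> parse(String cookieValue) {
        String v = cookieValue;
        // The cookie value is sent wrapped in double quotes; strip them.
        if (v.length() >= 2 && v.startsWith("\"") && v.endsWith("\"")) {
            v = v.substring(1, v.length() - 1);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (String pair : v.split("&")) {
            // Split on the first '=' only: the base64 signature may end with '='.
            int eq = pair.indexOf('=');
            if (eq > 0) {
                fields.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Cookie value copied from the curl example in this article.
        String cookie = "\"u=falbani&p=falbani@EXAMPLE.COM&t=kerberos"
                + "&e=1519270155204&s=6RmPzEYJR0nsF2i7TFk4S+lNydc=\"";

        Map<String, String> f = parse(cookie);
        System.out.println("user      = " + f.get("u"));
        System.out.println("principal = " + f.get("p"));
        System.out.println("type      = " + f.get("t"));
        // Interpreting e as epoch milliseconds is an assumption (see lead-in).
        System.out.println("expires   = " + Instant.ofEpochMilli(Long.parseLong(f.get("e"))));
        System.out.println("signature = " + f.get("s"));
    }
}
```

Comparing the printed expiry with the Date header of the response is a quick way to see how long the cookie will spare the client from another Kerberos round trip.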
Article Title
If you are wondering about the article title, you should review JIRA HADOOP-7119 😉
Thanks
Special thanks to @emattos and @Vipin Rathor, who helped review this article.
02-07-2018
02:32 PM
Hi @Matt Clarke, I ran into a problem when trying to add the component-based granular policy /process-group/<uuid>. It should be /process-groups/<uuid>; there is a missing 's'. Please add it whenever you can!
11-14-2017
01:39 AM
4 Kudos
Objective
Using the correct HDP repositories is a requirement when building Spark production applications that run on HDP. Hence I decided to create this article to help those creating new Spark applications using Eclipse with Maven who may not know how to reference the Hortonworks repositories instead of the default ones.
How-To
The following video goes step by step through how to create a simple Spark application using the Hortonworks repositories. I will share the content of pom.xml and the Hello Scala class below.
Prerequisites
1. From the Eclipse Marketplace, install Scala IDE for Eclipse. Site: http://scala-ide.org/docs/current-user-doc/gettingstarted/index.html
2. Next, install the Maven integration for Scala IDE plugin. Site: http://alchim31.free.fr/m2e-scala/update-site/
3. Finally, add the archetype Remote Catalog. URL: http://repo1.maven.org/maven2/archetype-catalog.xml
The pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>example</groupId>
<artifactId>spark101</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>${project.artifactId}</name>
<description>My wonderful scala app</description>
<inceptionYear>2015</inceptionYear>
<repositories>
<repository>
<id>hortonworks repo</id>
<name>hortonworks repo</name>
<url>http://repo.hortonworks.com/content/repositories/releases/</url>
</repository>
<repository>
<id>hortonworks jetty</id>
<name>hortonworks jetty repo</name>
<url>http://repo.hortonworks.com/content/repositories/jetty-hadoop/</url>
</repository>
</repositories>
<licenses>
<license>
<name>My License</name>
<url>http://....</url>
<distribution>repo</distribution>
</license>
</licenses>
<properties>
<maven.compiler.source>1.6</maven.compiler.source>
<maven.compiler.target>1.6</maven.compiler.target>
<encoding>UTF-8</encoding>
<scala.version>2.11.5</scala.version>
<scala.compat.version>2.11</scala.compat.version>
</properties>
<dependencies>
<dependency>
<groupId>org.scala-lang</groupId>
<artifactId>scala-library</artifactId>
<version>${scala.version}</version>
</dependency>
<!-- Test -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.11</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.specs2</groupId>
<artifactId>specs2-core_${scala.compat.version}</artifactId>
<version>2.4.16</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.scalatest</groupId>
<artifactId>scalatest_${scala.compat.version}</artifactId>
<version>2.2.4</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.1.2.6.1.0-129</version>
</dependency>
</dependencies>
<build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
<plugin>
<!-- see http://davidb.github.com/scala-maven-plugin -->
<groupId>net.alchim31.maven</groupId>
<artifactId>scala-maven-plugin</artifactId>
<version>3.2.0</version>
<executions>
<execution>
<goals>
<goal>compile</goal>
<goal>testCompile</goal>
</goals>
<configuration>
<args>
<arg>-make:transitive</arg>
<arg>-dependencyfile</arg>
<arg>${project.build.directory}/.scala_dependencies</arg>
</args>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.18.1</version>
<configuration>
<useFile>false</useFile>
<disableXmlReport>true</disableXmlReport>
<!-- If you have classpath issue like NoDefClassError,... -->
<!-- useManifestOnlyJar>false</useManifestOnlyJar -->
<includes>
<include>**/*Test.*</include>
<include>**/*Suite.*</include>
</includes>
</configuration>
</plugin>
</plugins>
</build>
</project>
Important Note: We use the HDP 2.6.1 Spark 2.1.1 dependencies to build the project. If you are running a different HDP version, check and correct the dependency to match the version you are using. Also check which Scala version your project should use. For Spark 2.1.1 the correct Scala version is 2.11.x.
The Hello scala class
1. Create a new package called example.
2. Create a new Scala class called Hello inside the example package with the following content:
package example

import org.apache.spark.{SparkConf, SparkContext}

object Hello extends Greeting with App {
  val conf = new SparkConf().setAppName(appName)
  val sc = new SparkContext(conf)
  println(greeting)
  println(sc.version)
}

trait Greeting {
  lazy val appName = "Hello World Spark App"
  lazy val greeting: String = "hello"
}
11-09-2017
04:41 PM
8 Kudos
Objective
Using the correct HDP repositories is a requirement when building Spark production applications that run on HDP. Hence I decided to create this article to help those creating new Spark applications using IntelliJ who don't know how to reference the Hortonworks repositories instead of the default ones.
How-To
The following video goes step by step through how to create a simple Spark application using the Hortonworks repositories. I will share the content of build.sbt and the Hello Scala class below.
The build.sbt
name := "sparkTest"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.1.1.2.6.1.0-129"
resolvers := List("Hortonworks Releases" at "http://repo.hortonworks.com/content/repositories/releases/", "Jetty Releases" at "http://repo.hortonworks.com/content/repositories/jetty-hadoop/")
Important Note: We use the HDP 2.6.1 Spark 2.1.1 dependencies to build the project. If you are running a different HDP version, check and correct the dependency to match the version you are using. Also check which Scala version your project should use. For Spark 2.1.1 the correct Scala version is 2.11.x.
The Hello scala class
1. Create a new package called example.
2. Create a new Scala class called Hello inside the example package with the following content:
package example

import org.apache.spark.{SparkConf, SparkContext}

object Hello extends Greeting with App {
  val conf = new SparkConf().setAppName(appName)
  val sc = new SparkContext(conf)
  println(greeting)
  println(sc.version)
}

trait Greeting {
  lazy val appName = "Hello World Spark App"
  lazy val greeting: String = "hello"
}