Member since
08-13-2019
37
Posts
26
Kudos Received
6
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 5583 | 12-31-2018 08:44 AM
 | 1689 | 12-18-2018 08:39 PM
 | 1352 | 08-27-2018 11:29 AM
 | 3483 | 10-12-2017 08:35 PM
 | 2354 | 08-06-2017 02:57 PM
06-21-2021
12:14 AM
@saamurai, as this is an older post, you would have a better chance of receiving a resolution by starting a new thread. This will also be an opportunity to provide details specific to your environment that could aid others in assisting you with a more accurate answer to your question. You can link this thread as a reference in your new post.
01-08-2019
08:43 AM
Thanks @Stefan Kupstaitis-Dunkler, I marked the best answer and will create a new question for this problem. Can you provide the location of these log files? I am also unsure: can I use Metron to collect logs from Windows and Linux hosts and from network devices for security purposes (threat detection etc.)? Many thanks for your help.
07-13-2018
08:10 AM
Summary

Using Apache Solr as the indexing and search engine for Metron requires the Metron REST service to query multiple collections at once. If the Ranger plugin is active, there is currently a gotcha (a Ranger Solr plugin bug). If you don't want to give the Metron user full access to all Solr collections, here is a workaround.

The Problem

2+ Solr collections being queried: metaalert, cef, ... (and other parser collections)
1 user: metron
1 Ranger policy: user "metron", access types "Read" and "Write", collections "metaalert" and "cef"

A query of the metaalert collection returns the content of the metaalert collection as expected and logs the event successfully in the Ranger audit:

curl -k --negotiate -u : "http://solr_url:solr_port/solr/metaalert/search?q=*"

A query of the cef collection returns the content of the cef collection as expected and logs it successfully in the Ranger audit:

curl -k --negotiate -u : "http://solr_url:solr_port/solr/cef/search?q=*"

A query of metaalert and cef together, however, returns a "403 Unauthorized request". This is what the Metron REST server does:

curl -k --negotiate -u : "http://solr_url:solr_port/solr/metaalert/select?q=*&collections=metaalert,cef"

In the Ranger audit we now see 3 lines:

user: metron, resource: metaalert,cef, Result: Denied
user: metron, resource: metaalert, Result: Allowed
user: metron, resource: cef, Result: Allowed

The expectation would be that the query is successful!

Workaround(s)

One workaround would be to give metron access to all collections ("*"). We usually don't want that on clusters that are also used by other use cases. Another workaround is to give metron access to the "*metaalert*" collection pattern, which also matches the combined resource string "metaalert,cef".
08-06-2017
06:57 PM
Problem solved. I just had to run kinit for hduser while logged in as hduser.
07-13-2017
10:27 PM
4 Kudos
This article is based on one of my blog posts. It is specifically about how to troubleshoot and debug an application behind Knox and ultimately get it up and running. Start Small
First try to access the service directly, before you go over Knox. In many cases there is nothing wrong with your Knox setup, but rather with the way you set up and configured the service behind Knox, or the way you try to access that service.
When you are familiar with how to access your service directly and have verified that it works as intended, try the same call through Knox. Example:
You want to check whether WebHDFS is reachable, so you first verify directly at the service by trying to get the home directory.
curl --negotiate -u : http://webhdfs-host.field.hortonworks.com:50070/webhdfs/v1/?op=GETHOMEDIRECTORY
If the above request returns a valid 200 response and a meaningful answer, you can safely move on to checking your Knox setup.
curl -k -u myUsername:myPassword https://knox-host.field.hortonworks.com:8443/gateway/default/webhdfs/v1/?op=GETHOMEDIRECTORY
Note: Direct access of WebHDFS and access of WebHDFS over Knox use two different authentication mechanisms: the former uses SPNEGO, which requires a valid Kerberos TGT in a secure cluster if you don't want to receive a "401 – Unauthorized" response; the latter uses HTTP basic authentication against LDAP, which is why you need to provide a username and password on the command line.
Note 2: For the sake of completeness: obviously, you direct the first request to the service host and port, while you direct the second request to the Knox host and port and specify the target service in the path.
The next section answers the question: what do you do if the second command fails? (If the first command fails, go set up your service correctly and return later.) Security Related Issues
So what do the HTTP response codes mean for a Knox application? Where to start?
Very common is "401 – Unauthorized". This can be misleading, since 401 is always tied to authentication – not authorization. It means you probably need to check one of the following items. Which of them causes the error can be found in the Knox log (by default /var/log/knox/gateway.log):
Is your username/password combination correct?
Is the user present in the LDAP directory that Knox is configured against?
Is your LDAP server running?
Is your LDAP configuration in the Knox topology correct (hostname, port, binduser, binduser password,…)?
Is your LDAP controller accessible through the firewall (ports 389 or 636 open from the Knox host)?
Note: Currently (in HDP 2.6) you can specify an alias for the binduser password. Make sure that this alias is all lowercase; otherwise you will get a 401 response as well.
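For reference, a sketch of how such a password alias is referenced in the topology's LDAP provider (the parameter name follows the Knox ShiroProvider; the alias name is an example, kept all lowercase as the note advises):

```xml
<param>
  <name>main.ldapRealm.contextFactory.systemPassword</name>
  <!-- alias must be all lowercase, otherwise Knox answers with 401 -->
  <value>${ALIAS=ldcsystempassword}</value>
</param>
```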
If you got past the 401s, another popular response code is "403 – Forbidden". This one actually does have something to do with authorization. Depending on whether you use ACL authorization or Ranger authorization (which is recommended), you proceed differently. If you use ACLs, make sure that the user/group is authorized in your topology definition. If you use Ranger, check the Ranger audit dashboard and you will immediately notice one of two possible error sources:
Your user/group is not allowed to use Knox.
Your user/group is not allowed to use the service that you want to access behind Knox.
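The triage described so far can be summed up in a small shell sketch. The helper function and its messages are my own, not part of Knox; it simply maps an HTTP status code from a Knox request to the most likely problem area discussed above:

```shell
# Hypothetical helper (names and messages are mine, not part of Knox):
# map an HTTP status code returned by a Knox request to the most likely
# problem area.
knox_hint() {
  case "$1" in
    401) echo "authentication: check LDAP credentials, the topology LDAP config, and the binduser alias case" ;;
    403) echo "authorization: check the topology ACLs or the Ranger audit dashboard" ;;
    503) echo "backend: the proxied service rejected or could not serve the request" ;;
    *)   echo "other: see /var/log/knox/gateway.log" ;;
  esac
}

knox_hint 401
```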
Well, we came a long way, and with respect to security we are almost done. One possible problem you could encounter is with impersonation. Knox needs to be allowed to impersonate any user who accesses a service through it. This is a configuration in core-site.xml: hadoop.proxyuser.knox.groups (the groups whose members Knox may impersonate) and hadoop.proxyuser.knox.hosts (the hosts from which Knox may impersonate). Enter a comma-separated list for each, or set a wildcard * .
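A minimal core-site.xml sketch of the two properties just mentioned (the group names and host are examples; a wildcard * also works for either value):

```xml
<!-- Allow the knox service user to impersonate members of selected groups -->
<property>
  <name>hadoop.proxyuser.knox.groups</name>
  <value>analysts,data-engineers</value>
</property>
<!-- ...but only from the Knox gateway host itself -->
<property>
  <name>hadoop.proxyuser.knox.hosts</name>
  <value>knox-host.field.hortonworks.com</value>
</property>
```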
This is what you get in the Knox log, when your Ranger Admin server is not running and policies cannot be refreshed.
2017-07-05 21:11:53,700 ERROR util.PolicyRefresher (PolicyRefresher.java:loadPolicyfromPolicyAdmin(288)) - PolicyRefresher(serviceName=condlahdp_knox): failed to refresh policies. Will continue to use last known version of policies (3)
javax.ws.rs.ProcessingException: java.net.ConnectException: Connection refused (Connection refused)
This is also a nice example of Ranger's design not to interfere with services when it is down: policies will not be refreshed, but the plugin is still able to operate as intended with the set of policies from before Ranger crashed. Application Specific Issues
Once you are past the authentication and authorization issues, there might be issues with how Knox interacts with its applications. This section might grow with time. If you have more examples of application specific issues, leave a comment or send me an email. Hive:
To get Hive working with Knox, you need to change its transport mode from binary to http. In rare cases it might be necessary to restart not only HiveServer2 after this configuration change, but also the Knox gateway.
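The relevant HiveServer2 settings in hive-site.xml look like this (http is the mode required behind Knox; 10001 is HiveServer2's default HTTP port, matching the log excerpt below):

```xml
<property>
  <name>hive.server2.transport.mode</name>
  <value>http</value>
</property>
<property>
  <name>hive.server2.thrift.http.port</name>
  <value>10001</value>
</property>
```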
This is what you get when you don't switch the transport mode from "binary" to "http". Binary runs on port 10000, http runs on port 10001. While binary transport mode is still active, Knox tries to connect to port 10001, which is not available, and thus fails with "Connection refused".
2017-07-05 08:24:31,508 WARN hadoop.gateway (DefaultDispatch.java:executeOutboundRequest(146)) - Connection exception dispatching request: http://condla0.field.hortonworks.com:10001/cliservice?doAs=user org.apache.http.conn.HttpHostConnectException: Connect to condla0.field.hortonworks.com:10001 [condla0.field.hortonworks.com/172.26.201.30] failed: Connection refused (Connection refused)
org.apache.http.conn.HttpHostConnectException: Connect to condla0.field.hortonworks.com:10001 [condla0.field.hortonworks.com/172.26.201.30] failed: Connection refused (Connection refused)
at org.apache.http.impl.conn.DefaultHttpClientConnectionOperator.connect(DefaultHttpClientConnectionOperator.java:151)
at org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect(PoolingHttpClientConnectionManager.java:353)
If you have fixed all possible HTTP 401 errors for the other services but still get one in Hive, you might have forgotten to pass a username and password to beeline: beeline -u "<jdbc-connection-string>" -n <username> -p <password>
The correct jdbc-connection-string should have a format as in the example below:
jdbc:hive2://$KNOX_HOSTNAME:$KNOX_PORT/default;ssl=true;sslTrustStore=$TRUSTSTORE_PATH;trustStorePassword=$TRUSTSTORE_SECRET;transportMode=http;httpPath=gateway/default/hive
$KNOX_HOSTNAME is the hostname where the Knox instance is running.
$KNOX_PORT is the port exposed by Knox.
$TRUSTSTORE_PATH is the path to the truststore containing the Knox server certificate; on the server, with root access, you could e.g. use /usr/hdp/current/knox-server/data/security/keystores/gateway.jks.
$TRUSTSTORE_SECRET is the secret you are using for your truststore.
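As a sketch, the placeholders above can be filled in and the connection string assembled in a shell session like this (all values are examples; substitute the ones from your environment):

```shell
# Example values only -- replace with your own environment's settings.
KNOX_HOSTNAME="knox-host.field.hortonworks.com"
KNOX_PORT=8443
TRUSTSTORE_PATH="/usr/hdp/current/knox-server/data/security/keystores/gateway.jks"
TRUSTSTORE_SECRET="myTruststorePassword"

# Assemble the JDBC connection string from the placeholders.
JDBC_URL="jdbc:hive2://${KNOX_HOSTNAME}:${KNOX_PORT}/default;ssl=true;sslTrustStore=${TRUSTSTORE_PATH};trustStorePassword=${TRUSTSTORE_SECRET};transportMode=http;httpPath=gateway/default/hive"

# The resulting beeline invocation (username/password as required by Knox's LDAP):
echo "beeline -u \"$JDBC_URL\" -n myUsername -p myPassword"
```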
Now, this is what you get when you connect via beeline and try to talk to Knox from a different (e.g. internal) hostname than the one configured in the server's SSL certificate. Just change the hostname and everything will work fine. While this error is not specifically Hive-related, you will usually encounter it in combination with Hive, since most of the other services don't require you to check your certificates. Connecting to jdbc:hive2://knoxserver-internal.field.hortonworks.com:8443/;ssl=true;sslTrustStore=truststore.jks;trustStorePassword=myPassword;transportMode=http;httpPath=gateway/default/hive
17/07/06 12:13:37 [main]: ERROR jdbc.HiveConnection: Error opening session
org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLPeerUnverifiedException: Host name 'knoxserver-internal.field.hortonworks.com' does not match the certificate subject provided by the peer (CN=knoxserver.field.hortonworks.com, OU=Test, O=Hadoop, L=Test, ST=Test, C=US)
HBase:
WEBHBASE is the service in a Knox topology to access HBase via the HBase REST server. Of course, a prerequisite is that the HBase REST server is up and running.
Even if it is up and running, you can receive an error with HTTP code "503: Unavailable". This is not related to Knox; you can track the issue down to the HBase REST server itself, where the authenticated user does not have the privileges to e.g. scan the data. Give the user the correct permissions to resolve this error.
08-16-2017
08:18 PM
This is a bug in Ambari 2.5.1 [https://issues.apache.org/jira/browse/AMBARI-21473]. Resolution: remove the /etc/zeppelin/conf/interpreter.json file and restart the Zeppelin service.
05-30-2017
08:25 PM
To answer your question regarding ZooKeeper: HBase needs ZooKeeper. If you didn't set up ZooKeeper yourself, HBase spins up an "internal" ZooKeeper server, which is great for testing but shouldn't be used in production scenarios.
05-15-2017
05:14 PM
11 Kudos
In the documentation of the particular projects you can find a number of details on how these components work on their own and on which services they rely. Since the projects are open source, you can of course check out the source code for more information. This article therefore aims to summarize rather than explain each process in detail. I first go through some basic component descriptions to get an idea of which services are in use. Then I explain the "security flow" from a user perspective (authentication –> impersonation (optional) –> authorization –> audit) and provide a short example using Knox. When reading the article, keep the following figure in mind:

Component Descriptions and Concepts

Apache Ranger Components and what they do

Ranger Admin Service: provides a RESTful API and a UI to manage authorization policies and service-access audits based on resources, users, groups and tags.
Ranger User Sync: syncs users and groups from an LDAP source (OpenLDAP or AD) and stores them in the relational DB of the Ranger service.
Ranger Plugins: service-side plugins that sync policies from Ranger, by default every 30 seconds. That way authorization is possible even if Ranger Admin does not run in HA mode and is currently down.
Ranger Tag Sync: syncs tags from the Atlas metadata server and stores them in the relational DB of the Ranger service.
Ranger Key Management Service (KMS): provides a RESTful API to manage encryption keys used for encrypting data at rest in HDFS.
Supporting relational database: contains all policies, synced users, groups and tags.
Supporting Apache Solr instances: audits are stored here.
Documentation: for the newest HDP release (2.6.0) use these Ranger Docs.

Apache Atlas Components

Meta Data Server: provides a RESTful API and a UI to manage metadata objects.
Metastore: contains the metadata objects.
Index: maintains an index of the metadata objects.

Documentation: for the newest HDP release (2.6.0) use these Atlas Docs.

Apache Knox

Knox serves as a gateway and proxy for Hadoop services and their UIs, so that they can be made accessible behind a firewall without opening too many ports in that firewall.

Documentation: for the newest HDP release (2.6.0) use these Knox Docs.

Wire Encryption Concepts

To complete the picture: it is very important not only to secure access to services, but also to encrypt the data transferred between services.

Keystores and Truststores

To enable a secure connection (SSL) between a server and a client, an encryption key first needs to be created; the server uses it to encrypt any communication. The key is securely stored in a keystore; for Java services, JKS can be used. In order for a client to trust the server, one can export the key from the keystore and import it into a truststore, which is basically a keystore containing the keys of trusted services. To enable two-way SSL, the same needs to be done on the client side: after creating a key in a keystore the client can access, put it into a truststore of the server. Commands to perform these actions are:

Generate a key in "/path/to/keystore.jks", setting its alias to "myKeyAlias" and its password to "myKeyPassword". If the keystore file "/path/to/keystore.jks" does not exist, this command will also create it:

keytool -genkey -keyalg RSA -alias myKeyAlias -keystore /path/to/keystore.jks -storepass myKeyPassword -validity 360 -keysize 2048

Export the key stored in "/path/to/keystore.jks" with alias "myKeyAlias" into the file "myKeyFile.cer":

keytool -export -keystore /path/to/keystore.jks -alias myKeyAlias -file myKeyFile.cer

Import the key from the file "myKeyFile.cer" with alias "myKeyAlias" into a keystore (that may act as a truststore) named "/path/to/truststore.jks" using the password "trustStorePassword":

keytool -import -file myKeyFile.cer -alias myKeyAlias -keystore /path/to/truststore.jks -storepass trustStorePassword

Active Directory Components

Authentication Server (AS): responsible for issuing Ticket Granting Tickets (TGTs).
Ticket Granting Server (TGS): responsible for issuing service tickets.
Key Distribution Center (KDC): talks with clients using the KRB5 protocol; consists of AS + TGS.
LDAP Server: contains user and group information and talks with its clients using the LDAP protocol.
Supporting database.

Security Flow

Authentication

Only a properly authenticated user (which can also be a service using another service) can communicate successfully with a kerberized Hadoop service. Without the required authentication, in this case proving the identity of both the user and the service, any communication will fail. In a kerberized environment, user authentication is provided via a ticket granting ticket (TGT). Note: SIMPLE authentication, which is set up by default instead of KERBEROS, lets any user act as any other user, including the superuser. Therefore strong authentication using Kerberos is highly encouraged.

Technical Authentication Flow

1. The user requests a TGT from the AS. This happens automatically upon login or via the kinit command.
2. The user receives the TGT from the AS.
3. The user sends a request to a kerberized service.
4. The user gets a service ticket from the Ticket Granting Server. This happens automatically in the background when the user sends a request to the service.
Finally, the user sends the request to the service using the service ticket.

Authentication Flow from a User Perspective

Most of the above processes are hidden from the user. The only thing the user needs to do before issuing a request to the service is to log in on a machine and thereby receive a TGT, obtain it programmatically, or obtain it manually using the kinit command.

Impersonation

This is the second step, after a user is successfully authenticated at a service. The user must be authenticated, but can then choose to perform the request to the service as another user. If everyone could do this by default, it would raise another security concern and render the authentication process futile. Therefore this behaviour is forbidden by default and must be granted to individual users. It is used by proxy services like Apache Ambari, Apache Zeppelin or Apache Knox: they authenticate at the service as the "ambari", "zeppelin" and "knox" users, respectively, using their TGTs, but can choose to act on behalf of the person logged in in the browser. This is why it is very important to secure these services. To allow, for example, Ambari to perform operations as another user, set the following configs in core-site.xml: hadoop.proxyuser.ambari.groups (the groups whose members may be impersonated) and hadoop.proxyuser.ambari.hosts (the hosts from which Ambari may impersonate), or set a wildcard * .

Authorization

Authorization defines the permissions of individual users. Once it is clear which user will perform the request, i.e. the actually authenticated or the impersonated one, the service checks against the local Apache Ranger policies whether the request is allowed for that user. This is the last instance in the process: a user passing this step is finally allowed to perform the requested action.
Audit

Every time the authorization instance is called, i.e. policies are checked to decide whether the action of a user is authorized, an audit event is logged, containing the time, user, service, action, data set and success of the event. An event is not logged in Ranger if a user without authentication tries to access data, or if a user tries to impersonate another user without the appropriate permissions.

Example Security Flow Using Apache Knox

Looking at the figure above, you can follow what goes on in the background when a user, Eric, wants to push a file into the HDFS service at the path "/user/eric/" from outside the Hadoop cluster firewall:

1. Eric sends the HDFS request, including the file and the command to put that file into the desired directory, while authenticating successfully via the LDAP provider at the Apache Knox gateway using his username/password combination. Eric does not need to obtain a Kerberos ticket; in fact, since he is outside the cluster, he probably has no access to the KDC through the firewall to obtain one anyway.
2. The Knox Ranger plugin checks whether Eric is allowed to use Knox. If he is not, the process ends here. This event is logged in the Ranger audits.
3. Knox has a valid TGT (and refreshes it before it becomes invalid), obtains a service ticket with it, and authenticates at the HDFS namenode as user "knox".
4. Knox asks the service to perform the action as Eric, which is configured to be allowed.
5. The Ranger HDFS plugin checks whether Eric has the permission to WRITE to "/user/eric". If he does not, the process ends here. This event is logged in the Ranger audits.
6. The file is pushed to HDFS.

I hope this article helps to get a better understanding of the security concepts within the Hadoop ecosystem. I published the original article on my blog.