Member since: 09-11-2015
Posts: 115
Kudos Received: 126
Solutions: 15

My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 1050 | 08-15-2016 05:48 PM |
 | 758 | 05-31-2016 06:19 PM |
 | 861 | 05-11-2016 03:10 PM |
 | 518 | 05-10-2016 07:06 PM |
 | 1971 | 05-02-2016 06:25 PM |
04-04-2017
04:26 PM
The second link no longer works. It would be nice to have a comprehensive comparison, rather than "Jupyter is good for running Python locally, Zeppelin is better for cluster workloads".
03-10-2017
06:11 AM
This is very helpful, thank you. Can you please advise on where to find the Jetty logs, in case of issues with the web application itself?
08-15-2016
05:48 PM
There could be a problem with the certificate itself. I recommend regenerating it and trying again. You can follow instructions in the Apache Knox Users Guide to generate a self-signed certificate: http://knox.apache.org/books/knox-0-6-0/user-guide.html#Generating+a+self-signed+cert+for+use+in+testing+or+development+environments If you want to use a more legitimate certificate you can generate and sign it yourself with OpenSSL or from a CA, and follow the steps in the next section of the guide, Using a CA Signed Key Pair.
08-12-2016
05:03 PM
Can you provide more details about how you're attempting to connect, and with which client? If you're using curl, include the exact command (masking the user password if you want), along with the exact curl version and OS.
07-11-2016
10:17 PM
The only way to reliably accomplish this is to prevent users from logging into cluster nodes at all, and force them to use beeline to access HS2 in HTTP mode through Knox. Every solution recommending changes to hive-env.sh or hive.distro can be overridden by using a modified copy of those files. Those files could even be copied from elsewhere, because this is all open source.
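For anyone who wants to see what "beeline to HS2 in HTTP mode through Knox" looks like in practice, here is a minimal sketch. The helper name, the host `knox-host`, and the `default` topology are placeholders I picked for illustration, and the URL assumes the gateway path `gateway/<topology>/hive`; adjust to match your own topology. A gateway with a self-signed certificate would also need the `sslTrustStore` and `trustStorePassword` JDBC parameters.

```shell
# Hypothetical helper: build a HiveServer2 JDBC URL that goes through Knox
# in HTTP mode. Arguments: host, topology (default "default"), port (default 8443).
# All three values are placeholders, not settings taken from a real cluster.
knox_hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/;ssl=true;transportMode=http;httpPath=gateway/%s/hive\n' \
    "$1" "${3:-8443}" "${2:-default}"
}

# Print the URL for the default topology on knox-host.
knox_hive_jdbc_url knox-host

# Example invocation (not executed here; needs beeline and a reachable gateway):
#   beeline -u "$(knox_hive_jdbc_url knox-host)" -n "$user" -p "$password"
```

Users then only need network access to the Knox port, which is what makes it practical to block direct logins to the cluster nodes.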
06-16-2016
09:53 PM
The purpose of Knox is to provide secure access to cluster REST interfaces by external users. It will not restrict access for users who connect directly to the NameNode web UI without going through Knox.

One option is to implement the Knox Gateway, restrict users from accessing the cluster directly (via your choice of infrastructure: firewall, network routing, etc.), and have them go through Knox instead. The web UIs will be supported by Knox in the next major HDP release, but many people have successfully used community-contributed services to expose the UIs with the current version of Knox. Knox typically authenticates against an LDAP directory, so end users would use their credentials from the configured LDAP directory.

To control who has access to HDFS resources you could use Ranger (see HDP 2.4 Security Guide - Authorization).

If security is a concern, then it's highly recommended to secure the cluster using Kerberos. An alternative to forcing users to go through Knox would then be to enable SPNEGO authentication for the web UIs.
06-13-2016
06:29 PM
Running each topology on its own Gateway instance is fine, but it's not necessary. You can use a single Knox Gateway instance and simply create a separate topology per AD. Say you have two topologies, ad1 and ad2; you can then connect using:
https://knox-host:8443/gateway/ad1/<service>/
https://knox-host:8443/gateway/ad2/<service>/
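To make the one-gateway, many-topologies layout concrete, here is a small sketch that builds the per-topology URL. The function name is made up for this example, and `knox-host`, `ad1`, `ad2`, and the `webhdfs` service are just sample values.

```shell
# Hypothetical helper: build the base URL of a service behind a given Knox
# topology. Arguments: host, topology, service, port (default 8443).
knox_url() {
  printf 'https://%s:%s/gateway/%s/%s/\n' "$1" "${4:-8443}" "$2" "$3"
}

# Same gateway instance, one topology per AD:
knox_url knox-host ad1 webhdfs
knox_url knox-host ad2 webhdfs
```

Only the topology segment of the path changes, so each AD can have its own authentication provider configuration while sharing one gateway process.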
06-10-2016
06:18 PM
2 Kudos
This feature is tracked in YARN-2477 (DockerContainerExecutor must support secure mode). There is currently no fix version specified. Docker container support on YARN is still very new, so you might want to follow the umbrella JIRA (YARN-2466) to get an idea of when additional features might become available.
06-08-2016
06:19 PM
@Tim Veil you might find this post helpful as a reference, or to integrate into your project: https://community.hortonworks.com/articles/29203/automated-kerberos-installation-and-configuration.html
06-07-2016
07:05 PM
@Pardeep Gorla you could try to define a custom stack in Ambari:
https://cwiki.apache.org/confluence/display/AMBARI/Defining+a+Custom+Stack+and+Services If you are interested in alternatives to Sentry that are already covered by Ambari, feel free to provide details about your security use case, and I'm sure you'll get some good recommendations.
06-07-2016
07:00 PM
@Giuseppe cloud does the hostname of the host that is causing problems begin with a non-alphabetic character, or contain any uppercase characters? If so, you might be encountering a bug. Verify that the hostnames reported by 'hst list-agents' match the hostnames in Ambari. Also verify that the JDK version is consistent across all nodes (probably not an issue since you are using Ambari). If the above does not apply, please check /var/log/hst/hst-agent.log on the problem host, and provide any details here that might help narrow down the cause of the issue.
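If you have many hosts to check, the two hostname conditions above can be tested with a few lines of shell. This is only a sketch: the function name is made up, it checks nothing beyond those two conditions, and the hostnames in the loop are sample values.

```shell
# Returns 0 (true) when a hostname starts with a non-alphabetic character
# or contains any uppercase character; returns 1 otherwise.
hostname_is_suspect() {
  case "$1" in
    [![:alpha:]]*) return 0 ;;   # first character is not a letter
    *[[:upper:]]*) return 0 ;;   # at least one uppercase letter
    *)             return 1 ;;
  esac
}

# Sample hostnames only; substitute the names Ambari shows for your hosts.
for h in node1.example.com 1node.example.com Node2.example.com; do
  if hostname_is_suspect "$h"; then
    echo "$h: matches the pattern described above"
  else
    echo "$h: ok"
  fi
done
```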
06-01-2016
03:04 PM
1 Kudo
Sagar's answer is the best solution if both clusters will use the same AD. If each cluster has its own AD with unique users and groups, then you should clarify what you are hoping to gain by duplicating the policies. Keeping in mind that you'll need to sync them on an ongoing basis, it seems like "updating" every policy for a new set of users/groups would be more work than manually adding the policies on each cluster.
06-01-2016
02:07 PM
It sounds like there are two conflicting goals you might want to achieve. Is the intention to migrate cluster-B/2 to use the same AD as cluster-A/1? Or do all users have accounts in both ADs, and you want to translate the policies from A to B but keep them on different ADs?
05-31-2016
06:19 PM
1 Kudo
Anonymization rules are covered in the SmartSense Admin Guide. You will need to use a regular expression-based rule to mask from a text file. Depending on what text file(s) may contain passwords, you can either specify the exact filename or use a regular expression here as well. It's best to define the path as specifically as possible to avoid accidentally masking values in unrelated files. The string to mask/replace is identified by a regular expression. Here's a very simple example that will replace a line that contains the string "password:" in my-credentials.txt:
{
    "name": "my_credentials",
    "path": "my-credentials.txt",
    "pattern": ".*password:.*",
    "value": "password: Hidden"
},
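Before adding a rule like this, it can help to preview what the pattern would match. The sketch below uses `sed -E` as a stand-in for the SmartSense rule engine, which is an assumption on my part: the two may not treat every regular expression identically, so treat the result as a rough preview only. The helper name and sample file contents are made up.

```shell
# Assumption: sed -E approximates how the rule's pattern/value pair behaves.
# Every line matching ".*password:.*" is replaced with "password: Hidden".
preview_mask() {
  sed -E 's/.*password:.*/password: Hidden/' "$1"
}

# Build a throwaway sample file and show the masked result.
sample="$(mktemp)"
printf 'user: admin\npassword: s3cret\nport: 8080\n' > "$sample"
preview_mask "$sample"
rm -f "$sample"
```

Because the pattern starts and ends with `.*`, the whole line is replaced, including anything before "password:" on that line.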
05-26-2016
07:20 PM
This recently started happening in my scripts too and I hadn't figured out why. Thanks for the tip!
05-24-2016
03:52 PM
Could you provide the output of the following command while executing the curl command?
tail -f /var/log/knox/gateway*.log
Also let us know the exact HDP version you're using, and whether you are using Kerberos and/or NameNode HA.
05-19-2016
06:41 PM
Could you provide relevant entries from Ranger Admin logs (/var/log/ranger/admin/*.log) and/or HAProxy logs? Verbose curl output or a screenshot of the browser console may also provide a clue. Your current HAProxy config would also be useful.
05-19-2016
03:59 AM
10 Kudos
SmartSense is an excellent tool for keeping your cluster running at optimal efficiency while maintaining operational best practices. We’ve combined knowledge from the greatest minds in the industry, and use it to analyze metadata about your cluster from the bundles you submit. Have you ever wondered exactly what data you’re sending to SmartSense? The SmartSense Admin Guide contains a high-level description (see What’s Included in a Bundle), but for the greatest understanding you should extract a bundle and explore it with your own eyes!

Obtain a Bundle

There are two types of bundles:
- Analysis Bundle: configs and metrics for all services on all hosts
- Troubleshooting Bundle: Analysis Bundle + logs for selected service(s)

To begin, let’s capture an Analysis Bundle:
...and download an unencrypted copy to our local machine:

The bundle is a gzipped tar file that contains a gzipped tar file from each host running the HST Agent. In the following examples, notice the bundle variable excludes the .tgz extension. Linux or OS X users can extract everything with a bash for-loop:

bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
tar zxf $bundle.tgz && cd $bundle && for i in * ; do tar zxf "$i" ; rm "$i" ; done

Windows users can use a similar process with a utility like 7-Zip. Assuming 7z.exe is in your path, save the following as a batch file (at an interactive prompt, use %i instead of %%i):

setlocal
set bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
7z x %bundle%.tgz && 7z x %bundle%.tar && del %bundle%.tar && cd %bundle%
for %%i in (*.tgz) do 7z x %%i && del %%i
for %%i in (*.tar) do 7z x %%i && del %%i
endlocal

Exploring Bundle Contents

NOTE: Example console output was obtained from a SmartSense 1.2.1 bundle and may differ in future versions. The output is also truncated for brevity. You’re encouraged to follow along with a bundle from your own cluster.

For a convenient overview of the bundle contents, use the tree command, limited to a depth of 3:

MyLaptop:a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3
.
├── meta
│ └── metadata.json
├── mgmt.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│ ├── os
│ │ ├── logs
│ │ └── reports
│ └── services
│ ├── AMBARI
│ ├── AMS
│ ├── HDFS
│ ├── HST
│ ├── MR
│ ├── TEZ
│ ├── YARN
│ └── ZK
├── node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│ ├── os
│ │ ├── logs
│ │ └── reports
│ └── services
│ ├── AMBARI
...
41 directories, 4 files

At the root of the bundle, we see a ‘meta’ folder, and a folder per host. The meta folder contains some bundle metadata. Note that domain names are anonymized (my cluster uses example.com). Let’s take a look inside the two subfolders (os & services) per host...

Bundle Contents: OS

The os folder contains a couple system logs and a variety of reports. Here’s a sample from my cluster:

MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -I "blockdevices" os/
os/
├── logs
│ └── messages.log
└── reports
├── chkconfig.txt
├── cpu_info.txt
├── dns_lookup.txt
├── dstat.txt
├── error_dmesg.txt
├── file_max.txt
...
5 directories, 49 files

Most of the filenames here are self-explanatory. Reports generally contain output from system commands or the /proc filesystem. These system characteristics serve as valuable inputs for determining your cluster’s optimal configuration.

Bundle Contents: Services

Within each host folder, the services subfolder contains configurations and reports for every HDP service on that host. Here’s an example from my node1:

MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3 services
services
├── AMBARI
│ ├── conf
│ │ ├── ambari-agent.ini
│ │ ├── ambari-agent.pid
│ │ └── logging.conf.sample
│ └── reports
│ ├── ambari_rpm.txt
│ ├── postgres_rpm.txt
│ ├── postmaster.txt
│ └── process_info.txt
├── AMS
│ ├── conf
│ │ ├── ams-env.sh
│ │ ├── metric_groups.conf
│ │ └── metric_monitor.ini
│ ├── metrics
│ │ └── ams
│ └── reports
│ └── ams_rpm.txt
...
32 directories, 157 files

The conf folders are copied from their respective locations under /etc/ (or /var/run for the .pid files). Reports contain JMX metrics and output from CLI commands, such as the YARN application list. You can explore the contents using text processing commands like grep, sort, and uniq, which might be sufficient for your needs. Another option is to use a text editor with a file-tree view.

Text Editors

Here are three open source text editors that integrate a file-tree for easy navigation (see attachments at the bottom for full-size images):
- TextMate 2 (OS X)
- Notepad++ (Windows)
- Vim + NerdTree (Linux, OS X)

Anonymization Rules

The default set of anonymization rules will protect IP addresses, hostnames, and password fields in standard HDP configuration files. You can modify or add anonymization rules if desired. Watch for a future HCC article where we take a deep dive into anonymization. After making any changes to the anonymization ruleset, it is wise to verify everything is still functioning as intended. This can be accomplished by downloading an unencrypted bundle and examining its contents using the methods described above.

Until Next Time...

Keeping in mind that we only looked within a single host folder, and that my demo cluster has the minimum number of components for a functioning HDP stack, we can see that every bundle is packed with useful information.
Knowing exactly what’s included in a SmartSense bundle provides peace of mind, and the trust that your confidential data remains secure and private.
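As a rough sketch of the grep, sort, and uniq approach mentioned under Bundle Contents: Services, the helper below lists IPv4-looking strings found anywhere in an extracted bundle, which is one quick way to spot-check the anonymization rules. The function name is made up, and the pattern is deliberately loose: dotted version strings such as 2.4.2.0 will match too, so treat every hit as something to look at rather than as a confirmed leak.

```shell
# Hypothetical helper: list distinct IPv4-looking strings found under a
# directory, with the most frequent ones first.
# -r recurse, -h omit filenames, -o print only the match, -E extended regex.
list_ip_like_strings() {
  grep -rhoE '([0-9]{1,3}\.){3}[0-9]{1,3}' "$1" | sort | uniq -c | sort -rn
}

# Usage, from the root of an extracted bundle:
#   list_ip_like_strings .
```

The first column of the output is the number of occurrences, so frequently repeated values float to the top.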
05-18-2016
07:32 PM
Good find! Here's a copy of the workaround:
1. Replace /var/lib/knox/data/services/yarn-ui/2.7.1/rewrite.xml with the attached rewrite.xml (change ownership to knox:knox)
2. Restart Knox
Note that "data" might be version-specific (e.g. data-2.4.2.0-258), or you can use /usr/hdp/current/knox-server/data/ instead. The fixed rewrite.xml is attached.
05-17-2016
09:07 PM
Can you verify the NN is listening on the public interface? Since you're able to ssh using the public hostname, but attempts to use WebHDFS don't show up in the NN log, it sounds like NN might only be listening on the internal interface.
05-17-2016
07:41 PM
Can you confirm network connectivity from Mac to namenode by some other means, like hdfs client (not webhdfs) or ping/ssh? Do you see the same behavior from both NNs?
05-16-2016
09:43 PM
Did you obtain a ticket first, and is krb5.conf configured to use the same KDC on your laptop and the cluster? Any errors in the namenode log?
05-12-2016
04:23 PM
@Benjamin R Does it work if you add a trailing slash?
05-11-2016
05:46 PM
1 Kudo
For quick reference, here's an example of adding Oozie UI to HDP 2.4 Sandbox:
1. Start Sandbox and make sure all non-maintenance services are running
2. Add service definition:
git clone https://git-wip-us.apache.org/repos/asf/knox.git
cp -R knox/gateway-service-definitions/src/main/resources/services/oozieui /var/lib/knox/data-2.4.0.0-169/services/
chown -R knox:knox /var/lib/knox/data-2.4.0.0-169/services/oozieui
3. Add OOZIEUI service to default.xml topology (Ambari > Knox > Configs > Advanced topology):
<service>
    <role>OOZIEUI</role>
    <url>http://{{oozie_server_host}}:{{oozie_server_port}}/oozie</url>
</service>
4. Start (or restart) Knox & Demo LDAP (using Ambari)
5. Visit https://localhost:8443/gateway/default/oozie/
05-11-2016
03:10 PM
1 Kudo
If your users belong to different branches of the LDAP directory you'll need to use Advanced LDAP Authentication in the Knox topology. Review the linked doc to understand the limitations of userDnTemplate, and refer to the "Example provider config" section to understand the additional properties available. There should be log messages in gateway.log corresponding to the 401. Those might provide more insight into the reason for the error, so please provide them if possible.
05-10-2016
07:06 PM
2 Kudos
Unfortunately an application that uses a credential store will always need at least one cleartext password so it can unlock that credential store. This can be hardcoded into the binary or stored in a file. The ranger-policymgr-ssl.xml files contain the passwords to unlock the keystore and truststore used by Ranger agents. Obviously this file should be secured with the minimal permissions necessary. Other passwords in Ranger config files are stored in a credential store (jceks file), so they don't show up in plaintext in the configs. The credential stores typically use the default keystore password, so the files themselves should still be protected by appropriate file permissions. (thanks to @lmccay for clarifying the last part for me)
05-02-2016
06:25 PM
8 Kudos
This error occurs because the md5 digest became deprecated in favor of sha256 in recent versions of Java. It is fixed in the next SmartSense HST release. The workaround is somewhat complicated, so we recommend you open a support case for assistance. If you wish to attempt it yourself, here is the process...

WORKAROUND: Change the default digest to “sha256” instead of “md5” and then regenerate all certificates. Follow these steps:
1. Use Ambari to stop the SmartSense service (all components)
2. Back up the old server keys on the HST Server host:
cp -rp /var/lib/smartsense/hst-server/keys /var/lib/smartsense/hst-server/keys.backup
3. On the HST Server host, clean out the old keys:
rm -f /var/lib/smartsense/hst-server/keys/ca.key
rm -f /var/lib/smartsense/hst-server/keys/*.csr
rm -f /var/lib/smartsense/hst-server/keys/*.crt
rm -rf /var/lib/smartsense/hst-server/keys/db/*
mkdir /var/lib/smartsense/hst-server/keys/db/newcerts
touch /var/lib/smartsense/hst-server/keys/db/index.txt
echo 01 > /var/lib/smartsense/hst-server/keys/db/serial
4. Edit file /var/lib/smartsense/hst-server/keys/ca.config and change line "default_md = md5" to "default_md = sha256"
5. On all HST Agent hosts, clean out the old keys:
rm -f /var/lib/smartsense/hst-agent/keys/*
6. If using the HST Gateway:
   a. Stop the gateway: hst gateway stop
   b. Repeat steps 3 & 4 for the files under /var/lib/smartsense/hst-gateway/keys/ on the HST Gateway host
   c. Repeat step 5 for the files under /var/lib/smartsense/hst-gateway-client/keys on all HST Server host(s)
   d. Start the gateway: hst gateway start
7. Use Ambari to start the SmartSense service (all components)
8. Verify that both the Ambari SmartSense service and the SmartSense view show the correct number of registered agents.

NOTE: Turning off two-way SSL is NOT recommended (the error message has been improved in newer versions of HST), and the issue occurs on hosts with the following JDK versions or newer:

JDK Family | Versions |
---|---|
Oracle | 1.8.0_71 |
Oracle | 1.7.0_95 |
Oracle | 1.6.0_111 |
OpenJDK | 1.7.0_45 |
OpenJDK | 1.8.0_40 |
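The ca.config edit (changing "default_md = md5" to "default_md = sha256") can also be scripted if you would rather not open an editor on each host. This is only a sketch under the assumption that the line looks like the one quoted above; the function name is made up, and you should still take the backup first.

```shell
# Hypothetical helper: rewrite "default_md = md5" to "default_md = sha256"
# in the given config file. A temp file is used instead of "sed -i" because
# that flag behaves differently in GNU and BSD sed. The body runs in a
# subshell so its variables do not leak into the calling shell.
set_default_md_sha256() (
  cfg="$1"
  tmp="$(mktemp)" || exit 1
  if sed -E 's/^([[:space:]]*default_md[[:space:]]*=[[:space:]]*)md5[[:space:]]*$/\1sha256/' "$cfg" > "$tmp"; then
    # Write back through the existing file so ownership and mode are kept.
    cat "$tmp" > "$cfg"
    rc=$?
  else
    rc=1
  fi
  rm -f "$tmp"
  exit "$rc"
)

# Usage on the HST Server host (after the backup):
#   set_default_md_sha256 /var/lib/smartsense/hst-server/keys/ca.config
```

Lines that do not match are left untouched, so running it a second time is harmless.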
05-02-2016
01:41 PM
Also which version of HDP, and whether the script succeeds using MR.
04-05-2016
06:50 PM
1 Kudo
You mention Knox 0.6.0; however, the path shows 0.5.0. For Java it will also help to know whether you are using Oracle or OpenJDK. To address these questions, please also provide the output of:
hdp-select versions
hdp-select status knox-server
rpm -qa | grep knox
java -version
03-29-2016
06:36 PM
You may also need to add or modify some gateway properties, such as gateway.frontend.url (undefined by default), to accommodate the load balancer.