Member since 09-11-2015 · 115 Posts · 126 Kudos Received · 15 Solutions
03-10-2017
06:11 AM
This is very helpful, thank you. Can you please advise on where to find the Jetty logs, in case of issues with the web application itself?
05-19-2016
03:59 AM
10 Kudos
SmartSense is an excellent tool for keeping your cluster running at optimal efficiency while maintaining operational best practices. We’ve combined knowledge from the greatest minds in the industry, and use it to analyze metadata about your cluster from the bundles you submit. Have you ever wondered exactly what data you’re sending to SmartSense? The SmartSense Admin Guide contains a high-level description (see What’s Included in a Bundle), but for the greatest understanding you should extract a bundle and explore it with your own eyes!

Obtain a Bundle

There are two types of bundles:

- Analysis Bundle: configs and metrics for all services on all hosts
- Troubleshooting Bundle: Analysis Bundle + logs for selected service(s)

To begin, let’s capture an Analysis Bundle and download an unencrypted copy to our local machine. The bundle is a gzipped tar file that contains a gzipped tar file from each host running the HST Agent. In the following examples, notice the bundle variable excludes the .tgz extension.

Linux or OS X users can extract everything with a bash for-loop:

bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
tar zxf "$bundle.tgz" && cd "$bundle" && for i in * ; do tar zxf "$i" && rm "$i" ; done

Windows users can use a similar process with a utility like 7-Zip. Assuming 7z.exe is in your path (note that cmd.exe has no rm command, so the deletes use del):

setlocal
set bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
7z x %bundle%.tgz && 7z x %bundle%.tar && del %bundle%.tar && cd %bundle%
for %i in (*.tgz) do 7z x %i && del %i
for %i in (*.tar) do 7z x %i && del %i
endlocal

Exploring Bundle Contents

NOTE: Example console output was obtained from a SmartSense 1.2.1 bundle and may differ in future versions. The output is also truncated for brevity. You’re encouraged to follow along with a bundle from your own cluster.

For a convenient overview of the bundle contents, use the tree command, limited to a depth of 3:

MyLaptop:a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3
.
├── meta
│ └── metadata.json
├── mgmt.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│ ├── os
│ │ ├── logs
│ │ └── reports
│ └── services
│ ├── AMBARI
│ ├── AMS
│ ├── HDFS
│ ├── HST
│ ├── MR
│ ├── TEZ
│ ├── YARN
│ └── ZK
├── node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│ ├── os
│ │ ├── logs
│ │ └── reports
│ └── services
│ ├── AMBARI
...
41 directories, 4 files

At the root of the bundle, we see a ‘meta’ folder, and a folder per host. The meta folder contains some bundle metadata. Note that domain names are anonymized (my cluster uses example.com). Let’s take a look inside the two subfolders (os & services) per host.

Bundle Contents: OS

The os folder contains a couple of system logs and a variety of reports. Here’s a sample from my cluster:

MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -I "blockdevices" os/
os/
├── logs
│ └── messages.log
└── reports
├── chkconfig.txt
├── cpu_info.txt
├── dns_lookup.txt
├── dstat.txt
├── error_dmesg.txt
├── file_max.txt
...
5 directories, 49 files

Most of the filenames here are self-explanatory. Reports generally contain output from system commands or the /proc filesystem. These system characteristics serve as valuable inputs for determining your cluster’s optimal configuration.

Bundle Contents: Services

Within each host folder, the services subfolder contains configurations and reports for every HDP service on that host. Here’s an example from my node1:

MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3 services
services
├── AMBARI
│ ├── conf
│ │ ├── ambari-agent.ini
│ │ ├── ambari-agent.pid
│ │ └── logging.conf.sample
│ └── reports
│ ├── ambari_rpm.txt
│ ├── postgres_rpm.txt
│ ├── postmaster.txt
│ └── process_info.txt
├── AMS
│ ├── conf
│ │ ├── ams-env.sh
│ │ ├── metric_groups.conf
│ │ └── metric_monitor.ini
│ ├── metrics
│ │ └── ams
│ └── reports
│ └── ams_rpm.txt
...
32 directories, 157 files

The conf folders are copied from their respective locations under /etc/ (or /var/run for the .pid files). Reports contain JMX metrics and output from CLI commands, such as the YARN application list. You can explore the contents using text processing commands like grep, sort, and uniq, which might be sufficient for your needs. Another option is to use a text editor with a file-tree view.

Text Editors

Here are three open source text editors that integrate a file-tree for easy navigation (see attachments at the bottom for full-size images):

- TextMate 2 (OS X)
- Notepad++ (Windows)
- Vim + NERDTree (Linux, OS X)

Anonymization Rules

The default set of anonymization rules will protect IP addresses, hostnames, and password fields in standard HDP configuration files. You can modify or add anonymization rules if desired; watch for a future HCC article where we take a deep dive into anonymization. After making any changes to the anonymization ruleset, it is wise to verify everything is still functioning as intended. This can be accomplished by downloading an unencrypted bundle and examining its contents using the methods described above.

Until Next Time...

Keeping in mind that we only looked within a single host folder, and that my demo cluster has the minimum number of components for a functioning HDP stack, we can see that every bundle is packed with useful information. Knowing exactly what’s included in a SmartSense bundle provides peace of mind, and the trust that your confidential data remains secure and private.
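As a cross-platform complement to the shell extraction commands earlier in this article, the same nested extraction can be sketched in a few lines of Python using only the standard library. This is a minimal sketch, assuming the bundle layout described above (an outer .tgz containing one .tgz per host); the function name is my own, not part of SmartSense:

```python
import os
import tarfile

def extract_bundle(bundle_tgz, dest_dir):
    """Extract a SmartSense-style bundle: an outer .tgz that contains
    one gzipped tar per host running the HST Agent."""
    # Expand the outer archive first.
    with tarfile.open(bundle_tgz, "r:gz") as outer:
        outer.extractall(dest_dir)
    # Walk the extracted tree and expand each per-host archive in place,
    # removing the inner .tgz afterward (like `rm "$i"` in the bash loop).
    for root, _dirs, files in os.walk(dest_dir):
        for name in files:
            if name.endswith(".tgz"):
                inner = os.path.join(root, name)
                with tarfile.open(inner, "r:gz") as t:
                    t.extractall(root)
                os.remove(inner)
```

This avoids needing tar or 7-Zip at all, which can be handy when inspecting bundles on a machine without those tools installed.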
03-29-2016
06:30 PM
This occurs on hosts with the following JDK versions or newer:

JDK Family   Versions
Oracle       1.8.0_71, 1.7.0_95, 1.6.0_111
OpenJDK      1.7.0_45, 1.8.0_40

It is also recommended to upgrade to SmartSense 1.2.1+ while applying these changes.
11-11-2015
04:16 PM
3 Kudos
SYMPTOM: For a Capacity Scheduler queue that specifies some groups in its acl_submit_applications property, a user who is not a member of any of those groups is still able to submit jobs to the queue.
ROOT CAUSE: By default the root queue is allow-all, which results in all child queues defaulting to allow-all. The acl_submit_applications property is described as: "The ACL which controls who can submit applications to the given queue. If the given user/group has necessary ACLs on the given queue or one of the parent queues in the hierarchy, they can submit applications. ACLs for this property are inherited from the parent queue if not specified."

SOLUTION: Set the root queue to deny-all by entering a single space as the value, then grant access in the ACL of each child queue. For example:

yarn.scheduler.capacity.root.acl_submit_applications=
yarn.scheduler.capacity.root.default.acl_administer_jobs=appdev
yarn.scheduler.capacity.root.default.acl_submit_applications=appdev
yarn.scheduler.capacity.root.system.acl_administer_jobs=dbadmin
yarn.scheduler.capacity.root.system.acl_submit_applications=dbadmin
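To make the inheritance behavior concrete, here is a small Python model of how submit access is resolved by walking up the queue hierarchy. This is a simplified sketch, not YARN code; it assumes the usual "users groups" ACL format, with "*" meaning allow-all and a single space meaning deny-all:

```python
def can_submit(user, user_groups, queue, acls):
    """Simplified model of Capacity Scheduler ACL resolution: a user may
    submit if the ACL on the queue OR any ancestor queue allows them.
    `acls` maps a queue path (e.g. "root.default") to an ACL string;
    unspecified queues default to "*" (allow-all), which is the root cause
    of the symptom above."""
    node = queue
    while node is not None:
        acl = acls.get(node, "*")
        if acl.strip() == "*":
            return True  # allow-all at this level
        users_part, _, groups_part = acl.partition(" ")
        if users_part and user in users_part.split(","):
            return True
        if groups_part and any(g in groups_part.split(",") for g in user_groups):
            return True
        # Move up to the parent queue ("root.default" -> "root").
        node = node.rsplit(".", 1)[0] if "." in node else None
    return False

# Symptom: root is unspecified (allow-all), so a non-member can still submit.
print(can_submit("mallory", ["other"], "root.default",
                 {"root.default": " appdev"}))                    # True

# Fix: root set to deny-all (single space), access granted per child queue.
print(can_submit("mallory", ["other"], "root.default",
                 {"root": " ", "root.default": " appdev"}))       # False
print(can_submit("alice", ["appdev"], "root.default",
                 {"root": " ", "root.default": " appdev"}))       # True
```

The model shows why only setting the child queue's ACL is not enough: the allow-all default on root is reached while walking up the hierarchy.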
11-10-2015
04:56 AM
3 Kudos
SYMPTOM: Attempting to submit a Pig job via WebHCat that defines parameter(s) for substitution as command-line arguments results in an "incorrect usage" message and job does not run. Doing the same through Hue results in an "undefined parameter" message and job does not run.
ROOT CAUSE: If the parameter is passed to curl as a single argument (-d 'arg=-param paramName=paramValue'), it is interpreted incorrectly by Pig. Submitting the parameter via Hue as a single argument has the same unwanted effect.

WORKAROUND: Pass the parameter as two arguments:

curl -d file=myScript.pig -d 'arg=-param' -d 'arg=paramName=paramValue' 'http://<server>:50111/templeton/v1/pig'

To achieve the same using Hue, pass two arguments in sequence (refer to the attached image for an example).

RESOLUTION: WebHCat works as designed. This issue is a limitation of curl. The Hue workaround is good for a single parameter; however, multiple parameters may not work.
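The difference between one argument and two is easiest to see in the encoded form body that actually reaches WebHCat. A quick stdlib-only Python illustration (the script name is just the placeholder from the curl example above):

```python
from urllib.parse import urlencode

# WRONG: one combined field -- Pig receives "-param paramName=paramValue"
# as a single argument and reports incorrect usage.
wrong = urlencode([("file", "myScript.pig"),
                   ("arg", "-param paramName=paramValue")])

# RIGHT: two separate `arg` fields, matching the two -d flags above.
right = urlencode([("file", "myScript.pig"),
                   ("arg", "-param"),
                   ("arg", "paramName=paramValue")])

print(wrong)  # file=myScript.pig&arg=-param+paramName%3DparamValue
print(right)  # file=myScript.pig&arg=-param&arg=paramName%3DparamValue
```

In the correct form, `arg` appears twice in the body, so WebHCat passes `-param` and `paramName=paramValue` to Pig as two distinct arguments.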
10-30-2015
04:27 AM
6 Kudos
Authorization Models applicable to the Hive CLI
Hive provides a few different authorization models plus Apache Ranger, as described in the Hive Authorization section of the HDP System Administration Guide. Hive CLI is subject to the following two models:

- Hive default (Insecure): any user can run GRANT statements - DO NOT USE
- Storage-based (Secure): authorization at the level of databases/tables/partitions, based on HDFS permissions (and ACLs in HDP 2.2.0+)
Frequently Asked Questions about Hive CLI Security
Can I set restrictive permissions on the hive executable (shell wrapper script) and hive-cli jar?
No, components such as Sqoop and Oozie may fail. Additionally, a user can run their own copy of the hive client from anywhere they can set execution privileges. To avoid this limitation, migrate to the Beeline CLI and HiveServer2, and restrict access to the cluster through a gateway such as Knox.

Can Ranger be used to enforce permissions for Hive CLI users?
HDFS policies can be created in Ranger, and the Hive Metastore Server can enforce HDFS permissions (and ACLs in HDP 2.2+) using storage-based authorization. However, the user executing hive-cli can bypass authorization mechanisms by overriding properties on the command line, so the Ranger Hive plugin does not enforce permissions for Hive CLI users.
Related Tutorials

- Secure JDBC and ODBC Clients’ Access to HiveServer2 using Apache Knox
- Manage Security Policy for Hive & HBase with Knox & Ranger
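Since storage-based authorization ultimately reduces to HDFS permission checks, the core idea can be sketched in a few lines of Python. This is a simplified model of the owner/group/other permission bits only; real HDFS adds ACLs, the superuser, and sticky-bit handling on top:

```python
def hdfs_allows(action, user, user_groups, owner, group, mode):
    """Simplified HDFS-style permission check. `action` is 'r', 'w', or
    'x'; `mode` is an octal int such as 0o750. Owner bits apply first,
    then group bits, then 'other' bits."""
    bit = {"r": 4, "w": 2, "x": 1}[action]
    if user == owner:
        return bool((mode >> 6) & bit)
    if group in user_groups:
        return bool((mode >> 3) & bit)
    return bool(mode & bit)

# Under storage-based authorization, destructive operations (e.g. dropping
# a table) require write access to the table's directory:
print(hdfs_allows("w", "alice", ["users"], "hive", "hadoop", 0o755))  # False
print(hdfs_allows("w", "hive", ["hadoop"], "hive", "hadoop", 0o755))  # True
```

This is why storage-based authorization remains meaningful for Hive CLI users: the check happens at the HDFS layer, which the client cannot override from the command line.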
10-29-2015
09:10 PM
4 Kudos
container-executor.cfg

YARN containers in a secure cluster use operating system facilities to offer execution isolation. Secure containers execute under the credentials of the job user, and the operating system enforces access restrictions for the container, which must run as the user that submitted the application. Therefore, it is recommended to never submit jobs from a superuser account (HDFS or Linux) when LinuxContainerExecutor is used. To prevent superusers from submitting jobs, the container executor configuration (/etc/hadoop/conf/container-executor.cfg) includes the properties banned.users and min.user.id. Attempting to submit a job that violates either of these settings will result in an error indicating the AM container failed to launch:
INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
Application application_1234567890123_4567 failed 2 times due to AM
Container for appattempt_1234567890123_4567_000002 exited with exitCode: -1000

This is followed by one of two diagnostic messages:

Diagnostics: Application application_1234567890123_4567 initialization failed (exitCode=255) with output:
Requested user hdfs is not whitelisted and has id 507, which is below the minimum allowed 1000

Diagnostics: Application application_1234567890123_4567 initialization failed (exitCode=255) with output:
Requested user hdfs is banned

Although it is possible to modify these properties, leaving the default values in place is recommended for security reasons.

yarn-site.xml

yarn.nodemanager.linux-container-executor.group - A special group (e.g. hadoop) with executable permissions for the container executor, of which the NodeManager Unix user is a member and no ordinary application user is. If any application user belongs to this special group, security will be compromised. This special group name should be specified for the configuration property. Learn more about YARN Secure Containers from the Apache Hadoop docs.
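The effect of those two container-executor.cfg properties can be modeled in a few lines of Python. This is a sketch of the check logic, not the actual C implementation; it also assumes the allowed.system.users whitelist that container-executor.cfg supports for low-UID exceptions:

```python
def check_submit_user(user, uid, banned_users, min_user_id,
                      allowed_system_users=()):
    """Sketch of the container-executor user checks: banned users are
    rejected outright, and users below min.user.id are rejected unless
    explicitly whitelisted. Returns an error string, or None if the
    container launch may proceed."""
    if user in banned_users:
        return f"Requested user {user} is banned"
    if uid < min_user_id and user not in allowed_system_users:
        return (f"Requested user {user} is not whitelisted and has id {uid},"
                f" which is below the minimum allowed {min_user_id}")
    return None

# hdfs is typically both a banned user and below min.user.id on HDP systems:
print(check_submit_user("hdfs", 507, {"hdfs", "yarn", "mapred", "bin"}, 1000))
# An ordinary application user with uid >= 1000 passes both checks:
print(check_submit_user("alice", 1500, {"hdfs", "yarn", "mapred", "bin"}, 1000))
```

Either failing check surfaces as one of the Diagnostics messages shown earlier, wrapped in the AM container launch failure.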