Member since: 09-11-2015
Posts: 115
Kudos Received: 126
Solutions: 15
05-19-2016
03:59 AM
10 Kudos
SmartSense is an excellent tool for keeping your cluster running at optimal efficiency while maintaining operational best practices. We've combined knowledge from the greatest minds in the industry, and we use it to analyze metadata about your cluster from the bundles you submit. Have you ever wondered exactly what data you're sending to SmartSense? The SmartSense Admin Guide contains a high-level description (see What's Included in a Bundle), but for the greatest understanding you should extract a bundle and explore it with your own eyes!

Obtain a Bundle

There are two types of bundles...

- Analysis Bundle: configs and metrics for all services on all hosts
- Troubleshooting Bundle: Analysis Bundle + logs for selected service(s)

To begin, let's capture an Analysis Bundle and download an unencrypted copy to our local machine.

The bundle is a gzipped tar file that contains a gzipped tar file from each host running the HST Agent. In the following examples, notice the bundle variable excludes the .tgz extension. Linux or OS X users can extract everything with a bash for-loop:

bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
tar zxf $bundle.tgz && cd $bundle && for i in * ; do tar zxf "$i" ; rm "$i" ; done

Windows users can use a similar process with a utility like 7-Zip. Assuming 7z.exe is in your path, run the following at a Command Prompt:

setlocal
set bundle=a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
7z x %bundle%.tgz && 7z x %bundle%.tar && del %bundle%.tar && cd %bundle%
for %i in (*.tgz) do 7z x %i && del %i
for %i in (*.tar) do 7z x %i && del %i
endlocal

Exploring Bundle Contents

NOTE: Example console output was obtained from a SmartSense 1.2.1 bundle and may differ in future versions. The output is also truncated for brevity. You're encouraged to follow along with a bundle from your own cluster.

For a convenient overview of the bundle contents, use the tree command, limited to a depth of 3:

MyLaptop:a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3
.
├── meta
│   └── metadata.json
├── mgmt.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│   ├── os
│   │   ├── logs
│   │   └── reports
│   └── services
│       ├── AMBARI
│       ├── AMS
│       ├── HDFS
│       ├── HST
│       ├── MR
│       ├── TEZ
│       ├── YARN
│       └── ZK
├── node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35
│   ├── os
│   │   ├── logs
│   │   └── reports
│   └── services
│       ├── AMBARI
...
41 directories, 4 files

At the root of the bundle, we see a 'meta' folder and a folder per host. The meta folder contains some bundle metadata. Note that domain names are anonymized (my cluster uses example.com). Let's take a look inside the two subfolders (os & services) per host...

Bundle Contents: OS

The os folder contains a couple of system logs and a variety of reports. Here's a sample from my cluster:

MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -I "blockdevices" os/
os/
├── logs
│   └── messages.log
└── reports
    ├── chkconfig.txt
    ├── cpu_info.txt
    ├── dns_lookup.txt
    ├── dstat.txt
    ├── error_dmesg.txt
    ├── file_max.txt
...
5 directories, 49 files

Most of the filenames here are self-explanatory. Reports generally contain output from system commands or the /proc filesystem. These system characteristics serve as valuable inputs for determining your cluster's optimal configuration.

Bundle Contents: Services

Within each host folder, the services subfolder contains configurations and reports for every HDP service on that host. Here's an example from my node1:

MyLaptop:node1.zoeocuz.com-a-00000000-c-00000000_supportlab_0_2016-05-17_23-05-35 myuser$ tree -L 3 services
services
├── AMBARI
│   ├── conf
│   │   ├── ambari-agent.ini
│   │   ├── ambari-agent.pid
│   │   └── logging.conf.sample
│   └── reports
│       ├── ambari_rpm.txt
│       ├── postgres_rpm.txt
│       ├── postmaster.txt
│       └── process_info.txt
├── AMS
│   ├── conf
│   │   ├── ams-env.sh
│   │   ├── metric_groups.conf
│   │   └── metric_monitor.ini
│   ├── metrics
│   │   └── ams
│   └── reports
│       └── ams_rpm.txt
...
32 directories, 157 files

The conf folders are copied from their respective locations under /etc/ (or /var/run for the .pid files). Reports contain JMX metrics and output from CLI commands, such as the YARN application list. You can explore the contents using text processing commands like grep, sort, and uniq, which might be sufficient for your needs (see the sketch at the end of this section). Another option is to use a text editor with a file-tree view.

Text Editors

Here are three open source text editors that integrate a file-tree for easy navigation (see attachments at the bottom for full-size images)...

- TextMate 2 (OS X)
- Notepad++ (Windows)
- Vim + NerdTree (Linux, OS X)

Anonymization Rules

The default set of anonymization rules protects IP addresses, hostnames, and password fields in standard HDP configuration files. You can modify or add anonymization rules if desired; watch for a future HCC article where we take a deep dive into anonymization. After making any changes to the anonymization ruleset, it is wise to verify everything is still functioning as intended. This can be accomplished by downloading an unencrypted bundle and examining its contents using the methods described above.
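For example, here is a minimal sketch of that kind of check, assuming the bundle has been extracted to the current directory as shown earlier (the patterns are illustrative, not exhaustive):

# List any file in the extracted bundle still containing a raw IPv4 address,
# which the default anonymization rules should have replaced.
grep -rEl '([0-9]{1,3}\.){3}[0-9]{1,3}' . | sort | uniq

# Likewise, confirm that your real domain name never appears
# (mycompany.com is a placeholder for your own domain).
grep -rl 'mycompany.com' . | sort

Binary files and version strings can trigger false positives, so treat any hits as a starting point for inspection rather than proof of a leak.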
Until Next Time...

Keeping in mind that we only looked within a single host folder, and that my demo cluster has the minimum number of components for a functioning HDP stack, we can see that every bundle is packed with useful information. Knowing exactly what's included in a SmartSense bundle provides peace of mind and confidence that your confidential data remains secure and private.
03-25-2017
02:44 PM
@Alex Miller I am facing an issue where, irrespective of the users defined for the queue, all users are able to run jobs in the queue. I came across this article and tried to deny all users on the root queue by entering a space in the root queue's Submit Applications field in the Ambari YARN Queue Manager, but that field does not accept a space character. Could you kindly let us know how to use a space in submit_applications to deny access to users?
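For illustration, this is the effect I am trying to achieve in capacity-scheduler.xml, assuming the standard Capacity Scheduler ACL behavior where a value of a single space means "nobody":

<property>
  <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
  <!-- A single-space value denies submit access to all users and groups. -->
  <value> </value>
</property>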
04-02-2016
09:12 PM
Any idea how we can pass multiple parameters to the curl command? For example, I would want to specify both an input and an output file as parameters to my query.pig file. Also, I have a jar that I register within my Pig script. How do I use that with the curl command? For example:

register '/home/test/my.jar';
A = load '$input/pig' using PigStorage();
store A into '$output';

------------------------------------

Above, I am trying to pass two parameters (input and output) and to register a jar that is on my local machine. Any idea how to go about it? Again, I am trying to run the above script via curl. Thank you.
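For reference, this is roughly what I am attempting, assuming the WebHCat (Templeton) Pig endpoint; the hostname, user name, and HDFS paths below are placeholders:

# Each Pig parameter is sent as its own repeated 'arg' field,
# and the script is read from HDFS, not from my local machine.
curl -s -d user.name=myuser \
     -d file=/user/myuser/query.pig \
     -d arg=-param -d arg=input=/user/myuser/in \
     -d arg=-param -d arg=output=/user/myuser/out \
     'http://webhcat-host:50111/templeton/v1/pig'

My understanding is the jar would also need to be uploaded to HDFS first, since the script executes on the cluster rather than locally.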
10-30-2015
04:27 AM
6 Kudos
Authorization Models applicable to the Hive CLI
Hive provides a few different authorization models plus Apache Ranger, as described in the Hive Authorization section of the HDP System Administration Guide. The Hive CLI is subject to the following two models:

- Hive default (Insecure): any user can run GRANT statements - DO NOT USE
- Storage-based (Secure): authorization at the level of databases/tables/partitions, based on HDFS permissions (and ACLs in HDP 2.2.0+)
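To make storage-based authorization concrete, here is a minimal sketch of restricting a database directory with HDFS permissions and ACLs; the warehouse path is the HDP default and secure_db is a hypothetical database:

# Limit the database directory to its owning user...
hdfs dfs -chmod 700 /apps/hive/warehouse/secure_db.db

# ...and selectively re-grant access with an HDFS ACL (HDP 2.2.0+ only).
hdfs dfs -setfacl -m user:alice:rwx /apps/hive/warehouse/secure_db.db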
Frequently Asked Questions about Hive CLI Security
Q: Can I set restrictive permissions on the hive executable (shell wrapper script) and hive-cli jar?
A: No; components such as Sqoop and Oozie may fail. Additionally, a user can run their own copy of the hive client from anywhere they can set execution privileges. To avoid this limitation, migrate to the Beeline CLI and utilize HiveServer2, and restrict access to the cluster through a gateway such as Knox.

Q: Can Ranger be used to enforce permissions for Hive CLI users?
A: HDFS policies can be created in Ranger, and the Hive Metastore Server can enforce HDFS permissions (and ACLs in HDP 2.2+) using storage-based authorization. However, the user executing the Hive CLI can bypass authorization mechanisms by overriding properties on the command line, so the Ranger Hive plugin does not enforce permissions for Hive CLI users.
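As a concrete illustration of that migration path, connecting through HiveServer2 with Beeline looks roughly like this (the hostname, port, and user are placeholders):

# Beeline submits queries to HiveServer2 over JDBC, where authorization
# (storage-based or Ranger) is enforced centrally on the server side.
beeline -u 'jdbc:hive2://hiveserver2-host:10000/default' -n myuser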
Related Tutorials

- Secure JDBC and ODBC Clients' Access to HiveServer2 using Apache Knox
- Manage Security Policy for Hive & HBase with Knox & Ranger
10-29-2015
09:10 PM
4 Kudos
container-executor.cfg

YARN containers in a secure cluster use operating system facilities to provide execution isolation for containers. Secure containers execute under the credentials of the job user, and the operating system enforces access restrictions for the container. Because the container must run as the user that submitted the application, it is recommended never to submit jobs from a superuser account (HDFS or Linux) when the LinuxContainerExecutor is used. To prevent superusers from submitting jobs, the container executor configuration (/etc/hadoop/conf/container-executor.cfg) includes the properties banned.users and min.user.id. Attempting to submit a job that violates either of these settings will result in an error indicating the AM container failed to launch:
INFO org.apache.hadoop.yarn.server.resourcemanager.rmapp.RMAppImpl:
Application application_1234567890123_4567 failed 2 times due to AM
Container for appattempt_1234567890123_4567_000002 exited with exitCode: -1000

Followed by one of these two diagnostic messages:

Diagnostics: Application application_1234567890123_4567 initialization failed (exitCode=255) with output:
Requested user hdfs is not whitelisted and has id 507,which is below the minimum allowed 1000

Diagnostics: Application application_1234567890123_4567 initialization failed (exitCode=255) with output:
Requested user hdfs is banned

Although it is possible to modify these properties, leaving the default values is recommended for security reasons.
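For reference, here is an illustrative excerpt of /etc/hadoop/conf/container-executor.cfg; the values shown are typical HDP defaults, so check your own cluster's file rather than copying these:

# Group with execute permission on the container-executor binary.
yarn.nodemanager.linux-container-executor.group=hadoop
# System accounts that may never submit YARN applications.
banned.users=hdfs,yarn,mapred,bin
# Reject any submitting user whose UID falls below this threshold.
min.user.id=1000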
yarn-site.xml

yarn.nodemanager.linux-container-executor.group - A special group (e.g. hadoop) with executable permissions for the container executor, of which the NodeManager Unix user is a member and no ordinary application user is. If any application user belongs to this special group, security will be compromised. This special group name should be specified for the configuration property (an illustrative snippet follows below).

Learn more about YARN Secure Containers from the Apache Hadoop docs.
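A minimal sketch of that property in yarn-site.xml, assuming the conventional hadoop group:

<property>
  <name>yarn.nodemanager.linux-container-executor.group</name>
  <!-- Shared by the NodeManager user and the container-executor binary;
       ordinary application users must not belong to this group. -->
  <value>hadoop</value>
</property>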