Member since: 08-31-2015
Posts: 15
Kudos Received: 21
Solutions: 3
09-25-2015
12:41 AM
1 Kudo
I need to create some automation views for creating accounts in Ambari. Are there any existing views that already do this?
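For reference, a sketch of the Ambari REST call that account-creation automation typically targets; this is not an Ambari View, and the user name and password below are placeholders (the admin credentials and host assume the Sandbox defaults):

# Create a local Ambari user via the REST API (placeholder credentials)
curl -u admin:admin -H "X-Requested-By: ambari" -X POST -d '{"Users/user_name":"newuser","Users/password":"changeme","Users/active":true,"Users/admin":false}' http://localhost:8080/api/v1/users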
Labels: Apache Ambari
09-25-2015
12:38 AM
3 Kudos
Is the Hortonworks Sandbox available on Amazon Web Services as well?
Labels: Hortonworks Data Platform (HDP)
09-25-2015
12:22 AM
2 Kudos
What is YARN's Capacity Scheduler?

YARN's CapacityScheduler is designed to run Hadoop applications in a shared, multi-tenant cluster while maximizing the throughput and the utilization of the cluster. Traditionally each organization has its own private set of compute resources with sufficient capacity to meet the organization's SLA. This generally leads to poor average utilization, along with the heavy overhead of managing multiple independent clusters. Sharing clusters between organizations allows economies of scale. However, organizations are wary of sharing a cluster for fear of not getting enough of the resources that are critical to meeting their SLAs.

The CapacityScheduler is designed to allow sharing a large cluster while giving each organization capacity guarantees. An added benefit is that an organization can access any excess capacity not being used by others, which provides elasticity in a cost-effective manner.

Sharing clusters across organizations necessitates strong support for multi-tenancy, since each organization must be guaranteed capacity and safeguards to ensure the shared cluster is impervious to a single rogue application or user, or sets thereof. The CapacityScheduler provides a stringent set of limits to ensure that a single application, user, or queue cannot consume a disproportionate amount of resources in the cluster. It also provides limits on initialized and pending applications from a single user and queue to ensure fairness and stability of the cluster.

The primary abstraction provided by the CapacityScheduler is the concept of queues. These queues are typically set up by administrators to reflect the economics of the shared cluster. To provide further control and predictability on sharing of resources, the CapacityScheduler supports hierarchical queues to ensure resources are shared among the sub-queues of an organization before other queues are allowed to use free resources, thereby providing affinity for sharing free resources among applications of a given organization.

Configuring the Capacity Scheduler

For the rest of the tutorial we will use Ambari hosted on the Hortonworks Sandbox. After you spin up the Hortonworks Sandbox, log in to Ambari. The default username and password is admin / admin.
After you log in, you will see the Dashboard. This is a unified view of the state of your cluster, from which you can drill into each service's dashboard and configuration. Let's dive into the YARN dashboard by selecting YARN from the left-side bar or the drop-down menu. We will start by updating the configuration for the YARN Capacity Scheduler policies. Scroll down to the Scheduler section of the page. The default capacity scheduling policy has just one queue. Let's check out the scheduling policy visually: scroll up to the top of the page, click on Quick Links, and select ResourceManager UI from the dropdown. As you can see below, we just have the default policy. Let's change the capacity scheduling policy so that we have separate queues and policies for the Engineering, Marketing, and Support departments:

yarn.scheduler.capacity.maximum-am-resource-percent=0.2
yarn.scheduler.capacity.maximum-applications=10000
yarn.scheduler.capacity.node-locality-delay=40
yarn.scheduler.capacity.root.Engineering.Development.acl_administer_jobs=*
yarn.scheduler.capacity.root.Engineering.Development.acl_administer_queue=*
yarn.scheduler.capacity.root.Engineering.Development.acl_submit_applications=*
yarn.scheduler.capacity.root.Engineering.Development.capacity=20
yarn.scheduler.capacity.root.Engineering.Development.maximum-capacity=100
yarn.scheduler.capacity.root.Engineering.Development.state=RUNNING
yarn.scheduler.capacity.root.Engineering.Development.user-limit-factor=1
yarn.scheduler.capacity.root.Engineering.QE.acl_administer_jobs=*
yarn.scheduler.capacity.root.Engineering.QE.acl_administer_queue=*
yarn.scheduler.capacity.root.Engineering.QE.acl_submit_applications=*
yarn.scheduler.capacity.root.Engineering.QE.capacity=80
yarn.scheduler.capacity.root.Engineering.QE.maximum-capacity=90
yarn.scheduler.capacity.root.Engineering.QE.state=RUNNING
yarn.scheduler.capacity.root.Engineering.QE.user-limit-factor=1
yarn.scheduler.capacity.root.Engineering.acl_administer_jobs=*
yarn.scheduler.capacity.root.Engineering.acl_administer_queue=*
yarn.scheduler.capacity.root.Engineering.acl_submit_applications=*
yarn.scheduler.capacity.root.Engineering.capacity=60
yarn.scheduler.capacity.root.Engineering.maximum-capacity=100
yarn.scheduler.capacity.root.Engineering.queues=Development,QE
yarn.scheduler.capacity.root.Engineering.state=RUNNING
yarn.scheduler.capacity.root.Engineering.user-limit-factor=1
yarn.scheduler.capacity.root.Marketing.Advertising.acl_administer_jobs=*
yarn.scheduler.capacity.root.Marketing.Advertising.acl_administer_queue=*
yarn.scheduler.capacity.root.Marketing.Advertising.acl_submit_applications=*
yarn.scheduler.capacity.root.Marketing.Advertising.capacity=30
yarn.scheduler.capacity.root.Marketing.Advertising.maximum-capacity=40
yarn.scheduler.capacity.root.Marketing.Advertising.state=STOPPED
yarn.scheduler.capacity.root.Marketing.Advertising.user-limit-factor=1
yarn.scheduler.capacity.root.Marketing.Sales.acl_administer_jobs=*
yarn.scheduler.capacity.root.Marketing.Sales.acl_administer_queue=*
yarn.scheduler.capacity.root.Marketing.Sales.acl_submit_applications=*
yarn.scheduler.capacity.root.Marketing.Sales.capacity=70
yarn.scheduler.capacity.root.Marketing.Sales.maximum-capacity=80
yarn.scheduler.capacity.root.Marketing.Sales.minimum-user-limit-percent=20
yarn.scheduler.capacity.root.Marketing.Sales.state=RUNNING
yarn.scheduler.capacity.root.Marketing.Sales.user-limit-factor=1
yarn.scheduler.capacity.root.Marketing.acl_administer_jobs=*
yarn.scheduler.capacity.root.Marketing.acl_submit_applications=*
yarn.scheduler.capacity.root.Marketing.capacity=10
yarn.scheduler.capacity.root.Marketing.maximum-capacity=40
yarn.scheduler.capacity.root.Marketing.queues=Sales,Advertising
yarn.scheduler.capacity.root.Marketing.state=RUNNING
yarn.scheduler.capacity.root.Marketing.user-limit-factor=1
yarn.scheduler.capacity.root.Support.Services.acl_administer_jobs=*
yarn.scheduler.capacity.root.Support.Services.acl_administer_queue=*
yarn.scheduler.capacity.root.Support.Services.acl_submit_applications=*
yarn.scheduler.capacity.root.Support.Services.capacity=80
yarn.scheduler.capacity.root.Support.Services.maximum-capacity=100
yarn.scheduler.capacity.root.Support.Services.minimum-user-limit-percent=20
yarn.scheduler.capacity.root.Support.Services.state=RUNNING
yarn.scheduler.capacity.root.Support.Services.user-limit-factor=1
yarn.scheduler.capacity.root.Support.Training.acl_administer_jobs=*
yarn.scheduler.capacity.root.Support.Training.acl_administer_queue=*
yarn.scheduler.capacity.root.Support.Training.acl_submit_applications=*
yarn.scheduler.capacity.root.Support.Training.capacity=20
yarn.scheduler.capacity.root.Support.Training.maximum-capacity=60
yarn.scheduler.capacity.root.Support.Training.state=RUNNING
yarn.scheduler.capacity.root.Support.Training.user-limit-factor=1
yarn.scheduler.capacity.root.Support.acl_administer_jobs=*
yarn.scheduler.capacity.root.Support.acl_administer_queue=*
yarn.scheduler.capacity.root.Support.acl_submit_applications=*
yarn.scheduler.capacity.root.Support.capacity=30
yarn.scheduler.capacity.root.Support.maximum-capacity=100
yarn.scheduler.capacity.root.Support.queues=Training,Services
yarn.scheduler.capacity.root.Support.state=RUNNING
yarn.scheduler.capacity.root.Support.user-limit-factor=1
yarn.scheduler.capacity.root.acl_administer_queue=*
yarn.scheduler.capacity.root.capacity=100
yarn.scheduler.capacity.root.queues=Support,Marketing,Engineering
yarn.scheduler.capacity.root.unfunded.capacity=50
Copy and paste the above policy into the Capacity Scheduler textbox. Click Save and confirm on the dialog box. At this point the configuration is saved, but we still need to restart the components affected by the configuration change, as indicated in the orange band below. Also note that there is now a new version of the configuration, as indicated by the green Current label. Let's restart the daemons by clicking Restart All. Wait for the restart to complete, then go to the browser tab with the Capacity Scheduler policy and refresh the page. Voila! There's our new policy:
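If you prefer to verify the new queue hierarchy from the command line rather than the ResourceManager UI, the ResourceManager's scheduler REST endpoint returns the same information as JSON. A minimal sketch, assuming the Sandbox forwards the ResourceManager's default web port 8088:

# Query the Capacity Scheduler queue hierarchy as JSON (port 8088 assumed)
curl http://localhost:8088/ws/v1/cluster/scheduler

The response lists each queue under root with its configured capacity, maximum capacity, and state, which should match the policy pasted above.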
09-25-2015
12:03 AM
7 Kudos
Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides central security policy administration across the core enterprise security requirements of authorization, accounting and data protection. Apache Ranger already extends baseline features for coordinated enforcement across Hadoop workloads, from batch and interactive SQL to real-time. In this tutorial, we cover using Apache Ranger for HDP 2.3 to secure your Hadoop environment. We will walk through the following topics:
- Support for Knox authorization and audit
- Command line policies in Hive
- Command line policies in HBase
- REST APIs for policy manager

Prerequisite

The only prerequisite for this tutorial is the Hortonworks Sandbox. Once you have the Hortonworks Sandbox, log in through SSH.

Starting Knox Service and Demo LDAP Service

From the Ambari console at http://localhost:8080/ (username and password are admin and admin respectively), select Knox from the list of Services on the left-hand side of the page.
Then click on Service Actions at the top right-hand side of the page and click on Start. From the following screen you can track the start of the Knox service to completion. Then go back to the Service Actions button on the Knox service and click on Start Demo LDAP. You can track the start of the Demo LDAP service from the following screen.

Knox access scenarios

Check that the Ranger Admin console is running at http://localhost:6080/ from your host machine. The username is admin and the password is admin. If it is not running, you can start it from the command line using the command sudo service ranger-admin start. Click on the sandbox_knox link under the Knox section in the main screen of the Ranger Administration Portal. You can review policy details by clicking on a policy name. To start testing Knox policies, we need to turn off the "global knox allow" policy. Then locate the "Sandbox for Guest" policy on the Ranger Admin console, edit it, and enable it. From your local terminal (not directly on the Sandbox), run this curl command to access WebHDFS:

curl -k -u admin:admin-password 'https://127.0.0.1:8443/gateway/knox_sample/webhdfs/v1?op=LISTSTATUS'

Go to the Ranger Policy Manager tool → Audit screen and check the Knox access (denied) being audited. Now let us try the same curl command using the "guest" user credentials from the terminal:

curl -k -u guest:guest-password 'https://127.0.0.1:8443/gateway/knox_sample/webhdfs/v1?op=LISTSTATUS'

{"FileStatuses":{"FileStatus":[{"accessTime":0,"blockSize":0,"childrenNum":0,"fileId":16393,"group":"hadoop","length":0,"modificationTime":1439987528048,"owner":"yarn","pathSuffix":"app-logs","permission":"777","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":4,"fileId":16389,"group":"hdfs","length":0,"modificationTime":1439987809562,"owner":"hdfs","pathSuffix":"apps","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":17000,"group":"hdfs","length":0,"modificationTime":1439989173392,"owner":"hdfs","pathSuffix":"demo","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16398,"group":"hdfs","length":0,"modificationTime":1439987529660,"owner":"hdfs","pathSuffix":"hdp","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16394,"group":"hdfs","length":0,"modificationTime":1439987528532,"owner":"mapred","pathSuffix":"mapred","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":2,"fileId":16396,"group":"hadoop","length":0,"modificationTime":1439987538099,"owner":"mapred","pathSuffix":"mr-history","permission":"777","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":1,"fileId":16954,"group":"hdfs","length":0,"modificationTime":1439988741413,"owner":"hdfs","pathSuffix":"ranger","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":3,"fileId":16386,"group":"hdfs","length":0,"modificationTime":1440165443820,"owner":"hdfs","pathSuffix":"tmp","permission":"777","replication":0,"storagePolicy":0,"type":"DIRECTORY"},{"accessTime":0,"blockSize":0,"childrenNum":8,"fileId":16387,"group":"hdfs","length":0,"modificationTime":1439988397561,"owner":"hdfs","pathSuffix":"user","permission":"755","replication":0,"storagePolicy":0,"type":"DIRECTORY"}]}}
We can check the auditing in the Ranger Policy Manager → Audit screen. The Ranger plugin for Knox intercepts any request made to Knox and enforces the policies retrieved from the Ranger Administration Portal. You can configure the Knox policies in Ranger to restrict access to a specific service (WebHDFS, WebHCat, etc.) and to a specific user or group, and you can even bind a user/group to an IP address.

Hive grant/revoke permission scenarios

Ranger supports importing grant/revoke policies set through the command line or Hue for Hive. Ranger can store these policies centrally along with policies created in the administration portal and enforce them in Hive using its plugin. As a first step, disable the global access policy for Hive in the Ranger Administration Portal. Let us try running a GRANT operation as user it1 from the command line. Log into the beeline tool using the following command:

beeline -u "jdbc:hive2://sandbox.hortonworks.com:10000/default" -n it1 -p it1 -d org.apache.hive.jdbc.HiveDriver

Then issue the GRANT command:

grant select, update on table xademo.customer_details to user network1;

You should see the following error. Let us check the audit log in the Ranger Administration Portal → Audit screen. You can see that access was denied for an admin operation for user it1. We can create a policy in Ranger for user 'it1' to be an admin. Create a new policy from the Ranger Admin console and ensure the configuration matches the illustration below. Once the policy has been saved, we can try the beeline command again:

GRANT select, update on table xademo.customer_details to user network1;

If the command goes through successfully, you will see the policy created/updated in the Ranger Admin Portal → Policy Manager. It checks if there is an existing relevant policy to update; otherwise it creates a new one.
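The corresponding REVOKE is symmetric; a quick sketch using the same table and user as the GRANT above:

-- remove the permissions granted above
revoke select, update on table xademo.customer_details from user network1;

When run as an authorized admin user, the matching permissions are removed from the Ranger policy as well.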
What happened here? The Ranger plugin intercepts GRANT/REVOKE commands in Hive and creates corresponding policies in the Admin portal. The plugin then uses these policies for enforcing Hive authorization (HiveServer2). Users can run further GRANT commands to update permissions and REVOKE commands to take away permissions.

HBase grant/revoke permission scenarios

Ranger supports importing grant/revoke policies set through the command line in HBase. Similar to Hive, Ranger can store these policies as part of the Policy Manager and enforce them in HBase using its plugin. Before you go further, ensure HBase is running from Ambari – http://127.0.0.1:8080 (username and password are admin / admin). If it is not, go to the Service Actions button on the top right and start the service. As a first step, let us try running a grant operation. Disable the public access policy "HBase Global Allow" in the Ranger Administration Portal → Policy Manager, then log into the HBase shell as the 'it1' user:

su - it1
[it1@sandbox ~]$ hbase shell
Run a grant command to give "Read", "Write", "Create" access to user mktg1 on table 'iemployee':

hbase(main):001:0> grant 'mktg1', 'RWC', 'iemployee'

You should get an Access Denied error as below. Go to the Ranger Administration Portal → Policy Manager and create a new policy to assign "admin" rights to user it1. Save the policy and rerun the HBase command:

hbase(main):006:0> grant 'mktg1', 'RWC', 'iemployee'
0 row(s) in 0.8670 seconds
Check the HBase policies in the Ranger Policy Administration portal. The grant permissions were added to the existing policy for table 'iemployee' that we created in the previous step. You can also revoke the same permissions, and they will be removed from the Ranger admin. Try this in the same HBase shell:

hbase(main):007:0> revoke 'mktg1', 'iemployee'
0 row(s) in 0.4330 seconds
You can check the existing policy and see that it has been changed. What happened here? The Ranger plugin intercepts GRANT/REVOKE commands in HBase and creates corresponding policies in the Admin portal. The plugin then uses these policies for enforcing authorization. Users can run further GRANT commands to update permissions and REVOKE commands to take away permissions.

REST APIs for Policy Administration

Ranger policy administration can be managed through REST APIs. Users can use the APIs to create or update policies instead of going into the Administration Portal.

Running REST APIs from the command line

From your local command line shell, run this curl command. This API call will create a policy with the name "hadoopdev-testing-policy2" within the HDFS repository "sandbox_hdfs":

curl -i --header "Accept:application/json" -H "Content-Type: application/json" --user admin:admin -X POST http://127.0.0.1:6080/service/public/api/policy -d '{ "policyName":"hadoopdev-testing-policy2","resourceName":"/demo/data/test","description":"Testing policy for /demo/data/test","repositoryName":"sandbox_hdfs","repositoryType":"HDFS","permMapList":[{"userList":["mktg1"],"permList":["Read"]},{"groupList":["IT"],"permList":["Read"]}],"isEnabled":true,"isRecursive":true,"isAuditEnabled":true,"version":"0.1.0","replacePerm":false}'
Go to the Policy Manager and you will see the new policy named "hadoopdev-testing-policy2". Click on the policy and check the permissions that have been created. The policy id is part of the URL of the policy detail page: http://127.0.0.1:6080/index.html#!/hdfs/1/policy/26. We can use the policy id to retrieve or change the policy. Run the curl command below to get the policy details using the API:

curl -i --user admin:admin -X GET http://127.0.0.1:6080/service/public/api/policy/26

What happened here? We created a policy and retrieved policy details using REST APIs. Users can now manage their policies using API tools or applications integrated with the Ranger REST APIs. Hopefully, through this whirlwind tour of Ranger, you were introduced to the simplicity and power of Ranger for security administration.
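One closing note on the public REST API: besides POST and GET, the same endpoint also accepts PUT to update an existing policy and DELETE to remove it, keyed by the policy id. A hedged sketch that deletes the test policy created above (id 26 is taken from the URL shown earlier):

# Remove the test policy by id (use with care)
curl -i --user admin:admin -X DELETE http://127.0.0.1:6080/service/public/api/policy/26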
09-23-2015
08:22 PM
3 Kudos
We are running into an impersonation error while trying to access Ambari Views:

500 User root is not allowed to impersonate admin or ldap user

Here's the background: HDP 2.3 installed via Ambari 2.1.
- Ambari set up to authenticate against LDAP
- Files view set up according to docs.hortonworks.com
- LDAP user is granted permission to the Files view in Ambari
- LDAP user logs into Ambari and sees the view listed
- LDAP user clicks on the view and receives the error
- Ensured that Ambari is running as root

I have successfully achieved this functionality locally on a VirtualBox cluster using HDP 2.2, and in that setup I did not find it necessary to create OS or HDFS users to use the views. I did check the ambari-server logs, but there was only an error indicating the 500 error; nothing regarding an LDAP or permissions error. Any ideas or guidance on how to solve this would be much appreciated.
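For reference, the "is not allowed to impersonate" check is enforced by the Hadoop proxyuser settings in core-site.xml: the account the view uses to reach HDFS (root here, since ambari-server runs as root) must be allowed to impersonate the logged-in view user. A sketch of the properties typically involved, shown in the property=value form used in Ambari's custom core-site section (the wildcard values are the permissive form often used on a sandbox, not a recommendation):

# allow the ambari-server account (root) to impersonate view users; restrict as appropriate
hadoop.proxyuser.root.groups=*
hadoop.proxyuser.root.hosts=*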
Labels: Apache Ambari
09-23-2015
08:04 PM
1 Kudo
We have upgraded to Ambari 2.1.1 and we cannot find where to override the new AMS account name. All of our service accounts are defined in AD and we must follow an internal naming convention. Can someone point us to where and how to change the AMS account name? Thanks.
Labels: Apache Ambari
09-23-2015
07:34 PM
I am trying to run the simple YARN application listed here: https://github.com/hortonworks/simple-yarn-app

I am a beginner with both Java and Hadoop, and when I try to compile the Client file using 'javac', I get the following error:

Client.java:9: error: package org.apache.hadoop.conf does not exist
import org.apache.hadoop.conf.Configuration;
The command I am using to compile the file is: javac Client.java
I have Googled this error to see if I could find which JAR file is missing from my classpath, but I couldn't find anything helpful with respect to YARN. Most of the results were related to HBase, Pig or Hive. Can someone please point me towards the relevant JAR file I am missing here? Thanks.
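For illustration, a compile invocation of the kind that usually resolves these packages, assuming the Hadoop client is installed on the node so that the `hadoop classpath` command prints the jar locations:

# compile against the jars reported by the local Hadoop installation
javac -cp "$(hadoop classpath)" Client.java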
Labels: Apache Hadoop, Apache YARN