<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>question Re: Part-1 : Authorization on production cluster in Archives of Support Questions (Read Only)</title>
    <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154172#M32785</link>
    <description>&lt;P&gt;1. If you need home directories for each of the users, then you need to create them. Ownership can be changed from the CLI, or you can set it using Ranger (though I think changing it from the CLI is better than creating a new policy in Ranger for these things).&lt;/P&gt;&lt;P&gt;2. I am talking about principals here, not service users (like hdfs, hive, yarn) coming from AD (via SSSD or some other such tool). So, with your setup, local users are created on each node, but they still need to authenticate with your KDC. Ambari can create the principals for you in the OU once you give it the credentials. &lt;/P&gt;&lt;P&gt;3. It's not mandatory to have /user/&amp;lt;username&amp;gt; for each user. We have cases where BI users who use ODBC/JDBC don't even have login access to the nodes and don't need /user/&amp;lt;username&amp;gt;. Even users that do log in don't need /user/&amp;lt;username&amp;gt; and could use something like /data/&amp;lt;group&amp;gt;/... to read/write to HDFS. &lt;/P&gt;</description>
    <pubDate>Thu, 23 Jun 2016 23:30:39 GMT</pubDate>
    <dc:creator>ravi1</dc:creator>
    <dc:date>2016-06-23T23:30:39Z</dc:date>
    <item>
      <title>Part-1 : Authorization on production cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154169#M32782</link>
      <description>&lt;P&gt;Freshly installed HDP 2.4 using Ambari 2.2.2.0 over RHEL7 machines.&lt;/P&gt;&lt;P&gt;I have tried to depict the usage scenario in a hand-drawn diagram, please bear with it &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="5202-usage-scenarios.png" style="width: 572px;"&gt;&lt;img src="https://community.cloudera.com/t5/image/serverpage/image-id/21004iA03E4F7498B88B98/image-size/medium?v=v2&amp;amp;px=400" role="button" title="5202-usage-scenarios.png" alt="5202-usage-scenarios.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;U&gt;Description &lt;/U&gt;&lt;/STRONG&gt;:&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;&lt;STRONG&gt;The authentication&lt;/STRONG&gt;, i.e. the log-in to the Linux machines where the cluster components exist, &lt;STRONG&gt;is via some AD-like service&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;Several roles exist - a Data Scientist would load some data and write Pig scripts, an ETL Expert would import RDBMS schemas into Hive, an Ambari admin would start/stop the Ambari server, and so on&lt;/LI&gt;&lt;LI&gt;Several users pertaining to one or more roles can exist; &lt;STRONG&gt;all the users will have a Linux account in the AD&lt;/STRONG&gt; in case they wish to log in via the CLI, e.g. PuTTY. So a Data Scientist would log on to some node using PuTTY, load some data using 'hdfs dfs -copyFromLocal' and then execute some Pig scripts, &lt;STRONG&gt;but he should not be able to CRUD (or even see) the directories/data belonging to the ETL Expert, and two Hive users can't see each other's schemas, and so on&lt;/STRONG&gt;&lt;/LI&gt;&lt;LI&gt;Since everyone uses a browser, people can access the NN, RM and Job History UIs from their Windows/Mac/Linux workstations as valid domain users. &lt;STRONG&gt;It's crucial that only 'authorized' people can browse the file system and check the job status, logs and so on - i.e. NO one should be able to browse the file system without authentication and authorization&lt;/STRONG&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;STRONG&gt;Questions/Confusions :&lt;/STRONG&gt;&lt;/P&gt;&lt;OL&gt;
&lt;LI&gt;I read several documents - &lt;A target="_blank" href="https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/SecureMode.html#Hadoop_in_Secure_Mode" rel="nofollow noopener noreferrer"&gt;Hadoop in secure mode&lt;/A&gt;, &lt;A target="_blank" href="https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#HDFS_Permissions_Guide" rel="nofollow noopener noreferrer"&gt;HDFS Permissions Guide&lt;/A&gt;, &lt;A target="_blank" href="https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.0/bk_Security_Guide/content/hdp_security_features.html" rel="nofollow noopener noreferrer"&gt;HDP's Ranger approach&lt;/A&gt; - but given a &lt;STRONG&gt;fresh cluster with default settings&lt;/STRONG&gt;, I'm unsure whether all of these are required or whether Ranger alone suffices, and HOW to begin&lt;/LI&gt;&lt;LI&gt;Ideally, like the Linux /home/&amp;lt;username&amp;gt; dir., each user should have his/her own HDFS user space and be restricted to it - unable even to read anything outside it&lt;/LI&gt;&lt;LI&gt;Given the existing AD-like system, I am unsure whether Hadoop Kerberos authentication is required; I think Access Control Lists on HDFS would be, but I don't know how to start here&lt;/LI&gt;&lt;LI&gt;The users and roles will keep expanding, so it should be easy and quick to add/modify/remove the users and roles that use the Hadoop ecosystem&lt;/LI&gt;&lt;LI&gt;Probably a naive question - &lt;STRONG&gt;if Ambari/Ambari + Ranger/Ambari + Ranger + Knox is used, is it necessary to do anything at the Linux level ? Is it necessary to go to the hdfs user on the CLI and play with ACLs and so on ?&lt;/STRONG&gt;&lt;/LI&gt;&lt;/OL&gt;</description>
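For question 3 above, HDFS ACLs are managed with the standard `hdfs dfs -setfacl` / `-getfacl` commands (they must first be enabled via `dfs.namenode.acls.enabled=true` in hdfs-site.xml). A dry-run sketch, with a hypothetical user "bob" and path - each line only echoes the command it would run; drop the leading `echo` (and run as the hdfs superuser) on a real cluster:

```shell
# Dry-run sketch: granting one user read access to another team's HDFS
# directory via an ACL entry. "bob" and /data/etl are hypothetical examples.
target=/data/etl
echo hdfs dfs -setfacl -m user:bob:r-x "$target"   # add an entry alongside the base permissions
echo hdfs dfs -getfacl "$target"                   # inspect the resulting ACL
echo hdfs dfs -setfacl -x user:bob "$target"       # revoke the entry again
```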
      <pubDate>Sun, 18 Aug 2019 12:34:02 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154169#M32782</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2019-08-18T12:34:02Z</dc:date>
    </item>
    <item>
      <title>Re: Part-1 : Authorization on production cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154170#M32783</link>
      <description>&lt;P&gt;1. Ranger takes care of authorization. You will need something for authentication, which is where Kerberos and AD come in. &lt;/P&gt;&lt;P&gt;2. You can set up a /user/&amp;lt;username&amp;gt; in HDFS, which is a user home directory. You might still need common HDFS directories where collaboration happens. &lt;/P&gt;&lt;P&gt;3. If you have AD, it will have Kerberos. If you have write access to an OU in AD, you can create all service-level principals there, so no separate Kerberos KDC will be required. But if you don't want to create service-level principals in AD, you can have a local Kerberos KDC with a one-way trust to AD. &lt;/P&gt;&lt;P&gt;4. If you enable group-based authorization, adding users could be as easy as adding the user to the right group and creating a home directory for the user. &lt;/P&gt;&lt;P&gt;5. Ranger can take care of most authorization, so you can avoid working with ACLs.  &lt;/P&gt;</description>
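The home-directory step in points 2 and 4 comes down to three standard HDFS commands. A sketch as a small helper, with hypothetical user/group names ("alice", "datasci"); by default it only prints the commands (RUNNER=echo) - set RUNNER="" and run as the hdfs superuser on a real cluster:

```shell
# Sketch (not from the thread): provision an HDFS home directory for a new
# user. Defaults to a dry run that echoes the commands instead of running them.
RUNNER=${RUNNER:-echo}

provision_hdfs_home() {
    user=$1
    group=$2
    $RUNNER hdfs dfs -mkdir -p "/user/$user"
    $RUNNER hdfs dfs -chown "$user:$group" "/user/$user"
    $RUNNER hdfs dfs -chmod 750 "/user/$user"   # owner full, group read-only, others nothing
}

provision_hdfs_home alice datasci
```

With group-based Ranger policies in place, this per-user step plus an AD group membership is the whole onboarding procedure.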
      <pubDate>Thu, 23 Jun 2016 21:34:07 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154170#M32783</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-06-23T21:34:07Z</dc:date>
    </item>
    <item>
      <title>Re: Part-1 : Authorization on production cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154171#M32784</link>
      <description>&lt;P&gt;@&lt;A href="https://community.hortonworks.com/users/216/ravi.html"&gt;Ravi Mutyala&lt;/A&gt;&lt;/P&gt;&lt;P&gt;Can you elaborate and help me understand :&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;B&gt;You can set a /user/&amp;lt;username&amp;gt; in hdfs which is a user home directory. You might still need common hdfs directories where collaboration happens&lt;/B&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Does this mean that every time a new user is added, someone has to log in as 'hdfs' on the CLI, create an HDFS dir. /user/&amp;lt;username&amp;gt; and then change the ownership of that dir. ?&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;If you have write access to an OU in AD, you can create all service level principals there&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;An OU can be created, but what is a 'service level principal' - is it creating groups (or users?) like hadoop, hdfs, hive, yarn, sqoop etc. in that OU manually ? The biggest concern I have here is that &lt;B&gt;during cluster installation, under Misc, &lt;/B&gt;&lt;STRONG&gt;'Skip group modifications during install' was left unchecked, so the users and groups were created locally&lt;/STRONG&gt;. Now, is it required to change this (and how to do that in Ambari), and if yes, will the cluster function properly? Can you provide a documentation link ?&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;If you enable group based authorizations, adding users could be as easy adding user to the right group and creating a home directory for the user&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;I'm unsure if I understood. I believe the addition of users to a group has to be done at both the Linux and the HDFS level, and this will still involve creating the /user/&amp;lt;username&amp;gt; dir. on HDFS manually. Can you provide some detailed inputs here ?&lt;/P&gt;</description>
      <pubDate>Thu, 23 Jun 2016 22:32:42 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154171#M32784</guid>
      <dc:creator>kaliyugantagoni</dc:creator>
      <dc:date>2016-06-23T22:32:42Z</dc:date>
    </item>
    <item>
      <title>Re: Part-1 : Authorization on production cluster</title>
      <link>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154172#M32785</link>
      <description>&lt;P&gt;1. If you need home directories for each of the users, then you need to create them. Ownership can be changed from the CLI, or you can set it using Ranger (though I think changing it from the CLI is better than creating a new policy in Ranger for these things).&lt;/P&gt;&lt;P&gt;2. I am talking about principals here, not service users (like hdfs, hive, yarn) coming from AD (via SSSD or some other such tool). So, with your setup, local users are created on each node, but they still need to authenticate with your KDC. Ambari can create the principals for you in the OU once you give it the credentials. &lt;/P&gt;&lt;P&gt;3. It's not mandatory to have /user/&amp;lt;username&amp;gt; for each user. We have cases where BI users who use ODBC/JDBC don't even have login access to the nodes and don't need /user/&amp;lt;username&amp;gt;. Even users that do log in don't need /user/&amp;lt;username&amp;gt; and could use something like /data/&amp;lt;group&amp;gt;/... to read/write to HDFS. &lt;/P&gt;</description>
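The shared /data/&lt;group&gt;/... layout from point 3 can be sketched as follows: one group-owned directory instead of per-user homes. The group name is a hypothetical example, and the commands are only echoed (dry run); drop the leading `echo` and run as the hdfs superuser on a real cluster:

```shell
# Dry-run sketch: a shared HDFS data area for a group instead of per-user
# /user/<username> homes. "datasci" stands in for a real AD group.
group=datasci
dir=/data/$group
echo hdfs dfs -mkdir -p "$dir"
echo hdfs dfs -chown "hdfs:$group" "$dir"
echo hdfs dfs -chmod 770 "$dir"                              # group members read/write, others nothing
# Optional: a default ACL so files created later inherit group access
# (requires dfs.namenode.acls.enabled=true).
echo hdfs dfs -setfacl -m "default:group:$group:rwx" "$dir"
```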
      <pubDate>Thu, 23 Jun 2016 23:30:39 GMT</pubDate>
      <guid>https://community.cloudera.com/t5/Archives-of-Support-Questions/Part-1-Authorization-on-production-cluster/m-p/154172#M32785</guid>
      <dc:creator>ravi1</dc:creator>
      <dc:date>2016-06-23T23:30:39Z</dc:date>
    </item>
  </channel>
</rss>

