Member since: 10-02-2017
Posts: 112
Kudos Received: 71
Solutions: 11
07-31-2018
07:40 PM
1 Kudo
There is a general perception that one needs to know the code base thoroughly to understand and debug an HDP service. Do appreciate the fact that HDP services, including the execution engines (MR, TEZ, SPARK), are all JVM-based processes. The JVM, along with the operating system, provides various knobs to see what the process is doing and how it is performing at run-time. The steps mentioned below can be applied to understand and debug any JVM process in general. Let's take the example of how HiveServer2 works, assuming one is not well acquainted with or does not have a deep understanding of the code base. We assume one knows what the service does, but not how it works internally.

1. Process resource usage: gives a complete overview of the CPU and memory usage pattern of the process, providing a quick insight into its health.

2. How to figure out which services the process interacts with: add JMX parameters to the service, allowing you to visualize what is happening within the JVM at run-time using jconsole or jvisualvm. This ensures the JVM broadcasts its metrics on port 7001 and can be connected to using jconsole or jvisualvm. There is no security enabled; one can add SSL certificates for authentication.

What can we infer about the kind of threads we see from jconsole?
- The number of threads has peaked at 461; currently only 170 are active.
- The abandoned connection cleaner is cleaning up lost connections.
- kafka-kerberos-refresh-thread: as HS2 supports Kerberos, the TGT needs to be refreshed after the renewal period; this means we do not have to refresh the Kerberos ticket manually.
- HS2 is interacting with Kafka and Atlas, as can be seen from the threads above.
- CuratorFramework is the class used for talking to ZooKeeper, which means HS2 is interacting with ZooKeeper.
- HS2 is interacting with SOLR and Hadoop services (RMI).
- HS2 is sending audits to Ranger, which means HS2 is interacting with Ranger.
- HS2 has HiveServer2-Handler threads which read from the thrift socket (these are the threads that respond to connections).

This view provides an overall picture of what is happening inside the JVM.

GC: ParNew GC has been running on the young generation, and 10 minutes have been spent on minor GC. ConcurrentMarkSweep is used for the tenured generation, and 1 minute has been spent there. If you look at the VM summary (total up-time of 3 days 3 hours) you can find when the VM was started; since then 10 minutes have been spent on minor GC and 1 minute on major GC, which gives a good overview of how the process is tuned for GC and whether the heap space is sufficient.

Hence the above information shows HS2 interacting with: 1. Ranger 2. Solr 3. Kafka 4. Hadoop components (NameNode, DataNode) 5. Kerberos 6. As it is running Tez, it should also be interacting with the Resource Manager and Node Manager 7. Heap and GC performance.

Let's take a deeper dive into how exactly the JVM is performing over time. Use jvisualvm to get real-time information on threads and sampling of CPU and RAM. All the above-mentioned information can be derived from the command line too.

1. Find the pid of the process: ps -eaf | grep hiveserver2 (the process name) will fetch you the pid of the process.
2. Find memory usage and GC activity in real time (jstat):
Total heap = young generation (Eden + survivor spaces: S0C + S1C + EC) + tenured generation (OC)
Currently used heap = young generation (S0U + S1U + EU) + tenured generation (OU)
YGC = 15769 tells how many times young GC has run so far; FGC = 83 tells how many times full GC has run. If you see these counts increasing too frequently, it is time to optimize.
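A minimal sketch of the knobs referred to above, assuming HiveServer2 and port 7001 for JMX; the JMX flags are standard JVM options, but exactly which env file/variable they go into (hive-env via HADOOP_OPTS is shown here) depends on your setup, so treat the placement as an assumption. jstat then prints the heap and GC counters (S0C/S1C/EC/OC, YGC, FGC) discussed above.

# JMX flags appended to the service JVM options so jconsole/jvisualvm can attach remotely
# on port 7001 (no SSL or authentication in this sketch):
export HADOOP_OPTS="$HADOOP_OPTS -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=7001 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"

# Find the pid of HiveServer2
HS2_PID=$(ps -eaf | grep -i hiveserver2 | grep -v grep | awk '{print $2}')

# Heap sizes, usage and GC counters (S0C/S1C/EC/OC, YGC, FGC), sampled every 5 seconds
jstat -gc "$HS2_PID" 5000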
3. Use jmap to find the instances on the heap along with their class, count and memory footprint.
4. To know the state of the running threads, use jstack.
5. top -p pid: the RES column is the memory the process consumes in RAM, which should roughly equal the heap consumed.
6. Find the ports the process is listening on and the clients it is connected to.
7. To find any errors being reported by the service to a client, scan the network packets: tcpdump -D lists the interfaces the machine has; tcpdump -i <network interface> port 10000 -A (e.g. tcpdump -i eth0 port 10000 -A) scans port 10000, on which HS2 is listening. Look at the packets exchanged to find which client is hitting exceptions; any GSS exception or other exception reported by the server to a client can be seen in the packets.
8. To know the CPU and memory consumed by the process: top -p pid.
9. To know the disk I/O reads and writes done by the process: iotop -p pid.

Some OS commands to know how the host running the process is doing:
1. iostat -c -x 3 : displays CPU and disk I/O
2. mpstat -P ALL : utilization of individual CPUs
3. iostat -d -x 5 : disk utilization
4. ifstat -t -i <interface> : network utilization

Take-aways:
1. Any exception you see in the process logs for clients can be tracked in the network packets, hence you don't need to enable debug logs (use tcpdump).
2. A process consumes exceptionally high memory under heavy GC, especially frequent full and minor GC.
3. In Hadoop every service client has a retry mechanism (no client fails after one retry), hence search for retries in the log and try to optimize for them.
4. jconsole and jvisualvm reveal all the important information on threads, memory, CPU utilization and GC.
5. Keep a check on the CPU, network, disk and memory utilization of the process to get a holistic overview of it.
6. In case of failure, take a heap dump and analyse it with jhat for deeper debugging. jhat will generally consume about 6x the memory of the size of the dump (a 20 GB heap dump will need around 120 GB of heap for jhat to run: jhat -J-mx120g hive_dump.hprof).
7. Always refer to the code to correlate the process behavior with its memory footprint.
8. The jmap output will help you understand which data structures consume most of the heap, and you can point to where in the code those data structures are used. jhat will help you get the reference tree of the data structures.
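A minimal command cheat-sheet for the tools above; the pid lookup, eth0 and port 10000 are placeholders for your own process, interface and thrift port, and netstat flags can vary slightly by distribution.

HS2_PID=$(ps -eaf | grep -i hiveserver2 | grep -v grep | awk '{print $2}')

jmap -histo "$HS2_PID" | head -30          # top heap consumers: class, instance count, bytes
jstack "$HS2_PID" > /tmp/hs2_threads.txt   # thread dump: state of every thread
top -p "$HS2_PID"                          # RES column ~ RAM consumed by the process
netstat -anp | grep "$HS2_PID"             # ports listened on and connected clients
sudo tcpdump -D                            # list network interfaces
sudo tcpdump -i eth0 port 10000 -A         # watch client <-> HS2 packets (exceptions show up here)
sudo iotop -p "$HS2_PID"                   # disk I/O done by the process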
07-30-2018
05:15 PM
6 Kudos
Heap sizing and Garbage Collection tuning play a central role in deciding how healthily and efficiently the cluster resources are utilized. One of the biggest challenges is to fine-tune the heap to make sure you are neither under-utilizing nor over-utilizing the resources. The following heap sizing has been arrived at after an in-depth analysis of the health of the individual services. Do remember this is a baseline; you can add more heap to your services depending on the kind of workload you execute.

Key take-aways:
1. All the services present in HDP are JVM based and all need appropriate heap sizing and GC tuning.
2. HDFS and YARN are the base of all the services, hence make sure NN, DN, RM and NM are given sufficient heap.
3. Rule of thumb: heap sizes up to 20-30 GB don't require any special tuning; anything above that needs fine tuning.
4. 99% of the time your cluster is not working efficiently because the services are suffering GC and appear RED in the UI.
5. Most of the services have short-lived connections and interactions, hence always provide enough space to the young generation, in the order of 5-10 GB (depending on load and concurrency).
6. HiveServer2, Spark Thrift Server and the LLAP server need special attention, as these services interact with each and every component in the cluster; slowness in any component will impact the connection-establishment time of these services.
7. Look for "retries/retry" in your logs to know which services are slow, and fine-tune them.

The services covered include: HDFS, YARN, MapReduce2, HBase, Oozie, Zookeeper, Ambari Infra, Ambari Metrics - HBase, Atlas, Spark Thrift Server and Hive (https://community.hortonworks.com/articles/209789/making-hiveserver2-instance-handle-more-than-500-c.html). Ambari and Ambari Views should run with a heap of at least 15 GB to be fast.
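A hedged sketch of the kind of JVM options implied above, using the ParNew + ConcurrentMarkSweep collectors recommended elsewhere in this series for HiveServer2. The 16g/6g numbers and the HADOOP_OPTS placement are illustrative assumptions, not a recommendation for your cluster; the exact env file (hive-env, hadoop-env, etc.) depends on the service being tuned.

# Example HiveServer2 sizing: 16 GB total heap, 6 GB young generation,
# ParNew for the young gen and CMS for the tenured gen, with GC logging enabled.
export HADOOP_OPTS="$HADOOP_OPTS -Xms16g -Xmx16g -Xmn6g \
  -XX:+UseParNewGC -XX:+UseConcMarkSweepGC \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/var/log/hive/hs2-gc.log"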
07-27-2018
07:12 PM
The article provides an optimized MySQL config (usage of an enterprise RDBMS is encouraged) that can be used in case MySQL is the backend DB for Ambari, Hive and Ranger. The configuration is good enough to handle 1000 concurrent users on HS2 and Ambari. The config file my.cnf is generally located at /etc/my.cnf.

my.cnf
******************************************************************************************************************************
# For advice on how to change settings please see
# http://dev.mysql.com/doc/refman/5.6/en/server-configuration-defaults.html

[mysqld]
#
# Remove leading # and set to the amount of RAM for the most important data
# cache in MySQL. Start at 70% of total RAM for dedicated server, else 10%.
# innodb_buffer_pool_size = 128M
#
# Remove leading # to turn on a very important data integrity option: logging
# changes to the binary log between backups.
# log_bin
#
# Remove leading # to set options mainly useful for reporting servers.
# The server defaults are faster for transactions and fast SELECTs.
# Adjust sizes as needed, experiment to find the optimal values.
# join_buffer_size = 128M
# sort_buffer_size = 2M
# read_rnd_buffer_size = 2M
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
max_allowed_packet = 32M
max_connections = 2000
open_files_limit = 10000
tmp_table_size = 64M
max_heap_table_size = 64M
query_cache_type = ON
query_cache_size = 128M
table_open_cache_instances = 16
back_log = 1000
default_storage_engine = InnoDB
innodb-buffer-pool-size = 10G
innodb_log_file_size = 1024M
innodb_log_buffer_size = 16M
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_buffer_pool_instances = 4
innodb_thread_concurrency = 10
innodb_checksum_algorithm=crc32
# MYISAM in case there is one
key-buffer-size = 64M
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Recommended in standard MySQL setup
sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

[mysqld_safe]
log-error=/var/log/mysqld.log
pid-file=/var/run/mysqld/mysqld.pid
*****************************************************************************************************

Key take-aways:
1. innodb-buffer-pool-size = 10G: this is one of the most important parameters; it says how much data can be cached in RAM.
2. mysqldump -u root --all-databases -p > dump.sql will dump all the databases to a file (which can be used for recovery) and, in the process, will fetch all the data from disk and cache it in RAM, making all queries lightning fast.
3. max_connections = 2000 concurrent connections, as the database is connected to by all HS2 instances. This can become a limiting factor and may result in connection timeouts for clients connecting to HS2.
4. Do keep in mind that the number of connections MySQL can handle directly impacts the number of users the HS2 server can handle concurrently.
5. For details of all MySQL config parameters kindly follow this very good reference: http://www.speedemy.com/files/mysql/my.cnf
6. https://community.hortonworks.com/articles/80635/optimize-ambari-performance-for-large-clusters.html
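A hedged sketch of applying and sanity-checking the config above; the service name mysqld and the paths are the usual RHEL/CentOS defaults and may differ on your host.

# Back up the current config and databases before editing /etc/my.cnf
sudo cp /etc/my.cnf /etc/my.cnf.bak
mysqldump -u root --all-databases -p > dump.sql

# Restart MySQL to pick up the new settings
sudo systemctl restart mysqld        # or: sudo service mysqld restart

# Verify that the important knobs took effect
mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_buffer_pool_size';"
mysql -u root -p -e "SHOW VARIABLES LIKE 'max_connections';"
mysql -u root -p -e "SHOW GLOBAL STATUS LIKE 'Threads_connected';"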
07-15-2018
04:58 PM
7 Kudos
HiveServer2 is a Thrift server, a thin service layer to interact with the HDP cluster in a seamless fashion. It supports both JDBC and ODBC drivers to provide a SQL layer to query the data. An incoming SQL query is converted into either a TEZ or MR job, and the results are fetched and sent back to the client. No heavy-lifting work is done inside HS2. It just acts as the place to host the TEZ/MR driver, scan metadata info and apply Ranger policies for authorization.

HiveServer2 maintains two types of pools:
1. Connection pool: any incoming connection is handled by a HiveServer2-Handler thread and kept in this pool, which is unlimited in nature but restricted by the number of threads available to service an incoming request. An incoming request will not be accepted if there is no free HiveServer2-Handler thread to service it. The total number of threads that can be spawned within HS2 is controlled by the parameter hive.server2.thrift.max.worker.threads.
2. Session pool (per queue): this is the number of concurrent sessions that can be active. Whenever a connection in the connection pool executes a SQL query, an empty slot in the session queue is found and the SQL statement is executed. A queue is maintained for each queue defined in "Default query queue". In this example only the default queue is defined, with sessions per queue = 3. More than 3 concurrent queries (not connections) will have to wait until one of the slots in the session pool frees up. There can be N connections alive in HS2, but at any point in time it can only execute 3 queries; the rest of the queries will have to wait.

HiveServer2 connection overview:
1. HS2 listens on port 10000 and accepts connections from clients on this port.
2. HS2 has connections open to Kafka to write metric and lineage information, which is consumed by the Ambari Metrics service and Atlas.
3. HS2 also connects to DataNodes directly from time to time to service requests like "SELECT * FROM table LIMIT n".
4. HS2 also connects to the NodeManagers where the TEZ AMs are spawned to service the SQL query.
5. In the process it talks to the NameNode and the Resource Manager, so a switch during HA will impact these connections.
6. Overall, HiveServer2 talks to ATLAS, SOLR, KAFKA, NameNode, DataNode, ResourceManager, NodeManager and MySQL.

For HiveServer2 to be more interactive and faster, the following parameters are set:
1. Start Tez Session at Initialization = true: at the time of bootstrapping (start, restart), HS2 allocates a TEZ AM for every session. (Default Query Queue = default, Sessions per queue = 3, hence 3 TEZ AMs will be spawned. If there were 2 queues, default and datascience, with Sessions per queue = 3, then 6 TEZ AMs would be spawned.)
2. Every TEZ AM can also be forced to spawn a fixed number of containers (pre-warm) at the time of HS2 bootstrapping. Hold containers to reduce latency = true and Number of containers held = 3 ensure every TEZ AM will hold 2 containers to execute a SQL query.
3. With Default Query Queue = default, Sessions per queue = 3, Hold containers to reduce latency = true and Number of containers held = 3, HS2 will spawn 3 TEZ AMs, one for every session, and for every TEZ AM 2 containers will be spawned; hence overall 9 containers (3 TEZ AMs + 6 containers) can be seen running in the Resource Manager UI.
4. For TEZ to be more efficient and to save bootstrapping time, the following parameters need to be configured in the TEZ configs. They prevent the AM from dying immediately after job execution: it waits between the min and max timeouts to service another query, otherwise it kills itself, freeing YARN resources.
tez.am.container.reuse.enabled=true
tez.am.container.idle.release-timeout-min.millis=10000
tez.am.container.idle.release-timeout-max.millis=20000

Every connection has a memory footprint in HS2 because:
1. Every connection scans the metastore to learn about tables.
2. Every connection reserves memory to hold metrics and the TEZ driver.
3. Every connection reserves memory to hold the output of the TEZ execution.

Hence if a lot of connections are alive, HS2 might become slow due to lack of heap space, so the heap needs to be tuned according to the number of connections. 40 connections will need somewhere around 16 GB of HS2 heap. Anything more than this needs fine-tuning of GC and horizontal scaling.

Idle client connections can be disconnected based on the following parameters:
hive.server2.session.check.interval=6h
hive.server2.idle.operation.timeout=5d
hive.server2.idle.session.check.operation=true
hive.server2.idle.session.timeout=7d

HiveServer2 has an embedded metastore which interacts with the RDBMS to store the table schema info. The database is available to any other service through the HiveMetaStore service; the standalone HiveMetastore is not used by HiveServer2 directly.

Problems and solutions:
1. Clients frequently time out: HS2 can never deny a connection, hence the timeout is a parameter set on the client and not on HS2. Most tools have a 30-second timeout; increase it to 90-150 seconds depending on your cluster usage pattern.
2. HS2 becomes unresponsive: this can happen if HS2 is undergoing frequent full GC or is running out of heap. It is time to bump up the heap and scale horizontally.
3. HS2 has a large number of connections: HS2 accepts connections on port 10000 and also connects to the RDBMS, DataNodes (for direct queries) and NodeManagers (to connect to AMs), hence always try to analyse whether the connections are incoming or outgoing.
4. As HS2 is an interactive service, it is always good to give it a good-sized young generation.
5. In case SAS is interacting with HS2, a 20 GB heap where 6-8 GB is given to the young gen is a good practice. Use ParNewGC for the young gen and ConcurrentMarkSweepGC for the tenured gen.

To know in detail about the connection establishment phase of HiveServer2: https://community.hortonworks.com/articles/202479/hiveserver2-connection-establishment-phase-in-deta.html
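A minimal client-side sketch of the two usual ways a client reaches HS2, direct to port 10000 or through ZooKeeper service discovery; the hostnames, zooKeeperNamespace and credentials below are placeholders.

# Direct connection to a single HiveServer2 instance on its thrift port
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" -n myuser -p mypassword -e "select 1"

# Connection via ZooKeeper service discovery (picks a live HS2 instance)
beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" \
        -n myuser -p mypassword -e "select 1"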
07-15-2018
01:20 PM
3 Kudos
Key take-aways:
1. For a HiveServer2 client, the connection time seen is the total time to interact with AD (TGT + ZooKeeper service ticket + HiveServer2 service ticket) + ZooKeeper + HiveServer2 (MySQL + YARN allocation).
2. In case your AD is slow, the Hive connection will take a long time.
3. time beeline -u "<zookeeper connection string>" -e "select 1" can be used to find how much time beeline is taking.
4. In general it takes 4 to 10 seconds for a connection to be established.
5. Neither AD, ZooKeeper nor HiveServer2 can ever deny a connection; the connection time can be longer, but ideally it can never be denied.
6. Clients can only have a timeout (a configurable parameter in most clients like HUE, SAS, Alation and health-check scripts), as neither ZooKeeper nor HS2 can ever deny a connection.
7. HiveServer2 will try to allocate resources in YARN before acknowledging that it has accepted the connection. In case your queue is full, the connection time will be impacted.
8. Kindly set hive.server2.tez.initialize.default.sessions=true on HS2 in case you want a connection to be accepted even without allocating YARN resources (as YARN resources are already allocated).
9. If you mention a queue name in your JDBC string, the connection will be accepted only after allocating resources in YARN.

Reasons why the connection time can be high:
1. AD is slow.
2. ZooKeeper is having too-many-connections issues or ZooKeeper is slow.
3. HiveServer2's interaction with MySQL is slow.
4. Heavy GC is happening within HiveServer2 or ZooKeeper.
5. HS2 can deny a connection if it has exhausted all its handler threads.
6. ZooKeeper can deny a connection if it has reached its max rate limit from a host: https://community.hortonworks.com/articles/51191/understanding-apache-zookeeper-connection-rate-lim.html
7. MySQL slowness can directly impact HiveServer2.
8. MySQL is reaching its max_connections limit.
9. The network is slow.
10. HiveServer2 does a lot of retries for every service it talks to (Atlas, Solr, Kafka, MySQL, DataNode, NameNode, RM); keep an eye on any retries that are happening.

The various ways to find the time taken by the individual steps are (see the sketch below):
1. Run beeline in debug mode: https://community.hortonworks.com/content/supportkb/150574/how-to-enable-debug-logging-for-beeline.html
2. strace -t beeline -u "<ZooKeeper JDBC string>" -e "select 1"
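A hedged sketch of the timing commands above; the ZooKeeper quorum, namespace and credentials are placeholders, and the strace output can be large, so it is redirected to a file here.

ZK_URL="jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"

# Wall-clock time for TGT + ZooKeeper lookup + HS2 handshake + a trivial query
time beeline -u "$ZK_URL" -n myuser -p mypassword -e "select 1"

# Timestamped system-call trace to see where the client spends its time
strace -t -f -o /tmp/beeline_trace.txt beeline -u "$ZK_URL" -e "select 1"
grep -E "connect\(" /tmp/beeline_trace.txt | head    # network connect calls with timestamps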
02-23-2018
09:30 PM
3 Kudos
How HDFS Applies Ranger Policies

Apache Ranger: Apache Ranger™ is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data-lake architecture. Enterprises can potentially run multiple workloads in a multi-tenant environment. Data security within Hadoop needs to evolve to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access.

Ranger goals overview. Apache Ranger has the following goals:
- Centralized security administration to manage all security-related tasks in a central UI or using REST APIs.
- Fine-grained authorization to perform a specific action and/or operation with a Hadoop component/tool, managed through a central administration tool.
- Standardized authorization method across all Hadoop components.
- Enhanced support for different authorization methods: role-based access control, attribute-based access control, etc.
- Centralized auditing of user access and administrative (security-related) actions within all the components of Hadoop.

Ranger maintains various types of rule mappings; the general layout looks like:
1. User -> groups -> policy -> actual resource (HDFS paths, Hive tables): access/deny/allowed/read/write
2. User -> policy -> actual resource (HDFS paths, Hive tables): access/deny/allowed/read/write

Key take-aways about Ranger:
1. Ranger is not an identity-management system; it is a service which holds the policy mappings.
2. Ranger is least worried about the actual relation between user names and group names.
3. You can create a dummy group and attach it to a user; Ranger is not bothered whether this relationship exists in LDAP or not.
4. Ranger users and groups are synced from the same LDAP which powers the rest of the Hadoop cluster.
5. It is the common LDAP shared between Ranger and the Hadoop cluster which enables them to see the same users.
6. Nowhere does Ranger claim that it knows all the users present on the cluster; it is the job of the Ranger user sync to sync users and groups to Ranger.
Namenode: the NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system and tracks where across the cluster the file data is kept. It does not store the data of these files itself. Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.

Key take-aways:
1. The Namenode is the place where the metadata of files in HDFS is maintained.
2. While reading or writing a file, HDFS clients interact with the Namenode to get the locations of the file's blocks on the various DataNodes, and eventually interact with those DataNodes.
3. All file-permission checks for HDFS happen at the Namenode.
4. The Namenode maintains POSIX-style permissions (user : group : other) but also supports fine-grained access by applying Hadoop ACLs. Please follow the following link for an interesting perspective of HDFS compared to Linux ext3.
5. dfs.namenode.acls.enabled = true enables ACLs on the Namenode.
6. To know more about Hadoop ACLs follow the link.
7. Hadoop POSIX permissions are not sufficient to express all possible permissions applicable to a given file or directory.
8. For setting and unsetting ACLs use hdfs dfs -setfacl and hdfs dfs -getfacl, as sketched below.
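A minimal sketch of the ACL commands named in take-away 8; /data/project, the user and the group are hypothetical placeholders.

# Grant read/execute on a directory to an extra group beyond the POSIX owner/group/other bits
hdfs dfs -setfacl -m group:analysts:r-x /data/project

# Grant a specific user full access
hdfs dfs -setfacl -m user:alice:rwx /data/project

# Inspect the ACLs (ACL-bearing paths also show a '+' in hdfs dfs -ls output)
hdfs dfs -getfacl /data/project

# Remove all ACL entries, falling back to plain POSIX permissions
hdfs dfs -setfacl -b /data/project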
How Namenode and Ranger interact:
HDFS permission checks happen when an HDFS client interacts with the Namenode. The Namenode has both its own ACLs and Ranger policies to apply. The application of permissions starts with the Ranger policies and then falls back to the Hadoop ACLs.
How it all works (doAs=true, impersonation enabled):
1. Ranger policies are fetched by the Namenode and maintained in a local cache. Do realize the HDFS Ranger plugin is not a separate process, but a library which is executed inside the Namenode.
2. The user authenticates to the Namenode using one of the specified authentication mechanisms: simple or Kerberos.
3. The Namenode gets the username during the authentication phase. Do remember that even with Kerberos authentication, the groups available in the ticket are never used.
4. Based on how core-site.xml is configured, the Namenode either looks up LDAP to fetch the groups of the authenticated user, or it does a lookup against the underlying OS (NSS -> SSSD -> LDAP) to fetch the groups.
5. Once the groups are fetched, the Namenode has the user-to-groups mapping of the authenticated user.
6. The HDFS Ranger plugin has the mapping of user -> groups -> policy; the groups which were fetched by the Namenode are now used to select the Ranger policies and enforce them.
7. Do realize Ranger might provide a relation of one user mapped to 3 groups, with those 3 groups mapped to 3 policies. Not all the policies mapped to the three groups will be applied by default.
8. The Namenode fetches the groups at its own end (LDAP or through the OS), and only the groups overlapping with the Ranger group rules will be used while enforcing the policies (see the group-lookup sketch below).
9. The Namenode (the HDFS Ranger plugin library) writes audit logs locally, which are eventually pushed to the Ranger audit store (Solr).
10. If for some reason the groups of the authenticated user are not fetched by the Namenode, all the Ranger policies mapped to those groups will not be applied.
11. Sometimes mapping users to policies directly helps mitigate issues in case LDAP is not working correctly.
12. Do realize all the mappings here are in terms of group names and not gids, as there can be a scenario where the gid is available on the OS but the group name is not.
13. If there are no Ranger policies for the user, then the Hadoop ACLs are applied and the appropriate permission is enforced.
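A small sketch for verifying which groups the Namenode would resolve for a user (step 4 above); 'alice' and 'analysts' are placeholder names.

# Groups as resolved by Hadoop's configured group-mapping provider
hdfs groups alice

# Groups as seen by the underlying OS (NSS -> SSSD -> LDAP path)
id -Gn alice
getent group | grep alice

# If these two views disagree, Ranger policies keyed on the missing groups will not be applied.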
Config hdfs-site.xml:
dfs.namenode.inode.attributes.provider.class = org.apache.ranger.authorization.hadoop.RangerHdfsAuthorizer
RangerHdfsAuthorizer => calls checkPermission => which internally gets the groups of the authenticated user using the UserGroupInformation class.
Code flow: HDFS authorization
https://github.com/apache/ranger/blob/master/hdfs-agent/src/main/java/org/apache/ranger/authorization/hadoop/RangerHdfsAuthorizer.java#L64

checkPermission: from the username, get the groups and check privileges.
https://github.com/apache/ranger/blob/master/hdfs-agent/src/main/java/org/apache/ranger/authorization/hadoop/RangerHdfsAuthorizer.java#L200
String user = ugi != null ? ugi.getShortUserName() : null;
Set<String> groups = ugi != null ? Sets.newHashSet(ugi.getGroupNames()) : null;

Get groups from the username:
Set<String> groups = ugi != null ? Sets.newHashSet(ugi.getGroupNames()) : null;
https://github.com/apache/ranger/blob/master/hdfs-agent/src/main/java/org/apache/ranger/authorization/hadoop/RangerHdfsAuthorizer.java#L208

UserGroupInformation: the core class to authenticate users and get their groups (Kerberos authentication, LDAP, PAM). Get groups from the user:
http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.action/0.2.0/org/apache/hadoop/security/UserGroupInformation.java#221

Groups: if nothing is mentioned in core-site.xml, then invoke a shell and get the groups for the user:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.0/org/apache/hadoop/security/Groups.java#Groups.getUserToGroupsMappingService%28org.apache.hadoop.conf.Configuration%29

ShellBasedUnixGroupsMapping: the default implementation:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.0/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java#ShellBasedUnixGroupsMapping
02-22-2018
03:56 PM
2 Kudos
How HiveServer2 Applies Ranger Policies

Apache Ranger: Apache Ranger™ is a framework to enable, monitor and manage comprehensive data security across the Hadoop platform. The vision with Ranger is to provide comprehensive security across the Apache Hadoop ecosystem. With the advent of Apache YARN, the Hadoop platform can now support a true data-lake architecture. Enterprises can potentially run multiple workloads in a multi-tenant environment. Data security within Hadoop needs to evolve to support multiple use cases for data access, while also providing a framework for central administration of security policies and monitoring of user access.

Ranger goals overview. Apache Ranger has the following goals:
- Centralized security administration to manage all security-related tasks in a central UI or using REST APIs.
- Fine-grained authorization to perform a specific action and/or operation with a Hadoop component/tool, managed through a central administration tool.
- Standardized authorization method across all Hadoop components.
- Enhanced support for different authorization methods: role-based access control, attribute-based access control, etc.
- Centralized auditing of user access and administrative (security-related) actions within all the components of Hadoop.

Ranger maintains various types of rule mappings; the general layout looks like:
1. User -> groups -> policy -> actual resource (HDFS paths, Hive tables): access/deny/allowed/read/write
2. User -> policy -> actual resource (HDFS paths, Hive tables): access/deny/allowed/read/write

Key take-aways about Ranger:
1. Ranger is not an identity-management system; it is a service which holds the policy mappings.
2. Ranger is least worried about the actual relation between user names and group names.
3. You can create a dummy group and attach it to a user; Ranger is not bothered whether this relationship exists in LDAP or not.
4. Ranger users and groups are synced from the same LDAP which powers the rest of the Hadoop cluster.
5. It is the common LDAP shared between Ranger and the Hadoop cluster which enables them to see the same users.
6. Nowhere does Ranger claim that it knows all the users present on the cluster; it is the job of the Ranger user sync to sync users and groups to Ranger.
How HiveServer2 and Ranger interact. How it all works (doAs=true, impersonation enabled):
1. Ranger policies are fetched by HiveServer2 and maintained in a local cache. Do realize the Hive Ranger plugin is not a separate process, but a library which is executed inside HiveServer2.
2. The user authenticates to HiveServer2 using one of the specified authentication mechanisms: LDAP, Kerberos, PAM, etc.
3. HiveServer2 gets the username during the authentication phase. Do remember that even with Kerberos authentication, the groups available in the ticket are never used.
4. Based on how core-site.xml is configured, HiveServer2 either looks up LDAP to fetch the groups of the authenticated user, or it does a lookup against the underlying OS (NSS -> SSSD -> LDAP) to fetch the groups (see the group-lookup sketch below).
5. Once the groups are fetched, HiveServer2 has the user-to-groups mapping of the authenticated user.
6. The HiveServer2 Ranger plugin has the mapping of user -> groups -> policy; the groups which were fetched by HiveServer2 are now used to select the Ranger policies and enforce them.
7. Do realize Ranger might provide a relation of one user mapped to 3 groups, with those 3 groups mapped to 3 policies. Not all the policies mapped to the three groups will be applied by default.
8. HiveServer2 fetches the groups at its own end (LDAP or through the OS), and only the groups overlapping with the Ranger group rules will be used while enforcing the policies.
9. HiveServer2 (the Ranger plugin library) writes audit logs locally, which are eventually pushed to the Ranger audit store (Solr).
10. If for some reason the groups of the authenticated user are not fetched by HiveServer2, all the Ranger policies mapped to those groups will not be applied.
11. Sometimes mapping users to policies directly helps mitigate issues in case LDAP is not working correctly.
12. Do realize all the mappings here are in terms of group names and not gids, as there can be a scenario where the gid is available on the OS but the group name is not.
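A small sketch for checking which groups HiveServer2 would resolve for an authenticated user via the OS path in step 4; 'alice' and 'analysts' are placeholders, and the commands are meant to be run on the HiveServer2 host itself.

# Groups via NSS (local files -> SSSD -> LDAP, depending on /etc/nsswitch.conf)
id -Gn alice
getent passwd alice
getent group analysts        # does the group Ranger's policy expects actually resolve here?

# Groups via Hadoop's own group-mapping configuration (core-site.xml)
hdfs groups alice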
Config hiveserver2-site.xml:
hive.security.authorization.manager = org.apache.ranger.authorization.hive.authorizer.RangerHiveAuthorizerFactory
RangerHiveAuthorizerFactory => creates RangerHiveAuthorizer => which internally calls the checkPrivileges() method, which subsequently gets the groups of the authenticated user using the UserGroupInformation class.
Code flow: Ranger authorization
https://github.com/apache/ranger/blob/master/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizerFactory.java
https://github.com/apache/ranger/blob/master/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java

checkPrivileges: from the username, get the groups and check privileges.
https://github.com/apache/ranger/blob/master/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java#L225
UserGroupInformation ugi = getCurrentUserGroupInfo();
https://github.com/apache/ranger/blob/master/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizer.java#L220

Get groups from the username:
Set<String> groups = Sets.newHashSet(ugi.getGroupNames());
UserGroupInformation mUgi = userName == null ? null : UserGroupInformation.createRemoteUser(userName);
https://github.com/apache/ranger/blob/master/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizerBase.java#L65
getCurrentUserGroupInfo()
https://github.com/apache/ranger/blob/master/hive-agent/src/main/java/org/apache/ranger/authorization/hive/authorizer/RangerHiveAuthorizerBase.java#L92

UserGroupInformation: the core class to authenticate users and get their groups (Kerberos authentication, LDAP, PAM). Get groups from the user:
http://grepcode.com/file/repo1.maven.org/maven2/com.ning/metrics.action/0.2.0/org/apache/hadoop/security/UserGroupInformation.java#221

Groups: if nothing is mentioned in core-site.xml, then invoke a shell and get the groups for the user:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.0/org/apache/hadoop/security/Groups.java#Groups.getUserToGroupsMappingService%28org.apache.hadoop.conf.Configuration%29

ShellBasedUnixGroupsMapping: the default implementation:
http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hadoop/hadoop-common/2.7.0/org/apache/hadoop/security/ShellBasedUnixGroupsMapping.java#ShellBasedUnixGroupsMapping
02-22-2018
12:55 PM
User and group significance in HDFS. Before we even start, let's take a look back at how users and groups are handled in Linux. The key take-aways from the previous article were the various relationships that exist:
1. Every group has a group id.
2. Every user has a user id.
3. In Linux it is not possible to have a user without a group id (by default, when a user is created, it gets a group with the same name).
4. A user can have one primary group and multiple secondary groups.
5. A group can have multiple users.
6. Authentication is done based on username and password.
7. Authorization is done based on groups, as Unix follows POSIX permissions for user : group : others.
8. A user cannot exist without a group.
9. A group can exist without a user.
10. A file can only carry usernames and groups which are part of the Linux OS (local or a remote service).
11. A file's ownership can never be changed to a non-existent user (create a file and try chown XXXXXX fileName).
12. Linux applies its authorization policy not only while reading a file but also while creating it.
13. In a Linux system there can be no resource owned by a random user the OS is not aware of.
14. The OS maintains (locally or via LDAP) a table of users and groups, and will never allow a user outside of this mapping to create, delete or own a file.
Let's try creating a file on HDFS (a command sketch follows the key take-aways below).
1. Change the current user to hdfs locally: sudo su hdfs.
2. hdfs is the superuser for the HDFS filesystem, like root is the superuser on the Linux file system.
3. Create a directory in HDFS: hadoop dfs -mkdir /tmp/testDir
4. Change the ownership of /tmp/testDir to a random user and group: hadoop dfs -chown XXXX:YYYYY /tmp/testDir
5. Doing an ls on HDFS /tmp with hadoop dfs -ls /tmp | grep testDir will display: drwxr-xr-x - XXXX YYYY 0 2018-02-20 11:00 /tmp/testDir

Key take-aways:
1. The hdfs user is the superuser in HDFS.
2. HDFS has no strict policy regarding users and groups like your Linux OS does.
3. You interact with HDFS through the hdfs client; the hdfs client takes the username of the user it was run as on the Linux OS.
4. HDFS always checks permissions while reading a file, but while creating or chown-ing it does not check who is creating the file.
5. Your Linux OS users are in a way related to the users on HDFS, as the hdfs client picks up the Linux user it was run as.
6. HDFS provides two kinds of security mappings, POSIX and ACLs, and it is for ACLs that it requires the user-to-group mapping to be made available to it.
7. In the HDFS file system users and groups are not as tightly coupled as in Linux.
8. User identity is never maintained within HDFS; the user identity mechanism is extrinsic to HDFS itself. There is no provision within HDFS for creating user identities, establishing groups, or processing user credentials.
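A hedged sketch of the experiment described above; /tmp/testDir is the directory from the steps, and XXXX:YYYYY stands for a user and group that exist nowhere (the deprecated "hadoop dfs" from the article still works, but "hdfs dfs" is the usual form).

sudo su - hdfs                                # become the HDFS superuser
hdfs dfs -mkdir /tmp/testDir                  # create a directory in HDFS
hdfs dfs -chown XXXX:YYYYY /tmp/testDir       # succeeds even though XXXX/YYYYY exist nowhere
hdfs dfs -ls /tmp | grep testDir              # shows XXXX YYYY as owner and group

# Contrast with Linux, where the same operation fails:
touch /tmp/localfile
sudo chown XXXXXX /tmp/localfile              # "invalid user" error from the OS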
HDFS: The Hadoop Distributed File System (HDFS) implements a permissions model for files and directories that shares much of the POSIX model. Each file and directory is associated with an owner and a group. The file or directory has separate permissions for the user that is the owner, for other users that are members of the group, and for all other users. For files, the r permission is required to read the file, and the w permission is required to write or append to the file. For directories, the r permission is required to list the contents of the directory, the w permission is required to create or delete files or directories, and the x permission is required to access a child of the directory.

In contrast to the POSIX model, there are no setuid or setgid bits for files, as there is no notion of executable files. For directories, there are no setuid or setgid bits either, as a simplification. The sticky bit can be set on directories, preventing anyone except the superuser, the directory owner or the file owner from deleting or moving the files within the directory. Setting the sticky bit on a file has no effect. Collectively, the permissions of a file or directory are its mode. In general, Unix customs for representing and displaying modes are used, including the use of octal numbers. When a file or directory is created, its owner is the user identity of the client process, and its group is the group of the parent directory (the BSD rule).

HDFS also provides optional support for POSIX ACLs (Access Control Lists) to augment file permissions with finer-grained rules for specific named users or named groups; ACLs are discussed in greater detail in the HDFS documentation. Each client process that accesses HDFS has a two-part identity composed of the user name and the groups list. Whenever HDFS must do a permissions check for a file or directory foo accessed by a client process:
- If the user name matches the owner of foo, then the owner permissions are tested;
- else if the group of foo matches any member of the groups list, then the group permissions are tested;
- otherwise the other permissions of foo are tested.
If a permissions check fails, the client operation fails. See Hadoop Groups Mapping for details.
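A small sketch of the owner/group/other check in practice; /data/project, the users and the group are placeholders.

# Directory readable by owner and group, invisible to others
hdfs dfs -mkdir /data/project
hdfs dfs -chown alice:analysts /data/project
hdfs dfs -chmod 750 /data/project
hdfs dfs -ls -d /data/project        # drwxr-x---  alice analysts ...

# alice (owner)       -> owner bits rwx apply: full access
# bob in 'analysts'   -> group bits r-x apply: can list, cannot create files
# carol (neither)     -> other bits --- apply: permission denied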
02-21-2018
11:32 PM
The two most important aspects of security are: 1. Authentication 2. Authorization.
Authentication: the process of ascertaining that somebody really is who he claims to be (who are you?).
Authorization: the process of verifying that you have access to something (are you allowed to access the resource?).

Let's take the example of a user logging into a Linux machine (ssh / terminal login). One needs to authenticate himself with a username and password, thus verifying he is the person he claims to be. The same user might not be authorized to access a file, as he doesn't have enough permissions to read/write it. The idea extends to other services too: one can log in (authenticate) to booking.com if he has a profile, but is not authorized to change the prices of the flights; only admins are allowed to do that. Hence authentication and authorization play a key role in determining the security aspects of a service.

Let's see how authentication and authorization are implemented in the Linux OS. The three most important files from a security perspective are:
1. /etc/passwd
2. /etc/group
3. /etc/shadow

An excerpt from /etc/passwd:
username : x : uid : gid : user info : home dir : shell to use
Things to know:
1. x denotes that the encrypted password is saved in the /etc/shadow file.
2. The gid present here is the primary group id of the user. A user can be part of multiple groups, but the one present in /etc/passwd is his primary group.

An excerpt from /etc/group:
group name : password : gid : group list
Things to know:
1. The password is generally not used, but we can have a password for a group too.
2. The group list refers to the list of user names. These users have this group as a secondary group.

Let's look at the various relationships that exist:
1. Every group has a group id.
2. Every user has a user id
3. In Linux it is not possible to have a user without a group id (by default, when a user is created, it gets a group with the same name).
4. A user can have one primary group and multiple secondary groups.
5. A group can have multiple users.
6. Authentication is done based on username and password.
7. Authorization is done based on groups, as Unix follows POSIX permissions for user : group : others.
Some important Linux commands:
1. sudo adduser user: adds a user with a group name the same as the user name. In Linux a user cannot exist without a group.
2. id username: uid=1001(foobar) gid=1001(foobar) groups=1001(foobar),4201(security) gets the groups of a user (/etc/passwd has this info). For user foobar, group foobar (gid 1001) is the primary group and security (4201) is a secondary group.
3. groups username: lists all the groups the user belongs to (/etc/group has this info).
4. To change the primary group of a user: sudo usermod -g groupname username.
5. getent passwd and getent group can also be used to look up the same info through NSS, which covers both local files and remote sources such as LDAP.

The Linux OS security architecture is very restrictive. The various aspects are:
1. A user cannot exist without a group.
2. A group can exist without a user.
3. A file can only carry usernames and groups which are part of the Linux OS (local or a remote service).
4. A file's ownership can never be changed to a non-existent user (create a file and try chown XXXXXX fileName; see the sketch below).
5. Linux applies its authorization policy not only while reading a file but also while creating it.
6. In a Linux system there can be no resource owned by a random user the OS is not aware of.
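A small terminal sketch of the commands and restrictions above; 'foobar' and the group 'security' are the example names from the article, XXXXXX is a deliberately non-existent user, and adduser/useradd behavior can vary slightly by distribution.

sudo adduser foobar                       # creates user foobar and a group foobar
sudo groupadd security                    # create the secondary group used below
sudo usermod -aG security foobar          # add 'security' as a secondary group
id foobar                                 # uid=1001(foobar) gid=1001(foobar) groups=1001(foobar),...(security)
getent passwd foobar                      # resolve the user through NSS
getent group security                     # resolve the group through NSS

touch /tmp/demo.txt
sudo chown XXXXXX /tmp/demo.txt           # fails: chown: invalid user: 'XXXXXX'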
02-20-2018
04:25 PM
Let's start by having a quick understanding of a file saved on a Linux OS:
1. vim /etc/fstab to know the filesystem of the disk.
2. Create and save a file using vim.
3. filefrag -v filename

Facts to appreciate:
1. The filefrag command will give you an idea of how many blocks (OS blocks formed by grouping sectors) the file occupies. A block is the minimum data that can be read by the OS; data is fetched from the hard disk in multiples of blocks.
2. The blocks might not be contiguous (can you correlate this with HDFS?).
3. The filesystem has no idea about a record saved in the file or about the format of the file. For the filesystem, a file is a sequence of bytes.
4. The number of blocks occupied on the Linux filesystem is file size / filesystem block size, which mostly is file size / 4096 for ext3/4.
5. A record is a logical entity which only the reader and writer of the file can understand. The filesystem has no clue about it.
6. We have standardized the process of identifying a record with file formats.
7. Example: in a file of "text" format, a record is the sequence of bytes contained between two \n characters.
8. The editors we use have this logic built in; for example vim is a text editor, and the notion of \n determining records is part of its code base.
9. A record can be spread across two blocks, as the filesystem does not consider the notion of records while dividing the file into blocks.

The file in the Hadoop world. When a file is stored in the Hadoop filesystem, which is distributed in nature, the following facts need to be appreciated:
1. HDFS is distributed in nature.
2. HDFS uses the OS filesystem on the individual nodes to store data. On an individual node an HDFS block is saved as multiple OS blocks on the hard disk. The HDFS block is in itself a higher-level abstraction: one HDFS block (128 MB) on a given node comprises multiple OS blocks (4 KB) which are made of sectors (512 bytes) on the hard disk.
3. The file is divided into HDFS blocks by: num blocks = file size / HDFS block size.
4. The file's blocks might reside on the same node or be distributed across multiple nodes.
5. HDFS has no idea of the notion of records or file formats. For HDFS a file is just a sequence of bytes that needs to be stored.
6. As there is no notion of records, it is very much possible that a record is split across two blocks.
7. To view the blocks of a file in HDFS use: hadoop fsck filePath -files -blocks (see the sketch at the end of this article).

The question arises: who takes care of honoring the notion of records and file formats in Hadoop?
1. InputFormats.
2. RecordReaders.

InputFormat is the code which knows how to read a specific file; hence this is the code which helps you read a specific file format. Just like we use vim to open text files and a PDF reader to open PDF files, we use TextInputFormat to read text files saved on HDFS and SequenceFileInputFormat to read sequence files in Hadoop. The InputFormat holds the logic of how the file has been split and saved, and which RecordReader will be used to read a record within a split. Once we have the InputFormat, the next logical question is which component decides how the sequence of bytes read is converted into records. RecordReader is the code which understands how to logically form a record from the stream of bytes read.

Let's take the example of a text file and try to understand the concepts in depth. Let there be a text file: the format itself says a record is formed by all the bytes between \n or \r characters, and the individual bytes in the record are encoded in UTF-8.
Let's assume a big text file is saved onto HDFS. The facts to understand are:
1. The file is broken into HDFS blocks and saved. The blocks can be spread across multiple nodes.
2. A record can be split between two HDFS blocks.
3. TextInputFormat uses LineRecordReader.
4. TextInputFormat uses the logic of file size / HDFS block size (the getSplits functionality) to find the number of splits the file consists of.
5. For each split one can find the start byte and end byte index, which is provided as input to the record reader.
6. A record reader knows how to read from byte index X to byte index Y, with certain conditions:
   1. If the starting byte index X == 0 (start of the file), then include all the bytes till \n in the first record.
   2. If the starting byte index X != 0 (all splits except the first), then skip the bytes till the first \n is encountered.
   3. If byte index Y == file size (end of file), then do not read any further records.
   4. If byte index Y != file size (all splits except the last), then go ahead and read one extra record from the next block.

What all these conditions ensure is:
1. The record reader which reads the first split of the file consumes the first line.
2. All other record readers always skip the initial bytes till the first \n occurs.

A command sketch for inspecting the same file at the OS and HDFS level follows.
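A hedged sketch for inspecting the block layout discussed above at both the Linux and HDFS levels; /tmp/sample.txt and /data/big.txt are placeholder paths.

# Linux: how many OS blocks (and extents) a local file occupies
filefrag -v /tmp/sample.txt

# HDFS: put the file and list its HDFS blocks and their DataNode locations
hdfs dfs -put /tmp/sample.txt /data/big.txt
hdfs fsck /data/big.txt -files -blocks -locations

# Default HDFS block size (bytes) configured for the cluster
hdfs getconf -confKey dfs.blocksize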