Member since: 04-08-2019
Posts: 115
Kudos Received: 97
Solutions: 9
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 4100 | 04-16-2016 03:39 AM |
 | 2168 | 04-14-2016 11:13 AM |
 | 3766 | 04-13-2016 12:31 PM |
 | 4774 | 04-08-2016 03:47 AM |
 | 3722 | 04-07-2016 05:05 PM |
01-22-2019
11:04 PM
Thanks, @Sandeep Nemuri
01-22-2019
01:54 PM
Less than 5% of support cases carry a recommendation to engage professional services, yet I have seen this message create mixed emotions in customers' minds. I would like to wear both the customer's and the support engineer's hats to see how this can be handled efficiently.

As a customer, I will be delighted if solutions come instantly; a setting like "hive.run.query.faster=true" or "spark.fix.error=true" would be ideal. Being asked to connect with a different team can easily be seen as a deflection strategy from support. Professional services engagement also has its own sales cycle and costs attached to it. The result is frustration.

Let's take an example of one such redirection: a customer asks support to help refine their use case and suggest the right technology stack. From a support point of view this request is out of scope, and the engineer directs the customer to engage professional services. In my past life as a professional services consultant, a use-case discussion would start with gathering business requirements, followed by various brainstorming sessions, proofs of concept, and so on, to pick the right components and benchmark them. Containing this whole process in a single support case would do injustice to the problem itself.

The quickest and most efficient way to solve a problem is to engage the right resource at the right time. While support can take a crack at it on a best-effort basis, it may not be efficient at solving that problem, and limited information and a varied skill set may often lead to incorrect recommendations. Support does a great job of solving an issue when the problem boundaries are clearly defined. What really frustrates customers is an engineer sitting on a case for a long time without setting clear expectations up front, then coming back later to say they need to engage professional services.

While we take extreme care to provide a great customer experience, some of these issues are great learning opportunities, and we want to pre-empt them before they occur. Here is an attempt to set clear guidelines for both engineers and customers by defining the support scope. I request everyone to review these support scope guidelines before submitting a case; it would save a lot of time.

"If I had an hour to save the world, I would spend 59 minutes defining the problem and one minute finding solutions." - Albert Einstein
10-17-2016
10:21 AM
@Alena Melnikova The following links should help:

https://community.hortonworks.com/questions/2517/maximum-hive-table-partitions-allowed-recommended.html
https://community.hortonworks.com/questions/29031/best-pratices-for-hive-partitioning-especially-by.html
http://www.slideshare.net/BenjaminLeonhardi/hive-loading-data

Hope this helps.
08-22-2016
06:04 AM
Great one, Karthik.
08-05-2016
10:10 AM
23 Kudos
The concept of a delegation token was introduced to avoid frequent authentication checks against Kerberos (AD/MIT). After the initial Kerberos authentication against the NameNode, subsequent authentications can be performed without a Kerberos service ticket (or TGT). Once the client has successfully authenticated with Kerberos for the NameNode, it can get a delegation token from the NameNode. This token has an expiration and a max issue date, but it can be renewed up to the max issue date. In this article, we will see how a delegation token is created from the initial authentication, and how, even if you delete the initial TGT, you can still list HDFS contents with the help of the delegation token.

1). List the tickets in the keytab:

```
[root@hdptest-1 ~]# cd /etc/security/keytabs/
[root@hdptest-1 keytabs]# klist -kt hdfs.headless.keytab
Keytab name: FILE:hdfs.headless.keytab
KVNO Timestamp         Principal
---- ----------------- --------------------------------------------------------
   0 07/23/16 02:44:25 hdfs-hdptest@LAB.HORTONWORKS.NET
   0 07/23/16 02:44:25 hdfs-hdptest@LAB.HORTONWORKS.NET
   0 07/23/16 02:44:25 hdfs-hdptest@LAB.HORTONWORKS.NET
   0 07/23/16 02:44:25 hdfs-hdptest@LAB.HORTONWORKS.NET
   0 07/23/16 02:44:25 hdfs-hdptest@LAB.HORTONWORKS.NET
```

2). Perform kinit:

```
[root@hdptest-1 keytabs]# kinit -kt hdfs.headless.keytab hdfs-hdptest@LAB.HORTONWORKS.NET
[root@hdptest-1 keytabs]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs-hdptest@LAB.HORTONWORKS.NET

Valid starting     Expires            Service principal
08/04/16 22:18:50  08/05/16 08:18:50  krbtgt/LAB.HORTONWORKS.NET@LAB.HORTONWORKS.NET
        renew until 08/11/16 22:18:50
```

3). List HDFS contents using the TGT:

```
[root@hdptest-1 keytabs]# hadoop fs -ls /
Found 11 items
drwxrwxrwx   - yarn        hadoop  0 2016-08-03 05:51 /app-logs
drwxr-xr-x   - hdfs        hdfs    0 2016-08-03 05:53 /apps
drwxr-xr-x   - yarn        hadoop  0 2016-07-23 00:16 /ats
drwxr-xr-x   - hdfs        hdfs    0 2016-07-23 00:16 /hdp
drwxr-xr-x   - mapred      hdfs    0 2016-07-23 00:16 /mapred
drwxrwxrwx   - mapred      hadoop  0 2016-07-23 00:16 /mr-history
drwxr-xr-x   - hdfs        hdfs    0 2016-07-25 05:25 /ranger
drwxrwxrwx   - hdfs        hdfs    0 2016-07-23 02:51 /tmp
drwxr-xr-x   - hdfs        hdfs    0 2016-08-03 05:50 /user
drwxr-xr-x   - hadoopadmin hdfs    0 2016-08-03 05:52 /zone_encr
drwxr-xr-x   - hadoopadmin hdfs    0 2016-08-03 05:49 /zone_encr2
```

4). Generate the delegation token. This is based on the existing ticket you hold:

```
[root@hdptest-1 keytabs]# hdfs fetchdt --renewer hdfs my.delegation.token
16/08/04 22:19:44 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 15 for hdfs on 172.26.67.6:8020
Fetched token for 172.26.67.6:8020 into file:/etc/security/keytabs/my.delegation.token
Fetched token for 172.26.67.8:9292 into file:/etc/security/keytabs/my.delegation.token
[root@hdptest-1 keytabs]# klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: hdfs-hdptest@LAB.HORTONWORKS.NET

Valid starting     Expires            Service principal
08/04/16 22:18:50  08/05/16 08:18:50  krbtgt/LAB.HORTONWORKS.NET@LAB.HORTONWORKS.NET
        renew until 08/11/16 22:18:50
```

5). Destroy the ticket cache and point the Hadoop client at the token file:

```
[root@hdptest-1 keytabs]# kdestroy
[root@hdptest-1 keytabs]# export HADOOP_TOKEN_FILE_LOCATION=/etc/security/keytabs/my.delegation.token
[root@hdptest-1 keytabs]# klist
klist: No credentials cache found (ticket cache FILE:/tmp/krb5cc_0)
```

6). List the HDFS contents. Even though you no longer have a ticket, you are still able to do the listing, thanks to the delegation token:

```
[root@hdptest-1 keytabs]# hadoop fs -ls /
Found 11 items
drwxrwxrwx   - yarn        hadoop  0 2016-08-03 05:51 /app-logs
drwxr-xr-x   - hdfs        hdfs    0 2016-08-03 05:53 /apps
drwxr-xr-x   - yarn        hadoop  0 2016-07-23 00:16 /ats
drwxr-xr-x   - hdfs        hdfs    0 2016-07-23 00:16 /hdp
drwxr-xr-x   - mapred      hdfs    0 2016-07-23 00:16 /mapred
drwxrwxrwx   - mapred      hadoop  0 2016-07-23 00:16 /mr-history
drwxr-xr-x   - hdfs        hdfs    0 2016-07-25 05:25 /ranger
drwxrwxrwx   - hdfs        hdfs    0 2016-07-23 02:51 /tmp
drwxr-xr-x   - hdfs        hdfs    0 2016-08-03 05:50 /user
drwxr-xr-x   - hadoopadmin hdfs    0 2016-08-03 05:52 /zone_encr
drwxr-xr-x   - hadoopadmin hdfs    0 2016-08-03 05:49 /zone_encr2
```

7). Check whether we can get a delegation token without the initial Kerberos ticket:

```
[root@hdptest-1 keytabs]# unset HADOOP_TOKEN_FILE_LOCATION
[root@hdptest-1 keytabs]# hdfs fetchdt --renewer hdfs my.delegation.token
16/08/04 22:21:05 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
        at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:413)
        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:563)
<output truncated for brevity>
```

Here you can see that a delegation token can only be obtained on the basis of an initial authentication with Kerberos. Without a valid ticket, the NameNode denies the delegation token request.
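Since the token carries an expiration and a max issue date, it can also be inspected, renewed, and cancelled with the same fetchdt utility. A minimal sketch, not part of the original walkthrough; these flags exist in the Hadoop 2.x fetchdt tool, but verify against your version by running hdfs fetchdt with no arguments:

```
# Inspect the token's kind, service, and identifier.
[root@hdptest-1 keytabs]# hdfs fetchdt --print my.delegation.token

# Renew the token (allowed only until its max issue date, and only by the
# renewer named when it was fetched -- "hdfs" in the steps above; renewal
# itself requires a valid Kerberos ticket for that renewer).
[root@hdptest-1 keytabs]# hdfs fetchdt --renew my.delegation.token

# Cancel the token at the NameNode so it can no longer be used.
[root@hdptest-1 keytabs]# hdfs fetchdt --cancel my.delegation.token
```

This renewer mechanism is why long-running workloads, such as YARN applications, name the ResourceManager as the renewer: it can keep extending their tokens without the application holding a Kerberos ticket.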
07-28-2016
06:24 AM
2 Kudos
We have seen production job failures after customers upgraded Hive from 0.14 (HDP 2.1) to a later version (1.2.x or above): critical jobs fail (not to mention the resulting severity 1 cases). This is due to changes in the reserved words between the source and target Hive versions. For example, the word "date" is not a reserved word in Hive 0.14, but in Hive 1.2.1 it is. The same is true of REGEXP and RLIKE. The reserved keywords governed by hive.support.sql11.reserved.keywords are listed here: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-ReservedKeywords

If you still want to use those reserved keywords as identifiers, there are two ways:

1). Use quoted identifiers. This is the best option, although it requires query changes.
2). Set hive.support.sql11.reserved.keywords=false.

To illustrate option 1: with the word "user" being a keyword, we can still use it as an identifier in a query like the following.

```
SELECT createddate, `user`.screenname
FROM twitter_json4
WHERE `user`.name LIKE 'Sarah%';
```

The second option makes queries easier to write. However, during an upgrade, or if hive.support.sql11.reserved.keywords ever needs to be set back to true for some reason, existing unquoted queries fail with the following error:

```
FailedPredicateException(identifier,{useSQL11ReservedKeywordsForIdentifier()}?)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11644)
        at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45920)
```

To illustrate option 2:

```
hive> set hive.support.sql11.reserved.keywords;
hive.support.sql11.reserved.keywords=false

hive> create table table (user string);   ==> both "table" and "user" are keywords
OK
Time taken: 1.458 seconds

hive> desc table;
OK
user                    string
Time taken: 0.34 seconds, Fetched: 1 row(s)

hive> show tables;
OK
table
Time taken: 0.075 seconds, Fetched: 1 row(s)

hive> set hive.support.sql11.reserved.keywords=true;   ===> re-enabling the property

hive> show tables;
OK
table
Time taken: 0.041 seconds, Fetched: 1 row(s)

hive> describe table;
FailedPredicateException(identifier,{useSQL11ReservedKeywordsForIdentifier()}?)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.identifier(HiveParser_IdentifiersParser.java:11644)
        at org.apache.hadoop.hive.ql.parse.HiveParser.identifier(HiveParser.java:45920)
        at org.apache.hadoop.hive.ql.parse.HiveParser.tabTypeExpr(HiveParser.java:15574)
```

Setting hive.support.sql11.reserved.keywords to false allows keywords to be used as identifiers without Hive throwing an exception, but keep in mind that setting it back to true will require quoting to distinguish keywords from identifiers. Feel free to get in touch with Hortonworks Support in case of any issues.
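To tie the two options together: even with hive.support.sql11.reserved.keywords=true, the table created above remains fully usable once the reserved words are wrapped in backticks. A small sketch, not from the original session, with output abridged:

```
hive> set hive.support.sql11.reserved.keywords=true;
hive> describe `table`;   ==> backticks mark the reserved word as an identifier
OK
user                    string
hive> select `user` from `table` limit 1;
```

This is why quoting is the upgrade-safe choice: the queries keep working regardless of how the property is set.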
07-06-2016
05:04 AM
@Rahul Mishra The History Server, App Timeline Server, and ResourceManager are YARN/MapReduce master components; in @Venkat's layout they would be on node3. Also, I suggest placing Kafka on a separate server, as it is an ingestion component.
04-27-2016
09:42 AM
3 Kudos
@vinay kumar Answering inline.

"It will use Active Directory as KDC." Yes, that's correct.

"As soon as the user logs into the system, the AS will generate a TGT and the TGS will issue a ticket with that TGT (AS and TGS lie in AD)." When you do a kinit, only the TGT is obtained from the KDC. A service ticket is obtained from the TGS only when you run an actual command, for example hadoop fs -ls /. You can use klist to check which tickets you hold.

"It will have to create principals and keytabs for all the service users, services, local users, and AD users in this Active Directory, but it is only creating principals and keytabs for service users like hdfs and hive." Creating principals and keytabs for the service users is done as part of Kerberizing the cluster.

"The problem we are facing is that it is not generating keytabs for local Linux users, which restricts them from using the services even though they have access to those services (policies created in Ranger)." AD comes with both LDAP and a KDC, so ideally you want to maintain all users in AD and sync them to the Unix machines via SSSD/Winbind, etc. Having local users as well as AD users is not at all recommended. If you want a keytab for a local user, that user should first be created in AD, and the keytab can then be created on top of it.

Here is a very good guide to Kerberos; you may want to bookmark it: https://community.hortonworks.com/content/kbentry/1327/kerberos-the-missing-guide.html

Hope this helps.
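To make the TGT-versus-service-ticket point concrete, here is a small sketch (hypothetical principal and host names, klist output abridged) of what you would see before and after the first Hadoop command:

```
$ kinit user1@LAB.HORTONWORKS.NET   # AS exchange: only the TGT is issued
$ klist
  krbtgt/LAB.HORTONWORKS.NET@LAB.HORTONWORKS.NET          <-- TGT only

$ hadoop fs -ls /                   # TGS exchange happens here, on first use
$ klist
  krbtgt/LAB.HORTONWORKS.NET@LAB.HORTONWORKS.NET          <-- TGT
  nn/hdptest-1.lab.hortonworks.net@LAB.HORTONWORKS.NET    <-- service ticket for the NameNode
```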