Member since: 02-24-2016
Posts: 175
Kudos Received: 56
Solutions: 3

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 654 | 06-16-2017 10:40 AM
 | 4705 | 05-27-2016 04:06 PM
 | 807 | 03-17-2016 01:29 PM
08-14-2019
10:39 AM
Hi, We have a 30-node production cluster. We want to add 5 DataNodes for additional storage to handle an interim spike of data (around 2 TB). This data will be stored temporarily, and we want to get rid of it after 15 days. Is it possible to make sure that the incoming interim data (2 TB) is stored only on the newly added DataNodes? I am looking for something similar to YARN node labelling. Regards, SS
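In case it helps others with the same question: HDFS has no direct equivalent of YARN node labels, but one hedged workaround is to tag the disks on the five new DataNodes with the ARCHIVE storage type and pin the interim data's directory to a matching storage policy. This is a sketch only; the paths and the COLD policy choice are illustrative assumptions, and it only works if nothing else in the cluster relies on ARCHIVE storage.

# On each of the 5 new DataNodes only, tag the disks as ARCHIVE storage in
# hdfs-site.xml (dfs.datanode.data.dir), e.g. [ARCHIVE]/grid/0/hadoop/hdfs/data,
# then restart those DataNodes.

# Pin the landing directory for the interim data to ARCHIVE-only placement
# (the path is a placeholder; COLD places all replicas on ARCHIVE storage):
hdfs storagepolicies -setStoragePolicy -path /data/interim -policy COLD

# Verify the policy took effect:
hdfs storagepolicies -getStoragePolicy -path /data/interim

# After the 15 days, drop the data:
hdfs dfs -rm -r -skipTrash /data/interim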
Labels:
- Apache Hadoop
07-09-2018
09:00 PM
Nope @Josh Nicholson
... View more
03-15-2018
04:46 PM
2 Kudos
Hi, I was going through a SmartSense recommendation which suggests enabling "tez.task.scale.memory.enabled". The official Tez documentation describes it as: "Whether to scale down memory requested by each component if the total exceeds the available JVM memory." I am keen to understand: if we enable this auto-scaling of memory for tasks, what are the possible advantages and disadvantages? Thanks for sharing your experience. Regards,
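For anyone who wants to compare behavior before changing it cluster-wide, here is a minimal sketch of toggling the setting per session from beeline (the JDBC URL is a placeholder; the companion reserve-fraction property and its 0.3 default are taken from the Tez docs):

# Scope the change to a single session and compare a representative query
beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  set tez.task.scale.memory.enabled=true;
  -- fraction of JVM heap held back when Tez scales component requests down
  set tez.task.scale.memory.reserve-fraction=0.3;
  select count(*) from some_db.some_table;
"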
10-26-2017
10:41 AM
2 Kudos
Hi guys, I have installed and built an Anaconda virtual environment on a node outside of the HDP cluster. To use this with Spark, we need to have it on the HDP cluster, and around this I have a couple of questions. 1) Do we need to install Anaconda on all the nodes? We would like to avoid this, as we do not have internet access from the cluster and an Anaconda installation downloads libraries during install; I did not find officially supported offline repos for Linux installations. 2) If we need to distribute the environment by copying it to all the nodes before starting any Spark applications, then when submitting the Spark job from the edge node, how do we make sure the job uses the Anaconda virtual environment? (On a single node it is easy, as we can switch Anaconda environments.) Thanks, SS
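One common pattern, offered here as a hedged sketch rather than HDP-specific guidance: zip the environment once on the build node (built for the same OS/architecture as the cluster), ship it with the job via --archives, and point the Python executables at the unpacked copy. The environment name and paths are assumptions.

# On the build node: package the environment once
cd /opt/anaconda/envs && zip -r /tmp/py_env.zip py_env

# Submit from the edge node; YARN localizes the zip on every node that runs
# the job, so nothing needs to be pre-installed on the cluster
spark-submit \
  --master yarn --deploy-mode cluster \
  --archives /tmp/py_env.zip#PYENV \
  --conf spark.yarn.appMasterEnv.PYSPARK_PYTHON=./PYENV/py_env/bin/python \
  --conf spark.executorEnv.PYSPARK_PYTHON=./PYENV/py_env/bin/python \
  my_job.py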
08-22-2017
08:55 AM
Hi, I understand that the mechanism of Hive (with Tez) and Hive (with MR) is different from traditional RDBMS databases. We have a set of analysts who run "select * from view limit n" kinds of queries many times. Since all our analysts/BI users come from a traditional RDBMS background, they compare the waiting time for an RDBMS query and a Hive query to return results. For example, select top 10 * from db.view on SQL Server, against a much larger dataset, completes in: run 1: 0 seconds; run 2: 0 seconds; run 3: 0 seconds; and so on. Running the same query through Hive over Knox (or even with beeline), SELECT * FROM db.view limit 10 takes much longer: run 1: 36 seconds; run 2: 18 seconds; run 3: 38 seconds; and so on. This is one example of a db/table combination, but it is a common scenario for almost all the tables in a few databases. I tried analyze/compute statistics on the underlying tables these queries run against, but the query times did not change. I understand we are not comparing apples to apples here; this question is more about improving the end-user experience: how best can we help avoid long wait times? (This is on HDP 2.6.x.) Regards, SS
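One knob worth trying for exactly this pattern, sketched per session below (the properties are standard Hive ones; the threshold value is illustrative): hive.fetch.task.conversion lets simple SELECT/LIMIT queries skip launching a Tez DAG and stream rows directly, though a view built on joins or aggregations will still need a DAG.

beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  -- allow simple scans/limits to bypass Tez and fetch rows directly
  set hive.fetch.task.conversion=more;
  -- raise the input-size ceiling (bytes) under which the conversion applies
  set hive.fetch.task.conversion.threshold=1073741824;
  select * from db.view limit 10;
"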
Labels:
- Apache Hive
07-06-2017
09:07 AM
Thank you @Manish Gupta. This is something we can try by configuring a proxy to load balance HS2. I would also like to understand: are there changes to be made for ZooKeeper? And is there any other way, without using a non-HDP component or external network changes, to achieve load balancing of HS2? Regards,
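For reference, the HDP-native route is usually ZooKeeper service discovery: each HS2 instance registers in ZooKeeper and JDBC clients resolve an instance from the ensemble per connection, which spreads sessions across servers without an external balancer. A connection sketch (hosts and namespace are placeholders):

# Clients connect via the ZooKeeper ensemble instead of a fixed HS2 host;
# the driver picks one of the registered HiveServer2 instances per connection
beeline -u "jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2"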
07-05-2017
03:47 PM
1 Kudo
Hi, Using https://knox.apache.org/books/knox-0-9-0/user-guide.html, I have configured a Knox topology for HiveServer2 high availability. I also noticed Dynamic Service Discovery Through ZooKeeper in the documentation. I see that all queries/connections go through only one HiveServer2 instance; if that HS2 instance goes down, connections/queries go through another instance. My question is: on a busy cluster with multiple HS2 servers installed, is it possible to load balance (possibly round robin) so that one server does not get overloaded? If yes, how? Regards, SS
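For context on what Knox itself provides: the HaProvider for the HIVE service (per the Knox user guide linked above) does failover plus ZooKeeper-based discovery rather than strict round robin. A sketch of the provider stanza that goes inside the topology's gateway element, printed here with placeholder ensemble values:

# Hedged sketch of the HaProvider entry for the Knox topology file;
# the zookeeperEnsemble/zookeeperNamespace values are placeholders
cat <<'EOF'
<provider>
  <role>ha</role>
  <name>HaProvider</name>
  <enabled>true</enabled>
  <param>
    <name>HIVE</name>
    <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=zk1:2181,zk2:2181,zk3:2181;zookeeperNamespace=hiveserver2</value>
  </param>
</provider>
EOF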
Labels:
- Apache Hive
07-03-2017
02:50 PM
Hi @vshukla, the use case is to let end users run their interactive queries in the queues they generally use, from tools like beeline and other JDBC clients.
06-30-2017
10:32 AM
@Sandeep Nemuri Thank you for the confirmation. When can we expect this fix?
06-27-2017
11:35 AM
Using auth=HTTPKerberosAuth() will pass your Kerberos ticket, in my understanding. It is similar to --negotiate in curl.
06-27-2017
10:25 AM
Hi @Javert Kirilov, I was facing this issue when trying to access Livy with Python scripts. If curl is blocking you, please try something like this. You may need to install Python's requests and requests-kerberos packages.

import json
import requests
from requests_kerberos import HTTPKerberosAuth

# Livy endpoint (replace host/port with your values)
host = 'http://LIVY_HOST:LIVY_PORT'

# Ask Livy for a new interactive Spark session
data = {'kind': 'spark'}
headers = {'Requested-By': 'MY_USER_ID', 'Content-Type': 'application/json'}

# Passes the Kerberos ticket from the current ticket cache,
# similar to --negotiate in curl
auth = HTTPKerberosAuth()

r0 = requests.post(host + '/sessions', data=json.dumps(data), headers=headers, auth=auth)
print(r0.json())

Regards, SS
06-27-2017
09:08 AM
This works for us. Thanks.
06-26-2017
03:42 PM
Hi, On HDP 2.6, I have configured the Spark Thrift Server for Spark 1.6.x based on the community wiki, and queries are executed as the end user: when a user connects to the Spark-1 Thrift Server using beeline, I see a new YARN application listed under the Resource Manager running as that end user.

Now I am trying to configure the Spark2 Thrift Server, following the official documentation:
1. Added hive.server2.enable.doAs=true
2. Added spark.jars to the classpath (the DataNucleus jars)
3. Set spark.master to local
4. Restarted the Spark2 Thrift Server

In my understanding, now:
1. Queries should run as the end user (they are still running as the hive user).
2. With spark.master=local, the per-user application should be listed under the Resource Manager UI as the end user (I do not see it listed).
3. When all JDBC connections to STS are closed, the STS application should disappear, since STS is started in local mode and, for each user/connection if not shared, queries are executed by a Spark Application Master launched on behalf of the end user.

None of the three holds for the Spark2 Thrift Server (but with impersonation support in the Spark-1 Thrift Server, all three work as expected). Attaching screenshots of the anomalies; I am not sure if I missed something. 1. Queries still run as hive. 2. STS is not listed under the Resource Manager. 3. The Spark2 Thrift Server still runs as the hive user. Thanks in advance. Regards, SS. Any inputs? @cdraper, @amcbarnett, @Ana Gillan?
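For anyone debugging the same setup, a small verification sketch (host, port, and user are placeholders; port 10016 is a common Spark2 STS default on HDP but may differ on your cluster):

# Open a JDBC session to the Spark2 Thrift Server as a normal end user
beeline -u "jdbc:hive2://sts-host:10016/default" -n enduser -e "select 1"

# While the session is open, check which user owns the running YARN applications
yarn application -list -appStates RUNNING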
Labels:
- Apache Spark
06-26-2017
11:33 AM
Hi, I have configured the Spark Thrift Server with impersonation, so queries now run as the end user. "spark.yarn.queue" under "Advanced spark2-thrift-sparkconf" is configured to point to a certain queue (say Test_Q). In my understanding, with impersonation Spark spawns a new container and an STS instance to serve each user. Now I see a single queue (Test_Q) used for all users. We would like end users to be able to override the queue at run time rather than share the one queue configured for the Spark Thrift Server. If I am correct, we are looking for a property that can override the value of spark.yarn.queue under spark-thrift-sparkconf.conf, something like --queue thequeue for spark-submit.
Thanks,
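A hedged idea to experiment with, not something I have verified on HDP: HiveServer2-style JDBC URLs accept session configuration after a '?', so if the impersonated per-user STS application picks up the session conf at launch, a queue override might look like the following (host, port, and queue name are placeholders):

# Attempt to pass the queue as a session conf at connect time
beeline -u "jdbc:hive2://sts-host:10016/default?spark.yarn.queue=analyst_q" -n enduser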
Labels:
- Apache Spark
06-22-2017
06:11 AM
Hi @Bala Vignesh N V, I have a similar issue. I have done the above settings, but they do not help. I have posted a question on HCC: https://community.hortonworks.com/questions/109365/controlling-number-of-small-files-while-inserting.html.
06-22-2017
05:51 AM
Hi, We do "insert into 'target_table' select a,b,c from x where .." kind of queries for a nightly load. This insert goes in a new partition of the target_table. Now the concern is : this inserts load hardly any data ( I would say less than 128 MB per day) but 1200 files. Each file in few KiloBytes. This is slowing down the performance. How can we make sure, this load does not generate lot of small files? I have already set : hive.merge.mapfiles and hive.merge.mapredfiles to true in custom/advanced hive-site.xml. But still the load job loads data with 1200 small files. I know why 1200 is, this is the value of maximum number of reducers/containers available in one of the hive-sites. (I do not think its a good idea to do cluster wide setting, as this can affect other jobs which can use cluster when it has free containers) What could be other way/settings, so that the hive insert do not take 1200 slots and generate lots of small files? I also have another question which is partly contrary to above : (This is relatively less important) When I reload this table by creating another table by doing select on target table, this newly created table does not contain too many small files. What could be the reason?
Labels:
- Apache Hive
- Apache Tez
06-16-2017
10:40 AM
Well, this worked as-is in the North Virginia region! Earlier I was using a different region.
06-16-2017
08:45 AM
Hi @anatva, it does spawn a Tez job:
> select * from DATABASENAME.TABLE_NAME limit 10;
INFO : Session is already open
INFO : Dag name: select * from DATABASENAME.TAB...10(Stage-1)
INFO : Status: Running (Executing on YARN cluster with App id application_1496614688621_2617)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 33 33 0 0 0 0
Reducer 2 ...... SUCCEEDED 227 227 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 17.63 s
--------------------------------------------------------------------------------
Here are the answers.
1. Are you using beeline? If yes, HiveServer2 could be busy at times.
Yes, plus other tools which connect over JDBC through Knox.
2. Are you querying a table or a view?
A view.
3. What is the amount of memory on the edge node?
256 GB
06-16-2017
08:35 AM
Adding experts 🙂 @Jonas Straub, @cdraper
... View more
06-15-2017
11:09 AM
Hi all, when I run this query repeatedly on the same cluster, the time taken varies wildly (this is Hive on Tez): select * from database.tableName limit 10; The time to run it ranges from 2 seconds to 10 minutes! If one run took 2 minutes and the next took 2 minutes 20 seconds, that would still be fine, but 2 seconds to 10 minutes is not. What could be the possible reasons? How can we make sure the runs take similar time? Regards, SS
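One hypothesis worth testing, offered as a sketch with standard Hive property names: the fast runs may hit an already-warm Tez session while the slow runs wait on YARN container allocation, in which case pre-warming containers narrows the spread (the JDBC URL and container count are placeholders):

beeline -u "jdbc:hive2://hs2-host:10000/default" -e "
  -- hold a few Tez containers ready so the first DAG does not wait on YARN
  set hive.prewarm.enabled=true;
  set hive.prewarm.numcontainers=10;
  select * from database.tableName limit 10;
"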
Labels:
- Apache Hive
- Apache Tez
- Apache YARN
06-14-2017
01:38 PM
Hi, here are some stats before the question, from Ambari -> YARN:
- Available containers: 1000+
- Running YARN applications: 22 (of which 18 are Hive on Tez)
- Allocated containers: 780
- Containers pending allocation: 12000+
My question is: when queues are allowed to burst up to 100% of cluster capacity, why would the Resource Manager not allocate containers to the pending/running jobs? What could be the reason for the huge number of pending allocations? Is it that Tez calculates the number of containers required to complete a job and adds that to the pending count, which then gets allocated gradually as needed (for example, a reduce stage's 3000 containers being blocked on the map stage)? Can anyone enlighten me, please?
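For inspecting where the allocations stand, a couple of standard YARN CLI checks (the queue name is a placeholder):

# Capacity, used capacity, and active applications for one queue
yarn queue -status your_queue_name

# Running applications and their progress, to see which jobs hold allocations
yarn application -list -appStates RUNNING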
Labels:
- Apache Hive
- Apache Tez
- Apache YARN
06-13-2017
06:39 AM
Thanks Carl @cdraper. I enabled EhCache and enabled logging for EhCache. In our use case we use Knox only for Hive, and as we discussed, Hive queries do not go through re-authentication during a session, so I do not think we will get any benefit from enabling the cache. What are your views on this? Regards, SS
06-12-2017
01:04 PM
Thanks @pdarvasi
06-12-2017
01:02 PM
Thanks Carl @cdraper, I will try this. An additional question: suppose an end user issues a connect command from a JDBC client like beeline. For example: 1) start beeline; 2) enter the !connect string with Knox:port; 3) enter the AD username/password; 4) AD authenticates OK; 5) the user submits queries; 6) more queries. In the default case, without enabling EhCache, does further AD authentication happen for steps 5, 6, and onwards? Or, since it is all part of the same session, does it not need to re-authenticate? I am wondering what percentage of AD round trips can be avoided on a busy production cluster with a cache timeout of 2 minutes. Thanks,
06-12-2017
10:51 AM
Tagging SME @Kevin Minder
06-12-2017
09:00 AM
Hi, I was going through the HDP documentation on enabling caching for Knox LDAP authentication: https://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.1.0/bk_dataflow-security/content/ldap_authentication_caching.html What is the default cache expiry time, and how can I reduce or increase it? Regards, SS
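For reference, a hedged sketch of where this knob usually lives: the caching in that document is Shiro's EhCache-backed realm cache, so expiry is governed by the cache's time-to-live in an EhCache configuration. The cache name below is an assumption and the 120-second value is illustrative, not the documented default:

# Illustrative EhCache fragment controlling authentication-cache expiry
cat <<'EOF'
<ehcache>
  <!-- cache name is an assumption; match it to your realm's authentication cache -->
  <cache name="org.apache.shiro.realm.ldap.JndiLdapRealm.authenticationCache"
         maxEntriesLocalHeap="1000"
         timeToLiveSeconds="120"/>
</ehcache>
EOF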
Labels:
- Apache Knox
06-06-2017
08:36 AM
Hi, In our production cluster, with impersonation disabled, we have at least 5000 queries run on a daily basis. I suspect that a few queries which are part of batch jobs (1000+) are eating up a lot of cluster resources, possibly because they are written poorly. How do I find those possibly 'resource hungry' queries? Regards, SS
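One way to start narrowing this down without extra tooling (the RM host is a placeholder): the YARN Resource Manager REST API reports per-application memorySeconds and vcoreSeconds, which can be sorted to surface the heaviest jobs:

# Pull finished applications from the RM REST API and rank by memory-seconds
curl -s "http://rm-host:8088/ws/v1/cluster/apps?states=FINISHED" \
  | python -c '
import json, sys
apps = json.load(sys.stdin)["apps"]["app"]
for a in sorted(apps, key=lambda a: a.get("memorySeconds", 0), reverse=True)[:20]:
    print(a["memorySeconds"], a["user"], a["name"])
'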
Labels:
- Apache Hive
- Apache YARN
- Cloudera Manager
04-26-2017
08:32 PM
Hi @William Gonzalez, I cleared the exam on 26/March/2017 but have not received any communication from Hortonworks about the badge. After that, I wrote and cleared HDPCA on 23/April; for HDPCA I got the digital badge, but not for HCA. I wrote 4 emails to certification at hortonworks dot com and got ticket numbers from Zendesk, but unfortunately I have not received any response. Kindly help. Best regards.
04-19-2017
09:02 PM
Hi Gurus, Following the Hortonworks documentation: https://2xbbhjxc6wk3v21p62t8n4d4-wpengine.netdna-ssl.com/wp-content/uploads/2015/04/HDPCA-PracticeExamGuide.pdf, I selected the HDPCA AMI and the c3.4xlarge instance type, and created a security group allowing incoming traffic from all addresses on ports 5901, 9999, and 8888 (the last two are not in the documentation, but I wanted to make sure my instance runs). Open ports for incoming traffic:
- 22 tcp 0.0.0.0/0, ::/0
- 5901 tcp 0.0.0.0/0, ::/0
- 8888 tcp 0.0.0.0/0, ::/0
- 9999 tcp 0.0.0.0/0, ::/0
Now, as per the instructions, I am trying to connect to the instance using VNC Viewer. I copy-paste the DNS name/IP from the instance's public DNS/IP columns and use DNSName:5901 or DNSName:9999 or IP:5901, etc. It does not work; every time I see: "Cannot establish connection. Are you sure you have entered the correct network address, and port number if necessary?"
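A quick way to separate a security-group problem from a VNC-server problem (substitute your instance's public DNS; the hostname below is a placeholder):

# If this fails, the security group or network path is the problem; if it
# succeeds but VNC Viewer still cannot connect, the VNC server on the
# instance is likely not running on display :1
nc -vz ec2-XX-XX-XX-XX.compute-1.amazonaws.com 5901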
Tags:
- Hadoop Core
- hdpca
Labels:
- Security