Member since: 09-26-2016
Posts: 29
Kudos Received: 0
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7709 | 09-15-2017 12:06 PM |
| | 1597 | 09-07-2017 05:52 PM |
09-15-2017 03:54 PM
Some other steps taken:

1. Create a knox.crt file.
2. Place it in the cacerts folder in /etc/pki/java.
3. Download this cert to the machine where the ODBC driver needs to be configured.
4. Give this path in the SSL options of the ODBC configuration and check the 'Enable SSL' box.
5. Set Kerberos as the authentication mechanism in the ODBC configuration.
6. Set the Knox URL and port as the host and port in the ODBC configuration.
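For reference, a minimal sketch of one way to do step 2, assuming the default cacerts password (the alias and paths are assumptions):

```bash
# Import knox.crt into the Java cacerts truststore
keytool -import -alias knox -file knox.crt \
        -keystore /etc/pki/java/cacerts -storepass changeit
```

And a matching odbc.ini entry covering steps 4-6 (key names per the Simba-based Hortonworks Hive ODBC driver; the DSN name, driver path, host, and cert path are all assumptions):

```ini
[KnoxHive]
# Steps 4-6: SSL with the downloaded cert, Kerberos, Knox host/port
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
Host=knoxhost.example.com
Port=8443
HiveServerType=2
ThriftTransport=2
HttpPath=gateway/default/hive
AuthMech=1
SSL=1
TrustedCerts=/path/to/knox.crt
```

Here ThriftTransport=2 selects HTTP and AuthMech=1 selects Kerberos in that driver.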
09-15-2017 12:06 PM
@Geoffrey Shelton Okot Thanks, Geoffrey. I got it to work: I used Kerberos as the authentication mechanism, since Hive was expecting a Kerberos ticket from Knox.

@Ajay Thanks. I will try with ZooKeeper. Yes, I am able to connect to Hive via beeline, and I have HTTP as the transport mode.
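For anyone landing here, a hedged example of what a working beeline connection can look like in HTTP transport mode with Kerberos (host, port, and principal below are assumptions; httpPath=cliservice is the stock HiveServer2 default):

```bash
# Direct HiveServer2 connection over HTTP using a Kerberos ticket (run kinit first)
beeline -u "jdbc:hive2://hs2host.example.com:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@EXAMPLE.COM"
```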
09-14-2017 08:22 PM
When I give the Hive ODBC configuration as follows, I am getting this error:

[Hortonworks][Hardy] (34) Error from server: Bad Status: HTTP/1.1 500 Server Error

Configuration:
- Service discovery mode: No Service Discovery
- Host: Knox gateway host
- Port: 8443 (Knox port)
- Authentication mechanism: Username and Password
- Thrift transport: HTTP
Labels:
- Apache Hive
09-07-2017 05:52 PM
Here's the recommendation from a Hive SME:

You should start by checking off the typical recommendations: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_hive-performance-tuning/bk_hive-performance-tuning.pdf

Especially partitioning: depending on how you are accessing your datetime field, you may not benefit at all from partition pruning. A safe, proven path is to partition by date and use either an explicit partition-key filter or a dimension lookup that allows Hive to infer partition keys from the datetime field.

I don't recall seeing any other BLOB-specific tuning techniques. Ideally you would lazy-load the BLOB only if the ID matches, but I don't believe there is a way to control that. One way to get closer to that is to keep the ID/datetime mapping in a separate table without the BLOBs; populating the list of datetimes (query 1) would be faster that way.

Other thoughts:
- Try Hive 2 (in HDP: enable LLAP), which has a bucket-pruning optimization; if you cluster by ID it will scan fewer files. I see you are on 2.3, but this could be an incentive to move.
- You may try experimenting with ORC stripe sizes.
- You might try compressing the BLOBs to speed the search for a specific ID (if it is a point lookup); the application would need to decompress them.

Long story short, only the two pruning options above are system-level optimizations; other than that, you are probably looking at dealing with this at the app layer.
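To make the two pruning recommendations concrete, a sketch under assumed names (the table names, bucket count, and stripe size below are illustrative, not prescriptive):

```sql
-- Slim id/datetime mapping table without the BLOB (speeds up query 1)
CREATE TABLE tableName_index (
  id BIGINT,
  `datetime` TIMESTAMP
)
STORED AS ORC;

-- BLOB table: partitioned by date for partition pruning, clustered by id
-- so that Hive 2 / LLAP bucket pruning scans fewer files
CREATE TABLE tableName_blobs (
  id BIGINT,
  `datetime` TIMESTAMP,
  content BINARY
)
PARTITIONED BY (dt DATE)
CLUSTERED BY (id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES (
  'transactional' = 'true',
  'orc.stripe.size' = '67108864'   -- experiment with stripe sizes
);
```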
09-07-2017 01:08 PM
Really useful scripts. They helped me with my MySQL environment. To prepare input.txt, I had to run the following and pass the result as input to the script:

$ mysql -u userName -p dbName -e "select user_name from x_user" > /tmp/input.txt
09-05-2017 08:29 PM
This is the access pattern:

SELECT datetime FROM tableName WHERE id = ?
SELECT content FROM tableName WHERE id = ? AND datetime = ?

One of the columns is a BLOB and the table is in ORC format (the table needs to be transactional). Because of the BLOB, the read times are high. Any recommendations on optimization? This is on HDP 2.3.

Thanks, Kiran
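For context, a sketch of what such a table definition can look like (column types and bucket count are assumptions; a transactional table on HDP 2.3 must be ORC and bucketed):

```sql
CREATE TABLE tableName (
  id BIGINT,
  `datetime` TIMESTAMP,
  content BINARY              -- the BLOB column that inflates read times
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');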
Labels:
- Apache Hive
08-29-2017 07:35 PM
If the cluster is Kerberized, use kadmin to create a principal and the keytab file. Once the keytab is created, add one rule/line for this user in auth_to_local. Ensure the HttpFS proxyuser config is present, and then update the HttpFS conf file to point at the keytab.
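A minimal sketch of the kadmin part, assuming an httpfs service user (the admin principal, host, realm, and keytab path below are all placeholders):

```bash
# Create the principal and export its keytab
kadmin -p admin/admin -q "addprinc -randkey httpfs/host.example.com@EXAMPLE.COM"
kadmin -p admin/admin -q "ktadd -k /etc/security/keytabs/httpfs.service.keytab httpfs/host.example.com@EXAMPLE.COM"

# Matching auth_to_local rule (one line in hadoop.security.auth_to_local):
#   RULE:[2:$1@$0](httpfs@EXAMPLE.COM)s/.*/httpfs/
```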
08-09-2017 01:48 PM
The recommended approach is to add another HiveServer2 instance on another machine. Increasing the thread count will help in the short term, but it is not the recommended solution.
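For the short-term option, the property involved is hive.server2.thrift.max.worker.threads in hive-site.xml (the value below is just an example):

```xml
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>1000</value> <!-- default is 500; raise cautiously -->
</property>
```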
08-08-2017 07:40 PM
This is a great article. I had trouble with the Ambari server being outside the private network (where all the master and data nodes are). There are two ways to fix this: bring the Ambari server into the private network (recommended), or make the network the agents are listening on the primary one. Basically, the hostname that the Ambari server resolves to should be the one the agents are communicating with.
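The agent side of that lives in /etc/ambari-agent/conf/ambari-agent.ini on each node (the hostname value below is an assumption; it should resolve to the Ambari server on the private network):

```ini
[server]
hostname=ambari-server.private.example.com
```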
06-21-2017 10:07 PM
If the table is partitioned and there are delta files (from updates, for example), I think MR works but Tez does not. You may have to run a compaction to convert the delta files into base files; then Tez will work.
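A hedged example of triggering that compaction (the table name and partition spec are assumptions):

```sql
ALTER TABLE tableName PARTITION (dt = '2017-06-01') COMPACT 'major';
-- Monitor until the delta files have been rewritten into base files
SHOW COMPACTIONS;
```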