Member since: 09-26-2016
Posts: 29
Kudos Received: 0
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 7709 | 09-15-2017 12:06 PM |
| | 1597 | 09-07-2017 05:52 PM |
09-15-2017 03:54 PM
Some other steps taken:

1. Create a knox.crt file.
2. Place it in the cacerts folder in /etc/pki/java.
3. Download this cert to the machine where the ODBC driver needs to be configured.
4. Give this path in the SSL options of the ODBC configuration and check the 'Enable SSL' box.
5. Set Kerberos as the authentication mechanism in the ODBC configuration.
6. Set the Knox URL and port as the host and port in the ODBC configuration.
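For reference, a minimal sketch of one way to do step 2, assuming the default cacerts password (the alias and paths are assumptions):

```bash
# Import knox.crt into the Java cacerts truststore
keytool -import -alias knox -file knox.crt \
        -keystore /etc/pki/java/cacerts -storepass changeit
```

And a matching odbc.ini entry covering steps 4-6 (key names per the Simba-based Hortonworks Hive ODBC driver; the DSN name, driver path, host, and cert path are all assumptions):

```ini
[KnoxHive]
# Steps 4-6: SSL with the downloaded cert, Kerberos, Knox host/port
Driver=/usr/lib/hive/lib/native/Linux-amd64-64/libhortonworkshiveodbc64.so
Host=knoxhost.example.com
Port=8443
HiveServerType=2
ThriftTransport=2
HttpPath=gateway/default/hive
AuthMech=1
SSL=1
TrustedCerts=/path/to/knox.crt
```

Here ThriftTransport=2 selects HTTP and AuthMech=1 selects Kerberos in that driver.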
09-15-2017 12:06 PM
@Geoffrey Shelton Okot Thanks, Geoffrey. I got it to work: I used Kerberos as the authentication mechanism, since Hive was expecting a Kerberos ticket from Knox.

@Ajay Thanks. I will try with ZooKeeper. Yes, I am able to connect to Hive via beeline, and I have HTTP as the transport mode.
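For anyone landing here, a hedged example of what a working beeline connection can look like in HTTP transport mode with Kerberos (host, port, and principal below are assumptions; httpPath=cliservice is the stock HiveServer2 default):

```bash
# Direct HiveServer2 connection over HTTP using a Kerberos ticket (run kinit first)
beeline -u "jdbc:hive2://hs2host.example.com:10001/default;transportMode=http;httpPath=cliservice;principal=hive/_HOST@EXAMPLE.COM"
```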
09-14-2017 08:22 PM
When I give the Hive ODBC configuration as follows, I am getting this error:

[Hortonworks][Hardy] (34) Error from server: Bad Status: HTTP/1.1 500 Server Error

Configuration:
- Service discovery mode: No Service Discovery
- Host: Knox gateway host
- Port: 8443 (Knox port)
- Authentication mechanism: Username and Password
- Thrift transport: HTTP
Labels:
- Apache Hive
09-07-2017 05:52 PM
Here's the recommendation from a Hive SME:

You should start by checking off the typical recommendations: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_hive-performance-tuning/bk_hive-performance-tuning.pdf

Especially partitioning: depending on how you are accessing your datetime field, you may not benefit at all from partition pruning. A safe, proven path is to partition by date and use either an explicit partition-key filter or a dimension lookup that allows Hive to infer partition keys from the datetime field.

I don't recall seeing any other BLOB-specific tuning techniques. Ideally you would lazy-load the BLOB only if the ID matches, but I don't believe there is a way to control that. One way to get closer to that is to keep the ID/datetime mapping in a separate table without the BLOBs; populating the list of datetimes (query 1) would be faster that way.

Other thoughts:
- Try Hive 2 (in HDP: enable LLAP), which has a bucket-pruning optimization; if you cluster by ID it will scan fewer files. I see you are on 2.3, but this could be an incentive to move.
- You may try experimenting with ORC stripe sizes.
- You might try compressing the BLOBs to speed the search for a specific ID (if it is a point lookup); the application would need to decompress them.

Long story short, only the two pruning options above are system-level optimizations; other than that, you are probably looking at dealing with this at the app layer.
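To make the two pruning recommendations concrete, a sketch under assumed names (the table names, bucket count, and stripe size below are illustrative, not prescriptive):

```sql
-- Slim id/datetime mapping table without the BLOB (speeds up query 1)
CREATE TABLE tableName_index (
  id BIGINT,
  `datetime` TIMESTAMP
)
STORED AS ORC;

-- BLOB table: partitioned by date for partition pruning, clustered by id
-- so that Hive 2 / LLAP bucket pruning scans fewer files
CREATE TABLE tableName_blobs (
  id BIGINT,
  `datetime` TIMESTAMP,
  content BINARY
)
PARTITIONED BY (dt DATE)
CLUSTERED BY (id) INTO 16 BUCKETS
STORED AS ORC
TBLPROPERTIES (
  'transactional' = 'true',
  'orc.stripe.size' = '67108864'   -- experiment with stripe sizes
);
```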
09-07-2017 01:08 PM
Really useful scripts. They helped me with my MySQL environment. To prepare input.txt, I had to run the following and pass the result as input to the script:

$ mysql -u userName -p dbName -e "select user_name from x_user" > /tmp/input.txt
09-05-2017 08:29 PM
This is the access pattern:

SELECT datetime FROM tableName WHERE id = ?
SELECT content FROM tableName WHERE id = ? AND datetime = ?

One of the columns is a BLOB and the table is in ORC format (the table needs to be transactional). Because of the BLOB, the read times are high. Any recommendations on optimization? This is on HDP 2.3.

Thanks, Kiran
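For context, a sketch of what such a table definition can look like (column types and bucket count are assumptions; a transactional table on HDP 2.3 must be ORC and bucketed):

```sql
CREATE TABLE tableName (
  id BIGINT,
  `datetime` TIMESTAMP,
  content BINARY              -- the BLOB column that inflates read times
)
CLUSTERED BY (id) INTO 8 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional' = 'true');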
Labels:
- Apache Hive
08-29-2017 07:35 PM
If the cluster is Kerberized, use kadmin to create a principal and the keytab file. Once the keytab is created, add one rule/line for this user in auth_to_local. Ensure the HttpFS proxyuser config is present, and then update the HttpFS conf file to point at the keytab.
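A minimal sketch of the kadmin part, assuming an httpfs service user (the admin principal, host, realm, and keytab path below are all placeholders):

```bash
# Create the principal and export its keytab
kadmin -p admin/admin -q "addprinc -randkey httpfs/host.example.com@EXAMPLE.COM"
kadmin -p admin/admin -q "ktadd -k /etc/security/keytabs/httpfs.service.keytab httpfs/host.example.com@EXAMPLE.COM"

# Matching auth_to_local rule (one line in hadoop.security.auth_to_local):
#   RULE:[2:$1@$0](httpfs@EXAMPLE.COM)s/.*/httpfs/
```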
08-09-2017 01:48 PM
The recommended approach is to add another HiveServer2 instance on another machine. Increasing the thread count will help in the short term, but it is not the recommended solution.
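For the short-term option, the property involved is hive.server2.thrift.max.worker.threads in hive-site.xml (the value below is just an example):

```xml
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>1000</value> <!-- default is 500; raise cautiously -->
</property>
```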
08-08-2017 07:40 PM
This is a great article. I had trouble with the Ambari server being outside the private network (where all the master and data nodes are). There are two ways to fix this: bring the Ambari server into the private network (recommended), or make the network the agents are listening on the primary one. Basically, the hostname that the Ambari server resolves to should be the one the agents are communicating with.
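The agent side of that lives in /etc/ambari-agent/conf/ambari-agent.ini on each node (the hostname value below is an assumption; it should resolve to the Ambari server on the private network):

```ini
[server]
hostname=ambari-server.private.example.com
```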
06-21-2017 10:07 PM
If the table is partitioned and there are delta files (from updates, for example), I think MR works but Tez does not. You may have to run a compaction to convert the delta files into base files; then Tez will work.
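A hedged example of triggering that compaction (the table name and partition spec are assumptions):

```sql
ALTER TABLE tableName PARTITION (dt = '2017-06-01') COMPACT 'major';
-- Monitor until the delta files have been rewritten into base files
SHOW COMPACTIONS;
```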