Member since: 09-26-2016
Posts: 29
Kudos Received: 0
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7736 | 09-15-2017 12:06 PM
 | 1615 | 09-07-2017 05:52 PM
07-28-2020
11:31 PM
The article says: "I highly recommend skimming quickly over the following slides, especially starting from slide 7. http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey" That slide deck is no longer available at that path.
09-17-2019
11:02 AM
Hi @dstreev, thanks for your article. Correct me if I'm wrong, but couldn't the same be done using the Knox service that ships by default with HDP? Is that correct, or does this approach offer some extra feature beyond Knox? Regards, Gerard
09-07-2017
05:52 PM
Here's the recommendation from a Hive SME:

You should start by checking off the typical recommendations: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_hive-performance-tuning/bk_hive-performance-tuning.pdf

Especially partitioning: depending on how you access your datetime field, you may not benefit at all from partition pruning. A safe, proven path is to partition by date and use either an explicit partition-key filter or a dimension lookup that allows Hive to infer partition keys from the datetime field.

I don't recall seeing any other BLOB-specific tuning techniques. Ideally you would lazy-load the BLOB only if the ID matches, but I don't believe there is a way to control that. One way to get closer is to keep the ID/datetime mapping in a separate table without the BLOBs; populating the list of datetimes (query 1) would be faster that way.

Other thoughts:
- Try Hive 2 (in HDP: enable LLAP), which has a bucket-pruning optimization; if you cluster by ID it will scan fewer files. I see you are on 2.3, but this could be an incentive to move.
- Experiment with ORC stripe sizes.
- Try compressing the BLOBs to speed the search for a specific ID (if it is a point lookup). The application would then need to decompress them.

Long story short: only the two pruning options above are system-level optimizations. Beyond that, you are probably looking at dealing with this at the app layer.
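The layout described above can be sketched in Hive DDL. This is an illustrative sketch only; table and column names (`event_index`, `event_blobs`, `payload`, the bucket count) are hypothetical, not from the original thread:

```sql
-- Narrow mapping table: lets "query 1" (find datetimes for an ID)
-- run without touching the BLOBs at all.
CREATE TABLE event_index (
  id BIGINT,
  event_ts TIMESTAMP
)
PARTITIONED BY (event_date STRING)
STORED AS ORC;

-- Wide table holding the BLOBs, clustered by id so that Hive 2 / LLAP
-- bucket pruning can skip files on a point lookup by ID.
CREATE TABLE event_blobs (
  id BIGINT,
  event_ts TIMESTAMP,
  payload BINARY
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS ORC;

-- Explicit partition-key filter, so the planner can prune partitions
-- instead of scanning every date.
SELECT event_ts
FROM event_index
WHERE id = 42
  AND event_date BETWEEN '2017-09-01' AND '2017-09-07';
```

The point of the two-table split is that the hot query path (ID to datetimes) never reads the wide ORC stripes containing the BLOBs.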
08-09-2017
01:48 PM
The recommended approach is to add another Hiveserver2 on another machine. Increasing the thread count will help in the short term, but is not the recommended solution.
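For reference, clients typically reach multiple HiveServer2 instances on HDP through ZooKeeper-based dynamic service discovery rather than a single host. The hostnames below are placeholders; the namespace value must match your hive-site.xml:

```
# Illustrative JDBC URL: with HiveServer2 dynamic service discovery
# enabled, clients connect via the ZooKeeper ensemble and are routed
# to any registered HiveServer2 instance.
jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
```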
05-22-2019
04:27 AM
Looks like it's a Tez issue that comes from the "fs.permissions.umask-mode" setting. https://community.hortonworks.com/questions/246302/hive-tez-vertex-failed-error-during-reduce-phase-h.html
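For context, a hedged sketch of the setting in question. An overly restrictive umask (e.g. 077) can leave Tez scratch/staging files unreadable to other services; 022 is the common default, but verify the value against your own security policy before changing it:

```xml
<!-- core-site.xml fragment (illustrative) -->
<property>
  <name>fs.permissions.umask-mode</name>
  <value>022</value>
</property>
```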
11-30-2017
07:55 AM
On HDP 2.3 with Hive 1.2, hive.enforce.bucketing already defaults to true. What is the need to set it explicitly?
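For anyone wondering what the setting actually does, here is an illustrative sketch (`t_bucketed` and `t_source` are hypothetical names). hive.enforce.bucketing controls whether Hive automatically arranges reducers so that inserts honor a table's CLUSTERED BY spec; if it already defaults to true on your version, setting it again is redundant but harmless:

```sql
SET hive.enforce.bucketing = true;

CREATE TABLE t_bucketed (id INT, val STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC;

-- With enforcement on, this insert distributes rows into the 4 buckets
-- without the writer having to set the reducer count manually.
INSERT INTO TABLE t_bucketed SELECT id, val FROM t_source;
```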
05-03-2017
04:21 PM
Very good article, Rahul. Quick question: does the table have to be partitioned? I'm trying to replicate a non-partitioned table with the UI and I'm getting an exception: default/FalconWebException:FalconException:java.net.URISyntaxException:Partition Details are missing. How can I replicate this table using the UI?
08-27-2018
08:30 PM
The article doesn't indicate this, so for reference: the listed HDFS settings do not exist by default. These settings need to go into hdfs-site.xml, which in Ambari is done by adding fields under "Custom hdfs-site":

dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.namenode.http-bind-host=0.0.0.0
dfs.namenode.https-bind-host=0.0.0.0

Additionally, I found that after making this change, both NameNodes under HA came up as standby. The article at https://community.hortonworks.com/articles/2307/adding-a-service-rpc-port-to-an-existing-ha-cluste.html got me the missing step of running a ZK format. I have not tested the steps below against a production cluster, and if you choose to follow them, you do so at a very large degree of risk (you could lose all of the data in your cluster). That said, this worked for me in a non-prod environment:

01) Note the Active NameNode.
02) In Ambari, stop ALL services except for ZooKeeper.
03) In Ambari, make the indicated changes to HDFS.
04) Get to the command line on the Active NameNode (see Step 1 above).
05) At the command line you opened in Step 4, run: `sudo -u hdfs hdfs zkfc -formatZK`
06) Start the JournalNodes.
07) Start the ZKFCs.
08) Start the NameNodes, which should come up as Active and Standby. If they don't, you're on your own (see the "high risk" caveat above).
09) Start the DataNodes.
10) Restart/refresh any remaining HDFS components that have stale configs.
11) Start the remaining cluster services.

It would be great if HWX could vet my procedure and update the article accordingly (hint, hint).
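The command-line portion of the procedure above can be sketched as follows. The same warning applies: non-production use only. The NameNode IDs (`nn1`, `nn2`) are illustrative and must match the IDs in your own hdfs-site.xml:

```
# Step 1: identify which NameNode is Active before stopping services.
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2

# Step 5: reformat the ZKFC znode from the (previously) Active NameNode.
# This destroys the existing HA failover state in ZooKeeper.
sudo -u hdfs hdfs zkfc -formatZK

# After restarting JournalNodes, ZKFCs, and NameNodes via Ambari,
# confirm one NameNode reports "active" and the other "standby".
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2
```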