Member since: 09-26-2016
Posts: 29
Kudos Received: 0
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 7736 | 09-15-2017 12:06 PM
 | 1615 | 09-07-2017 05:52 PM
07-28-2020
11:31 PM
The article says: "I highly recommend skimming quickly over the following slides, especially starting from slide 7. http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey" That slide deck is no longer available at that path.
09-17-2019
11:02 AM
Hi @dstreev, thanks for your article. Correct me if I'm wrong, but couldn't the same be done using the Knox service that ships by default with HDP? Is that correct, or does this approach offer some extra feature beyond Knox? Regards, Gerard
09-07-2017
05:52 PM
Here's the recommendation from a Hive SME:

You should start by checking off the typical recommendations: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_hive-performance-tuning/bk_hive-performance-tuning.pdf

Especially partitioning: depending on how you access your datetime field, you may not benefit at all from partition pruning. A safe, proven path is to partition by date and use either an explicit partition-key filter or a dimension lookup that allows Hive to infer partition keys from the datetime field.

I don't recall seeing any other BLOB-specific tuning techniques. Ideally you would lazy-load the BLOB only if the ID matches, but I don't believe there is a way to control that. One way to get closer is to keep the ID/datetime mapping in a separate table without the BLOBs; populating the list of datetimes (query 1) would be faster that way.

Other thoughts:
- Try Hive 2 (in HDP: enable LLAP), which has a bucket-pruning optimization; if you cluster by ID it will scan fewer files. I see you are on 2.3, but this could be an incentive to move.
- Experiment with ORC stripe sizes.
- Try compressing the BLOBs to speed the search for a specific ID (if it is a point lookup). The application would then need to decompress them.

Long story short: only the two pruning options above are system-level optimizations. Beyond that, you are probably looking at dealing with this at the app layer.
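The layout described above can be sketched in Hive DDL. This is an illustrative sketch only; table and column names (`event_index`, `event_blobs`, `payload`, the bucket count) are hypothetical, not from the original thread:

```sql
-- Narrow mapping table: lets "query 1" (find datetimes for an ID)
-- run without touching the BLOBs at all.
CREATE TABLE event_index (
  id BIGINT,
  event_ts TIMESTAMP
)
PARTITIONED BY (event_date STRING)
STORED AS ORC;

-- Wide table holding the BLOBs, clustered by id so that Hive 2 / LLAP
-- bucket pruning can skip files on a point lookup by ID.
CREATE TABLE event_blobs (
  id BIGINT,
  event_ts TIMESTAMP,
  payload BINARY
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (id) INTO 32 BUCKETS
STORED AS ORC;

-- Explicit partition-key filter, so the planner can prune partitions
-- instead of scanning every date.
SELECT event_ts
FROM event_index
WHERE id = 42
  AND event_date BETWEEN '2017-09-01' AND '2017-09-07';
```

The point of the two-table split is that the hot query path (ID to datetimes) never reads the wide ORC stripes containing the BLOBs.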
08-09-2017
01:48 PM
The recommended approach is to add another Hiveserver2 on another machine. Increasing the thread count will help in the short term, but is not the recommended solution.
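For reference, clients typically reach multiple HiveServer2 instances on HDP through ZooKeeper-based dynamic service discovery rather than a single host. The hostnames below are placeholders; the namespace value must match your hive-site.xml:

```
# Illustrative JDBC URL: with HiveServer2 dynamic service discovery
# enabled, clients connect via the ZooKeeper ensemble and are routed
# to any registered HiveServer2 instance.
jdbc:hive2://zk1:2181,zk2:2181,zk3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
```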
05-22-2019
04:27 AM
Looks like it's a Tez issue that comes from the "fs.permissions.umask-mode" setting. https://community.hortonworks.com/questions/246302/hive-tez-vertex-failed-error-during-reduce-phase-h.html
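For context, a hedged sketch of the setting in question. An overly restrictive umask (e.g. 077) can leave Tez scratch/staging files unreadable to other services; 022 is the common default, but verify the value against your own security policy before changing it:

```xml
<!-- core-site.xml fragment (illustrative) -->
<property>
  <name>fs.permissions.umask-mode</name>
  <value>022</value>
</property>
```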
11-30-2017
07:55 AM
On HDP 2.3 with Hive 1.2, hive.enforce.bucketing already defaults to true. What is the need to set it explicitly?
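For anyone wondering what the setting actually does, here is an illustrative sketch (`t_bucketed` and `t_source` are hypothetical names). hive.enforce.bucketing controls whether Hive automatically arranges reducers so that inserts honor a table's CLUSTERED BY spec; if it already defaults to true on your version, setting it again is redundant but harmless:

```sql
SET hive.enforce.bucketing = true;

CREATE TABLE t_bucketed (id INT, val STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC;

-- With enforcement on, this insert distributes rows into the 4 buckets
-- without the writer having to set the reducer count manually.
INSERT INTO TABLE t_bucketed SELECT id, val FROM t_source;
```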
05-03-2017
04:21 PM
Very good article, Rahul. Quick question: does the table have to be partitioned? I'm trying to replicate a non-partitioned table with the UI and I'm getting an exception: default/FalconWebException:FalconException:java.net.URISyntaxException:Partition Details are missing. How can I replicate this table using the UI?
08-27-2018
08:30 PM
The article doesn't indicate this, so for reference: the listed HDFS settings do not exist by default. These settings need to go into hdfs-site.xml, which in Ambari is done by adding fields under "Custom hdfs-site":

dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.namenode.http-bind-host=0.0.0.0
dfs.namenode.https-bind-host=0.0.0.0

Additionally, I found that after making this change, both NameNodes under HA came up as standby. The article at https://community.hortonworks.com/articles/2307/adding-a-service-rpc-port-to-an-existing-ha-cluste.html got me the missing step of running a ZK format. I have not tested the steps below against a production cluster, and if you choose to follow them, you do so at a very large degree of risk (you could lose all of the data in your cluster). That said, this worked for me in a non-prod environment:

01) Note the Active NameNode.
02) In Ambari, stop ALL services except for ZooKeeper.
03) In Ambari, make the indicated changes to HDFS.
04) Get to the command line on the Active NameNode (see Step 1 above).
05) At the command line you opened in Step 4, run: `sudo -u hdfs hdfs zkfc -formatZK`
06) Start the JournalNodes.
07) Start the ZKFCs.
08) Start the NameNodes, which should come up as Active and Standby. If they don't, you're on your own (see the "high risk" caveat above).
09) Start the DataNodes.
10) Restart/refresh any remaining HDFS components that have stale configs.
11) Start the remaining cluster services.

It would be great if HWX could vet my procedure and update the article accordingly (hint, hint).
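The command-line portion of the procedure above can be sketched as follows. The same warning applies: non-production use only. The NameNode IDs (`nn1`, `nn2`) are illustrative and must match the IDs in your own hdfs-site.xml:

```
# Step 1: identify which NameNode is Active before stopping services.
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2

# Step 5: reformat the ZKFC znode from the (previously) Active NameNode.
# This destroys the existing HA failover state in ZooKeeper.
sudo -u hdfs hdfs zkfc -formatZK

# After restarting JournalNodes, ZKFCs, and NameNodes via Ambari,
# confirm one NameNode reports "active" and the other "standby".
sudo -u hdfs hdfs haadmin -getServiceState nn1
sudo -u hdfs hdfs haadmin -getServiceState nn2
```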