Member since: 09-26-2016
			
      
- Posts: 29
- Kudos Received: 0
- Solutions: 2

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 8849 | 09-15-2017 12:06 PM |
| | 2032 | 09-07-2017 05:52 PM |
			
    
	
		
		
07-28-2020 11:31 PM

"I highly recommend skimming quickly over the following slides, especially starting from slide 7: http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey" — the slides are no longer available at that path.
			
    
	
		
		
09-17-2019 11:02 AM

Hi @dstreev, thanks for your article. Correct me if I'm wrong, but couldn't the same be done using the Knox service, which comes with HDP by default? Or does this service offer some extra feature? Regards, Gerard
			
    
	
		
		
09-07-2017 05:52 PM

Here's the recommendation from a Hive SME:

Start by checking off the typical recommendations: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.5/bk_hive-performance-tuning/bk_hive-performance-tuning.pdf

Pay special attention to partitioning: depending on how you access your datetime field, you may not benefit from partition pruning at all. A safe, proven path is to partition by date and use either an explicit partition-key filter or a dimension lookup that allows Hive to infer partition keys from the datetime field.

I don't recall any other BLOB-specific tuning techniques. Ideally you would lazy-load the BLOB only when the ID matches, but I don't believe there is a way to control that. One way to get closer is to keep the ID/datetime mapping in a separate table without the BLOBs; populating the list of datetimes (query 1) would be faster that way.

Other thoughts:

- Try Hive 2 (in HDP: enable LLAP), which has a bucket-pruning optimization; if you cluster by ID it will scan fewer files. I see you are on 2.3, but this could be an incentive to move.
- Experiment with ORC stripe sizes.
- Try compressing the BLOBs to speed up the search for a specific ID (if it is a point lookup); the application would then need to decompress them.

Long story short, only the two pruning options above are system-level optimizations; beyond that you are probably looking at dealing with this at the application layer.
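The partitioning and table-split advice above might look like the following HiveQL sketch. All table and column names (`events`, `events_index`, `event_id`, `event_ts`, `payload`, `event_date`) are hypothetical, as is the bucket count:

```sql
-- Hypothetical layout: BLOBs live in a date-partitioned table so a
-- query with an explicit partition-key filter scans only the relevant days.
CREATE TABLE events (
  event_id  BIGINT,
  event_ts  TIMESTAMP,
  payload   BINARY                          -- the large BLOB
)
PARTITIONED BY (event_date STRING)
CLUSTERED BY (event_id) INTO 32 BUCKETS     -- bucket pruning on Hive 2 / LLAP
STORED AS ORC;

-- Separate ID/datetime mapping without the BLOBs, so listing datetimes
-- ("query 1") never touches the large column.
CREATE TABLE events_index (
  event_id BIGINT,
  event_ts TIMESTAMP
)
PARTITIONED BY (event_date STRING)
STORED AS ORC;

-- Point lookup with an explicit partition filter:
SELECT payload
FROM events
WHERE event_date = '2017-09-01' AND event_id = 12345;
```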
			
    
	
		
		
08-09-2017 01:48 PM

The recommended approach is to add another HiveServer2 instance on another machine. Increasing the thread count will help in the short term, but it is not the recommended solution.
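For reference, the short-term thread-count tweak mentioned above corresponds to the `hive.server2.thrift.max.worker.threads` property in hive-site.xml (the value below is purely illustrative):

```xml
<!-- hive-site.xml: raises the cap on concurrent HiveServer2 Thrift worker
     threads. Illustrative value only; adding a second HiveServer2
     instance remains the recommended fix. -->
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>1000</value>
</property>
```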
			
    
	
		
		
05-22-2019 04:27 AM

It looks like a Tez issue that comes from the "fs.permissions.umask-mode" setting. See https://community.hortonworks.com/questions/246302/hive-tez-vertex-failed-error-during-reduce-phase-h.html
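For context, `fs.permissions.umask-mode` is set in core-site.xml. A sketch with the common default value is below; an overly restrictive umask (e.g. `077`) is what the linked thread associates with the Tez vertex failures:

```xml
<!-- core-site.xml: the umask applied to files newly created in HDFS.
     022 is the common default; a more restrictive value can trigger
     the Tez failures described in the linked thread. -->
<property>
  <name>fs.permissions.umask-mode</name>
  <value>022</value>
</property>
```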
			
    
	
		
		
11-30-2017 07:55 AM

On HDP 2.3 with Hive 1.2, hive.enforce.bucketing defaults to true, so what is the need to set it?
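For reference, the current value of the setting can be checked, and overridden per session if needed, from the Hive CLI or Beeline:

```sql
-- Print the current value of the setting for this session:
SET hive.enforce.bucketing;

-- Override it explicitly for the session (harmless if already true):
SET hive.enforce.bucketing=true;
```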
			
    
	
		
		
05-03-2017 04:21 PM

Very good article, Rahul. Quick question: does the table have to be partitioned? I'm trying to replicate a non-partitioned table with the UI and I'm getting this exception: default/FalconWebException:FalconException:java.net.URISyntaxException:Partition Details are missing. How can I replicate this table using the UI?
			
    
	
		
		
08-27-2018 08:30 PM

The article doesn't indicate this, so for reference: the listed HDFS settings do not exist by default. They need to go into hdfs-site.xml, which in Ambari is done by adding fields under "Custom hdfs-site":

dfs.namenode.rpc-bind-host=0.0.0.0
dfs.namenode.servicerpc-bind-host=0.0.0.0
dfs.namenode.http-bind-host=0.0.0.0
dfs.namenode.https-bind-host=0.0.0.0

Additionally, I found that after making this change, both NameNodes under HA came up as standby; the article at https://community.hortonworks.com/articles/2307/adding-a-service-rpc-port-to-an-existing-ha-cluste.html gave me the missing step of running a ZooKeeper format.

I have not tested the steps below against a production cluster, and if you foolishly choose to follow them, you do so at a very large degree of risk (you could lose all of the data in your cluster). That said, this worked for me in a non-production environment:

1. Note the Active NameNode.
2. In Ambari, stop ALL services except ZooKeeper.
3. In Ambari, make the indicated changes to HDFS.
4. Open a command line on the Active NameNode (see step 1).
5. At that command line, run: `sudo -u hdfs hdfs zkfc -formatZK`
6. Start the JournalNodes.
7. Start the ZKFCs.
8. Start the NameNodes, which should come up as Active and Standby. If they don't, you're on your own (see the "high risk" caveat above).
9. Start the DataNodes.
10. Restart/refresh any remaining HDFS components with stale configs.
11. Start the remaining cluster services.

It would be great if HWX could vet my procedure and update the article accordingly (hint, hint).
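The four bind-host properties above, written out as the hdfs-site.xml fragment that Ambari generates from the "Custom hdfs-site" fields:

```xml
<!-- hdfs-site.xml: bind the NameNode RPC/service-RPC/HTTP/HTTPS endpoints
     to all interfaces. These properties are absent by default. -->
<property>
  <name>dfs.namenode.rpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.http-bind-host</name>
  <value>0.0.0.0</value>
</property>
<property>
  <name>dfs.namenode.https-bind-host</name>
  <value>0.0.0.0</value>
</property>
```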