Member since 01-09-2019

- 401 Posts
- 163 Kudos Received
- 80 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2598 | 06-21-2017 03:53 PM |
| | 4306 | 03-14-2017 01:24 PM |
| | 2396 | 01-25-2017 03:36 PM |
| | 3841 | 12-20-2016 06:19 PM |
| | 2103 | 12-14-2016 05:24 PM |
05-25-2016 02:41 PM

This was a case of a corrupt pig.tar.gz in the HDFS /hdp/apps/<version>/pig folder. I am not sure how a corrupt copy ended up there on a fresh Ambari-based install, but once I manually replaced it with the pig.tar.gz from /usr/hdp/<version>/pig/, the error was resolved. The confusing part is that the Pig view throws a completely unrelated error (File does not exist at /user/rmutyal/pig/jobs/test_23-05-2016-14-46-54/stdout).
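A minimal sketch of that manual replacement, run as the HDFS superuser; the ownership and permission values below are assumptions, so mirror whatever the existing file in /hdp/apps uses:

```
# Sketch: overwrite the corrupt archive in HDFS with the local copy from the node.
# <version> is your HDP stack build directory; chown/chmod values are assumptions.
sudo -u hdfs hdfs dfs -put -f /usr/hdp/<version>/pig/pig.tar.gz /hdp/apps/<version>/pig/
sudo -u hdfs hdfs dfs -chown hdfs:hadoop /hdp/apps/<version>/pig/pig.tar.gz
sudo -u hdfs hdfs dfs -chmod 444 /hdp/apps/<version>/pig/pig.tar.gz
```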
05-23-2016 05:36 PM

If you enabled HA failover from Ambari, failover is automatic by default. MapReduce jobs won't fail during a failover scenario.
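If you want to confirm that on your cluster, a quick check might look like the sketch below; nn1/nn2 are assumed NameNode service IDs, so substitute the ones from dfs.ha.namenodes.<nameservice>:

```
# Sketch: confirm automatic failover is enabled and see which NameNode is active.
# nn1/nn2 are assumed service IDs -- take yours from dfs.ha.namenodes.<nameservice>.
hdfs getconf -confKey dfs.ha.automatic-failover.enabled
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
```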
05-23-2016 04:12 PM

No luck with that. This is a cluster with HTTPS configured for Ambari.
05-23-2016 03:43 PM

Please check whether the proxy users are properly set: https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.2.0/bk_ambari_views_guide/content/_setup_HDFS_proxy_user.html
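As a quick sanity check from any cluster node, you can read the proxy-user entries back out of the client configuration; "ambariid" is the account running ambari-server in this thread, so substitute whatever user your Ambari server runs as:

```
# Sketch: verify the proxy-user settings that the Ambari views rely on.
# "ambariid" is assumed from this thread -- use your Ambari server's run-as user.
hdfs getconf -confKey hadoop.proxyuser.ambariid.hosts    # expect the Ambari host(s) or *
hdfs getconf -confKey hadoop.proxyuser.ambariid.groups   # expect the allowed groups or *
```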
05-23-2016 02:51 PM

Pig jobs work from the gateway node but fail from the Ambari view. I cross-checked the proxy-user configs, and I have ambariid (the user running ambari-server) configured there. The error from the RM shows this:

AM Container for appattempt_1463770749228_0048_000001 exited with exitCode: -1000
For more detailed output, check application tracking page: http://<hostname>:8088/cluster/app/application_1463770749228_0048 Then, click on links to logs of each attempt.
Diagnostics: ExitCodeException exitCode=2:
gzip: /grid/6/hadoop/yarn/local/filecache/33_tmp/tmp_pig.tar.gz: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Failing this attempt

The error from the Pig view is "File does not exist: /user/rmutyal/pig/jobs/test_23-05-2016-14-46-54/stdout", but the WebHCat user configuration looks alright. What am I missing?

Labels:
- Apache Ambari
- Apache Hive
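The gzip/tar diagnostics point at a truncated archive in the YARN filecache, which is localized from HDFS, so one way to check whether the copy in HDFS is itself corrupt (the scratch directory and <version> below are placeholders):

```
# Sketch: pull the staged archive out of HDFS and test whether it extracts cleanly.
mkdir -p /tmp/pig-check && cd /tmp/pig-check
hdfs dfs -get /hdp/apps/<version>/pig/pig.tar.gz .
tar -tzf pig.tar.gz > /dev/null && echo "archive OK" || echo "archive corrupt"
```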
05-23-2016 01:29 PM
2 Kudos

You don't need to restart the HDP 2.4 cluster, but it is recommended to decommission the node with the dead disk, change the disk, and add the node back to the cluster. This ensures that data is evenly distributed across all the data disks on that node.

1. To decommission, go to Ambari -> Hosts -> DataNode, which has an option to decommission.
2. Decommission the NodeManager on that host the same way.
3. Once the host reaches the decommissioned state, stop the DataNode and NodeManager on it and replace the disk.
4. Start the DataNode and NodeManager back up.
5. You will see a recommission option in the same place; click it to take the host out of the decommissioned state. (You can confirm the state from the command line, as in the sketch below.)

No other services across the cluster need to be stopped, and if you have more than 3 DataNodes and your default replication factor is 3, all services will continue to run.
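A minimal sketch for confirming the DataNode and NodeManager states from a shell during the disk swap, assuming CLI access on a cluster node:

```
# Sketch: check decommission status while swapping the disk.
sudo -u hdfs hdfs dfsadmin -report | grep "Decommission Status"
yarn node -list -all    # decommissioned NodeManagers show up here alongside RUNNING ones
```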
05-22-2016 05:56 PM

Hadoop is a distributed filesystem plus distributed compute, so you can store and process any kind of data. A lot of examples point to CSV and DB imports because they are the most common use cases.

Here is how each of the data types you listed can be stored and processed in Hadoop; you can find examples in blogs and public repos.

1. CSV: like you said, you will see a lot of examples, including in our sandbox tutorials.
2. doc: you can put raw 'doc' documents into HDFS and use Tika or Tesseract to extract text / do OCR on them.
3. Audio and video: you can again put the raw data in HDFS; processing depends on what you want to do with it, for example extracting metadata from it using YARN.
4. Relational DB: take a look at Sqoop examples for how to ingest relational DBs into HDFS and use Hive/HCatalog to access the data (a sketch follows below).
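A minimal Sqoop sketch for point 4; every connection detail (host, database, table, credentials, target directory) is a placeholder to replace with your own:

```
# Sketch: ingest one relational table into HDFS with Sqoop. All values are placeholders.
sqoop import \
  --connect jdbc:mysql://<db-host>/<database> \
  --username <db-user> -P \
  --table <table-name> \
  --target-dir /user/<your-user>/<table-name> \
  --num-mappers 4
```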
05-21-2016 01:50 AM

If this is a production cluster and you are on support, I suggest opening a support ticket, since any tweaks here can lead to data loss. Before you move further, please take a backup of the NameNode metadata and of the edits from the JournalNodes.
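A minimal backup sketch; the local directories below are assumed HDP defaults, so read the real ones from dfs.namenode.name.dir and dfs.journalnode.edits.dir first:

```
# Sketch: back up NameNode metadata and JournalNode edits before changing anything.
# Confirm the real directories first -- the paths below are assumptions.
hdfs getconf -confKey dfs.namenode.name.dir
hdfs getconf -confKey dfs.journalnode.edits.dir

# On the active NameNode host: grab a fresh fsimage plus the whole name directory.
sudo -u hdfs hdfs dfsadmin -fetchImage /tmp/nn-backup/
tar -czf /tmp/nn-backup/namedir.tar.gz -C /hadoop/hdfs/namenode .

# On each JournalNode host: archive the edits directory.
tar -czf /tmp/jn-edits-backup.tar.gz -C /hadoop/hdfs/journal .
```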
05-20-2016 08:21 PM
1 Kudo

If you are looking for open-source volume-level encryption tools, we have seen LUKS being used; there will be some overhead from LUKS. You can take a look at LUKS at https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/7/html/Security_Guide/sec-Encryption.html
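For illustration only, a minimal LUKS sketch for a brand-new data disk; the device name and mount point are placeholders, and luksFormat destroys whatever is on the device:

```
# Sketch: encrypt an empty data disk with LUKS and mount it for use as a data dir.
# WARNING: luksFormat wipes the device. <data-disk> and /grid/1 are placeholders.
cryptsetup luksFormat /dev/<data-disk>
cryptsetup luksOpen /dev/<data-disk> hdfsdata1
mkfs.ext4 /dev/mapper/hdfsdata1
mkdir -p /grid/1 && mount /dev/mapper/hdfsdata1 /grid/1
```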
05-20-2016 02:29 PM
1 Kudo

You can increase the memory on your mappers. Take a look at mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, mapreduce.map.java.opts and mapreduce.reduce.java.opts. I think your mapreduce.map.memory.mb is set to 256 MB based on the error. I don't know what else is running on your 3 GB node or how much heap it is given, but you may be able to allocate 1 GB of it to YARN (container memory). It is also possible to get the job to run only on the 15 GB node by using node labels. You can also switch off the NodeManager on the 3 GB node if other processes are running on it, so that the job uses the 15 GB node.
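A minimal per-job sketch for raising these values without touching cluster defaults; the 1024 MB container size and roughly 80% heap are assumptions sized for the smaller node, and the stock wordcount example is used just to show where the -D options go:

```
# Sketch: override map/reduce memory for one job via generic -D options.
# 1024 MB containers with ~80% heap (-Xmx819m) are assumed values -- tune to your nodes.
hadoop jar /usr/hdp/current/hadoop-mapreduce-client/hadoop-mapreduce-examples.jar wordcount \
  -D mapreduce.map.memory.mb=1024 \
  -D mapreduce.map.java.opts=-Xmx819m \
  -D mapreduce.reduce.memory.mb=1024 \
  -D mapreduce.reduce.java.opts=-Xmx819m \
  /input /output
```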