Member since 04-20-2016

10 Posts | 1 Kudos Received | 0 Solutions


03-01-2018 11:49 AM

Hi @Dominika Bialek and @stevel,

Thank you for your valuable inputs. The following set of configurations worked for me even for files of 1 TB each. The right values depend on the infrastructure: network bandwidth between nodes, upload bandwidth from the data nodes to S3, and so on. It took several iterations of stress tests (many small files, a small number of large files, etc.) to tune the fast buffer size, multipart size, and thresholds for my cluster's speeds.

-Dmapreduce.task.timeout=0 \
-Dfs.s3a.fast.upload=true \
-Dfs.s3a.fast.buffer.size=157286400 \
-Dfs.s3a.multipart.size=314572800 \
-Dfs.s3a.multipart.threshold=1073741824 \
-Dmapreduce.map.memory.mb=8192 \
-Dmapreduce.map.java.opts=-Xmx7290m \
-Dfs.s3a.max.total.tasks=1 \
-Dfs.s3a.threads.max=10 \
-bandwidth 1024

Thanks,
Surya
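
Not part of the original reply, but a rough sketch for reusing the same tuning reasoning: a hypothetical helper script that estimates how many parts a file of a given size produces for a chosen fs.s3a.multipart.size, assuming the common 10,000-part cap per multipart upload (as enforced by standard S3) also applies to the target store.

#!/usr/bin/env bash
# Hypothetical sanity check: estimate the multipart part count for a file.
# Usage: ./part_count.sh <file_size_bytes> <multipart_size_bytes>
FILE_SIZE=${1:?file size in bytes}
PART_SIZE=${2:?fs.s3a.multipart.size in bytes}

# Ceiling division: number of parts the upload is split into.
PARTS=$(( (FILE_SIZE + PART_SIZE - 1) / PART_SIZE ))
echo "Approximate parts: ${PARTS}"

# Many S3-compatible stores cap a multipart upload at 10,000 parts.
if [ "${PARTS}" -gt 10000 ]; then
  echo "WARNING: exceeds the common 10,000-part limit; raise fs.s3a.multipart.size."
fi

For example, a 1 TB file with the 300 MB multipart size above comes to roughly 3,500 parts, comfortably inside that limit.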
						
					
    
	
		
		
02-21-2018 06:58 AM

Hello,

I have a cluster running HDP 2.5.0.2 (HDFS 2.7.1.2.5) and I'm trying to distcp a large file (200 GB) from an on-premise cluster to NetApp S3 using fast upload. A map task is launched for each file, and each map task reports something like this:

Task: task_1518158406102_3329_m_000000
Status: (84.5% Copying hdfs://HDP1/user/backup/backup-test/2tb_of_200M_each/test_io_0 to s3a://hadoop-acceptance-tests/user/backup/backup-test/test_io_0 [169.1G/200.0G] > map)

Once the progress reaches 100% (200.0G/200.0G), it starts again from 0% (0%/200.0G). This repeats a few times.

Here is the command I used to trigger distcp:

hadoop distcp -Dfs.s3a.endpoint=s3.in.myhost:8082 -Dfs.s3a.access.key=XXXXXXXXX -Dfs.s3a.secret.key=YYYYYYYYYYY -Dfs.s3a.signing-algorithm=S3SignerType \
-Dfs.s3a.buffer.dir=<Local_path>/tmp-hadoop_backup \
-Dfs.s3a.fast.upload=true \
-Dfs.s3a.fast.buffer.size=1048576 \
-Dfs.s3a.multipart.size=10485760 \
-Dfs.s3a.multipart.threshold=10485760 \
-Dmapreduce.map.memory.mb=8192 \
-Dmapreduce.map.java.opts=-Xmx7360m \
-log /backup/log/distcp_2tb \
-m=300 -bandwidth 1024 \
-skipcrccheck \
-update hdfs:/user/backup/load-test/2tb_of_200M_each s3a://hadoop-acceptance-tests/user/backup/load-test/

Q1. What does it mean when each map task runs several times?
Q2. After running for several hours, the job fails without copying even a single file. It worked (copied) for files of 10 GB, 50 GB and 100 GB each.
Q3. Are there any recommendations or rules of thumb for tuning these configs based on input file size and upload speed to S3?

Thanks,
Surya
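
Not from the original question, but one generic way to dig into why a map attempt keeps restarting is to pull the aggregated task logs for the distcp job once it has an application id (the id below is a placeholder; substitute the one reported for your job, and note this requires YARN log aggregation to be enabled).

# Placeholder application id; substitute the one reported for your distcp job.
APP_ID=application_1518158406102_3329

# Dump the aggregated container logs and scan for errors
# (timeouts, multipart upload failures, etc.) around each restart.
yarn logs -applicationId "${APP_ID}" | grep -iE "error|exception|timed out" | tail -n 200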
						
					
		
			
				
						
Labels: Apache Hadoop

    
	
		
		
03-08-2017 11:26 AM

Dear Team,

An Ambari-managed HDP cluster is currently running perfectly fine. Due to a change of ownership of the infrastructure, the IP ranges, hostnames and domain names of the hosts are changing.

I have to apply these changes while the cluster keeps running as usual (or perhaps with minimal outages). Someone may have done this exercise before; please share any reference links and/or steps you have documented. Are there any scripts that already ship with the installation?

Thanks in advance. The following information may help:

Ambari version: 2.4.x
HDP version: 2.5
Kerberos: Not enabled
AD integrated: Yes
SSL certificates: None

Thanks,
Surya
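
Not from the original thread, but a hedged sketch of one documented route: recent Ambari 2.x releases ship an ambari-server update-host-names command that takes a JSON mapping of old to new hostnames. Verify the exact procedure against the documentation for your Ambari version before relying on it; the cluster and host names below are placeholders, and services and agents should be stopped beforehand.

# Run on the Ambari server host after the OS-level hostname/DNS changes are done.
ambari-server stop

# JSON mapping of old hostname -> new hostname, per cluster (placeholder names).
cat > /tmp/host_names_changes.json <<'EOF'
{
  "MyCluster": {
    "node1.old-domain.com": "node1.new-domain.com",
    "node2.old-domain.com": "node2.new-domain.com"
  }
}
EOF

# Apply the rename to Ambari's database, then bring the server back up.
ambari-server update-host-names /tmp/host_names_changes.json
ambari-server start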
						
					
		
			
				
						
Labels: Apache Hadoop

    
	
		
		
07-07-2016 04:41 AM

@Deepak Sharma, thank you for the quick reply.

I was trying to bring the GitHub code into an easily readable format in an Excel spreadsheet; for instance, I started with the service checks for the HDFS service. I'm trying to prepare the same for the rest of the services, so if anyone has already done this exercise, please save me the time. 🙂

Thanks,
Surya
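
Not part of the original reply, but a hedged pointer for building such a spreadsheet: each stack service in the Ambari source tree carries a service_check.py with the checks it runs, so listing those files gives the raw material. The branch name and paths below assume the common-services layout of Ambari 2.x; adjust for your version.

# Shallow-clone the Ambari source for the 2.4 line (branch name assumed).
git clone --depth 1 --branch branch-2.4 https://github.com/apache/ambari.git

# List the per-service check scripts that back "Run Service Check".
find ambari/ambari-server/src/main/resources/common-services \
     -name 'service_check.py' | sort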
						
					
			
    
	
		
		
07-04-2016 09:22 AM

Hi All,

I was running "Run Service Checks" from the service actions and got an overall status of success. I'm interested to know which checks are covered under this for each service.

Is there documentation on this, or has anyone listed them out? Please share the information.

Thanks,
Surya
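
Not from the original question, but a hedged illustration of what sits behind "Run Service Checks": each service check can also be triggered individually through the Ambari REST API, which makes the per-service commands (HDFS_SERVICE_CHECK and the like) visible as their own requests. The host, cluster name and credentials below are placeholders.

# Trigger just the HDFS service check via the Ambari REST API
# (placeholders for the Ambari host, cluster name and admin credentials).
curl -u admin:admin -H 'X-Requested-By: ambari' -X POST \
  -d '{"RequestInfo":{"context":"HDFS Service Check","command":"HDFS_SERVICE_CHECK"},
       "Requests/resource_filters":[{"service_name":"HDFS"}]}' \
  http://ambari-host:8080/api/v1/clusters/MyCluster/requests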
						
					
		
			
				
						
Labels: Apache Ambari

    
	
		
		
04-21-2016 05:34 AM

Hive didn't work after enabling Ranger. HiveServer2 failed to restart.
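
Not part of the original post, but a hedged first diagnostic step for this kind of failure, assuming the default HDP log location: check the HiveServer2 log for Ranger plugin or policy-download errors logged while HiveServer2 was trying to come back up.

# Default HDP log path assumed; adjust if Hive logs live elsewhere.
LOG=/var/log/hive/hiveserver2.log

# Look for Ranger plugin initialization or policy download failures.
grep -iE "ranger|authoriz" "${LOG}" | tail -n 50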
						
					
		
			
				
						
Labels: Apache Hive, Apache Ranger
        








