Member since 02-24-2016

175 Posts
56 Kudos Received
3 Solutions

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1928 | 06-16-2017 10:40 AM |
| | 16488 | 05-27-2016 04:06 PM |
| | 1632 | 03-17-2016 01:29 PM |
			
    
	
		
		
10-24-2016 08:05 PM

@Simon Elliston Ball, @Jonas Straub
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-24-2016 08:04 PM
3 Kudos

Hi folks,

I have a nightly job that copies data from Cluster-1 to Cluster-2 using DistCp. The issue is with secured, classified data, which is stored on the source Cluster-1 using TDE and various other techniques. I was reading the DistCp documentation, and it looks like it first puts the data in /tmp. Where does it create this /tmp directory: on the source cluster at HDFS <root>/tmp, or at <HDFS_ROOT>/<Very_secured_Data_Dir>/tmp?

Thanks,
SS
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Hadoop
			
    
	
		
		
10-14-2016 10:01 AM

Thanks @bikas, @lgeorge.

Does this mean that by following "Configuring Spark for Wire Encryption" in the documentation (http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/spark-encryption.html) we get encryption for data that moves across the network within a single job (between executors) as part of its tasks, i.e. during the internal transit of data among the nodes inside the cluster?

Does wire encryption for Spark provide any other benefits?

Many thanks,
SS
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-10-2016 01:37 PM

I cannot say which is the best, but in production we configured HDP 2.4.2 with Ranger 0.5.2 and Kafka 0.9.0.1, and it worked well. If you are going with a newer HDP version, you can use newer versions of both :).
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-10-2016 01:14 PM
1 Kudo

Hi all,

I was going through the latest documentation on the Hortonworks website: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_spark-component-guide/content/spark-encryption.html

I am unable to understand the following passage under "Configuring Spark for Wire Encryption":

"You can configure Spark to protect sensitive data in transit by enabling wire encryption. Spark supports SSL for broadcast and file server protocols, and it uses SASL encryption for the block transfer service. Note, however, that wire encryption is not yet supported for shuffle files, cached data, and other application files."

- "Protect sensitive data in transit by enabling wire encryption": this seems to be about the data ingestion part, but how? From Kafka?
- "Spark supports SSL for broadcast and file server protocols, ...": OK.
- "Note, however, that wire encryption is not yet supported for shuffle files, cached data, and other application files."

So where is the data encrypted, and where is it secured or unsecured, from the start of the job until execution is finished?

Can someone please shed some light on this?

BTW: in Spark's context, where does wire encryption come into the picture?

Many thanks,
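For readers following the quoted passage, here is a minimal sketch of the kind of settings it appears to describe. It assumes the Spark 1.6 property names documented in the Apache Spark security guide (`spark.authenticate`, `spark.authenticate.enableSaslEncryption`, `spark.ssl.*`); the helper names and the keystore/truststore paths are hypothetical, and keystore passwords are deliberately left out. It only assembles `spark-submit` arguments and does not itself enable anything.

```python
def wire_encryption_conf(keystore_path, truststore_path):
    """Return Spark properties related to wire encryption as a dict.

    Property names are taken from the Apache Spark 1.6 security docs;
    the paths are placeholders supplied by the caller.
    """
    return {
        # SASL encryption for the block transfer service requires authentication.
        "spark.authenticate": "true",
        "spark.authenticate.enableSaslEncryption": "true",
        # SSL for the broadcast and file server protocols.
        "spark.ssl.enabled": "true",
        "spark.ssl.keyStore": keystore_path,
        "spark.ssl.trustStore": truststore_path,
    }


def to_submit_args(conf):
    """Turn a properties dict into a flat list of --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    conf = wire_encryption_conf("/path/to/keystore.jks", "/path/to/truststore.jks")
    print(" ".join(["spark-submit"] + to_submit_args(conf)))
```

Whether these settings cover a given data path (for example shuffle files or cached data) depends on the Spark and HDP version, as the quoted note points out.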
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: Apache Spark
			
    
	
		
		
10-10-2016 10:50 AM

@Jonas Straub, @Simon Elliston Ball, @Ana Gillan, @Guilherme Braccialli
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-10-2016 10:47 AM
3 Kudos

Hi experts,

This question is mostly related to DR and backup.

We already have two clusters, which are exactly the same in configuration: one is the master and the other is a hot standby. To mitigate the risk further, we are thinking of a "cold backup", where we can store the HDFS data much like earlier tape-based backup solutions. We want to keep this in our own data center (not in the cloud).

We do not want to invest in another cluster and use a DistCp-based approach. We want to back up only the HDFS data.

What would be the best solution, approach, or design for this?

Let me know if more input is required.

Many thanks,
SS
				
			
			
			
			
			
			
			
			
			
		
		
			
				
						
Labels: HDFS
			
    
	
		
		
10-03-2016 01:51 PM

I think we also have the same problem with Kafka 0.9, Spark 1.6.1 and HDP 2.4.2.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
10-03-2016 01:50 PM

@Ali Bajwa we have HDP 2.4.2, and when we try to consume messages from the secured Kafka topics using Spark Streaming (Spark 1.6.1), we cannot consume any messages.

I followed the documentation at https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.2/bk_spark-guide/content/spark-streaming-kafka-kerb.html

Was this patch released after 2.4.2, or am I missing something?

Thanks.
				
			
			
			
			
			
			
			
			
			
		
			
    
	
		
		
09-15-2016 03:06 PM

Thanks guys,

The missing piece was the Kerberos libraries on the third-party machine where we run the publishing application.

Thanks,
SS
				
			
			
			
			
			
			
			
			
			
		 
         
					
				













