Member since 09-24-2015
10 Posts
9 Kudos Received
3 Solutions

        My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3142 | 01-08-2016 05:58 PM |
| | 2124 | 09-24-2015 04:14 AM |
			
    
	
		
		
01-14-2016 05:31 PM
1 Kudo

Is the Ranger plugin properly installed? For example, do you see any evidence of it in the Ranger audit logs, e.g. the Kafka server connecting to Ranger to download policies, or an access log entry indicating that access was allowed by Ranger?
						
					
01-08-2016 05:58 PM
3 Kudos

Some additional things to consider:

Cost of transporting data: Azure bills for network usage. This is not an issue if, for example, the MSSQL instance that you are ingesting data from is also in Azure.

If data is going to live in the cluster for a long time, e.g. several weeks, then your best bang for the buck is going to be to host it in your own datacenter on bare metal. Obviously, an important argument in favor of HDInsight would be the savings in terms of ease of managing the cluster. Also, a lack of in-house speed, skill, and ability to host a cluster in your DC would preclude the bare-metal option.

Why is bare metal better for long-lived data? Because HDInsight goes against the grain of a basic tenet of Hadoop: "take processing to data instead of taking data to processing". HDInsight does not store data locally; it is stored in Azure Blob Storage. So all data must be brought to the processing (from Azure cloud storage to the compute nodes of the cluster). This matters more if you are doing I/O-heavy processing, e.g. running data-intensive MR loads like Hive queries against data in a DFS backed by Azure Blob Storage. For comparison, if you were running, say, a Spark load, then this may not be an issue because the main bottleneck is compute, not data transport.

In general, HDInsight might be best suited for a targeted workload where you fire up a temporary cluster, do your analysis, and then take it down. For completeness, I should mention that HDInsight does have a tiny local DFS, but that is only there to store temporary files created during MR runs.
						
					
11-09-2015 05:22 PM

Audit logging happens in the plugin. Please review the HiveServer2 (HS2) and NameNode (NN) logs for the cause.
						
					
10-23-2015 10:32 PM
1 Kudo

The decision about using page blobs vs. block blobs can be a bit more nuanced, at least when it comes to using Azure Blob storage for HDFS. This page provides a good overview: https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Page_Blob_Support_and_Configuration
						
					
09-24-2015 04:17 AM
1 Kudo

Indeed! We already have an Apache JIRA created for it, and it is a prime candidate to get scheduled soon. In the meantime, we are also working to have this documented. Best,
						
					
09-24-2015 04:14 AM
3 Kudos

Yes, this is a known issue. You can get around it by "pre-creating" the ranger and ranger_audit databases with the latin1 character set:

create database ranger CHARACTER SET=latin1;
create database ranger_audit CHARACTER SET=latin1;
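
To confirm that the pre-created databases picked up the intended character set, a check along these lines may help. This is only a sketch and assumes a MySQL-compatible server (which the statements above suggest); it reads the standard information_schema.SCHEMATA view and is not something from the original reply.

```sql
-- Sketch: assumes MySQL or a compatible server.
-- Lists the default character set and collation of the two Ranger databases.
SELECT SCHEMA_NAME,
       DEFAULT_CHARACTER_SET_NAME,
       DEFAULT_COLLATION_NAME
FROM information_schema.SCHEMATA
WHERE SCHEMA_NAME IN ('ranger', 'ranger_audit');
```

If both rows show latin1 in the DEFAULT_CHARACTER_SET_NAME column, the databases were created as described above.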
 
						
					