Member since 09-24-2015. 10 posts, 9 kudos received, 3 solutions.
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2248 | 01-08-2016 05:58 PM |
| | 1520 | 09-24-2015 04:14 AM |
01-14-2016 05:31 PM (1 Kudo)
Is the Ranger plugin properly installed? For example, do you see any evidence of it in the Ranger audit logs, e.g. the Kafka server connecting to Ranger to download policies, or an access log entry showing that access was allowed by Ranger?
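Another quick check on the broker host is whether the plugin has cached any policies locally. The sketch below assumes the common layout where a Ranger plugin writes downloaded policies as JSON under `/etc/ranger/<service_name>/policycache/`, and that each cached file has a top-level `policies` list; both are assumptions to verify against your install, and `summarize_policy_cache` is a hypothetical helper, not part of Ranger.

```python
import json
from pathlib import Path

# Assumption: the plugin keeps downloaded policies as JSON files under
# <cache_root>/<service_name>/policycache/ (commonly /etc/ranger/...).
DEFAULT_CACHE_ROOT = Path("/etc/ranger")


def summarize_policy_cache(service_name: str, cache_root: Path = DEFAULT_CACHE_ROOT) -> dict:
    """Summarize the cached policy files for one Ranger service.

    A missing or empty cache suggests the plugin has never downloaded
    policies from Ranger Admin (not installed, or cannot reach Ranger).
    """
    cache_dir = cache_root / service_name / "policycache"
    summary = {"cache_dir": str(cache_dir), "files": []}
    if not cache_dir.is_dir():
        return summary
    for path in sorted(cache_dir.glob("*.json")):
        try:
            data = json.loads(path.read_text())
        except (OSError, json.JSONDecodeError):
            # Unreadable or half-written file: list it without a policy count.
            summary["files"].append({"file": path.name, "policies": None})
            continue
        # Assumption: the cached document has a top-level "policies" list.
        policies = data.get("policies") if isinstance(data, dict) else None
        count = len(policies) if isinstance(policies, list) else None
        summary["files"].append({"file": path.name, "policies": count})
    return summary
```

For example, `summarize_policy_cache("<your Kafka service name in Ranger>")` returning an empty `files` list would point toward the plugin not being installed or not reaching Ranger Admin.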
01-08-2016 05:58 PM (3 Kudos)
Some additional things to consider:

Cost of transporting data: Azure bills for network usage. This is not an issue if, for example, the MSSQL database you are ingesting data from is also in Azure.

Data lifetime: if data is going to live in the cluster for a long time, e.g. several weeks, your best bang for the buck is likely to be hosting it in your own datacenter on bare metal. Why? Because HDInsight goes against the grain of a basic tenet of Hadoop: "take processing to the data instead of taking data to the processing". HDInsight does not store data locally; it is stored in Azure Blob Storage. So all data must be brought to the processing (from Azure cloud storage to the compute nodes of the cluster). This matters more for I/O-heavy processing, e.g. data-intensive MR loads such as Hive queries against data in DFS backed by Azure Blob Storage. For comparison, if you were running, say, a Spark load, this may not be an issue, because the main bottleneck there tends to be compute rather than data transport.

Of course, an important argument in favor of HDInsight is the savings in cluster management effort. Also, a lack of in-house speed, skill, and ability to host a cluster in your DC would rule out the bare-metal option.

In general, HDInsight might be best suited for a targeted workload where you fire up a temporary cluster, do your analysis, and then take it down. For completeness, HDInsight does have a small local DFS, but it is used to store temporary files created during MR runs.
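To make the data-locality argument concrete, here is a back-of-envelope model: job time is the I/O time for every pass over the input plus a fixed compute time, evaluated once at a local-disk read rate and once at a remote-storage read rate. All inputs are hypothetical numbers chosen for illustration, not measurements of HDInsight or Azure Blob Storage, and the model ignores caching, network limits per node, and concurrency.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    data_gb: float    # input scanned per pass, in GB
    passes: int       # how many times the job reads the full input
    compute_s: float  # processing time that does not depend on where data lives


def job_seconds(w: Workload, read_gb_per_s: float) -> float:
    """Rough job time: I/O for every pass at the given aggregate read rate, plus compute."""
    if read_gb_per_s <= 0:
        raise ValueError("read throughput must be positive")
    return w.passes * w.data_gb / read_gb_per_s + w.compute_s


def remote_slowdown(w: Workload, local_gb_per_s: float, remote_gb_per_s: float) -> float:
    """Factor by which the job slows down when all reads go to remote storage."""
    return job_seconds(w, remote_gb_per_s) / job_seconds(w, local_gb_per_s)


# Hypothetical inputs, not measurements of any real cluster or Azure service.
io_heavy = Workload(data_gb=1000, passes=10, compute_s=500)         # e.g. repeated Hive scans
compute_bound = Workload(data_gb=1000, passes=1, compute_s=50_000)  # e.g. a compute-heavy Spark job
print(f"I/O-heavy:     {remote_slowdown(io_heavy, 10, 2):.2f}x")
print(f"compute-bound: {remote_slowdown(compute_bound, 10, 2):.2f}x")
```

With these made-up numbers the I/O-heavy job comes out about 3.7x slower when every read goes to remote storage, while the compute-bound job is less than 1% slower, which is the asymmetry the post describes.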
11-09-2015 05:22 PM
Audit logging happens in the plugin, so please review the HiveServer2 (HS2) and NameNode (NN) logs for the cause.
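A minimal sketch for narrowing those logs down to Ranger-related warnings and errors. The filter (a WARN/ERROR/FATAL level followed by "ranger" on the same line) is a hypothetical heuristic that assumes a typical log4j layout with the level before the message; adjust it for your log format. `ranger_problems` and `scan_logs` are illustrative helpers, and the log file locations you pass in depend on your installation.

```python
import re
from pathlib import Path
from typing import Iterable, List

# Hypothetical filter: a WARN/ERROR/FATAL level followed later on the same line
# by "ranger" (case-insensitive), which catches class names such as
# org.apache.ranger.* and messages that mention Ranger. Real log layouts vary.
RANGER_PROBLEM = re.compile(r"\b(?:WARN|ERROR|FATAL)\b.*ranger", re.IGNORECASE)


def ranger_problems(lines: Iterable[str]) -> List[str]:
    """Return the lines that look like Ranger-related warnings or errors."""
    return [line.rstrip("\n") for line in lines if RANGER_PROBLEM.search(line)]


def scan_logs(paths: Iterable[Path]) -> dict:
    """Map each existing log file to its Ranger-related WARN/ERROR lines."""
    found = {}
    for path in paths:
        if not path.is_file():
            continue  # skip paths that do not exist on this host
        with path.open(errors="replace") as f:
            found[str(path)] = ranger_problems(f)
    return found
```

You would call `scan_logs` with the HS2 and NN log paths from your own installation; an empty result does not prove the plugin is healthy, it only means this heuristic found nothing.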
10-23-2015 10:32 PM (1 Kudo)
The decision between page blobs and block blobs can be a bit more nuanced, at least when it comes to using Azure Blob Storage for HDFS. This page provides a good overview: https://hadoop.apache.org/docs/current/hadoop-azure/index.html#Page_Blob_Support_and_Configuration
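As we read the linked hadoop-azure page, the choice is made per directory through the `fs.azure.page.blob.dir` setting: files under the listed (comma-separated) directories are stored as page blobs and everything else as block blobs. The sketch below only renders that `core-site.xml` property; the directory list, such as HBase write-ahead-log directories, is an example input, and you should check the page for the other page-blob settings your version supports.

```python
import xml.etree.ElementTree as ET
from typing import Sequence


def page_blob_dir_property(dirs: Sequence[str]) -> str:
    """Render a core-site.xml <property> listing the directories whose files
    hadoop-azure should store as page blobs (all others stay block blobs)."""
    if not dirs:
        raise ValueError("need at least one directory")
    prop = ET.Element("property")
    ET.SubElement(prop, "name").text = "fs.azure.page.blob.dir"
    # The setting takes a comma-separated list; drop trailing slashes for consistency.
    ET.SubElement(prop, "value").text = ",".join(d.rstrip("/") or "/" for d in dirs)
    return ET.tostring(prop, encoding="unicode")
```

For example, `page_blob_dir_property(["/hbase/WALs", "/hbase/oldWALs"])` produces a single `<property>` element that can be pasted into `core-site.xml`.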
09-24-2015 04:17 AM (1 Kudo)
Indeed! We already have an Apache JIRA for it, and it is a prime candidate to be scheduled soon. In the meantime, we are also working on getting this documented.
09-24-2015 04:14 AM (3 Kudos)
Yes, this is a known issue. You can work around it by pre-creating the databases ranger and ranger_audit with the latin1 character set:

CREATE DATABASE ranger CHARACTER SET=latin1;
CREATE DATABASE ranger_audit CHARACTER SET=latin1;
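If you script the install, a small helper can emit the same pre-create statements plus a follow-up query against MySQL's `information_schema.SCHEMATA` to confirm the character set took effect. `precreate_statements` is a hypothetical helper; it only builds SQL text (you would still run it with the `mysql` client or a driver), and it rejects anything that is not a plain identifier so the names cannot inject SQL.

```python
import re
from typing import Iterable, List

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def precreate_statements(databases: Iterable[str], charset: str = "latin1") -> List[str]:
    """Build MySQL statements that pre-create the given databases with `charset`,
    followed by a query that shows each database's default character set."""
    names = list(databases)
    for name in names + [charset]:
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    stmts = [f"CREATE DATABASE {name} CHARACTER SET={charset};" for name in names]
    quoted = ", ".join(f"'{name}'" for name in names)
    stmts.append(
        "SELECT SCHEMA_NAME, DEFAULT_CHARACTER_SET_NAME "
        f"FROM information_schema.SCHEMATA WHERE SCHEMA_NAME IN ({quoted});"
    )
    return stmts
```

Running `precreate_statements(["ranger", "ranger_audit"])` yields the two statements from this post followed by the verification query.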