Member since: 04-27-2016 · Posts: 14 · Kudos Received: 19 · Solutions: 0
05-20-2016
09:15 AM
Hi @azeltov, Could you run RStudio remotely, i.e. on your laptop, and connect to the cluster? Cheers, Christian
05-20-2016
08:43 AM
Hi @Neeraj Sabharwal, Do you have any updates on when this will be available in the code and then in HDP? It is key for enterprise deployment. The Jira is still open. Thanks, Christian
05-19-2016
09:38 AM
2 Kudos
And to be more detailed on one aspect of that point: deleting sensitive information from HDFS only removes the references to it; the data remains physically on the disks and is just no longer easily accessible. Formatting HDFS to destroy the information will not succeed either. Your best bet is to treat the disks that back HDFS with a common overwrite procedure, as you would when removing sensitive information from local file systems.
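As a rough sketch of such an overwrite procedure (the mount point and device name below are placeholders, not taken from any actual setup), wiping a decommissioned DataNode disk might look like:

```shell
# Hypothetical example: securely overwrite a disk that backed HDFS block storage.
# Assumes the DataNode data directory was mounted at /grid/0 on device /dev/sdb.
# WARNING: this destroys all data on the device.

# Unmount the data directory first so nothing is writing to the disk.
umount /grid/0

# Overwrite the raw device several times with random data.
shred -v -n 3 /dev/sdb
```

Note that on SSDs and cloud block storage, overwriting may not reach all physical cells; full-disk encryption from the start is the more reliable safeguard there.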
05-10-2016
01:48 PM
1 Kudo
Hello @Issaq Mohammad, Here are some useful posts on file formats:
- Getting started with Text and Apache Hive
- Optimising Hadoop with Text and Hive
- Faster Big Data on Hadoop with Hive and RCFile
I hope that helps you to navigate the space a bit better.
05-10-2016
01:42 PM
Hi Ed, It would be useful to know whether you are aiming for HA or performance. Since it is a small cluster, you may be using it as a POC and not care much about HA. One option not mentioned below is going with 3 masters and 3 slaves in a small HA cluster setup. That allows you to balance services across the masters and/or dedicate one to be mostly an edge node. If security is a topic, that may come in handy. Cheers, Christian
05-10-2016
01:32 PM
Good point @Pradeep Bhadani. If you want to 'force' a check of specific blocks, you can read the corresponding files, e.g. via Hive or MR, and run the check command afterwards to see whether an error was found. The reasoning behind this design is the expense of checking a whole filesystem that may span PBs across hundreds of nodes.
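As a sketch of what that looks like on the command line (the path below is a placeholder), you can force the reads and then check just that subtree rather than the whole filesystem:

```shell
# Read the files to force the DataNodes to verify the blocks' checksums.
# (/data/important is a placeholder path.)
hdfs dfs -cat /data/important/* > /dev/null

# Then run fsck on just that subtree to report any corrupt or missing blocks.
hdfs fsck /data/important -files -blocks
```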
05-09-2016
03:58 PM
Hi Ben, Is there any useful information you could share on your failure that may help with debugging or finding an alternative?
05-06-2016
09:57 AM
1 Kudo
Note, if you are running your cluster in the cloud or use virtualization, you may end up in a situation where multiple VMs run on the same physical host. In that case, a physical failure may have the grave consequence of data loss, e.g. if all replicas are stored on the same physical host. The likelihood of this depends on the cloud provider and may be high or remote. Be aware of this risk and prepare with copies on highly durable (object) storage like S3 for DR.
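A minimal sketch of such a DR copy, assuming an S3 bucket set up for backups (bucket name and paths below are placeholders):

```shell
# Hypothetical DR copy: replicate a critical HDFS directory to durable object
# storage with DistCp. -update copies only files that changed since the last run.
hadoop distcp -update hdfs:///data/critical s3a://my-dr-bucket/data/critical
```

Running this on a schedule gives you an off-cluster copy that survives the loss of a physical host holding all replicas.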