Member since
09-26-2015
135 Posts
85 Kudos Received
26 Solutions
About
Steve's a Hadoop committer, mostly working on cloud integration.
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2625 | 02-27-2018 04:47 PM |
| | 5097 | 03-03-2017 10:04 PM |
| | 2655 | 02-16-2017 10:18 AM |
| | 1399 | 01-20-2017 02:15 PM |
| | 10563 | 01-20-2017 02:02 PM |
11-25-2015 02:02 PM
There is ongoing work on this: as people note, it's not something the app can do itself; the RM needs to make the decisions. Look at YARN-4389, which will allow individual apps to set a threshold for failures on a node before the RM blacklists it.
11-24-2015 11:29 PM
Vinay, we do have the ATS integration in this TP; we just omitted something to bridge from the binding classes Ambari has used since the 1.2/1.3 previews to their current names. That's now been fixed, and the workaround listed above addresses it directly.
11-24-2015 11:27 PM
Actually, you don't need to install Slider on your cluster at all: you just need it on your local system; it will install itself via HDFS and YARN. This lets you use different versions of Slider for different applications, and upgrade without needing any cluster admin.
11-18-2015 04:34 PM
1 Kudo
Hadoop doesn't work properly on Power parts; there are various outstanding JIRAs related to porting the native code. IBM hasn't put in any real effort to address those or to test it, and until then worrying about HDP support is moot.
11-18-2015 04:30 PM
Smaller blocks take up more space in the namenode tables, so in a large cluster, small blocks come at a price. What small block sizes can do is allow more workers to get at the data (half the block size == twice the bandwidth), but it also means that code working with > 128 MB of data isn't going to get all the data local to one machine, so more network traffic may occur. And for apps that spin up fast, you may find that 128 MB blocks are streamed through fast enough that the overhead of scheduling containers and starting up the JVMs outweighs the extra bandwidth opportunities. So the notion of an "optimal size" isn't really so clear cut. If you've got a big cluster and you are running out of NN heap space, you're going to want a bigger block size whether or not your code likes it. Otherwise, it may depend on your data and the uses made of it. As an experiment, try saving copies of the same data with different block sizes, then see which is faster to query.
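The namenode-cost side of that trade-off is simple arithmetic, and can be sketched in a few lines. Note the per-block heap figure below is an assumed rule of thumb (on the order of 150 bytes per block object is often quoted), not a measured number; treat the output as an order-of-magnitude estimate only.

```python
# Rough sketch of namenode heap consumed by block objects at
# different HDFS block sizes. ASSUMPTION: ~150 bytes of heap per
# block object -- a commonly quoted rule of thumb, not exact.

BYTES_PER_BLOCK_OBJECT = 150  # assumed rule-of-thumb estimate

MB = 1 << 20
TB = 1 << 40

def blocks_needed(data_bytes: int, block_size_bytes: int) -> int:
    """Number of full-or-partial blocks needed to hold this much data."""
    return -(-data_bytes // block_size_bytes)  # ceiling division

def nn_heap_estimate(data_bytes: int, block_size_bytes: int) -> int:
    """Very rough namenode heap (bytes) attributable to block objects."""
    return blocks_needed(data_bytes, block_size_bytes) * BYTES_PER_BLOCK_OBJECT

# For 1 PB of data: halving the block size doubles the block count,
# and so roughly doubles this slice of namenode heap.
for block_mb in (256, 128, 64):
    heap = nn_heap_estimate(1024 * TB, block_mb * MB)
    print(f"{block_mb:>4} MB blocks -> ~{heap / MB:,.0f} MB of NN heap")
```

This only models the block objects themselves; file and directory inodes add more heap on top, which is why the "small files problem" is worse than the block arithmetic alone suggests.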
11-09-2015 04:02 PM
1 Kudo
You shouldn't be running services as root, for obvious reasons. On an insecure cluster, all submitted YARN jobs run as the service-wide user; if that user is "root", then your entire cluster belongs to the first malicious person who runs a job. If you are running on a Kerberos cluster, as you should be, you need separate accounts for every individual user of the cluster, so you aren't saving on any setup effort.
10-23-2015 10:08 AM
I can't think of anything obvious, but there are some online instructions on using Chrome, which may behave differently: http://www.ghostar.org/2015/06/google-chrome-spnego-and-webhdfs-on-hadoop/
10-19-2015 03:37 PM
You could maybe do it in HBase if there were specific columns for regions, e.g. the EU-UK columns would hold the UK data; then use the security settings to restrict access to those columns to users within the relevant group. That won't address replication: HBase data has to live wherever the HDFS cluster is, be it the EU, the US or elsewhere.
10-09-2015 09:14 AM
Looking at the source, it's almost possible to do it, but the checks are hidden in some private/protected code. It's something that could be made public; why not file a JIRA on the Apache server?