Member since
09-26-2015
135 Posts
85 Kudos Received
26 Solutions
About
Steve's a Hadoop committer, mostly working on cloud integration.
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2625 | 02-27-2018 04:47 PM |
| | 5097 | 03-03-2017 10:04 PM |
| | 2655 | 02-16-2017 10:18 AM |
| | 1399 | 01-20-2017 02:15 PM |
| | 10563 | 01-20-2017 02:02 PM |
11-25-2015 02:02 PM
There is ongoing work on this: as people note, it's not something the app can do itself; the RM needs to make the decisions. Look at YARN-4389, which will allow individual apps to set a threshold for failures on a node before the RM blacklists it.
11-24-2015 11:29 PM
Vinay, we do have the ATS integration in this TP; we just omitted something to bridge from the binding classes Ambari has used since the 1.2/1.3 previews to their current names. That's now been fixed, and the workaround listed above addresses it directly.
11-24-2015 11:27 PM
Actually, you don't need to install Slider on your cluster at all: you just need it on your local system; it will install itself via HDFS and YARN. This lets you use different versions of Slider for different applications, and upgrade without needing any cluster admin.
11-18-2015 04:34 PM
1 Kudo
Hadoop doesn't work properly on Power parts; there are various outstanding JIRAs related to porting the native code. IBM hasn't put in any real effort to address those or to test it, and until then worrying about HDP support is moot.
11-18-2015 04:30 PM
Smaller blocks take up more space in the namenode tables, so in a large cluster, small blocks come at a price. What small block sizes can do is allow more workers to get at the data (half the block size == twice the bandwidth), but it also means that code working with > 128 MB of data isn't going to get all the data local to one machine, so more network traffic may occur. And for apps that spin up fast, you may find that 128 MB blocks are streamed through fast enough that the overhead of scheduling containers and starting up the JVMs outweighs the extra bandwidth opportunities. So the notion of an "optimal size" isn't really so clear cut. If you've got a big cluster and you are running out of NN heap space, you're going to want a bigger block size whether or not your code likes it. Otherwise, it may depend on your data and the uses made of it. As an experiment, try saving copies of the same data with different block sizes, then see which is faster to query.
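The namenode-cost side of that trade-off is simple arithmetic, and can be sketched in a few lines. Note the per-block heap figure below is an assumed rule of thumb (on the order of 150 bytes per block object is often quoted), not a measured number; treat the output as an order-of-magnitude estimate only.

```python
# Rough sketch of namenode heap consumed by block objects at
# different HDFS block sizes. ASSUMPTION: ~150 bytes of heap per
# block object -- a commonly quoted rule of thumb, not exact.

BYTES_PER_BLOCK_OBJECT = 150  # assumed rule-of-thumb estimate

MB = 1 << 20
TB = 1 << 40

def blocks_needed(data_bytes: int, block_size_bytes: int) -> int:
    """Number of full-or-partial blocks needed to hold this much data."""
    return -(-data_bytes // block_size_bytes)  # ceiling division

def nn_heap_estimate(data_bytes: int, block_size_bytes: int) -> int:
    """Very rough namenode heap (bytes) attributable to block objects."""
    return blocks_needed(data_bytes, block_size_bytes) * BYTES_PER_BLOCK_OBJECT

# For 1 PB of data: halving the block size doubles the block count,
# and so roughly doubles this slice of namenode heap.
for block_mb in (256, 128, 64):
    heap = nn_heap_estimate(1024 * TB, block_mb * MB)
    print(f"{block_mb:>4} MB blocks -> ~{heap / MB:,.0f} MB of NN heap")
```

This only models the block objects themselves; file and directory inodes add more heap on top, which is why the "small files problem" is worse than the block arithmetic alone suggests.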
11-09-2015 04:02 PM
1 Kudo
You shouldn't be running services as root, for obvious reasons. On an insecure cluster, all submitted YARN jobs run as the service-wide user; if that user is "root", then your entire cluster belongs to the first malicious person who runs a job. If you are running on a Kerberos cluster, as you should be, you need separate accounts for every individual user of the cluster, so you aren't saving on any setup effort.
10-23-2015 10:08 AM
I can't think of anything obvious, but there are some online instructions on using Chrome, which may behave differently: http://www.ghostar.org/2015/06/google-chrome-spnego-and-webhdfs-on-hadoop/
10-19-2015 03:37 PM
You could maybe do it in HBase if there were specific columns for regions, e.g. the EU-UK columns would hold the UK data; then use the security settings to restrict access to those columns to users within the relevant group. That won't address replication: HBase data has to live wherever the HDFS cluster is, be it the EU, the US or elsewhere.
10-09-2015 09:14 AM
Looking at the source, it's almost possible to do it, but the checks are hidden in some private/protected code. It's something that could be made public; why not file a JIRA on the Apache server?