Registered: ‎04-24-2014

Launching an EC2 cluster with Whirr

Hi,

 

I'm trying to launch an EC2 cluster using Apache Whirr. My configuration is as follows:

 

whirr.cluster-name=myhadoopcluster
whirr.instance-templates=1 hadoop-namenode+yarn-resourcemanager+mapreduce-historyserver,2 hadoop-datanode+yarn-nodemanager
whirr.provider=aws-ec2
whirr.identity=amazon_key
whirr.credential=amazon_secret
whirr.private-key-file=${sys:user.home}/.ssh/whirr
whirr.public-key-file=${sys:user.home}/.ssh/whirr.pub
whirr.env.mapreduce_version=2
whirr.env.repo=cdh4
whirr.hadoop.install-function=install_cdh_hadoop
whirr.hadoop.configure-function=configure_cdh_hadoop
whirr.mr_jobhistory.start-function=start_cdh_mr_jobhistory
whirr.yarn.configure-function=configure_cdh_yarn
whirr.yarn.start-function=start_cdh_yarn
whirr.hardware-id=m1.medium
whirr.image-id=us-east-1/ami-ccb35ea5
whirr.location-id=us-east-1
whirr.aws-ec2-spot-price=0.017

 

I'm following the docs at this link (http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH5/latest/CDH5-Installation-Guide/c...). My cluster seems to be up after a few minutes, and I can ssh into both the master and the 2 slave nodes. I also have access to the ResourceManager Web UI and the NameNode, and services such as the DataNodes are up. Unfortunately, when I try to run a YARN job from the command line I get the following error:

 

-bash-3.2$ hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar wordcount input output
14/04/24 14:14:49 WARN conf.Configuration: session.id is deprecated. Instead, use dfs.metrics.session-id
14/04/24 14:14:49 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/04/24 14:14:49 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/user/ivan814841750/.staging/job_local814841750_0001
14/04/24 14:14:49 ERROR security.UserGroupInformation: PriviledgedActionException as:ivan (auth:SIMPLE) cause:ENOENT: No such file or directory
ENOENT: No such file or directory
at org.apache.hadoop.io.nativeio.NativeIO.chmod(Native Method)
at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:581)
at org.apache.hadoop.fs.FilterFileSystem.setPermission(FilterFileSystem.java:427)
at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:579)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:171)
at org.apache.hadoop.mapreduce.JobSubmitter.copyAndConfigureFiles(JobSubmitter.java:293)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:364)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1286)
at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1283)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:416)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1283)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1304)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:72)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:144)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:622)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

 

I also have some other questions about Whirr (I'm very new to this technology) and launching a cluster with it:

1) After I launch my cluster, Whirr gives me the ssh command I have to run to access my nodes. Although that works, I would like to access the instances directly through Amazon using those keys. I have noticed that in my AWS console the cluster's instances have a different key pair (something like jclouds#myhadoopcluster#a7f). Since it's not possible to download those keys, I was wondering how to launch the cluster with a valid key pair so that I can access the AWS machines afterwards.

 

2) I have tried several Amazon images to run the cluster, and all of them failed except the one mentioned in the Cloudera docs. Is there a list of valid AMIs somewhere?

 

 

Regards,

I.

Registered: ‎07-31-2013

Re: Launching an EC2 cluster with Whirr

I'm unsure about your other questions, so I'll defer to an AWS expert on those.

For the error you posted, though, it appears that the node you're submitting the job from does not have proper client configuration under /etc/hadoop/conf/. The job is trying to run locally (note the job_local ID and the file:/ staging path in your log), which should not happen. Check your local configuration files: in particular, mapred-site.xml should carry a "mapreduce.framework.name" property with the value "yarn", alongside fully configured yarn-site.xml, hdfs-site.xml and core-site.xml.
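
To make that check concrete, here is a minimal sketch in Python that reads a mapred-site.xml and reports which framework the client would use. The sample XML and the helper names (read_property, framework_name) are illustrative assumptions, not anything taken from your cluster; the path is the /etc/hadoop/conf/ directory mentioned above. When the property is absent, Hadoop 2 defaults to "local", which would match the job_local ID in your log.

```python
import os
import xml.etree.ElementTree as ET

# Illustrative sample of a client-side mapred-site.xml (an assumption,
# not copied from the poster's cluster).
SAMPLE_MAPRED_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
"""


def read_property(xml_text, wanted):
    """Return the value of a Hadoop property, or None if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == wanted:
            return prop.findtext("value", default="").strip()
    return None


def framework_name(xml_text):
    """Return the configured framework, falling back to Hadoop 2's default."""
    # mapred-default.xml sets mapreduce.framework.name to "local".
    return read_property(xml_text, "mapreduce.framework.name") or "local"


if __name__ == "__main__":
    path = "/etc/hadoop/conf/mapred-site.xml"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            text = handle.read()
    else:
        # No client config on this machine: fall back to the sample above.
        text = SAMPLE_MAPRED_SITE
    print("mapreduce.framework.name =", framework_name(text))
```

If this reports "local" on the node you submit from, the client configuration is the likely culprit rather than the cluster itself.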