Member since: 01-22-2014
Posts: 62
Kudos Received: 0
Solutions: 0
12-18-2014
05:32 AM
Hi, I have a local repository which I want to point the CM5 installation to. I have created the local.repo file in /etc/yum.repos.d and given the repo path there (and it is accessible). I execute the command ./cloudera-manager-installer.bin --skip_repo_package=1 to install Cloudera Manager from the local repo. But when I proceed with the installation in Cloudera Manager, it fails because a new cloudera-manager.repo file is created in the /etc/yum.repos.d directory every time, and it points to the archive.cloudera.com site. Hence my installation fails with the message below. Please help to solve this.

Repository cloudera-manager is listed more than once in the configuration
http://archive.cloudera.com/cm5/redhat/6/x86_64/cm/5.2.0/repodata/repomd.xml: [Errno -1] Error importing repomd.xml for cloudera-manager: Damaged repomd.xml file
Trying other mirror.
Error: Cannot retrieve repository metadata (repomd.xml) for repository: cloudera-manager. Please verify its path and try again
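For reference, a minimal sketch of a local .repo definition of this kind; the repo id, name, and baseurl below are placeholders, not the actual values used in this setup:

# /etc/yum.repos.d/local.repo (illustrative placeholder values)
[cloudera-manager]
name = Cloudera Manager 5.2.0, local repository
baseurl = http://local-mirror.example.com/cm5/redhat/6/x86_64/cm/5.2.0/
enabled = 1
gpgcheck = 0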
Labels:
Cloudera Manager
10-07-2014
11:20 PM
When I execute the following in yarn-client mode it works fine and gives the result properly, but when I try to run in yarn-cluster mode I am getting an error.

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar 10

The above command works fine, but when I execute the same in yarn-cluster mode I am getting the following error.

14/10/07 09:40:24 INFO Client: Application report from ASM:
application identifier: application_1412117173893_1150
appId: 1150
clientToAMToken: Token { kind: YARN_CLIENT_TOKEN, service: }
appDiagnostics:
appMasterHost: N/A
appQueue: root.default
appMasterRpcPort: -1
appStartTime: 1412689195537
yarnAppState: ACCEPTED
distributedFinalState: UNDEFINED
appTrackingUrl: http://spark.abcd.com:8088/proxy/application_1412117173893_1150/
appUser: abc
14/10/07 09:40:25 INFO Client: Application report from ASM:
application identifier: application_1412117173893_1150
appId: 1150
clientToAMToken: null
appDiagnostics: Application application_1412117173893_1150 failed 2 times due to AM Container for appattempt_1412117173893_1150_000002 exited with exitCode: 1 due to: Exception from container-launch:
org.apache.hadoop.util.Shell$ExitCodeException:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:511)
at org.apache.hadoop.util.Shell.run(Shell.java:424)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:656)
at org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor.launchContainer(LinuxContainerExecutor.java:279)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:300)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:81)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
main : command provided 1
main : user is abc
main : requested yarn user is abc
Container exited with a non-zero exit code 1
.Failing this attempt.. Failing the application.
appMasterHost: N/A
appQueue: root.default
appMasterRpcPort: -1
appStartTime: 1412689195537
yarnAppState: FAILED
distributedFinalState: FAILED
appTrackingUrl: spark.abcd.com:8088/cluster/app/application_1412117173893_1150
appUser: abc

Where might the problem be? Sometimes when I try to execute in yarn-cluster mode I get the following, but I don't see any result.

14/10/08 01:51:57 INFO Client: Application report from ASM:
application identifier: application_1412117173893_1442
appId: 1442
clientToAMToken: Token { kind: YARN_CLIENT_TOKEN, service: }
appDiagnostics:
appMasterHost: spark.abcd.com
appQueue: root.default
appMasterRpcPort: 0
appStartTime: 1412747485673
yarnAppState: FINISHED
distributedFinalState: SUCCEEDED
appTrackingUrl: http://spark.abcd.com:8088/proxy/application_1412117173893_1442/A
appUser: abc

Thanks
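For reference, the failing yarn-cluster submission would presumably be the same command with only the master changed (the class and jar path are taken from the yarn-client command above):

spark-submit --class org.apache.spark.examples.SparkPi --master yarn-cluster /home/abc/spark/examples/lib/spark-examples_2.10-1.0.0-cdh5.1.0.jar 10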
Labels:
Apache Spark
09-15-2014
04:51 AM
I am joining two datasets: the first one comes from a stream and the second one is in HDFS. After joining the two datasets, I need to apply a filter on the joined dataset, but here I am facing an issue. Please assist in resolving it. I am using the code below:

val streamkv = streamrecs.map(_.split("~")).map(r => ( r(0), (r(5), r(6))))
val HDFSlines = sc.textFile("/user/Rest/sample.dat").map(_.split("~")).map(r => ( r(1), (r(0), r(3), r(4))))
val streamwindow = streamkv.window(Minutes(1))
val join1 = streamwindow.transform(joinRDD => { joinRDD.join(HDFSlines) })

I am getting the following error when I use the filter:

val tofilter = join1.filter {
  case (_, (_, _),(_,_,device)) =>
    device.contains("iPhone")
}.count()

error: constructor cannot be instantiated to expected type;
found : (T1, T2, T3)
required: (String, ((String, String), (String, String, String)))
case (_, (_, _),(_,_,device)) =>

How can I solve this error?
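For what it's worth, a minimal sketch of the pattern shape the error message seems to ask for, assuming join1 elements are (key, (streamValue, hdfsValue)) as in the "required" type above; the extra level of nesting is the only change and is an assumed fix, not verified code:

// Illustrative: match the nested pair produced by the join, i.e.
// (String, ((String, String), (String, String, String)))
val tofilter = join1.filter {
  case (_, ((_, _), (_, _, device))) =>
    device.contains("iPhone")
}.count()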
Labels:
Apache Spark
09-12-2014
06:35 AM
Hi - does it make a difference whether I use "--master yarn-client" or "--master yarn-cluster" for this error in spark-submit, since yarn-client uses a local driver?
09-12-2014
06:10 AM
By latest, do you mean version 1.1.0? So does version 1.0.0, which comes with CDH 5.1, not have this feature?
09-12-2014
03:09 AM
Hi, I am streaming data in Spark and doing a join operation with a batch file in HDFS. I am joining one window of the stream with HDFS. I want to calculate the time taken to do this join (for each window) using the code below, but it did not work (the output was always 0). I am using the spark-shell for this code. Any suggestions on how to achieve this? Thanks!

import java.io.{File, PrintWriter}
import org.apache.spark.streaming.{Seconds, Minutes, StreamingContext}

val jobstarttime = System.currentTimeMillis();
val ssc = new StreamingContext(sc, Seconds(60))
val streamrecs = ssc.socketTextStream("10.11.12.13", 5549)
val streamkv = streamrecs.map(_.split("~")).map(r => ( r(0), (r(5), r(6))))
val streamwindow = streamkv.window(Minutes(2))
val HDFSlines = sc.textFile("/user/batchdata").map(_.split("~")).map(r => ( r(1), (r(0))))
val outfile = new PrintWriter(new File("//home//user1//metrics1" ))
val joinstarttime = System.currentTimeMillis();
val join1 = streamwindow.transform(joinRDD => { joinRDD.join(HDFSlines)} )
val joinsendtime = System.currentTimeMillis();
val jointime = (joinsendtime - joinstarttime)/1000
val J = jointime.toString()
val J1 = "\n Time taken for Joining is " + J
outfile.write(J1)
join1.print()
val savestarttime = System.currentTimeMillis();
join1.saveAsTextFiles("/user/joinone5")
val savesendtime = System.currentTimeMillis();
val savetime = (savesendtime - savestarttime)/1000
val S = savetime.toString()
val S1 = "\n Time taken for Saving is " + S
outfile.write(S1)
ssc.start()
outfile.close()
ssc.awaitTermination()
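For context, a minimal sketch of one way the per-window join time could be measured. This assumes the value is always 0 because window, transform, and join only define the computation lazily, so the timing needs to wrap an action that runs for each batch; the count-based measurement below is illustrative, not something taken from the original post.

// Illustrative sketch: force the join per window with an action inside
// foreachRDD, which runs on the driver as each batch is processed.
join1.foreachRDD { rdd =>
  val start = System.currentTimeMillis()
  val n = rdd.count()   // forces the join for this window to execute
  val secs = (System.currentTimeMillis() - start) / 1000.0
  println("Joined " + n + " records in " + secs + " seconds")
}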
Labels:
Apache Spark
09-11-2014
11:38 PM
Thanks. Please clarify the below: what is the port range that I need to ask the admin team to open on each worker node? And what are these ports used for? Spark workers already use port 7078, right? Are these random ports opened for each Spark job?