Cloudera Employee
Posts: 435
Registered: ‎07-12-2013

Re: I cannot access programmatically a file within a CDH 5.4.2.0 image running in vmware

Also, note that there's a script that tries to detect a public IP and set
up the hosts file for you on boot. If you're going to edit it manually, you
probably want to comment out the line in
/etc/init.d/cloudera-quickstart-init that calls
/usr/bin/cloudera-quickstart-ip. I don't remember which version that was
added in. It might have been 5.5 - so if your VM doesn't have
/usr/bin/cloudera-quickstart-ip you can ignore this post and safely edit
the hosts file anyway.
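
For example, something like this (a sketch; the exact contents of the init script vary between VM versions, so check what the line looks like with grep before editing):

[cloudera@quickstart ~]$ sudo grep -n cloudera-quickstart-ip /etc/init.d/cloudera-quickstart-init
[cloudera@quickstart ~]$ sudo cp /etc/init.d/cloudera-quickstart-init /etc/init.d/cloudera-quickstart-init.bak
[cloudera@quickstart ~]$ sudo sed -i 's|^\([^#]*cloudera-quickstart-ip\)|#\1|' /etc/init.d/cloudera-quickstart-init

That comments out any uncommented line calling the IP-detection script, so a manual /etc/hosts edit survives a reboot.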
Explorer
Posts: 25
Registered: ‎12-10-2014

Re: I cannot access programmatically a file within a CDH 5.4.2.0 image running in vmware

Hi Sean

this is how /etc/hosts looks by default when the image is restarted:

[cloudera@quickstart ~]$ cat /etc/hosts
127.0.0.1 quickstart.cloudera quickstart localhost localhost.domain
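
For reference, a manually edited version that pins the VM's external IP would look something like this (192.168.30.137 is the address the Spark master UI reports later in this thread; substitute your own VM's IP):

127.0.0.1        localhost localhost.domain
192.168.30.137   quickstart.cloudera quickstart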
Explorer
Posts: 25
Registered: ‎12-10-2014

Re: I cannot access programmatically a file within a CDH 5.4.2.0 image running in vmware

Hi Sean, that change has no effect: spark-worker doesn't run. Even if I try to restart it manually with sudo service spark-worker restart, it fails as soon as I ask for its status.

Also, Hue does not work; I think that is because it uses quickstart.cloudera internally to talk to another component...

I am starting to think that this VMware image is useless for developing anything related to Spark; I cannot run anything...
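
When it fails, the worker log should say why. This is how I check it (the /var/log/spark path and the file names are an assumption, so I list the directory first):

[cloudera@quickstart ~]$ ls /var/log/spark/
[cloudera@quickstart ~]$ sudo tail -n 50 /var/log/spark/*worker*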
Explorer
Posts: 25
Registered: ‎12-10-2014

Re: I cannot access programmatically a file within a CDH 5.4.2.0 image running in vmware

Hi Sean,

doing this causes spark-worker not to run, and Hue does not work properly; I think it needs to use quickstart.cloudera internally.

If I restart the VMware image and redo my change to the call to cloudera-quickstart-ip within /etc/init.d/cloudera-quickstart-init, I can log in to Hue again, and spark-history-server runs properly:

 

I have noticed the following when running ps xa | grep spark:

 

[cloudera@quickstart ~]$ ps xa | grep spark
 6330 ?        Sl     0:03 /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/lib/spark/conf/:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar:/etc/hadoop/conf/:/usr/lib/spark/lib/spark-assembly.jar:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/parquet/lib/*:/usr/lib/avro/lib/* -Dspark.deploy.defaultCores=4 -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.master.Master
 6499 ?        Sl     0:04 /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/lib/spark/conf/:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar:/etc/hadoop/conf/:/usr/lib/spark/lib/spark-assembly.jar:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/parquet/lib/*:/usr/lib/avro/lib/* -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker spark://quickstart.cloudera:7077
 6674 ?        Sl     0:05 /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /etc/spark/conf/:/usr/lib/spark/lib/spark-assembly-1.6.0-cdh5.7.0-hadoop2.6.0-cdh5.7.0.jar:/etc/hadoop/conf/:/usr/lib/spark/lib/spark-assembly.jar:/usr/lib/hadoop/lib/*:/usr/lib/hadoop/*:/usr/lib/hadoop-hdfs/lib/*:/usr/lib/hadoop-hdfs/*:/usr/lib/hadoop-mapreduce/lib/*:/usr/lib/hadoop-mapreduce/*:/usr/lib/hadoop-yarn/lib/*:/usr/lib/hadoop-yarn/*:/usr/lib/hive/lib/*:/usr/lib/flume-ng/lib/*:/usr/lib/parquet/lib/*:/usr/lib/avro/lib/* -Dspark.history.fs.logDirectory=hdfs:///user/spark/applicationHistory -Dspark.history.ui.port=18088 -Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.history.HistoryServer
 6915 pts/0    R+     0:00 grep spark

As you can see, spark-master runs with four default cores (-Dspark.deploy.defaultCores=4) while no cores appear dedicated to the worker. Is that normal?

 

[cloudera@quickstart ~]$ sudo service spark-worker status
Spark worker is running                                    [  OK  ]
[cloudera@quickstart ~]$ sudo service spark-master status
Spark master is running                                    [  OK  ]
[cloudera@quickstart ~]$ sudo service spark-history-server status
Spark history-server is running                            [  OK  ]

As you can see, it all looks normal, but examining the spark-master UI at http://quickstart.cloudera:18080/ I can see this:

 

    URL: spark://192.168.30.137:7077
    REST URL: spark://192.168.30.137:6066 (cluster mode)
    Alive Workers: 0
    Cores in use: 0 Total, 0 Used
    Memory in use: 0.0 B Total, 0.0 B Used
    Applications: 0 Running, 0 Completed
    Drivers: 0 Running, 0 Completed
    Status: ALIVE

Zero cores in use out of zero in total, and no memory! That's strange, because spark-master is launched with -Dspark.deploy.defaultCores=4 and a 1 GB heap (-Xms1g -Xmx1g -XX:MaxPermSize=256m).

 

And this is the corresponding output from the spark-worker UI:

 

    ID: worker-20160603111341-192.168.30.137-7078
    Master URL:
    Cores: 4 (0 Used)
    Memory: 6.7 GB (0.0 B Used)

The Master URL is not set, and the worker reports 4 cores and 6.7 GB, even though spark-worker was launched with this command line:

 

-Xms1g -Xmx1g -XX:MaxPermSize=256m org.apache.spark.deploy.worker.Worker spark://quickstart.cloudera:7077
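
Comparing the two UIs, the master advertises itself as spark://192.168.30.137:7077 while the worker was started against spark://quickstart.cloudera:7077, so if that hostname no longer resolves to that IP, the registration would fail, which would explain the empty Master URL and the zero alive workers. One way to test that (a sketch; the spark-class path and the spark user are assumptions based on the /usr/lib/spark layout above) is to start a worker by hand against the URL the master actually advertises:

[cloudera@quickstart ~]$ sudo -u spark /usr/lib/spark/bin/spark-class \
      org.apache.spark.deploy.worker.Worker spark://192.168.30.137:7077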

What do you think? What can I do in order to continue developing my project? That is all I want: to use this VMware image to develop the project, loading a tiny file of only 16 MB from HDFS.

 

What most annoys me is that the code works perfectly in the spark-shell of the virtual image, but when I try to run it programmatically, packaged as a Unix command with sbt-pack, it does not work.
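
For context, this is roughly how the two cases differ (my-app, the HDFS port, and the file path are hypothetical placeholders):

# works: spark-shell inside the VM, pointed at the standalone master
[cloudera@quickstart ~]$ spark-shell --master spark://quickstart.cloudera:7077

# fails: the launcher script generated by sbt-pack, run the same way
[cloudera@quickstart ~]$ export HADOOP_CONF_DIR=/etc/hadoop/conf   # so the app finds the same HDFS config
[cloudera@quickstart ~]$ ./target/pack/bin/my-app hdfs://quickstart.cloudera:8020/user/cloudera/tiny-file.txt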

 

Regards

 

Alonso

Cloudera Employee
Posts: 28
Registered: ‎11-24-2015

Re: I cannot access programmatically a file within a CDH 5.4.2.0 image running in vmware

It looks like the networking issue is resolved with the changes to the hosts file. For the remaining issues you may have better luck posting in the Spark forum specifically (http://community.cloudera.com/t5/Advanced-Analytics-Apache-Spark/bd-p/Spark) - I suspect outside of that forum there won't be many readers familiar with the trickier parts of Spark configuration, and sbt-pack in particular.

Explorer
Posts: 25
Registered: ‎12-10-2014

Re: I cannot access programmatically a file within a CDH 5.4.2.0 image running in vmware

Thank you Sean, but the link that you have provided returns this message:

The core node you are trying to access was not found, it may have been deleted. Please refresh your original page and try the operation again.

EDIT: now I have noticed that there is an extra ")" at the end of the link.