
Unable to see completed application in Spark 2 history web UI

Master Collaborator

Hello Community,

 
I'm using Spark 2.3 and Spark 1.6.0 in my cluster with Cloudera distribution 5.13.0.
 
Both are configured to run on YARN, but I'm unable to see completed applications in the Spark2 history server, while in Spark 1.6.0 I can.
 
1) I checked the HDFS permissions for both folders and both have the same permissions.
 
drwxrwxrwt   - cloudera-scm spark          0 2018-08-08 00:46 /user/spark/applicationHistory
drwxrwxrwt   - cloudera-scm spark          0 2018-08-08 00:46 /user/spark/spark2ApplicationHistory
 
The application files themselves are written with permissions 770 in both.
 
-rwxrwx---   3  fawzea spark     4743751 2018-08-07 23:32 /user/spark/spark2ApplicationHistory/application_1527404701551_672816_1
-rwxrwx---   3  fawzea spark       134315 2018-08-08 00:41 /user/spark/applicationHistory/application_1527404701551_673359_1
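For reference, the listings above can be reproduced with the standard HDFS commands, e.g.:

hdfs dfs -ls /user/spark
hdfs dfs -ls /user/spark/spark2ApplicationHistory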
 
2) There are no errors in the Spark2 history server log.
 
3) I compared the configurations between Spark 1.6 and Spark 2.3 (system user, event log settings, etc.) and everything looks the same.
 
4) Once I changed the permissions of the above Spark2 application file to 777, I was able to see the application in the Spark2 history server UI.
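For that test I simply ran a plain HDFS chmod on the event log file, e.g.:

hdfs dfs -chmod 777 /user/spark/spark2ApplicationHistory/application_1527404701551_672816_1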
 
I tried to figure out whether the two history servers run as different users, but couldn't confirm it.
 
Has anyone run into this issue and solved it?
 
Thanks in advance.
5 REPLIES

Expert Contributor

You may need to make sure the process owner of the Spark2 history server (by default it is the spark user as well) belongs to the group "spark", so that the Spark2 history server process can read all the Spark2 event log files.

 

You can check the process owner with "ps -ef | grep java | grep SPARK2" on the node where the Spark2 history server runs.
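To double-check that owner's group membership on the same node, something like the following should do (the user name here is just an example; replace it with the actual process owner):

id spark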

Master Collaborator

Hi @Yuexin Zhang

 

Thanks for your response.

 

Both the /var/run/cloudera-scm-agent/process listing and ps -ef show that the user and group are cloudera-scm.

 

/var/run/cloudera-scm-agent/process
[root@server process]# ll | grep SPARK
drwxr-x--x 7 cloudera-scm cloudera-scm 280 May 27 03:05 19175-spark_on_yarn-SPARK_YARN_HISTORY_SERVER
drwxr-x--x 8 cloudera-scm cloudera-scm 300 May 27 03:17 19240-spark2_on_yarn-SPARK2_YARN_HISTORY_SERVER

 

1829 cloudera  20   0 6682m 451m  33m S  0.3  0.4 379:36.38 /var/jdk8/bin/java -cp /var/run/cloudera-scm-agent/process/19240-spark2_on_yar

 

Also, I find it interesting that this works for Spark 1.6 but not for Spark2.

 

It may also be worth mentioning that my cluster runs with the single user "cloudera-scm", as I'm using Cloudera Manager Express.

 

 

 

Expert Contributor

Okay, since the process owner is cloudera-scm, one way to fix the issue is to add the cloudera-scm user to the 'spark' group on all nodes.
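For example, on each node (standard Linux commands; restarting the Spark2 history server afterwards is assumed):

usermod -a -G spark cloudera-scm
id cloudera-scm   # verify the group was added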

Master Collaborator

@Yuexin Zhang Thanks for your response.

 

Since I'm accessing it from the Spark History UI, I'm not sure whether the UI is running as the cloudera-scm user.

 

A few things I'm trying to figure out, which may help me find a solution for this issue:

 

1- Why does it work in Spark 1.6 but not in Spark 2? In Spark 1.6 the jobs under hdfs://name-node/user/spark/applicationHistory are also written with user cloudera-scm, group spark, and permissions 770.

 

2- How can I find out which user the UI uses to pull the data?

 

3- Can I change the permissions of the files under the HDFS Spark history dir by adding a specific config?

 

For example, something like spark.eventLog.permissions=755.

Cloudera Employee

Hi,

 

Check the total number of application files in the application history path; if the number of files is large, try increasing the history server heap size and see whether that helps. Also look through the Spark history server logs for any errors.
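A rough sketch, assuming the same event log path as above (on a CM-managed cluster the heap is set through the history server's Java heap size configuration in Cloudera Manager rather than spark-env.sh):

# count the files in the Spark2 event log directory
hdfs dfs -count /user/spark/spark2ApplicationHistory

# raise the history server daemon heap in spark-env.sh, then restart the service
export SPARK_DAEMON_MEMORY=2g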

 

Thanks

AKR