Created on 05-13-2015 01:25 PM - edited 09-16-2022 02:28 AM
I understand Spark History Server is an independent module (not related to YARN's Job History Server).
I have deployed CDH 5.4 (via parcels) but Spark History Server is not there!!
<Q1> How do I install Spark History Server? via parcels or via RPMs?
<Q2> Any special configuration for deploying Spark History Server?
<Q3> What port does Spark History Server run on?
<Q4> So far I have deployed 1 Spark Master (Master Web UI), several Spark Workers.
What other 'services' could be deployed?
For instance, YARN has: ResourceManager Web UI, HistoryServer Web UI, Dynamic Resource Pools.
Created 05-13-2015 01:53 PM
The History Server is part of the "Spark" service and is one of the roles you deploy through it. You don't have to configure it specially, but you can, including which port it listens on. Normally you would not run a Spark master or worker at all, but just use YARN; I'd advise that. There are no other Spark server roles besides these three (master, worker, History Server).
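To find out which port a running History Server is actually listening on, a quick probe can help. The sketch below is only an illustration: the candidate ports are assumptions (18080 is the upstream Spark default for `spark.history.ui.port`, and 18088 is, as far as I recall, the default Cloudera Manager assigns), so check the role's configuration page for the real value. The function names are hypothetical.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as "not listening".
        return False


def find_history_server_port(host, candidates=(18088, 18080)):
    """Return the first candidate port that accepts connections, else None.

    The default candidates are assumptions, not values read from the cluster.
    """
    for port in candidates:
        if is_port_open(host, port):
            return port
    return None
```

Calling `find_history_server_port("your-history-server-host")` returns a port number or `None`. An open port only shows that something is listening there, so confirm by opening the web UI on that port.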
Created 05-13-2015 02:08 PM
Actually, here is what I have deployed/configured for Spark:
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
scm=> select * from services where service_id = 24;
 service_id | optimistic_lock_version |  name  | service_type | cluster_id | maintenance_count | display_name | generation
------------+-------------------------+--------+--------------+------------+-------------------+--------------+------------
         24 |                      34 | spark0 | SPARK        |         25 |                 0 | spark0       |          1
(1 row)
scm=> select role_type, configured_status, host_id from roles where service_id = 24;
  role_type   | configured_status | host_id
--------------+-------------------+---------
 SPARK_WORKER | RUNNING           |       1
 GATEWAY      | NA                |       4
 GATEWAY      | NA                |       5
 GATEWAY      | NA                |       6
 GATEWAY      | NA                |       3
 GATEWAY      | NA                |       1
 GATEWAY      | NA                |       2
 SPARK_WORKER | RUNNING           |       2
 SPARK_WORKER | RUNNING           |       6
 SPARK_WORKER | RUNNING           |       8
 SPARK_WORKER | RUNNING           |       5
 SPARK_WORKER | RUNNING           |       7
 SPARK_WORKER | RUNNING           |       3
 SPARK_MASTER | RUNNING           |       4
(14 rows)
<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>
That tells me that the 'Spark History Server' role is not installed.
Do I have to install it, and if so, how?
Thank you!
Created 05-13-2015 02:09 PM
Yes, you add this role to a host just like any other service/role in CM. Look at the Spark service. Spark Gateway is a "role" but not a server process, FWIW. It just means spark-submit et al. can be run on that machine.
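The same check can be made through the CM API rather than by reading the scm database. This is a minimal sketch, assuming the `cm_api` Python client is installed. `summarize_roles`, `has_history_server`, and `get_service` are hypothetical helper names. The substring match on `HISTORY_SERVER` is deliberate, because the exact role-type string for your service type is an assumption to verify.

```python
from collections import Counter


def summarize_roles(service):
    """Count role instances per role type for one CM service.

    `service` is expected to behave like cm_api's ApiService:
    get_all_roles() returns objects that carry a `type` attribute.
    """
    return Counter(role.type for role in service.get_all_roles())


def has_history_server(service):
    """True if any role type on the service looks like a History Server."""
    # Substring match: the exact role-type name is not confirmed here.
    return any("HISTORY_SERVER" in role_type
               for role_type in summarize_roles(service))


def get_service(cm_host, username, password, cluster_name, service_name):
    """Look up a service via the CM API (arguments are placeholders)."""
    # Imported lazily so the helpers above work without cm_api installed.
    from cm_api.api_client import ApiResource
    api = ApiResource(cm_host, username=username, password=password)
    return api.get_cluster(cluster_name).get_service(service_name)
```

For the deployment shown above, `summarize_roles` would report SPARK_WORKER, GATEWAY, and SPARK_MASTER entries, and `has_history_server` would return `False`.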
Created 05-13-2015 02:14 PM
I have been using CM API Python scripts to add Hadoop services to a CDH cluster.
I would like to add the Spark History Server role by calling a script.
Could you please point me to some samples/links/docs for creating it?
Thank you!
Created 05-13-2015 03:19 PM
Basically, I have to reproduce these steps via a CM API Python script:
To add the History Server:
1. Go to the Spark service.
2. Click the Instances tab.
3. Click the Add Role Instances button.
4. Select a host in the column under History Server, then click OK.
5. Click Continue.
6. Check the checkbox next to the History Server role.
7. Select Actions for Selected > Start and click Start.
8. Click Close when the action completes.
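The UI steps above could be scripted roughly as follows. This is a sketch, not a verified recipe: it assumes the `cm_api` Python client and uses its `get_role_types`, `get_all_roles`, `create_role`, and `start_roles` calls. The role type `SPARK_HISTORY_SERVER` and the role name pattern are assumptions, so the function first checks the type against what the service reports. `host_id` is the host ID string known to the CM API, which is not necessarily the integer `host_id` from the scm database.

```python
def add_history_server(service, host_id, role_type="SPARK_HISTORY_SERVER"):
    """Create and start a History Server role; return the role name.

    `service` is expected to behave like cm_api's ApiService.
    `role_type` is an assumed default; it is validated below.
    """
    available = service.get_role_types()
    if role_type not in available:
        raise ValueError("Role type %r not offered by service; available: %s"
                         % (role_type, ", ".join(sorted(available))))

    # Do nothing if the role already exists (keeps the script re-runnable).
    for role in service.get_all_roles():
        if role.type == role_type:
            return role.name

    # Hypothetical naming scheme; any unique role name should do.
    role_name = "%s-hs-1" % service.name
    service.create_role(role_name, role_type, host_id)

    # start_roles returns one command per role; wait for each to finish.
    for cmd in service.start_roles(role_name):
        result = cmd.wait()
        if not result.success:
            raise RuntimeError("Starting %s failed" % role_name)
    return role_name


def get_service(cm_host, username, password, cluster_name, service_name):
    """Look up a service via the CM API (arguments are placeholders)."""
    # Imported lazily so add_history_server can be tested without cm_api.
    from cm_api.api_client import ApiResource
    api = ApiResource(cm_host, username=username, password=password)
    return api.get_cluster(cluster_name).get_service(service_name)
```

A call would look like `add_history_server(get_service(...), "<host id>")`, with the connection arguments and host ID filled in for your cluster. If the role type is rejected, the error message lists the types the service actually supports.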