Audit Spark

Explorer

Greetings to all,

Does anyone know of a way to audit the users and scripts running on Spark?

What I would like the audit to include:

- User running the script

- Script or statement that was executed

- Date and Time

- IP

- Etc.

Thank you


Re: Audit Spark

Super Collaborator

Hi @Angel Chiluisa,

You can get this information from YARN by filtering on the application type SPARK (or any other type, if you wish):

yarn application -appStates ALL -appTypes SPARK -list 2>/dev/null | egrep "^application" | tr -s " " " " | cut -d " " -f1 | xargs -I appid yarn application -status appid 2>/dev/null

Example output:

[centos@projecthdpm0 ~]$ yarn application -appStates ALL  -appTypes SPARK -list 2>/dev/null | egrep "^application" | tr -s " " " " | cut -d " " -f1 | xargs -I appid yarn application -status appid 2>/dev/null
Application Report :
	Application-Id : application_1505008980354_0002
	Application-Name : Spark shell
	Application-Type : SPARK
	User : hdfs
	Queue : default
	Application Priority : null
	Start-Time : 1505012591238
	Finish-Time : 1505012969092
	Progress : 100%
	State : FINISHED
	Final-State : SUCCEEDED
	Tracking-URL : projecthdpe0.field.hortonworks.com:18080/history/application_1505008980354_0002/1
	RPC Port : 0
	AM Host : 172.26.206.110
	Aggregate Resource Allocation : 2060380 MB-seconds, 1132 vcore-seconds
	Log Aggregation Status : TIME_OUT
	Diagnostics :
	Unmanaged Application : false
	Application Node Label Expression : <Not set>
	AM container Node Label Expression : <DEFAULT_PARTITION>
Application Report :
	Application-Id : application_1505008980354_0004
	Application-Name : Spark shell
	Application-Type : SPARK
	User : hdfs
	Queue : default
	Application Priority : null
	Start-Time : 1505013401185
	Finish-Time : 1505013533692
	Progress : 100%
	State : FINISHED
	Final-State : SUCCEEDED
	Tracking-URL : projecthdpe0.field.hortonworks.com:18080/history/application_1505008980354_0004/1
	RPC Port : 0
	AM Host : 172.26.206.112
	Aggregate Resource Allocation : 702626 MB-seconds, 387 vcore-seconds
	Log Aggregation Status : SUCCEEDED
	Diagnostics :
	Unmanaged Application : false
	Application Node Label Expression : <Not set>
	AM container Node Label Expression : <DEFAULT_PARTITION>
Application Report :
	Application-Id : application_1505008980354_0003
	Application-Name : Spark shell
	Application-Type : SPARK
	User : hdfs
	Queue : default
	Application Priority : null
	Start-Time : 1505013316968
	Finish-Time : 1505013357933
	Progress : 100%
	State : FINISHED
	Final-State : SUCCEEDED
	Tracking-URL : projecthdpe0.field.hortonworks.com:18080/history/application_1505008980354_0003/1
	RPC Port : 0
	AM Host : 172.26.206.108
	Aggregate Resource Allocation : 231356 MB-seconds, 124 vcore-seconds
	Log Aggregation Status : TIME_OUT
	Diagnostics :
	Unmanaged Application : false
	Application Node Label Expression : <Not set>
	AM container Node Label Expression : <DEFAULT_PARTITION>

The start and finish times are in epoch format (milliseconds) and can be converted to GMT or any other time zone, depending on the customer's location.
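
For example, with GNU date you can convert the epoch milliseconds to GMT by dividing by 1000. This is just a quick check, assuming GNU date is available; the exact output format may differ on your system, and the timestamp is taken from the first application report above:

[centos@projecthdpm0 ~]$ date -u -d @$((1505012591238 / 1000))
Sun Sep 10 03:03:11 UTC 2017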

* Please note that the initial command can produce a lot of output, so if you are only after the jobs and their start times, the first half of the command will do the job for you, as the application is tagged with the start time in its name:

yarn application -appStates ALL  -appTypes SPARK -list 2>/dev/null
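
If you want to fold the status reports into one audit record per application (id, name, user, start and finish time), a small wrapper along the lines of the sketch below could do it. It is only a sketch that parses the "Field : value" layout of the -status output shown above, so adjust the patterns if your YARN version prints the fields differently:

#!/bin/bash
# Sketch: emit one CSV audit record per Spark application
# (application id, name, user, start time, finish time).
yarn application -appStates ALL -appTypes SPARK -list 2>/dev/null \
  | egrep "^application" | awk '{print $1}' \
  | while read appid; do
      yarn application -status "$appid" 2>/dev/null \
        | awk -F' : ' '
            /Application-Id/   {id=$2}
            /Application-Name/ {name=$2}
            /User/             {user=$2}
            /Start-Time/       {start=$2}
            /Finish-Time/      {finish=$2}
            END {printf "%s,%s,%s,%s,%s\n", id, name, user, start, finish}'
    done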

Hope this helps!