New Contributor
Posts: 1
Registered: ‎07-30-2018

Unknown origin of jobs submitted by dr.who

I set up a small 4-node Cloudera cluster just for POC purposes and am facing this pain point. As visible in the screenshot, I can see a lot of jobs submitted by the default Hadoop user, dr.who.

This is what I have tried so far:
I restricted internet access to the cluster completely by limiting the ports to one particular VPC/subnet.
I checked out this thread before posting, but I am still not able to solve my issue or find the root cause.

As a last resort, I repeated the cluster setup from scratch a few times, thinking I might have made a mistake in the configuration. But even following the documentation to the letter, I could not isolate any issue there either.

 

OBSERVATIONS: 

No logs are available for the applications in the Job History UI, so no conclusive cause has been identified.

No overall pattern is observed in the job submissions. Sometimes the jobs arrive in batches, and a few times as an individual job.

The jobs are automatically killed after remaining in the "pending" state for a few minutes, and new jobs are then spawned.

 

 

Also checked out this blog post:
https://blog.cloudera.com/blog/2017/01/how-to-secure-internet-exposed-apache-hadoop/

Attached screenshot: Alien jobs.png

Posts: 1,760
Kudos: 378
Solutions: 281
Registered: ‎07-31-2013

Re: Unknown origin of jobs submitted by dr.who

Would it be possible for you to share your RM log via pastebin or similar?

 

Have you tried looking at a report of the IPs connecting into the cluster, specifically to the RM host and its web UI port (that's where 'dr.who' comes from: it is the default username assumed in unsecured setups), to see if there are still external connections that are not explainable?
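
One way to do that check is a minimal Python sketch like the following, which filters the output of `ss -tn` (run on the RM host) for peers that reach the web UI port from outside the expected network. The port 8088 is the stock default for the RM web UI and may differ on your cluster, and the `10.0.0.0/16` range is a hypothetical placeholder for your own VPC/subnet; the helper names are illustrative only.

```python
import ipaddress

# Assumption: 8088 is the default RM web UI port; adjust if yours differs.
RM_WEBUI_PORT = 8088
# Hypothetical range standing in for the VPC/subnet you expect traffic from.
ALLOWED_NET = ipaddress.ip_network("10.0.0.0/16")


def split_host_port(endpoint):
    """Split 'host:port' (or '[v6host]:port') into (host, port) strings."""
    host, _, port = endpoint.rpartition(":")
    return host.strip("[]"), port


def unexpected_peers(ss_output, port=RM_WEBUI_PORT, allowed=ALLOWED_NET):
    """Return sorted peer IPs connected to `port` that fall outside `allowed`."""
    peers = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header row and anything that is not a full connection row.
        if len(fields) < 5 or fields[0] == "State":
            continue
        _, local_port = split_host_port(fields[3])
        peer_host, _ = split_host_port(fields[4])
        if local_port != str(port):
            continue
        try:
            peer = ipaddress.ip_address(peer_host)
        except ValueError:
            # Wildcards such as '*' on LISTEN rows are not real peers.
            continue
        if peer not in allowed:
            peers.add(str(peer))
    return sorted(peers)
```

Feeding it the text captured from `ss -tn` gives a list of addresses worth investigating; an empty list only means nothing was connected at that moment, so it is worth sampling repeatedly or comparing against firewall/flow logs.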

 

Failing that, is it possible you may have more of a local exploit running that's behind this - i.e. the host OS itself is compromised? Has this persisted even after making an entirely new environment?

 

Also, since you have gone over the two topic links, have you considered securing your cluster with strong authentication via Kerberos, and also enabling strong HTTP authentication for the Hadoop (HDFS and YARN) web consoles?
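
As a rough way to see where a cluster currently stands, here is a small Python sketch that reads the text of a `core-site.xml` and reports whether the authentication settings are still at their unsecured defaults. The property names and defaults are the ones from Hadoop's `core-default.xml`; the function names are hypothetical, and on a Cloudera Manager deployment these settings are normally changed through CM rather than by editing the file, so treat this as a read-only check.

```python
import xml.etree.ElementTree as ET


def read_properties(site_xml):
    """Parse Hadoop *-site.xml text into a {name: value} dict."""
    root = ET.fromstring(site_xml)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def auth_findings(props):
    """List settings that are still at their unsecured defaults."""
    findings = []
    # Default is 'simple', meaning no Kerberos for RPC.
    if props.get("hadoop.security.authentication", "simple") != "kerberos":
        findings.append("hadoop.security.authentication is not kerberos")
    # Default HTTP auth type is 'simple' with anonymous access allowed.
    if props.get("hadoop.http.authentication.type", "simple") == "simple":
        anon = props.get(
            "hadoop.http.authentication.simple.anonymous.allowed", "true"
        )
        if anon.lower() == "true":
            # Default static web user is 'dr.who'.
            user = props.get("hadoop.http.staticuser.user", "dr.who")
            findings.append(
                "HTTP authentication is simple with anonymous access "
                f"allowed (static web user: {user})"
            )
    return findings
```

An empty result does not prove the cluster is secure; it only shows that these particular properties have been moved off their defaults.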

Contributor
Posts: 41
Registered: ‎06-24-2018

Re: Unknown origin of jobs submitted by dr.who

I am facing the same issue, can anybody help? I am new to big data, so as per my understanding jobs should only be initiated when I trigger them. This has been happening for the last two weeks, and it is causing the NodeManager to exit, and the ResourceManager as well.
ResourceManager logs:

9:32:00.514 PM WARN RMAuditLogger
USER=dr.who OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1533839222301_0028 failed 3 times due to AM Container for appattempt_1533839222301_0028_000003 exited with exitCode: 0
Diagnostics: Failing this attempt. Failing the application. APPID=application_1533839222301_0028
9:32:00.514 PM INFO RMAppManager$ApplicationSummary
appId=application_1533839222301_0028,name=hadoop,user=dr.who,queue=root.users.dr_dot_who,state=FAILED,vCores:0>
9:32:00.785 PM INFO RMContainerImpl
container_1533839222301_0029_03_000001 Container Transitioned from NEW to ALLOCATED
9:32:00.786 PM INFO RMAuditLogger
USER=dr.who OPERATION=AM Allocated Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1533839222301_0029 CONTAINERID=container_1533839222301_0029_03_000001
9:32:00.786 PM INFO SchedulerNode
Assigned container container_1533839222301_0029_03_000001 of capacity <memory:1024, vCores:1> on host pan0142.panoulu.net:8041, which has 1 containers, <memory:1024, vCores:1> used and <memory:0, vCores:1> available after allocation
9:32:00.786 PM INFO NMTokenSecretManagerInRM
Sending NMToken for nodeId : pan0142.panoulu.net:8041 for container : container_1533839222301_0029_03_000001
9:32:00.786 PM INFO RMContainerImpl
container_1533839222301_0029_03_000001 Container Transitioned from ALLOCATED to ACQUIRED
9:32:00.786 PM INFO NMTokenSecretManagerInRM
Clear node set for appattempt_1533839222301_0029_000003
9:32:00.786 PM INFO RMAppAttemptImpl
Storing attempt: AppId: application_1533839222301_0029 AttemptId: appattempt_1533839222301_0029_000003 MasterContainer: Container: [ContainerId: container_1533839222301_0029_03_000001, NodeId: pan0142.panoulu.net:8041, NodeHttpAddress: pan0142.panoulu.net:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 185.38.3.142:8041 }, ]
9:32:00.787 PM INFO RMAppAttemptImpl
appattempt_1533839222301_0029_000003 State change from SCHEDULED to ALLOCATED_SAVING on event = CONTAINER_ALLOCATED
9:32:00.787 PM INFO RMAppAttemptImpl
appattempt_1533839222301_0029_000003 State change from ALLOCATED_SAVING to ALLOCATED on event = ATTEMPT_NEW_SAVED
9:32:00.787 PM INFO AMLauncher
Launching masterappattempt_1533839222301_0029_000003
9:32:00.788 PM INFO AMLauncher
Setting up container Container: [ContainerId: container_1533839222301_0029_03_000001, NodeId: pan0142.panoulu.net:8041, NodeHttpAddress: pan0142.panoulu.net:8042, Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 185.38.3.142:8041 }, ] for AM appattempt_1533839222301_0029_000003
9:32:00.788 PM INFO AMRMTokenSecretManager
Create AMRMToken for ApplicationAttempt: appattempt_1533839222301_0029_000003
9:32:00.788 PM INFO AMRMTokenSecretManager
Creating password for appattempt_1533839222301_0029_000003
9:32:00.813 PM INFO AMLauncher
Done launching container Container: [ContainerId: container_1533839222301_0029_03_000001, NodeId: pan0142.panoulu.net:8041, NodeHttpAddress: , Resource: <memory:1024, vCores:1>, Priority: 0, Token: Token { kind: ContainerToken, service: 185.38.3.142:8041 }, ] for AM appattempt_1533839222301_0029_000003
9:32:00.813 PM INFO RMAppAttemptImpl
appattempt_1533839222301_0029_000003 State change from ALLOCATED to LAUNCHED on event = LAUNCHED
9:32:01.788 PM INFO RMContainerImpl
container_1533839222301_0029_03_000001 Container Transitioned from ACQUIRED to RUNNING
9:32:02.635 PM INFO RMContainerImpl
container_1533839222301_0029_03_000001 Container Transitioned from RUNNING to COMPLETED
9:32:02.635 PM INFO FSAppAttempt
Completed container: container_1533839222301_0029_03_000001 in state: COMPLETED event:FINISHED
9:32:02.635 PM INFO RMAuditLogger
USER=dr.who OPERATION=AM Released Container TARGET=SchedulerApp RESULT=SUCCESS APPID=application_1533839222301_0029 CONTAINERID=container_1533839222301_0029_03_000001
9:32:02.635 PM INFO SchedulerNode
Released container container_1533839222301_0029_03_000001 of capacity <memory:1024, vCores:1> on host pan0142.panoulu.net:8041, which currently has 0 containers, <memory:0, vCores:0> used and <memory:1024, vCores:2> available, release resources=true
9:32:02.635 PM INFO FairScheduler
Application attempt appattempt_1533839222301_0029_000003 released container container_1533839222301_0029_03_000001 on node: host: pan0142.panoulu.net:8041 #containers=0 available=1024 used=0 with event: FINISHED
9:32:02.635 PM INFO RMAppAttemptImpl
Updating application attempt appattempt_1533839222301_0029_000003 with final state: FAILED, and exit status: 0
9:32:02.635 PM INFO RMAppAttemptImpl
appattempt_1533839222301_0029_000003 State change from LAUNCHED to FINAL_SAVING on event = CONTAINER_FINISHED
9:32:02.635 PM INFO ApplicationMasterService
Unregistering app attempt : appattempt_1533839222301_0029_000003
9:32:02.636 PM INFO AMRMTokenSecretManager
Application finished, removing password for appattempt_1533839222301_0029_000003
9:32:02.636 PM INFO RMAppAttemptImpl
appattempt_1533839222301_0029_000003 State change from FINAL_SAVING to FAILED on event = ATTEMPT_UPDATE_SAVED
9:32:02.636 PM INFO RMAppImpl
The number of failed attempts is 3. The max attempts is 3
9:32:02.636 PM INFO RMAppImpl
Updating application application_1533839222301_0029 with final state: FAILED
9:32:02.636 PM INFO RMStateStore
Updating info for app: application_1533839222301_0029
9:32:02.636 PM INFO RMAppImpl
application_1533839222301_0029 State change from ACCEPTED to FINAL_SAVING on event = ATTEMPT_FAILED
9:32:02.636 PM INFO FairScheduler
Application appattempt_1533839222301_0029_000003 is done. finalState=FAILED
9:32:02.636 PM INFO AppSchedulingInfo
Application application_1533839222301_0029 requests cleared
9:32:02.649 PM INFO RMAppImpl
Application application_1533839222301_0029 failed 3 times due to AM Container for appattempt_1533839222301_0029_000003 exited with exitCode: 0
For more detailed output, check application tracking page:, click on links to logs of each attempt.
Diagnostics: Failing this attempt. Failing the application.
9:32:02.649 PM INFO RMAppImpl
application_1533839222301_0029 State change from FINAL_SAVING to FAILED on event = APP_UPDATE_SAVED
9:32:02.649 PM WARN RMAuditLogger
USER=dr.who OPERATION=Application Finished - Failed TARGET=RMAppManager RESULT=FAILURE DESCRIPTION=App failed with state: FAILED PERMISSIONS=Application application_1533839222301_0029 failed 3 times due to AM Container for appattempt_1533839222301_0029_000003 exited with exitCode: 0
For more detailed output, check application tracking page:, click on links to logs of each attempt.
Diagnostics: Failing this attempt. Failing the application. APPID=application_1533839222301_0029
9:32:02.649 PM INFO RMAppManager$ApplicationSummary
appId=application_1533839222301_0029,name=hadoop,user=dr.who,queue=root.users.dr_dot_who,state=FAILED,trackingUrl=
9:37:02.294 PM INFO AbstractYarnScheduler
Release request cache is cleaned up
9:37:31.576 PM INFO ClientRMService
Allocated new applicationId: 30
9:37:35.975 PM INFO RMAppImpl
Storing application with id application_1533839222301_0030
9:37:35.976 PM INFO RMStateStore
Storing info for app: application_1533839222301_0030
9:37:35.976 PM INFO RMAppImpl
application_1533839222301_0030 State change from NEW to NEW_SAVING on event = START
9:37:35.976 PM INFO RMAppImpl
application_1533839222301_0030 State change from NEW_SAVING to SUBMITTED on event = APP_NEW_SAVED
9:37:35.976 PM INFO ClientRMService
Application with id 30 submitted by user dr.who
9:37:35.976 PM INFO RMAuditLogger
USER=dr.who OPERATION=Submit Application Request TARGET=ClientRMService RESULT=SUCCESS APPID=application_1533839222301_0030
9:37:35.976 PM WARN QueuePlacementRule
Name dr.who is converted to dr_dot_who when it is used as a queue name.
9:37:35.976 PM INFO FairScheduler
Accepted application application_1533839222301_0030 from user: dr.who, in queue: root.users.dr_dot_who, currently num of applications: 1
9:37:35.977 PM INFO RMAppImpl
application_1533839222301_0030 State change from SUBMITTED to ACCEPTED on event = APP_ACCEPTED
9:37:35.977 PM INFO ApplicationMasterService
Registering app attempt : appattempt_1533839222301_0030_000001
9:37:35.977 PM INFO RMAppAttemptImpl
appattempt_1533839222301_0030_000001 State change from NEW to SUBMITTED on event = START
9:37:35.977 PM INFO FairScheduler
Added Application Attempt appattempt_1533839222301_0030_000001 to scheduler from user: dr.who
9:37:35.977 PM INFO RMAppAttemptImpl
appattempt_1533839222301_0030_000001 State change from SUBMITTED to SCHEDULED on event = ATTEMPT_ADDED
9:37:36.238 PM INFO RMContainerImpl
container_1533839222301_0030_01_000001 Container Transitioned from NEW to ALLOCATED
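
To get an overview from a log excerpt like the one above, a short Python sketch can tally who submitted applications and how each one ended. It relies only on the two line shapes visible in this log (the `RMAuditLogger` "Submit Application Request" line and the `RMAppManager$ApplicationSummary` line); the function and variable names are illustrative, and other Hadoop versions may format these lines differently.

```python
import re
from collections import Counter

# Matches audit lines such as:
# USER=dr.who OPERATION=Submit Application Request ... APPID=application_..._0030
SUBMIT_RE = re.compile(
    r"USER=(\S+)\s+OPERATION=Submit Application Request\b.*?"
    r"APPID=(application_\d+_\d+)"
)

# Matches summary lines such as:
# appId=application_..._0029,name=hadoop,user=dr.who,queue=...,state=FAILED,...
SUMMARY_RE = re.compile(
    r"appId=(application_\d+_\d+),name=([^,]*),user=([^,]*),"
    r"queue=([^,]*),state=(\w+)"
)


def summarize(log_text):
    """Return (submissions per user, {app_id: (user, final_state)})."""
    submitted = Counter()
    for user, _app_id in SUBMIT_RE.findall(log_text):
        submitted[user] += 1
    final_states = {}
    for app_id, _name, user, _queue, state in SUMMARY_RE.findall(log_text):
        final_states[app_id] = (user, state)
    return submitted, final_states
```

Running `summarize` over the full RM log shows how many submissions came in as dr.who and when, which can then be lined up against connection records for the RM web UI port.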
