Can't launch/deploy MaaS in YARN

I can't figure out why I can't deploy the MaaS dga example on my Metron cluster. Here is the YARN log from when I run the command "$METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -lmp /dga -hmp /user/root/models -mo ADD -m 512 -n dga -v 1.0 -ni 1":

17/06/01 16:22:59 INFO service.ApplicationMaster: [ADD]: Received request for model dga:1.0x1 containers of size 512M at path /user/root/models
17/06/01 16:23:01 INFO impl.AMRMClientImpl: Received new token for : node4.metron:45454
17/06/01 16:23:01 INFO callback.ContainerRequestListener: Got response from RM for container ask, allocatedCnt=2
17/06/01 16:23:01 INFO service.ApplicationMaster: Found container id of 3298534883331
17/06/01 16:23:01 INFO callback.ContainerRequestListener: Launching shell command on a new container., containerId=container_e03_1496291138813_0002_01_000003, containerNode=node4.metron:45454, containerNodeURI=node4.metron:8042, containerResourceMemory=512, containerResourceVirtualCores=1
17/06/01 16:23:01 INFO callback.ContainerRequestListener: Launching shell command on a new container., containerId=container_e03_1496291138813_0002_01_000004, containerNode=node1.metron:45454, containerNodeURI=node1.metron:8042, containerResourceMemory=512, containerResourceVirtualCores=1
17/06/01 16:23:01 INFO callback.LaunchContainer: Setting up container launch container for containerid=container_e03_1496291138813_0002_01_000003
17/06/01 16:23:01 INFO callback.LaunchContainer: Local Directory Contents
17/06/01 16:23:01 INFO callback.LaunchContainer:   6 - tmp
17/06/01 16:23:01 INFO callback.LaunchContainer:   74 - container_tokens
17/06/01 16:23:01 INFO callback.LaunchContainer:   12 - .container_tokens.crc
17/06/01 16:23:01 INFO callback.LaunchContainer:   3612 - launch_container.sh
17/06/01 16:23:01 INFO callback.LaunchContainer:   40 - .launch_container.sh.crc
17/06/01 16:23:01 INFO callback.LaunchContainer:   653 - default_container_executor_session.sh
17/06/01 16:23:01 INFO callback.LaunchContainer:   16 - .default_container_executor_session.sh.crc
17/06/01 16:23:01 INFO callback.LaunchContainer:   707 - default_container_executor.sh
17/06/01 16:23:01 INFO callback.LaunchContainer:   16 - .default_container_executor.sh.crc
17/06/01 16:23:01 INFO callback.LaunchContainer:   10091315 - AppMaster.jar
17/06/01 16:23:01 INFO callback.LaunchContainer: Localizing /user/root/models
17/06/01 16:23:01 INFO callback.LaunchContainer: Model payload: /user/root/models
17/06/01 16:23:01 INFO callback.LaunchContainer: AppJAR Location: hdfs://node1.metron:8020/user/root/MaaS/application_1496291138813_0002/AppMaster.jar
17/06/01 16:23:01 INFO callback.LaunchContainer: Localized dga.py -> LocatedFileStatus{path=hdfs://node1.metron:8020/user/root/models/dga.py; isDirectory=false; length=821; replication=3; blocksize=134217728; modification_time=1496298179635; access_time=1496298179573; owner=root; group=root; permission=rw-r--r--; isSymlink=false}
17/06/01 16:23:01 INFO callback.LaunchContainer: Localized rest.sh -> LocatedFileStatus{path=hdfs://node1.metron:8020/user/root/models/rest.sh; isDirectory=false; length=25; replication=3; blocksize=134217728; modification_time=1496298179570; access_time=1496298179530; owner=root; group=root; permission=rw-r--r--; isSymlink=false}
17/06/01 16:23:01 INFO callback.LaunchContainer: dga.py localized: scheme: "hdfs" host: "node1.metron" port: 8020 file: "/user/root/models/dga.py"
17/06/01 16:23:01 INFO callback.LaunchContainer: rest.sh localized: scheme: "hdfs" host: "node1.metron" port: 8020 file: "/user/root/models/rest.sh"
17/06/01 16:23:01 INFO callback.LaunchContainer: Executing container command: {{JAVA_HOME}}/bin/java org.apache.metron.maas.service.runner.Runner -ci 3298534883331 -zq node1:2181 -zr /metron/maas/config -s rest.sh -n dga -hn node4.metron -v 1.0 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
17/06/01 16:23:01 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e03_1496291138813_0002_01_000003
17/06/01 16:23:01 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node4.metron:45454
17/06/01 16:23:01 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_e03_1496291138813_0002_01_000003
17/06/01 16:23:01 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node4.metron:45454
17/06/01 16:24:00 ERROR service.ApplicationMaster: Received a null request...
17/06/01 16:28:38 ERROR service.ApplicationMaster: Received a null request...
17/06/01 16:28:51 ERROR service.ApplicationMaster: Received a null request...
17/06/01 16:32:09 ERROR service.ApplicationMaster: Received a null request...
17/06/01 16:33:04 INFO callback.ContainerRequestListener: Got response from RM for container ask, completedCnt=1
17/06/01 16:33:04 INFO callback.ContainerRequestListener: Got container status for containerID=container_e03_1496291138813_0002_01_000003, state=COMPLETE, exitStatus=1, diagnostics=Exception from container-launch.
Container id: container_e03_1496291138813_0002_01_000003
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
	at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
	at org.apache.hadoop.util.Shell.run(Shell.java:844)
	at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
	at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
	at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Container exited with a non-zero exit code 1

17/06/01 16:33:04 INFO callback.ContainerRequestListener: REMOVING CONTAINER container_e03_1496291138813_0002_01_000003
17/06/01 16:33:04 WARN discovery.ServiceDiscoverer: Unable to find registered model associated with container container_e03_1496291138813_0002_01_000003
17/06/01 16:33:04 ERROR discovery.ServiceDiscoverer: Unable to unregister container container_e03_1496291138813_0002_01_000003 due to: Unable.
java.lang.IllegalStateException: Unable.
	at org.apache.metron.maas.discovery.ServiceDiscoverer.unregisterByContainer(ServiceDiscoverer.java:209)
	at org.apache.metron.maas.service.callback.ContainerRequestListener.onContainersCompleted(ContainerRequestListener.java:121)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
17/06/01 16:35:40 INFO callback.ContainerRequestListener: Got response from RM for container ask, completedCnt=1
17/06/01 16:35:40 INFO callback.ContainerRequestListener: Got container status for containerID=container_e03_1496291138813_0002_01_000004, state=COMPLETE, exitStatus=-100, diagnostics=Container expired since it was unused
17/06/01 16:35:40 INFO callback.ContainerRequestListener: REMOVING CONTAINER container_e03_1496291138813_0002_01_000004
17/06/01 16:35:40 WARN discovery.ServiceDiscoverer: Unable to find registered model associated with container container_e03_1496291138813_0002_01_000004
17/06/01 16:35:40 ERROR discovery.ServiceDiscoverer: Unable to unregister container container_e03_1496291138813_0002_01_000004 due to: Unable.
java.lang.IllegalStateException: Unable.
	at org.apache.metron.maas.discovery.ServiceDiscoverer.unregisterByContainer(ServiceDiscoverer.java:209)
	at org.apache.metron.maas.service.callback.ContainerRequestListener.onContainersCompleted(ContainerRequestListener.java:121)
	at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)

The "ERROR service.ApplicationMaster: Received a null request..." lines appeared when I ran the command "$METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -mo LIST".

Has anyone else experienced this?
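
The ApplicationMaster log above only says that container ..._000003 exited with code 1; the Runner's own error message goes to that container's stderr. Below is a minimal sketch for extracting it from the aggregated logs. It assumes log aggregation is enabled and that `yarn logs -applicationId <appId>` prints the "Container: ... / LogType:stderr / Log Contents: / End of LogType:stderr" layout seen in the log dumps in this thread. `container_stderr` is a hypothetical helper name, not part of Metron or Hadoop.

```bash
#!/usr/bin/env bash
# container_stderr (hypothetical helper): read `yarn logs` output on stdin and
# print only the stderr contents of the container whose ID is given as $1.
# Assumes the "Container: <id> on <node>" / "LogType:stderr" / "Log Contents:" /
# "End of LogType:stderr" layout shown in this thread.
container_stderr() {
  awk -v cid="$1" '
    # A new container section starts; remember whether it is the one we want.
    /^Container: / { in_c = ($2 == cid); in_s = 0; in_body = 0; next }
    # Start of the stderr log for the selected container.
    in_c && /^LogType:stderr$/ { in_s = 1; next }
    # End of that stderr log: stop printing.
    in_s && /^End of LogType:stderr$/ { in_s = 0; in_body = 0; next }
    # Header lines (upload time, length) end at "Log Contents:"; the body follows.
    in_s && /^Log Contents:$/ { in_body = 1; next }
    in_body { print }
  '
}
```

For example, with the IDs from the log above:

yarn logs -applicationId application_1496291138813_0002 | container_stderr container_e03_1496291138813_0002_01_000003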

11 replies

Re: Can't launch/deploy MaaS in YARN

I am having the same issue. Were you able to find a resolution?

Re: Can't launch/deploy MaaS in YARN

@Chris and @cduby, I have faced a similar issue.

One way I got around it was to grant full permissions on the HDFS folder /user/root/*, kill the existing/running MaaS YARN application, and start afresh.
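
The workaround above can be sketched as a small script. Assumptions: "full permissions" is read as a recursive chmod 777 on /user/root/* (very broad, so narrow it if you can), the MaaS application ID is looked up by hand with `yarn application -list`, and the restart reuses the maas_service.sh command quoted later in this thread. The `run` helper and the `DRY_RUN` variable are hypothetical; with DRY_RUN set, the script only prints what it would do.

```bash
#!/usr/bin/env bash
# Sketch of the "fix permissions, kill MaaS, start afresh" workaround.
# Usage: reset_maas <application_id> <zk_quorum>
# Set DRY_RUN=1 to print the commands instead of running them.

# run (hypothetical helper): print the command when DRY_RUN is set, else execute it.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

reset_maas() {
  local app_id="$1" zk_quorum="$2"
  # Recursively grant full permissions on everything under /user/root in HDFS
  # (quoted so HDFS, not the local shell, expands the glob).
  run hdfs dfs -chmod -R 777 '/user/root/*'
  # Kill the existing/running MaaS YARN application.
  run yarn application -kill "$app_id"
  # Start the MaaS service afresh.
  run "$METRON_HOME/bin/maas_service.sh" -zq "$zk_quorum"
}
```

Running `DRY_RUN=1 reset_maas application_1511328189618_0011 node1:2181` prints the three commands; drop DRY_RUN to execute them.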

Re: Can't launch/deploy MaaS in YARN

@asubramanian, I am facing a similar issue. Giving full permissions on /user/root/* and deploying again does not solve it.

In the application container logs I see the exception below, which I believe is why the container fails to launch. Could you please let me know how to overcome it? Thanks in advance.

I am following the example at http://metron.apache.org/current-book/metron-analytics/metron-maas-service/index.html

Unable to parse args: -ci 4398046511106 -zq node1:2181 -zr /metron/maas/config -s -n dga -hn node1.c.upload-161114.internal -v 1.0
org.apache.commons.cli.MissingArgumentException: Missing argument for option: s
    at org.apache.commons.cli.Parser.processArgs(Parser.java:343)
    at org.apache.commons.cli.Parser.processOption(Parser.java:393)
    at org.apache.commons.cli.Parser.parse(Parser.java:199)
    at org.apache.commons.cli.Parser.parse(Parser.java:85)
    at org.apache.metron.maas.service.runner.Runner$RunnerOptions.parse(Runner.java:139)
    at org.apache.metron.maas.service.runner.Runner.main(Runner.java:170)
Exception in thread "main" org.apache.commons.cli.MissingArgumentException: Missing argument for option: s
    at org.apache.commons.cli.Parser.processArgs(Parser.java:343)
    at org.apache.commons.cli.Parser.processOption(Parser.java:393)
    at org.apache.commons.cli.Parser.parse(Parser.java:199)
    at org.apache.commons.cli.Parser.parse(Parser.java:85)
    at org.apache.metron.maas.service.runner.Runner$RunnerOptions.parse(Runner.java:139)
    at org.apache.metron.maas.service.runner.Runner.main(Runner.java:170)

Re: Can't launch/deploy MaaS in YARN

@Girish N, I see from the output you pasted that the '-s' option is missing its argument. Can you check that again? Also, please paste the whole output along with the command that you ran.
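
The "usage: MaaSRunner" text that the failing container prints to stdout (further down in this thread) lists the options that take a value: -ci, -hn, -n, -s, -v, -zq, -zr. A hedged sketch for spotting empty ones in the "Executing container command" line from the ApplicationMaster log: it flags any value-taking option that is followed by another option or by nothing. The function name is hypothetical and the option list is copied from that usage text.

```bash
#!/usr/bin/env bash
# missing_runner_args (hypothetical helper): given the Runner arguments as
# separate words, print every option that expects a value but has none.
missing_runner_args() {
  # Value-taking options, copied from the "usage: MaaSRunner" output.
  local value_opts=" -ci -hn -n -s -v -zq -zr "
  local -a args=("$@")
  local i next
  for ((i = 0; i < ${#args[@]}; i++)); do
    if [[ "$value_opts" == *" ${args[i]} "* ]]; then
      next="${args[i + 1]:-}"
      # A missing value shows up as the end of the list or as another option.
      if [[ -z "$next" || "$next" == -* ]]; then
        # printf, not echo: echo would treat "-n" as its own flag.
        printf '%s\n' "${args[i]}"
      fi
    fi
  done
}
```

Fed the arguments from the failing "Executing container command" line (`... -s -n -hn node1.c.upload-161114.internal -v 1.0`), it reports both -s and -n, whereas commons-cli stops at the first one (-s). The empty -n matches the ApplicationMaster line "Received request for model null:1.0x1". The June command (`-s rest.sh -n dga`) reports nothing.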

Re: Can't launch/deploy MaaS in YARN

@asubramanian,

I re-ran everything and still get the same missing-argument error. The YARN log is below.

Commands executed:

1. $METRON_HOME/bin/maas_service.sh -zq node1:2181

2. $METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -lmp /root/mock_dga/ -hmp /user/root/models -mo ADD -m 512 -n dga -v 1.0 -ni 1
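
One difference between the two runs in this thread: the June log localizes rest.sh and passes `-s rest.sh`, while the November log localizes rest.py and passes an empty `-s`. Below is a hedged pre-flight sketch. It assumes, without confirmation from this thread, that MaaS picks the Runner's start script by its .sh extension. It checks that the local directory given to -lmp contains a .sh file. It does not explain the null model name (`-n`) in the same log. `check_model_dir` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# check_model_dir (hypothetical helper): pre-flight check on the local model
# directory passed to maas_deploy.sh -lmp.
# ASSUMPTION: MaaS takes the Runner's -s script from the *.sh file in it.
check_model_dir() {
  local dir="$1" f
  local -a scripts=()
  if [[ ! -d "$dir" ]]; then
    printf 'not a directory: %s\n' "$dir" >&2
    return 2
  fi
  for f in "$dir"/*.sh; do
    # With no match the glob stays literal, so keep only real files.
    if [[ -f "$f" ]]; then
      scripts+=("${f##*/}")
    fi
  done
  if (( ${#scripts[@]} == 0 )); then
    printf 'no .sh start script in %s\n' "$dir" >&2
    return 1
  fi
  # Print the candidate script name(s), one per line.
  printf '%s\n' "${scripts[@]}"
}
```

Run `check_model_dir /root/mock_dga` before maas_deploy.sh. For a directory holding only dga.py and rest.py, as in the November log, it returns 1 with "no .sh start script".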

17/11/23 05:12:31 INFO impl.TimelineClientImpl: Timeline service address: http://node1.c.upload-161114.internal:8188/ws/v1/timeline/
17/11/23 05:12:31 INFO client.RMProxy: Connecting to ResourceManager at node1.c.upload-161114.internal/10.128.0.2:8050
17/11/23 05:12:31 INFO client.AHSProxy: Connecting to Application History server at node1.c.upload-161114.internal/10.128.0.2:10200
17/11/23 05:12:33 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
17/11/23 05:12:33 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_e04_1511328189618_0011_01_000002 on node1.c.upload-161114.internal_45454_1511347091091
===========================================================================================================
LogType:directory.info
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:2093
Log Contents:
ls -l:
total 44
lrwxrwxrwx. 1 yarn hadoop   100 Nov 22 10:31 AppMaster.jar -> /hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/10/AppMaster.jar
-rw-r--r--. 1 yarn hadoop    74 Nov 22 10:31 container_tokens
-rwx------. 1 yarn hadoop   653 Nov 22 10:31 default_container_executor_session.sh
-rwx------. 1 yarn hadoop   707 Nov 22 10:31 default_container_executor.sh
lrwxrwxrwx. 1 yarn hadoop    93 Nov 22 10:31 dga.py -> /hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/11/dga.py
-rwx------. 1 yarn hadoop 32429 Nov 22 10:31 launch_container.sh
lrwxrwxrwx. 1 yarn hadoop    94 Nov 22 10:31 rest.py -> /hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/12/rest.py
drwx--x---. 2 yarn hadoop     6 Nov 22 10:31 tmp
find -L . -maxdepth 5 -ls:
478152002    4 drwx--x---   3 yarn     hadoop       4096 Nov 22 10:31 .
494954822    0 drwx--x---   2 yarn     hadoop          6 Nov 22 10:31 ./tmp
478152014    4 -rw-r--r--   1 yarn     hadoop         74 Nov 22 10:31 ./container_tokens
478152015    4 -rw-r--r--   1 yarn     hadoop         12 Nov 22 10:31 ./.container_tokens.crc
478152017   32 -rwx------   1 yarn     hadoop      32429 Nov 22 10:31 ./launch_container.sh
478152018    4 -rw-r--r--   1 yarn     hadoop        264 Nov 22 10:31 ./.launch_container.sh.crc
478152019    4 -rwx------   1 yarn     hadoop        653 Nov 22 10:31 ./default_container_executor_session.sh
478152020    4 -rw-r--r--   1 yarn     hadoop         16 Nov 22 10:31 ./.default_container_executor_session.sh.crc
478152021    4 -rwx------   1 yarn     hadoop        707 Nov 22 10:31 ./default_container_executor.sh
478152022    4 -rw-r--r--   1 yarn     hadoop         16 Nov 22 10:31 ./.default_container_executor.sh.crc
1233149577 19300 -r-x------   1 yarn     hadoop   19761989 Nov 22 10:30 ./AppMaster.jar
461402463    4 -r-x------   1 yarn     hadoop         26 Nov 22 10:31 ./rest.py
444611662    4 -r-x------   1 yarn     hadoop        744 Nov 22 10:31 ./dga.py
broken symlinks(find -L . -maxdepth 5 -type l -ls):

End of LogType:directory.info

LogType:launch_container.sh
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:32429
Log Contents:
#!/bin/bash

export LOCAL_DIRS="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/current/hadoop-client/conf"}
export NM_HTTP_PORT="8042"
export JAVA_HOME=${JAVA_HOME:-"/usr/jdk64/jdk1.8.0_77"}
export LOG_DIRS="/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
"
export NM_PORT="45454"
export USER="root"
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/current/hadoop-yarn-nodemanager"}
export CLASSPATH="$CLASSPATH:./*:$CLASSPATH:....(too many parameters so removed from logs)
export NM_HOST="node1.c.upload-161114.internal"
export HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/container_tokens"
export NM_AUX_SERVICE_spark_shuffle=""
export LOCAL_USER_DIRS="/hadoop/yarn/local/usercache/root/"
export LOGNAME="root"
export JVM_PID="$$"
export PWD="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002"
export HOME="/home/"
export NM_AUX_SERVICE_spark2_shuffle=""
export CONTAINER_ID="container_e04_1511328189618_0011_01_000002"
export MALLOC_ARENA_MAX="4"
ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/10/AppMaster.jar" "AppMaster.jar"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/12/rest.py" "rest.py"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/11/dga.py" "dga.py"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
# Creating copy of launch script
cp "launch_container.sh" "/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/launch_container.sh"
chmod 640 "/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info"
ls -l 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info"
find -L . -maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info"
exec /bin/bash -c "$JAVA_HOME/bin/java org.apache.metron.maas.service.runner.Runner -ci 4398046511106 -zq node1:2181 -zr /metron/maas/config -s -n -hn node1.c.upload-161114.internal -v 1.0 1>/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/stdout 2>/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/stderr"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi

End of LogType:launch_container.sh

LogType:stderr
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:1110
Log Contents:
Unable to parse args: -ci 4398046511106 -zq node1:2181 -zr /metron/maas/config -s -n -hn node1.c.upload-161114.internal -v 1.0
org.apache.commons.cli.MissingArgumentException: Missing argument for option: s
    at org.apache.commons.cli.Parser.processArgs(Parser.java:343)
    at org.apache.commons.cli.Parser.processOption(Parser.java:393)
    at org.apache.commons.cli.Parser.parse(Parser.java:199)
    at org.apache.commons.cli.Parser.parse(Parser.java:85)
    at org.apache.metron.maas.service.runner.Runner$RunnerOptions.parse(Runner.java:139)
    at org.apache.metron.maas.service.runner.Runner.main(Runner.java:170)
Exception in thread "main" org.apache.commons.cli.MissingArgumentException: Missing argument for option: s
    at org.apache.commons.cli.Parser.processArgs(Parser.java:343)
    at org.apache.commons.cli.Parser.processOption(Parser.java:393)
    at org.apache.commons.cli.Parser.parse(Parser.java:199)
    at org.apache.commons.cli.Parser.parse(Parser.java:85)
    at org.apache.metron.maas.service.runner.Runner$RunnerOptions.parse(Runner.java:139)
    at org.apache.metron.maas.service.runner.Runner.main(Runner.java:170)

End of LogType:stderr

LogType:stdout
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:347
Log Contents:
usage: MaaSRunner
 -ci,--container_id <arg>   Container ID
 -h,--help                  This screen
 -hn,--hostname <arg>       Hostname for container
 -n,--name <arg>            Name
 -s,--script <arg>          Script Path
 -v,--version <arg>         Version
 -zq,--zk_quorum <arg>      Zookeeper Quorum
 -zr,--zk_root <arg>        Zookeeper Root

End of LogType:stdout

Container: container_e04_1511328189618_0011_01_000001 on node1.c.upload-161114.internal_45454_1511347091091
===========================================================================================================
LogType:AppMaster.stderr
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:39099
Log Contents:
17/11/22 10:30:15 INFO service.ApplicationMaster: Initializing ApplicationMaster
17/11/22 10:30:16 INFO service.ApplicationMaster: Application master for app, appId=11, clustertimestamp=1511328189618, attemptId=1
17/11/22 10:30:16 INFO service.ApplicationMaster: Starting ApplicationMaster
17/11/22 10:30:16 INFO yarn.YarnUtils: Executing with tokens:
17/11/22 10:30:16 INFO yarn.YarnUtils: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 11 cluster_timestamp: 1511328189618 } attemptId: 1 } keyId: -121482599)
17/11/22 10:30:16 INFO impl.TimelineClientImpl: Timeline service address: http://node1.c.upload-161114.internal:8188/ws/v1/timeline/
17/11/22 10:30:17 INFO client.RMProxy: Connecting to ResourceManager at node1.c.upload-161114.internal/10.128.0.2:8030
17/11/22 10:30:17 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
17/11/22 10:30:17 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
17/11/22 10:30:18 INFO service.ApplicationMaster: Max mem capabililty of resources in this cluster 27648
17/11/22 10:30:18 INFO service.ApplicationMaster: Max vcores capabililty of resources in this cluster 6
17/11/22 10:30:18 INFO imps.CuratorFrameworkImpl: Starting
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:host.name=node1.c.upload-161114.internal
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_77
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk64/jdk1.8.0_77/jre
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.class.path=$CLASSPATH: (too many parameters so removed from logs)
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.library.path=::/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native::/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-693.5.2.el7.x86_64
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:user.name=yarn
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/yarn
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:user.dir=/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001
17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@e27ba81
17/11/22 10:30:18 INFO zookeeper.ClientCnxn: Opening socket connection to server node1.c.upload-161114.internal/10.128.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
17/11/22 10:30:18 INFO zookeeper.ClientCnxn: Socket connection established to node1.c.upload-161114.internal/10.128.0.2:2181, initiating session
17/11/22 10:30:18 INFO zookeeper.ClientCnxn: Session establishment complete on server node1.c.upload-161114.internal/10.128.0.2:2181, sessionid = 0x15fe233644e0114, negotiated timeout = 40000
17/11/22 10:30:18 INFO state.ConnectionStateManager: State change: CONNECTED
17/11/22 10:30:19 INFO service.ApplicationMaster: Ready to accept requests...
17/11/22 10:31:17 INFO service.ApplicationMaster: [ADD]: Received request for model null:1.0x1 containers of size 512M at path /user/root/models
17/11/22 10:31:19 INFO impl.AMRMClientImpl: Received new token for : node1.c.upload-161114.internal:45454
17/11/22 10:31:19 INFO callback.ContainerRequestListener: Got response from RM for container ask, allocatedCnt=1
17/11/22 10:31:19 INFO service.ApplicationMaster: Found container id of 4398046511106
17/11/22 10:31:19 INFO callback.ContainerRequestListener: Launching shell command on a new container., containerId=container_e04_1511328189618_0011_01_000002, containerNode=node1.c.upload-161114.internal:45454, containerNodeURI=node1.c.upload-161114.internal:8042, containerResourceMemory=9216, containerResourceVirtualCores=1
17/11/22 10:31:19 INFO callback.LaunchContainer: Setting up container launch container for containerid=container_e04_1511328189618_0011_01_000002
17/11/22 10:31:19 INFO callback.LaunchContainer: Local Directory Contents
17/11/22 10:31:19 INFO callback.LaunchContainer:   6 - tmp
17/11/22 10:31:19 INFO callback.LaunchContainer:   74 - container_tokens
17/11/22 10:31:19 INFO callback.LaunchContainer:   12 - .container_tokens.crc
17/11/22 10:31:19 INFO callback.LaunchContainer:   3646 - launch_container.sh
17/11/22 10:31:19 INFO callback.LaunchContainer:   40 - .launch_container.sh.crc
17/11/22 10:31:19 INFO callback.LaunchContainer:   653 - default_container_executor_session.sh
17/11/22 10:31:19 INFO callback.LaunchContainer:   16 - .default_container_executor_session.sh.crc
17/11/22 10:31:19 INFO callback.LaunchContainer:   707 - default_container_executor.sh
17/11/22 10:31:19 INFO callback.LaunchContainer:   16 - .default_container_executor.sh.crc
17/11/22 10:31:19 INFO callback.LaunchContainer:   19761989 - AppMaster.jar
17/11/22 10:31:19 INFO callback.LaunchContainer: Localizing /user/root/models
17/11/22 10:31:19 INFO callback.LaunchContainer: Model payload: /user/root/models
17/11/22 10:31:19 INFO callback.LaunchContainer: AppJAR Location: hdfs://node1.c.upload-161114.internal:8020/user/root/MaaS/application_1511328189618_0011/AppMaster.jar
17/11/22 10:31:19 INFO callback.LaunchContainer: Localized dga.py -> LocatedFileStatus{path=hdfs://node1.c.upload-161114.internal:8020/user/root/models/dga.py; isDirectory=false; length=744; replication=3; blocksize=134217728; modification_time=1511346677472; access_time=1511346677462; owner=root; group=hdfs; permission=rw-r--r--; isSymlink=false}
17/11/22 10:31:19 INFO callback.LaunchContainer: Localized rest.py -> LocatedFileStatus{path=hdfs://node1.c.upload-161114.internal:8020/user/root/models/rest.py; isDirectory=false; length=26; replication=3; blocksize=134217728; modification_time=1511346677456; access_time=1511346677299; owner=root; group=hdfs; permission=rw-r--r--; isSymlink=false}
17/11/22 10:31:19 INFO callback.LaunchContainer: AppMaster.jar localized: scheme: "hdfs" host: "node1.c.upload-161114.internal" port: 8020 file: "/user/root/MaaS/application_1511328189618_0011/AppMaster.jar"
17/11/22 10:31:19 INFO callback.LaunchContainer: dga.py localized: scheme: "hdfs" host: "node1.c.upload-161114.internal" port: 8020 file: "/user/root/models/dga.py"
17/11/22 10:31:19 INFO callback.LaunchContainer: rest.py localized: scheme: "hdfs" host: "node1.c.upload-161114.internal" port: 8020 file: "/user/root/models/rest.py"
17/11/22 10:31:19 INFO callback.LaunchContainer: Executing container command: {{JAVA_HOME}}/bin/java org.apache.metron.maas.service.runner.Runner -ci 4398046511106 -zq node1:2181 -zr /metron/maas/config -s -n -hn node1.c.upload-161114.internal -v 1.0 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr
17/11/22 10:31:19 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e04_1511328189618_0011_01_000002
17/11/22 10:31:19 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node1.c.upload-161114.internal:45454
17/11/22 10:31:19 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_e04_1511328189618_0011_01_000002
17/11/22 10:31:19 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node1.c.upload-161114.internal:45454
17/11/22 10:31:20 INFO callback.ContainerRequestListener: Got response from RM for container ask, completedCnt=1
17/11/22 10:31:20 INFO callback.ContainerRequestListener: Got container status for containerID=container_e04_1511328189618_0011_01_000002, state=COMPLETE, exitStatus=1, diagnostics=Exception from container-launch.
Container id: container_e04_1511328189618_0011_01_000002
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:933)
    at org.apache.hadoop.util.Shell.run(Shell.java:844)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)


Container exited with a non-zero exit code 1

17/11/22 10:31:20 INFO callback.ContainerRequestListener: REMOVING CONTAINER container_e04_1511328189618_0011_01_000002
17/11/22 10:31:20 WARN discovery.ServiceDiscoverer: Unable to find registered model associated with container container_e04_1511328189618_0011_01_000002
17/11/22 10:31:20 ERROR discovery.ServiceDiscoverer: Unable to unregister container container_e04_1511328189618_0011_01_000002 due to: Unable.
java.lang.IllegalStateException: Unable.
    at org.apache.metron.maas.discovery.ServiceDiscoverer.unregisterByContainer(ServiceDiscoverer.java:204)
    at org.apache.metron.maas.service.callback.ContainerRequestListener.onContainersCompleted(ContainerRequestListener.java:121)
    at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)

End of LogType:AppMaster.stderr

LogType:AppMaster.stdout
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:0
Log Contents:

End of LogType:AppMaster.stdout

LogType:directory.info
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:1638
Log Contents:
ls -l:
total 16
lrwxrwxrwx. 1 yarn hadoop  100 Nov 22 10:30 AppMaster.jar -> /hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/10/AppMaster.jar
-rw-r--r--. 1 yarn hadoop   74 Nov 22 10:30 container_tokens
-rwx------. 1 yarn hadoop  653 Nov 22 10:30 default_container_executor_session.sh
-rwx------. 1 yarn hadoop  707 Nov 22 10:30 default_container_executor.sh
-rwx------. 1 yarn hadoop 3646 Nov 22 10:30 launch_container.sh
drwx--x---. 2 yarn hadoop    6 Nov 22 10:30 tmp
find -L . -maxdepth 5 -ls:
1258621687    4 drwx--x---   3 yarn     hadoop       4096 Nov 22 10:30 .
1275089340    0 drwx--x---   2 yarn     hadoop          6 Nov 22 10:30 ./tmp
1258621688    4 -rw-r--r--   1 yarn     hadoop         74 Nov 22 10:30 ./container_tokens
1258621689    4 -rw-r--r--   1 yarn     hadoop         12 Nov 22 10:30 ./.container_tokens.crc
1258621690    4 -rwx------   1 yarn     hadoop       3646 Nov 22 10:30 ./launch_container.sh
1258621691    4 -rw-r--r--   1 yarn     hadoop         40 Nov 22 10:30 ./.launch_container.sh.crc
1258621692    4 -rwx------   1 yarn     hadoop        653 Nov 22 10:30 ./default_container_executor_session.sh
1258621693    4 -rw-r--r--   1 yarn     hadoop         16 Nov 22 10:30 ./.default_container_executor_session.sh.crc
1258621984    4 -rwx------   1 yarn     hadoop        707 Nov 22 10:30 ./default_container_executor.sh
1258621995    4 -rw-r--r--   1 yarn     hadoop         16 Nov 22 10:30 ./.default_container_executor.sh.crc
1233149577 19300 -r-x------   1 yarn     hadoop   19761989 Nov 22 10:30 ./AppMaster.jar
broken symlinks(find -L . -maxdepth 5 -type l -ls):

End of LogType:directory.info

LogType:launch_container.sh
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:3646
Log Contents:
#!/bin/bash

export LOCAL_DIRS="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1511328189618_0011"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/current/hadoop-client/conf"}
export MAX_APP_ATTEMPTS="2"
export NM_HTTP_PORT="8042"
export JAVA_HOME=${JAVA_HOME:-"/usr/jdk64/jdk1.8.0_77"}
export LOG_DIRS="/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA=
"
export NM_PORT="45454"
export USER="root"
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/current/hadoop-yarn-nodemanager"}
export CLASSPATH="$CLASSPATH:./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:./log4j.properties"
export APP_SUBMIT_TIME_ENV="1511346614692"
export NM_HOST="node1.c.upload-161114.internal"
export HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/container_tokens"
export NM_AUX_SERVICE_spark_shuffle=""
export LOCAL_USER_DIRS="/hadoop/yarn/local/usercache/root/"
export LOGNAME="root"
export JVM_PID="$$"
export PWD="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001"
export HOME="/home/"
export NM_AUX_SERVICE_spark2_shuffle=""
export CONTAINER_ID="container_e04_1511328189618_0011_01_000001"
export MALLOC_ARENA_MAX="4"
ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/10/AppMaster.jar" "AppMaster.jar"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
# Creating copy of launch script
cp "launch_container.sh" "/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/launch_container.sh"
chmod 640 "/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
ls -l 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx10m org.apache.metron.maas.service.ApplicationMaster -zq node1:2181 -zr /metron/maas/config -aj hdfs://node1.c.upload-161114.internal:8020/user/root/MaaS/application_1511328189618_0011/AppMaster.jar 1>/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/AppMaster.stdout 2>/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/AppMaster.stderr "
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi

End of LogType:launch_container.sh

Re: Cant launch deploy MaaS in Yarn

Super Collaborator

@Girish N that's strange. Is this a full dev Vagrant deployment? Which version are you running? Also, if possible, I would advise trying this again from scratch, to rule out any misconfiguration and to confirm whether the problem is repeatable.

In the meantime, I will spin up a full dev environment with the latest release bits to validate.

Re: Cant launch deploy MaaS in Yarn

Super Collaborator

@Girish N, I was able to fire up a full dev environment and follow the steps at:

https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#example

I was able to get the mock DGA model working, and I created an HCC article with the steps I followed on the full dev platform. Please see if it helps:

https://community.hortonworks.com/articles/149376/metron-model-as-a-service-maas-full-dev-platform.h...

Re: Cant launch deploy MaaS in Yarn

Thanks @asubramanian,

Will give it a try.

Re: Cant launch deploy MaaS in Yarn

@asubramanian

I re-ran the steps mentioned in the article, and I no longer see the errors/exceptions I saw earlier.

Now, after starting the service and deploying, when I list the deployed models I don't get the output shown in the article.

In the ResourceManager UI I can see the MaaS service running, but the allocated memory is higher than what I specified in the maas_deploy command.
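One likely contributor to the higher figure, assuming HDP's default CapacityScheduler: YARN normalizes every container request up to a multiple of `yarn.scheduler.minimum-allocation-mb`, and the total shown in the RM UI also includes the ApplicationMaster's own container. The sketch below only does that arithmetic. `normalize_mb` and `total_mb` are hypothetical helpers, the `yarn.scheduler.maximum-allocation-mb` cap is ignored, and the AM container size is an input you would read from your own setup, since this thread does not show it.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): estimate what the CapacityScheduler allocates
# for a MaaS deployment. Assumes each request is rounded up to a multiple of
# yarn.scheduler.minimum-allocation-mb; the maximum-allocation cap is ignored.

# normalize_mb REQUEST_MB MIN_ALLOC_MB
# Raise requests below the minimum to the minimum, then round up to the
# next multiple of MIN_ALLOC_MB.
normalize_mb() {
  local req=$1 min=$2
  if (( req < min )); then req=$min; fi
  echo $(( (req + min - 1) / min * min ))
}

# total_mb MODEL_MB NUM_INSTANCES AM_MB MIN_ALLOC_MB
# One normalized container per model instance (-m / -ni in maas_deploy.sh),
# plus the ApplicationMaster's own normalized container.
total_mb() {
  local model=$1 n=$2 am=$3 min=$4
  local per_model am_norm
  per_model=$(normalize_mb "$model" "$min")
  am_norm=$(normalize_mb "$am" "$min")
  echo $(( per_model * n + am_norm ))
}
```

For example, with `-m 512 -ni 1` as in the original deploy command and a 1024 MB minimum allocation, `total_mb 512 1 512 1024` prints 2048: the model container is rounded up to 1024 MB, and a (hypothetical) 512 MB AM container is rounded up to 1024 MB as well.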

[Screenshot of the ResourceManager UI: 42831-screenshot-from-2017-12-01-15-25-44.png]

MaaS list command: $METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -mo LIST

[metron@node1 bin]$ ./maas_deploy.sh -zq node1:2181 -mo LIST
17/12/01 10:06:44 INFO imps.CuratorFrameworkImpl: Starting
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-37--1, built on 11/29/2016 17:59 GMT
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:host.name=node1.c.upload-161114.internal
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_77
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk64/jdk1.8.0_77/jre
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/hdp/2.5.3.0-37/hadoop/conf:...
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-693.5.2.el7.x86_64
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:user.name=metron
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/metron
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/metron/0.4.2/bin
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@72fa021
17/12/01 10:06:44 INFO zookeeper.ClientCnxn: Opening socket connection to server node1.c.upload-161114.internal/10.128.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
17/12/01 10:06:44 INFO zookeeper.ClientCnxn: Socket connection established to node1.c.upload-161114.internal/10.128.0.2:2181, initiating session
17/12/01 10:06:44 INFO zookeeper.ClientCnxn: Session establishment complete on server node1.c.upload-161114.internal/10.128.0.2:2181, sessionid = 0x16010f3a7a10388, negotiated timeout = 40000
17/12/01 10:06:44 INFO state.ConnectionStateManager: State change: CONNECTED
17/12/01 10:06:45 INFO zookeeper.ZooKeeper: Session: 0x16010f3a7a10388 closed
17/12/01 10:06:45 INFO zookeeper.ClientCnxn: EventThread shut down
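The LIST output above consists entirely of Curator/ZooKeeper client log lines; no model entries are printed. A minimal sketch for checking that from a script, assuming every log line starts with the `yy/MM/dd HH:mm:ss LEVEL` prefix seen above. `strip_log_lines` and `list_models` are hypothetical helper names, not part of Metron, and `list_models` assumes `$METRON_HOME` is set.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): separate maas_deploy.sh's own output from the
# log4j/ZooKeeper client chatter, assuming the "yy/MM/dd HH:mm:ss LEVEL" prefix.

# strip_log_lines: read stdin, drop timestamped log lines, print the rest.
strip_log_lines() {
  # grep exits 1 when no lines survive; "|| true" keeps the status at 0 so
  # callers can simply test for empty output.
  grep -Ev '^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) ' || true
}

# list_models ZK_QUORUM: run the LIST operation and report when it printed
# nothing besides client log output.
list_models() {
  local out
  out=$("$METRON_HOME/bin/maas_deploy.sh" -zq "$1" -mo LIST 2>&1 | strip_log_lines)
  if [[ -z "$out" ]]; then
    echo "LIST returned no models (only client log output)" >&2
    return 1
  fi
  printf '%s\n' "$out"
}
```

Usage: `list_models node1:2181`. Capturing both stdout and stderr (`2>&1`) avoids depending on which stream the logger writes to.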

I tried granting all permissions on the HDFS folder "/user/metron".
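Before changing permissions further, it may help to compare the owner and mode of the directories involved. Note that the LIST command above ran as `metron` (`user.name=metron`), while the launch_container.sh log shows the MaaS ApplicationMaster running as `USER="root"` with its jar under `/user/root/MaaS`. A minimal sketch using `hdfs dfs -ls -d`; `describe_ls_line` and `check_hdfs_dirs` are hypothetical helpers, and the awk field positions assume the standard `hdfs dfs -ls` column layout (and paths without spaces).

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helpers): show owner, group and mode of HDFS directories.

# describe_ls_line: turn one `hdfs dfs -ls -d` output line into
# "PATH owner=OWNER group=GROUP mode=MODE".
# Assumed columns: mode, replication, owner, group, size, date, time, path.
describe_ls_line() {
  awk '{ printf "%s owner=%s group=%s mode=%s\n", $NF, $3, $4, $1 }'
}

# check_hdfs_dirs DIR...: describe each directory, or report it as missing.
check_hdfs_dirs() {
  local dir line
  for dir in "$@"; do
    # -d lists the directory entry itself rather than its contents.
    if line=$(hdfs dfs -ls -d "$dir" 2>/dev/null); then
      printf '%s\n' "$line" | describe_ls_line
    else
      echo "$dir: not found or not readable" >&2
    fi
  done
}
```

Usage: `check_hdfs_dirs /user/metron /user/root/MaaS`, run as the same user that runs `maas_deploy.sh`.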