Created 06-05-2017 12:31 PM
I can't figure out why I can't deploy the MaaS DGA example in my Metron cluster. Here is the log from YARN when I run the command "$METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -lmp /dga -hmp /user/root/models -mo ADD -m 512 -n dga -v 1.0 -ni 1"
17/06/01 16:22:59 INFO service.ApplicationMaster: [ADD]: Received request for model dga:1.0x1 containers of size 512M at path /user/root/models 17/06/01 16:23:01 INFO impl.AMRMClientImpl: Received new token for : node4.metron:45454 17/06/01 16:23:01 INFO callback.ContainerRequestListener: Got response from RM for container ask, allocatedCnt=2 17/06/01 16:23:01 INFO service.ApplicationMaster: Found container id of 3298534883331 17/06/01 16:23:01 INFO callback.ContainerRequestListener: Launching shell command on a new container., containerId=container_e03_1496291138813_0002_01_000003, containerNode=node4.metron:45454, containerNodeURI=node4.metron:8042, containerResourceMemory=512, containerResourceVirtualCores=1 17/06/01 16:23:01 INFO callback.ContainerRequestListener: Launching shell command on a new container., containerId=container_e03_1496291138813_0002_01_000004, containerNode=node1.metron:45454, containerNodeURI=node1.metron:8042, containerResourceMemory=512, containerResourceVirtualCores=1 17/06/01 16:23:01 INFO callback.LaunchContainer: Setting up container launch container for containerid=container_e03_1496291138813_0002_01_000003 17/06/01 16:23:01 INFO callback.LaunchContainer: Local Directory Contents 17/06/01 16:23:01 INFO callback.LaunchContainer: 6 - tmp 17/06/01 16:23:01 INFO callback.LaunchContainer: 74 - container_tokens 17/06/01 16:23:01 INFO callback.LaunchContainer: 12 - .container_tokens.crc 17/06/01 16:23:01 INFO callback.LaunchContainer: 3612 - launch_container.sh 17/06/01 16:23:01 INFO callback.LaunchContainer: 40 - .launch_container.sh.crc 17/06/01 16:23:01 INFO callback.LaunchContainer: 653 - default_container_executor_session.sh 17/06/01 16:23:01 INFO callback.LaunchContainer: 16 - .default_container_executor_session.sh.crc 17/06/01 16:23:01 INFO callback.LaunchContainer: 707 - default_container_executor.sh 17/06/01 16:23:01 INFO callback.LaunchContainer: 16 - .default_container_executor.sh.crc 17/06/01 16:23:01 INFO 
callback.LaunchContainer: 10091315 - AppMaster.jar 17/06/01 16:23:01 INFO callback.LaunchContainer: Localizing /user/root/models 17/06/01 16:23:01 INFO callback.LaunchContainer: Model payload: /user/root/models 17/06/01 16:23:01 INFO callback.LaunchContainer: AppJAR Location: hdfs://node1.metron:8020/user/root/MaaS/application_1496291138813_0002/AppMaster.jar 17/06/01 16:23:01 INFO callback.LaunchContainer: Localized dga.py -> LocatedFileStatus{path=hdfs://node1.metron:8020/user/root/models/dga.py; isDirectory=false; length=821; replication=3; blocksize=134217728; modification_time=1496298179635; access_time=1496298179573; owner=root; group=root; permission=rw-r--r--; isSymlink=false} 17/06/01 16:23:01 INFO callback.LaunchContainer: Localized rest.sh -> LocatedFileStatus{path=hdfs://node1.metron:8020/user/root/models/rest.sh; isDirectory=false; length=25; replication=3; blocksize=134217728; modification_time=1496298179570; access_time=1496298179530; owner=root; group=root; permission=rw-r--r--; isSymlink=false} 17/06/01 16:23:01 INFO callback.LaunchContainer: dga.py localized: scheme: "hdfs" host: "node1.metron" port: 8020 file: "/user/root/models/dga.py" 17/06/01 16:23:01 INFO callback.LaunchContainer: rest.sh localized: scheme: "hdfs" host: "node1.metron" port: 8020 file: "/user/root/models/rest.sh" 17/06/01 16:23:01 INFO callback.LaunchContainer: Executing container command: {{JAVA_HOME}}/bin/java org.apache.metron.maas.service.runner.Runner -ci 3298534883331 -zq node1:2181 -zr /metron/maas/config -s rest.sh -n dga -hn node4.metron -v 1.0 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr 17/06/01 16:23:01 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e03_1496291138813_0002_01_000003 17/06/01 16:23:01 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node4.metron:45454 17/06/01 16:23:01 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_e03_1496291138813_0002_01_000003 
17/06/01 16:23:01 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node4.metron:45454 17/06/01 16:24:00 ERROR service.ApplicationMaster: Received a null request... 17/06/01 16:28:38 ERROR service.ApplicationMaster: Received a null request... 17/06/01 16:28:51 ERROR service.ApplicationMaster: Received a null request... 17/06/01 16:32:09 ERROR service.ApplicationMaster: Received a null request... 17/06/01 16:33:04 INFO callback.ContainerRequestListener: Got response from RM for container ask, completedCnt=1 17/06/01 16:33:04 INFO callback.ContainerRequestListener: Got container status for containerID=container_e03_1496291138813_0002_01_000003, state=COMPLETE, exitStatus=1, diagnostics=Exception from container-launch. Container id: container_e03_1496291138813_0002_01_000003 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:933) at org.apache.hadoop.util.Shell.run(Shell.java:844) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:237) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 17/06/01 16:33:04 INFO callback.ContainerRequestListener: REMOVING CONTAINER container_e03_1496291138813_0002_01_000003 17/06/01 16:33:04 WARN discovery.ServiceDiscoverer: Unable to find registered model associated with container container_e03_1496291138813_0002_01_000003 17/06/01 16:33:04 
ERROR discovery.ServiceDiscoverer: Unable to unregister container container_e03_1496291138813_0002_01_000003 due to: Unable. java.lang.IllegalStateException: Unable. at org.apache.metron.maas.discovery.ServiceDiscoverer.unregisterByContainer(ServiceDiscoverer.java:209) at org.apache.metron.maas.service.callback.ContainerRequestListener.onContainersCompleted(ContainerRequestListener.java:121) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300) 17/06/01 16:35:40 INFO callback.ContainerRequestListener: Got response from RM for container ask, completedCnt=1 17/06/01 16:35:40 INFO callback.ContainerRequestListener: Got container status for containerID=container_e03_1496291138813_0002_01_000004, state=COMPLETE, exitStatus=-100, diagnostics=Container expired since it was unused 17/06/01 16:35:40 INFO callback.ContainerRequestListener: REMOVING CONTAINER container_e03_1496291138813_0002_01_000004 17/06/01 16:35:40 WARN discovery.ServiceDiscoverer: Unable to find registered model associated with container container_e03_1496291138813_0002_01_000004 17/06/01 16:35:40 ERROR discovery.ServiceDiscoverer: Unable to unregister container container_e03_1496291138813_0002_01_000004 due to: Unable. java.lang.IllegalStateException: Unable. at org.apache.metron.maas.discovery.ServiceDiscoverer.unregisterByContainer(ServiceDiscoverer.java:209) at org.apache.metron.maas.service.callback.ContainerRequestListener.onContainersCompleted(ContainerRequestListener.java:121) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300)
The "ERROR service.ApplicationMaster: Received a null request..." entries occurred when I ran the command "$METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -mo LIST"
Has anyone else experienced this?
Created 07-17-2017 11:27 PM
I am having the same issue. Were you able to find a resolution?
Created 08-22-2017 09:23 AM
Created 11-22-2017 09:35 AM
@asubramanian, I am facing a similar issue. Giving full permissions on /user/root/* and deploying again does not solve it.
In the application container logs I see the exception below, and I suspect it is the reason the container launch fails. Could you please let me know how to get past it? Thanks in advance.
I am following the example at http://metron.apache.org/current-book/metron-analytics/metron-maas-service/index.html
Unable to parse args: -ci 4398046511106 -zq node1:2181 -zr /metron/maas/config -s -n dga -hn node1.c.upload-161114.internal -v 1.0
org.apache.commons.cli.MissingArgumentException: Missing argument for option: s
    at org.apache.commons.cli.Parser.processArgs(Parser.java:343)
    at org.apache.commons.cli.Parser.processOption(Parser.java:393)
    at org.apache.commons.cli.Parser.parse(Parser.java:199)
    at org.apache.commons.cli.Parser.parse(Parser.java:85)
    at org.apache.metron.maas.service.runner.Runner$RunnerOptions.parse(Runner.java:139)
    at org.apache.metron.maas.service.runner.Runner.main(Runner.java:170)
Exception in thread "main" org.apache.commons.cli.MissingArgumentException: Missing argument for option: s
    at org.apache.commons.cli.Parser.processArgs(Parser.java:343)
    at org.apache.commons.cli.Parser.processOption(Parser.java:393)
    at org.apache.commons.cli.Parser.parse(Parser.java:199)
    at org.apache.commons.cli.Parser.parse(Parser.java:85)
    at org.apache.metron.maas.service.runner.Runner$RunnerOptions.parse(Runner.java:139)
    at org.apache.metron.maas.service.runner.Runner.main(Runner.java:170)
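For anyone who wants to pull just this exception out of the aggregated YARN logs, here is a minimal sketch. It assumes the output of `yarn logs -applicationId <application id>` marks each section with `LogType:<name>` and `End of LogType:<name>` lines, as in the logs pasted in this thread; the function name `extract_logtype` is hypothetical, and other Hadoop versions may format the markers differently.

```shell
#!/usr/bin/env bash
# Print the body of one LogType section (for example stderr) from
# aggregated YARN log output read on stdin.
# Assumption: sections are delimited by "LogType:<name>" and
# "End of LogType:<name>" lines, as seen in this thread.
extract_logtype() {
  local name="$1"
  awk -v start="LogType:${name}" -v stop="End of LogType:${name}" '
    $0 == stop  { printing = 0 }   # stop before the end marker is printed
    printing    { print }          # emit lines inside the section
    $0 == start { printing = 1 }   # start after the begin marker
  '
}

# Example (replace <application id> with your own):
# yarn logs -applicationId <application id> | extract_logtype stderr
```

If several containers each wrote a stderr section, all of them are printed one after another, so the container header lines are worth checking in the full output as well.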
Created 11-22-2017 02:13 PM
@Girish N, I see from the output you pasted that the argument for '-s' is missing. Can you check on that again? Also, please paste the whole output along with the command that you ran.
Created 11-23-2017 05:28 AM
I re-ran everything and still get the same missing argument error. The log from YARN is below.
Commands executed:
1. $METRON_HOME/bin/maas_service.sh -zq node1:2181
2. $METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -lmp /root/mock_dga/ -hmp /user/root/models -mo ADD -m 512 -n dga -v 1.0 -ni 1
17/11/23 05:12:31 INFO impl.TimelineClientImpl: Timeline service address: http://node1.c.upload-161114.internal:8188/ws/v1/timeline/ 17/11/23 05:12:31 INFO client.RMProxy: Connecting to ResourceManager at node1.c.upload-161114.internal/10.128.0.2:8050 17/11/23 05:12:31 INFO client.AHSProxy: Connecting to Application History server at node1.c.upload-161114.internal/10.128.0.2:10200 17/11/23 05:12:33 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library 17/11/23 05:12:33 INFO compress.CodecPool: Got brand-new decompressor [.deflate] Container: container_e04_1511328189618_0011_01_000002 on node1.c.upload-161114.internal_45454_1511347091091 =========================================================================================================== LogType:directory.info Log Upload Time:Wed Nov 22 10:38:11 +0000 2017 LogLength:2093 Log Contents: ls -l: total 44 lrwxrwxrwx. 1 yarn hadoop 100 Nov 22 10:31 AppMaster.jar -> /hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/10/AppMaster.jar -rw-r--r--. 1 yarn hadoop 74 Nov 22 10:31 container_tokens -rwx------. 1 yarn hadoop 653 Nov 22 10:31 default_container_executor_session.sh -rwx------. 1 yarn hadoop 707 Nov 22 10:31 default_container_executor.sh lrwxrwxrwx. 1 yarn hadoop 93 Nov 22 10:31 dga.py -> /hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/11/dga.py -rwx------. 1 yarn hadoop 32429 Nov 22 10:31 launch_container.sh lrwxrwxrwx. 1 yarn hadoop 94 Nov 22 10:31 rest.py -> /hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/12/rest.py drwx--x---. 2 yarn hadoop 6 Nov 22 10:31 tmp find -L . -maxdepth 5 -ls: 478152002 4 drwx--x--- 3 yarn hadoop 4096 Nov 22 10:31 . 
494954822 0 drwx--x--- 2 yarn hadoop 6 Nov 22 10:31 ./tmp 478152014 4 -rw-r--r-- 1 yarn hadoop 74 Nov 22 10:31 ./container_tokens 478152015 4 -rw-r--r-- 1 yarn hadoop 12 Nov 22 10:31 ./.container_tokens.crc 478152017 32 -rwx------ 1 yarn hadoop 32429 Nov 22 10:31 ./launch_container.sh 478152018 4 -rw-r--r-- 1 yarn hadoop 264 Nov 22 10:31 ./.launch_container.sh.crc 478152019 4 -rwx------ 1 yarn hadoop 653 Nov 22 10:31 ./default_container_executor_session.sh 478152020 4 -rw-r--r-- 1 yarn hadoop 16 Nov 22 10:31 ./.default_container_executor_session.sh.crc 478152021 4 -rwx------ 1 yarn hadoop 707 Nov 22 10:31 ./default_container_executor.sh 478152022 4 -rw-r--r-- 1 yarn hadoop 16 Nov 22 10:31 ./.default_container_executor.sh.crc 1233149577 19300 -r-x------ 1 yarn hadoop 19761989 Nov 22 10:30 ./AppMaster.jar 461402463 4 -r-x------ 1 yarn hadoop 26 Nov 22 10:31 ./rest.py 444611662 4 -r-x------ 1 yarn hadoop 744 Nov 22 10:31 ./dga.py broken symlinks(find -L . -maxdepth 5 -type l -ls): End of LogType:directory.info LogType:launch_container.sh Log Upload Time:Wed Nov 22 10:38:11 +0000 2017 LogLength:32429 Log Contents: #!/bin/bash export LOCAL_DIRS="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011" export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/current/hadoop-client/conf"} export NM_HTTP_PORT="8042" export JAVA_HOME=${JAVA_HOME:-"/usr/jdk64/jdk1.8.0_77"} export LOG_DIRS="/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002" export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= " export NM_PORT="45454" export USER="root" export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/current/hadoop-yarn-nodemanager"} export CLASSPATH="$CLASSPATH:./*:$CLASSPATH:....(too many parameters so removed from logs) export NM_HOST="node1.c.upload-161114.internal" export 
HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/container_tokens" export NM_AUX_SERVICE_spark_shuffle="" export LOCAL_USER_DIRS="/hadoop/yarn/local/usercache/root/" export LOGNAME="root" export JVM_PID="$$" export PWD="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002" export HOME="/home/" export NM_AUX_SERVICE_spark2_shuffle="" export CONTAINER_ID="container_e04_1511328189618_0011_01_000002" export MALLOC_ARENA_MAX="4" ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/10/AppMaster.jar" "AppMaster.jar" hadoop_shell_errorcode=$? if [ $hadoop_shell_errorcode -ne 0 ] then exit $hadoop_shell_errorcode fi ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/12/rest.py" "rest.py" hadoop_shell_errorcode=$? if [ $hadoop_shell_errorcode -ne 0 ] then exit $hadoop_shell_errorcode fi ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/11/dga.py" "dga.py" hadoop_shell_errorcode=$? if [ $hadoop_shell_errorcode -ne 0 ] then exit $hadoop_shell_errorcode fi # Creating copy of launch script cp "launch_container.sh" "/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/launch_container.sh" chmod 640 "/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/launch_container.sh" # Determining directory contents echo "ls -l:" 1>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info" ls -l 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info" echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info" find -L . 
-maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info" echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info" find -L . -maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/directory.info" exec /bin/bash -c "$JAVA_HOME/bin/java org.apache.metron.maas.service.runner.Runner -ci 4398046511106 -zq node1:2181 -zr /metron/maas/config -s -n -hn node1.c.upload-161114.internal -v 1.0 1>/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/stdout 2>/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000002/stderr" hadoop_shell_errorcode=$? if [ $hadoop_shell_errorcode -ne 0 ] then exit $hadoop_shell_errorcode fi End of LogType:launch_container.sh LogType:stderr Log Upload Time:Wed Nov 22 10:38:11 +0000 2017 LogLength:1110 Log Contents: Unable to parse args: -ci 4398046511106 -zq node1:2181 -zr /metron/maas/config -s -n -hn node1.c.upload-161114.internal -v 1.0 org.apache.commons.cli.MissingArgumentException: Missing argument for option: s at org.apache.commons.cli.Parser.processArgs(Parser.java:343) at org.apache.commons.cli.Parser.processOption(Parser.java:393) at org.apache.commons.cli.Parser.parse(Parser.java:199) at org.apache.commons.cli.Parser.parse(Parser.java:85) at org.apache.metron.maas.service.runner.Runner$RunnerOptions.parse(Runner.java:139) at org.apache.metron.maas.service.runner.Runner.main(Runner.java:170) Exception in thread "main" org.apache.commons.cli.MissingArgumentException: Missing argument for option: s at org.apache.commons.cli.Parser.processArgs(Parser.java:343) at org.apache.commons.cli.Parser.processOption(Parser.java:393) at org.apache.commons.cli.Parser.parse(Parser.java:199) at 
org.apache.commons.cli.Parser.parse(Parser.java:85) at org.apache.metron.maas.service.runner.Runner$RunnerOptions.parse(Runner.java:139) at org.apache.metron.maas.service.runner.Runner.main(Runner.java:170) End of LogType:stderr LogType:stdout Log Upload Time:Wed Nov 22 10:38:11 +0000 2017 LogLength:347 Log Contents: usage: MaaSRunner -ci,--container_id <arg> Container ID -h,--help This screen -hn,--hostname <arg> Hostname for container -n,--name <arg> Name -s,--script <arg> Script Path -v,--version <arg> Version -zq,--zk_quorum <arg> Zookeeper Quorum -zr,--zk_root <arg> Zookeeper Root End of LogType:stdout Container: container_e04_1511328189618_0011_01_000001 on node1.c.upload-161114.internal_45454_1511347091091 =========================================================================================================== LogType:AppMaster.stderr Log Upload Time:Wed Nov 22 10:38:11 +0000 2017 LogLength:39099 Log Contents: 17/11/22 10:30:15 INFO service.ApplicationMaster: Initializing ApplicationMaster 17/11/22 10:30:16 INFO service.ApplicationMaster: Application master for app, appId=11, clustertimestamp=1511328189618, attemptId=1 17/11/22 10:30:16 INFO service.ApplicationMaster: Starting ApplicationMaster 17/11/22 10:30:16 INFO yarn.YarnUtils: Executing with tokens: 17/11/22 10:30:16 INFO yarn.YarnUtils: Kind: YARN_AM_RM_TOKEN, Service: , Ident: (appAttemptId { application_id { id: 11 cluster_timestamp: 1511328189618 } attemptId: 1 } keyId: -121482599) 17/11/22 10:30:16 INFO impl.TimelineClientImpl: Timeline service address: http://node1.c.upload-161114.internal:8188/ws/v1/timeline/ 17/11/22 10:30:17 INFO client.RMProxy: Connecting to ResourceManager at node1.c.upload-161114.internal/10.128.0.2:8030 17/11/22 10:30:17 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500 17/11/22 10:30:17 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0 17/11/22 10:30:18 INFO service.ApplicationMaster: Max mem capabililty of 
resources in this cluster 27648 17/11/22 10:30:18 INFO service.ApplicationMaster: Max vcores capabililty of resources in this cluster 6 17/11/22 10:30:18 INFO imps.CuratorFrameworkImpl: Starting 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:host.name=node1.c.upload-161114.internal 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_77 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk64/jdk1.8.0_77/jre 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.class.path=$CLASSPATH: (too many parameters so removed from logs) 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.library.path=::/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native::/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir:/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA> 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-693.5.2.el7.x86_64 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:user.name=yarn 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/yarn 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Client 
environment:user.dir=/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001 17/11/22 10:30:18 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@e27ba81 17/11/22 10:30:18 INFO zookeeper.ClientCnxn: Opening socket connection to server node1.c.upload-161114.internal/10.128.0.2:2181. Will not attempt to authenticate using SASL (unknown error) 17/11/22 10:30:18 INFO zookeeper.ClientCnxn: Socket connection established to node1.c.upload-161114.internal/10.128.0.2:2181, initiating session 17/11/22 10:30:18 INFO zookeeper.ClientCnxn: Session establishment complete on server node1.c.upload-161114.internal/10.128.0.2:2181, sessionid = 0x15fe233644e0114, negotiated timeout = 40000 17/11/22 10:30:18 INFO state.ConnectionStateManager: State change: CONNECTED 17/11/22 10:30:19 INFO service.ApplicationMaster: Ready to accept requests... 17/11/22 10:31:17 INFO service.ApplicationMaster: [ADD]: Received request for model null:1.0x1 containers of size 512M at path /user/root/models 17/11/22 10:31:19 INFO impl.AMRMClientImpl: Received new token for : node1.c.upload-161114.internal:45454 17/11/22 10:31:19 INFO callback.ContainerRequestListener: Got response from RM for container ask, allocatedCnt=1 17/11/22 10:31:19 INFO service.ApplicationMaster: Found container id of 4398046511106 17/11/22 10:31:19 INFO callback.ContainerRequestListener: Launching shell command on a new container., containerId=container_e04_1511328189618_0011_01_000002, containerNode=node1.c.upload-161114.internal:45454, containerNodeURI=node1.c.upload-161114.internal:8042, containerResourceMemory=9216, containerResourceVirtualCores=1 17/11/22 10:31:19 INFO callback.LaunchContainer: Setting up container launch container for containerid=container_e04_1511328189618_0011_01_000002 17/11/22 10:31:19 INFO callback.LaunchContainer: Local Directory Contents 17/11/22 
10:31:19 INFO callback.LaunchContainer: 6 - tmp 17/11/22 10:31:19 INFO callback.LaunchContainer: 74 - container_tokens 17/11/22 10:31:19 INFO callback.LaunchContainer: 12 - .container_tokens.crc 17/11/22 10:31:19 INFO callback.LaunchContainer: 3646 - launch_container.sh 17/11/22 10:31:19 INFO callback.LaunchContainer: 40 - .launch_container.sh.crc 17/11/22 10:31:19 INFO callback.LaunchContainer: 653 - default_container_executor_session.sh 17/11/22 10:31:19 INFO callback.LaunchContainer: 16 - .default_container_executor_session.sh.crc 17/11/22 10:31:19 INFO callback.LaunchContainer: 707 - default_container_executor.sh 17/11/22 10:31:19 INFO callback.LaunchContainer: 16 - .default_container_executor.sh.crc 17/11/22 10:31:19 INFO callback.LaunchContainer: 19761989 - AppMaster.jar 17/11/22 10:31:19 INFO callback.LaunchContainer: Localizing /user/root/models 17/11/22 10:31:19 INFO callback.LaunchContainer: Model payload: /user/root/models 17/11/22 10:31:19 INFO callback.LaunchContainer: AppJAR Location: hdfs://node1.c.upload-161114.internal:8020/user/root/MaaS/application_1511328189618_0011/AppMaster.jar 17/11/22 10:31:19 INFO callback.LaunchContainer: Localized dga.py -> LocatedFileStatus{path=hdfs://node1.c.upload-161114.internal:8020/user/root/models/dga.py; isDirectory=false; length=744; replication=3; blocksize=134217728; modification_time=1511346677472; access_time=1511346677462; owner=root; group=hdfs; permission=rw-r--r--; isSymlink=false} 17/11/22 10:31:19 INFO callback.LaunchContainer: Localized rest.py -> LocatedFileStatus{path=hdfs://node1.c.upload-161114.internal:8020/user/root/models/rest.py; isDirectory=false; length=26; replication=3; blocksize=134217728; modification_time=1511346677456; access_time=1511346677299; owner=root; group=hdfs; permission=rw-r--r--; isSymlink=false} 17/11/22 10:31:19 INFO callback.LaunchContainer: AppMaster.jar localized: scheme: "hdfs" host: "node1.c.upload-161114.internal" port: 8020 file: 
"/user/root/MaaS/application_1511328189618_0011/AppMaster.jar" 17/11/22 10:31:19 INFO callback.LaunchContainer: dga.py localized: scheme: "hdfs" host: "node1.c.upload-161114.internal" port: 8020 file: "/user/root/models/dga.py" 17/11/22 10:31:19 INFO callback.LaunchContainer: rest.py localized: scheme: "hdfs" host: "node1.c.upload-161114.internal" port: 8020 file: "/user/root/models/rest.py" 17/11/22 10:31:19 INFO callback.LaunchContainer: Executing container command: {{JAVA_HOME}}/bin/java org.apache.metron.maas.service.runner.Runner -ci 4398046511106 -zq node1:2181 -zr /metron/maas/config -s -n -hn node1.c.upload-161114.internal -v 1.0 1><LOG_DIR>/stdout 2><LOG_DIR>/stderr 17/11/22 10:31:19 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e04_1511328189618_0011_01_000002 17/11/22 10:31:19 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node1.c.upload-161114.internal:45454 17/11/22 10:31:19 INFO impl.NMClientAsyncImpl: Processing Event EventType: QUERY_CONTAINER for Container container_e04_1511328189618_0011_01_000002 17/11/22 10:31:19 INFO impl.ContainerManagementProtocolProxy: Opening proxy : node1.c.upload-161114.internal:45454 17/11/22 10:31:20 INFO callback.ContainerRequestListener: Got response from RM for container ask, completedCnt=1 17/11/22 10:31:20 INFO callback.ContainerRequestListener: Got container status for containerID=container_e04_1511328189618_0011_01_000002, state=COMPLETE, exitStatus=1, diagnostics=Exception from container-launch. 
Container id: container_e04_1511328189618_0011_01_000002 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:933) at org.apache.hadoop.util.Shell.run(Shell.java:844) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1123) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:225) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:317) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:83) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Container exited with a non-zero exit code 1 17/11/22 10:31:20 INFO callback.ContainerRequestListener: REMOVING CONTAINER container_e04_1511328189618_0011_01_000002 17/11/22 10:31:20 WARN discovery.ServiceDiscoverer: Unable to find registered model associated with container container_e04_1511328189618_0011_01_000002 17/11/22 10:31:20 ERROR discovery.ServiceDiscoverer: Unable to unregister container container_e04_1511328189618_0011_01_000002 due to: Unable. java.lang.IllegalStateException: Unable. 
at org.apache.metron.maas.discovery.ServiceDiscoverer.unregisterByContainer(ServiceDiscoverer.java:204) at org.apache.metron.maas.service.callback.ContainerRequestListener.onContainersCompleted(ContainerRequestListener.java:121) at org.apache.hadoop.yarn.client.api.async.impl.AMRMClientAsyncImpl$CallbackHandlerThread.run(AMRMClientAsyncImpl.java:300) End of LogType:AppMaster.stderr LogType:AppMaster.stdout Log Upload Time:Wed Nov 22 10:38:11 +0000 2017 LogLength:0 Log Contents: End of LogType:AppMaster.stdout LogType:directory.info Log Upload Time:Wed Nov 22 10:38:11 +0000 2017 LogLength:1638 Log Contents: ls -l: total 16 lrwxrwxrwx. 1 yarn hadoop 100 Nov 22 10:30 AppMaster.jar -> /hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/10/AppMaster.jar -rw-r--r--. 1 yarn hadoop 74 Nov 22 10:30 container_tokens -rwx------. 1 yarn hadoop 653 Nov 22 10:30 default_container_executor_session.sh -rwx------. 1 yarn hadoop 707 Nov 22 10:30 default_container_executor.sh -rwx------. 1 yarn hadoop 3646 Nov 22 10:30 launch_container.sh drwx--x---. 2 yarn hadoop 6 Nov 22 10:30 tmp find -L . -maxdepth 5 -ls: 1258621687 4 drwx--x--- 3 yarn hadoop 4096 Nov 22 10:30 . 
1275089340 0 drwx--x--- 2 yarn hadoop 6 Nov 22 10:30 ./tmp
1258621688 4 -rw-r--r-- 1 yarn hadoop 74 Nov 22 10:30 ./container_tokens
1258621689 4 -rw-r--r-- 1 yarn hadoop 12 Nov 22 10:30 ./.container_tokens.crc
1258621690 4 -rwx------ 1 yarn hadoop 3646 Nov 22 10:30 ./launch_container.sh
1258621691 4 -rw-r--r-- 1 yarn hadoop 40 Nov 22 10:30 ./.launch_container.sh.crc
1258621692 4 -rwx------ 1 yarn hadoop 653 Nov 22 10:30 ./default_container_executor_session.sh
1258621693 4 -rw-r--r-- 1 yarn hadoop 16 Nov 22 10:30 ./.default_container_executor_session.sh.crc
1258621984 4 -rwx------ 1 yarn hadoop 707 Nov 22 10:30 ./default_container_executor.sh
1258621995 4 -rw-r--r-- 1 yarn hadoop 16 Nov 22 10:30 ./.default_container_executor.sh.crc
1233149577 19300 -r-x------ 1 yarn hadoop 19761989 Nov 22 10:30 ./AppMaster.jar
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info
LogType:launch_container.sh
Log Upload Time:Wed Nov 22 10:38:11 +0000 2017
LogLength:3646
Log Contents:
#!/bin/bash
export LOCAL_DIRS="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1511328189618_0011"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/current/hadoop-client/conf"}
export MAX_APP_ATTEMPTS="2"
export NM_HTTP_PORT="8042"
export JAVA_HOME=${JAVA_HOME:-"/usr/jdk64/jdk1.8.0_77"}
export LOG_DIRS="/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= "
export NM_PORT="45454"
export USER="root"
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/current/hadoop-yarn-nodemanager"}
export CLASSPATH="$CLASSPATH:./*:$HADOOP_CONF_DIR:/usr/hdp/current/hadoop-client/*:/usr/hdp/current/hadoop-client/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:./log4j.properties"
export APP_SUBMIT_TIME_ENV="1511346614692"
export NM_HOST="node1.c.upload-161114.internal"
export HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/container_tokens"
export NM_AUX_SERVICE_spark_shuffle=""
export LOCAL_USER_DIRS="/hadoop/yarn/local/usercache/root/"
export LOGNAME="root"
export JVM_PID="$$"
export PWD="/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001"
export HOME="/home/"
export NM_AUX_SERVICE_spark2_shuffle=""
export CONTAINER_ID="container_e04_1511328189618_0011_01_000001"
export MALLOC_ARENA_MAX="4"
ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1511328189618_0011/filecache/10/AppMaster.jar" "AppMaster.jar"
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
# Creating copy of launch script
cp "launch_container.sh" "/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/launch_container.sh"
chmod 640 "/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
ls -l 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/directory.info"
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx10m org.apache.metron.maas.service.ApplicationMaster -zq node1:2181 -zr /metron/maas/config -aj hdfs://node1.c.upload-161114.internal:8020/user/root/MaaS/application_1511328189618_0011/AppMaster.jar 1>/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/AppMaster.stdout 2>/hadoop/yarn/log/application_1511328189618_0011/container_e04_1511328189618_0011_01_000001/AppMaster.stderr "
hadoop_shell_errorcode=$?
if [ $hadoop_shell_errorcode -ne 0 ]
then
  exit $hadoop_shell_errorcode
fi
End of LogType:launch_container.sh
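The launch script above sends the ApplicationMaster's output to AppMaster.stdout and AppMaster.stderr under the container's log directory, so those are the files most likely to show why a deployment fails. As a rough sketch (the helper name `app_id_from_container` is made up for illustration), the application ID needed to pull aggregated logs can be derived from a container ID like the one in the log:

```shell
#!/bin/bash
# Derive the YARN application ID from a container ID.
# Container IDs look like container_e04_<clusterTs>_<appSeq>_<attempt>_<containerSeq>;
# the "eNN_" epoch part is treated as optional here.
app_id_from_container() {
  local cid="$1"
  # Drop the "container_" prefix, then an optional epoch such as "e04_".
  local rest="${cid#container_}"
  if [[ "$rest" =~ ^e[0-9]+_(.*)$ ]]; then
    rest="${BASH_REMATCH[1]}"
  fi
  # Keep the first two underscore-separated fields:
  # the cluster timestamp and the application sequence number.
  local ts="${rest%%_*}"
  local tail="${rest#*_}"
  local seq="${tail%%_*}"
  echo "application_${ts}_${seq}"
}

app_id="$(app_id_from_container container_e04_1511328189618_0011_01_000001)"
echo "$app_id"

# On a cluster node, the aggregated logs could then be fetched with:
#   yarn logs -applicationId "$app_id"
```

The `yarn logs` call is left as a comment because it only works on a node with the YARN client configured and log aggregation enabled.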
Created 11-24-2017 04:54 AM
@Girish N, that's strange. Is this a full dev Vagrant deployment? What version are you running? Also, if possible, I would advise trying this again from scratch, just to rule out any misconfiguration and to confirm whether the problem is repeatable.
In the meantime, I will spin up a full dev environment on my side with the latest release bits to validate.
Created 11-28-2017 09:01 AM
@Girish N, I was able to fire up a full dev environment and follow the steps at:
https://github.com/apache/metron/tree/master/metron-analytics/metron-maas-service#example
And I was able to get the Mock DGA model working. I created an HCC article with the steps I followed to get it working on a full dev platform. Please see if it helps:
Created 11-29-2017 05:26 AM
Thanks @asubramanian,
Will give it a try.
Created on 12-01-2017 10:28 AM - edited 08-17-2019 11:37 PM
I re-ran with the steps mentioned in the article, and I no longer see the errors/exceptions I used to see earlier.
Now, after starting the service and deploying, when I try to list the deployed models, I don't see any output like the one shown in the article.
From the ResourceManager UI, I see the MaaS service is running, but the allocated memory is more than what was specified in the maas_deploy command.
MaaS list command: $METRON_HOME/bin/maas_deploy.sh -zq node1:2181 -mo LIST
[metron@node1 bin]$ ./maas_deploy.sh -zq node1:2181 -mo LIST
17/12/01 10:06:44 INFO imps.CuratorFrameworkImpl: Starting
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-37--1, built on 11/29/2016 17:59 GMT
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:host.name=node1.c.upload-161114.internal
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_77
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/jdk64/jdk1.8.0_77/jre
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.class.path=/usr/hdp/2.5.3.0-37/hadoop/conf:...
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.library.path=:/usr/hdp/2.5.3.0-37/hadoop/lib/native/Linux-amd64-64:/usr/hdp/2.5.3.0-37/hadoop/lib/native:/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/var/lib/ambari-agent/tmp/hadoop_java_io_tmpdir
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:java.compiler=<NA>
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:os.name=Linux
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:os.arch=amd64
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:os.version=3.10.0-693.5.2.el7.x86_64
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:user.name=metron
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:user.home=/home/metron
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Client environment:user.dir=/usr/metron/0.4.2/bin
17/12/01 10:06:44 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=node1:2181 sessionTimeout=60000 watcher=org.apache.curator.ConnectionState@72fa021
17/12/01 10:06:44 INFO zookeeper.ClientCnxn: Opening socket connection to server node1.c.upload-161114.internal/10.128.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
17/12/01 10:06:44 INFO zookeeper.ClientCnxn: Socket connection established to node1.c.upload-161114.internal/10.128.0.2:2181, initiating session
17/12/01 10:06:44 INFO zookeeper.ClientCnxn: Session establishment complete on server node1.c.upload-161114.internal/10.128.0.2:2181, sessionid = 0x16010f3a7a10388, negotiated timeout = 40000
17/12/01 10:06:44 INFO state.ConnectionStateManager: State change: CONNECTED
17/12/01 10:06:45 INFO zookeeper.ZooKeeper: Session: 0x16010f3a7a10388 closed
17/12/01 10:06:45 INFO zookeeper.ClientCnxn: EventThread shut down
I also tried granting all permissions on the HDFS folder "/user/metron".
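With this much client logging it is hard to tell at a glance whether LIST printed any model entries at all. A minimal sketch, assuming the log lines keep the "yy/MM/dd HH:mm:ss LEVEL" prefix seen above (the function name `strip_log_lines` is made up for illustration), is to filter those lines out and look at what remains:

```shell
#!/bin/bash
# Drop log4j-style lines ("yy/MM/dd HH:mm:ss LEVEL ...") from stdin so that
# only the command's own output is left. grep exits non-zero when nothing
# survives the filter, so "|| true" keeps the function from reporting failure.
strip_log_lines() {
  grep -Ev '^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} (TRACE|DEBUG|INFO|WARN|ERROR|FATAL) ' || true
}

# Possible usage on a cluster node (the log lines may be written to stderr,
# hence the 2>&1):
#   "$METRON_HOME/bin/maas_deploy.sh" -zq node1:2181 -mo LIST 2>&1 | strip_log_lines
```

If nothing is left after filtering, as would be the case for the run pasted above, the command connected to ZooKeeper and closed the session without reporting any model.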
Created 12-01-2017 04:07 PM
Hey @Girish N, I saw the same issue with model name and URL not listed with the -mo LIST command. I had to destroy the VM and re-deploy in order to get it working.
"From the ResourceManager UI, I see the MaaS service is running, but the allocated memory is more than what was specified in the maas_deploy command."
Hmm, I am not sure about this.
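One possible explanation, offered as an assumption and not confirmed anywhere in this thread: YARN schedulers commonly round each container request up to a multiple of yarn.scheduler.minimum-allocation-mb, and the ApplicationMaster's own container also counts toward the application's allocated memory. The rounding itself can be sketched like this (the function name and the 1024 MB minimum are hypothetical values for illustration; check the cluster's actual setting):

```shell
#!/bin/bash
# Round a requested container size up to the next multiple of the scheduler's
# minimum allocation. Integer arithmetic, all sizes in MB.
round_up_to_min_alloc() {
  local requested_mb="$1" min_alloc_mb="$2"
  echo $(( (requested_mb + min_alloc_mb - 1) / min_alloc_mb * min_alloc_mb ))
}

# Illustration only: a 512 MB request (as in "-m 512") against an assumed
# yarn.scheduler.minimum-allocation-mb of 1024.
round_up_to_min_alloc 512 1024
```

Under those assumed numbers, each 512 MB model container would show up as 1024 MB in the ResourceManager UI, before the ApplicationMaster container is added on top.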
Created 03-30-2018 09:34 AM
@asubramanian / @Girish N: I am facing the same issue: "Session closed immediately". Could you please let me know how to resolve it, and share the deployment steps you followed? I am using Metron version 0.4.1. Do I need to move to 0.4.2 for MaaS to work?