
TimothySpann · 02-26-2018

This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday, April 19, 2018 at 11:50 AM Berlin time.

Motivation: Use the Power of Our Massive HDP Hadoop Cluster and YARN to Run My Apache MXNet

For a real-world Deep Learning problem, you would want some GPUs in your cluster, and in the near term YARN 3.0 will allow you to manage those GPU resources. For now, install a few GPUs in a special HDP cluster used just for training Data Science jobs. You can make this cluster compute- and RAM-heavy for use by TensorFlow, Apache MXNet, Apache Spark and other YARN workloads. In my example, we are running Inception against an image; the image could come from a security camera, a drone or another industrial source. We will dive deeper into IIoT use cases here: If you are in Philadelphia, please join me. If not, all of the content will be shared on SlideShare, GitHub and here.

To set this up, we will be running Apache MXNet on CentOS 7 HDP 2.6.4 nodes. We are going to run Apache MXNet Python scripts on our Hadoop cluster! Let's get this installed:

git clone https://github.com/apache/incubator-mxnet.git

The installation instructions on Apache MXNet's website (http://mxnet.incubator.apache.org/install/index.html) are amazing. Pick your platform and your style; I am taking the simplest path: Linux with pip. We also need OpenCV to handle images in Python, so we install it along with all the build tools that OpenCV and Apache MXNet require. Follow the install details here: https://community.hortonworks.com/articles/174227/apache-deep-learning-101-using-apache-mxnet-on-an.html

YARN Cluster Submit for Apache MXNet

This requires some additional libraries and the Java JDK to compile the DMLC submission code.
yum install java-1.8.0-openjdk
yum install java-1.8.0-openjdk-devel
pip install kubernetes
git clone https://github.com/dmlc/dmlc-core.git
cd dmlc-core
make
cd tracker/yarn
./build.sh

To run my example that saves to HDFS:

export JAVA_HOME=/usr/jdk64/jdk1.8.0_112
pip install pydoop

You will need to match your versions of Hadoop and the JDK. You will also need to be on an HDP node, or have the HDP client installed with all the correct environment variables set!

source yarnsubmit.sh

export HADOOP_HOME=/usr/hdp/2.6.4.0-91/hadoop
export HADOOP_HDFS_HOME=/usr/hdp/2.6.4.0-91/hadoop-hdfs
export hdfs_home=/usr/hdp/2.6.4.0-91/hadoop-hdfs
export hadoop_hdfs_home=/usr/hdp/2.6.4.0-91/hadoop-hdfs
/opt/demo/dmlc-core/tracker/dmlc-submit --cluster yarn --num-workers 1 --server-cores 2 --server-memory 1G --log-level DEBUG --log-file /opt/demo/logs/mxnet.log /opt/demo/incubator-mxnet/analyzeyarn.py

We are using the DMLC Job Tracker for YARN job submission: https://github.com/dmlc/dmlc-core/tree/master/tracker You will need to git clone the dmlc-core repository to get it.
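If you submit this job often, it can help to wrap the environment check and the dmlc-submit call in a small script. The Python sketch below is hypothetical: the helper names `missing_dirs`, `build_submit_cmd` and `submit` are illustrative, not part of DMLC or MXNet. It reuses only the paths and dmlc-submit flags from the commands above and assumes the same HDP 2.6.4 layout; it checks that the JDK and HDP directories exist before submitting, and by default only prints the command (a dry run).

```python
"""Hypothetical wrapper around dmlc-submit for the YARN job described above.

Paths and flags are copied from the article's commands; adjust them for your cluster.
"""
import os
import shlex
import subprocess
from typing import Dict, List, Mapping

# Assumed locations from the article's HDP 2.6.4 node; not universal defaults.
DMLC_SUBMIT = "/opt/demo/dmlc-core/tracker/dmlc-submit"
SCRIPT = "/opt/demo/incubator-mxnet/analyzeyarn.py"
LOG_FILE = "/opt/demo/logs/mxnet.log"
HDP_ENV: Dict[str, str] = {
    "JAVA_HOME": "/usr/jdk64/jdk1.8.0_112",
    "HADOOP_HOME": "/usr/hdp/2.6.4.0-91/hadoop",
    "HADOOP_HDFS_HOME": "/usr/hdp/2.6.4.0-91/hadoop-hdfs",
}


def missing_dirs(env: Mapping[str, str]) -> List[str]:
    """Return the variable names whose directory does not exist on this host."""
    return [name for name, path in env.items() if not os.path.isdir(path)]


def build_submit_cmd(script: str, num_workers: int = 1, server_cores: int = 2,
                     server_memory: str = "1G", log_file: str = LOG_FILE,
                     submit_path: str = DMLC_SUBMIT) -> List[str]:
    """Build the dmlc-submit argument list using only the flags shown above."""
    return [
        submit_path,
        "--cluster", "yarn",
        "--num-workers", str(num_workers),
        "--server-cores", str(server_cores),
        "--server-memory", server_memory,
        "--log-level", "DEBUG",
        "--log-file", log_file,
        script,
    ]


def submit(dry_run: bool = True) -> int:
    """Check the JDK/HDP directories, then print (and optionally run) the command."""
    missing = missing_dirs(HDP_ENV)
    if missing:
        raise RuntimeError("not an HDP node? missing directories for: " + ", ".join(missing))
    cmd = build_submit_cmd(SCRIPT)
    print(shlex.join(cmd))
    if dry_run:
        return 0
    # Pass the HDP variables to dmlc-submit on top of the current environment.
    env = {**os.environ, **HDP_ENV}
    return subprocess.run(cmd, env=env, check=False).returncode


if __name__ == "__main__":
    # Print the command only; call submit(dry_run=False) on a real HDP node.
    print(shlex.join(build_submit_cmd(SCRIPT)))
```

The dry run prints the same dmlc-submit line used above; on a machine without the HDP directories, `submit()` raises instead of launching anything. Note that the job log further down reports `Try to start 0 Servers and 1 Workers`, so in this run the `--server-cores`/`--server-memory` flags did not start any server containers.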
I installed it on an HDP node under /opt/demo/incubator-mxnet. When I run my program, I can examine the Apache YARN logs via the command-line tool like so:

yarn logs -applicationId application_1517883514475_0588

18/02/25 02:41:56 INFO client.RMProxy: Connecting to ResourceManager at princeton0.field.hortonworks.com/172.26.200.216:8050
18/02/25 02:41:57 INFO client.AHSProxy: Connecting to Application History server at princeton0.field.hortonworks.com/172.26.200.216:10200
18/02/25 02:42:00 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
18/02/25 02:42:00 INFO compress.CodecPool: Got brand-new decompressor [.deflate]
Container: container_e01_1517883514475_0588_01_000001 on princeton0.field.hortonworks.com_45454
LogAggregationType: AGGREGATED
===============================================================================================
LogType:directory.info
LogLastModifiedTime:Sun Feb 25 02:40:08 +0000 2018
LogLength:2119
LogContents:
ls -l:
total 20
lrwxrwxrwx. 1 yarn hadoop 101 Feb 25 02:40 analyzeyarn.py -> /hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/11/analyzeyarn.py
-rw-r--r--. 1 yarn hadoop 75 Feb 25 02:40 container_tokens
-rwx------. 1 yarn hadoop 653 Feb 25 02:40 default_container_executor_session.sh
-rwx------. 1 yarn hadoop 707 Feb 25 02:40 default_container_executor.sh
lrwxrwxrwx. 1 yarn hadoop 100 Feb 25 02:40 dmlc-yarn.jar -> /hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/12/dmlc-yarn.jar
-rwx------. 1 yarn hadoop 4808 Feb 25 02:40 launch_container.sh
lrwxrwxrwx. 1 yarn hadoop 98 Feb 25 02:40 launcher.py -> /hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/10/launcher.py
drwx--x---. 2 yarn hadoop 6 Feb 25 02:40 tmp
find -L . -maxdepth 5 -ls:
419611738 4 drwx--x--- 3 yarn hadoop 4096 Feb 25 02:40 .
436305675 0 drwx--x--- 2 yarn hadoop 6 Feb 25 02:40 ./tmp
419611739 4 -rw-r--r-- 1 yarn hadoop 75 Feb 25 02:40 ./container_tokens
419611740 4 -rw-r--r-- 1 yarn hadoop 12 Feb 25 02:40 ./.container_tokens.crc
419611741 8 -rwx------ 1 yarn hadoop 4808 Feb 25 02:40 ./launch_container.sh
419611742 4 -rw-r--r-- 1 yarn hadoop 48 Feb 25 02:40 ./.launch_container.sh.crc
419611743 4 -rwx------ 1 yarn hadoop 653 Feb 25 02:40 ./default_container_executor_session.sh
419611744 4 -rw-r--r-- 1 yarn hadoop 16 Feb 25 02:40 ./.default_container_executor_session.sh.crc
419611745 4 -rwx------ 1 yarn hadoop 707 Feb 25 02:40 ./default_container_executor.sh
419611746 4 -rw-r--r-- 1 yarn hadoop 16 Feb 25 02:40 ./.default_container_executor.sh.crc
394926208 24 -r-x------ 1 yarn hadoop 21427 Feb 25 02:40 ./dmlc-yarn.jar
361654889 4 -r-x------ 1 yarn hadoop 2765 Feb 25 02:40 ./launcher.py
378183873 4 -r-x------ 1 yarn hadoop 3815 Feb 25 02:40 ./analyzeyarn.py
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info
*******************************************************************************

Container: container_e01_1517883514475_0588_01_000001 on princeton0.field.hortonworks.com_45454
LogAggregationType: AGGREGATED
===============================================================================================
LogType:launch_container.sh
LogLastModifiedTime:Sun Feb 25 02:40:08 +0000 2018
LogLength:4808
LogContents:
#!/bin/bash
set -o pipefail -e
export PRELAUNCH_OUT="/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
export PATH="/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/2.6.4.0-91/hadoop/conf"}
export DMLC_NUM_SERVER="0"
export MAX_APP_ATTEMPTS="2"
export DMLC_WORKER_CORES="1"
export DMLC_WORKER_MEMORY_MB="1024"
export DMLC_SERVER_MEMORY_MB="1024"
export JAVA_HOME=${JAVA_HOME:-"/usr/jdk64/jdk1.8.0_112"}
export LANG="en_US.UTF-8"
export APP_SUBMIT_TIME_ENV="1519526407793"
export NM_HOST="princeton0.field.hortonworks.com"
export DMLC_JOB_ARCHIVES=""
export DMLC_SERVER_CORES="2"
export LOGNAME="root"
export JVM_PID="$$"
export DMLC_TRACKER_PORT="9091"
export PWD="/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001"
export LOCAL_DIRS="/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588"
export APPLICATION_WEB_PROXY_BASE="/proxy/application_1517883514475_0588"
export NM_HTTP_PORT="8042"
export DMLC_TRACKER_URI="172.26.200.216"
export LOG_DIRS="/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001"
export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= "
export NM_PORT="45454"
export USER="root"
export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/2.6.4.0-91/hadoop-yarn"}
export CLASSPATH="$CLASSPATH:.:*:/usr/hdp/2.6.4.0-91/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop/*:/usr/hdp/2.6.4.0-91/hadoop/lib/*:/usr/hdp/current/hadoop-hdfs-client/*:/usr/hdp/current/hadoop-hdfs-client/lib/*:/usr/hdp/current/hadoop-yarn-client/*:/usr/hdp/current/hadoop-yarn-client/lib/*:/usr/hdp/current/ext/hadoop/*"
export DMLC_NUM_WORKER="1"
export DMLC_JOB_CLUSTER="yarn"
export HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/container_tokens"
export NM_AUX_SERVICE_spark_shuffle=""
export LOCAL_USER_DIRS="/hadoop/yarn/local/usercache/root/"
export HADOOP_HOME="/usr/hdp/2.6.4.0-91/hadoop"
export HOME="/home/"
export NM_AUX_SERVICE_spark2_shuffle=""
export CONTAINER_ID="container_e01_1517883514475_0588_01_000001"
export MALLOC_ARENA_MAX="4"
echo "Setting up job resources"
ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/12/dmlc-yarn.jar" "dmlc-yarn.jar"
ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/10/launcher.py" "launcher.py"
ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/11/analyzeyarn.py" "analyzeyarn.py"
echo "Copying debugging information"
# Creating copy of launch script
cp "launch_container.sh" "/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/launch_container.sh"
chmod 640 "/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/launch_container.sh"
# Determining directory contents
echo "ls -l:" 1>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/directory.info"
ls -l 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/directory.info"
echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/directory.info"
find -L . -maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/directory.info"
echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/directory.info"
find -L . -maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/directory.info"
echo "Launching container"
exec /bin/bash -c "$JAVA_HOME/bin/java -Xmx900m org.apache.hadoop.yarn.dmlc.ApplicationMaster -file "hdfs:/tmp/temp-dmlc-yarn-application_1517883514475_0588/launcher.py#launcher.py" -file "hdfs:/tmp/temp-dmlc-yarn-application_1517883514475_0588/analyzeyarn.py#analyzeyarn.py" -file "hdfs:/tmp/temp-dmlc-yarn-application_1517883514475_0588/dmlc-yarn.jar#dmlc-yarn.jar" ./launcher.py ./analyzeyarn.py 1>/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/stdout 2>/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000001/stderr"
End of LogType:launch_container.sh
************************************************************************************

End of LogType:stdout
***********************************************************************

Container: container_e01_1517883514475_0588_01_000001 on princeton0.field.hortonworks.com_45454
LogAggregationType: AGGREGATED
===============================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Sun Feb 25 02:40:08 +0000 2018
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out
******************************************************************************

Container: container_e01_1517883514475_0588_01_000001 on princeton0.field.hortonworks.com_45454
LogAggregationType: AGGREGATED
===============================================================================================
LogType:stderr
LogLastModifiedTime:Sun Feb 25 02:40:23 +0000 2018
LogLength:18617
LogContents:
18/02/25 02:40:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/25 02:40:12 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/02/25 02:40:12 INFO dmlc.ApplicationMaster: Start AM as user=yarn
18/02/25 02:40:12 INFO dmlc.ApplicationMaster: Try to start 0 Servers and 1 Workers
18/02/25 02:40:12 INFO client.RMProxy: Connecting to ResourceManager at princeton0.field.hortonworks.com/172.26.200.216:8030
18/02/25 02:40:13 INFO impl.NMClientAsyncImpl: Upper bound of the thread pool size is 500
18/02/25 02:40:13 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
18/02/25 02:40:13 INFO dmlc.ApplicationMaster: [DMLC] ApplicationMaster started
18/02/25 02:40:15 INFO impl.AMRMClientImpl: Received new token for : princeton0.field.hortonworks.com:45454
18/02/25 02:40:15 INFO dmlc.ApplicationMaster: {launcher.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1517883514475_0588/launcher.py" } size: 2765 timestamp: 1519526407236 type: FILE visibility: APPLICATION, analyzeyarn.py=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1517883514475_0588/analyzeyarn.py" } size: 3815 timestamp: 1519526407703 type: FILE visibility: APPLICATION, dmlc-yarn.jar=resource { scheme: "hdfs" port: -1 file: "/tmp/temp-dmlc-yarn-application_1517883514475_0588/dmlc-yarn.jar" } size: 21427 timestamp: 1519526407738 type: FILE visibility: APPLICATION}
18/02/25 02:40:15 INFO dmlc.ApplicationMaster: {PYTHONPATH=${PYTHONPATH}:., DMLC_NUM_SERVER=0, DMLC_NODE_HOST=princeton0.field.hortonworks.com, DMLC_ROLE=worker, DMLC_WORKER_CORES=1, DMLC_WORKER_MEMORY_MB=1024, DMLC_SERVER_MEMORY_MB=1024, DMLC_TRACKER_URI=172.26.200.216, 
CLASSPATH=${CLASSPATH}:./*:/usr/hdp/2.6.4.0-91/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop//azure-data-lake-store-sdk-2.1.4.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-annotations-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-annotations.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-auth-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-auth.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-aws-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-aws.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-azure-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-azure-datalake-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-azure-datalake.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-azure.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-common-2.7.3.2.6.4.0-91-tests.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-common-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-common-tests.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-common.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-nfs-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-nfs.jar:/usr/hdp/2.6.4.0-91/hadoop//:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-core-asl-1.9.13.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//ranger-hdfs-plugin-shim-0.7.0.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//nimbus-jose-jwt-3.9.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//ranger-plugin-classloader-0.7.0.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-databind-2.2.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//ranger-yarn-plugin-shim-0.7.0.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//activation-1.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jetty-6.1.26.hwx.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//apacheds-i18n-2.0.0-M15.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-jaxrs-1.9.13.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jetty-sslengine-6.1.26.hwx.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//api-asn1-api-1.0.0-M20.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jetty-util-6.1.26.hwx.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//api-util-1.0.0-M20.jar:/usr/hdp/2.6.4.0-91/ha
doop/lib//asm-3.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//avro-1.7.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//joda-time-2.9.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//aws-java-sdk-core-1.10.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jsch-0.1.54.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//aws-java-sdk-kms-1.10.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//json-smart-1.1.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//aws-java-sdk-s3-1.10.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-mapper-asl-1.9.13.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//azure-keyvault-core-0.8.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jsp-api-2.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//azure-storage-5.4.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jsr305-3.0.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-beanutils-1.7.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jersey-core-1.9.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-beanutils-core-1.8.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-cli-1.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//junit-4.11.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-codec-1.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jersey-json-1.9.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-collections-3.2.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//log4j-1.2.17.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-compress-1.4.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jersey-server-1.9.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-configuration-1.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//mockito-all-1.8.5.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-digester-1.8.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-io-2.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-lang-2.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//netty-3.6.2.Final.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-lang3-3.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//paranamer-2.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-logging-1.1.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//protobuf-java-2.5.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-math3-3.1.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-net-3.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//servlet-api-2.5.jar:/usr/h
dp/2.6.4.0-91/hadoop/lib//curator-client-2.7.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//slf4j-api-1.7.10.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//curator-framework-2.7.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//slf4j-log4j12-1.7.10.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//curator-recipes-2.7.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//gson-2.2.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//guava-11.0.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//snappy-java-1.0.4.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//hamcrest-core-1.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jets3t-0.9.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//htrace-core-3.1.0-incubating.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//httpclient-4.5.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//httpcore-4.4.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jettison-1.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-annotations-2.2.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//stax-api-1.0-2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-core-2.2.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-xc-1.9.13.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//xmlenc-0.52.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//java-xmlbuilder-0.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jaxb-api-2.2.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//zookeeper-3.4.6.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jaxb-impl-2.2.3-1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//xz-1.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jcip-annotations-1.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-2.7.3.2.6.4.0-91-tests.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-nfs-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-nfs.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-tests.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs.jar:/usr/hdp/current/hadoop-hdfs-client//:/usr/hdp/current/hadoop-hdfs-client/lib//asm-3.2.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-cli-1.2.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-codec-1.4.jar:/usr/hdp/current/
hadoop-hdfs-client/lib//commons-daemon-1.0.13.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-io-2.4.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-lang-2.6.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-logging-1.1.3.jar:/usr/hdp/current/hadoop-hdfs-client/lib//guava-11.0.2.jar:/usr/hdp/current/hadoop-hdfs-client/lib//htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-annotations-2.2.3.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-core-2.2.3.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-core-asl-1.9.13.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-databind-2.2.3.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-mapper-asl-1.9.13.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jersey-core-1.9.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jersey-server-1.9.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jetty-6.1.26.hwx.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jetty-util-6.1.26.hwx.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jsr305-3.0.0.jar:/usr/hdp/current/hadoop-hdfs-client/lib//leveldbjni-all-1.8.jar:/usr/hdp/current/hadoop-hdfs-client/lib//log4j-1.2.17.jar:/usr/hdp/current/hadoop-hdfs-client/lib//netty-3.6.2.Final.jar:/usr/hdp/current/hadoop-hdfs-client/lib//netty-all-4.0.52.Final.jar:/usr/hdp/current/hadoop-hdfs-client/lib//okhttp-2.4.0.jar:/usr/hdp/current/hadoop-hdfs-client/lib//okio-1.4.0.jar:/usr/hdp/current/hadoop-hdfs-client/lib//protobuf-java-2.5.0.jar:/usr/hdp/current/hadoop-hdfs-client/lib//servlet-api-2.5.jar:/usr/hdp/current/hadoop-hdfs-client/lib//xercesImpl-2.9.1.jar:/usr/hdp/current/hadoop-hdfs-client/lib//xml-apis-1.3.04.jar:/usr/hdp/current/hadoop-hdfs-client/lib//xmlenc-0.52.jar:/usr/hdp/current/hadoop-hdfs-client/lib//:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-api-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-api.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-applications-distributedshell-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-
client//hadoop-yarn-applications-distributedshell.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-applications-unmanaged-am-launcher-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-applications-unmanaged-am-launcher.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-client-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-client.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-common-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-common.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-registry-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-registry.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-applicationhistoryservice-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-applicationhistoryservice.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-common-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-common.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-nodemanager-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-nodemanager.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-resourcemanager-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-resourcemanager.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-sharedcachemanager-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-sharedcachemanager.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-tests-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-tests.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-timeline-pluginstorage-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-timeline-pluginstorage.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-web-proxy-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-web-proxy.jar:/usr
/hdp/current/hadoop-yarn-client//:/usr/hdp/current/hadoop-yarn-client/lib//activation-1.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//aopalliance-1.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-guice-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//apacheds-i18n-2.0.0-M15.jar:/usr/hdp/current/hadoop-yarn-client/lib//javassist-3.18.1-GA.jar:/usr/hdp/current/hadoop-yarn-client/lib//apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-json-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//api-asn1-api-1.0.0-M20.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-server-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//api-util-1.0.0-M20.jar:/usr/hdp/current/hadoop-yarn-client/lib//asm-3.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//avro-1.7.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//javax.inject-1.jar:/usr/hdp/current/hadoop-yarn-client/lib//azure-keyvault-core-0.8.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jets3t-0.9.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//azure-storage-5.4.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//log4j-1.2.17.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-beanutils-1.7.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jaxb-api-2.2.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-beanutils-core-1.8.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-cli-1.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//leveldbjni-all-1.8.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-codec-1.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//jaxb-impl-2.2.3-1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-collections-3.2.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//metrics-core-3.0.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-compress-1.4.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//json-smart-1.1.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-configuration-1.6.jar:/usr/hdp/current/hadoop-yarn-client/lib//netty-3.6.2.Final.jar:/usr/hdp/current/hadoop-yarn-client/lib/
/commons-digester-1.8.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-io-2.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-lang-2.6.jar:/usr/hdp/current/hadoop-yarn-client/lib//nimbus-jose-jwt-3.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-lang3-3.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//objenesis-2.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-logging-1.1.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//paranamer-2.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-math3-3.1.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-net-3.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//protobuf-java-2.5.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//curator-client-2.7.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//servlet-api-2.5.jar:/usr/hdp/current/hadoop-yarn-client/lib//curator-framework-2.7.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//snappy-java-1.0.4.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//curator-recipes-2.7.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//fst-2.24.jar:/usr/hdp/current/hadoop-yarn-client/lib//gson-2.2.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//guava-11.0.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//guice-3.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//stax-api-1.0-2.jar:/usr/hdp/current/hadoop-yarn-client/lib//guice-servlet-3.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jcip-annotations-1.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hadoop-yarn-client/lib//httpclient-4.5.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//httpcore-4.4.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-client-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-annotations-2.2.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//xmlenc-0.52.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-core-2.2.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//xz-1.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-core-asl-1.9.13.jar:/usr/hdp/current/hadoop-yarn-client/
lib//zookeeper-3.4.6.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-databind-2.2.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//zookeeper-3.4.6.2.6.4.0-91-tests.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-jaxrs-1.9.13.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-core-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-mapper-asl-1.9.13.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-xc-1.9.13.jar:/usr/hdp/current/hadoop-yarn-client/lib//java-xmlbuilder-0.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//jettison-1.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//jetty-6.1.26.hwx.jar:/usr/hdp/current/hadoop-yarn-client/lib//jsp-api-2.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//jetty-sslengine-6.1.26.hwx.jar:/usr/hdp/current/hadoop-yarn-client/lib//jetty-util-6.1.26.hwx.jar:/usr/hdp/current/hadoop-yarn-client/lib//jsch-0.1.54.jar:/usr/hdp/current/hadoop-yarn-client/lib//jsr305-3.0.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//:/usr/hdp/current/ext/hadoop//, DMLC_NUM_WORKER=1, DMLC_JOB_CLUSTER=yarn, DMLC_JOB_ARCHIVES=, LD_LIBRARY_PATH=, DMLC_SERVER_CORES=2, DMLC_NUM_ATTEMPT=0, DMLC_TRACKER_PORT=9091, DMLC_TASK_ID=0}
18/02/25 02:40:15 INFO impl.NMClientAsyncImpl: Processing Event EventType: START_CONTAINER for Container container_e01_1517883514475_0588_01_000002
18/02/25 02:40:15 INFO impl.ContainerManagementProtocolProxy: Opening proxy : princeton0.field.hortonworks.com:45454
18/02/25 02:40:15 INFO dmlc.ApplicationMaster: onContainerStarted Invoked
18/02/25 02:40:23 INFO dmlc.ApplicationMaster: Application completed. Stopping running containers
18/02/25 02:40:23 INFO impl.NMClientAsyncImpl: NM Client is being stopped.
18/02/25 02:40:23 INFO impl.NMClientAsyncImpl: Waiting for eventDispatcherThread to be interrupted.
18/02/25 02:40:23 INFO impl.NMClientAsyncImpl: eventDispatcherThread exited.
18/02/25 02:40:23 INFO impl.NMClientAsyncImpl: Stopping NM client.
18/02/25 02:40:23 INFO impl.NMClientImpl: Clean up running containers on stop.
18/02/25 02:40:23 INFO impl.NMClientImpl: Stopping container_e01_1517883514475_0588_01_000002
18/02/25 02:40:23 INFO impl.NMClientImpl: ok, stopContainerInternal.. container_e01_1517883514475_0588_01_000002
18/02/25 02:40:23 INFO impl.ContainerManagementProtocolProxy: Opening proxy : princeton0.field.hortonworks.com:45454
18/02/25 02:40:23 INFO impl.NMClientImpl: Running containers cleaned up. Stopping NM proxies.
18/02/25 02:40:23 INFO impl.NMClientImpl: Stopped all proxies.
18/02/25 02:40:23 INFO impl.NMClientAsyncImpl: NMClient stopped.
18/02/25 02:40:23 INFO dmlc.ApplicationMaster: Diagnostics., num_tasks1, finished=1, failed=0
18/02/25 02:40:23 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
End of LogType:stderr
***********************************************************************

End of LogType:prelaunch.err
******************************************************************************

Container: container_e01_1517883514475_0588_01_000002 on princeton0.field.hortonworks.com_45454
LogAggregationType: AGGREGATED
===============================================================================================
LogType:directory.info
LogLastModifiedTime:Sun Feb 25 02:40:15 +0000 2018
LogLength:2127
LogContents:
ls -l:
total 32
lrwxrwxrwx. 1 yarn hadoop 101 Feb 25 02:40 analyzeyarn.py -> /hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/11/analyzeyarn.py
-rw-r--r--. 1 yarn hadoop 94 Feb 25 02:40 container_tokens
-rwx------. 1 yarn hadoop 653 Feb 25 02:40 default_container_executor_session.sh
-rwx------. 1 yarn hadoop 707 Feb 25 02:40 default_container_executor.sh
lrwxrwxrwx. 1 yarn hadoop 100 Feb 25 02:40 dmlc-yarn.jar -> /hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/12/dmlc-yarn.jar
-rwx------. 1 yarn hadoop 19213 Feb 25 02:40 launch_container.sh
lrwxrwxrwx. 1 yarn hadoop 98 Feb 25 02:40 launcher.py -> /hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/10/launcher.py
drwx--x---. 2 yarn hadoop 6 Feb 25 02:40 tmp
find -L . -maxdepth 5 -ls:
453027956 4 drwx--x--- 3 yarn hadoop 4096 Feb 25 02:40 .
470001419 0 drwx--x--- 2 yarn hadoop 6 Feb 25 02:40 ./tmp
453027961 4 -rw-r--r-- 1 yarn hadoop 94 Feb 25 02:40 ./container_tokens
453027962 4 -rw-r--r-- 1 yarn hadoop 12 Feb 25 02:40 ./.container_tokens.crc
453027963 20 -rwx------ 1 yarn hadoop 19213 Feb 25 02:40 ./launch_container.sh
453027964 4 -rw-r--r-- 1 yarn hadoop 160 Feb 25 02:40 ./.launch_container.sh.crc
453027965 4 -rwx------ 1 yarn hadoop 653 Feb 25 02:40 ./default_container_executor_session.sh
453027966 4 -rw-r--r-- 1 yarn hadoop 16 Feb 25 02:40 ./.default_container_executor_session.sh.crc
453027967 4 -rwx------ 1 yarn hadoop 707 Feb 25 02:40 ./default_container_executor.sh
453418137 4 -rw-r--r-- 1 yarn hadoop 16 Feb 25 02:40 ./.default_container_executor.sh.crc
394926208 24 -r-x------ 1 yarn hadoop 21427 Feb 25 02:40 ./dmlc-yarn.jar
361654889 4 -r-x------ 1 yarn hadoop 2765 Feb 25 02:40 ./launcher.py
378183873 4 -r-x------ 1 yarn hadoop 3815 Feb 25 02:40 ./analyzeyarn.py
broken symlinks(find -L . -maxdepth 5 -type l -ls):
End of LogType:directory.info
*******************************************************************************

Container: container_e01_1517883514475_0588_01_000002 on princeton0.field.hortonworks.com_45454
LogAggregationType: AGGREGATED
===============================================================================================
LogType:prelaunch.out
LogLastModifiedTime:Sun Feb 25 02:40:15 +0000 2018
LogLength:100
LogContents:
Setting up env variables
Setting up job resources
Copying debugging information
Launching container
End of LogType:prelaunch.out
******************************************************************************

Container: container_e01_1517883514475_0588_01_000002 on princeton0.field.hortonworks.com_45454
LogAggregationType: AGGREGATED
===============================================================================================
LogType:stdout
LogLastModifiedTime:Sun Feb 25 02:40:23 +0000 2018
LogLength:411
LogContents:
{"top1pct": "67.6", "top5": "n03485794 handkerchief, hankie, hanky, hankey", "top4": "n04590129 window shade", "top3": "n03938244 pillow", "top2": "n04589890 window screen", "top1": "n02883205 bow tie, bow-tie, bowtie", "top2pct": "11.5", "imagefilename": "/opt/demo/incubator-mxnet/nanotie7.png", "top3pct": "4.5", "uuid": "mxnet_uuid_img_20180225024017", "top4pct": "2.8", "top5pct": "2.8", "runtime": "5.0"}
End of LogType:stdout
***********************************************************************

End of LogType:prelaunch.err
******************************************************************************

Container: container_e01_1517883514475_0588_01_000002 on princeton0.field.hortonworks.com_45454
LogAggregationType: AGGREGATED
===============================================================================================
LogType:launch_container.sh
LogLastModifiedTime:Sun Feb 25 02:40:15 +0000 2018
LogLength:19213
LogContents:
#!/bin/bash
set -o pipefail -e
export PRELAUNCH_OUT="/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/prelaunch.out"
exec >"${PRELAUNCH_OUT}"
export PRELAUNCH_ERR="/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/prelaunch.err"
exec 2>"${PRELAUNCH_ERR}"
echo "Setting up env variables"
export PATH="/usr/sbin:/sbin:/usr/lib/ambari-server/*:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/var/lib/ambari-agent"
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/usr/hdp/2.6.4.0-91/hadoop/conf"}
export DMLC_NUM_SERVER="0"
export DMLC_NODE_HOST="princeton0.field.hortonworks.com"
export DMLC_WORKER_CORES="1"
export DMLC_WORKER_MEMORY_MB="1024"
export DMLC_SERVER_MEMORY_MB="1024"
export JAVA_HOME=${JAVA_HOME:-"/usr/jdk64/jdk1.8.0_112"}
export LANG="en_US.UTF-8"
export NM_HOST="princeton0.field.hortonworks.com"
export DMLC_JOB_ARCHIVES=""
export LD_LIBRARY_PATH=""
export DMLC_SERVER_CORES="2"
export DMLC_NUM_ATTEMPT="0"
export LOGNAME="root"
export JVM_PID="$$"
export DMLC_TRACKER_PORT="9091"
export PWD="/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002"
export LOCAL_DIRS="/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588"
export PYTHONPATH="${PYTHONPATH}:."
export DMLC_ROLE="worker" export NM_HTTP_PORT="8042" export DMLC_TRACKER_URI="172.26.200.216" export LOG_DIRS="/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002" export NM_AUX_SERVICE_mapreduce_shuffle="AAA0+gAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA= " export NM_PORT="45454" export USER="root" export HADOOP_YARN_HOME=${HADOOP_YARN_HOME:-"/usr/hdp/2.6.4.0-91/hadoop-yarn"} export CLASSPATH="${CLASSPATH}:./*:/usr/hdp/2.6.4.0-91/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop//azure-data-lake-store-sdk-2.1.4.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-annotations-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-annotations.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-auth-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-auth.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-aws-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-aws.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-azure-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-azure-datalake-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-azure-datalake.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-azure.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-common-2.7.3.2.6.4.0-91-tests.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-common-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-common-tests.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-common.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-nfs-2.7.3.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop//hadoop-nfs.jar:/usr/hdp/2.6.4.0-91/hadoop//:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-core-asl-1.9.13.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//ranger-hdfs-plugin-shim-0.7.0.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//nimbus-jose-jwt-3.9.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//ranger-plugin-classloader-0.7.0.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-databind-2.2.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//ranger-yarn-plugin-shim-0.7.0.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//activation-1.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jetty-6.1.26.hwx.jar:/usr/hdp/2.6.4.0-91/hado
op/lib//apacheds-i18n-2.0.0-M15.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-jaxrs-1.9.13.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jetty-sslengine-6.1.26.hwx.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//api-asn1-api-1.0.0-M20.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jetty-util-6.1.26.hwx.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//api-util-1.0.0-M20.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//asm-3.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//avro-1.7.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//joda-time-2.9.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//aws-java-sdk-core-1.10.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jsch-0.1.54.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//aws-java-sdk-kms-1.10.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//json-smart-1.1.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//aws-java-sdk-s3-1.10.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-mapper-asl-1.9.13.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//azure-keyvault-core-0.8.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jsp-api-2.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//azure-storage-5.4.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jsr305-3.0.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-beanutils-1.7.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jersey-core-1.9.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-beanutils-core-1.8.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-cli-1.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//junit-4.11.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-codec-1.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jersey-json-1.9.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-collections-3.2.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//log4j-1.2.17.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-compress-1.4.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jersey-server-1.9.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-configuration-1.6.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//mockito-all-1.8.5.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-digester-1.8.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-io-2.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-lang-2.6.jar:/usr/hdp/2.6.4.0-9
1/hadoop/lib//netty-3.6.2.Final.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-lang3-3.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//paranamer-2.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-logging-1.1.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//protobuf-java-2.5.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-math3-3.1.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//commons-net-3.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//servlet-api-2.5.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//curator-client-2.7.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//slf4j-api-1.7.10.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//curator-framework-2.7.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//slf4j-log4j12-1.7.10.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//curator-recipes-2.7.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//gson-2.2.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//guava-11.0.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//snappy-java-1.0.4.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//hamcrest-core-1.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jets3t-0.9.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//htrace-core-3.1.0-incubating.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//httpclient-4.5.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//httpcore-4.4.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jettison-1.1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-annotations-2.2.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//stax-api-1.0-2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-core-2.2.3.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jackson-xc-1.9.13.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//xmlenc-0.52.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//java-xmlbuilder-0.4.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jaxb-api-2.2.2.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//zookeeper-3.4.6.2.6.4.0-91.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jaxb-impl-2.2.3-1.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//xz-1.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//jcip-annotations-1.0.jar:/usr/hdp/2.6.4.0-91/hadoop/lib//:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-2.7.3.2.6.4.0-91-tests.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-nfs-2.
7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-nfs.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs-tests.jar:/usr/hdp/current/hadoop-hdfs-client//hadoop-hdfs.jar:/usr/hdp/current/hadoop-hdfs-client//:/usr/hdp/current/hadoop-hdfs-client/lib//asm-3.2.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-cli-1.2.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-codec-1.4.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-daemon-1.0.13.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-io-2.4.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-lang-2.6.jar:/usr/hdp/current/hadoop-hdfs-client/lib//commons-logging-1.1.3.jar:/usr/hdp/current/hadoop-hdfs-client/lib//guava-11.0.2.jar:/usr/hdp/current/hadoop-hdfs-client/lib//htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-annotations-2.2.3.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-core-2.2.3.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-core-asl-1.9.13.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-databind-2.2.3.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jackson-mapper-asl-1.9.13.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jersey-core-1.9.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jersey-server-1.9.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jetty-6.1.26.hwx.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jetty-util-6.1.26.hwx.jar:/usr/hdp/current/hadoop-hdfs-client/lib//jsr305-3.0.0.jar:/usr/hdp/current/hadoop-hdfs-client/lib//leveldbjni-all-1.8.jar:/usr/hdp/current/hadoop-hdfs-client/lib//log4j-1.2.17.jar:/usr/hdp/current/hadoop-hdfs-client/lib//netty-3.6.2.Final.jar:/usr/hdp/current/hadoop-hdfs-client/lib//netty-all-4.0.52.Final.jar:/usr/hdp/current/hadoop-hdfs-client/lib//okhttp-2.4.0.jar:/usr/hdp/current/hadoop-hdfs-client/lib//okio-1.4.0.jar:/usr/hdp/current/hadoop-hdfs-client/lib//protobuf-java-2.5.0.jar:/usr/hdp/current/hadoop-hdfs-client/lib//servlet-api-2.5.jar:/usr/hdp/current/hadoop-hdfs-client/lib//xercesImpl-2.9.1.jar:
/usr/hdp/current/hadoop-hdfs-client/lib//xml-apis-1.3.04.jar:/usr/hdp/current/hadoop-hdfs-client/lib//xmlenc-0.52.jar:/usr/hdp/current/hadoop-hdfs-client/lib//:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-api-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-api.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-applications-distributedshell-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-applications-distributedshell.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-applications-unmanaged-am-launcher-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-applications-unmanaged-am-launcher.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-client-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-client.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-common-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-common.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-registry-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-registry.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-applicationhistoryservice-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-applicationhistoryservice.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-common-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-common.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-nodemanager-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-nodemanager.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-resourcemanager-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-resourcemanager.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-sharedcachemanager-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-sharedcachemanager.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-tests-2.7.3.2.6.4
.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-tests.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-timeline-pluginstorage-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-timeline-pluginstorage.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-web-proxy-2.7.3.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client//hadoop-yarn-server-web-proxy.jar:/usr/hdp/current/hadoop-yarn-client//:/usr/hdp/current/hadoop-yarn-client/lib//activation-1.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//aopalliance-1.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-guice-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//apacheds-i18n-2.0.0-M15.jar:/usr/hdp/current/hadoop-yarn-client/lib//javassist-3.18.1-GA.jar:/usr/hdp/current/hadoop-yarn-client/lib//apacheds-kerberos-codec-2.0.0-M15.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-json-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//api-asn1-api-1.0.0-M20.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-server-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//api-util-1.0.0-M20.jar:/usr/hdp/current/hadoop-yarn-client/lib//asm-3.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//avro-1.7.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//javax.inject-1.jar:/usr/hdp/current/hadoop-yarn-client/lib//azure-keyvault-core-0.8.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jets3t-0.9.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//azure-storage-5.4.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//log4j-1.2.17.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-beanutils-1.7.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jaxb-api-2.2.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-beanutils-core-1.8.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-cli-1.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//leveldbjni-all-1.8.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-codec-1.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//jaxb-impl-2.2.3-1.jar:/usr/hdp/current/ha
doop-yarn-client/lib//commons-collections-3.2.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//metrics-core-3.0.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-compress-1.4.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//json-smart-1.1.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-configuration-1.6.jar:/usr/hdp/current/hadoop-yarn-client/lib//netty-3.6.2.Final.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-digester-1.8.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-io-2.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-lang-2.6.jar:/usr/hdp/current/hadoop-yarn-client/lib//nimbus-jose-jwt-3.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-lang3-3.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//objenesis-2.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-logging-1.1.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//paranamer-2.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-math3-3.1.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//commons-net-3.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//protobuf-java-2.5.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//curator-client-2.7.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//servlet-api-2.5.jar:/usr/hdp/current/hadoop-yarn-client/lib//curator-framework-2.7.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//snappy-java-1.0.4.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//curator-recipes-2.7.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//fst-2.24.jar:/usr/hdp/current/hadoop-yarn-client/lib//gson-2.2.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//guava-11.0.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//guice-3.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//stax-api-1.0-2.jar:/usr/hdp/current/hadoop-yarn-client/lib//guice-servlet-3.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jcip-annotations-1.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//htrace-core-3.1.0-incubating.jar:/usr/hdp/current/hadoop-yarn-client/lib//httpclient-4.5.2.jar:/usr/hdp/current/hadoop-yarn-client/lib//httpcore-4.
4.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-client-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-annotations-2.2.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//xmlenc-0.52.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-core-2.2.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//xz-1.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-core-asl-1.9.13.jar:/usr/hdp/current/hadoop-yarn-client/lib//zookeeper-3.4.6.2.6.4.0-91.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-databind-2.2.3.jar:/usr/hdp/current/hadoop-yarn-client/lib//zookeeper-3.4.6.2.6.4.0-91-tests.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-jaxrs-1.9.13.jar:/usr/hdp/current/hadoop-yarn-client/lib//jersey-core-1.9.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-mapper-asl-1.9.13.jar:/usr/hdp/current/hadoop-yarn-client/lib//jackson-xc-1.9.13.jar:/usr/hdp/current/hadoop-yarn-client/lib//java-xmlbuilder-0.4.jar:/usr/hdp/current/hadoop-yarn-client/lib//jettison-1.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//jetty-6.1.26.hwx.jar:/usr/hdp/current/hadoop-yarn-client/lib//jsp-api-2.1.jar:/usr/hdp/current/hadoop-yarn-client/lib//jetty-sslengine-6.1.26.hwx.jar:/usr/hdp/current/hadoop-yarn-client/lib//jetty-util-6.1.26.hwx.jar:/usr/hdp/current/hadoop-yarn-client/lib//jsch-0.1.54.jar:/usr/hdp/current/hadoop-yarn-client/lib//jsr305-3.0.0.jar:/usr/hdp/current/hadoop-yarn-client/lib//:/usr/hdp/current/ext/hadoop//" export DMLC_NUM_WORKER="1" export DMLC_JOB_CLUSTER="yarn" export HADOOP_TOKEN_FILE_LOCATION="/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/container_tokens" export NM_AUX_SERVICE_spark_shuffle="" export LOCAL_USER_DIRS="/hadoop/yarn/local/usercache/root/" export HADOOP_HOME="/usr/hdp/2.6.4.0-91/hadoop" export DMLC_TASK_ID="0" export HOME="/home/" export NM_AUX_SERVICE_spark2_shuffle="" export CONTAINER_ID="container_e01_1517883514475_0588_01_000002" export MALLOC_ARENA_MAX="4" echo "Setting 
up job resources" ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/12/dmlc-yarn.jar" "dmlc-yarn.jar" ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/10/launcher.py" "launcher.py" ln -sf "/hadoop/yarn/local/usercache/root/appcache/application_1517883514475_0588/filecache/11/analyzeyarn.py" "analyzeyarn.py" echo "Copying debugging information" # Creating copy of launch script cp "launch_container.sh" "/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/launch_container.sh" chmod 640 "/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/launch_container.sh" # Determining directory contents echo "ls -l:" 1>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/directory.info" ls -l 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/directory.info" echo "find -L . -maxdepth 5 -ls:" 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/directory.info" find -L . -maxdepth 5 -ls 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/directory.info" echo "broken symlinks(find -L . -maxdepth 5 -type l -ls):" 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/directory.info" find -L . 
-maxdepth 5 -type l -ls 1>>"/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/directory.info" echo "Launching container" exec /bin/bash -c "./launcher.py ./analyzeyarn.py 1>/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/stdout 2>/hadoop/yarn/log/application_1517883514475_0588/container_e01_1517883514475_0588_01_000002/stderr" End of LogType:launch_container.sh ************************************************************************************ Container: container_e01_1517883514475_0588_01_000002 on princeton0.field.hortonworks.com_45454 LogAggregationType: AGGREGATED =============================================================================================== LogType:stderr LogLastModifiedTime:Sun Feb 25 02:40:17 +0000 2018 LogLength:393 LogContents: [02:40:17] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v0.8.0. Attempting to upgrade... [02:40:17] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded! /usr/lib/python2.7/site-packages/mxnet/module/base_module.py:65: UserWarning: Data provided by label_shapes don't match names specified by label_names ([] vs. 
['softmax_label']) warnings.warn(msg) End of LogType:stderr *********************************************************************** [root@princeton0 demo]# cat logs/mxnet.log 2018-02-25 01:49:11,667 INFO start listen on 172.26.200.216:9091 2018-02-25 01:54:05,445 INFO start listen on 172.26.200.216:9091 2018-02-25 02:17:52,685 INFO start listen on 172.26.200.216:9091 2018-02-25 02:29:46,873 INFO start listen on 172.26.200.216:9091 2018-02-25 02:29:48,076 DEBUG Submit job with 1 workers and 0 servers 2018-02-25 02:29:48,078 DEBUG java -cp /usr/hdp/2.6.4.0-91/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop/lib/*:/usr/hdp/2.6.4.0-91/hadoop/.//*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/./:/usr/hdp/2.6.4.0-91/hadoop-hdfs/lib/*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/.//*:/usr/hdp/2.6.4.0-91/hadoop-yarn/lib/*:/usr/hdp/2.6.4.0-91/hadoop-yarn/.//*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/lib/*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/.//*::mysql-connector-java.jar:/usr/hdp/2.6.4.0-91/tez/*:/usr/hdp/2.6.4.0-91/tez/lib/*:/usr/hdp/2.6.4.0-91/tez/conf:/opt/demo/dmlc-core/tracker/dmlc_tracker/../yarn/dmlc-yarn.jar org.apache.hadoop.yarn.dmlc.Client -file /opt/demo/dmlc-core/tracker/dmlc_tracker/../yarn/dmlc-yarn.jar -file /opt/demo/dmlc-core/tracker/dmlc_tracker/launcher.py -file /opt/demo/incubator-mxnet/analyzeyarn.py -jobname DMLC[nworker=1]:analyzeyarn.py -tempdir /tmp -queue default ./launcher.py ./analyzeyarn.py 2018-02-25 02:33:24,463 INFO start listen on 172.26.200.216:9091 2018-02-25 02:33:25,633 DEBUG Submit job with 1 workers and 0 servers 2018-02-25 02:33:25,634 DEBUG java -cp 
/usr/hdp/2.6.4.0-91/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop/lib/*:/usr/hdp/2.6.4.0-91/hadoop/.//*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/./:/usr/hdp/2.6.4.0-91/hadoop-hdfs/lib/*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/.//*:/usr/hdp/2.6.4.0-91/hadoop-yarn/lib/*:/usr/hdp/2.6.4.0-91/hadoop-yarn/.//*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/lib/*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/.//*::mysql-connector-java.jar:/usr/hdp/2.6.4.0-91/tez/*:/usr/hdp/2.6.4.0-91/tez/lib/*:/usr/hdp/2.6.4.0-91/tez/conf:/opt/demo/dmlc-core/tracker/dmlc_tracker/../yarn/dmlc-yarn.jar org.apache.hadoop.yarn.dmlc.Client -file /opt/demo/dmlc-core/tracker/dmlc_tracker/../yarn/dmlc-yarn.jar -file /opt/demo/dmlc-core/tracker/dmlc_tracker/launcher.py -file /opt/demo/incubator-mxnet/analyzeyarn.py -jobname DMLC[nworker=1]:analyzeyarn.py -tempdir /tmp -queue default ./launcher.py ./analyzeyarn.py 2018-02-25 02:40:00,993 INFO start listen on 172.26.200.216:9091 2018-02-25 02:40:02,067 DEBUG Submit job with 1 workers and 0 servers 2018-02-25 02:40:02,068 DEBUG java -cp /usr/hdp/2.6.4.0-91/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop/lib/*:/usr/hdp/2.6.4.0-91/hadoop/.//*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/./:/usr/hdp/2.6.4.0-91/hadoop-hdfs/lib/*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/.//*:/usr/hdp/2.6.4.0-91/hadoop-yarn/lib/*:/usr/hdp/2.6.4.0-91/hadoop-yarn/.//*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/lib/*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/.//*::mysql-connector-java.jar:/usr/hdp/2.6.4.0-91/tez/*:/usr/hdp/2.6.4.0-91/tez/lib/*:/usr/hdp/2.6.4.0-91/tez/conf:/opt/demo/dmlc-core/tracker/dmlc_tracker/../yarn/dmlc-yarn.jar org.apache.hadoop.yarn.dmlc.Client -file /opt/demo/dmlc-core/tracker/dmlc_tracker/../yarn/dmlc-yarn.jar -file /opt/demo/dmlc-core/tracker/dmlc_tracker/launcher.py -file /opt/demo/incubator-mxnet/analyzeyarn.py -jobname DMLC[nworker=1]:analyzeyarn.py -tempdir /tmp -queue default ./launcher.py ./analyzeyarn.py The Run Logs 2018-02-26 18:51:04,613 INFO start listen on 172.26.200.216:9091 2018-02-26 18:51:51,143 INFO start listen on 
172.26.200.216:9091 2018-02-26 18:51:52,336 DEBUG Submit job with 1 workers and 0 servers 2018-02-26 18:51:52,337 DEBUG /usr/jdk64/jdk1.8.0_112/bin/java -cp /usr/hdp/2.6.4.0-91/hadoop/conf:/usr/hdp/2.6.4.0-91/hadoop/lib/*:/usr/hdp/2.6.4.0-91/hadoop/.//*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/./:/usr/hdp/2.6.4.0-91/hadoop-hdfs/lib/*:/usr/hdp/2.6.4.0-91/hadoop-hdfs/.//*:/usr/hdp/2.6.4.0-91/hadoop-yarn/lib/*:/usr/hdp/2.6.4.0-91/hadoop-yarn/.//*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/lib/*:/usr/hdp/2.6.4.0-91/hadoop-mapreduce/.//*::mysql-connector-java.jar:/usr/hdp/2.6.4.0-91/tez/*:/usr/hdp/2.6.4.0-91/tez/lib/*:/usr/hdp/2.6.4.0-91/tez/conf:/opt/demo/dmlc-core/tracker/dmlc_tracker/../yarn/dmlc-yarn.jar org.apache.hadoop.yarn.dmlc.Client -file /opt/demo/dmlc-core/tracker/dmlc_tracker/../yarn/dmlc-yarn.jar -file /opt/demo/dmlc-core/tracker/dmlc_tracker/launcher.py -file /opt/demo/incubator-mxnet/analyzeyarn.py -jobname DMLC[nworker=1]:analyzeyarn.py -tempdir /tmp -queue default ./launcher.py ./analyzeyarn.py

(Image: YARN UI screens showing the application run results)

The Python script below will look familiar, as it's the one we have been using throughout this series. To make it easier to run in YARN, I merged the external Inception predict functions and some Pydoop calls that write the results to HDFS into one single Python script.

The example output in HDFS:

hdfs dfs -ls /mxnetyarn
Found 5 items
-rw-r--r-- 3 root hdfs 416 2018-02-26 18:45 /mxnetyarn/mxnet_uuid_json_20180226184514.json
-rw-r--r-- 3 root hdfs 417 2018-02-26 18:46 /mxnetyarn/mxnet_uuid_json_20180226184643.json
-rw-r--r-- 3 root hdfs 417 2018-02-26 18:47 /mxnetyarn/mxnet_uuid_json_20180226184707.json
-rw-r--r-- 3 yarn hdfs 417 2018-02-26 18:52 /mxnetyarn/mxnet_uuid_json_20180226185209.json
-rw-r--r-- 3 yarn hdfs 417 2018-02-26 22:08 /mxnetyarn/mxnet_uuid_json_20180226220806.json

We could ingest these HDFS files with Apache NiFi and convert them to Hive tables, or you could do that directly with Apache Hive.
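If you take the Apache Hive route, one option is to reshape each result record into one row per prediction rank before loading it. The sketch below is an illustration, not part of the original project: `flatten_result` and `flatten_file` are hypothetical helper names, the tab-separated (uuid, rank, wnid, label, pct) layout is an assumption about how you might model the table, and the sample line is the container stdout from the YARN log above.

```python
import json

# Sample line copied from the container stdout in the YARN log above.
SAMPLE = (
    '{"top1pct": "67.6", "top5": "n03485794 handkerchief, hankie, hanky, hankey", '
    '"top4": "n04590129 window shade", "top3": "n03938244 pillow", '
    '"top2": "n04589890 window screen", "top1": "n02883205 bow tie, bow-tie, bowtie", '
    '"top2pct": "11.5", "imagefilename": "/opt/demo/incubator-mxnet/nanotie7.png", '
    '"top3pct": "4.5", "uuid": "mxnet_uuid_img_20180225024017", "top4pct": "2.8", '
    '"top5pct": "2.8", "runtime": "5.0"}'
)


def flatten_result(line, n=5):
    """Turn one result JSON line into (uuid, rank, wnid, label, pct) rows."""
    record = json.loads(line)
    rows = []
    for rank in range(1, n + 1):
        synset = record.get("top%d" % rank)
        if synset is None:
            continue  # fewer than n predictions in this record
        # Each synset string looks like "n02883205 bow tie, bow-tie, bowtie".
        wnid, _, label = synset.partition(" ")
        pct = float(record["top%dpct" % rank])
        rows.append((record["uuid"], rank, wnid, label, pct))
    return rows


def flatten_file(lines):
    """Flatten every non-empty JSON line, e.g. the lines of one HDFS output file."""
    rows = []
    for line in lines:
        if line.strip():
            rows.extend(flatten_result(line))
    return rows


if __name__ == "__main__":
    # Tab-separated output, suitable for a tab-delimited text table.
    for row in flatten_result(SAMPLE):
        print("\t".join(str(v) for v in row))
```

Keeping one row per rank makes queries such as "all images where a given label appears anywhere in the top 5" simple; keeping the original flat JSON is just as valid if you only ever look at the top prediction.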
Source Code:
https://github.com/tspannhw/nifi-mxnet-yarn
https://github.com/tspannhw/ApacheBigData101

Python Script

#!/bin/python
# Fork of my previous scripts, which were forked from the Apache MXNet examples
# https://github.com/tspannhw/mxnet_rpi/blob/master/analyze.py
import json
import time
import urllib  # Python 2 API (the cluster runs Python 2.7)
from collections import namedtuple
from time import gmtime, strftime

import cv2
import mxnet as mx
import numpy as np
import pydoop.hdfs as hdfs

Batch = namedtuple('Batch', ['data'])

# Load the synset labels for the network
with open('/opt/demo/incubator-mxnet/synset.txt', 'r') as f:
    synsets = [l.rstrip() for l in f]

# Load the network parameters
sym, arg_params, aux_params = mx.model.load_checkpoint('/opt/demo/incubator-mxnet/Inception-BN', 0)

# Load the network into an MXNet module and bind the corresponding parameters
mod = mx.mod.Module(symbol=sym, context=mx.cpu())
mod.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
mod.set_params(arg_params, aux_params)


def predict(filename, mod, synsets, N=5):
    '''
    Predict objects by giving the model a pointer to an image file and
    running a forward pass through the model.

    inputs:
      filename = image file (JPEG, PNG, ...) to classify objects in
      mod = the module object representing the loaded model
      synsets = the list of symbols representing the model
      N = optional parameter denoting how many predictions to return (default is top 5)

    outputs:
      python list of top N predicted objects and corresponding probabilities,
      or None if the image cannot be read
    '''
    img = cv2.imread(filename)
    # cv2.imread returns None for unreadable files, so check before cvtColor
    if img is None:
        return None
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224))
    # HWC -> CHW, then add a batch axis: shape (1, 3, 224, 224)
    img = np.swapaxes(img, 0, 2)
    img = np.swapaxes(img, 1, 2)
    img = img[np.newaxis, :]
    mod.forward(Batch([mx.nd.array(img)]))
    prob = np.squeeze(mod.get_outputs()[0].asnumpy())
    # Indices of the N highest probabilities, best first
    a = np.argsort(prob)[::-1]
    return [(prob[i], synsets[i]) for i in a[0:N]]


# Download an image from the internet and run a prediction on it
def predict_from_url(url, N=5):
    filename = url.split("/")[-1]
    urllib.urlretrieve(url, filename)
    img = cv2.imread(filename)
    if img is None:
        print("Failed to download")
        return None
    return predict(filename, mod, synsets, N)


# Predict on a local file
def predict_from_local_file(filename, N=5):
    return predict(filename, mod, synsets, N)


start = time.time()

# Create a unique output file name
uniqueid = 'mxnet_uuid_{0}_{1}.json'.format('json', strftime("%Y%m%d%H%M%S", gmtime()))
filename = '/opt/demo/incubator-mxnet/nanotie7.png'
topn = []

# Run the Inception prediction on the image
try:
    topn = predict_from_local_file(filename, N=5)
except Exception:
    print("Error")
    errorcondition = "true"

try:
    # Top 5 MXNet predictions, probabilities reported as percentages
    top1 = str(topn[0][1])
    top1pct = str(round(topn[0][0], 3) * 100)
    top2 = str(topn[1][1])
    top2pct = str(round(topn[1][0], 3) * 100)
    top3 = str(topn[2][1])
    top3pct = str(round(topn[2][0], 3) * 100)
    top4 = str(topn[3][1])
    top4pct = str(round(topn[3][0], 3) * 100)
    top5 = str(topn[4][1])
    top5pct = str(round(topn[4][0], 3) * 100)
    end = time.time()
    row = {'uuid': uniqueid,
           'top1pct': top1pct, 'top1': top1,
           'top2pct': top2pct, 'top2': top2,
           'top3pct': top3pct, 'top3': top3,
           'top4pct': top4pct, 'top4': top4,
           'top5pct': top5pct, 'top5': top5,
           'imagefilename': filename,
           'runtime': str(round(end - start))}
    json_string = json.dumps(row)
    print(json_string)
    # The handle returned here is never used: hdfs.dump() below resolves the
    # scheme-less path against the default filesystem in the Hadoop config.
    # (50090 is normally the Secondary NameNode HTTP port, not the NameNode RPC port.)
    hdfs.hdfs(host="princeton0.field.hortonworks.com", port=50090, user="root")
    hdfs.dump(json_string + "\n", "/mxnetyarn/" + uniqueid, mode="at")
    # Also append the result to a local log file
    with open("/opt/demo/logs/mxnetyarn.log", "a") as fh:
        fh.write('{0}\n'.format(json_string))
except Exception:
    print("{\"message\": \"Failed to run\"}")

We are running with this image, which is a pretty standard picture of a cat wearing a necktie and holding a Raspberry Pi with a Rainbow HAT on it. See: https://raw.githubusercontent.com/tspannhw/ApacheBigData101/master/nanotie7.png

References:
https://github.com/dmlc/dmlc-core/tree/master/tracker/yarn
https://community.hortonworks.com/articles/174227/apache-deep-learning-101-using-apache-mxnet-on-an.html
https://www.slideshare.net/AmazonWebServices/deep-learning-for-developers-86885654
https://mxnet.incubator.apache.org/tutorials/embedded/wine_detector.html
https://github.com/apache/incubator-mxnet/tree/master/example/image-classification
https://mxnet.incubator.apache.org/how_to/cloud.html
http://dmlc-core.readthedocs.io/en/latest/
https://community.hortonworks.com/articles/42995/yarn-application-monitoring-with-nifi.html
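The pre- and post-processing inside predict() is easy to misread, so here is a numpy-only sketch (no MXNet or OpenCV needed) that isolates those steps. `to_nchw`, `top_n` and `to_row` are hypothetical helper names, not part of the script; `to_nchw` uses a single np.transpose, which produces the same array as the two np.swapaxes calls, and the one-decimal percentage formatting in `to_row` is my substitution for the script's `str(round(p, 3) * 100)`.

```python
import numpy as np


def to_nchw(img_hwc):
    """Reorder an (H, W, C) image into a (1, C, H, W) batch.

    Same result as np.swapaxes(img, 0, 2), np.swapaxes(img, 1, 2) and
    img[np.newaxis, :] in predict().
    """
    return np.transpose(img_hwc, (2, 0, 1))[np.newaxis, :]


def top_n(prob, synsets, n=5):
    """Return [(probability, synset), ...] for the n highest-scoring classes."""
    prob = np.squeeze(prob)
    order = np.argsort(prob)[::-1][:n]
    return [(float(prob[i]), synsets[i]) for i in order]


def to_row(top, uuid, filename, runtime):
    """Build the flat result dict with a loop instead of top1..top5 variables."""
    row = {"uuid": uuid, "imagefilename": filename, "runtime": str(runtime)}
    for rank, (p, label) in enumerate(top, start=1):
        row["top%d" % rank] = str(label)
        # "%.1f" avoids float noise: on Python 3, str(round(0.676, 3) * 100)
        # prints 67.60000000000001.
        row["top%dpct" % rank] = "%.1f" % (p * 100)
    return row
```

Looping over the ranks keeps the field names identical to the script's JSON while making N configurable in one place.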

TimothySpann · ‎02-23-2018

This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday April 19, 2018 at 11:50AM Berlin time.

This is for running Apache MXNet on an HDF 3.1 node with Centos 7.

Let's get this installed!

git clone https://github.com/apache/incubator-mxnet.git

The installation instructions at Apache MXNet's website (http://mxnet.incubator.apache.org/install/index.html) are amazing. Pick your platform and your style. I am doing this the simplest way, on the Linux path.

We need to install OpenCV to handle images in Python, so we install that along with all the build tools that OpenCV and Apache MXNet require.

HDF 3.1 / Centos 7 Setup

sudo yum groupinstall 'Development Tools' -y
sudo yum install cmake git pkgconfig -y
sudo yum install libpng-devel libjpeg-turbo-devel jasper-devel openexr-devel libtiff-devel libwebp-devel -y
sudo yum install libdc1394-devel libv4l-devel gstreamer-plugins-base-devel -y
sudo yum install gtk2-devel -y
sudo yum install tbb-devel eigen3-devel -y
pip install numpy
cd ~
git clone https://github.com/Itseez/opencv.git
cd opencv
git checkout 3.1.0
cd ~
git clone https://github.com/Itseez/opencv_contrib.git
cd opencv_contrib
git checkout 3.1.0
cd ~/opencv
mkdir build
cd build
cmake -D CMAKE_BUILD_TYPE=RELEASE -D CMAKE_INSTALL_PREFIX=/usr/local -D OPENCV_EXTRA_MODULES_PATH=~/opencv_contrib/modules -D INSTALL_C_EXAMPLES=OFF -D INSTALL_PYTHON_EXAMPLES=ON -D BUILD_EXAMPLES=ON -D BUILD_opencv_python2=ON ..
sudo make
sudo make install
sudo ldconfig

Local Centos 7 Run Script

python -W ignore analyzex.py $1

Python

https://github.com/tspannhw/ApacheBigData101/blob/master/analyzex.py

See Part 1: https://community.hortonworks.com/articles/171960/using-apache-mxnet-on-an-apache-nifi-15-instance-w.html

Apache NiFi Flow

This first flow retrieves images from the picsum.photos API, stores them locally and then runs some basic processing.
The first branch extracts all the metadata we can. The second branch calls our example Inception Apache MXNet Python script for image recognition. The script returns a JSON file that we process with the same code used by the local version of this program. Once we funnel that out of our process group, we send it to the MXNet processing group, which converts the JSON to Apache AVRO and then to Apache ORC for storage in HDFS, to be used as an external Apache Hive table.

Figure: Our schema hosted in Hortonworks Schema Registry
Figure: Examining the picture with ExtractMedia...
Figure: To execute Apache MXNet installed on the HDF node
Figure: An example image loaded from the API
Figure: Exploring the data with Apache Hive SQL in Apache Zeppelin on HDP 2.6.4
Figure: An example Unsplash image (Apache MXNet caption: Lakeshore)

Resources:
https://github.com/tspannhw/ApacheDeepLearning101

Images REST API provided by PicSum (Digital Ocean + Beluga CDN):
https://picsum.photos/600/800/?random
https://picsum.photos/
https://belugacdn.com/?ref=picsum.photos
https://www.digitalocean.com/?ref=picsum.photos

Images provided by Unsplash:
https://unsplash.com/
https://unsplash.com/license
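Converting the script's JSON to AVRO requires a schema (here kept in Hortonworks Schema Registry). As a rough illustration of what such a schema can look like, assuming the script prints a flat JSON object whose values are all strings (as the YARN script earlier on this page does with uuid, top1, top1pct, imagefilename and runtime), the hypothetical helper below derives an Avro record schema from one result row. It is not the author's registered schema, and the record name mxnet_result is made up.

```python
import json


def avro_schema_for(row, name="mxnet_result"):
    """Build a flat Avro record schema from one JSON result row.

    Every field becomes a nullable string, because the script writes all of
    its values (including percentages and runtime) as strings. Fields are
    sorted so the schema is stable from run to run.
    """
    fields = [
        # Avro requires a union's default to match its first branch, so "null" comes first
        {"name": key, "type": ["null", "string"], "default": None}
        for key in sorted(row)
    ]
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    # Hypothetical sample row; real field names depend on the script you run
    sample = {"uuid": "example-id", "top1": "example label", "top1pct": "42.0",
              "imagefilename": "example.jpg", "runtime": "2"}
    print(json.dumps(avro_schema_for(sample), indent=2))
```

Making every field a nullable string keeps ConvertRecord from failing on rows where a value is missing; you could tighten the types (for example, double for the percentages) once the script emits numbers instead of strings.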

TimothySpann · ‎02-22-2018

@balalaika OK, with no code changes, here is the Structured Streaming version. Structured Streaming only became GA in Spark 2.2, which ships with HDP 2.6.4 and above. It works fine and is a little different from the old style. They will probably keep the old one until 2.5 or maybe 3.0. Both styles are nice. Another option is to use Apache Beam or Streaming Analytics Manager. https://community.hortonworks.com/content/kbentry/174105/hdp-264-hdf-31-apache-spark-structured-streaming-i.html

TimothySpann · ‎02-21-2018

Apache Spark 2.2.0 with Scala 2.11.8 with Java 1.8.0_112 on HDP 2.6.4 called from HDF 3.1 with Apache NiFi 1.5. This is a follow up to: https://community.hortonworks.com/articles/173818/hdp-264-hdf-31-apache-spark-streaming-integration.html and https://community.hortonworks.com/articles/155326/monitoring-energy-usage-utilizing-apache-nifi-pyth.html

We are using the same Apache NiFi flow to send messages to Apache Kafka. What is nice is that you could have the Structured Streaming version, the non-structured version and other clients all listening to the same topic and the same messages sent by Apache NiFi.

When we start, there is no data yet. We quickly get a ton of data.

The Kafka defaults assume a 3-node cluster, for which a replication factor of 3 is fine. I have one node, so I had to change this. There were tons of warnings in the /usr/hdf/current/kafka-broker/logs directory.

The simplest Apache Spark client is one run in the shell:

/usr/hdp/current/spark2-client/bin/spark-shell --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.0,org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0

The code is a simple fork of code from this excellent, highly recommended tutorial: https://github.com/jaceklaskowski/spark-structured-streaming-book/blob/master/spark-sql-streaming-KafkaSource.adoc

val records = spark.
  readStream.
  format("kafka").
  option("subscribe", "smartPlug2").
  option("kafka.bootstrap.servers", "mykafkabroker:6667").load
records.printSchema

// Kafka delivers key and value as binary; cast them to strings
val result = records.
  select(
    $"key" cast "string",
    $"value" cast "string",
    $"topic",
    $"partition",
    $"offset")

import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import scala.concurrent.duration._

// Print each micro-batch to the console every 10 seconds
val sq = result.
  writeStream.
  format("console").
  option("truncate", false).
  trigger(Trigger.ProcessingTime(10.seconds)).
  outputMode(OutputMode.Append).
  queryName("scalastrstrclient").
  start

sq.status

If you are submitting this job instead of running it in a shell, add the following at the end so the driver waits for the streaming query rather than exiting. A submitted job also needs import spark.implicits._ for the $"..." column syntax, which spark-shell imports for you.

// Block until the streaming query terminates
sq.awaitTermination

Example Run

Spark context Web UI available at http://myipiscool:4045
Spark context available as 'sc' (master = local[*], app id = local-1519248053841).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0.2.6.4.0-91
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.

scala> val records = spark.
     |   readStream.
     |   format("kafka").
     |   option("subscribe", "smartPlug2").
     |   option("kafka.bootstrap.servers", "server:6667").load
records: org.apache.spark.sql.DataFrame = [key: binary, value: binary ... 5 more fields]

scala> records.printSchema
root
 |-- key: binary (nullable = true)
 |-- value: binary (nullable = true)
 |-- topic: string (nullable = true)
 |-- partition: integer (nullable = true)
 |-- offset: long (nullable = true)
 |-- timestamp: timestamp (nullable = true)
 |-- timestampType: integer (nullable = true)

scala> val result = records.
     |   select(
     |     $"key" cast "string",
     |     $"value" cast "string",
     |     $"topic",
     |     $"partition",
     |     $"offset")
result: org.apache.spark.sql.DataFrame = [key: string, value: string ... 3 more fields]

scala> import org.apache.spark.sql.streaming.{OutputMode, Trigger}
import org.apache.spark.sql.streaming.{OutputMode, Trigger}

scala> import scala.concurrent.duration._
import scala.concurrent.duration._

scala> val sq = result.
     |   writeStream.
     |   format("console").
     |   option("truncate", false).
     |   trigger(Trigger.ProcessingTime(10.seconds)).
     |   outputMode(OutputMode.Append).
     |   queryName("scalastrstrclient").
     |   start
sq: org.apache.spark.sql.streaming.StreamingQuery = org.apache.spark.sql.execution.streaming.StreamingQueryWrapper@3638a852

scala> sq.status
res1: org.apache.spark.sql.streaming.StreamingQueryStatus =
{
  "message" : "Getting offsets from KafkaSource[Subscribe[smartPlug2]]",
  "isDataAvailable" : false,
  "isTriggerActive" : true
}

scala>
-------------------------------------------
Batch: 0
-------------------------------------------
+---+-----+-----+---------+------+
|key|value|topic|partition|offset|
+---+-----+-----+---------+------+
+---+-----+-----+---------+------+

-------------------------------------------
Batch: 1
-------------------------------------------
|key |value |topic |partition|offset|
|02/21/2018 16:22:00|{"day1":1.204,"day2":1.006,"day3":1.257,"day4":1.053,"day5":1.597,"day6":1.642,"day7":1.439,"day8":1.178,"day9":1.259,"day10":0.995,"day11":0.569,"day12":1.287,"day13":1.371,"day14":1.404,"day15":1.588,"day16":1.426,"day17":1.707,"day18":1.153,"day19":1.155,"day20":1.732,"day21":1.333,"day22":1.497,"day23":1.151,"day24":1.227,"day25":1.387,"day26":1.138,"day27":1.204,"day28":1.401,"day29":1.288,"day30":1.439,"day31":0.126,"sw_ver":"1.1.1 Build 160725 Rel.164033","hw_ver":"1.0","mac":"50:C7:BF:B1:95:D5","type":"IOT.SMARTPLUGSWITCH","hwId":"7777","fwId":"777","oemId":"FFF22CFF774A0B89F7624BFC6F50D5DE","dev_name":"Wi-Fi Smart Plug With Energy Monitoring","model":"HS110(US)","deviceId":"777","alias":"Tim Spann's MiniFi Controller SmartPlug - Desk1","icon_hash":"","relay_state":1,"on_time":452287,"active_mode":"schedule","feature":"TIM:ENE","updating":0,"rssi":-33,"led_off":0,"latitude":41,"longitude":-77,"index":18,"zone_str":"(UTC-05:00) Eastern Daylight Time (US & Canada)","tz_str":"EST5EDT,M3.2.0,M11.1.0","dst_offset":60,"month12":null,"current":0.888908,"voltage":118.880856,"power":103.141828,"total":8.19,"time":"02/21/2018 16:22:00","ledon":true,"systemtime":"02/21/2018 16:22:00"}|smartPlug2|0 |14 |

Example JSON Data

|02/21/2018 16:23:58|{"day1":1.204,"day2":1.006,"day3":1.257,"day4":1.053,"day5":1.597,"day6":1.642,"day7":1.439,"day8":1.178,"day9":1.259,"day10":0.995,"day11":0.569,"day12":1.287,"day13":1.371,"day14":1.404,"day15":1.588,"day16":1.426,"day17":1.707,"day18":1.153,"day19":1.155,"day20":1.732,"day21":1.337,"day22":1.497,"day23":1.151,"day24":1.227,"day25":1.387,"day26":1.138,"day27":1.204,"day28":1.401,"day29":1.288,"day30":1.439,"day31":0.126,"sw_ver":"1.1.1 Build 160725 Rel.164033","hw_ver":"1.0","mac":"50:C7:88:95:D5","type":"IOT.SMARTPLUGSWITCH","hwId":"8888","fwId":"6767","oemId":"6767","dev_name":"Wi-Fi Smart Plug With Energy Monitoring","model":"HS110(US)","deviceId":"7676","alias":"Tim Spann's MiniFi Controller SmartPlug - Desk1","icon_hash":"","relay_state":1,"on_time":452404,"active_mode":"schedule","feature":"TIM:ENE","updating":0,"rssi":-33,"led_off":0,"latitude":41.3241234,"longitude":-74.1234234,"index":18,"zone_str":"(UTC-05:00) Eastern Daylight Time (US & Canada)","tz_str":"EST5EDT,M3.2.0,M11.1.0","dst_offset":60,"month12":null,"current":0.932932,"voltage":118.890282,"power":107.826982,"total":8.194,"time":"02/21/2018 16:23:58","ledon":true,"systemtime":"02/21/2018 16:23:58"}|smartPlug2|0 |24

Reference:
https://www.gitbook.com/book/jaceklaskowski/spark-structured-streaming/details
https://github.com/jaceklaskowski/spark-structured-streaming-book/blob/master/spark-sql-streaming-KafkaSource.adoc
https://community.hortonworks.com/articles/91379/spark-structured-streaming-with-nifi-and-kafka-usi.html
https://community.hortonworks.com/articles/173818/hdp-264-hdf-31-apache-spark-streaming-integration.html
https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
https://databricks.com/blog/2017/04/04/real-time-end-to-end-integration-with-apache-kafka-in-apache-sparks-structured-streaming.html
https://github.com/zaratsian/Spark/blob/master/pyspark_structured_stream_kafka.py
https://databricks.com/blog/2017/04/26/processing-data-in-apache-kafka-with-structured-streaming-in-apache-spark-2-2.html
https://mtpatter.github.io/bilao/notebooks/html/01-spark-struct-stream-kafka.html
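The Kafka source hands key and value to Spark as binary columns, which is why the query casts them to strings before printing. A consumer outside Spark has to do the same decoding itself. Below is a minimal standard-library sketch, assuming UTF-8 payloads shaped like the smart plug records shown above; decode_record and energy_summary are hypothetical helpers, not Spark or Kafka APIs. Inside Spark you would more likely parse the value column with from_json and an explicit schema.

```python
import json

# Field names taken from the smart plug sample records shown above
ENERGY_FIELDS = ("time", "voltage", "current", "power", "total")


def decode_record(key, value):
    """Decode raw Kafka key/value bytes as UTF-8 (what `cast "string"` does
    in the Spark query) and parse the JSON payload carried in the value."""
    key_str = key.decode("utf-8") if key is not None else None
    payload = json.loads(value.decode("utf-8"))
    return key_str, payload


def energy_summary(payload):
    """Pick the electrical readings out of a parsed smart plug payload."""
    return {field: payload.get(field) for field in ENERGY_FIELDS}


if __name__ == "__main__":
    raw_key = b"02/21/2018 16:22:00"
    raw_value = (b'{"current":0.888908,"voltage":118.880856,"power":103.141828,'
                 b'"total":8.19,"time":"02/21/2018 16:22:00"}')
    key, payload = decode_record(raw_key, raw_value)
    print(key, energy_summary(payload))
```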

TimothySpann · ‎02-20-2018

Spark Structured Streaming is built on the Spark SQL engine: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html https://jaceklaskowski.gitbooks.io/spark-structured-streaming/spark-structured-streaming.html

TimothySpann · ‎02-17-2018

Version: HDF 3.1, HDP 2.6.4, PySpark 2.2.0, Python 2.7, Apache NiFi 1.5.

I push my power data from a local Apache NiFi 1.5 server over Site-to-Site HTTP to a cloud-hosted HDF 3.1 cluster. This cluster has a Remote Input that passes the information on to a version-controlled Process Group called "Spark-Kafka-Streaming". Once inside, I set a schema name and data type, then push the data to Kafka 1.0 hosted in HDF 3.1.

The PublishKafkaRecord_1.0 settings are super easy. We use the JsonTreeReader and the supplied schema to read the JSON file into records. I chose the JsonRecordSetWriter to push JSON out. I could easily have used Apache Avro, CSV or another format; I chose JSON because it is easy to work with in Apache Spark and good for debug display.

This method and code will work for several versions going forward, but I cannot confirm that it works with previous versions.

This article shows how to connect Apache NiFi with Apache Spark via Kafka using Spark Streaming. The example code is in PySpark.

I run the streaming Spark code two different ways for testing.

The first way is via Apache Zeppelin. You will need to load the Apache Spark Kafka Streaming package into Apache Zeppelin: just add a dependency to the spark2 interpreter and restart the interpreter with the restart button. There is no need to restart Apache Zeppelin or a server.

The other way I run this is as a Spark Submit with a YARN master in cluster mode. As you see here, I also include the Spark Streaming Kafka package.

/usr/hdp/current/spark2-client/bin/spark-submit --master yarn --deploy-mode cluster --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.2.0 kafka_test.py

My example PySpark program is really basic but shows you the integration. It is forked from the standard Spark example.
import sys
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="kafkaTest")
# 5-second micro-batches
ssc = StreamingContext(sc, 5)
print("Connected to spark streaming")

def process(time, rdd):
    print("========= %s =========" % str(time))
    if not rdd.isEmpty():
        # Print the results so they show up in the job's STDOUT (YARN logs)
        print(rdd.count())
        print(rdd.first())

# Receiver-based stream: ZooKeeper quorum, consumer group id, {topic: number of partitions to consume}
kafkaStream = KafkaUtils.createStream(ssc, "server:2181", "pysparkclient1", {"smartPlug": 1})
kafkaStream.pprint()
kafkaStream.foreachRDD(process)
ssc.start()
ssc.awaitTermination()

This program runs every 5 seconds and grabs the Kafka JSON messages as an RDD. If it's not empty, I run a count and get the first row. You can see the application running in the Apache YARN UI. From Apache Ambari we can monitor the data moving through the Kafka broker topics. We can also monitor the Spark job via the URL supplied in the output of the submit. We can see the STDOUT of the submitted Spark job in the YARN logs.

Example PySpark Run

[root@princeton0 demo]# ./submit.sh
Ivy Default Cache set to: /root/.ivy2/cache
The jars for the packages stored in: /root/.ivy2/jars
:: loading settings :: url = jar:file:/usr/hdp/2.6.4.0-91/spark2/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-streaming-kafka-0-8_2.11 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
    found org.apache.spark#spark-streaming-kafka-0-8_2.11;2.2.0 in central
    found org.apache.kafka#kafka_2.11;0.8.2.1 in central
    found org.scala-lang.modules#scala-xml_2.11;1.0.2 in central
    found com.yammer.metrics#metrics-core;2.2.0 in central
    found org.slf4j#slf4j-api;1.7.16 in central
    found org.scala-lang.modules#scala-parser-combinators_2.11;1.0.2 in central
    found com.101tec#zkclient;0.3 in central
    found log4j#log4j;1.2.17 in central
    found org.apache.kafka#kafka-clients;0.8.2.1 in central
    found net.jpountz.lz4#lz4;1.3.0 in central
    found org.xerial.snappy#snappy-java;1.1.2.6 in central
    found org.apache.spark#spark-tags_2.11;2.2.0 in central
    found org.spark-project.spark#unused;1.0.0 in central
:: resolution report :: resolve 3452ms :: artifacts dl 21ms
    :: modules in use:
    com.101tec#zkclient;0.3 from central in [default]
    com.yammer.metrics#metrics-core;2.2.0 from central in [default]
    log4j#log4j;1.2.17 from central in [default]
    net.jpountz.lz4#lz4;1.3.0 from central in [default]
    org.apache.kafka#kafka-clients;0.8.2.1 from central in [default]
    org.apache.kafka#kafka_2.11;0.8.2.1 from central in [default]
    org.apache.spark#spark-streaming-kafka-0-8_2.11;2.2.0 from central in [default]
    org.apache.spark#spark-tags_2.11;2.2.0 from central in [default]
    org.scala-lang.modules#scala-parser-combinators_2.11;1.0.2 from central in [default]
    org.scala-lang.modules#scala-xml_2.11;1.0.2 from central in [default]
    org.slf4j#slf4j-api;1.7.16 from central in [default]
    org.spark-project.spark#unused;1.0.0 from central in [default]
    org.xerial.snappy#snappy-java;1.1.2.6 from central in [default]
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   13  |   2   |   2   |   0   ||   13  |   0   |
    ---------------------------------------------------------------------
:: retrieving :: org.apache.spark#spark-submit-parent
    confs: [default]
    0 artifacts copied, 13 already retrieved (0kB/23ms)
18/02/17 01:03:00 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/02/17 01:03:01 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
18/02/17 01:03:01 INFO RMProxy: Connecting to ResourceManager at princeton0.field.hortonworks.com/172.26.200.216:8050
18/02/17 01:03:01 INFO Client: Requesting a new application from cluster with 1 NodeManagers
18/02/17 01:03:02 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (43008 MB per container)
18/02/17 01:03:02 INFO Client: Will allocate AM container, with 1408 MB memory including 384 MB overhead
18/02/17 01:03:02 INFO Client: Setting up container launch context for our AM
18/02/17 01:03:02 INFO Client: Setting up the launch environment for our AM container
18/02/17 01:03:02 INFO Client: Preparing resources for our AM container
18/02/17 01:03:04 INFO Client: Use hdfs cache file as spark.yarn.archive for HDP, hdfsCacheFile:hdfs://princeton0.field.hortonworks.com:8020/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz
18/02/17 01:03:04 INFO Client: Source and destination file systems are the same. Not copying hdfs://princeton0.field.hortonworks.com:8020/hdp/apps/2.6.4.0-91/spark2/spark2-hdp-yarn-archive.tar.gz
18/02/17 01:03:04 INFO Client: Uploading resource file:/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka-0-8_2.11-2.2.0.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.apache.spark_spark-streaming-kafka-0-8_2.11-2.2.0.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/org.apache.kafka_kafka_2.11-0.8.2.1.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.apache.kafka_kafka_2.11-0.8.2.1.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.apache.spark_spark-tags_2.11-2.2.0.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.spark-project.spark_unused-1.0.0.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/com.yammer.metrics_metrics-core-2.2.0.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/com.yammer.metrics_metrics-core-2.2.0.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/org.scala-lang.modules_scala-parser-combinators_2.11-1.0.2.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.scala-lang.modules_scala-parser-combinators_2.11-1.0.2.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/com.101tec_zkclient-0.3.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/com.101tec_zkclient-0.3.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/org.apache.kafka_kafka-clients-0.8.2.1.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.apache.kafka_kafka-clients-0.8.2.1.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.slf4j_slf4j-api-1.7.16.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/log4j_log4j-1.2.17.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/log4j_log4j-1.2.17.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/net.jpountz.lz4_lz4-1.3.0.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/root/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/org.xerial.snappy_snappy-java-1.1.2.6.jar
18/02/17 01:03:05 INFO Client: Uploading resource file:/opt/demo/kafka_test.py -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/kafka_test.py
18/02/17 01:03:05 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/pyspark.zip -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/pyspark.zip
18/02/17 01:03:06 INFO Client: Uploading resource file:/usr/hdp/current/spark2-client/python/lib/py4j-0.10.4-src.zip -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/py4j-0.10.4-src.zip
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.apache.spark_spark-streaming-kafka-0-8_2.11-2.2.0.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.apache.kafka_kafka_2.11-0.8.2.1.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.apache.spark_spark-tags_2.11-2.2.0.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.spark-project.spark_unused-1.0.0.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.scala-lang.modules_scala-xml_2.11-1.0.2.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/com.yammer.metrics_metrics-core-2.2.0.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.scala-lang.modules_scala-parser-combinators_2.11-1.0.2.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/com.101tec_zkclient-0.3.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.apache.kafka_kafka-clients-0.8.2.1.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.slf4j_slf4j-api-1.7.16.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/log4j_log4j-1.2.17.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/net.jpountz.lz4_lz4-1.3.0.jar added multiple times to distributed cache.
18/02/17 01:03:06 WARN Client: Same path resource file:/root/.ivy2/jars/org.xerial.snappy_snappy-java-1.1.2.6.jar added multiple times to distributed cache.
18/02/17 01:03:06 INFO Client: Uploading resource file:/tmp/spark-bc1bedca-6201-4715-812e-cd06f8e6efac/__spark_conf__9099337700911844616.zip -> hdfs://princeton0.field.hortonworks.com:8020/user/root/.sparkStaging/application_1517883514475_0424/__spark_conf__.zip
18/02/17 01:03:06 INFO SecurityManager: Changing view acls to: root
18/02/17 01:03:06 INFO SecurityManager: Changing modify acls to: root
18/02/17 01:03:06 INFO SecurityManager: Changing view acls groups to:
18/02/17 01:03:06 INFO SecurityManager: Changing modify acls groups to:
18/02/17 01:03:06 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
18/02/17 01:03:06 INFO Client: Submitting application application_1517883514475_0424 to ResourceManager
18/02/17 01:03:06 INFO YarnClientImpl: Submitted application application_1517883514475_0424
18/02/17 01:03:07 INFO Client: Application report for application_1517883514475_0424 (state: ACCEPTED)
18/02/17 01:03:07 INFO Client:
     client token: N/A
     diagnostics: AM container is launched, waiting for AM container to Register with RM
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1518829386408
     final status: UNDEFINED
     tracking URL: http://princeton0.field.hortonworks.com:8088/proxy/application_1517883514475_0424/
     user: root

Source

https://github.com/tspannhw/nifi-sparkstreaming-kafka
spark-kafka-streaming.xml

Reference

Using the data from here: https://community.hortonworks.com/articles/155326/monitoring-energy-usage-utilizing-apache-nifi-pyth.html

Example Data

{"day19": 2.035, "day20": 1.191, "day21": 0.637, "day22": 1.497, "day23": 1.151, "day24": 1.227, "day25": 1.387, "day26": 1.138, "day27": 1.204, "day28": 1.401, "day29": 1.288, "day30": 1.439, "day31": 0.126, "day1": 1.204, "day2": 1.006, "day3": 1.257, "day4": 1.053, "day5": 1.597, "day6": 1.642, "day7": 1.439, "day8": 1.178, "day9": 1.259, "day10": 0.995, "day11": 0.569, "day12": 1.287, "day13": 1.371, "day14": 1.404, "day15": 1.588, "day16": 0.474, "day17": 1.438, "day18": 1.056, "sw_ver": "1.1.1 Build 160725 Rel.164033", "hw_ver": "1.0", "mac": "50:C7:BF:B1:95:D5", "type": "IOT.SMARTPLUGSWITCH", "hwId": "60FF6B258734EA6880E186F8C96DDC61", "fwId": "060BFEA28A8CD1E67146EB5B2B599CC8", "oemId": "FFF22CFF774A0B89F7624BFC6F50D5DE", "dev_name": "Wi-Fi Smart Plug With Energy Monitoring", "model": "HS110(US)", "deviceId": "8006ECB1D454C4428953CB2B34D9292D18A6DB0E", "alias": "Tim Spann's MiniFi Controller SmartPlug - Desk1", "icon_hash": "", "relay_state": 1, "on_time": 5778, "active_mode": "schedule", "feature": "TIM:ENE", "updating": 0, "rssi": -35, "led_off": 0, "latitude": 40.268216, "longitude": -74.529088, "index": 18, "zone_str": "(UTC-05:00) Eastern Daylight Time (US & Canada)", "tz_str": "EST5EDT,M3.2.0,M11.1.0", "dst_offset": 60, "month1": 32.674, "month2": 19.323, "current": 0.664822, "voltage": 121.700245, "power": 77.280039, "total": 0.158, "time": "02/16/2018 12:20:08", "ledon": true, "systemtime": "02/16/2018 12:20:08"}
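The process(time, rdd) callback in the PySpark job only counts each micro-batch and peeks at its first record. KafkaUtils.createStream yields (key, value) pairs, with values decoded as UTF-8 strings by default, so the same per-batch logic can be sketched on a plain Python list and tested without a cluster. The function process_batch and the mean_power metric are hypothetical additions for illustration, not part of the original job. Inside the job you might call it from foreachRDD on rdd.collect(), which is only safe when a batch fits comfortably in driver memory.

```python
import json


def process_batch(batch_time, messages):
    """Summarize one micro-batch of (key, value) Kafka messages.

    Mirrors process(time, rdd): empty batches are skipped; otherwise we report
    the record count and the first record, plus the mean 'power' reading.
    """
    if not messages:
        return None
    records = [json.loads(value) for _, value in messages]
    powers = [r["power"] for r in records if "power" in r]
    return {
        "time": str(batch_time),
        "count": len(records),
        "first": records[0],
        "mean_power": sum(powers) / len(powers) if powers else None,
    }


if __name__ == "__main__":
    batch = [(None, '{"power": 77.280039, "voltage": 121.700245}')]
    print(process_batch("2018-02-16 12:20:10", batch))
```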

TimothySpann · ‎02-13-2018

I have posted an ExecuteSparkInteractive article

TimothySpann · ‎02-10-2018

Apache Deep Learning 101 Series

This is for people preparing to attend my talk on Deep Learning at DataWorks Summit Berlin 2018 (https://dataworkssummit.com/berlin-2018/#agenda) on Thursday April 19, 2018 at 11:50AM Berlin time.

You can easily run Apache MXNet on an OSX machine or a Linux workstation using a Python script. I have forked the standard Apache MXNet Wine Detector Tutorial (http://mxnet.incubator.apache.org/tutorials/embedded/wine_detector.html) to read our local OSX webcam. You may need to change your OpenCV webcam port from 0 to 1 or 2, depending on how many webcams you have and which one you want to use. Webcam numbering starts at 0, so if you only have one, use 0. I am running this on an OSX laptop connected to a monitor with a built-in webcam, and I use that one, which is 1.

Let's get this installed!

git clone https://github.com/apache/incubator-mxnet.git

The installation instructions at Apache MXNet's website (http://mxnet.incubator.apache.org/install/index.html) are amazing. Pick your platform and your style. I am doing this the simplest way on a Mac, but you can use a Python virtual environment, which may be best for you.

git clone https://github.com/tspannhw/ApacheBigData101.git

You will want to copy my shell script osxlocalrun.sh, the inception copy and the analyze.py script to your machine. If you don't have a webcam, use the Centos version of the shell script and Python instead; that one works with a static image that you supply.

I am assuming you are running a recently updated Mac with 16GB of RAM or more, with PIP, Brew and Python 3 already installed. If not, do that first. If you have a pre-1.0 Apache MXNet, please upgrade. You will also need curl and tar, which should already be installed.
cd incubator-mxnet
mkdir images
curl --header 'Host: data.mxnet.io' --header 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Firefox/45.0' --header 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' --header 'Accept-Language: en-US,en;q=0.5' --header 'Referer: http://data.mxnet.io/models/imagenet/' --header 'Connection: keep-alive' 'http://data.mxnet.io/models/imagenet/inception-bn.tar.gz' -o 'inception-bn.tar.gz' -L
tar -xvzf inception-bn.tar.gz
cp Inception-BN-0126.params Inception-BN-0000.params

Then:

brew update
pip install --upgrade pip
pip install --upgrade setuptools
pip install mxnet==1.0.0
brew install graphviz
pip install graphviz

If your machine has two versions of Python, you may need to use pip3, and you may also need to run these commands with sudo. It depends on how your machine is set up and how locked down it is.

We are creating a directory called images, which will fill up with images captured by OpenCV. You probably want to delete them or ingest them. Ingesting them is very easy with Apache NiFi or MiniFi, both of which run on OSX with ease. See: https://community.hortonworks.com/articles/107379/minifi-for-image-capture-and-ingestion-from-raspbe.html

So we call a simple shell script (osxlocalrun.sh), which calls our custom Python 3 script. You can easily convert this to Python 2 if you need to. In a future article I have this running on Python 2.7 on a CentOS 7 HDP 2.6.4 cluster node. I send warnings to /dev/null to get rid of them, since they relate to OSX configuration that you may or may not have and cannot easily change. Nothing to see here. You will probably need to chmod 755 your osxlocalrun.sh. If you are running on a Linux variant, follow the directions on the Apache MXNet site or wait for my next article on installing and using Apache MXNet in CentOS-based HDP 2.6.4 and HDF 3.1 clusters.
python3 -W ignore analyze.py 2>/dev/null

For Apache NiFi Flow Templates

You can download my Apache NiFi flows from GitHub or from this article.

Architecture

Local Apache NiFi 1.5 with NiFi Registry running with JDK 8 on OSX
Local Apache MXNet installation with Python 3
Remote HDF 3.1 cluster running on CentOS 7 on OpenStack with Apache Ambari, Apache NiFi, NiFi Registry and Hortonworks Schema Registry
Remote HDP 2.6.4 cluster running on CentOS 7 on OpenStack with Apache Hive and Apache Ambari

The flow is easy:

ExecuteProcess: Execute that shell script.
UpdateAttribute: Add the schema name.
InferAvroSchema: You really need this one only once, if you don't want to hand-create your schema. Push the results to an attribute.
Remote Process Group: Send via HTTP Site-to-Site to an HDF 3.1 cluster.

Local OSX Processing

Cluster-based Record Processing

In the cloud, we use ConvertRecord to convert the JSON generated by the Apache MXNet Python script into AVRO. We merge a bunch of those together and then convert that larger AVRO record to ORC. This ORC file is stored in HDFS. Apache NiFi automatically generates Hive DDL, which we can execute instantly via Apache NiFi or run manually. I do this manually in Apache Zeppelin. I could easily augment this data with weather, Twitter and other REST feeds, which I have covered in other articles. I could also push the results to Kafka 1.0 for additional processing in Hortonworks Streaming Analytics Manager. I will do that at a future time.
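ExecuteProcess runs osxlocalrun.sh and writes whatever the command prints on stdout into a FlowFile. Meanwhile, the 2>/dev/null in the script throws the warnings away. If you want to reproduce that behavior outside NiFi, for example to test the script before wiring up the flow, a rough Python sketch using subprocess is below. It assumes the script prints one JSON object per line. The helper name run_and_capture_json and the fake stand-in script are hypothetical, and this is not NiFi code.

```python
import json
import subprocess
import sys


def run_and_capture_json(cmd):
    """Run cmd, discard its stderr (like 2>/dev/null) and parse each stdout line as JSON.

    Hypothetical helper: assumes the command prints one JSON object per line.
    """
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # throw warnings away, same idea as 2>/dev/null
        check=True,                 # raise CalledProcessError on a non-zero exit code
        text=True,                  # decode stdout to str
    )
    # Skip blank lines and parse every remaining line as a JSON document
    return [json.loads(line) for line in completed.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    # Stand-in for "python3 -W ignore analyze.py"; a real run would pass the script path.
    fake_analyze = (
        "import json, sys; "
        "sys.stderr.write('noisy warning\\n'); "
        "print(json.dumps({'top1': 'n02871525 bookshop, bookstore, bookstall', 'runtime': '2'}))"
    )
    for record in run_and_capture_json([sys.executable, "-W", "ignore", "-c", fake_analyze]):
        print(record["top1"], record["runtime"])
```

The warning written to stderr never shows up, and only the parsed JSON record is printed.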
Apache Hive SQL DDL

CREATE EXTERNAL TABLE IF NOT EXISTS inception3 (
  uuid STRING, top1pct STRING, top1 STRING, top2pct STRING, top2 STRING,
  top3pct STRING, top3 STRING, top4pct STRING, top4 STRING, top5pct STRING,
  top5 STRING, imagefilename STRING, runtime STRING)
STORED AS ORC
LOCATION '/mxnet/local'

Example Output

{"uuid": "mxnet_uuid_img_20180208204131", "top1pct": "30.0999999046", "top1": "n02871525 bookshop, bookstore, bookstall", "top2pct": "23.7000003457", "top2": "n04200800 shoe shop, shoe-shop, shoe store", "top3pct": "4.80000004172", "top3": "n03141823 crutch", "top4pct": "2.89999991655", "top4": "n04370456 sweatshirt", "top5pct": "2.80000008643", "top5": "n02834397 bib", "imagefilename": "images/tx1_image_img_20180208204131.jpg", "runtime": "2"}

Query Results

Example OpenCV Captured Image

{"top1pct": "67.6", "top5": "n03485794 handkerchief, hankie, hanky, hankey", "top4": "n04590129 window shade", "top3": "n03938244 pillow", "top2": "n04589890 window screen", "top1": "n02883205 bow tie, bow-tie, bowtie", "top2pct": "11.5", "imagefilename": "nanotie7.png", "top3pct": "4.5", "uuid": "mxnet_uuid_img_20180211161220", "top4pct": "2.8", "top5pct": "2.8", "runtime": "3.0"}

My cat assists me in some Deep Learning work, so I use Apache NiFi to track him to make sure he's working and hasn't taken his tie off during office hours. I run a strict office here in the Princeton lab.
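Each row of the inception3 table comes from one JSON document produced by analyze.py. That script isn't reproduced here, so the following is only a sketch of how such a record lines up with the DDL columns. It assumes the top predictions arrive as (probability, label) pairs sorted best-first. The helper build_inception_record and the column list name are hypothetical. Every value is stored as a string to match the STRING columns, and percentages are rounded to one decimal place like the second sample output (an assumption; the first sample is unrounded).

```python
import datetime
import json

# Column order of the inception3 Hive table above.
INCEPTION3_COLUMNS = [
    "uuid",
    "top1pct", "top1", "top2pct", "top2", "top3pct", "top3",
    "top4pct", "top4", "top5pct", "top5",
    "imagefilename", "runtime",
]


def build_inception_record(predictions, image_filename, runtime_seconds, now):
    """Build one record whose keys match the inception3 columns.

    Hypothetical helper. predictions: list of (probability, label) tuples with
    probability in [0, 1], already sorted from most to least likely.
    """
    record = {"uuid": "mxnet_uuid_img_" + now.strftime("%Y%m%d%H%M%S")}
    for rank, (prob, label) in enumerate(predictions[:5], start=1):
        record["top%dpct" % rank] = str(round(prob * 100, 1))  # percentage as a string
        record["top%d" % rank] = label
    record["imagefilename"] = image_filename
    record["runtime"] = str(runtime_seconds)
    return record


if __name__ == "__main__":
    preds = [
        (0.676, "n02883205 bow tie, bow-tie, bowtie"),
        (0.115, "n04589890 window screen"),
        (0.045, "n03938244 pillow"),
        (0.028, "n04590129 window shade"),
        (0.028, "n03485794 handkerchief, hankie, hanky, hankey"),
    ]
    rec = build_inception_record(preds, "nanotie7.png", 3.0,
                                 datetime.datetime(2018, 2, 11, 16, 12, 20))
    print(json.dumps(rec))
```

With these inputs, the uuid comes out as mxnet_uuid_img_20180211161220 and the record has the same fields as the cat picture's sample row.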
Source Code

https://github.com/tspannhw/ApacheBigData101/tree/master
apache-mxnet-local.xml
apachemxnet-local-processing.xml

References:

https://community.hortonworks.com/articles/155435/using-the-new-mxnet-model-server.html
https://community.hortonworks.com/articles/83100/deep-learning-iot-workflows-with-raspberry-pi-mqtt.html
https://community.hortonworks.com/articles/146704/edge-analytics-with-nvidia-jetson-tx1-running-apac.html
http://mxnet.incubator.apache.org/

In the Series:

Interfacing with MXNet Model Server
Using Apache MXNet with HDF 3.1 Clusters
Using Apache MXNet with HDP 2.6.4 Clusters
Using Apache MXNet with Hadoop 3.0 YARN 3.0 HDP 3.0 Dockerized GPU Aware Clusters

TimothySpann · 02-09-2018

I want to easily integrate Apache Spark jobs with my Apache NiFi flows. Fortunately, with the release of HDF 3.1, I can do that via Apache NiFi's ExecuteSparkInteractive processor.

First, set up a CentOS 7 cluster with HDF 3.1 by following the well-written guide here. With the magic of time-lapse photography, we instantly have a new cluster of goodness.

It is important to note the new NiFi Registry for version control and more. We also get the new Kafka 1.0, an updated SAM and the ever-important updated Schema Registry. The star of the show today is Apache NiFi 1.5.

My first step is to add a Controller Service (LivySessionController). Then we add the Apache Livy Server, which you can find in your Ambari UI. By default it uses port 8999. For my session I am running Python, so I picked pyspark. You can also pick pyspark3 for Python 3 code, spark for Scala, and sparkr for R.

To execute my Python job, you can either pass the code in from a previous processor to the ExecuteSparkInteractive processor or put the code inline. I put the code inline.

Two new features of Schema Registry that I have to mention are version comparison and interactive API documentation. For version comparison, you click the COMPARE VERSIONS link and get a nice comparison UI. The new Swagger documentation provides interactive documentation and testing of the Schema Registry APIs. Not only do you get all the parameters for input and output, the full URL and a curl example, you also get to run the code live against your server.

I will be adding an article on how to use Apache NiFi to grab schemas from data using InferAvroSchema and publish these new schemas to the Schema Registry via REST API automagically. Part two of this article will focus on the details of using Apache Livy + Apache NiFi + Apache Spark with the new processor to call jobs.
Part 2 -> https://community.hortonworks.com/articles/171787/hdf-31-executing-apache-spark-via-executesparkinte.html

References

https://community.hortonworks.com/articles/148730/integrating-apache-spark-2x-jobs-with-apache-nifi.html
https://community.hortonworks.com/articles/73828/submitting-spark-jobs-from-apache-nifi-using-livy.html
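Conceptually, the LivySessionController and ExecuteSparkInteractive pair drive Livy's REST API. A POST to /sessions with a kind such as pyspark opens an interactive session, and a POST to /sessions/{id}/statements runs code inside it. The sketch below only builds those two requests with the Python standard library so you can inspect them. It is not NiFi's implementation. livy-host is a placeholder, 8999 is the port mentioned above, and the helper names are hypothetical.

```python
import json
import urllib.request

# Session kinds listed in the article.
VALID_KINDS = {"pyspark", "pyspark3", "spark", "sparkr"}


def livy_session_request(host, port=8999, kind="pyspark"):
    """Build a POST /sessions request asking Livy for an interactive session."""
    if kind not in VALID_KINDS:
        raise ValueError("unsupported session kind: %s" % kind)
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        "http://%s:%d/sessions" % (host, port),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def livy_statement_request(host, session_id, code, port=8999):
    """Build a POST /sessions/{id}/statements request that runs code in that session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        "http://%s:%d/sessions/%d/statements" % (host, port, session_id),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = livy_statement_request("livy-host", 0, "spark.range(10).count()")
    print(req.get_method(), req.full_url)
```

To actually send a request, pass it to urllib.request.urlopen and read the JSON response. A statement's result can then be polled with GET /sessions/{id}/statements/{statementId}.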

TimothySpann · 02-07-2018

Use Case: Ingesting energy data and running an Apache Spark job as part of the flow.

We will be using the ExecuteSparkInteractive processor (new in Apache NiFi 1.5 / HDF 3.1) with the LivySessionController to accomplish that integration. As we mentioned in the first part of the article, it's pretty easy to set this up. Since this is a modern Apache NiFi project, we use version control on our code.

On a local machine, a Python script talks to an electricity sensor over WiFi. The data is processed, cleaned and sent to a cloud-hosted Apache NiFi instance via Site-to-Site (S2S) over HTTP.

In the cloud we receive the pushed messages. Once we open the Spark It Up process group, we have a flow to process the data.

Flow Overview

QueryRecord: Determine how to route based on a query on the streaming data. Converts JSON to Apache AVRO.

Path for All Files
UpdateAttribute: Set a schema.
MergeContent: Do an Apache AVRO merge on our data to make bigger files.
ConvertAvroToORC: Build an Apache ORC file from the merged Apache AVRO file.
PutHDFS: Store our Apache ORC file in an HDFS directory on our HDP 2.6.4 cluster.

Path for Large Voltage
ExecuteSparkInteractive: Call our PySpark job.
PutHDFS: Store the results in HDFS.

We could take all the metadata attributes and send them somewhere or store them as a JSON file.

We tested our PySpark program in Apache Zeppelin and then copied it to our processor.

Our ExecuteSparkInteractive Processor:

In our QueryRecord processor, we send messages with large voltages to the Apache Spark executor to run a PySpark job that does some more processing.

Once we have submitted a job via Apache Livy, we can see the job during and after execution with detailed Apache Livy UI screens and Spark screens. In the Apache Livy UI screen below, we can see the PySpark code that was executed and its output.
Apache Livy UI

Apache Spark Jobs UI - Jobs
Apache Spark Jobs UI - SQL
Apache Spark Jobs UI - Executors
Apache Zeppelin SQL Search of the Data
Hive / Spark SQL Table DDL Generated Automagically by Apache NiFi

Below is the source code related to this article:

Source Code: https://github.com/tspannhw/nifi-spark-livy

PySpark Code

# Read the Spark2 history JSON logs from HDFS; Spark infers the schema
shdf = spark.read.json("hdfs://yourhdp264server:8020/spark2-history")
shdf.printSchema()
# Register a temporary view so the data can be queried with Spark SQL
shdf.createOrReplaceTempView("sparklogs")
stuffdf = spark.sql("SELECT * FROM sparklogs")
stuffdf.count()

This is a pretty simple PySpark application. It reads the JSON results of Spark2 History, prints the schema inferred from them and then does a simple SELECT and count. We could easily do Spark machine learning or other processing in there. You can run Python 2.x or 3.x for this with PySpark. I am running this on Apache Spark 2.2.0, hosted on an HDP 2.6.4 cluster running CentOS 7. The fun part is that every time I run this Spark job, it produces more results for it to read. I should probably just read that log in Apache NiFi, but it was a fun little example. Clearly you can run any kind of job in here. My next article will be about running Apache MXNet and Spark MLlib jobs through Apache Livy and Apache NiFi.
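spark.read.json infers one schema for the whole directory by merging the fields it finds across all JSON lines. That is why printSchema lists columns such as App ID and Block Manager ID even though each event line carries only some of them. A rough standard-library approximation of that merge is below. It covers top-level keys only, without nested struct contents, and resolves conflicts simply: long plus double becomes double, and anything else becomes string. The helper names and sample lines are hypothetical, and this is not Spark's code.

```python
import json


def _json_type(value):
    """Map a value from json.loads to a Spark-style type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        return "struct"
    if isinstance(value, list):
        return "array"
    return "string"


def _merge(a, b):
    """Combine two observed types for the same field."""
    if a == b or b == "null":
        return a
    if a == "null":
        return b
    if {a, b} == {"long", "double"}:
        return "double"
    return "string"  # fall back to string on any other conflict


def infer_top_level_schema(json_lines):
    """Union the top-level fields of all JSON lines; fields are returned sorted by name."""
    schema = {}
    for line in json_lines:
        for key, value in json.loads(line).items():
            observed = _json_type(value)
            schema[key] = _merge(schema[key], observed) if key in schema else observed
    return dict(sorted(schema.items()))


if __name__ == "__main__":
    sample = [
        '{"Event": "SparkListenerApplicationStart", "App Name": "livy-session-0", "App ID": "application_1"}',
        '{"Event": "SparkListenerBlockManagerAdded", "Block Manager ID": {"Executor ID": "driver"}}',
    ]
    for name, kind in infer_top_level_schema(sample).items():
        print(" |-- %s: %s (nullable = true)" % (name, kind))
```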
For a quick side note, you have a lot of options for working with schemas now: Schema For Energy Data inferred.avro.schema { "type" : "record", "name" : "smartPlug", "fields" : [ { "name" : "day19", "type" : "double", "doc" : "Type inferred from '2.035'" }, { "name" : "day20", "type" : "double", "doc" : "Type inferred from '1.191'" }, { "name" : "day21", "type" : "double", "doc" : "Type inferred from '0.637'" }, { "name" : "day22", "type" : "double", "doc" : "Type inferred from '1.497'" }, { "name" : "day23", "type" : "double", "doc" : "Type inferred from '1.151'" }, { "name" : "day24", "type" : "double", "doc" : "Type inferred from '1.227'" }, { "name" : "day25", "type" : "double", "doc" : "Type inferred from '1.387'" }, { "name" : "day26", "type" : "double", "doc" : "Type inferred from '1.138'" }, { "name" : "day27", "type" : "double", "doc" : "Type inferred from '1.204'" }, { "name" : "day28", "type" : "double", "doc" : "Type inferred from '1.401'" }, { "name" : "day29", "type" : "double", "doc" : "Type inferred from '1.288'" }, { "name" : "day30", "type" : "double", "doc" : "Type inferred from '1.439'" }, { "name" : "day31", "type" : "double", "doc" : "Type inferred from '0.126'" }, { "name" : "day1", "type" : "double", "doc" : "Type inferred from '1.204'" }, { "name" : "day2", "type" : "double", "doc" : "Type inferred from '1.006'" }, { "name" : "day3", "type" : "double", "doc" : "Type inferred from '1.257'" }, { "name" : "day4", "type" : "double", "doc" : "Type inferred from '1.053'" }, { "name" : "day5", "type" : "double", "doc" : "Type inferred from '1.597'" }, { "name" : "day6", "type" : "double", "doc" : "Type inferred from '1.642'" }, { "name" : "day7", "type" : "double", "doc" : "Type inferred from '0.443'" }, { "name" : "day8", "type" : "double", "doc" : "Type inferred from '0.01'" }, { "name" : "day9", "type" : "double", "doc" : "Type inferred from '0.009'" }, { "name" : "day10", "type" : "double", "doc" : "Type inferred from '0.009'" }, { "name" : 
"day11", "type" : "double", "doc" : "Type inferred from '0.075'" }, { "name" : "day12", "type" : "double", "doc" : "Type inferred from '1.149'" }, { "name" : "day13", "type" : "double", "doc" : "Type inferred from '1.014'" }, { "name" : "day14", "type" : "double", "doc" : "Type inferred from '0.851'" }, { "name" : "day15", "type" : "double", "doc" : "Type inferred from '1.134'" }, { "name" : "day16", "type" : "double", "doc" : "Type inferred from '1.54'" }, { "name" : "day17", "type" : "double", "doc" : "Type inferred from '1.438'" }, { "name" : "day18", "type" : "double", "doc" : "Type inferred from '1.056'" }, { "name" : "sw_ver", "type" : "string", "doc" : "Type inferred from '\"1.1.1 Build 160725 Rel.164033\"'" }, { "name" : "hw_ver", "type" : "string", "doc" : "Type inferred from '\"1.0\"'" }, { "name" : "mac", "type" : "string", "doc" : "Type inferred from '\"50:C7:BF:B1:95:D5\"'" }, { "name" : "type", "type" : "string", "doc" : "Type inferred from '\"IOT.SMARTPLUGSWITCH\"'" }, { "name" : "hwId", "type" : "string", "doc" : "Type inferred from '\"60FF6B258734EA6880E186F8C96DDC61\"'" }, { "name" : "fwId", "type" : "string", "doc" : "Type inferred from '\"060BFEA28A8CD1E67146EB5B2B599CC8\"'" }, { "name" : "oemId", "type" : "string", "doc" : "Type inferred from '\"FFF22CFF774A0B89F7624BFC6F50D5DE\"'" }, { "name" : "dev_name", "type" : "string", "doc" : "Type inferred from '\"Wi-Fi Smart Plug With Energy Monitoring\"'" }, { "name" : "model", "type" : "string", "doc" : "Type inferred from '\"HS110(US)\"'" }, { "name" : "deviceId", "type" : "string", "doc" : "Type inferred from '\"8006ECB1D454C4428953CB2B34D9292D18A6DB0E\"'" }, { "name" : "alias", "type" : "string", "doc" : "Type inferred from '\"Tim Spann's MiniFi Controller SmartPlug - Desk1\"'" }, { "name" : "icon_hash", "type" : "string", "doc" : "Type inferred from '\"\"'" }, { "name" : "relay_state", "type" : "int", "doc" : "Type inferred from '1'" }, { "name" : "on_time", "type" : "int", "doc" : "Type 
inferred from '1995745'" }, { "name" : "active_mode", "type" : "string", "doc" : "Type inferred from '\"schedule\"'" }, { "name" : "feature", "type" : "string", "doc" : "Type inferred from '\"TIM:ENE\"'" }, { "name" : "updating", "type" : "int", "doc" : "Type inferred from '0'" }, { "name" : "rssi", "type" : "int", "doc" : "Type inferred from '-34'" }, { "name" : "led_off", "type" : "int", "doc" : "Type inferred from '0'" }, { "name" : "latitude", "type" : "double", "doc" : "Type inferred from '40.268216'" }, { "name" : "longitude", "type" : "double", "doc" : "Type inferred from '-74.529088'" }, { "name" : "index", "type" : "int", "doc" : "Type inferred from '18'" }, { "name" : "zone_str", "type" : "string", "doc" : "Type inferred from '\"(UTC-05:00) Eastern Daylight Time (US & Canada)\"'" }, { "name" : "tz_str", "type" : "string", "doc" : "Type inferred from '\"EST5EDT,M3.2.0,M11.1.0\"'" }, { "name" : "dst_offset", "type" : "int", "doc" : "Type inferred from '60'" }, { "name" : "month1", "type" : "double", "doc" : "Type inferred from '32.674'" }, { "name" : "month2", "type" : "double", "doc" : "Type inferred from '8.202'" }, { "name" : "current", "type" : "double", "doc" : "Type inferred from '0.772548'" }, { "name" : "voltage", "type" : "double", "doc" : "Type inferred from '121.740428'" }, { "name" : "power", "type" : "double", "doc" : "Type inferred from '91.380606'" }, { "name" : "total", "type" : "double", "doc" : "Type inferred from '48.264'" }, { "name" : "time", "type" : "string", "doc" : "Type inferred from '\"02/07/2018 11:17:30\"'" }, { "name" : "ledon", "type" : "boolean", "doc" : "Type inferred from 'true'" }, { "name" : "systemtime", "type" : "string", "doc" : "Type inferred from '\"02/07/2018 11:17:30\"'" } ] }

Python Source (Updated to include 31 days)

from pyHS100 import SmartPlug
import json
import datetime

# Connect to the HS110 smart plug on the local network
plug = SmartPlug("192.168.1.203")
row = {}

# Daily energy usage for Dec 2017, Jan 2018 and Feb 2018. Later months overwrite
# earlier ones for the same day number, so row ends up with day1 .. day31.
emeterdaily = plug.get_emeter_daily(year=2017, month=12)
for k, v in emeterdaily.items():
    row["day%s" % k] = v
emeterdaily = plug.get_emeter_daily(year=2018, month=1)
for k, v in emeterdaily.items():
    row["day%s" % k] = v
emeterdaily = plug.get_emeter_daily(year=2018, month=2)
for k, v in emeterdaily.items():
    row["day%s" % k] = v

# Hardware info, system info and time zone settings reported by the plug
hwinfo = plug.hw_info
for k, v in hwinfo.items():
    row["%s" % k] = v
sysinfo = plug.get_sysinfo()
for k, v in sysinfo.items():
    row["%s" % k] = v
timezone = plug.timezone
for k, v in timezone.items():
    row["%s" % k] = v

# Monthly totals for 2018 (month1, month2, ...)
emetermonthly = plug.get_emeter_monthly(year=2018)
for k, v in emetermonthly.items():
    row["month%s" % k] = v

# Current readings (current, voltage, power, total)
realtime = plug.get_emeter_realtime()
for k, v in realtime.items():
    row["%s" % k] = v

row['alias'] = plug.alias
row['time'] = plug.time.strftime('%m/%d/%Y %H:%M:%S')
row['ledon'] = plug.led
row['systemtime'] = datetime.datetime.now().strftime('%m/%d/%Y %H:%M:%S')

# Emit one JSON document for Apache NiFi to pick up
json_string = json.dumps(row)
print(json_string)

Example Output

{"text\/plain":"root\n |-- App Attempt ID: string (nullable = true)\n |-- App ID: string (nullable = true)\n |-- App Name: string (nullable = true)\n |-- Block Manager ID: struct (nullable = true)\n | |-- Executor ID: string (nullable = true)\n | |-- Host: string (nullable = true)\n | |-- Port: long (nullable = true)\n |-- Classpath Entries: struct (nullable = true)\n | |-- \/etc\/hadoop\/conf\/: string (nullable = true)\n | |-- \/etc\/hadoop\/conf\/secure: string (nullable = true)\n | |-- \/etc\/zeppelin\/conf\/external-dependency-conf\/: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_conf__: string (nullable = true)\n | |-- 
\/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_libs__\/JavaEWAH-0.3.2.jar: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_libs__\/RoaringBitmap-0.5.11.jar: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_libs__\/ST4-4.0.4.jar: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_libs__\/activation-1.1.1.jar: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_libs__\/aircompressor-0.8.jar: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_libs__\/antlr-2.7.7.jar: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_libs__\/antlr-runtime-3.4.jar: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/application_1517883514475_0002\/container_e01_1517883514475_0002_01_000001\/__spark_libs__\/antlr4-runtime-4.5.3.jar: string (nullable = true)\n | |-- \/hadoop\/yarn\/local\/usercache\/livy\/appcache\/

Shell Tip: Apache MXNet may send some warnings to STDERR. I don't want these, so I send them to /dev/null:

python3 -W ignore analyze.py 2>/dev/null

Software:

PySpark
Python
Apache NiFi
Apache Spark
HDF 3.1
HDP 2.6.4
Apache Hive
Apache Avro
Apache ORC
Apache Ambari
Apache Zeppelin

Reference:

https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi/nifi-livy-nar/1.5.0/org.apache.nifi.processors.livy.ExecuteSparkInteractive/index.html
https://community.hortonworks.com/articles/73828/submitting-spark-jobs-from-apache-nifi-using-livy.html
https://community.hortonworks.com/articles/148730/integrating-apache-spark-2x-jobs-with-apache-nifi.html
https://community.hortonworks.com/articles/155326/monitoring-energy-usage-utilizing-apache-nifi-pyth.html
https://github.com/tspannhw/nifi-smartplug/
https://github.com/tspannhw/nifi-spark-livy
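The inferred.avro.schema shown earlier in this article is the kind of output InferAvroSchema produces from a smart-plug reading. Every field gets an Avro primitive type plus a doc string such as Type inferred from '2.035'. NiFi's processor handles more cases than this (nested objects, for example), so treat the following as a simplified, hypothetical stand-in that reproduces the same shape for one flat JSON object. The function names and the 32-bit cutoff between int and long are assumptions.

```python
import json

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def avro_type(value):
    """Pick an Avro primitive type for a flat JSON scalar (simplified)."""
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int" if INT32_MIN <= value <= INT32_MAX else "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError("nested or null values are not handled in this sketch: %r" % (value,))


def infer_avro_schema(record, name="smartPlug"):
    """Build an Avro record schema from one flat JSON object, InferAvroSchema style."""
    fields = []
    for key, value in record.items():  # json.loads keeps the document's key order
        fields.append({
            "name": key,
            "type": avro_type(value),
            # json.dumps gives 121.740428 for numbers and "HS110(US)" (quoted) for strings
            "doc": "Type inferred from '%s'" % json.dumps(value),
        })
    return {"type": "record", "name": name, "fields": fields}


if __name__ == "__main__":
    reading = json.loads(
        '{"voltage": 121.740428, "relay_state": 1, "model": "HS110(US)", "ledon": true}'
    )
    print(json.dumps(infer_avro_schema(reading), indent=2))
```

When the result is serialized with json.dumps, string fields come out as "Type inferred from '\"HS110(US)\"'", matching the escaping in the schema above.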
