Posts: 1973
Kudos Received: 1225
Solutions: 124

My Accepted Solutions

Title | Views | Posted
---|---|---
 | 773 | 04-03-2024 06:39 AM
 | 1425 | 01-12-2024 08:19 AM
 | 771 | 12-07-2023 01:49 PM
 | 1326 | 08-02-2023 07:30 AM
 | 1922 | 03-29-2023 01:22 PM
08-10-2016
04:03 PM
1 Kudo
I have looked at it, and there are no specific connectors for SnappyData in HDF. I am looking into writing one to handle in-memory data stores, perhaps using the Redis connector as a starting point: https://github.com/qntfy/nifi-redis
05-20-2016
03:20 PM
I will give this a try and post the results. For Windows and DBVisualizer, there's an article with step-by-step details: DBVisualizer Windows.
For Tableau: http://kb.tableau.com/articles/knowledgebase/connecting-to-hive-server-2-in-secure-mode
For Squirrel SQL: https://community.hortonworks.com/questions/17381/hive-with-dbvisualiser-or-squirrel-sql-client.html
05-19-2016
10:40 AM
Having run a bunch of Spark jobs locally, in Spark Standalone clusters, and in HDP YARN clusters, I have found a few JVM settings that help with debugging non-production jobs and assist with better garbage collection. This is important even with off-heap storage and bare-metal optimizations.
spark-submit --driver-java-options "-XX:+PrintGCDetails -XX:+UseG1GC -XX:MaxGCPauseMillis=400"
You can also set extra options in the runtime environment (see the Spark documentation). For HDP / Spark, you can add these from Ambari. In your Scala Spark program:
sparkConf.set("spark.cores.max", "4")
sparkConf.set("spark.serializer", classOf[KryoSerializer].getName)
sparkConf.set("spark.sql.tungsten.enabled", "true")
sparkConf.set("spark.eventLog.enabled", "true")
sparkConf.set("spark.app.id", "MyAppIWantToFind")
sparkConf.set("spark.io.compression.codec", "snappy")
sparkConf.set("spark.rdd.compress", "false")
sparkConf.set("spark.shuffle.compress", "true")
Make sure you have Tungsten on, the KryoSerializer, the event log enabled, and use logging.
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)
Logger.getLogger("org.apache.spark.storage.BlockManager").setLevel(Level.ERROR)
val log = Logger.getLogger("com.hortonworks.myapp")
log.info("Started Logs Analysis")
Also, whenever possible, include relevant filters on your datasets, e.g. filter(!_.clientIp.equals("Empty")).
11-10-2016
01:21 PM
@vlundberg This has nothing to do with being installed via Ambari. If the core-site.xml file being used by the HDFS processor in NiFi references a class that NiFi does not include, you will get a NoClassDefFoundError. Adding a new class to NiFi's HDFS NAR bundle may be a possibility, but as I am not a developer I can't speak to that. You can always file an Apache Jira against NiFi for this change: https://issues.apache.org/jira/secure/Dashboard.jspa Thanks, Matt
05-10-2017
09:10 AM
Thanks for the very useful article. I am getting the error below when trying to compile:
constructor cannot be instantiated to expected type;
found : (T1, T2)
required: org.apache.kafka.clients.consumer.ConsumerRecord[String,Array[Byte]]
[ERROR] val rdd2 = rdd.map { case (k, v) => parseAVROToString(v) }
Did anybody face this issue? Thanks.
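For context on this error: the "found : (T1, T2)" versus "required: ConsumerRecord" message suggests the RDD elements are ConsumerRecord objects (as with the Kafka 0.10 integration) rather than (key, value) tuples, so the tuple pattern cannot match. A minimal sketch of the kind of change that usually resolves this, reusing the hypothetical parseAVROToString helper from the snippet above:
// Sketch, assuming rdd is an RDD[ConsumerRecord[String, Array[Byte]]]:
// map over each record's value() instead of destructuring a (k, v) tuple.
val rdd2 = rdd.map(record => parseAVROToString(record.value()))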
05-12-2016
08:37 AM
OK, thanks! It seems adding this param works for me.
#!/usr/bin/env bash
# This file is sourced when running various Spark programs.
# Copy it as spark-env.sh and edit that to configure Spark for your site.
MASTER="yarn-cluster"
# Options read in YARN client mode
SPARK_EXECUTOR_INSTANCES="3" #Number of workers to start (Default: 2)
#SPARK_EXECUTOR_CORES="1" #Number of cores for the workers (Default: 1).
#SPARK_EXECUTOR_MEMORY="1G" #Memory per Worker (e.g. 1000M, 2G) (Default: 1G)
#SPARK_DRIVER_MEMORY="512 Mb" #Memory for Master (e.g. 1000M, 2G) (Default: 512 Mb)
#SPARK_YARN_APP_NAME="spark" #The name of your application (Default: Spark)
#SPARK_YARN_QUEUE="default" #The hadoop queue to use for allocation requests (Default: 'default')
#SPARK_YARN_DIST_FILES="" #Comma separated list of files to be distributed with the job.
#SPARK_YARN_DIST_ARCHIVES="" #Comma separated list of archives to be distributed with the job.
# Generic options for the daemons used in the standalone deploy mode
# Alternate conf dir. (Default: ${SPARK_HOME}/conf)
export SPARK_CONF_DIR=${SPARK_CONF_DIR:-{{spark_home}}/conf}
# Where log files are stored.(Default:${SPARK_HOME}/logs)
#export SPARK_LOG_DIR=${SPARK_HOME:-{{spark_home}}}/logs
export SPARK_LOG_DIR={{spark_log_dir}}
# Where the pid file is stored. (Default: /tmp)
export SPARK_PID_DIR={{spark_pid_dir}}
# A string representing this instance of spark.(Default: $USER)
SPARK_IDENT_STRING=$USER
# The scheduling priority for daemons. (Default: 0)
SPARK_NICENESS=0
export HADOOP_HOME=${HADOOP_HOME:-{{hadoop_home}}}
export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-{{hadoop_conf_dir}}}
# The java implementation to use.
export JAVA_HOME={{java_home}}
if [ -d "/etc/tez/conf/" ]; then
export TEZ_CONF_DIR=/etc/tez/conf
else
export TEZ_CONF_DIR=
fi
PS: it works well, but it seems the params passed via the command line (e.g. --num-executors 8 --executor-cores 4 --executor-memory 2G) are not taken into consideration. Instead, if I set the executors in the "spark-env template" field of Ambari, the params are taken into consideration. Anyway, now it works 🙂 Thanks a lot.
05-10-2016
03:53 PM
Would be interesting to see. There seem to be a couple of data quality tools out there in the open source community (Mural/Mosaic), but the last update in the repository seems to have been 4 years ago, so I'm not sure how useful that is. https://java.net/projects/mosaic
05-13-2016
01:51 PM
Hi, finally the problem was with the permissions on the /var/run/ambari-server directory on the NameNode. I did:
chown -R ambari:ambari /var/run/ambari-server
05-02-2016
08:56 PM
Love this!! Already sent it to some close sales reps for a good laugh 🙂 Great job Dan!