Posts: 1973
Kudos Received: 1225
Solutions: 124

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1939 | 04-03-2024 06:39 AM |
|  | 3042 | 01-12-2024 08:19 AM |
|  | 1667 | 12-07-2023 01:49 PM |
|  | 2440 | 08-02-2023 07:30 AM |
|  | 3396 | 03-29-2023 01:22 PM |
12-25-2016
03:49 AM
I had to uninstall Ambari. Then I manually installed PostgreSQL, ran chmod on the directories, enabled the service with systemctl enable, and started PostgreSQL.
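For anyone hitting the same failure, a hedged sketch of those recovery steps on CentOS 7 (package names and the data path are assumptions for a stock install, not from the thread; requires root):

```shell
# Remove the broken Ambari-managed setup, then install PostgreSQL manually.
yum remove -y ambari-server
yum install -y postgresql-server
postgresql-setup initdb                 # initialize the data directory
chmod 700 /var/lib/pgsql/data           # PostgreSQL refuses looser permissions
systemctl enable postgresql             # note: systemctl, not sysctl
systemctl start postgresql
```

If the service still fails to start, `journalctl -xe` will show the same unit log as above with the underlying cause.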
12-24-2016
07:46 PM
Dec 24 19:45:45 tspanndev13.field.hortonworks.com systemd[1]: Failed to start PostgreSQL database server.
-- Subject: Unit postgresql.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit postgresql.service has failed.
--
-- The result is failed.
Dec 24 19:45:45 tspanndev13.field.hortonworks.com systemd[1]: Unit postgresql.service entered failed state.
Dec 24 19:45:45 tspanndev13.field.hortonworks.com systemd[1]: postgresql.service failed.
Dec 24 19:45:45 tspanndev13.field.hortonworks.com polkitd[2452]: Unregistered Authentication Agent for unix-process:4241:769952344 (system bus name :1.8767, object path /org/freedesktop/PolicyKit1/Authent
[root@tspannde
Labels:
- Apache Ambari
12-24-2016
04:23 PM
1 Kudo
cd to your Spark 1.4 directory, probably /usr/hdp/current/spark-client, or possibly /usr/hdp/current/spark-historyserver/, and run it from the bin directory there. Your server or PC has as many cores as its CPUs provide; a PC might have just 4, 8, or 16. You can set cores for Spark either in SparkConf in your code or from the command line. Note that only values explicitly specified through spark-defaults.conf, SparkConf, or the command line will appear; for all other configuration properties, you can assume the default value is used. Please see the full page here: https://spark.apache.org/docs/1.4.1/configuration.html#available-properties I highly recommend reading all of Spark's basic documentation before running Spark applications; it answers a lot of questions.
./bin/spark-submit --name "My app" --master local[4] --conf spark.shuffle.spill=false \
  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails -XX:+PrintGCTimeStamps" myApp.jar
12-24-2016
02:37 PM
2 Kudos
Port 4040 is only available while an application is running; after that you have to look at the History Server. See: https://spark.apache.org/docs/1.4.1/monitoring.html As shown there, you must specify the logging directory and a few parameters for job histories to be stored; you can then look at jobs after they finish using the History Server:
./sbin/start-history-server.sh
You ran with 4 threads (local[4]), which runs Spark locally with K worker threads (K should usually be set to the number of cores on your machine). You are running local Spark. Spark 1.4.1 is very old and you are running locally; why not upgrade to 1.6.2 or 2.x? If you are running HDP 2.4 or HDP 2.5, the History Server is managed for you and running by default; if not, start it with Ambari. It is also better to run under YARN.
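For reference, a minimal spark-defaults.conf sketch of the event-log settings the monitoring page describes (the HDFS path is an assumption — point it at a directory of your own):

```properties
spark.eventLog.enabled           true
spark.eventLog.dir               hdfs:///spark-history
spark.history.fs.logDirectory    hdfs:///spark-history
```

With these set, finished applications appear in the History Server UI (port 18080 by default).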
12-24-2016
04:47 AM
That is the way to go
12-23-2016
07:17 PM
see: https://pierrevillard.com/2016/08/13/apache-nifi-1-0-0-cluster-setup/ The first thing is to configure the list of ZK (ZooKeeper) instances in the configuration file ./conf/zookeeper.properties. Since our three NiFi instances will run the embedded ZK instance, I just have to complete the file with the following properties:
server.1=node-1:2888:3888
server.2=node-2:2888:3888
server.3=node-3:2888:3888

Then, everything happens in ./conf/nifi.properties. First, I specify that NiFi must run an embedded ZK instance, with the following property:

nifi.state.management.embedded.zookeeper.start=true

I also specify the ZK connect string:

nifi.zookeeper.connect.string=node-1:2181,node-2:2181,node-3:2181

As you can notice, the ./conf/zookeeper.properties file has a property named dataDir. By default, this value is set to ./state/zookeeper. If more than one NiFi node is running an embedded ZK, it is important to tell each server which one it is.
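Telling each server which one it is means writing a myid file under dataDir; a minimal sketch, run from the NiFi installation directory (the id 1 matches the server.1 entry above — write 2 and 3 on the other nodes):

```shell
# Create the embedded ZooKeeper state directory and write this node's id.
# The id must match this node's server.N line in zookeeper.properties.
mkdir -p ./state/zookeeper
echo 1 > ./state/zookeeper/myid
```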
12-23-2016
03:54 PM
1 Kudo
Yes, increase driver and executor memory: http://spark.apache.org/docs/latest/configuration.html Also specify more executors, and in spark-submit use --master yarn --deploy-mode cluster. In my code I like to:

import org.apache.spark.SparkConf
import org.apache.spark.serializer.KryoSerializer

val sparkConf = new SparkConf().setAppName("Links")
sparkConf.set("spark.cores.max", "32")
sparkConf.set("spark.serializer", classOf[KryoSerializer].getName)
sparkConf.set("spark.sql.tungsten.enabled", "true")
sparkConf.set("spark.eventLog.enabled", "true")
sparkConf.set("spark.app.id", "MyApp")
sparkConf.set("spark.io.compression.codec", "snappy")
sparkConf.set("spark.rdd.compress", "false")
sparkConf.set("spark.shuffle.compress", "true")
See http://spark.apache.org/docs/latest/submitting-applications.html

--executor-memory 32G \
--num-executors 50 \

Up these:
- spark.driver.cores 32 — Number of cores to use for the driver process, only in cluster mode.
- spark.driver.maxResultSize 1g — Limit of total size of serialized results of all partitions for each Spark action (e.g. collect). Should be at least 1M, or 0 for unlimited. Jobs will be aborted if the total size is above this limit. Having a high limit may cause out-of-memory errors in the driver (depends on spark.driver.memory and the memory overhead of objects in the JVM); setting a proper limit can protect the driver from out-of-memory errors.
- spark.driver.memory 32g — Amount of memory to use for the driver process, i.e. where SparkContext is initialized (e.g. 1g, 2g). Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-memory command line option or in your default properties file.
- spark.executor.memory 32g — Amount of memory to use per executor process (e.g. 2g, 8g).

See http://spark.apache.org/docs/latest/running-on-yarn.html for running on YARN:

--driver-memory 32g \
--executor-memory 32g \
--executor-cores 1 \

Use as much memory as you can for optimal Spark performance.
- spark.yarn.am.memory 16G — Amount of memory to use for the YARN Application Master in client mode, in the same format as JVM memory strings (e.g. 512m, 2g). In cluster mode, use spark.driver.memory instead. Use lower-case suffixes, e.g. k, m, g, t, and p, for kibi-, mebi-, gibi-, tebi-, and pebibytes, respectively.
- spark.driver.memory 16g — As above; in client mode set this through --driver-memory rather than SparkConf.
- spark.driver.cores 50 — Number of cores used by the driver in YARN cluster mode. Since the driver is run in the same JVM as the YARN Application Master in cluster mode, this also controls the cores used by the YARN Application Master. In client mode, use spark.yarn.am.cores to control the number of cores used by the YARN Application Master instead.

Up things where you can and run in YARN cluster mode. Good references: http://www.slideshare.net/pdx_spark/performance-in-spark-20-pdx-spark-meetup-81816 http://www.slideshare.net/JenAman/rearchitecting-spark-for-performance-understandability-63065166 See the Hortonworks Spark Tuning Guide: http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.3/bk_spark-component-guide/content/ch_tuning-spark.html
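Putting those flags together, a sketch of a full YARN cluster-mode submit (the class and jar names are placeholders for illustration, not from this thread; size the memory to your cluster):

```shell
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 32g \
  --executor-memory 32g \
  --executor-cores 1 \
  --num-executors 50 \
  --class com.example.Links \
  myApp.jar
```

In cluster mode the driver memory and cores must come from these flags or the defaults file, since SparkConf is read after the driver JVM has started.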
12-23-2016
03:43 PM
2 Kudos
See: http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.1.1/bk_dataflow-administration/content/clustering.html http://docs.hortonworks.com/HDPDocuments/HDF2/HDF-2.1.1/bk_dataflow-administration/content/state_providers.html Under Cluster Node Properties, set the following:
- nifi.cluster.is.node - Set this to true.
- nifi.cluster.node.address - Set this to the fully qualified hostname of the node. If left blank, it defaults to "localhost".
- nifi.cluster.node.protocol.port - Set this to an open port that is higher than 1024 (anything lower requires root).
- nifi.cluster.node.protocol.threads - The number of threads that should be used to communicate with the other nodes in the cluster. This property defaults to 10, but for large clusters, this value may need to be larger.
- nifi.zookeeper.connect.string - The Connect String that is needed to connect to Apache ZooKeeper. This is a comma-separated list of hostname:port pairs. For example, localhost:2181,localhost:2182,localhost:2183. This should contain a list of all ZooKeeper instances in the ZooKeeper quorum.
- nifi.zookeeper.root.node - The root ZNode that should be used in ZooKeeper. ZooKeeper provides a directory-like structure for storing data; each directory in this structure is referred to as a ZNode. This denotes the root ZNode, or directory, that should be used for storing data. The default value is /root. This is important to set correctly, as which cluster the NiFi instance attempts to join is determined by which ZooKeeper instance it connects to and the ZooKeeper Root Node that is specified.
- nifi.cluster.flow.election.max.wait.time - Specifies the amount of time to wait before electing a Flow as the "correct" Flow. If the number of Nodes that have voted is equal to the number specified by the nifi.cluster.flow.election.max.candidates property, the cluster will not wait this long. The default is 5 minutes. Note that the time starts as soon as the first vote is cast.
- nifi.cluster.flow.election.max.candidates - Specifies the number of Nodes required in the cluster to cause early election of Flows. This allows the Nodes in the cluster to avoid waiting a long time before starting processing if at least this number of nodes join the cluster.

Make sure they are all in the same ZooKeeper quorum, on the same network, and can talk to each other on all required ports.
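A minimal nifi.properties sketch of those settings for one node (the hostname and port are illustrative values, not prescribed ones):

```properties
nifi.cluster.is.node=true
nifi.cluster.node.address=node-1
nifi.cluster.node.protocol.port=9088
nifi.cluster.node.protocol.threads=10
nifi.zookeeper.connect.string=node-1:2181,node-2:2181,node-3:2181
```

Repeat on each node, changing nifi.cluster.node.address to that node's own hostname.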
12-23-2016
01:56 PM
2 Kudos
Email Innovative Exams at examsupport@examslocal.com, or call +1-888-504-9178, +1-312-612-1049 for additional support.
When can I expect to receive the results of my exam?
Please allow up to 5 business days to receive your exam results via e-mail from Hortonworks. If after 5 business days you still have not received your results, please contact Hortonworks directly at certification@hortonworks.com. See http://hortonworks.com/training/certification/hdp-certified-developer-faq-page/
12-22-2016
09:32 PM
Zeppelin is a good example of one.