About phargis

phargis · ‎08-23-2016

First, go to the Apache Spark downloads page to download Spark 2.0:

    http://spark.apache.org/downloads.html

Set your download options (shown in image below), and click on the link next to "Download Spark" (i.e. "spark-2.0.0-bin-hadoop2.7.tgz"). This will download the gzipped tarball to your computer.

Next, start up the HDP 2.5 Sandbox image within your virtual machine (using either VirtualBox or VMware Fusion). Once the image is booted, start a Terminal session on your laptop and copy the tarball to the VM. Here is an example using the 'scp' (secure copy) command, although you can use any file copy mechanism. Note that scp takes the port with an uppercase -P (lowercase -p preserves file times instead):

    scp -P 2222 spark-2.0.0-bin-hadoop2.7.tgz [email protected]:~

This will copy the file to the 'root' user's home directory on the VM. Next, log in (via ssh) to the VM:

    ssh -p 2222 [email protected]

Once logged in, extract the tarball with this command:

    tar -xvzf spark-2.0.0-bin-hadoop2.7.tgz

You can now navigate to the "seed" directory already created for Spark 2.0, and move the contents of the extracted tarball into the current directory:

    cd /usr/hdp/current/spark2-client
    mv ~/spark-2.0.0-bin-hadoop2.7/* .

Next, change the ownership of the new files to match the local directory:

    chown -R root:root *

Now, set up the SPARK_HOME environment variable for this session (or permanently by adding it to ~/.bash_profile):

    export SPARK_HOME=/usr/hdp/current/spark2-client

Let's create the config files in the "conf" directory so that we can edit them to configure Spark:

    cd conf
    cp spark-env.sh.template spark-env.sh
    cp spark-defaults.conf.template spark-defaults.conf

Edit the config files with a text editor (like vi or vim), and make sure the environment variables and parameters listed below are set.
Add the following lines to the file 'spark-env.sh' and then save the file:

    HADOOP_CONF_DIR=/etc/hadoop/conf
    SPARK_EXECUTOR_INSTANCES=2
    SPARK_EXECUTOR_CORES=1
    SPARK_EXECUTOR_MEMORY=512M
    SPARK_DRIVER_MEMORY=512M

Now, replace the lines in the "spark-defaults.conf" file to match this content, and then save the file:

    spark.driver.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
    spark.executor.extraLibraryPath /usr/hdp/current/hadoop-client/lib/native
    spark.driver.extraJavaOptions -Dhdp.version=2.5.0.0-817
    spark.yarn.am.extraJavaOptions -Dhdp.version=2.5.0.0-817
    spark.eventLog.dir hdfs:///spark-history
    spark.eventLog.enabled true
    # Required: setting this parameter to 'false' turns off ATS timeline server for Spark
    spark.hadoop.yarn.timeline-service.enabled false
    #spark.history.fs.logDirectory hdfs:///spark-history
    #spark.history.kerberos.keytab none
    #spark.history.kerberos.principal none
    #spark.history.provider org.apache.spark.deploy.history.FsHistoryProvider
    #spark.history.ui.port 18080
    spark.yarn.containerLauncherMaxThreads 25
    spark.yarn.driver.memoryOverhead 200
    spark.yarn.executor.memoryOverhead 200
    #spark.yarn.historyServer.address sandbox.hortonworks.com:18080
    spark.yarn.max.executor.failures 3
    spark.yarn.preserve.staging.files false
    spark.yarn.queue default
    spark.yarn.scheduler.heartbeat.interval-ms 5000
    spark.yarn.submit.file.replication 3
    spark.ui.port 4041

Now that your config files are set up, change directory back to your $SPARK_HOME:

    cd /usr/hdp/current/spark2-client

Before running a Spark application, you need to change two YARN settings so that YARN can allocate enough memory to run the jobs on the Sandbox. To change the YARN settings, log in to the Ambari console (http://127.0.0.1:8080/), and click on the "YARN" service along the left-hand side of the screen. Once the YARN Summary page is drawn, find the "Config" tab along the top and click on it. Scroll down and you will see the "Settings" section (not Advanced).
Change the settings described below.

Note: Use the Edit/pencil icon to set each parameter to the exact values.

    1) Memory Node (Memory allocated for all YARN containers on a node) = 7800MB
    2) Container (Maximum Container Size (Memory)) = 2500MB

Alternatively, if you click the "Advanced" tab next to Settings, here are the exact config parameter names you want to edit:

    yarn.scheduler.maximum-allocation-mb = 2500MB
    yarn.nodemanager.resource.memory-mb = 7800MB

After editing these parameters, click on the green "Save" button above the settings in Ambari. You will now need to restart all affected services. (Note: a yellow "Restart" icon should show up once the config settings are saved by Ambari; you can click on that button and select "Restart all affected services".) It may be faster to navigate to the Hosts page via the tab, click on the single host, and look for the "Restart" button there. Make sure that YARN is restarted successfully.

Below is an image showing the new YARN settings:

Finally, you are ready to run the packaged SparkPi example using Spark 2.0. To run SparkPi on YARN (yarn-client mode), run the command below, which switches user to "spark" and uses spark-submit to launch the precompiled SparkPi example program:

    su spark --command "bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode client --driver-memory 2g --executor-memory 2g --executor-cores 1 examples/jars/spark-examples*.jar 10"

You should see many lines of debug/stderr output, followed by a results line similar to this:

    Pi is roughly 3.144799144799145

Note: To run the SparkPi example in local mode, without the use of YARN, you can run this command:

    ./bin/run-example SparkPi
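As a rough check on why these memory values fit together, the sketch below adds the executor heap requested on the spark-submit command line (2g) to the executor memoryOverhead from spark-defaults.conf (200) and compares the result with the two YARN limits. It is only an approximation under stated assumptions: YARN may round each request up further, the application master container is not counted, and the variable and function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Rough memory budget for the SparkPi run (all values in MB).
# Assumption: each executor container asks YARN for heap + memoryOverhead.
# YARN may round requests up, so treat these numbers as lower bounds.

executor_heap_mb=2048        # --executor-memory 2g
executor_overhead_mb=200     # spark.yarn.executor.memoryOverhead
num_executors=2              # SPARK_EXECUTOR_INSTANCES in spark-env.sh
yarn_max_container_mb=2500   # yarn.scheduler.maximum-allocation-mb
yarn_node_mb=7800            # yarn.nodemanager.resource.memory-mb

# Hypothetical helper: size of one container request (heap + overhead).
container_mb() {
  echo $(( $1 + $2 ))
}

per_executor_mb=$(container_mb "$executor_heap_mb" "$executor_overhead_mb")
all_executors_mb=$(( per_executor_mb * num_executors ))

echo "per-executor container: ${per_executor_mb} MB (max container ${yarn_max_container_mb} MB)"
echo "all executors:          ${all_executors_mb} MB (node total ${yarn_node_mb} MB)"

if [ "$per_executor_mb" -le "$yarn_max_container_mb" ]; then
  echo "executor request fits in one container"
else
  echo "executor request exceeds the maximum container size"
fi
```

With the values from this post, each executor asks for 2248 MB, which is under the 2500 MB container limit, and two executors together ask for 4496 MB of the node's 7800 MB.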

phargis · ‎08-16-2016

Apache Zeppelin (version 0.6.0) includes the ability to securely authenticate users and require logins. It uses the Apache Shiro security framework to accomplish this. Note: prior versions of Zeppelin did not force users to log in.

After launching the HDP 2.5 Tech Preview Sandbox on a virtual machine, make sure the Zeppelin service is up and running via Ambari. Next, open the Zeppelin UI either by clicking on:

    Services (tab) -> Zeppelin notebook (left-hand panel) -> Quick Links (tab) -> "Zeppelin UI" (button)

or just by opening a browser at:

    http://sandbox.hortonworks.com:9995/ (or http://127.0.0.1:9995/)

The Zeppelin welcome page should show in the browser, and you should notice a "Login" button in the upper right-hand corner. Clicking it brings up a pop-up window with text entries for username and password. Enter one of the username/password pairs below (these are the defaults listed in the "shiro.ini" file located in the "conf" sub-directory of Zeppelin).

Username/Password pairs:

    admin/password1
    user1/password2
    user2/password3
    user3/password4

If you want to change these passwords or add more users, you can use the "Credentials" tab of the Zeppelin notebook to create additional usernames.

After entering the credentials, you will be logged in and the existing notebooks will display on the left-hand side of the Zeppelin screen. If you enter the wrong username or password, you will be directed back to the Welcome page.

FYI: For more information about Zeppelin security, see this link: https://github.com/apache/zeppelin/blob/master/SECURITY-README.md

FYI: For more detailed information about Apache Shiro configuration options, see this link: http://shiro.apache.org/configuration.html#Configuration-INISections
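To see which usernames a given shiro.ini defines, a small shell helper along these lines can print the keys in its [users] section. This is a minimal sketch: the function name list_shiro_users and the default file path are assumptions for illustration, so point SHIRO_INI at wherever the Zeppelin "conf" directory lives on your system.

```shell
#!/usr/bin/env bash
# Print the usernames defined in the [users] section of a shiro.ini file.
# Usage: list_shiro_users /path/to/shiro.ini
list_shiro_users() {
  awk '
    # A line starting with "[" opens a new section; remember if it is [users].
    /^\[/ { in_users = ($0 == "[users]"); next }
    # Inside [users], take "name = value" lines and skip comments and blanks.
    in_users && /^[^#; \t]/ && index($0, "=") > 0 {
      split($0, kv, "=")
      gsub(/[ \t]/, "", kv[1])
      print kv[1]
    }
  ' "$1"
}

# Assumed location (hypothetical); override with the SHIRO_INI variable.
SHIRO_INI="${SHIRO_INI:-/usr/hdp/current/zeppelin-server/conf/shiro.ini}"
if [ -r "$SHIRO_INI" ]; then
  list_shiro_users "$SHIRO_INI"
fi
```

The helper only reads the file, so it is safe to run before and after editing shiro.ini to confirm which accounts are present.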

phargis · ‎08-09-2016

Just a few months ago, the Apache Storm project announced release 1.0. The bullet points below summarize the new features available. For more detailed descriptions, you can read the full release notes: http://storm.apache.org/2016/04/12/storm100-released.html

Apache Storm 1.0 Release

Apache Storm 1.0 is "up to 16 times faster than previous versions, with latency reduced up to 60%."

- Pacemaker (Heartbeat Server): Pacemaker is an optional Storm daemon designed to process heartbeats from workers. It overcomes the scaling problems of ZooKeeper.
- Distributed Cache API: Files in the distributed cache can be updated at any time from the command line, without the need to redeploy a topology.
- HA Nimbus: Multiple instances of the Nimbus service run in a cluster and perform leader election when a Nimbus node fails.
- Native Streaming Window API: Storm has support for sliding and tumbling windows based on time duration and/or event count.
- Automatic Backpressure: Storm now has an automatic backpressure mechanism based on configurable high/low watermarks expressed as a percentage of a task's buffer size.
- Resource Aware Scheduler: The new resource aware scheduler (AKA "RAS Scheduler") allows users to specify the memory and CPU requirements for individual topology components.

Storm also makes it easier to debug, with:

- Dynamic Log Levels
- Tuple Sampling and Debugging
- Dynamic Worker Profiling

phargis · ‎06-27-2016

The referenced JIRA above is now resolved. I have successfully tested the new version of the Hive ODBC Driver on Mac OS X version 10.11 (El Capitan). However, please note that you must install the new Hive ODBC driver version 2.1.2, as shown through the iODBC Administration tool.

Please also note that the location of the driver file has changed. Here is the new odbcinst.ini file (stored in ~/.odbcinst.ini), showing the old driver location commented out and the new driver location below it:

    [ODBC Drivers]
    Hortonworks Hive ODBC Driver=Installed

    [Hortonworks Hive ODBC Driver]
    Description=Hortonworks Hive ODBC Driver
    ; old driver location
    ; Driver=/usr/lib/hive/lib/native/universal/libhortonworkshiveodbc.dylib
    ; new driver location below
    Driver=/opt/hortonworks/hiveodbc/lib/universal/libhortonworkshiveodbc.dylib
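Since the driver moved, it can help to confirm that the uncommented Driver= entry in odbcinst.ini points at a file that actually exists. The sketch below does that. The function name active_driver_paths and the ODBCINST_INI override are assumptions made for illustration, and it expects entries written exactly as "Driver=" at the start of a line, as in the file above.

```shell
#!/usr/bin/env bash
# Print the active (uncommented) Driver= path(s) from an odbcinst.ini file.
# Commented-out entries begin with ";" and therefore do not match.
active_driver_paths() {
  sed -n 's/^Driver=//p' "$1"
}

# Default to the per-user file mentioned above; override with ODBCINST_INI.
ODBCINST_INI="${ODBCINST_INI:-$HOME/.odbcinst.ini}"
if [ -r "$ODBCINST_INI" ]; then
  active_driver_paths "$ODBCINST_INI" | while IFS= read -r path; do
    if [ -f "$path" ]; then
      echo "found:   $path"
    else
      echo "missing: $path"
    fi
  done
fi
```

A "missing" line would suggest the file still references the old driver location or that the new driver package is not installed.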

phargis · ‎03-31-2016

One caveat: in case you reboot (reset) your VM/Sandbox, you should enable the 'ntpd' daemon to start on bootup. I had trouble with GetTwitter as mentioned in the post above, even after following the steps to add ntpd and enable it, because in the meantime I had to reboot, which turned it off. To enable it on system bootup, run this command:

    chkconfig ntpd on

To confirm it was effective, run this command and check that 'ntpd' is enabled in runlevels 2, 3, 4 and 5:

    chkconfig --list | grep ntpd
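The runlevel check can also be scripted. The sketch below is an illustration only: the function name enabled_in_runlevels_2_to_5 is made up, and it assumes the usual SysV-style output of chkconfig, where each line lists entries such as "2:on" in ascending runlevel order.

```shell
#!/usr/bin/env bash
# Succeeds if a "chkconfig --list" line shows the service on in runlevels 2-5.
enabled_in_runlevels_2_to_5() {
  case "$1" in
    *2:on*3:on*4:on*5:on*) return 0 ;;
    *) return 1 ;;
  esac
}

# Only query chkconfig where it is installed (e.g. on the Sandbox VM).
if command -v chkconfig >/dev/null 2>&1; then
  line=$(chkconfig --list ntpd 2>/dev/null)
  if enabled_in_runlevels_2_to_5 "$line"; then
    echo "ntpd is enabled on boot"
  else
    echo "ntpd is NOT enabled on boot; run: chkconfig ntpd on"
  fi
fi
```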

Last Visited	‎10-04-2016 10:20 PM

Member Since	‎09-24-2015 01:55 PM
Posts	98
Kudos received	70


How to install and run Spark 2.0 on HDP 2.5 Sandbo...

How do I login to Zeppelin when Security is enable...

What enhancements does the Apache Storm Release 1....

Re: Hive ODBC Driver on OSX 10.11 (El Capitan)

Re: Sample HDF/NiFi flow to Push Tweets into Solr/...