Member since: 08-13-2019
Posts: 47
Kudos Received: 39
Solutions: 3
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2372 | 11-06-2016 06:27 PM |
| | 8673 | 10-03-2016 06:01 PM |
| | 2735 | 03-17-2016 02:21 AM |
04-21-2017
06:20 PM
Does it work in Chrome or Firefox?
11-06-2016
06:27 PM
Hi Anindya, generally all the tutorials and corresponding labs target the latest Sandbox, currently HDP 2.5. Two options:
1. Download the latest HDP 2.5 Sandbox.
2. Use a similar notebook in the main Zeppelin notebook list: Lab 201: Intro to Machine Learning with Spark. You can find other Zeppelin notebooks here: https://github.com/hortonworks-gallery/zeppelin-notebooks/tree/master
In the future each version of the Sandbox will have a corresponding branch; e.g., there's an HDP 2.5 branch now, so there won't be compatibility issues down the road with newer versions of Zeppelin on older Sandboxes.
10-03-2016
06:01 PM
2 Kudos
Make sure you are running the latest HDP 2.5 Sandbox. I've just tested it and had no "prefix not found"-related issues.
06-12-2016
01:59 AM
1 Kudo
Updated the tutorial:
1) use centos-release-scl
2) wget https://bootstrap.pypa.io/ez_setup.py
Thanks!
06-09-2016
08:03 PM
2 Kudos
Check out the latest blog on the HBase connector: http://hortonworks.com/blog/spark-hbase-dataframe-based-hbase-connector/
03-17-2016
02:21 AM
1 Kudo
Sridhar, as long as you're using Spark 1.6, I'd refer to https://spark.apache.org/docs/1.6.1/sql-programming-guide.html
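For quick orientation, here is a minimal sketch of the entry point that guide describes, assuming a Spark 1.6 shell or Zeppelin session where sc (the SparkContext) is already defined; it is illustrative rather than copied from the guide:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)   // entry point for DataFrames and Spark SQL in 1.6
import sqlContext.implicits._         // enables rdd.toDF() and the $"col" syntax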
03-16-2016
11:47 PM
1 Kudo
Hi Sridhar, can you post what version of Spark you are running and a link to the documentation you're referring to?
03-05-2016
12:32 AM
4 Kudos
Requirements: an HDP 2.3.x cluster, whether it is a multi-node cluster or a single-node HDP Sandbox.
Installing: The Spark 1.6 Technical Preview is provided in RPM and DEB package formats. The following instructions assume RPM packaging:
Download the Spark 1.6 RPM repository:
wget -nv http://private-repo-1.hortonworks.com/HDP/centos6/2.x/updates/2.3.4.1-10/hdp.repo -O /etc/yum.repos.d/HDP-TP.repo
For installing on Ubuntu, use the following repository instead:
http://private-repo-1.hortonworks.com/HDP/ubuntu12/2.x/updates/2.3.4.1-10/hdp.list
Install the Spark package:
Download the Spark 1.6 RPM (and pySpark, if desired) and set it up on your HDP 2.3 cluster:
yum install spark_2_3_4_1_10-master -y
If you want to use pySpark, install it as follows and make sure that Python is installed on all nodes:
yum install spark_2_3_4_1_10-python -y
The RPM installer will also download core Hadoop dependencies. It will create "spark" as an OS user, and it will create the /user/spark directory in HDFS.
Set JAVA_HOME and SPARK_HOME:
Make sure that you set JAVA_HOME before you launch the Spark shell or thrift server:
export JAVA_HOME=<path to JDK 1.8>
The Spark install creates the directory where the Spark binaries are unpacked (/usr/hdp/2.3.4.1-10/spark). Set the SPARK_HOME variable to this directory:
export SPARK_HOME=/usr/hdp/2.3.4.1-10/spark/
Create hive-site.xml in the Spark conf directory:
As user root, create the file SPARK_HOME/conf/hive-site.xml. Edit the file to contain only the following configuration setting:
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <!-- Make sure that <value> points to the Hive Metastore URI in your cluster -->
    <value>thrift://sandbox.hortonworks.com:9083</value>
    <description>URI for client to contact metastore server</description>
  </property>
</configuration>
Run the Spark Pi Example
To test compute-intensive tasks in Spark, the Pi example estimates pi by "throwing darts" at a circle: it generates random points in the unit square ((0,0) to (1,1)) and counts how many fall within the unit circle inscribed in that square. The fraction of points inside the circle approximates pi/4, which is multiplied by 4 to estimate pi.
Change to your Spark directory and switch to the spark OS user:
cd $SPARK_HOME
su spark
Run the Spark Pi example in yarn-client mode:
./bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn-client --num-executors 3 --driver-memory 512m --executor-memory 512m --executor-cores 1 lib/spark-examples*.jar 10
Note: The Pi job should complete without any failure messages. It should produce output similar to the following. Note the value of pi near the end of the output.
15/12/16 13:21:05 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 4.313782 s
Pi is roughly 3.139492
15/12/16 13:21:05 INFO ContextHandler: stopped o.s.j.s.ServletContextHandler{/metrics/json,null}
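For reference, here is a minimal sketch of the same "dart throwing" estimate that you can paste into spark-shell; it mirrors the idea behind the bundled SparkPi example but is not the shipped source, the sample count is just an illustrative choice, and sc is the SparkContext the shell provides:
val samples = 100000                                   // illustrative sample count
val inside = sc.parallelize(1 to samples).map { _ =>
  val x = math.random                                  // random point in the unit square (0,0)-(1,1)
  val y = math.random
  if (x * x + y * y <= 1) 1 else 0                     // 1 if the point falls inside the quarter circle
}.reduce(_ + _)
println("Pi is roughly " + 4.0 * inside / samples)     // fraction inside ~ pi/4, so multiply by 4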
03-05-2016
12:32 AM
2 Kudos
Copy, paste and run the following code: val data = Array(1, 2, 3, 4, 5) // create Array of Integers
val dataRDD = sc.parallelize(data) // create an RDD
val dataDF = dataRDD.toDF() // convert RDD to DataFrame
dataDF.write.parquet("data.parquet") // write to parquet
val newDataDF = sqlContext.read.parquet("data.parquet") // read back parquet to DF
newDataDF.show() // show contents
If you run this code in a Zeppelin notebook, you will see the following output:
data: Array[Int] = Array(1, 2, 3, 4, 5)
dataRDD: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[0] at parallelize at <console>:31
dataDF: org.apache.spark.sql.DataFrame = [_1: int]
newDataDF: org.apache.spark.sql.DataFrame = [_1: int]
+---+
| _1|
+---+
| 1|
| 2|
| 3|
| 4|
| 5|
+---+
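As a side note, the column gets the default name _1. A small variation on the same session gives it a readable name; the column name "value" and the path "data_named.parquet" below are just example choices:
val namedDF = dataRDD.toDF("value")                           // name the single column instead of the default _1
namedDF.printSchema()                                         // shows the column as "value"
namedDF.write.mode("overwrite").parquet("data_named.parquet") // overwrite makes the cell safe to re-run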
03-03-2016
03:14 AM
Grab the latest HDP 2.4 Sandbox. It comes with Spark 1.6, and the Python interpreter works in Zeppelin.
Also, see hortonworks.com/hadoop-tutorial/hands-on-tour-of-apache-spark-in-5-minutes/ where the pyspark interpreter is used.