Member since: 02-23-2019
Posts: 29
Kudos Received: 2
Solutions: 1

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 3258 | 07-08-2016 02:32 AM |
10-08-2018
07:47 AM
It has been a while and I believe you already got it working. But for those who want to know how to do it, I am going to show you how I did it. Before I get started, I want to mention that I found the informative link by googling "setup Jupyter notebook at Hortonworks sandbox". Based on that link, with some minor changes, I got it working.
####========================================================
### login as root
####========================================================
sandbox-version
== Sandbox Information ==
Platform: hdp-security
Build date: 06-18-2018
Ambari version: 2.6.2.0-155
Hadoop version: Hadoop 2.7.3.2.6.5.0-292
OS: CentOS Linux release 7.5.1804 (Core)
====
####========================================================
### Install Jupyter Dependencies
####========================================================
pip install --ignore-installed pyparsing
yum install epel-release
sudo wget https://bootstrap.pypa.io/ez_setup.py -O - | python
sudo yum install python-pip python-wheel python-devel gcc
pip install --upgrade pip
pip install --upgrade pip wheel pandas numpy scipy scikit-learn matplotlib virtualenv
####========================================================
### Install Jupyter
####========================================================
pip install jupyter
####========================================================
### Setup folders and files
####========================================================
jupyter notebook --generate-config
sudo mkdir -p /ibm/conf /ibm/scripts
sudo chown -R spark:hadoop /ibm
cp ~/.jupyter/jupyter_notebook_config.py /ibm/conf/
####========================================================
### Setup startup shell script
####========================================================
vi /ibm/scripts/start_jupyter.sh
# copy and paste the following contents
#!/bin/bash
set -x
USER=$1
JUPYTER_HOST=sandbox-hdp.hortonworks.com
JUPYTER_PORT=8889
su - ${USER} << EOF
export SPARK_HOME=/usr/hdp/current/spark-client
export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"
export HADOOP_HOME=/usr/hdp/current/hadoop-client
export HADOOP_CONF_DIR=/usr/hdp/current/hadoop-client/conf
export PYTHONPATH="/usr/hdp/current/spark-client/python:/usr/hdp/current/spark-client/python/lib/py4j-0.9-src.zip"
export PYTHONSTARTUP=/usr/hdp/current/spark-client/python/pyspark/shell.py
export PYSPARK_SUBMIT_ARGS="--master yarn-client pyspark-shell"
echo "Starting Jupyter daemon on HDP Cluster ..."
jupyter notebook --config=/ibm/conf/jupyter_notebook_config.py --ip=${JUPYTER_HOST} --port=${JUPYTER_PORT}&
EOF
exit 0
####========================================================
### Run startup shell script
####========================================================
chown -R spark:hadoop /ibm
chmod 777 /ibm/scripts/start_jupyter.sh
cd /ibm/scripts
./start_jupyter.sh spark
####========================================================
### Copy the link from above step's output and paste to your computer's browser
####========================================================
# make sure you define sandbox.hortonworks.com in your hosts file
http://sandbox.hortonworks.com:8889/?token=c982c0f95222abcf2900e3aeb9d9c59cc0386cc04c6c154d
# Test in Jupyter.
06-29-2017
07:28 AM
Hi Rahul Pathak, I followed your instructions to set up Cassandra on my Hortonworks Sandbox (HDP_2.6_vmware_19_04_2017_20_25_43_hdp_ambari_2_5_0_5_1) and got an error - Connection failed: [Errno 111] Connection refused to sandbox.hortonworks.com:7000. These are the steps I took (logged in as root):
1) Added the file datastax.repo:
vi /etc/yum.repos.d/datastax.repo
[datastax]
name = DataStax Repo for Apache Cassandra
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0
2) Installed Python requests:
easy_install-2.6 pip
pip install requests
3) Downloaded the Cassandra service folder:
VERSION=`hdp-select status hadoop-client | sed 's/hadoop-client - \([0-9]\.[0-9]\).*/\1/'`
git clone https://github.com/Symantec/ambari-cassandra-service.git /var/lib/ambari-server/resources/stacks/HDP/$VERSION/services/CASSANDRA
4) Restarted Ambari:
service ambari restart
5) Configured Cassandra: chose 'Add Service' from the 'Actions' dropdown menu in the bottom left of the Ambari dashboard and set seed_provider_parameters_seeds to "sandbox"
6) Restarted the VM and restarted the services
7) Got an error in Cassandra, see attached pictures.
Please shed some light on this issue. Thank you in advance for your reply.
08-07-2016
05:39 PM
I found this link: How to run spark job to interact with secured HBase cluster (https://community.hortonworks.com/articles/48988/how-to-run-spark-job-to-interact-with-secured-hbas.html), followed the instructions to set up and run the smoketest, and got the error: Exception in thread "main" java.io.FileNotFoundException: File file:/usr/hdp/current/hbase-client/lib/guava*.jar does not exist. I found the command and example for my original question but it needs some final touch. Can anybody shed some light on it? I checked my VM HDP_2.4_vmware_v3 and the jar file /usr/hdp/current/hbase-client/lib/guava-12.0.1.jar is there.
./bin/spark-submit --class org.apache.spark.examples.HBaseTest --master yarn-cluster --num-executors 2 --driver-memory 512m --executor-memory 512m --executor-cores 1 --jars /usr/hdp/current/hbase-client/lib/hbase-client.jar,/usr/hdp/current/hbase-client/lib/hbase-common.jar,/usr/hdp/current/hbase-client/lib/hbase-server.jar,/usr/hdp/current/hbase-client/lib/guava*.jar,/usr/hdp/current/hbase-client/lib/hbase-protocol.jar,/usr/hdp/current/hbase-client/lib/htrace-core*.jar --files conf/hbase-site.xml ./lib/spark-examples*.jar ambarismoketest
16/08/07 16:22:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/07 16:22:13 INFO TimelineClientImpl: Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
16/08/07 16:22:13 INFO RMProxy: Connecting to ResourceManager at sandbox.hortonworks.com/192.168.132.140:8050
16/08/07 16:22:14 WARN DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
16/08/07 16:22:15 INFO Client: Requesting a new application from cluster with 1 NodeManagers
16/08/07 16:22:15 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (2250 MB per container)
16/08/07 16:22:15 INFO Client: Will allocate AM container, with 896 MB memory including 384 MB overhead
16/08/07 16:22:15 INFO Client: Setting up container launch context for our AM
16/08/07 16:22:15 INFO Client: Setting up the launch environment for our AM container
16/08/07 16:22:15 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/08/07 16:22:15 INFO Client: Preparing resources for our AM container
16/08/07 16:22:15 INFO Client: Using the spark assembly jar on HDFS because you are using HDP, defaultSparkAssembly:hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/08/07 16:22:15 INFO Client: Source and destination file systems are the same. Not copying hdfs://sandbox.hortonworks.com:8020/hdp/apps/2.4.0.0-169/spark/spark-hdp-assembly.jar
16/08/07 16:22:15 INFO Client: Uploading resource file:/usr/hdp/2.4.0.0-169/spark/lib/spark-examples-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1470585857897_0001/spark-examples-1.6.0.2.4.0.0-169-hadoop2.7.1.2.4.0.0-169.jar
16/08/07 16:22:18 INFO Client: Uploading resource file:/usr/hdp/current/hbase-client/lib/hbase-client.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1470585857897_0001/hbase-client.jar
16/08/07 16:22:18 INFO Client: Uploading resource file:/usr/hdp/current/hbase-client/lib/hbase-common.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1470585857897_0001/hbase-common.jar
16/08/07 16:22:18 INFO Client: Uploading resource file:/usr/hdp/current/hbase-client/lib/hbase-server.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1470585857897_0001/hbase-server.jar
16/08/07 16:22:18 INFO Client: Uploading resource file:/usr/hdp/current/hbase-client/lib/guava*.jar -> hdfs://sandbox.hortonworks.com:8020/user/root/.sparkStaging/application_1470585857897_0001/guava*.jar
16/08/07 16:22:18 INFO Client: Deleting staging directory .sparkStaging/application_1470585857897_0001
Exception in thread "main" java.io.FileNotFoundException: File file:/usr/hdp/current/hbase-client/lib/guava*.jar does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:609)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:822)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:599)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:337)
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:289)
at org.apache.spark.deploy.yarn.Client.copyFileToRemote(Client.scala:317)
at org.apache.spark.deploy.yarn.Client.org$apache$spark$deploy$yarn$Client$distribute$1(Client.scala:407)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$6$anonfun$apply$3.apply(Client.scala:471)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$6$anonfun$apply$3.apply(Client.scala:470)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$6.apply(Client.scala:470)
at org.apache.spark.deploy.yarn.Client$anonfun$prepareLocalResources$6.apply(Client.scala:468)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.apache.spark.deploy.yarn.Client.prepareLocalResources(Client.scala:468)
at org.apache.spark.deploy.yarn.Client.createContainerLaunchContext(Client.scala:722)
at org.apache.spark.deploy.yarn.Client.submitApplication(Client.scala:142)
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1065)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1125)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
08-04-2016
03:50 AM
Yes, I tried it (80. Scala) but in vain; it's a good book though. Thanks.
08-03-2016
08:32 PM
Please show me the code to read an HBase table in two ways: a) the Spark/Scala REPL, and b) a Scala project with build.sbt. Let's say the table DDL and DML are as follows, from Tutorialspoint:
create 'emp', 'personal data', 'professional data'
put 'emp','1','personal data:name','raju'
put 'emp','1','personal data:city','hyderabad'
put 'emp','1','professional data:designation','manager'
put 'emp','1','professional data:salary','50000'
I have no problem doing it in Java (with Maven); however, I have difficulties doing it in Spark/Scala because the objects are not found. Can anybody shed some light on it? 09/07/16: I read "SPARK-ON-HBASE: DATAFRAME BASED HBASE CONNECTOR" (Github) and saw the parameters for running spark-shell. In addition, I referred to the example Scala code in 80.3 of the "Apache HBase Reference Guide" and was able to solve it. Please see the attachment if you want a quick start. Also, I am anxious to try the HBase Connector with HDP 2.5. Thank you. scala-hortonworks-community.zip
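For anyone wanting a quick start without downloading the attachment, the following is a minimal sketch of reading the 'emp' table from the Spark/Scala REPL using the classic newAPIHadoopRDD approach (a generic example, not necessarily the exact code in scala-hortonworks-community.zip). It assumes spark-shell was started with the HBase client jars passed via --jars and hbase-site.xml passed via --files:

import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

// Point the input format at the 'emp' table created by the DDL above
val conf = HBaseConfiguration.create()
conf.set(TableInputFormat.INPUT_TABLE, "emp")

// Read the table as an RDD of (row key, Result) pairs; sc is predefined in spark-shell
val hBaseRDD = sc.newAPIHadoopRDD(conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Pull one column out of each row, e.g. personal data:name
val names = hBaseRDD.map { case (_, result) =>
  Bytes.toString(result.getValue(Bytes.toBytes("personal data"), Bytes.toBytes("name")))
}
names.collect().foreach(println)
println(s"row count: ${hBaseRDD.count()}")

For a build.sbt project, roughly the same code should work with spark-core and the hbase-client, hbase-common, and hbase-server artifacts (matching the HDP versions) declared as library dependencies.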
Labels: Apache HBase, Apache Spark
07-09-2016
02:28 AM
I have tried 20160212 and 20090211, and only got it working with the Distributed cache method. I got the jar from the link below: https://mvnrepository.com/artifact/org.json/json Below is my command. How do I use -libjars?
hadoop jar driver-collection-1.0-SNAPSHOT.jar multipleoutputjson /test/inputjson/json_input.txt /test/outputjs
Thank you for your reply.
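A hedged sketch of the -libjars route (not from the original thread): -libjars is only honored when the driver parses Hadoop's generic options, which usually means running it through ToolRunner. Assuming a hypothetical driver object called MultipleOutputJsonDriver, it could look roughly like this in Scala:

import org.apache.hadoop.conf.{Configuration, Configured}
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
import org.apache.hadoop.util.{Tool, ToolRunner}

// Hypothetical driver: ToolRunner/GenericOptionsParser strips -libjars from
// the arguments and ships the listed jars with the job.
object MultipleOutputJsonDriver extends Configured with Tool {
  override def run(args: Array[String]): Int = {
    val job = Job.getInstance(getConf, "multipleoutputjson")
    job.setJarByClass(getClass)
    // mapper/reducer/output-format classes would be set here as in the original job
    FileInputFormat.addInputPath(job, new Path(args(0)))
    FileOutputFormat.setOutputPath(job, new Path(args(1)))
    if (job.waitForCompletion(true)) 0 else 1
  }

  def main(args: Array[String]): Unit =
    System.exit(ToolRunner.run(new Configuration(), MultipleOutputJsonDriver, args))
}

With a driver like that, the jar could be passed as a generic option ahead of the input/output paths, e.g.:
hadoop jar driver-collection-1.0-SNAPSHOT.jar multipleoutputjson -libjars json-20160212.jar /test/inputjson/json_input.txt /test/outputjson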
07-08-2016
02:32 AM
I used the method - "Distributed cache" from "Hadoop: Add third-party libraries to MapReduce job" and got it working. The following are my steps:
1. Copied json-20160212.jar to /user/root/lib/json-20160212.jar in HDFS.
2. Added the following code in the Driver class:
job.addCacheFile(new Path("/user/root/lib/json-20160212.jar").toUri());
job.setJarByClass(JSONObject.class);
3. Compiled the code and ran the test.
Still very anxious to learn how to solve it by using the following methods:
- Add the libjars option
- Add jar files to the Hadoop classpath
- Create a fat jar
Thank you.
07-07-2016
10:51 PM
1 Kudo
Command to submit the job:
hadoop jar driver-collection-1.0-SNAPSHOT.jar multipleoutputjson /test/inputjson/json_input.txt /test/outputjson
Error: java.lang.ClassNotFoundException: org.json.JSONObject, see the attached error-message.txt. I have the dependency below for org.json.JSONObject (filename: json-20160212.jar) in pom.xml:
<dependency>
  <groupId>org.json</groupId>
  <artifactId>json</artifactId>
  <version>20160212</version>
  <scope>compile</scope>
</dependency>
Code: The code is pretty much like this. I have tried the following links but in vain. Please advise your insight.
- Hadoop: Add third-party libraries to MapReduce job
- How-to: Include Third-Party Libraries in Your MapReduce Job
- Include Third Party Jars in Hadoop (include-third-party-jars-in-hadoop)
- Maven – Create a fat Jar file – One-JAR example
Labels: Apache Hadoop
06-05-2016
07:26 AM
Finally, I got it. This is what I've done to compile (mvn install -DskipTests) without any errors:
1) Open a command prompt as administrator
2) Change the repo link to: http://repo.hortonworks.com/content/groups/public
3) protoc.exe version 2.5.0 is required
4) msbuild.exe from Visual Studio 2010 is required
5) Install cmake.exe
These requirements should be mentioned in BUILDING.txt.