About dkozlowski

dkozlowski · ‎02-06-2018

@Lekya Goriparti Have a look at this: https://community.hortonworks.com/questions/26622/the-node-hbase-is-not-in-zookeeper-it-should-have.html

dkozlowski · ‎11-09-2017

1. Introduction

This article is an extension of the one created by @Dan Zaratsian - H2O on Livy.

2. Environment Details

Here are the details of the environment I tested this on:

HDP: 2.6.1
Ambari: 2.5.0.3
OS: 7.3.1611
python: 2.7.5

IMPORTANT NOTE: H2O requires python ver. 2.7+

3. Installing H2O

Go to the Zeppelin node and do the following:

$ mkdir /tmp/H2O
$ cd /tmp/H2O
$ wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.1/16/sparkling-water-2.1.16.zip
$ unzip sparkling-water-2.1.16.zip

4. Testing H2O from CLI

Go to the Zeppelin node where you downloaded and installed H2O:

$ export SPARK_HOME='/usr/hdp/current/spark2-client'
$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ export MASTER="yarn-client"
$ export SPARK_MAJOR_VERSION=2
$ cd /tmp/H2O/sparkling-water-2.1.16/bin
$ ./pysparkling
>>> from pysparkling import *
>>> hc = H2OContext.getOrCreate(spark)

My test:

[root@dkozlowski-dkhdp262 bin]# ./pysparkling
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/spark2/jars/spark-llap_2.11-1.1.3-2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/11/08 14:33:06 WARN HiveConf: HiveConf of name hive.llap.daemon.service.hosts does not exist
17/11/08 14:33:06 WARN HiveConf: HiveConf of name hive.llap.daemon.service.hosts does not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.1.2.6.1.0-129
      /_/

Using Python version 2.7.5 (default, Jun 17 2014 18:11:42)
SparkSession available as 'spark'.
>>> from pysparkling import *
>>> hc = H2OContext.getOrCreate(spark)
17/11/08 14:33:57 WARN H2OContext: Method H2OContext.getOrCreate with an argument of type SparkContext is deprecated and parameter of type SparkSession is preferred.
17/11/08 14:33:57 WARN InternalH2OBackend: Increasing 'spark.locality.wait' to value 30000
17/11/08 14:33:57 WARN InternalH2OBackend: Due to non-deterministic behavior of Spark broadcast-based joins
We recommend to disable them by configuring `spark.sql.autoBroadcastJoinThreshold` variable to value `-1`:
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
17/11/08 14:33:57 WARN InternalH2OBackend: The property 'spark.scheduler.minRegisteredResourcesRatio' is not specified!
We recommend to pass `--conf spark.scheduler.minRegisteredResourcesRatio=1`
Connecting to H2O server at http://172.26.110.84:54323. successful.
--------------------------  ---------------------------------------------------
H2O cluster uptime:         25 secs
H2O cluster version:        3.14.0.7
H2O cluster version age:    18 days
H2O cluster name:           sparkling-water-root_application_1507531306616_0032
H2O cluster total nodes:    2
H2O cluster free memory:    1.693 Gb
H2O cluster total cores:    8
H2O cluster allowed cores:  8
H2O cluster status:         accepting new members, healthy
H2O connection url:         http://172.26.110.84:54323
H2O connection proxy:
H2O internal security:      False
H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4
Python version:             2.7.5 final
--------------------------  ---------------------------------------------------

Sparkling Water Context:
 * H2O name: sparkling-water-root_application_1507531306616_0032
 * cluster size: 2
 * list of used nodes:
  (executorId, host, port)
  ------------------------
  (2,dkhdp262.openstacklocal,54321)
  (1,dkhdp263.openstacklocal,54321)
  ------------------------

  Open H2O Flow in browser: http://172.26.110.84:54323 (CMD + click in Mac OSX)

5. Zeppelin site

Before following the steps below, make sure that point 4 (Testing H2O from CLI) runs successfully.

a) Ambari UI

Ambari -> Zeppelin -> Configs -> Advanced zeppelin-env -> zeppelin_env_template

export SPARK_SUBMIT_OPTIONS="--files /tmp/H2O/sparkling-water-2.1.16/py/build/dist/h2o_pysparkling_2.1-2.1.16.zip"
export PYTHONPATH="/tmp/H2O/sparkling-water-2.1.16/py/build/dist/h2o_pysparkling_2.1-2.1.16.zip:${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"

b) Zeppelin UI

Zeppelin UI -> Interpreter -> Spark2 (H2O does not work with dynamicAllocation enabled)

spark.dynamicAllocation.enabled=<blank>
spark.shuffle.service.enabled=<blank>

c) Sample code

%pyspark
from pysparkling import *
hc = H2OContext.getOrCreate(spark)

%pyspark
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch
import sys
sys.stdout.isatty = lambda : False
sys.stdout.encoding = None
training_data = h2o.import_file("hdfs://dkhdp261:8020/tmp/test1.csv")
training_data.show()
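
The two zeppelin_env_template exports from step 5a differ only in the install directory and the Sparkling Water version, so they can be generated instead of typed by hand. Below is a minimal Python sketch, assuming the directory layout of the unzipped archive shown above. The helper name zeppelin_env_exports is hypothetical, and the py4j archive name is simply passed through, since it depends on the Spark build on your node.

```python
def zeppelin_env_exports(install_dir, sw_version, py4j_zip="py4j-0.8.2.1-src.zip"):
    """Build the two export lines for zeppelin_env_template (hypothetical helper).

    install_dir: directory where the Sparkling Water zip was unpacked, e.g. /tmp/H2O
    sw_version:  full Sparkling Water version, e.g. 2.1.16
    py4j_zip:    py4j archive name under ${SPARK_HOME}/python/lib (assumption:
                 check the actual file name on your Zeppelin node)
    """
    # "2.1.16" -> "2.1": the pysparkling zip name carries both forms.
    major_minor = ".".join(sw_version.split(".")[:2])
    zip_path = "{0}/sparkling-water-{1}/py/build/dist/h2o_pysparkling_{2}-{1}.zip".format(
        install_dir, sw_version, major_minor)
    return [
        'export SPARK_SUBMIT_OPTIONS="--files {0}"'.format(zip_path),
        'export PYTHONPATH="{0}:${{SPARK_HOME}}/python:${{SPARK_HOME}}/python/lib/{1}"'.format(
            zip_path, py4j_zip),
    ]


if __name__ == "__main__":
    for line in zeppelin_env_exports("/tmp/H2O", "2.1.16"):
        print(line)
```

Paste the printed lines into zeppelin_env_template and restart Zeppelin as described in 5a.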

dkozlowski · ‎09-08-2017

@Vijay Kiran Please do not raise issues within the article. Create a separate HCC question instead, providing the details of your JDBC interpreter as well as the Credentials. NOTE: The above was tested on HDP 2.6.1 only, and you are on 2.6.0.3, if I am right.

dkozlowski · ‎07-31-2017

Problem:

I have been trying to configure Zeppelin's JDBC interpreter to work with our Phoenix servers, but am getting an error when running queries. The JDBC interpreter works fine for Hive and MySQL.

Running this:

%jdbc(phoenix)
select * from <table>

I am getting:

org.apache.zeppelin.interpreter.InterpreterException: null
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=1, exceptions:
Thu Jul 20 08:27:49 BST 2017, RpcRetryingCaller{globalStartTime=1500535669736, pause=100, retries=1}, org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.io.IOException: Broken pipe
    at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:416)
    at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:564)
    at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:692)
    at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
    at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:489)
    at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
    at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Root cause:

This problem is very likely caused by the upgrade from HDP 2.5 to 2.6. After the upgrade, an environment variable was missing in zeppelin_env: ZEPPELIN_INTP_CLASSPATH_OVERRIDES.

Solution:

a)
- mv /etc/zeppelin/conf/interpreter.json /etc/zeppelin/conf/interpreter_bckup.json
- restart zeppelin

b)
- go to Ambari UI -> Zeppelin -> Configs -> Advanced zeppelin-env -> zeppelin-env_content
- add export ZEPPELIN_INTP_CLASSPATH_OVERRIDES="/etc/zeppelin/conf/external-dependency-conf" just above #### Spark interpreter configuration ####
- save the change and restart all required services
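
The edit in step b) is just a text insertion at a known marker, so it can be checked offline before pasting the result back into Ambari. Below is a minimal Python sketch, assuming you have copied the zeppelin-env content into a string. The function name add_classpath_override is hypothetical, and the fallback of appending the line at the end when the marker is missing is my own assumption, not something Ambari does.

```python
MARKER = "#### Spark interpreter configuration ####"
EXPORT = ('export ZEPPELIN_INTP_CLASSPATH_OVERRIDES='
          '"/etc/zeppelin/conf/external-dependency-conf"')


def add_classpath_override(env_content):
    """Insert the export line just above the Spark interpreter marker.

    Hypothetical helper: works on a copy of the zeppelin-env content and
    returns the edited text. It leaves the text untouched if the export
    line is already there.
    """
    lines = env_content.splitlines()
    if any(line.strip() == EXPORT for line in lines):
        return env_content  # nothing to do

    out = []
    inserted = False
    for line in lines:
        if not inserted and line.strip() == MARKER:
            out.append(EXPORT)  # goes directly above the marker
            inserted = True
        out.append(line)
    if not inserted:
        # Assumption: if the marker is missing, append at the end instead.
        out.append(EXPORT)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = "export JAVA_HOME=/usr/java/default\n" + MARKER + "\nexport SPARK_HOME=/usr/hdp/current/spark2-client\n"
    print(add_classpath_override(sample))
```

Running the function a second time on its own output changes nothing, which makes it safe to re-apply after later Ambari config edits.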

dkozlowski · ‎07-19-2017

PROBLEM

I have a non-kerberized cluster. I applied https://community.hortonworks.com/articles/81910/how-to-enable-user-impersonation-for-jdbc-interpre.html, however the JDBC interpreter is still not impersonated.

NOTE: From HDP 2.6.2 (Zeppelin 0.7.2), the JDBC interpreter on a non-kerberized cluster is impersonated by adding the following property in Zeppelin UI -> Interpreter -> JDBC config:

hive.proxy.user.property=hive.server2.proxy.user

SOLUTION

1. Go to Zeppelin UI -> Interpreter

The configuration for JDBC hive interpreter should look like

2. Edit JDBC interpreter

Remove the properties hive.user and hive.password and save the changes. So, now the configuration looks like

3. Go to Zeppelin UI -> Credential

Add the credentials for the user like

Entity: jdbc.jdbc
Username: <username>
Password: <password>

4. Run the query

Go to your notebook and run the JDBC query for hive. In the RM UI this query now runs as YOUR user.
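
As a rough illustration of steps 1 and 2, here is a minimal Python sketch that treats the interpreter properties as a plain dictionary. It does not call any Zeppelin API. The function name impersonation_properties is hypothetical, and the sample hive.driver and hive.url values are placeholders I made up for the example, not values from the article.

```python
def impersonation_properties(props, add_proxy_property=False):
    """Return a copy of the JDBC interpreter properties prepared for impersonation.

    Hypothetical helper operating on a plain dict:
    - drops hive.user and hive.password (step 2 of the solution)
    - optionally adds the proxy-user property from the NOTE (HDP 2.6.2 and later)
    """
    cleaned = {key: value for key, value in props.items()
               if key not in ("hive.user", "hive.password")}
    if add_proxy_property:
        cleaned["hive.proxy.user.property"] = "hive.server2.proxy.user"
    return cleaned


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    before = {
        "hive.driver": "org.apache.hive.jdbc.HiveDriver",
        "hive.url": "jdbc:hive2://<host>:<port>",
        "hive.user": "hive",
        "hive.password": "",
    }
    after = impersonation_properties(before, add_proxy_property=True)
    for key in sorted(after):
        print(key, "=", after[key])
```

The per-user credentials from step 3 are still entered in Zeppelin UI -> Credential. The sketch only covers which interpreter properties stay and which go.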

dkozlowski · ‎07-13-2017

@Miles Yao Good catch!!! Just updated. The phoenix jar is here to work with JDBC interpreter rather than spark.

dkozlowski · ‎06-05-2017

ENVIRONMENT

HDP-2.6.0.3
Ambari 2.5.0.3

SOLUTION

1. Install R on each DN

$ yum install R-devel libcurl-devel openssl-devel

2. Run on each DN

$ R
> install.packages("knitr")

3. Test R from CLI

[root@dghdp255 ~]# R -e "print(1+1)"

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> print(1+1)
[1] 2
>
>
[root@dghdp255 ~]#

4. Zeppelin UI

a) spark2 config

SPARK_HOME                                /usr/hdp/current/spark2-client/
args
master                                    yarn-client
spark.app.name                            Zeppelin
spark.cores.max
spark.executor.memory
spark.yarn.keytab                         /etc/security/keytabs/zeppelin.server.kerberos.keytab
spark.yarn.principal                      [email protected]
zeppelin.R.cmd                            R
zeppelin.R.image.width                    100%
zeppelin.R.knitr                          true
zeppelin.R.render.options                 out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F
zeppelin.dep.additionalRemoteRepository   spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo                    local-repo
zeppelin.interpreter.localRepo            /usr/hdp/current/zeppelin-server/local-repo/2CHXWU7YZ
zeppelin.pyspark.python                   python
zeppelin.spark.concurrentSQL              false
zeppelin.spark.importImplicit             true
zeppelin.spark.maxResult                  1000
zeppelin.spark.printREPLOutput            true
zeppelin.spark.sql.stacktrace             false
zeppelin.spark.useHiveContext             true

b) test R from Zeppelin UI

c) create a test CSV file on the OS (Zeppelin node)

[root@dghdp254 ~]# ls -lrt /tmp/test.csv
-rw-r--r--. 1 root root 1326 Jun 6 07:07 /tmp/test.csv

d) check reading the file from R CLI

[root@dghdp254 ~]# R

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> a<-read.csv("/tmp/test.csv")
> print(a)
[1] Test.File
<0 rows> (or 0-length row.names)
>

e) restart the spark2 interpreter and run the below

%spark2.r
a<-read.csv("/tmp/test.csv")
print(a)

dkozlowski · ‎05-26-2017

I. Environment details

- HDP 2.5.x
- HDP 2.6.x
- Kerberos enabled

II. Steps to follow

a) JDBC interpreter

Set the JDBC interpreter in Zeppelin UI like this:

- JDBC interpreter config

phoenix.driver                        org.apache.phoenix.jdbc.PhoenixDriver
phoenix.hbase.client.retries.number   1
phoenix.password
phoenix.url                           jdbc:phoenix:dkhdp262.openstacklocal,dkhdp261.openstacklocal,dkhdp263.openstacklocal:/hbase-secure
phoenix.user                          phoenixuser
zeppelin.jdbc.auth.type               KERBEROS
zeppelin.jdbc.keytab.location         /etc/security/keytabs/zeppelin.server.kerberos.keytab
zeppelin.jdbc.principal               [email protected]

- ARTIFACTS

/usr/hdp/current/phoenix-client/phoenix-client.jar

b) zeppelin notebook
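
The phoenix.url value above is the ZooKeeper quorum hosts joined with commas, followed by the HBase znode. Below is a minimal Python sketch that builds a URL in the same shape as the one in this article. The function name phoenix_jdbc_url is hypothetical, and the default znode /hbase-secure is taken from the example above, so check zookeeper.znode.parent on your own cluster.

```python
def phoenix_jdbc_url(zk_hosts, znode="/hbase-secure"):
    """Build a Phoenix JDBC URL in the form used in this article (hypothetical helper).

    zk_hosts: list of ZooKeeper quorum host names
    znode:    HBase parent znode (assumption: /hbase-secure, as in the example)
    """
    hosts = [host.strip() for host in zk_hosts if host.strip()]
    if not hosts:
        raise ValueError("at least one ZooKeeper host is required")
    if not znode.startswith("/"):
        raise ValueError("znode must start with '/'")
    return "jdbc:phoenix:{0}:{1}".format(",".join(hosts), znode)


if __name__ == "__main__":
    print(phoenix_jdbc_url([
        "dkhdp262.openstacklocal",
        "dkhdp261.openstacklocal",
        "dkhdp263.openstacklocal",
    ]))
```

The sketch leaves out the ZooKeeper port, as the URL in this article does.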

dkozlowski · ‎05-18-2017

ENVIRONMENT

This problem has been replicated and fixed on HDP 2.5.x

PROBLEM

I followed https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_zeppelin-component-guide/content/zepp-with-spark.html to add an additional jar for Livy: I copied the jar file into the /usr/hdp/<version>/livy/repl-jars folder and restarted the Livy service afterwards.

When running:

%livy
import de.xxx.statistik.program.model.LineModel

I am getting:

<console>:25: error: not found: value de
import de.xxx.statistik.program.model.LineModel

SOLUTION

Append the following to Ambari -> MapReduce2 -> mapreduce.application.classpath:

:$PWD/*

Before Change

$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure

After Change

$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:$PWD/*
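
Since the classpath value is one long colon-separated string, it is easy to add the new entry twice or to leave a stray colon behind. Below is a minimal Python sketch of the same change as a string operation. The function name append_classpath_entry is hypothetical, and the code only edits a copy of the value, so you still paste the result into Ambari yourself.

```python
def append_classpath_entry(classpath, entry="$PWD/*"):
    """Append an entry to a colon-separated classpath if it is not there yet.

    Hypothetical helper: splits on ':', drops empty segments left by stray
    colons, and adds the entry at the end only when it is missing.
    """
    entries = [segment for segment in classpath.split(":") if segment]
    if entry not in entries:
        entries.append(entry)
    return ":".join(entries)


if __name__ == "__main__":
    # Shortened value, for illustration only.
    before = ("$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:"
              "/etc/hadoop/conf/secure")
    print(append_classpath_entry(before))
```

Applying the function to the "Before Change" value above should give the "After Change" value, and applying it again leaves the value unchanged.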

dkozlowski · ‎05-10-2017

@azelmad zakaria As this is an article, raise a separate question in HCC, refer to this one and provide the full stack trace from your console

Last Visited	‎02-06-2018 06:34 AM

Member Since	‎03-25-2016 06:26 AM
Posts	142
Kudos received	48

Cloudera Community

Re: Looking for a sample python code for Spark-On-...

Running H2O Sparkling Water using Zeppelin (with p...

Re: How to enable user impersonation for JDBC inte...

Running jdbc intepreter with phoenix causes Broken...

How to enable user impersonation for JDBC interpre...

Re: Enable phoenix access from Zeppelin in secure ...

How to get spark2.r working with CSV file in Kerbe...

Enable phoenix access from Zeppelin in secure clus...

Added external package to livy causes "console:25"...

Re: Looking for a sample python code for Spark-On-...