Member since 03-25-2016 | 142 Posts | 48 Kudos Received | 7 Solutions
07-28-2020
11:31 PM
"I highly recommend skimming quickly over following slides, specially starting from slide 7. http://www.slideshare.net/Hadoop_Summit/w-235phall1pandey"
The slides are no longer available at this URL.
10-02-2019
06:18 AM
@lvazquez maybe you can run "kinit" directly to submit your user's credentials to your LDAP. I managed to authenticate users from AD while the cluster is Kerberized through a FreeIPA server. This is a sample command:
%sh
echo "password" | kinit foo@hortonworks.local
hdfs dfs -ls /
Found 12 items
drwxrwxrwt - yarn hadoop 0 2019-10-02 13:53 /app-logs
drwxr-xr-x - hdfs hdfs 0 2019-10-01 15:27 /apps
drwxr-xr-x - yarn hadoop 0 2019-10-01 14:06 /ats
drwxr-xr-x - hdfs hdfs 0 2019-10-01 14:08 /atsv2
drwxr-xr-x - hdfs hdfs 0 2019-10-01 14:06 /hdp
drwx------ - livy hdfs 0 2019-10-02 11:35 /livy2-recovery
drwxr-xr-x - mapred hdfs 0 2019-10-01 14:06 /mapred
drwxrwxrwx - mapred hadoop 0 2019-10-01 14:08 /mr-history
drwxrwxrwx - spark hadoop 0 2019-10-02 15:08 /spark2-history
drwxrwxrwx - hdfs hdfs 0 2019-10-01 15:31 /tmp
drwxr-xr-x - hdfs hdfs 0 2019-10-02 14:23 /user
drwxr-xr-x - hdfs hdfs 0 2019-10-01 15:14 /warehouse
I think this approach is really ugly, but at least it is possible. Do not forget to add the auth_to_local rules for your realms to your HDFS configuration (the hadoop.security.auth_to_local property, which normally lives in core-site):
RULE:[1:$1@$0](.*@HORTONWORKS.LOCAL)s/@.*//
RULE:[1:$1@$0](.*@IPA.HORTONWORKS.LOCAL)s/@.*//
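To see what these two rules do to a principal, here is a minimal Python sketch that mimics them for plain user@REALM principals. It is only an illustration under stated assumptions: Hadoop evaluates auth_to_local itself, the function name to_short_name is made up for this example, and multi-component principals such as service/host@REALM are deliberately not handled.

```python
import re

# Simplified emulation of rules of the form
#   RULE:[1:$1@$0](<filter>)s/<pattern>/<replacement>/
# Each tuple is (filter regex, sed pattern, sed replacement), copied from the rules above.
RULES = [
    (r".*@HORTONWORKS.LOCAL", r"@.*", ""),
    (r".*@IPA.HORTONWORKS.LOCAL", r"@.*", ""),
]


def to_short_name(principal):
    """Return the local user name for a user@REALM principal, or None if no rule matches."""
    # [1:...] rules only apply to principals with a single component (no "/").
    if principal.count("@") != 1 or "/" in principal:
        return None
    for filter_re, sed_pattern, replacement in RULES:
        # The filter has to match the whole "user@REALM" string.
        if re.fullmatch(filter_re, principal):
            # The sed expression strips everything from "@" onwards.
            return re.sub(sed_pattern, replacement, principal, count=1)
    return None


for name in ("foo@HORTONWORKS.LOCAL", "bar@IPA.HORTONWORKS.LOCAL", "foo@EXAMPLE.COM"):
    print(name, "->", to_short_name(name))
```

Principals from any other realm fall through both rules, which is why users from a realm without a matching rule cannot be mapped to a local account.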
11-09-2017
03:15 PM
1 Kudo
1. Introduction
This article is an extension of the one created by @Dan Zaratsian - H2O on Livy.
2. Environment Details
These are the details of the environment I tested on:
HDP: 2.6.1
Ambari: 2.5.0.3
OS: 7.3.1611
python: 2.7.5
IMPORTANT NOTE: H2O requires Python 2.7 or later.
3. Installing H2O
Go to the Zeppelin node and run the following:
$ mkdir /tmp/H2O
$ cd /tmp/H2O
$ wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.1/16/sparkling-water-2.1.16.zip
$ unzip sparkling-water-2.1.16.zip
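The same version string shows up in the download URL, the unpacked directory and the pysparkling zip used later in the Zeppelin configuration, so it is easy to mistype one of them. Below is a small Python sketch that derives all of them from one version. The helper name sparkling_water_paths is hypothetical, and the naming pattern is inferred only from the 2.1.16 paths in this article, so treat it as an assumption for any other release.

```python
def sparkling_water_paths(version, base_dir="/tmp/H2O"):
    """Derive the download URL and local paths for a Sparkling Water release.

    Assumption: other releases follow the same naming scheme as 2.1.16.
    """
    major, minor, patch = version.split(".")
    line = "{}.{}".format(major, minor)            # e.g. "2.1"
    name = "sparkling-water-{}".format(version)    # e.g. "sparkling-water-2.1.16"
    home = "{}/{}".format(base_dir, name)
    return {
        "url": "http://h2o-release.s3.amazonaws.com/sparkling-water/rel-{}/{}/{}.zip".format(
            line, patch, name),
        "home": home,
        "bin": home + "/bin",
        "pysparkling_zip": "{}/py/build/dist/h2o_pysparkling_{}-{}.zip".format(
            home, line, version),
    }


paths = sparkling_water_paths("2.1.16")
for key in sorted(paths):
    print("{}: {}".format(key, paths[key]))
```

For 2.1.16 this reproduces the wget URL above, the bin directory used in step 4 and the zip referenced in step 5.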
4. Testing H2O from CLI
Go to the Zeppelin node where you downloaded and installed H2O:
$ export SPARK_HOME='/usr/hdp/current/spark2-client'
$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ export MASTER="yarn-client"
$ export SPARK_MAJOR_VERSION=2
$ cd /tmp/H2O/sparkling-water-2.1.16/bin
$ ./pysparkling
>>> from pysparkling import *
>>> hc = H2OContext.getOrCreate(spark)
My test:
[root@dkozlowski-dkhdp262 bin]# ./pysparkling
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/spark2/jars/spark-llap_2.11-1.1.3-2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/11/08 14:33:06 WARN HiveConf: HiveConf of name hive.llap.daemon.service.hosts does not exist
17/11/08 14:33:06 WARN HiveConf: HiveConf of name hive.llap.daemon.service.hosts does not exist
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.1.2.6.1.0-129
      /_/
Using Python version 2.7.5 (default, Jun 17 2014 18:11:42)
SparkSession available as 'spark'.
>>> from pysparkling import *
>>> hc = H2OContext.getOrCreate(spark)
17/11/08 14:33:57 WARN H2OContext: Method H2OContext.getOrCreate with an argument of type SparkContext is deprecated and parameter of type SparkSession is preferred.
17/11/08 14:33:57 WARN InternalH2OBackend: Increasing 'spark.locality.wait' to value 30000
17/11/08 14:33:57 WARN InternalH2OBackend: Due to non-deterministic behavior of Spark broadcast-based joins
We recommend to disable them by
configuring `spark.sql.autoBroadcastJoinThreshold` variable to value `-1`:
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
17/11/08 14:33:57 WARN InternalH2OBackend: The property 'spark.scheduler.minRegisteredResourcesRatio' is not specified!
We recommend to pass `--conf spark.scheduler.minRegisteredResourcesRatio=1`
Connecting to H2O server at http://172.26.110.84:54323. successful.
-------------------------- ---------------------------------------------------
H2O cluster uptime: 25 secs
H2O cluster version: 3.14.0.7
H2O cluster version age: 18 days
H2O cluster name: sparkling-water-root_application_1507531306616_0032
H2O cluster total nodes: 2
H2O cluster free memory: 1.693 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: accepting new members, healthy
H2O connection url: http://172.26.110.84:54323
H2O connection proxy:
H2O internal security: False
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
Python version: 2.7.5 final
-------------------------- ---------------------------------------------------
Sparkling Water Context:
* H2O name: sparkling-water-root_application_1507531306616_0032
* cluster size: 2
* list of used nodes:
(executorId, host, port)
------------------------
(2,dkhdp262.openstacklocal,54321)
(1,dkhdp263.openstacklocal,54321)
------------------------
Open H2O Flow in browser: http://172.26.110.84:54323 (CMD + click in Mac OSX)
5. Zeppelin site
Before following the steps below, make sure that step 4 (Testing H2O from CLI) runs successfully.
a) Ambari UI
Ambari -> Zeppelin -> Configs -> Advanced zeppelin-env -> zeppelin_env_template
export SPARK_SUBMIT_OPTIONS="--files /tmp/H2O/sparkling-water-2.1.16/py/build/dist/h2o_pysparkling_2.1-2.1.16.zip"
export PYTHONPATH="/tmp/H2O/sparkling-water-2.1.16/py/build/dist/h2o_pysparkling_2.1-2.1.16.zip:${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
b) Zeppelin UI
Zeppelin UI -> Interpreter -> Spark2 (H2O does not work with dynamicAllocation enabled)
spark.dynamicAllocation.enabled=<blank>
spark.shuffle.service.enabled=<blank>
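If you keep the interpreter settings in a script or export them for review, a quick check like the following can flag the two properties before you start H2O. This is only a sketch: the function name and the plain dict of properties are assumptions for illustration, not a Zeppelin API.

```python
def dynamic_allocation_problems(props):
    """Return the watched interpreter properties that are still set to true.

    `props` is a plain dict of Spark interpreter properties (an assumption
    made for this example, not something Zeppelin hands you in this form).
    """
    watched = ("spark.dynamicAllocation.enabled", "spark.shuffle.service.enabled")
    problems = []
    for name in watched:
        # Blank or missing values are what the article recommends.
        value = str(props.get(name, "")).strip().lower()
        if value == "true":
            problems.append(name)
    return problems


settings = {"spark.dynamicAllocation.enabled": "true", "spark.shuffle.service.enabled": ""}
print(dynamic_allocation_problems(settings))
```

An empty list means both properties are blank or unset, as recommended above.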
c) Sample code
%pyspark
from pysparkling import *
hc = H2OContext.getOrCreate(spark)
%pyspark
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch
import sys
sys.stdout.isatty = lambda : False
sys.stdout.encoding = None
training_data = h2o.import_file("hdfs://dkhdp261:8020/tmp/test1.csv")
training_data.show()
07-31-2017
06:00 AM
Problem:
I have been trying to configure Zeppelin's JDBC interpreter to work with our Phoenix servers, but I get an error when running queries. The JDBC interpreter works fine for Hive and MySQL. Running this:
%jdbc(phoenix)
select * from <table>
I am getting:
org.apache.zeppelin.interpreter.InterpreterException: null
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=1, exceptions:
Thu Jul 20 08:27:49 BST 2017, RpcRetryingCaller{globalStartTime=1500535669736, pause=100, retries=1}, org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.io.IOException: Broken pipe
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:416)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:564)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:692)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:489)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Root cause:
This problem is very likely caused by the upgrade from HDP 2.5 to 2.6. An environment variable was missing in zeppelin_env: ZEPPELIN_INTP_CLASSPATH_OVERRIDES.
Solution:
a)
- mv /etc/zeppelin/conf/interpreter.json /etc/zeppelin/conf/interpreter_bckup.json
- restart Zeppelin
b)
- go to Ambari UI -> Zeppelin -> Configs -> Advanced zeppelin-env -> zeppelin-env_content
- add
export ZEPPELIN_INTP_CLASSPATH_OVERRIDES="/etc/zeppelin/conf/external-dependency-conf"
just above
#### Spark interpreter configuration ####
- save the change and restart all required services
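If you prefer to prepare the new zeppelin-env content outside the Ambari text box, step b) can be sketched in Python as below. This is an illustration only: it works on the template text as a string, the function name add_classpath_override is made up, and pasting the result back into Ambari is still a manual step.

```python
MARKER = "#### Spark interpreter configuration ####"
EXPORT_LINE = (
    'export ZEPPELIN_INTP_CLASSPATH_OVERRIDES='
    '"/etc/zeppelin/conf/external-dependency-conf"'
)


def add_classpath_override(env_content):
    """Insert EXPORT_LINE just above MARKER unless the variable is already exported."""
    lines = env_content.splitlines()
    already_set = any(
        line.strip().startswith("export ZEPPELIN_INTP_CLASSPATH_OVERRIDES=")
        for line in lines
    )
    if already_set:
        return env_content
    for index, line in enumerate(lines):
        if line.strip() == MARKER:
            lines.insert(index, EXPORT_LINE)
            return "\n".join(lines) + "\n"
    raise ValueError("marker line not found: " + MARKER)


sample = "export JAVA_HOME=/usr/java\n" + MARKER + "\nexport SPARK_HOME=/x\n"
print(add_classpath_override(sample))
```

Running it a second time on its own output changes nothing, so it is safe to reapply after an upgrade.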
07-26-2018
05:47 AM
To everyone following this article: make sure the property zeppelin.jdbc.auth.type in the JDBC interpreter is set to either SIMPLE or KERBEROS. In my case, impersonation did not work properly while the property was empty, so I changed the value to SIMPLE.
07-13-2017
03:06 PM
Hi Abraham, the Spark interpreter is not impersonated. Uncheck <User Impersonate>, restart the interpreter, and try again.
06-05-2017
01:13 PM
2 Kudos
ENVIRONMENT
HDP-2.6.0.3
Ambari 2.5.0.3
SOLUTION
1. Install R on each DN
$ yum install R-devel libcurl-devel openssl-devel
2. Run on each DN
$ R
> install.packages("knitr")
3. Test R from CLI
[root@dghdp255 ~]# R -e "print(1+1)"
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> print(1+1)
[1] 2
>
>
[root@dghdp255 ~]#
4. Zeppelin UI
a) spark2 config
SPARK_HOME /usr/hdp/current/spark2-client/
args
master yarn-client
spark.app.name Zeppelin
spark.cores.max
spark.executor.memory
spark.yarn.keytab /etc/security/keytabs/zeppelin.server.kerberos.keytab
spark.yarn.principal zeppelin-emeasupport@HWX.COM
zeppelin.R.cmd R
zeppelin.R.image.width 100%
zeppelin.R.knitr true
zeppelin.R.render.options out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F
zeppelin.dep.additionalRemoteRepository spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo local-repo
zeppelin.interpreter.localRepo /usr/hdp/current/zeppelin-server/local-repo/2CHXWU7YZ
zeppelin.pyspark.python python
zeppelin.spark.concurrentSQL false
zeppelin.spark.importImplicit true
zeppelin.spark.maxResult 1000
zeppelin.spark.printREPLOutput true
zeppelin.spark.sql.stacktrace false
zeppelin.spark.useHiveContext true
b) test R from Zeppelin UI
c) create a test CSV file on the OS (Zeppelin node)
[root@dghdp254 ~]# ls -lrt /tmp/test.csv
-rw-r--r--. 1 root root 1326 Jun 6 07:07 /tmp/test.csv
d) check reading the file from R CLI
[root@dghdp254 ~]# R
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> a<-read.csv("/tmp/test.csv")
> print(a)
[1] Test.File
<0 rows> (or 0-length row.names)
>
e) restart the spark2 interpreter and run the following
%spark2.r
a<-read.csv("/tmp/test.csv")
print(a)
07-13-2017
04:02 AM
@Miles Yao Good catch! Just updated. The Phoenix jar is there to work with the JDBC interpreter rather than with Spark.
05-18-2017
08:29 AM
ENVIRONMENT
This problem has been replicated and fixed on HDP 2.5.x.
PROBLEM
I followed https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.5.0/bk_zeppelin-component-guide/content/zepp-with-spark.html to add an additional jar for Livy: I copied the jar file into the /usr/hdp/<version>/livy/repl-jars folder and restarted the Livy service afterwards. When running:
%livy
import de.xxx.statistik.program.model.LineModel
I am getting:
<console>:25: error: not found: value de
import de.xxx.statistik.program.model.LineModel
SOLUTION
Appended :$PWD/* to mapreduce.application.classpath in Ambari -> MapReduce2.
Before Change
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure
After Change
$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:$PWD/mr-framework/hadoop/share/hadoop/tools/lib/*:/usr/hdp/${hdp.version}/hadoop/lib/hadoop-lzo-0.6.0.${hdp.version}.jar:/etc/hadoop/conf/secure:$PWD/*
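The only difference between the two values is the trailing :$PWD/* entry, which is hard to spot in such a long string. A minimal Python sketch of the same edit is below; the helper name append_classpath_entry is hypothetical and the shortened classpath in the example is only for readability.

```python
def append_classpath_entry(classpath, entry="$PWD/*"):
    """Append `entry` to a colon-separated classpath unless it is already present."""
    # Drop empty segments that a stray leading or trailing ":" would produce.
    parts = [part for part in classpath.split(":") if part]
    if entry not in parts:
        parts.append(entry)
    return ":".join(parts)


before = "$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:/etc/hadoop/conf/secure"
after = append_classpath_entry(before)
print(after)
```

Because the entry is only added when missing, applying the function to an already fixed value returns it unchanged.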