Member since: 03-25-2016
Posts: 142
Kudos Received: 48
Solutions: 7
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 3498 | 06-13-2017 05:15 AM |
 | 1057 | 05-16-2017 05:20 AM |
 | 833 | 03-06-2017 11:20 AM |
 | 3362 | 02-23-2017 06:59 AM |
 | 1752 | 02-20-2017 02:19 PM |
02-06-2018
06:35 AM
@Lekya Goriparti Have a look at this: https://community.hortonworks.com/questions/26622/the-node-hbase-is-not-in-zookeeper-it-should-have.html
11-09-2017
03:15 PM
1 Kudo
1. Introduction
This article is an extension of the one created by @Dan Zaratsian - H2O on Livy
2. Environment Details
Here are the environment details I tested this with:
HDP: 2.6.1
Ambari: 2.5.0.3
OS: 7.3.1611
Python: 2.7.5
IMPORTANT NOTE: H2O requires Python ver. 2.7+
3. Installing H2O
Go to the Zeppelin node and do the following:
$ mkdir /tmp/H2O
$ cd /tmp/H2O
$ wget http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.1/16/sparkling-water-2.1.16.zip
$ unzip sparkling-water-2.1.16.zip
4. Testing H2O from CLI
Go to the Zeppelin node where you downloaded and installed H2O:
$ export SPARK_HOME='/usr/hdp/current/spark2-client'
$ export HADOOP_CONF_DIR=/etc/hadoop/conf
$ export MASTER="yarn-client"
$ export SPARK_MAJOR_VERSION=2
$ cd /tmp/H2O/sparkling-water-2.1.16/bin
$ ./pysparkling
>>> from pysparkling import *
>>> hc = H2OContext.getOrCreate(spark)
My test:
[root@dkozlowski-dkhdp262 bin]# ./pysparkling
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.1.0-129/spark2/jars/spark-llap_2.11-1.1.3-2.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
17/11/08 14:33:06 WARN HiveConf: HiveConf of name hive.llap.daemon.service.hosts does not exist
17/11/08 14:33:06 WARN HiveConf: HiveConf of name hive.llap.daemon.service.hosts does not exist
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.1.1.2.6.1.0-129
/_/
Using Python version 2.7.5 (default, Jun 17 2014 18:11:42)
SparkSession available as 'spark'.
>>> from pysparkling import *
>>> hc = H2OContext.getOrCreate(spark)
17/11/08 14:33:57 WARN H2OContext: Method H2OContext.getOrCreate with an argument of type SparkContext is deprecated and parameter of type SparkSession is preferred.
17/11/08 14:33:57 WARN InternalH2OBackend: Increasing 'spark.locality.wait' to value 30000
17/11/08 14:33:57 WARN InternalH2OBackend: Due to non-deterministic behavior of Spark broadcast-based joins
We recommend to disable them by
configuring `spark.sql.autoBroadcastJoinThreshold` variable to value `-1`:
sqlContext.sql("SET spark.sql.autoBroadcastJoinThreshold=-1")
17/11/08 14:33:57 WARN InternalH2OBackend: The property 'spark.scheduler.minRegisteredResourcesRatio' is not specified!
We recommend to pass `--conf spark.scheduler.minRegisteredResourcesRatio=1`
Connecting to H2O server at http://172.26.110.84:54323. successful.
-------------------------- ---------------------------------------------------
H2O cluster uptime: 25 secs
H2O cluster version: 3.14.0.7
H2O cluster version age: 18 days
H2O cluster name: sparkling-water-root_application_1507531306616_0032
H2O cluster total nodes: 2
H2O cluster free memory: 1.693 Gb
H2O cluster total cores: 8
H2O cluster allowed cores: 8
H2O cluster status: accepting new members, healthy
H2O connection url: http://172.26.110.84:54323
H2O connection proxy:
H2O internal security: False
H2O API Extensions: XGBoost, Algos, AutoML, Core V3, Core V4
Python version: 2.7.5 final
-------------------------- ---------------------------------------------------
Sparkling Water Context:
* H2O name: sparkling-water-root_application_1507531306616_0032
* cluster size: 2
* list of used nodes:
(executorId, host, port)
------------------------
(2,dkhdp262.openstacklocal,54321)
(1,dkhdp263.openstacklocal,54321)
------------------------
Open H2O Flow in browser: http://172.26.110.84:54323 (CMD + click in Mac OSX)
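As an optional extra check from the same pysparkling shell - a minimal sketch, where the tiny DataFrame and its column names are made up purely for illustration - you can push a Spark DataFrame into H2O and back to confirm the context is wired up:
>>> df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])   # small illustrative DataFrame
>>> hf = hc.as_h2o_frame(df)           # Spark DataFrame -> H2OFrame
>>> hf.summary()
>>> df_back = hc.as_spark_frame(hf)    # H2OFrame -> Spark DataFrame
>>> df_back.show()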
5. Zeppelin site
Before following the steps below, ensure that point 4 (Testing H2O from CLI) runs successfully.
a) Ambari UI
Ambari -> Zeppelin -> Configs -> Advanced zeppelin-env -> zeppelin_env_template
export SPARK_SUBMIT_OPTIONS="--files /tmp/H2O/sparkling-water-2.1.16/py/build/dist/h2o_pysparkling_2.1-2.1.16.zip"
export PYTHONPATH="/tmp/H2O/sparkling-water-2.1.16/py/build/dist/h2o_pysparkling_2.1-2.1.16.zip:${SPARK_HOME}/python:${SPARK_HOME}/python/lib/py4j-0.8.2.1-src.zip"
b) Zeppelin UI
Zeppelin UI -> Interpreter -> Spark2 (H2O does not work with dynamicAllocation enabled):
spark.dynamicAllocation.enabled=<blank>
spark.shuffle.service.enabled=<blank>
c) Sample code
%pyspark
from pysparkling import *
hc = H2OContext.getOrCreate(spark)

%pyspark
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.grid.grid_search import H2OGridSearch
import sys
# Zeppelin's output stream is not a real terminal; these overrides keep h2o's console/progress output from failing
sys.stdout.isatty = lambda : False
sys.stdout.encoding = None
training_data = h2o.import_file("hdfs://dkhdp261:8020/tmp/test1.csv")
training_data.show()
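The estimator imports above are not used in this snippet; as an illustration only - the "label" target and the derived feature list are hypothetical placeholders, not columns of the article's test1.csv - a GBM model could then be trained like this:
%pyspark
# hypothetical continuation: replace "label" with a real target column from your data
predictors = [c for c in training_data.columns if c != "label"]
gbm = H2OGradientBoostingEstimator(ntrees=50, max_depth=5, seed=42)
gbm.train(x=predictors, y="label", training_frame=training_data)
print(gbm)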
09-08-2017
03:57 PM
@Vijay Kiran Please do not raise issues within the article. Instead, create a separate HCC question providing details of your JDBC interpreter as well as your Credentials configuration. NOTE: the above was tested on HDP 2.6.1 only, and you are on 2.6.0.3 if I am right.
07-31-2017
06:00 AM
Problem: I have been trying to configure Zeppelin's JDBC interpreter to work with our Phoenix servers, but am getting an error when running queries. The JDBC interpreter works fine for Hive and MySQL. Running this:
%jdbc(phoenix)
select * from <table>
I am getting:
org.apache.zeppelin.interpreter.InterpreterException: null
org.apache.phoenix.exception.PhoenixIOException: Failed after attempts=1, exceptions:
Thu Jul 20 08:27:49 BST 2017, RpcRetryingCaller{globalStartTime=1500535669736, pause=100, retries=1}, org.apache.hadoop.hbase.MasterNotRunningException: com.google.protobuf.ServiceException: java.io.IOException: Broken pipe
at org.apache.zeppelin.jdbc.JDBCInterpreter.getConnection(JDBCInterpreter.java:416)
at org.apache.zeppelin.jdbc.JDBCInterpreter.executeSql(JDBCInterpreter.java:564)
at org.apache.zeppelin.jdbc.JDBCInterpreter.interpret(JDBCInterpreter.java:692)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.interpret(LazyOpenInterpreter.java:94)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer$InterpretJob.jobRun(RemoteInterpreterServer.java:489)
at org.apache.zeppelin.scheduler.Job.run(Job.java:175)
at org.apache.zeppelin.scheduler.ParallelScheduler$JobRunner.run(ParallelScheduler.java:162)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Root cause: This problem is very likely caused by the upgrade from HDP 2.5 to 2.6. An environment variable is missing in zeppelin_env: ZEPPELIN_INTP_CLASSPATH_OVERRIDES.
Solution:
a)
- mv /etc/zeppelin/conf/interpreter.json /etc/zeppelin/conf/interpreter_bckup.json
- restart Zeppelin
b)
- go to Ambari UI -> Zeppelin -> Configs -> Advanced zeppelin-env -> zeppelin-env_content
- add
export ZEPPELIN_INTP_CLASSPATH_OVERRIDES="/etc/zeppelin/conf/external-dependency-conf"
just above
#### Spark interpreter configuration ####
- save the change and restart all required services
07-26-2017
07:25 AM
@Karan Alang Re-implement SSL by following exactly the steps described here: http://docs.confluent.io/2.0.0/kafka/ssl.html
07-26-2017
06:21 AM
@Karan Alang 1) After enabling debug logging, what can you see in the controller log file? 2) What steps did you follow to enable SSL for Kafka?
07-26-2017
05:30 AM
@Karan Alang For debugging, change the log4j.rootLogger parameter in /etc/kafka/conf/tools-log4j.properties to:
log4j.rootLogger=DEBUG, stderr
Also check whether the producer works fine for PLAINTEXT, like:
/usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh --broker-list <broker-node>:6667 --topic <topic> --security-protocol PLAINTEXT
For testing purposes, use only one broker node.
07-26-2017
04:10 AM
@Karan Alang Remove:
- ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1
- ssl.endpoint.identification.algorithm=HTTPS
- ssl.secure.random.implementation=SHA1PRNG
Add:
advertised.listeners=SSL://nwk2-bdp-kafka-04.gdcs-qa.apple.com:6668,PLAINTEXT://nwk2-bdp-kafka-04.gdcs-qa.apple.com:6667
In client-ssl.properties:
security.protocol=SASL_SSL
ssl.truststore.location=/tmp/ssl-kafka/server.truststore.jks
ssl.truststore.password=changeit
Run (if your cluster is non-Kerberized):
./kafka-console-producer.sh --broker-list nwk2-bdp-kafka-04.gdcs-qa.apple.com:6668 --topic <topic> --producer.config client-ssl.properties --security-protocol SSL
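As an extra sanity check (not part of the original steps - just a generic way to confirm the broker actually presents a certificate on the new SSL listener), you can probe port 6668 with openssl:
openssl s_client -connect nwk2-bdp-kafka-04.gdcs-qa.apple.com:6668 </dev/null
If the TLS handshake fails here, the problem is in the broker's SSL listener configuration rather than in the producer.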
07-25-2017
09:36 AM
@Karan Alang Can you share your server.properties for review?
07-19-2017
08:26 AM
1 Kudo
PROBLEM
I have a non-Kerberized cluster. I applied https://community.hortonworks.com/articles/81910/how-to-enable-user-impersonation-for-jdbc-interpre.html, however the JDBC interpreter is still not impersonated.
NOTE: From HDP 2.6.2 (Zeppelin 0.7.2), the JDBC interpreter on a non-Kerberized cluster is impersonated by adding the following property in Zeppelin UI -> Interpreter -> JDBC config:
hive.proxy.user.property=hive.server2.proxy.user
SOLUTION
1. Go to Zeppelin UI -> Interpreter
The configuration for the JDBC Hive interpreter should look like this:
2. Edit JDBC interpreter
Remove properties
hive.user and hive.password
and save the changes.
So now the configuration looks like this:
3. Go to Zeppelin UI -> Credential
Add the credentials for the user like
Entity: jdbc.jdbc
Username: <username>
Password: <password>
4. Run the query
Go to your notebook and run the JDBC query for Hive. In the RM UI this query is now running as YOU.
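For example, a minimal paragraph to verify this (the query itself is just an illustration, not from the original article):
%jdbc(hive)
show databases;
The corresponding YARN application in the RM UI should now show your Zeppelin login as the user instead of hive.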
07-14-2017
12:38 PM
@Edgar Daeds Thanks for the information. It is good to hear you have got this working.
07-13-2017
03:06 PM
Hi Abraham, the Spark interpreter is not impersonated. Uncheck <User Impersonate>, restart the interpreter and have another try.
07-13-2017
06:52 AM
@Edgar Daeds LDAP error 49, data 52e means your systemUsername and systemPassword are incorrect. Basically, when including the above parameter you need to provide systemUsername WITHOUT the domain name.
07-13-2017
04:02 AM
@Miles Yao Good catch! Just updated. The Phoenix jar is there to work with the JDBC interpreter rather than Spark.
07-12-2017
03:23 PM
@Edgar Daeds It is likely you are logging into Zeppelin as user1@MYDOMAIN.COM but access to the databases is granted to user1. If that is the case you would need to reconfigure your shiro_ini so that you get authenticated to Zeppelin as user1 WITHOUT the domain. The following property would do that for you:
activeDirectoryRealm.principalSuffix = @mydomain.com
I hope this helps.
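For illustration only - assuming your shiro_ini uses the ActiveDirectoryGroupRealm and keeping the rest of your [main] section as it already is - the relevant lines would look roughly like:
[main]
activeDirectoryRealm = org.apache.zeppelin.realm.ActiveDirectoryGroupRealm
activeDirectoryRealm.principalSuffix = @mydomain.com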
07-11-2017
03:03 AM
@Gaurav Mallikarjuna In the above example you can see that I used another method to connect to HiveServer2 - using the HiveServer2 node plus its port number, like:
$ beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin
Using admin is for my sample only. In your case, if your transport mode is binary and the cluster is NON-Kerberized:
$ beeline -u "jdbc:hive2://<hiveserver2-hostname>:10000/" -n <username>
07-10-2017
10:46 AM
@Gaurav Mallikarjuna I tested the same on my HDP 2.6.1 and could not see any issues:
[root@dkhdp262c6 ~]# beeline -u "jdbc:hive2://dkhdp263c6.openstacklocal:2181,dkhdp262c6.openstacklocal:2181,dkhdp261c6.openstacklocal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n admin
Connecting to jdbc:hive2://dkhdp263c6.openstacklocal:2181,dkhdp262c6.openstacklocal:2181,dkhdp261c6.openstacklocal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connected to: Apache Hive (version 1.2.1000.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://dkhdp263c6.openstacklocal:218> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (0.305 seconds)
0: jdbc:hive2://dkhdp263c6.openstacklocal:218>
This is a non-Kerberized environment though. One more thing, I have transport mode set to binary. What is yours? If your environment is also non-Kerberized and the Hive transport mode is binary, try the following:
beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin
The above uses the hostname where your HiveServer2 is installed plus its port number. Here is how this works on my end:
[root@dkhdp262c6 ~]# beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin
Connecting to jdbc:hive2://dkhdp261c6.openstacklocal:10000/
Connected to: Apache Hive (version 1.2.1000.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://dkhdp261c6.openstacklocal:100> show databases;
+----------------+--+
| database_name |
+----------------+--+
| default |
+----------------+--+
1 row selected (0.29 seconds)
0: jdbc:hive2://dkhdp261c6.openstacklocal:100>
06-30-2017
03:11 PM
Hi @dbalasundaran This error means that the systemUsername and systemPassword defined in shiro_ini do not match the AD ones. Follow this HCC article: https://community.hortonworks.com/articles/70392/how-to-configure-zeppelin-for-active-directory-use.html
06-25-2017
06:50 AM
@Jayadeep Jayaraman
- Make sure you also have the below proxyuser settings in core-site.xml:
hadoop.proxyuser.livy.hosts=*
hadoop.proxyuser.livy.groups=*
- Create a new livy interpreter and check whether this helps
06-18-2017
05:52 AM
1 Kudo
Hi @suyash soni Unfortunately, this feature has not yet been implemented. This will be available in Zeppelin 0.8 based on https://issues.apache.org/jira/browse/ZEPPELIN-2368.
06-13-2017
08:11 AM
Hi @Jayadeep Jayaraman That is great - thanks for letting me know
06-13-2017
05:59 AM
@Jayadeep Jayaraman It is good to hear the sample works. I have a feeling the problem may be with the way you created your original table. Hence, try another thing: point your code to the test_orc_t_string table, the one from my sample above, and check whether that works.
06-13-2017
05:32 AM
Hi @Jayadeep Jayaraman I have just done another test - treating the timestamp as a string. That works for me as well. See below:
beeline
> create table test_orc_t_string (b string,t timestamp) stored as ORC;
> insert into table test_orc_t_string values('a', '1969-06-19 06:57:26.485'),('b','1988-06-21 05:36:22.35');
> select * from test_orc_t_string;
+----------------------+--------------------------+--+
| test_orc_t_string.b | test_orc_t_string.t |
+----------------------+--------------------------+--+
| a | 1969-06-19 06:57:26.485 |
| b | 1988-06-21 05:36:22.35 |
+----------------------+--------------------------+--+
2 rows selected (0.128 seconds)
pyspark
>>> sqlContext.sql("select * from test_orc_t_string").show()
+---+--------------------+
| b| t|
+---+--------------------+
| a|1969-06-19 06:57:...|
| b|1988-06-21 05:36:...|
+---+--------------------+
Can you test the above at your site? Let me know how this works. Can you also send me the output of the below from beeline:
show create table test;
06-13-2017
05:15 AM
Hi @Jayadeep Jayaraman I have just tested the same in pyspark 2.1. That works fine on my site. See below:
beeline
0: jdbc:hive2://dkhdp262.openstacklocal:2181,> create table test_orc (b string,t timestamp) stored as ORC;
0: jdbc:hive2://dkhdp262.openstacklocal:2181,> select * from test_orc;
+-------------+------------------------+--+
| test_orc.b | test_orc.t |
+-------------+------------------------+--+
| a | 2017-06-13 05:02:23.0 |
| b | 2017-06-13 05:02:23.0 |
| c | 2017-06-13 05:02:23.0 |
| d | 2017-06-13 05:02:23.0 |
| e | 2017-06-13 05:02:23.0 |
| f | 2017-06-13 05:02:23.0 |
| g | 2017-06-13 05:02:23.0 |
| h | 2017-06-13 05:02:23.0 |
| i | 2017-06-13 05:02:23.0 |
| j | 2017-06-13 05:02:23.0 |
+-------------+------------------------+--+
10 rows selected (0.091 seconds)
pyspark
[root@dkhdp262 ~]# export SPARK_MAJOR_VERSION=2
[root@dkhdp262 ~]# pyspark
SPARK_MAJOR_VERSION is set to 2, using Spark2
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/__ / .__/\_,_/_/ /_/\_\ version 2.1.1.2.6.1.0-129
/_/
Using Python version 2.7.5 (default, Jun 17 2014 18:11:42)
SparkSession available as 'spark'.
>>> sqlContext.sql("select b, t from test_orc").show()
+---+--------------------+
| b| t|
+---+--------------------+
| a|2017-06-13 05:02:...|
| b|2017-06-13 05:02:...|
| c|2017-06-13 05:02:...|
| d|2017-06-13 05:02:...|
| e|2017-06-13 05:02:...|
| f|2017-06-13 05:02:...|
| g|2017-06-13 05:02:...|
| h|2017-06-13 05:02:...|
| i|2017-06-13 05:02:...|
| j|2017-06-13 05:02:...|
+---+--------------------+
Based on the error you have - is the timestamp value in your table a REAL timestamp? How did you insert it?
06-09-2017
06:49 AM
To read the file, it needs to be on the local OS filesystem rather than on HDFS. I hope that clarifies the process.
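For example (the path is shown only for illustration and mirrors the snippet used elsewhere in this thread):
%spark2.r
a <- read.csv("/tmp/updated.csv")   # file sitting on the Zeppelin node's local filesystem, not HDFS
print(a)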
06-09-2017
06:31 AM
@sysadmin CreditVidya It is good to hear that this finally works for you.
06-09-2017
06:00 AM
@sysadmin CreditVidya Put the file on the Zeppelin node's local filesystem (not HDFS) and try through %spark2.r.
06-09-2017
05:29 AM
Hi @sysadmin CreditVidya I was away yesterday. I have just done a further test.
- on hdfs-hadoop-p0z7, place your updated.csv file in the /tmp folder - not on HDFS but on the local OS
- make sure the file has READ permissions
- then do:
$ su - zeppelin
$ R
> a<-read.csv("/tmp/updated.csv")
06-07-2017
01:22 PM
On the same machine and as the same user (zeppelin@hdfs-hadoop-p0z7), run the below and attach the output:
hdfs dfs -ls /tmp/updated.csv
and
$ R
> a<-read.csv("/tmp/updated.csv")
In Zeppelin, restart the spark2 interpreter and re-run:
%spark2.r
a<-read.csv("/tmp/test.csv")
print(a)
and send me the application log for that job
06-07-2017
11:46 AM
@sysadmin CreditVidya You can get the application log by doing:
$ yarn logs -applicationId <your_application_id_here>
When I asked you to test
$ hdfs dfs -ls hdfs://<Active_NN>:8020/tmp/update.csv
I meant for it to be run from the Zeppelin machine. I did notice you ran it as
smittapally@hdfs-hadoop-mr2w:~$ hdfs dfs -ls hdfs://hdfs-hadoop-88hh.c.creditvidya-152512.internal:8020/tmp/updated.csv
Try the same from root@hdfs-hadoop-tswv.