About dkozlowski

dkozlowski · ‎07-13-2017

Hi Abraham, the Spark interpreter is not impersonated. Uncheck <User Impersonate>, restart the interpreter, and try again.

dkozlowski · ‎07-13-2017

@Miles Yao Good catch! Just updated. The phoenix jar is there to work with the JDBC interpreter rather than the spark one.

dkozlowski · ‎07-11-2017

@Gaurav Mallikarjuna In the above example you can see that I used another method to connect to hiveserver2: the hiveserver2 node plus its port number, like

$ beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin

Using admin is for my sample only. In your case, if your transport mode is binary and the cluster is NOT kerberized:

$ beeline -u "jdbc:hive2://<hiveserver2-hostname>:10000/" -n <username>

dkozlowski · ‎07-10-2017

@Gaurav Mallikarjuna I tested the same on my HDP 2.6.1 cluster and could not see any issues:

[root@dkhdp262c6 ~]# beeline -u "jdbc:hive2://dkhdp263c6.openstacklocal:2181,dkhdp262c6.openstacklocal:2181,dkhdp261c6.openstacklocal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2" -n admin
Connecting to jdbc:hive2://dkhdp263c6.openstacklocal:2181,dkhdp262c6.openstacklocal:2181,dkhdp261c6.openstacklocal:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
Connected to: Apache Hive (version 1.2.1000.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://dkhdp263c6.openstacklocal:218> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (0.305 seconds)
0: jdbc:hive2://dkhdp263c6.openstacklocal:218>

This is a non-Kerberised environment though. One more thing: I have the transport mode set to binary. What is yours?

If your environment is also non-Kerberised and the hive transport mode is binary, try the following:

beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin

The above is the hostname where your hiveserver2 is installed plus its port number. Here is how this works at my end:

[root@dkhdp262c6 ~]# beeline -u "jdbc:hive2://dkhdp261c6.openstacklocal:10000/" -n admin
Connecting to jdbc:hive2://dkhdp261c6.openstacklocal:10000/
Connected to: Apache Hive (version 1.2.1000.2.6.1.0-129)
Driver: Hive JDBC (version 1.2.1000.2.6.1.0-129)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 1.2.1000.2.6.1.0-129 by Apache Hive
0: jdbc:hive2://dkhdp261c6.openstacklocal:100> show databases;
+----------------+--+
| database_name  |
+----------------+--+
| default        |
+----------------+--+
1 row selected (0.29 seconds)
0: jdbc:hive2://dkhdp261c6.openstacklocal:100>
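The two URL shapes used above (ZooKeeper service discovery versus a direct host and port) differ only in how the authority and the trailing options are assembled. Below is a minimal Python sketch of that assembly; the helper names `hive_jdbc_url` and `beeline_command` are hypothetical and only illustrate the string layout, they are not part of Hive or beeline.

```python
import shlex


def hive_jdbc_url(hosts, port, zookeeper=False, namespace="hiveserver2"):
    """Build a HiveServer2 JDBC URL in one of the two shapes shown in the post.

    hosts     -- list of hostnames (ZooKeeper nodes or a single hiveserver2 host)
    port      -- port shared by all hosts (2181 and 10000 in the examples above)
    zookeeper -- True for service discovery, False for a direct connection
    """
    authority = ",".join(f"{host}:{port}" for host in hosts)
    if zookeeper:
        return (
            f"jdbc:hive2://{authority}/"
            f";serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
        )
    if len(hosts) != 1:
        raise ValueError("a direct connection expects exactly one hiveserver2 host")
    return f"jdbc:hive2://{authority}/"


def beeline_command(url, username):
    """Return the beeline argument list for a non-Kerberised, binary-mode login."""
    return ["beeline", "-u", url, "-n", username]


if __name__ == "__main__":
    direct = hive_jdbc_url(["dkhdp261c6.openstacklocal"], 10000)
    # shlex.join quotes the URL so it can be pasted into a shell safely.
    print(shlex.join(beeline_command(direct, "admin")))
```

Replace the hostnames, port, and username with your own values; `admin` is only the sample user from the post.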

dkozlowski · ‎06-18-2017

Hi @suyash soni Unfortunately, this feature has not yet been implemented. This will be available in Zeppelin 0.8 based on https://issues.apache.org/jira/browse/ZEPPELIN-2368.

dkozlowski · ‎06-13-2017

Hi @Jayadeep Jayaraman That is great - thanks for letting me know

dkozlowski · ‎06-13-2017

@Jayadeep Jayaraman It is good to hear the sample works. I have a feeling that the problem may be with the way you created your original table. So try one more thing: point your code at the test_orc_t_string table, the one from my sample above, and check if that works.

dkozlowski · ‎06-13-2017

Hi @Jayadeep Jayaraman I have just done another test - treated timestamp as a string. That works for me as well. See below:

beeline

> create table test_orc_t_string (b string,t timestamp) stored as ORC;
> insert into table test_orc_t_string values('a', '1969-06-19 06:57:26.485'),('b','1988-06-21 05:36:22.35');
> select * from test_orc_t_string;
+----------------------+--------------------------+--+
| test_orc_t_string.b  |   test_orc_t_string.t    |
+----------------------+--------------------------+--+
| a                    | 1969-06-19 06:57:26.485  |
| b                    | 1988-06-21 05:36:22.35   |
+----------------------+--------------------------+--+
2 rows selected (0.128 seconds)

pyspark

>>> sqlContext.sql("select * from test_orc_t_string").show()
+---+--------------------+
|  b|                   t|
+---+--------------------+
|  a|1969-06-19 06:57:...|
|  b|1988-06-21 05:36:...|
+---+--------------------+

Can you test the above at your site? Let me know how this works. Can you also send me the output of the below from beeline:

show create table test;

dkozlowski · ‎06-13-2017

Hi @Jayadeep Jayaraman I have just tested the same in pyspark 2.1. That works fine at my site. See below:

beeline

0: jdbc:hive2://dkhdp262.openstacklocal:2181,> create table test_orc (b string,t timestamp) stored as ORC;
0: jdbc:hive2://dkhdp262.openstacklocal:2181,> select * from test_orc;
+-------------+------------------------+--+
| test_orc.b  |       test_orc.t       |
+-------------+------------------------+--+
| a           | 2017-06-13 05:02:23.0  |
| b           | 2017-06-13 05:02:23.0  |
| c           | 2017-06-13 05:02:23.0  |
| d           | 2017-06-13 05:02:23.0  |
| e           | 2017-06-13 05:02:23.0  |
| f           | 2017-06-13 05:02:23.0  |
| g           | 2017-06-13 05:02:23.0  |
| h           | 2017-06-13 05:02:23.0  |
| i           | 2017-06-13 05:02:23.0  |
| j           | 2017-06-13 05:02:23.0  |
+-------------+------------------------+--+
10 rows selected (0.091 seconds)

pyspark

[root@dkhdp262 ~]# export SPARK_MAJOR_VERSION=2
[root@dkhdp262 ~]# pyspark
SPARK_MAJOR_VERSION is set to 2, using Spark2
Python 2.7.5 (default, Jun 17 2014, 18:11:42)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-16)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.1.1.2.6.1.0-129
      /_/

Using Python version 2.7.5 (default, Jun 17 2014 18:11:42)
SparkSession available as 'spark'.
>>> sqlContext.sql("select b, t from test_orc").show()
+---+--------------------+
|  b|                   t|
+---+--------------------+
|  a|2017-06-13 05:02:...|
|  b|2017-06-13 05:02:...|
|  c|2017-06-13 05:02:...|
|  d|2017-06-13 05:02:...|
|  e|2017-06-13 05:02:...|
|  f|2017-06-13 05:02:...|
|  g|2017-06-13 05:02:...|
|  h|2017-06-13 05:02:...|
|  i|2017-06-13 05:02:...|
|  j|2017-06-13 05:02:...|
+---+--------------------+

Based on the error you have - is the timestamp value in your table a REAL timestamp? How did you insert it?
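One way to check whether the values being loaded look like real timestamps is to validate them before they reach the table. The sketch below is plain Python and assumes the textual Hive layout `YYYY-MM-DD HH:MM:SS` with an optional fraction of up to nine digits; the function name `looks_like_hive_timestamp` is hypothetical, and the check only covers this string form, not how your source data was actually written.

```python
import re
from datetime import datetime

# Date and time part, followed by an optional fraction of 1 to 9 digits.
_TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(\.\d{1,9})?$")


def looks_like_hive_timestamp(value):
    """Return True if value is a string in the assumed Hive timestamp layout."""
    match = _TIMESTAMP.match(value.strip())
    if match is None:
        return False
    try:
        # Reject impossible dates or times such as month 13 or hour 25.
        datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    except ValueError:
        return False
    return True


if __name__ == "__main__":
    for sample in ("1969-06-19 06:57:26.485", "13/06/2017 05:02"):
        print(sample, looks_like_hive_timestamp(sample))
```

Values that fail this check are worth inspecting first when a timestamp column cannot be read or cast.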

dkozlowski · ‎06-05-2017

ENVIRONMENT

HDP-2.6.0.3
Ambari 2.5.0.3

SOLUTION

1. Install R on each DN

$ yum install R-devel libcurl-devel openssl-devel

2. Run on each DN

$ R
> install.packages("knitr")

3. Test R from CLI

[root@dghdp255 ~]# R -e "print(1+1)"

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> print(1+1)
[1] 2
>
>
[root@dghdp255 ~]#

4. Zeppelin UI

a) spark2 config

SPARK_HOME                                /usr/hdp/current/spark2-client/
args
master                                    yarn-client
spark.app.name                            Zeppelin
spark.cores.max
spark.executor.memory
spark.yarn.keytab                         /etc/security/keytabs/zeppelin.server.kerberos.keytab
spark.yarn.principal                      zeppelin-emeasupport@HWX.COM
zeppelin.R.cmd                            R
zeppelin.R.image.width                    100%
zeppelin.R.knitr                          true
zeppelin.R.render.options                 out.format = 'html', comment = NA, echo = FALSE, results = 'asis', message = F, warning = F
zeppelin.dep.additionalRemoteRepository   spark-packages,http://dl.bintray.com/spark-packages/maven,false;
zeppelin.dep.localrepo                    local-repo
zeppelin.interpreter.localRepo            /usr/hdp/current/zeppelin-server/local-repo/2CHXWU7YZ
zeppelin.pyspark.python                   python
zeppelin.spark.concurrentSQL              false
zeppelin.spark.importImplicit             true
zeppelin.spark.maxResult                  1000
zeppelin.spark.printREPLOutput            true
zeppelin.spark.sql.stacktrace             false
zeppelin.spark.useHiveContext             true

b) test R from zeppelin UI

c) create a test CSV file on the OS (zeppelin node)

[root@dghdp254 ~]# ls -lrt /tmp/test.csv
-rw-r--r--. 1 root root 1326 Jun 6 07:07 /tmp/test.csv

d) check reading the file from R CLI

[root@dghdp254 ~]# R

R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> a<-read.csv("/tmp/test.csv")
> print(a)
[1] Test.File
<0 rows> (or 0-length row.names)
>

e) restart spark2 interpreter and run the below

%spark2.r
a<-read.csv("/tmp/test.csv")
print(a)
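If you want to double-check the test file independently of R before running step e), a small script can report the header and the number of data rows. This is only a sketch in Python using the standard csv module; the function name `summarize_csv` is hypothetical, and `/tmp/test.csv` is the sample path from step c).

```python
import csv


def summarize_csv(path):
    """Return (header, data_row_count) for the CSV file at path."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        # The first line is treated as the header, as read.csv does by default.
        header = next(reader, [])
        # Count the remaining non-empty lines as data rows.
        row_count = sum(1 for row in reader if row)
    return header, row_count


if __name__ == "__main__":
    header, rows = summarize_csv("/tmp/test.csv")
    print("columns:", header)
    print("data rows:", rows)
```

The file must be readable by the OS user that runs the check, so run it as the same user whose access you want to verify.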

Last Visited	‎02-06-2018 06:34 AM

Member Since	‎03-25-2016 06:26 AM
Posts	142
Kudos received	48

Cloudera Community

Re: ORC Table Timestamp PySpark 2.1 CASTIssue

Re: Can Kafka handle the mixture of authentication...

Re: How do I automate setting up LDAP in Ambari?

Re: Does jar files missing for spark interpreter?

Re: How to save results from dataframe into a sepa...

Re: Unable to run spark interpreter in Zeppelin - ...

Re: Enable phoenix access from Zeppelin in secure ...

Re: Beeline -u "JDBC:hive2://url..." -n username ...

Re: Beeline -u "JDBC:hive2://url..." -n username ...

Re: Is there any way to execute the paragraphs in ...

Re: ORC Table Timestamp PySpark 2.1 CASTIssue

Re: ORC Table Timestamp PySpark 2.1 CASTIssue

Re: ORC Table Timestamp PySpark 2.1 CASTIssue

Re: ORC Table Timestamp PySpark 2.1 CASTIssue

How to get spark2.r working with CSV file in Kerbe...