Support Questions

Find answers, ask questions, and share your expertise
Announcements
Celebrating as our community reaches 100,000 members! Thank you!

CDP - Zeppeling: Spark + Livy + Hive - HWC

avatar
Explorer

Hello, 

I would like to create Hive tables  using Zeppeling and I found the the below document that use HWC:

https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/integrating-hive-and-bi/topics/hive-hwc-readi...

I'm not sure what means "livy configuration file"?

Is "%livy2" a new Interpreter? May I create a new Interpreter with only indicated configurations?

 

Thanks!

 

 

1 ACCEPTED SOLUTION

avatar
Master Collaborator

Steps for creating jdbc hive interpreter: As the keytab location is not consistent in CDP and changes as services are restarted, we should copy keytab to a consistent location. As we use proxyuser option with hive beeline, zeppelin user must be configured to allow impersonate of another user. Follow the below config steps to create a JDBC interpreter in Zeppelin.

- Copy keytab from the current process directory :

# cp $(ls -1drt /var/run/cloudera-scm-agent/process/*-ZEPPELIN_SERVER | tail -1)/zeppelin.keytab /var/tmp
# chown zeppelin:zeppelin /var/tmp/zeppelin.keytab

- Configure core-site to allow proxyuser for zeppelin

CM UI > HDFS > Configurations > Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml

hadoop.proxyuser.zeppelin.hosts=*
hadoop.proxyuser.zeppelin.groups=*

- Restart required services(must restart Hadoop and hive_on_tez service)

- Configure interpreter in zeppelin with below additional properties :

hive.driver org.apache.hive.jdbc.HiveDriver
hive.proxy.user.property hive.server2.proxy.user
hive.url jdbc:hive2://xxxxxxxxxxxx.com:2181/default;principal=hive/_HOST@XXXXXX.XXXXX.COM;serviceDiscoveryMode=zooKeeper;ssl=true;zooKeeperNamespace=hiveserver2
hive.user hive
zeppelin.jdbc.keytab.location /var/tmp/zeppelin.keytab
zeppelin.jdbc.principal zeppelin/xxxxxxxxxxxxx.com@XXXXXX.XXXXX.COM

- Make sure hive.url/keytab/principal configs are set as per the environment.

- Create notebook/paragraph and verify user impersonation and hive access

%jdbc(hive)
select current_user()

View solution in original post

7 REPLIES 7

avatar
Explorer

Any update?

avatar
Master Collaborator

Hi @paulo_klein  Apache Zeppelin on Cloudera Data Platform supports the following interpreters:

  • JDBC (supports Hive, Phoenix)

  • OS Shell

  • Markdown

  • Livy (supports Spark, Spark SQL, PySpark, PySpark3, and SparkR)

  • AngularJS

As you would like to create Hive tables using Zeppelin, you can use the JDBC interpreter to access Hive.

 

The %jdbc the interpreter supports access to Apache Hive data. The interpreter connects to Hive via Thrift. For more details, you can refer to this documentation which describes how to use the Apache Zeppelin JDBC interpreter to access Apache Hive.

avatar
Explorer

Hello @jagadeesan , 

Thanks for your reply.

 

On my fresh new CDP 7.1.7 cluster I havn't the interpreter %jdbc. I only have %livy, %angular and %md intepreters installed.

Can I create new one? Could you share the correct way to create the %jdbc interpreter on CDP?
I would like to load into the hive a spark dataframe. It's will be possible with this %jdbc interpreter?
Tks!

avatar
Master Collaborator

Steps for creating jdbc hive interpreter: As the keytab location is not consistent in CDP and changes as services are restarted, we should copy keytab to a consistent location. As we use proxyuser option with hive beeline, zeppelin user must be configured to allow impersonate of another user. Follow the below config steps to create a JDBC interpreter in Zeppelin.

- Copy keytab from the current process directory :

# cp $(ls -1drt /var/run/cloudera-scm-agent/process/*-ZEPPELIN_SERVER | tail -1)/zeppelin.keytab /var/tmp
# chown zeppelin:zeppelin /var/tmp/zeppelin.keytab

- Configure core-site to allow proxyuser for zeppelin

CM UI > HDFS > Configurations > Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml

hadoop.proxyuser.zeppelin.hosts=*
hadoop.proxyuser.zeppelin.groups=*

- Restart required services(must restart Hadoop and hive_on_tez service)

- Configure interpreter in zeppelin with below additional properties :

hive.driver org.apache.hive.jdbc.HiveDriver
hive.proxy.user.property hive.server2.proxy.user
hive.url jdbc:hive2://xxxxxxxxxxxx.com:2181/default;principal=hive/_HOST@XXXXXX.XXXXX.COM;serviceDiscoveryMode=zooKeeper;ssl=true;zooKeeperNamespace=hiveserver2
hive.user hive
zeppelin.jdbc.keytab.location /var/tmp/zeppelin.keytab
zeppelin.jdbc.principal zeppelin/xxxxxxxxxxxxx.com@XXXXXX.XXXXX.COM

- Make sure hive.url/keytab/principal configs are set as per the environment.

- Create notebook/paragraph and verify user impersonation and hive access

%jdbc(hive)
select current_user()

avatar
Explorer

Hello @jagadeesan , 

We did all the configurations, but now we are geting the error below.

May I create some entry for trustStore?

 

%jdbc2(hive)
select current_user()

Error: Also, could not send response: org.apache.thrift.transport.TTransportException: javax.net.ssl.SSLHandshakeException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target

avatar
Explorer

My currently interpreter configuration:

jdbc2 %jdbc2

Properties

        name                                                                                     value

common.max_count1000
default.completer.schemaFilters 
default.completer.ttlInSeconds120
default.driverorg.postgresql.Driver
default.password 
default.precode 
default.splitQueriesfalse
default.statementPrecode 
default.urljdbc:postgresql://localhost:5432/
default.usergpadmin
hive.driverorg.apache.hive.jdbc.HiveDriver
hive.urljdbc:hive2://SERVER01.LOCAL.NET:2181,SERVER02.LOCAL.NET:2181,SERVER03.LOCAL.NET:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
hive.userhive
zeppelin.jdbc.auth.type 
zeppelin.jdbc.concurrent.max_connection10
zeppelin.jdbc.concurrent.usetrue
zeppelin.jdbc.interpolationfalse
zeppelin.jdbc.keytab.location/var/tmp/zeppelin.keytab
zeppelin.jdbc.maxConnLifetime-1
zeppelin.jdbc.maxRows1000
zeppelin.jdbc.principalzeppelin/SERVER01.LOCAL.NET@LOCAL.NET

avatar
Explorer

 

To solve "unable to find valid certification path to requested target" I just import the certificate to java and restart the Zeppeling Server.

 

### LINUX LIST CERT
cd /usr/lib/jvm/java-11-openjdk-11.0.15.0.10-2.el8_6.x86_64/bin
./keytool -list -keystore /usr/lib/jvm/java-11-openjdk-11.0.15.0.10-2.el8_6.x86_64/lib/security/cacerts

 

### LINUX IMPORT CERT

./keytool --import --alias keystore_cloudera --file /var/lib/cloudera-scm-agent/agent-cert/cm-auto-global_cacerts.pem -keystore /usr/lib/jvm/java-11-openjdk-11.0.15.0.10-2.el8_6.x86_64/lib/security/cacerts