Member since: 06-02-2020 | Posts: 331 | Kudos Received: 65 | Solutions: 49
My Accepted Solutions
Views | Posted
---|---
1184 | 07-11-2024 01:55 AM
3404 | 07-09-2024 11:18 PM
2846 | 07-09-2024 04:26 AM
2268 | 07-09-2024 03:38 AM
2494 | 06-05-2024 02:03 AM
04-15-2021
06:54 AM
Most likely your company firewall is blocking access to the Cloudera/Maven Central repositories, so the dependencies cannot be downloaded.
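As a quick check, verify whether the repositories are reachable from your build machine. A minimal sketch (the URLs below are the usual public endpoints; adjust them to the repositories declared in your pom.xml):
# An HTTP 200/30x response means the repository is reachable; a timeout points to the firewall
curl -sI https://repo1.maven.org/maven2/ | head -n 1
curl -sI https://repository.cloudera.com/artifactory/cloudera-repos/ | head -n 1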
04-08-2021
09:18 AM
Are you able to download any dependencies from Maven Central? If not, remove the Cloudera library and try the equivalent open-source Apache library; that will confirm whether the Cloudera repository is the issue.
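For example, after switching the dependency in your pom.xml from the Cloudera build to the plain Apache build (the artifact and versions below are illustrative only), force Maven to re-resolve:
# e.g. spark-core_2.11: 2.4.0.7.1.4.0-203 (Cloudera) -> 2.4.0 (Apache, from Maven Central)
mvn -U clean package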
03-24-2021
11:21 AM
3 Kudos
This article describes the steps to access HBase data on a remote CDP cluster from another CDP cluster using Spark, in a Kerberized environment.
Assume we have two clusters, 'Cluster_A' and 'Cluster_B'. 'Cluster_A' holds the HBase data and 'Cluster_B' runs Spark. We want to access the HBase data in 'Cluster_A' using Spark from 'Cluster_B'.
Prerequisites:
Both clusters, 'Cluster_A' and 'Cluster_B', along with their keytabs, need to be in the same Kerberos REALM.
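You can confirm this up front; a quick check on an edge node of each cluster (/etc/krb5.conf is the standard Kerberos client configuration):
# The default_realm values on both clusters must match
grep default_realm /etc/krb5.conf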
Follow these steps:
Log in to the 'Cluster_A' edge node.
Obtain the Kerberos ticket using kinit: kinit -kt <key_tab_file> <principal_name>
Log in to the HBase shell and create the 'person' table:
# hbase shell
hbase(main):001:0> list
TABLE
0 row(s)
hbase(main):013:0> create 'person', 'p'
Created table person
Took 8.2307 seconds
=> Hbase::Table - person
hbase(main):014:0> put 'person',1,'p:id','1'
Took 0.0173 seconds
hbase(main):015:0> put 'person',1,'p:name','Ranga Reddy'
Took 0.0045 seconds
hbase(main):016:0> put 'person',1,'p:email','ranga@gmail.com'
Took 0.0043 seconds
hbase(main):017:0> put 'person',1,'p:age','25'
Took 0.0049 seconds
hbase(main):018:0> scan 'person'
ROW COLUMN+CELL
1 column=p:age, timestamp=1616425683759, value=25
1 column=p:email, timestamp=1616425681754, value=ranga@gmail.com
1 column=p:id, timestamp=1616425681717, value=1
1 column=p:name, timestamp=1616425681736, value=Ranga Reddy
hbase(main):018:0> exit
Copy the hbase-site.xml from 'Cluster_A' to 'Cluster_B': scp /etc/hbase/conf/hbase-site.xml root@cluster_b_ipaddress:/tmp
Log in to the 'Cluster_B' edge node.
Obtain the Kerberos ticket using kinit: kinit -kt <key_tab_file> <principal_name>
Place the hbase-site.xml copied in Step 4 into a temporary directory, for example /tmp/hbase/conf:
mkdir -p /tmp/hbase/conf
cp /tmp/hbase-site.xml /tmp/hbase/conf
Launch the Spark shell, providing the HBase-Spark connector jars and the HBase configuration (/tmp/hbase/conf) directory:
spark-shell \
--master yarn \
--conf spark.driver.extraClassPath=/tmp/hbase/conf \
--conf spark.executor.extraClassPath=/tmp/hbase/conf \
--jars /opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-1.0.0*.jar,\
/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-protocol-shaded-*.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-mapreduce.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-miscellaneous.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-protobuf.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-shaded.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-netty.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-zookeeper.jar,\
/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-annotations.jar
Run the following code after launching the spark-shell (the column mapping format is 'column TYPE cf:qualifier', with ':key' mapping the row key):
val df = spark.read.
  format("org.apache.hadoop.hbase.spark").
  option("hbase.columns.mapping", "id STRING :key, name STRING p:name, email STRING p:email, age STRING p:age").
  option("hbase.table", "person").
  option("hbase.spark.use.hbasecontext", false).
  load()
df.show(truncate=false)
The output of the above command:
scala> df.show(truncate=false)
+---+-----------+---------------+---+
|age|name       |email          |id |
+---+-----------+---------------+---+
|25 |Ranga Reddy|ranga@gmail.com|1  |
+---+-----------+---------------+---+
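The DataFrame can then be queried like any other Spark source. A minimal follow-up sketch, run in the same spark-shell session (all columns were mapped as STRING above, so the comparison is against a string):
scala> df.filter($"age" === "25").select("name", "email").show(false)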
Thanks for reading this article. I hope you have enjoyed it.
03-24-2021
08:56 AM
@zampJeri Yes, either the write operation or the MSCK REPAIR command is using the temp directory, and the current user does not have permission to create directories there. Could you please grant the proper permissions and re-run the job?
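A minimal sketch of opening up the directory, assuming the default Hive scratch location /tmp/hive on HDFS (run as a user with HDFS superuser rights):
# Inspect the current ownership/permissions, then make it world-writable with the sticky bit, as for /tmp
hdfs dfs -ls -d /tmp/hive
hdfs dfs -chmod -R 1777 /tmp/hive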
03-23-2021
06:31 AM
Hi @zampJeri Could you please let me know which user you are running the Spark application as? Check that the user has permission to create files/directories under the /tmp/hive directory.
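A quick way to confirm which user is submitting the job and what it can see (assuming the default Hive scratch directory):
# The submitting OS user
whoami
# Permissions on the scratch directory
hdfs dfs -ls -d /tmp/hive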
03-16-2021
05:32 AM
Try this documentation: https://community.cloudera.com/t5/Support-Questions/CDP-7-1-3-Zepplin-not-able-to-login-with-default-username/td-p/303717
03-09-2021
10:31 PM
Please let me know if any further help is required on this issue.
03-09-2021
03:43 AM
Hi Steve, If you enable the Atlas service in Spark, there will be two flows: (1) your application flow, and (2) the Atlas hook flow: Spark --> writes Spark events to Kafka --> the events are persisted to HBase --> this HBase data is visualized in the Atlas UI. Please check with your admin team whether the Spark Atlas service is required. If it is not required, disable it in the Spark service configuration; then you will not see any issues in Oozie. Meanwhile, have you tried submitting the same job without Oozie?
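To rule Oozie out, a direct submission sketch (the class and jar names are placeholders for your application):
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.YourApp \
  /path/to/your-application.jar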
03-08-2021
10:58 PM
Hello hindmasj, Could you please check whether the Atlas service is enabled for the Spark service? (Spark --> Configuration --> Atlas Service) If the Atlas service is enabled, Spark internally requires the spark-sql-kafka-0-10_2.11-2.4.0.7.1.4.0-203.jar file. Copy the jar to HDFS and reference it in the Spark action in Oozie: <jar>hdfs://host/path/to/spark-sql-kafka-0-10_2.11-2.4.0.7.1.4.0-203.jar</jar> If the Atlas service is not required, please disable it and run the Spark job.
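A sketch of staging the jar, assuming it ships with the CDH parcel (adjust the local path and the HDFS destination to your environment):
# Copy the Kafka connector jar required by the Spark Atlas hook to HDFS
hdfs dfs -put /opt/cloudera/parcels/CDH/jars/spark-sql-kafka-0-10_2.11-2.4.0.7.1.4.0-203.jar hdfs://host/path/to/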
02-22-2021
08:28 PM
1 Kudo
Hi Keith, To investigate this issue further, could you please create a Cloudera support case? I will investigate further from our side.
... View more