Member since: 06-02-2020 | Posts: 331 | Kudos Received: 65 | Solutions: 49
My Accepted Solutions
Views | Posted
---|---
1184 | 07-11-2024 01:55 AM
3404 | 07-09-2024 11:18 PM
2846 | 07-09-2024 04:26 AM
2268 | 07-09-2024 03:38 AM
2494 | 06-05-2024 02:03 AM
04-15-2021
06:54 AM
Most likely your company firewall is blocking access to the Cloudera/Maven Central repositories, so the dependencies cannot be downloaded.
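As a quick check, verify whether the repositories are reachable from your build machine. A minimal sketch (the URLs below are the usual public endpoints; adjust them to the repositories declared in your pom.xml):
# An HTTP 200/30x response means the repository is reachable; a timeout points to the firewall
curl -sI https://repo1.maven.org/maven2/ | head -n 1
curl -sI https://repository.cloudera.com/artifactory/cloudera-repos/ | head -n 1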
04-08-2021
09:18 AM
Are you able to download any dependencies from Maven Central? If not, remove the Cloudera library and try the equivalent open-source Apache library; that will confirm whether the Cloudera repository is the issue.
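For example, after switching the dependency in your pom.xml from the Cloudera build to the plain Apache build (the artifact and versions below are illustrative only), force Maven to re-resolve:
# e.g. spark-core_2.11: 2.4.0.7.1.4.0-203 (Cloudera) -> 2.4.0 (Apache, from Maven Central)
mvn -U clean package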
03-24-2021
11:21 AM
3 Kudos
This article describes the steps to access HBase data on a remote CDP cluster from another CDP cluster using Spark, in a Kerberized environment.
Assume we have two clusters, 'Cluster_A' and 'Cluster_B'. 'Cluster_A' holds the HBase data and 'Cluster_B' runs Spark. We want to access the HBase data in 'Cluster_A' using Spark from 'Cluster_B'.
Prerequisites:
Both clusters, 'Cluster_A' and 'Cluster_B', along with their keytabs, need to be in the same Kerberos REALM.
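You can confirm this up front; a quick check on an edge node of each cluster (/etc/krb5.conf is the standard Kerberos client configuration):
# The default_realm values on both clusters must match
grep default_realm /etc/krb5.conf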
Follow these steps:
Log in to the 'Cluster_A' edge node.
Obtain the Kerberos ticket using kinit: kinit -kt <key_tab_file> <principal_name>
Log in to the HBase shell and create the 'person' table:
# hbase shell
hbase(main):001:0> list
TABLE
0 row(s)
hbase(main):013:0> create 'person', 'p'
Created table person
Took 8.2307 seconds
=> Hbase::Table - person
hbase(main):014:0> put 'person',1,'p:id','1'
Took 0.0173 seconds
hbase(main):015:0> put 'person',1,'p:name','Ranga Reddy'
Took 0.0045 seconds
hbase(main):016:0> put 'person',1,'p:email','ranga@gmail.com'
Took 0.0043 seconds
hbase(main):017:0> put 'person',1,'p:age','25'
Took 0.0049 seconds
hbase(main):018:0> scan 'person'
ROW COLUMN+CELL
1 column=p:age, timestamp=1616425683759, value=25
1 column=p:email, timestamp=1616425681754, value=ranga@gmail.com
1 column=p:id, timestamp=1616425681717, value=1
1 column=p:name, timestamp=1616425681736, value=Ranga Reddy
hbase(main):018:0> exit
Copy the hbase-site.xml from 'Cluster_A' to 'Cluster_B': scp /etc/hbase/conf/hbase-site.xml root@cluster_b_ipaddress:/tmp
Log in to the 'Cluster_B' edge node.
Obtain the Kerberos ticket using kinit: kinit -kt <key_tab_file> <principal_name>
Place the hbase-site.xml copied in Step 4 into a temporary directory, for example /tmp/hbase/conf:
mkdir -p /tmp/hbase/conf
cp /tmp/hbase-site.xml /tmp/hbase/conf
Launch the Spark shell, providing the HBase-Spark connector jars and the HBase configuration (/tmp/hbase/conf) directory:
spark-shell \
--master yarn \
--conf spark.driver.extraClassPath=/tmp/hbase/conf \
--conf spark.executor.extraClassPath=/tmp/hbase/conf \
--jars /opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-1.0.0*.jar,\
/opt/cloudera/parcels/CDH/lib/hbase_connectors/lib/hbase-spark-protocol-shaded-*.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-client.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-common.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-server.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-mapreduce.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-miscellaneous.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-protobuf.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol-shaded.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-protocol.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-shaded-netty.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-zookeeper.jar,\
/opt/cloudera/parcels/CDH/jars/htrace-core-3.1.0-incubating.jar,\
/opt/cloudera/parcels/CDH/lib/hbase/hbase-annotations.jar
Run the following code after launching the spark-shell (the column mapping format is 'column TYPE cf:qualifier', with ':key' mapping the row key):
val df = spark.read.
  format("org.apache.hadoop.hbase.spark").
  option("hbase.columns.mapping", "id STRING :key, name STRING p:name, email STRING p:email, age STRING p:age").
  option("hbase.table", "person").
  option("hbase.spark.use.hbasecontext", false).
  load()
df.show(truncate=false)
The output of the above command:
scala> df.show(truncate=false)
+---+-----------+---------------+---+
|age|name       |email          |id |
+---+-----------+---------------+---+
|25 |Ranga Reddy|ranga@gmail.com|1  |
+---+-----------+---------------+---+
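The DataFrame can then be queried like any other Spark source. A minimal follow-up sketch, run in the same spark-shell session (all columns were mapped as STRING above, so the comparison is against a string):
scala> df.filter($"age" === "25").select("name", "email").show(false)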
Thanks for reading this article. I hope you have enjoyed it.
03-24-2021
08:56 AM
@zampJeri Yes, either the write operation or the MSCK REPAIR command is using the temp directory, and the current user does not have permission to create directories there. Could you please grant the proper permissions and re-run the job?
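A minimal sketch of opening up the directory, assuming the default Hive scratch location /tmp/hive on HDFS (run as a user with HDFS superuser rights):
# Inspect the current ownership/permissions, then make it world-writable with the sticky bit, as for /tmp
hdfs dfs -ls -d /tmp/hive
hdfs dfs -chmod -R 1777 /tmp/hive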
03-23-2021
06:31 AM
Hi @zampJeri Could you please let me know which user you are running the Spark application as? Check that the user has permission to create files/directories under the /tmp/hive directory.
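A quick way to confirm which user is submitting the job and what it can see (assuming the default Hive scratch directory):
# The submitting OS user
whoami
# Permissions on the scratch directory
hdfs dfs -ls -d /tmp/hive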
03-16-2021
05:32 AM
Try this documentation: https://community.cloudera.com/t5/Support-Questions/CDP-7-1-3-Zepplin-not-able-to-login-with-default-username/td-p/303717
03-09-2021
10:31 PM
Please let me know if any further help is required on this issue.
03-09-2021
03:43 AM
Hi Steve, If you enable the Atlas service in Spark, there will be two flows: (1) your application flow, and (2) the Atlas hook flow: Spark --> writes Spark events to Kafka --> the events are persisted to HBase --> this HBase data is visualized in the Atlas UI. Please check with your admin team whether the Spark Atlas service is required. If it is not required, disable it in the Spark service configuration; then you will not see any issues in Oozie. Meanwhile, have you tried submitting the same job without Oozie?
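To rule Oozie out, a direct submission sketch (the class and jar names are placeholders for your application):
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.YourApp \
  /path/to/your-application.jar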
03-08-2021
10:58 PM
Hello hindmasj, Could you please check whether the Atlas service is enabled for the Spark service? (Spark --> Configuration --> Atlas Service) If the Atlas service is enabled, Spark internally requires the spark-sql-kafka-0-10_2.11-2.4.0.7.1.4.0-203.jar file. Copy the jar to HDFS and reference it in the Spark action in Oozie: <jar>hdfs://host/path/to/spark-sql-kafka-0-10_2.11-2.4.0.7.1.4.0-203.jar</jar> If the Atlas service is not required, please disable it and run the Spark job.
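A sketch of staging the jar, assuming it ships with the CDH parcel (adjust the local path and the HDFS destination to your environment):
# Copy the Kafka connector jar required by the Spark Atlas hook to HDFS
hdfs dfs -put /opt/cloudera/parcels/CDH/jars/spark-sql-kafka-0-10_2.11-2.4.0.7.1.4.0-203.jar hdfs://host/path/to/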
02-22-2021
08:28 PM
1 Kudo
Hi Keith, To investigate this issue further, could you please create a Cloudera support case? I will investigate further from our side.
... View more