Member since: 04-23-2019
Posts: 31
Kudos Received: 1
Solutions: 0
09-21-2020
08:44 AM
I have tested the backup/restore solution with Spark and it seems to work like a charm:
- First, check and record the table names as listed in the web UI of the Kudu master (or of the elected leader master in a multi-master setup): http://Master1:8051/tables
- Download kudu-backupX.X.jar if you can't find it in /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/, and put it there.
- In --kuduMasterAddresses, put the name of your Kudu master, or the names of your three masters separated by ','.
- Backup:
  sudo -u hdfs spark2-submit --class org.apache.kudu.backup.KuduBackup /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar --kuduMasterAddresses MASTER1(,MASTER2,..) --rootPath hdfs:///PATH_HDFS impala::DB.TABLE
- Copy:
  sudo -u hdfs hadoop distcp -i - hdfs:///PATH_HDFS/DB.TABLE hdfs://XXX:8020/kudu_backups/
- Restore:
  sudo -u hdfs spark2-submit --class org.apache.kudu.backup.KuduRestore /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar --kuduMasterAddresses MASTER1(,MASTER2,..) --rootPath hdfs:///PATH_HDFS impala::DB.TABLE
- Finally, run INVALIDATE METADATA in Impala.
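The backup and restore commands differ only in the job class, so they can be assembled from one template. Below is a minimal Python sketch of that idea. The helper names (kudu_job_command, backup_command, restore_command) are hypothetical and not part of Kudu, and the jar path, master names, root path and table name are the placeholders from the post, not real values. The script only prints the commands; it does not run them.

```python
import shlex

# Placeholder jar path copied from the post; replace with the real location.
BACKUP_JAR = "/opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar"


def kudu_job_command(job_class, masters, root_path, tables, jar=BACKUP_JAR):
    """Build the spark2-submit argument list for a Kudu backup/restore job.

    Hypothetical helper: it only mirrors the command lines shown in the post.
    """
    if not masters:
        raise ValueError("at least one Kudu master address is required")
    if not tables:
        raise ValueError("at least one table name is required")
    return [
        "sudo", "-u", "hdfs", "spark2-submit",
        "--class", f"org.apache.kudu.backup.{job_class}",
        jar,
        # Multiple masters are passed as one comma-separated value.
        "--kuduMasterAddresses", ",".join(masters),
        "--rootPath", root_path,
        *tables,
    ]


def backup_command(masters, root_path, tables, jar=BACKUP_JAR):
    """Command line for the KuduBackup job."""
    return kudu_job_command("KuduBackup", masters, root_path, tables, jar)


def restore_command(masters, root_path, tables, jar=BACKUP_JAR):
    """Command line for the KuduRestore job."""
    return kudu_job_command("KuduRestore", masters, root_path, tables, jar)


if __name__ == "__main__":
    masters = ["MASTER1", "MASTER2", "MASTER3"]  # placeholder host names
    tables = ["impala::DB.TABLE"]                # placeholder table name
    # Print the commands for review instead of executing them.
    print(shlex.join(backup_command(masters, "hdfs:///PATH_HDFS", tables)))
    print(shlex.join(restore_command(masters, "hdfs:///PATH_HDFS", tables)))
```

Keeping the arguments as a list avoids quoting problems if the list is later handed to subprocess.run; shlex.join is used here only to display a copy-pastable line.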
08-13-2019
04:44 PM
Hi @Harish19, There is an SSL Options button in the ODBC driver configuration window. Please click through it and confirm whether SSL is enabled on the client side. Cheers, Eric
07-15-2019
09:02 AM
1 Kudo
Hi @Harish19, the best place for information regarding TPC-DS tests on Impala would be (follow the README.md): https://github.com/cloudera/impala-tpcds-kit Once the data is populated in HDFS and the tables are created, you can likely run most of the same queries in tree/master/queries/ on Hive and/or Hive on Spark to test. IBM and Databricks have GitHub repositories with some SparkSQL tests, which you can Google for, but I have not personally evaluated them and do not know if they work. Thanks,
06-03-2019
01:46 PM
From the spark-shell or pyspark shell, use the commands below to access Hive database objects (no trailing semicolon is needed inside spark.sql):
spark.sql("show databases")
spark.sql("select * from databasename.tablename")
or
spark.read.table("databasename.tablename")
You can pass any query to spark.sql; it returns the results as a DataFrame, and calling .show() on it prints them.
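As a small illustration of the spark.read.table form, here is a minimal Python sketch for the pyspark shell, where the `spark` session already exists. The function name read_hive_table and its `limit` parameter are hypothetical, introduced only for this example; databasename and tablename are the placeholders from the post.

```python
def read_hive_table(spark, database, table, limit=None):
    """Return a DataFrame for database.table via an existing SparkSession.

    Hypothetical helper: it wraps spark.read.table, and optionally caps the
    number of rows with DataFrame.limit so a quick look stays cheap.
    """
    df = spark.read.table(f"{database}.{table}")
    return df.limit(limit) if limit is not None else df


# Usage inside a pyspark shell (where `spark` is predefined):
#   read_hive_table(spark, "databasename", "tablename", limit=10).show()
```

Passing the session in as an argument keeps the helper usable both in the shell and in a submitted application that creates its own session.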
05-07-2019
05:24 PM
The simplest way is through Cloudera Hue. See http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/ That said, if you've attempted something and have run into issues, please add more details so the community can help you on specific topics.
05-07-2019
05:21 PM
It would help if you added a description of what you have found or attempted, instead of just a broad question. Which load balancer are you planning to use? We have some sample HAProxy configs for Impala at https://www.cloudera.com/documentation/enterprise/latest/topics/impala_proxy.html#tut_proxy that can be repurposed for other components. Hue also offers its own pre-optimized Load Balancer as a role in Cloudera Manager, which you can add and have set up automatically: https://www.cloudera.com/documentation/enterprise/latest/topics/hue_perf_tuning.html
04-29-2019
11:21 AM
1 Kudo
Hi, Please refer to the following documentation for details on migrating from Oracle JDK to OpenJDK: https://www.cloudera.com/documentation/enterprise/upgrade/topics/ug_jdk8.html#concept_yky_c3z_4fb Hope this helps. Regards, Priya
04-28-2019
12:12 AM
1 Kudo
You can use the Spark Action in Oozie to submit any Spark application: https://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SparkActionExtension.html#Spark_Action If you are more familiar with the spark-submit tool, you can try the Oozie Shell Action as well: https://archive.cloudera.com/cdh5/cdh/5/oozie/DG_ShellActionExtension.html You may need to make sure the Spark gateway role is deployed on the Oozie server and NodeManager nodes, so that the runtime environment always has the dependencies available.