09-21-2020 08:44 AM
I have tested the backup/restore solution and it seems to be working like a charm with Spark:
-First, check and record the table names as listed by the Kudu master (or the elected leader master in a multi-master setup) at http://Master1:8051/tables
-Download the kudu-backupX.X.jar if you can't find it in /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/ and put it there
-In kuduMasterAddresses, put the name of your Kudu master, or the names of your three masters separated by ','
-Backup: sudo -u hdfs spark2-submit --class org.apache.kudu.backup.KuduBackup /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar --kuduMasterAddresses MASTER1(,MASTER2,..) --rootPath hdfs:///PATH_HDFS impala::DB.TABLE
-Copy: sudo -u hdfs hadoop distcp -i hdfs:///PATH_HDFS/DB.TABLE hdfs://XXX:8020/kudu_backups/
-Restore: sudo -u hdfs spark2-submit --class org.apache.kudu.backup.KuduRestore /opt/cloudera/parcels/CDH-X.Xcdh.XX/lib/kudu-backup2_2.11-1.13.0.jar --kuduMasterAddresses MASTER1(,MASTER2,..) --rootPath hdfs:///PATH_HDFS impala::DB.TABLE
-Finally, run INVALIDATE METADATA in Impala so it picks up the restored table, as sketched below.
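For that last step, a minimal sketch using impala-shell (the host, database, and table names are placeholders; adjust to your cluster):

# IMPALA_HOST, DB and TABLE are placeholders.
impala-shell -i IMPALA_HOST -q "INVALIDATE METADATA DB.TABLE"

Invalidating a single table this way is cheaper than a bare INVALIDATE METADATA, which refreshes the whole catalog.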
08-13-2019 04:44 PM
Hi @Harish19, There is an SSL Options button somewhere in the ODBC driver configuration window; please click through it and confirm whether you have SSL enabled on the client side. Cheers, Eric
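On Linux, where there is no configuration dialog, the equivalent switches live in the DSN definition. A minimal sketch of an odbc.ini entry, assuming the Cloudera Impala ODBC driver (all key names, paths, and hosts below are illustrative; check the exact spellings in your driver version's install guide):

[Impala-SSL]
# Driver path, host, and certificate file are placeholders.
Driver=/opt/cloudera/impalaodbc/lib/64/libclouderaimpalaodbc64.so
HOST=impala-host.example.com
PORT=21050
SSL=1
TrustedCerts=/opt/cloudera/security/cacerts.pem

You can then test the DSN from the shell with unixODBC: isql -v Impala-SSL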
07-15-2019 09:02 AM
Hi @Harish19 , the best place for information regarding TPC-DS tests on Impala would be (follow the README.md): https://github.com/cloudera/impala-tpcds-kit Once the data is populated in HDFS and the tables are created, you can likely run most of the same queries in tree/master/queries/ on Hive and/or Hive on Spark as a test. IBM and Databricks have GitHub repos with some SparkSQL tests, which you can Google for, but I have not personally evaluated them and don't know if they work. Thanks,
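If it helps, the flow is roughly as follows (a sketch only; the database name and query file below are illustrative, and the kit's README.md is authoritative):

git clone https://github.com/cloudera/impala-tpcds-kit
cd impala-tpcds-kit
# generate and load the data per the README, then run a query, e.g.:
impala-shell -d tpcds -f queries/q19.sql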
06-10-2019 07:11 AM
Hi, Could you please share the entire error log from the console for analysis, and also the PySpark command that you are submitting? Thanks, AK
06-03-2019 01:46 PM
From the spark or pyspark shell, use the commands below to access Hive database objects:
spark.sql("show databases").show()
spark.sql("select * from databasename.tablename").show()
or
spark.read.table("databasename.tablename")
Note that spark.sql() does not accept a trailing semicolon inside the query string. You can run any query inside spark.sql(), and it will return the results as a DataFrame.
05-15-2019 04:33 AM
Have you considered using Cloudera Navigator? Queries can be passed along to Navigator, and you can then search them through its interface.
05-13-2019 08:42 AM
Just enter the corresponding OUs where the users or groups are found. Please make sure the settings are correct and that the users and their member groups can be found in LDAP. We suggest testing your settings with the jmldap.jar tool first, for ease of testing and debugging.
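As an alternative sanity check, the standard ldapsearch client can confirm that a user resolves under the OU you configured; a minimal sketch (every host, DN, and account name below is a placeholder):

# Bind as a service account and look up one user under the configured OU.
ldapsearch -H ldap://ldap.example.com:389 \
  -D "binduser@example.com" -W \
  -b "OU=Users,DC=example,DC=com" \
  "(sAMAccountName=jdoe)" memberOf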
05-08-2019 09:41 PM
The OIV tool is documented at https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsImageViewer.html, and the page includes some examples. Try its Delimited processor options on a copy of your HDFS fsimage file and check the result.
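A minimal sketch of that, assuming you have HDFS admin rights to fetch the image (the fsimage file name is a placeholder):

# Fetch the latest fsimage from the active NameNode into /tmp.
hdfs dfsadmin -fetchImage /tmp
# Convert it to comma-delimited text for easy grepping or loading elsewhere.
hdfs oiv -p Delimited -delimiter "," -i /tmp/fsimage_0000000000000012345 -o /tmp/fsimage.csv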
05-07-2019 08:56 PM
Have you gone over the Kafka documentation? Are there specific parts or scenarios beyond the ones mentioned that you have questions about?
> data loss
Kafka provides topic partition replication.
> data duplication
Kafka does not do anything specific to deduplicate data. Assuming you're asking about exactly-once processing semantics, that depends on your application and how it leverages Kafka. One good writeup of this is at https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/
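On the producer side, the knobs most relevant to loss and duplication live in the producer configuration; a minimal producer.properties sketch (Kafka 0.11+; the values are illustrative, not a tuned recommendation):

# Wait for all in-sync replicas before acknowledging a write.
acks=all
# Deduplicate broker-side on retries (idempotent producer).
enable.idempotence=true
# Only needed if you use the transactions API for exactly-once:
# transactional.id=my-app-tx-1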
05-07-2019 05:24 PM
The simplest way is through Cloudera Hue; see http://gethue.com/new-apache-oozie-workflow-coordinator-bundle-editors/ That said, if you've attempted something and have run into issues, please add more details so the community can help you with the specific topics.
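If you end up working outside Hue, the Oozie CLI can submit a workflow directly; a minimal sketch (the host and the job.properties contents are placeholders):

# OOZIE_HOST is a placeholder; job.properties must point at your workflow.xml in HDFS.
oozie job -oozie http://OOZIE_HOST:11000/oozie -config job.properties -run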