Member since: 01-19-2017
Posts: 3679
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 982 | 06-04-2025 11:36 PM |
|  | 1561 | 03-23-2025 05:23 AM |
|  | 776 | 03-17-2025 10:18 AM |
|  | 2796 | 03-05-2025 01:34 PM |
|  | 1846 | 03-03-2025 01:09 PM |
09-11-2020
04:22 PM
@wert_1311 Domain name changes will affect the KDC database. Kerberos is very sensitive to domain changes; in my experience you will have to recreate the KDC database and regenerate the keytabs/principals so that your applications can reconnect.

Cluster hostnames
If the hosts in the cluster were renamed, i.e. host1.old.com to host1.new.com, ensure those changes are also reflected in and resolved by DNS. This is going to be a tricky one, but fortunately CM or Ambari will make your work easy. Now that your domain has changed, the previously generated keytabs still carry the old domain name. A keytab contains a pair of principals and an encrypted copy of that principal's key; it is unique to each host, since the principal names include the hostname concatenated with the domain name.

Delete the old KDC database
Usually, as the root user, call the Kerberos database utility kdb5_util destroy. Assuming the old realm was OLD.COM, this deletes the principals linked to the old REALM:
# kdb5_util -r OLD.COM destroy
You will also need to manually delete the keytabs linked to the old REALM on the file system, under /etc/security/keytabs/ [HDP] or /etc/hadoop/conf/ [CDH]. You will be prompted to confirm before the database is destroyed, which is the better option if you have second thoughts; kdb5_util destroy -f will not prompt for confirmation.

Recreate the new KDC database
Use the Kerberos database utility kdb5_util create [-s]. Assuming the new realm is NEW.COM:
# kdb5_util -r NEW.COM create
# kdb5_util -r NEW.COM create -s
With the -s option, kdb5_util stashes a copy of the master key in a stash file; this allows the KDC to authenticate itself to the database utilities such as kadmin, kadmind, krb5kdc, and kdb5_util, so it is the best option.

Update Kerberos files
Make sure you update the files below to reflect the new REALM, assuming your MIT KDC server's domain hasn't changed (a minimal krb5.conf sketch is included at the end of this reply):
krb5.conf
kdc.conf
kadm5.acl
Auth-to-local rules
jaas.conf files [if being used by applications]

Enable Kerberos
Using CM or Ambari the process is straightforward. Please let me know if you need more help.
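For the "Update Kerberos files" step, here is a minimal krb5.conf sketch assuming the new realm is NEW.COM and the KDC/admin server runs on kdc.new.com; both names are placeholders, so substitute your own realm and KDC host:

[libdefaults]
  default_realm = NEW.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false

[realms]
  NEW.COM = {
    kdc = kdc.new.com
    admin_server = kdc.new.com
  }

[domain_realm]
  .new.com = NEW.COM
  new.com = NEW.COM

If you let CM or Ambari manage krb5.conf on the cluster hosts, apply this change through the management console so it is pushed out consistently.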
08-28-2020
12:06 PM
@mahfooz The property value can only be modified in the hive-site.xml cluster configuration file. This obliges you to restart the stale Hive configuration, and it becomes a cluster-wide change rather than a runtime change. HTH
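For illustration only, a property override in hive-site.xml has the shape below; hive.exec.parallel is just an example property, not necessarily the one from your question, and because the file is distributed cluster-wide by CM/Ambari the change requires restarting the services with stale configuration:

<property>
  <name>hive.exec.parallel</name>
  <value>true</value>
  <description>Example only: enables parallel execution of independent query stages</description>
</property>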
08-15-2020
03:16 AM
@anass Hive and Impala have two distinct use-cases.

Hive, a data warehouse system, is used for analyzing structured data. It uses HQL (the Hive Query Language), which is internally converted into MapReduce jobs; it is fault-tolerant and a very good candidate for ETL and batch processing. Impala, on the other hand, executes faster, using an engine designed specifically for interactive SQL over HDFS. Unlike Hive, Impala is not fault-tolerant, but it is a fantastic MPP (Massively Parallel Processing) engine.

- Hive generates query expressions at compile time, whereas Impala does runtime code generation for "big loops", with no need for data movement or transformation to store data on Hadoop.
- With Impala, no Java knowledge is required to programmatically access data in HDFS or HBase; basic SQL querying skills are enough.
- Impala performs best when it queries files stored in Parquet format, and it is good for sampling data.
- Apache Hive is not ideal for interactive queries, whereas Impala is meant for interactive computing.
- Hive is batch-based Hadoop MapReduce, whereas Impala is more like an MPP database.
- Hive supports complex types, but Impala does not.
- Apache Hive is fault-tolerant, whereas Impala does not support fault tolerance, so Hive is the best candidate for batch processing, which is prone to failures. If a DataNode goes down while a Hive query is being executed, the output of the query will still be produced because Hive is fault-tolerant. That is not the case with Impala: if a query execution fails in Impala, it has to be started all over again.
- Hive can transform SQL queries into Spark or MR jobs, making it a good choice for long-running ETL jobs for which fault tolerance is desirable, because developers do not want to re-run a job after it has already executed for several hours.

For a better comparison, here is the benchmark of HAWQ, Hive and Impala. A small illustration of the two entry points follows below. Hope that helps
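As that small illustration, the same query can be submitted to Hive through beeline and to Impala through impala-shell; the hostnames, ports, and the web_logs table below are placeholders, not details from this thread:

# Hive: goes through HiveServer2, compiled to MapReduce/Spark jobs, fault-tolerant, batch-friendly
beeline -u "jdbc:hive2://hs2-host.example.com:10000/default" -e "SELECT country, COUNT(*) AS hits FROM web_logs GROUP BY country;"

# Impala: same table, low-latency MPP execution, suited to interactive exploration
impala-shell -i impalad-host.example.com:21000 -q "SELECT country, COUNT(*) AS hits FROM web_logs GROUP BY country;"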
08-14-2020
12:38 AM
1 Kudo
@mike_bronson7 Let me try to answer all your 3 questions in one shot.

Snapshots
ZooKeeper has 2 types of files: snapshots and transaction logs. As changes are made to the znodes, i.e. addition or deletion of znodes, these changes are appended to a transaction log; occasionally, when a log grows large, a snapshot of the current state of all znodes is written to the filesystem. This snapshot supersedes all previous logs. To put you in context, it is like the edit logs and the fsimage in the NameNode architecture: all changes made in HDFS are logged in the edit logs, and when a checkpoint kicks in, the Secondary NameNode merges the edit logs with the old fsimage to incorporate the changes since the last checkpoint. So a ZK snapshot is the equivalent of the fsimage, as it contains the current state of the znode entries and ACLs.

Snapshot policy
In the command shared earlier, the snapshot count parameter is -n <count>. If you want extra peace of mind you can raise it to 5 or 7, but I think 3 suffices: use the autopurge feature and keep only 3 snapshots and 3 transaction logs. When enabled, the ZooKeeper auto-purge feature retains the autopurge.snapRetainCount most recent snapshots and the corresponding transaction logs in the dataDir and dataLogDir respectively, and deletes the rest. It defaults to 3, and the minimum value is 3.

Corrupt snapshots
ZooKeeper might not be able to read its database and fail to come up because of file corruption in the transaction logs of the ZooKeeper server; you will see an IOException while loading the ZooKeeper database. In such a case, make sure all the other servers in your ensemble are up and working. Use the four-letter "stat" command on the client port to see if they are in good health (a quick check is sketched at the end of this reply). After you have verified that all the other servers of the ensemble are up, you can go ahead and clean the database of the corrupt server.

Solution
Delete all the files in dataDir/version-2 and dataLogDir/version-2/, then restart the server.

Hope that helps
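Here is a rough sketch of the health check and cleanup described above; the hostname, client port 2181, and the /var/lib/zookeeper paths are assumptions, so substitute the values from your zoo.cfg:

# check a healthy ensemble member with the four-letter "stat" command
echo stat | nc zk1.example.com 2181

# on the corrupt server only, after confirming the rest of the ensemble is fine:
# clear the database files (dataDir and, if separate, dataLogDir), then restart the server
rm -f /var/lib/zookeeper/version-2/*
# rm -f /var/lib/zookeeper-logs/version-2/*   # only if dataLogDir points elsewhere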
08-13-2020
01:36 PM
1 Kudo
@mike_bronson7 A ZooKeeper server will not remove old snapshots and log files with the default configuration; this is the responsibility of the operator, since every environment is different and the requirements for managing these files may differ from install to install.

The PurgeTxnLog utility implements a simple retention policy that administrators can use. In the example below, the last <count> snapshots and their corresponding logs are retained and the others are deleted. The value of <count> should typically be greater than 3, although that is not required; it provides 3 backups in the unlikely event a recent log has become corrupted. This can be run as a cron job on the ZooKeeper server machines to clean up the logs daily.

java -cp zookeeper.jar:lib/slf4j-api-1.6.1.jar:lib/slf4j-log4j12-1.6.1.jar:lib/log4j-1.2.15.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>

Automatic purging of the snapshots and corresponding transaction logs was introduced in version 3.4.0 and can be enabled via the configuration parameters autopurge.snapRetainCount and autopurge.purgeInterval (a minimal zoo.cfg sketch follows below).

Hope that helps!
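A minimal zoo.cfg sketch that enables auto-purge; the retention count and the 24-hour interval are illustrative values:

# zoo.cfg
# keep only the 3 most recent snapshots and their transaction logs
autopurge.snapRetainCount=3
# run the purge task every 24 hours; 0 (the default) disables auto-purge
autopurge.purgeInterval=24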
08-13-2020
01:09 PM
@BBFayz Below are the links with all the passwords you could be interested in. The default root credentials are root/hadoop, but you will be asked to change the password on the first logon.
Sandbox passwords (Learning the Ropes)
Setup Static IP on RHEL
Hope that helps
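In case it is useful, this is the usual way to SSH into the HDP sandbox as root; the hostname and port 2222 below are the common sandbox defaults and may differ in your deployment:

# connect to the sandbox container; the default password is hadoop and you must change it on first login
ssh root@sandbox-hdp.cloudera.com -p 2222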
08-11-2020
02:33 PM
1 Kudo
@kvinod Can you share the steps you have accomplished and attach the specific errors that you are encountering?
08-11-2020
05:32 AM
@ashish_inamdar Have you enabled the Ranger Hive plugin? If so, ensure the Atlas user has a Ranger policy that grants it the correct database and table permissions, because once the Ranger Hive plugin has been enabled you MUST use Ranger for authorization. Hope that helps
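A quick way to verify the policy is to authenticate as the atlas user and check what it can see through HiveServer2; the keytab path below is the typical HDP default, and the HiveServer2 host and realm are placeholders, so adjust them to your install:

# authenticate as the atlas service user (Kerberized cluster assumed; adjust keytab/principal as needed)
kinit -kt /etc/security/keytabs/atlas.service.keytab atlas/$(hostname -f)
# then confirm the databases/tables the Ranger policy actually exposes
beeline -u "jdbc:hive2://<hs2-host>:10000/default;principal=hive/_HOST@<REALM>" -e "SHOW DATABASES;"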
07-30-2020
03:19 PM
@Seaport I would think there is a typo: the dash [-] after -put and before the HDFS path.
hdfs dfs -cat /user/testuser/stage1.tar.gz | gzip -d | hdfs dfs -put - /user/testuser/test3
Try this after removing the dash:
hdfs dfs -cat /user/testuser/stage1.tar.gz | gzip -d | hdfs dfs -put /user/testuser/test3
Hope that helps