Member since: 01-19-2017
Posts: 3676
Kudos Received: 632
Solutions: 372
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 488 | 06-04-2025 11:36 PM |
| | 1014 | 03-23-2025 05:23 AM |
| | 538 | 03-17-2025 10:18 AM |
| | 2027 | 03-05-2025 01:34 PM |
| | 1264 | 03-03-2025 01:09 PM |
10-10-2020
11:26 AM
@vinod_artga The first step is to check the Cloudera upgrade path using the My Environment matrix calculator (see the screenshot below). After filling in all the requested information, it generates a report for you with warnings such as:

"Warning: Upgrades from CDH 5.12 and lower to CDP Private Cloud Base are not supported. You must upgrade the cluster to CDH versions 5.13 - 5.16 before upgrading to CDP Private Cloud Base."

"Warning: For upgrades from CDH 5 clusters with Sentry to Cloudera Runtime 7.1.1 (or higher) clusters where Sentry privileges are to be transitioned to Apache Ranger, the cluster must have Kerberos enabled before upgrading."

It also gives you comprehensive details about the best upgrade approach and component incompatibilities; this is your source of truth. I would suggest you try it and revert. HTH
10-10-2020
10:50 AM
1 Kudo
@bvishal SmartSense Tool (HST) gives all support subscription customers access to a service that analyzes cluster diagnostic data, identifies potential issues, and recommends specific solutions and actions. These analytics proactively identify unseen issues and notify customers of potential problems before they occur. That is okay, as you are just testing and you don't need to buy support, which is advised when running a production environment. To configure SmartSense you will need to edit /etc/hst/conf/hst-server.ini; the values for the following properties come from Hortonworks support if you have paid for a subscription:
customer.smartsense.id
customer.account.name
customer.notification.email
customer.enable.flex.subscription
A sketch of that file is shown below. The error you are encountering is normal and won't impact your cluster. Hope that helps.
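A minimal sketch of the relevant part of /etc/hst/conf/hst-server.ini; the section name and all values shown here are placeholders I am assuming for illustration, the real ID, account name, and e-mail come with your subscription:

[customer]
; SmartSense ID issued with your support subscription (placeholder)
customer.smartsense.id=A-00000000-C-00000000
; account name as registered with support (placeholder)
customer.account.name=your-account-name
; address that receives SmartSense notifications (placeholder)
customer.notification.email=admin@example.com
; leave disabled unless you have a flex support subscription
customer.enable.flex.subscription=false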
10-08-2020
10:40 AM
@pazufst How Ranger policies work for HDFS: Apache Ranger offers a federated authorization model for HDFS. The Ranger plugin for HDFS checks for Ranger policies, and if a policy exists, access is granted to the user. If a policy doesn't exist in Ranger, then Ranger falls back to the native permissions model in HDFS (POSIX or HDFS ACLs). This federated model applies to the HDFS and YARN services in Ranger. For other services such as Hive or HBase, Ranger operates as the sole authorizer, which means only Ranger policies are in effect. The fallback behaviour is controlled by a property under Ambari → Ranger → HDFS config → Advanced ranger-hdfs-security: xasecure.add-hadoop-authorization=true (see the sketch below). The federated authorization model lets you safely introduce Ranger in an existing cluster without affecting jobs that rely on POSIX permissions, which is why this option is the default model for all deployments.

The error
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=XXXXX, access=READ, inode="/user/.snapshot/user_201806150000":w93651:hdfs:drwx------
is self-explanatory. Does the user w93651 exist on both clusters with valid Kerberos tickets, if the cluster is kerberized? Ensure the cross-realm trust is configured and working. Is your Ranger managing the 2 clusters? HTH
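For reference, this is roughly how that property looks in the underlying ranger-hdfs-security.xml; in Ambari you would set it through the UI rather than editing the file, so the XML below is only illustrative:

<!-- ranger-hdfs-security.xml: fall back to native HDFS POSIX/ACL checks when no Ranger policy matches -->
<property>
  <name>xasecure.add-hadoop-authorization</name>
  <value>true</value>
</property>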
10-06-2020
10:14 AM
@pazufst You should toggle the resource to RECURSIVE for the rwx permission; the HDFS path should look like this: /databank/*. Hope that helps.
10-06-2020
07:25 AM
@pazufst Doesn't it look strange that the permissions are /databank/.snapshot/databank_201904250000 :hdfs:hdfs:d---------? The normal permissions should include rwx for the owner. You can compare the live directory with the snapshot copy as shown below. Try that and revert.
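A quick way to make that comparison (a sketch, reusing the paths from your post; run it as a user with read access to /databank):

# permissions of the live directory itself
hdfs dfs -ls -d /databank

# permissions recorded inside the snapshot copy
hdfs dfs -ls -d /databank/.snapshot/databank_201904250000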
10-02-2020
12:12 PM
@kvinod If you have installed YARN and MRv2, can you check the value of the parameter yarn.nodemanager.aux-services in yarn-site.xml? Stop the services and change it to look like below:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
Restart and let me know; a quick way to verify and restart is sketched below.
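A quick check-and-restart sequence (a sketch; the config path assumes a standard /etc/hadoop/conf layout, and the daemon commands assume Hadoop 3.x syntax, while Hadoop 2.x uses yarn-daemon.sh stop/start nodemanager instead):

# confirm the value the NodeManager will actually pick up
grep -A1 "yarn.nodemanager.aux-services" /etc/hadoop/conf/yarn-site.xml

# bounce the NodeManager so the new aux-service is loaded
yarn --daemon stop nodemanager
yarn --daemon start nodemanager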
09-25-2020
01:42 PM
@GangWar Can you regenerate the keytabs through Cloudera Manager? That could resolve the problem. If it doesn't, please revert with the error encountered.
09-11-2020
04:22 PM
@wert_1311 Domain name changes will affect the KDC database. Kerberos is very sensitive to domain changes; in my experience you will have to recreate the KDC database and regenerate the keytabs/principals to enable your applications to reconnect.

Cluster hostnames: if the hosts in the cluster were renamed, i.e. host1.old.com to host1.new.com, then ensure those changes are also reflected in, and resolved by, DNS.

This is going to be a tricky one, but fortunately CM or Ambari will make your work easy. Now that your domain has changed, the earlier generated keytabs contain the old domain name. A keytab contains a pair of principals and an encrypted copy of each principal's key; it is unique to each host, since the principal names include the hostname concatenated with the domain name.

Delete the old KDC database. Usually, as the root user, call the Kerberos database utility kdb5_util destroy. Assuming the old realm was OLD.COM, this deletes the principals linked to the old realm:
# kdb5_util -r OLD.COM destroy
You will need to manually delete the keytabs linked to the old realm on the file system, under /etc/security/keytabs/ [HDP] or /etc/hadoop/conf/ [CDH]. You will be prompted to confirm before destroying the database; this is the better option if you have second thoughts, whereas kdb5_util destroy -f will not prompt you for confirmation.

Recreate the new KDC database. Use the Kerberos database utility kdb5_util create [-s]. Assuming the new realm is NEW.COM:
# kdb5_util -r NEW.COM create
# kdb5_util -r NEW.COM create -s
With the -s option, kdb5_util will stash a copy of the master key in a stash file; this allows the KDC to authenticate itself to the database utilities such as kadmin, kadmind, krb5kdc, and kdb5_util, which is the best option.

Update the Kerberos files. Make sure you update the files below to reflect the new REALM, assuming your MIT KDC server's domain isn't changed:
krb5.conf
kdc.conf
kadm5.acl
Auth-to-local rules
jaas.conf files [if being used by applications]

Enable Kerberos using CM or Ambari; the process is straightforward. An end-to-end sketch of the KDC steps is shown below. Please let me know if you need more help.
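Putting the KDC steps together, a minimal sketch assuming an MIT KDC, old realm OLD.COM, new realm NEW.COM, and one example service principal; in practice CM or Ambari will regenerate the principals and keytabs for every host and service for you:

# 1. destroy the old KDC database (prompts for confirmation; add -f to skip the prompt)
kdb5_util -r OLD.COM destroy

# 2. remove the keytabs tied to the old realm on each host (HDP layout shown)
rm -f /etc/security/keytabs/*.keytab

# 3. create the new database and stash the master key
kdb5_util -r NEW.COM create -s

# 4. recreate a principal and its keytab (example: NameNode on host1.new.com)
kadmin.local -q "addprinc -randkey nn/host1.new.com@NEW.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab nn/host1.new.com@NEW.COM"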
08-28-2020
12:06 PM
@mahfooz That property value can be modified only in the hive-site.xml cluster configuration file. This obliges you to restart the stale Hive configuration, and it becomes a cluster-wide change rather than a runtime change (the shape of such an override is shown below). HTH
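For illustration only, this is the general shape of such an override in hive-site.xml; some.hive.property and desired-value are placeholders, not the actual property from your question:

<!-- hive-site.xml: cluster-wide setting, only picked up after the stale Hive services are restarted -->
<property>
  <name>some.hive.property</name>
  <value>desired-value</value>
</property>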
08-15-2020
03:16 AM
@anass Hive and Impala have two distinct use cases.

Hive, a data warehouse system, is used for analyzing structured data. It uses HQL, the Hive Query Language, which gets internally converted into MapReduce jobs; it is fault-tolerant and a very good candidate for ETL and batch processing. Impala, on the other hand, executes faster using an engine designed especially for interactive SQL over HDFS; unlike Hive, Impala is not fault-tolerant, but it is a fantastic MPP (Massively Parallel Processing) engine.

- Hive generates query expressions at compile time, whereas Impala does runtime code generation for "big loops", with no need for data movement or data transformation when storing data on Hadoop.
- With Impala, no Java knowledge is required to access the data in HDFS or HBase programmatically; a basic knowledge of SQL querying can do the work.
- Impala performs best when it queries files stored in Parquet format. It is good for sampling data.
- Apache Hive is not ideal for interactive queries, whereas Impala is meant for interactive computing.
- Hive is batch-based Hadoop MapReduce, whereas Impala is more like an MPP database.
- Hive supports complex types but Impala does not.
- Apache Hive is fault-tolerant, whereas Impala does not support fault tolerance, so Hive is the better candidate for batch processing that is prone to failures. When a Hive query is run and a DataNode goes down while the query is being executed, the output of the query will still be produced, as Hive is fault-tolerant. That is not the case with Impala: if a query execution fails in Impala, it has to be started all over again.
- Hive can translate SQL queries into Spark or MR jobs, making it a good choice for long-running ETL jobs for which fault tolerance is desirable, because developers do not want to re-run a long-running job after executing it for several hours.

For a better comparison, here is the benchmark of HAWQ, Hive, and Impala. You can also try both engines on the same query, as sketched below. Hope that helps
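A quick way to feel the difference is to run the same query through both engines; a sketch, where the hostnames, ports, and table name are assumptions:

# Hive: submitted through HiveServer2 and executed as a fault-tolerant batch job
beeline -u "jdbc:hive2://hiveserver2-host:10000/default" -e "SELECT COUNT(*) FROM web_logs"

# Impala: executed in memory by the MPP daemons, low latency but restarted from scratch on failure
impala-shell -i impalad-host:21000 -q "SELECT COUNT(*) FROM web_logs"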