Member since: 04-25-2020
Posts: 27
Kudos Received: 1
Solutions: 2
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 153 | 03-20-2022 03:00 AM
 | 145 | 03-20-2022 02:47 AM
03-21-2022
03:46 AM
Hi @Katja, CAST is supported in both Impala and Hive. You said the query works; where have you tested it? Can you try running it in Beeline and see if it works?
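If it helps, a quick check from Beeline could look like the sketch below; the JDBC URL and the literals being cast are only illustrative assumptions, not values from this thread.

```bash
# Minimal sketch of testing CAST from Beeline; the host and literals are placeholders.
beeline -u "jdbc:hive2://hiveserver-host:10000/default" -e \
  "SELECT CAST('123' AS INT), CAST(123 AS STRING), CAST('2022-03-20 03:00:00' AS TIMESTAMP);"
```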
03-20-2022
03:00 AM
Yes, before that I tried running the Sqoop import manually. I restarted the Oozie service multiple times until I could see in the logs that the Oozie workflow was initialized. The workflows then started working as expected and the job completed.
03-20-2022
02:47 AM
We were able to bring the Impala daemon up by disabling Enable Lineage Collection and Enable Impala Lineage Generation in the Impala configuration, and after this the Impala queries resumed fine and the data loads as expected. But when I try to set these back to their previous state, the daemon goes down again, and I am not sure why. If anyone has any suggestions, please share them.
03-18-2022
11:26 PM
Hi Experts, One of the Impala daemons suddenly went down and we are unable to bring it back up. Some queries were executing on that daemon. I captured the following logs when it shut down:

Could not open log file: /var/log/impalad/lineage/impala_lineage_log_1.0-1647629222243
@ 0x95b479 impala::Status::Status()
@ 0xd00b95 impala::SimpleLogger::FlushInternal()
@ 0xd01fcf impala::SimpleLogger::Init()
@ 0xbb9c2a impala::ImpalaServer::InitLineageLogging()
@ 0xbcb25e impala::ImpalaServer::ImpalaServer()
@ 0xbb676d ImpaladMain()
@ 0x8e2593 main
@ 0x7f5cf7658495 __libc_start_main
@ 0x9299e1 (unknown)
ERROR cc:312 Aborting Impala Server startup due to failure initializing lineage logging. Impalad exiting.

Can someone please tell me what steps should be taken? Just to note, Enable Audit Collection and Enable Impala Audit Event Generation are enabled/checked.
Labels:
- Apache Impala
01-21-2022
09:09 AM
Hi Experts, One of our Oozie workflows failed during a Sqoop import job. Please find the logs below:

org.apache.velocity.exception.VelocityException: Error initializing log: Failed to initialize an instance of org.apache.velocity.runtime.log.Log4JLogChute with the current runtime configuration.

In other logs:

org.springframework.web.client.HttpServerErrorException: 500 null

Please tell me what the possible cause of this failure could be; we retried the job but got the same error.
Labels:
- Apache Sqoop
01-21-2022
09:00 AM
Sharon, welcome to the Cloudera community!
08-09-2021
10:48 AM
1 Kudo
Hi @ryu, I recently copied Hive tables from our production cluster to a non-production cluster by running distcp on the Hive warehouse directory from prod to non-prod. After running distcp, we created the table schema on non-prod identical to prod using CREATE TABLE. If a table is partitioned, apply ALTER TABLE ... ADD PARTITION to add the partitions (a rough sketch follows below). We also use Hive replication to copy tables from our prod cluster to the DR cluster. If this has helped you, please mark the answer as a solution.
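For anyone following the same approach, here is a minimal sketch of those steps. All host names, paths, and the database/table/column names are placeholders, not the actual values from our clusters, and the target database is assumed to already exist on non-prod.

```bash
# 1. Copy the table's warehouse directory from prod to non-prod (namenodes/paths are placeholders).
hadoop distcp \
  hdfs://prod-nn:8020/user/hive/warehouse/sales.db/orders \
  hdfs://nonprod-nn:8020/user/hive/warehouse/sales.db/orders

# 2. Recreate the same schema on non-prod, pointing at the copied location
#    (assumes the "sales" database already exists there).
beeline -u "jdbc:hive2://nonprod-hs2:10000" -e "
CREATE EXTERNAL TABLE sales.orders (id BIGINT, amount DOUBLE)
PARTITIONED BY (order_date STRING)
STORED AS PARQUET
LOCATION '/user/hive/warehouse/sales.db/orders';"

# 3. For partitioned tables, register the copied partitions; MSCK REPAIR discovers
#    them, or ALTER TABLE ... ADD PARTITION can be run per partition instead.
beeline -u "jdbc:hive2://nonprod-hs2:10000" -e "MSCK REPAIR TABLE sales.orders;"
```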
08-09-2021
05:09 AM
Hello Team, Previously we used to query a view to fetch the required output from Hive tables, but the performance was very poor, so we came up with the idea of a materialized view. Unfortunately, that is not supported in CDH 5.14.4 (Hive 1.1). We have a second approach to overcome the limitations of the view, and I need suggestions on whether the approach below is supported in Hive and which Hive properties need to be changed in order to implement it. We thought of creating a staging table and using it to generate the report rather than using a view; the required indexes can be added to the staging table so that report generation is quick. Below are the actions proposed by our developer (a minimal sketch follows the list):
1. Create a table which is a union of multiple base tables.
2. Create a non-unique index on one of the key columns in the table.
3. Generate the report via the front end, which should query the staging table instead of a view.
4. Data will be inserted into the base tables on a daily basis.
5. We need to identify those new records and insert them into the staging table once a day.
6. Plan to rebuild the index on the staging table once a month.
As per our developer this is the only option we have left, and based on our knowledge it should work, but we want to check whether this will help.
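To make steps 1, 2, and 6 concrete, here is a hedged sketch. The table and column names (base_a, base_b, report_staging, report_key) are placeholders, the base tables are assumed to share a schema, and the CREATE INDEX syntax should be verified against your exact Hive 1.1 release.

```bash
# Build the staging table and its index from a script file; names are placeholders.
cat > /tmp/build_staging.hql <<'HQL'
-- Step 1: staging table as a union of the base tables (wrapped in a subquery
-- for compatibility with older Hive versions).
CREATE TABLE report_staging AS
SELECT u.* FROM (
  SELECT * FROM base_a
  UNION ALL
  SELECT * FROM base_b
) u;

-- Step 2: non-unique compact index on a key column, built deferred.
CREATE INDEX idx_report_staging_key
ON TABLE report_staging (report_key)
AS 'COMPACT' WITH DEFERRED REBUILD;

-- Step 6: rebuild the index (intended to be scheduled monthly).
ALTER INDEX idx_report_staging_key ON report_staging REBUILD;
HQL

beeline -u "jdbc:hive2://hiveserver-host:10000" -f /tmp/build_staging.hql
```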
Labels:
- Apache Hive
- Apache Impala
08-03-2021
08:00 AM
Hi Experts, We were facing a lot of challenges when querying the view in Hive; it takes a lot of resources. We therefore thought of creating a materialized view, but unfortunately that is not supported in our current version (Hive 1.1.0-cdh5.14.4). So we have come up with a different solution: instead of the view, we keep an actual intermediate table built from the view's query, and every hour whatever delta information is left over is fetched into it (a rough sketch is below). Has anyone done this before? Considering that our existing view performs poorly and a materialized view is not supported, we see this as the only solution. Please suggest which Hive runtime settings need to be changed and how this can be implemented.
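For illustration only, a sketch of the hourly delta load: the table names (report_snapshot, source_view), the event_ts watermark column, the JDBC URL, and the scheduling mechanism are all assumptions, not our actual objects.

```bash
# Append only the rows that arrived since the last load; the watermark value
# would in practice come from the previous run (cron/Oozie assumed for scheduling).
cat > /tmp/load_delta.hql <<'HQL'
INSERT INTO TABLE report_snapshot
SELECT *
FROM source_view v
WHERE v.event_ts > '${hiveconf:last_loaded_ts}';
HQL

beeline -u "jdbc:hive2://hiveserver-host:10000" \
  --hiveconf last_loaded_ts='2021-08-03 07:00:00' \
  -f /tmp/load_delta.hql
```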
Labels:
- Apache Hive
07-12-2021
07:19 AM
Hello Experts, We have identified that 2 records have been duplicated in our Hive tables. We have taken a backup of the tables in case we need to roll back. But when we run an INSERT OVERWRITE command (e.g. insert overwrite table demo select distinct * from demo;) on the smallest table, with a raw volume of 570 GB, we get the following error:

INFO : 2021-07-11 15:33:47,756 Stage-0_0: 122/122 Finished Stage-1_0: 70(+380,-64)/978
INFO : state = STARTED
INFO : state = FAILED
ERROR : Status: Failed
ERROR : FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask
DEBUG : Shutting down query insert overwrite table raw_switch.partitiontable select distinct * from raw_switch.partitiontable
INFO : Completed executing command(queryId=hive_19660743242525_d9c3a756-452f-472c-a92e-2b966c37d0ce); Time taken: 4078.407 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask (state=08S01,code=3)

Please find the HiveServer2 logs below:

2021-07-11 15:33:49,834 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: [HiveServer2-Background-Pool: Thread-29919]: Call: delete took 30ms
2021-07-11 15:33:49,834 ERROR org.apache.hadoop.hive.ql.Driver: [HiveServer2-Background-Pool: Thread-29919]: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask

The Hive parameters are currently at their defaults:

hive.execution.engine=spark
spark.executor.memory=12g
spark.executor.cores=4
hive.optimize.sort.dynamic.partition=true
hive.exec.dynamic.partition.mode=strict

Kindly suggest how to resolve this issue. Do we need to change any of the above default parameters, or some other parameter we may have missed? We hope we are running the correct INSERT OVERWRITE query to remove the duplicate records.
Labels:
- Apache Hive
- Apache Impala
12-14-2020
11:53 PM
@Tim Armstrong Thanks for your quick reply. I ran the COMPUTE STATS command to help the query performance, however I got the error below:

compute stats <table name>;
ERROR: AnalysisException: COMPUTE STATS not supported for view:

I also pulled the details from the query details page after running the query. I had to cancel it because it takes a lot of resources and stays in the executing state for a long time.

Query Info
Query ID: 130b304cc42b5010:19ef656c00000000
User: hadmin@COMPS-NVIRGINIA.LOCAL
Database: prod_cdb
Coordinator: usnprod4.n-virginia.dc
Query Type: QUERY
Query State: EXCEPTION
Start Time: Dec 15, 2020 7:17:34 AM
End Time: Dec 15, 2020 7:18:59 AM
Duration: 1m, 24s
Rows Produced: 0
Admission Result: Admitted immediately
Admission Wait Time: 0ms
Aggregate Peak Memory Usage: 13.2 GiB
Bytes Streamed: 15.6 GiB
Client Fetch Wait Time: 0ms
Client Fetch Wait Time Percentage: 0
Connected User: hadmin@COMPS-NVIRGINIA.LOCAL
Estimated per Node Peak Memory: 12.8 GiB
File Formats: PARQUET/NONE,PARQUET/SNAPPY
HDFS Average Scan Range: 90.5 KiB
HDFS Bytes Read: 24.7 GiB
HDFS Bytes Read From Cache: 0 B
HDFS Bytes Read From Cache Percentage: 0
HDFS Local Bytes Read: 23.7 GiB
HDFS Local Bytes Read Percentage: 96
HDFS Remote Bytes Read: 1.1 GiB
HDFS Remote Bytes Read Percentage: 4
HDFS Scanner Average Read Throughput: 155.9 MiB/s
HDFS Short Circuit Bytes Read: 23.7 GiB
HDFS Short Circuit Bytes Read Percentage: 96
Impala Version: impalad version 2.11.0-cdh5.14.4 RELEASE
Memory Accrual: 158,669,819,348 byte seconds
Memory Spilled: 1.0 GiB
Network Address: 10.206.100.226:42238
Node with Peak Memory Usage: usnprod3.n-virginia.dc:22000
Out of Memory: false
Per Node Peak Memory Usage: 5.2 GiB
Planning Wait Time: 6.69s
Planning Wait Time Percentage: 8
Pool: root.default
Query Status: Cancelled
Session ID: 9b32167b6eef775e:293bgd8a4350f200
Session Type: BEESWAX
Statistics Corrupt: false
Statistics Missing: true
Threads: CPU Time: 2.9m
Threads: CPU Time Percentage: 2
Threads: Network Receive Wait Time: 11.4m
Threads: Network Receive Wait Time Percentage: 6
Threads: Network Send Wait Time: 45.4m
Threads: Network Send Wait Time Percentage: 24
Threads: Storage Wait Time: 2.2h
Threads: Storage Wait Time Percentage: 69
Threads: Total Time: 3.2h

Please suggest whether it is due to a memory issue that the query is timing out with no output in impala-shell, even though Cloudera still shows the query as executing.
Please note: the Default Query Memory Limit in Impala = 6 GB and Max Memory = 270 GB.
A quick reply would be highly appreciated.
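Presumably COMPUTE STATS has to be run against the view's underlying base tables rather than the view itself. A hedged sketch of what I believe is needed is below; the table list is taken from the missing-statistics warning in this thread and assumed to be regular tables, and the host is the query's coordinator from the details above.

```bash
# Sketch only: COMPUTE STATS targets tables, not views, so each base table behind
# the view is handled individually. Table list and host are assumptions from this thread.
for t in prod_cdb.mig_pdg_common \
         prod_cdb.mig_post_correlation_common \
         prod_cdb.output_pre_correlation_common \
         prod_cdb.retained_for_correlation_common; do
  impala-shell -i usnprod4.n-virginia.dc -k -q "COMPUTE STATS ${t};"
done
```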
12-14-2020
03:02 AM
Hello, One of our end users tried to run an Impala query, which is actually a view with a long query statement. We noticed that the query times out, and we found the logs below in the query details in Cloudera:

WARNING: The following tables are missing relevant table and/or column statistics.
prod_cdb.mig_pdg_common, cdb.mig.accepted_common, prod_cdb.mig_post_correlation_common, prod_cdb.output_pre_correlation_common, prod_cdb.retained_for_correlation_common

What could be the reason for this? Does the view need to be validated, or does Impala not support such a view? What is another way to find the root cause? Any suggestions would be highly appreciated.
Tags:
- impala
Labels:
- Apache Impala
11-07-2020
12:41 AM
Hi Ateka, can you share the error you are getting while submitting the command, so we can better understand the issue?
11-01-2020
02:54 AM
Hi, I am trying to see a summary of a query's progress that updates in real time, so I ran 'set LIVE_PROGRESS=1;'. But I am getting the message below:

ERROR: User hadmin@HADOOP-GROUP.LOCAL is not authorized to access the runtime profile or execution summary.

However, the user hadmin is authorized to view that particular database and table. Please suggest what the reason could be and how to fix this, since I have obtained a Kerberos ticket as well.
Labels:
- Apache Impala
10-24-2020
11:40 PM
Hi Tushar, thanks a lot for your quick reply. The resolution you provided worked, so I am accepting it as a solution. Thanks once again.
10-20-2020
01:41 AM
Hello, We use Beeline or impala-shell to extract data from Hive tables as per requests from end users. However, the extraction requests are for a large number of records, more than 1,000 and sometimes more than 3,000, and it is very tedious to extract them with a SELECT query and dump them into an Excel sheet. Is there an alternative way to capture the output in a CSV file, i.e. have the output of the SELECT query go straight to a CSV file? Please suggest.
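For illustration, a sketch of how this is commonly done directly from the two shells; the host names, the query, and the output paths below are placeholders, not our actual values.

```bash
# impala-shell: -B prints delimited rows instead of the pretty-printed table,
# --output_delimiter sets the separator, -o writes the result to a file.
impala-shell -i impalad-host -k -B --output_delimiter=',' \
  -q "SELECT * FROM sales.orders LIMIT 5000" \
  -o /tmp/orders.csv

# beeline: csv2 output format, redirected to a file.
beeline -u "jdbc:hive2://hiveserver-host:10000" --silent=true \
  --outputformat=csv2 \
  -e "SELECT * FROM sales.orders LIMIT 5000" > /tmp/orders.csv
```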
Labels:
- Apache Hive
- Apache Impala
10-16-2020
11:26 AM
Hi, To work out what the NameNode heap memory should be, please provide the storage capacity of each node and the replication factor; based on this we can calculate the heap memory. The default block size is recommended for both large and small clusters.
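To make the calculation concrete, a rough sketch with made-up numbers: the capacity, replication factor, and the roughly 1 GB of heap per million blocks rule of thumb are assumptions used to illustrate the arithmetic, not a sizing recommendation for any particular cluster.

```bash
# Illustrative arithmetic only; plug in your real capacity and replication factor.
raw_capacity_tb=1200      # total raw storage across all DataNodes (assumed)
replication=3             # dfs.replication (assumed)
block_size_mb=128         # default dfs.blocksize

usable_mb=$(( raw_capacity_tb * 1024 * 1024 / replication ))
blocks=$(( usable_mb / block_size_mb ))        # best case: every block is full
heap_gb=$(( (blocks + 999999) / 1000000 ))     # ~1 GB heap per 1M blocks (rule of thumb)

echo "~${blocks} blocks -> roughly ${heap_gb} GB NameNode heap as a starting point"
```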
10-14-2020
09:46 AM
Hi,
I want to enable the on-demand metadata feature for Impala, as it brings a lot of improvements.
One of those improvements is that cached metadata gets evicted automatically under memory pressure.
I want to know whether this feature can be enabled on Cloudera CDH 5.14.4.
Has anyone implemented this in production on CDH 5.14.4?
10-14-2020
07:10 AM
Hi Tim, your suggestion was very helpful and I have a good understanding now, so I am accepting it as a solution. I just have one more thing to ask: to stop the query from using up the resources, is it better to increase the Impala Daemon Memory Limit (mem_limit)? What do you suggest?
10-03-2020
07:21 AM
Hi Tim, can you explain in more detail how I can do this? E.g. "you could set up memory-based admission control with a min memory limit of 2GB and a max memory limit of 20GB to prevent any one query from taking up all the memory on a node."
10-03-2020
12:29 AM
Hi Tim, thanks for your reply. I can only see two parameters for the memory limit: one is Single Pool Mem Limit (default_pool_mem_limit) = -1 B, and the second is Impala Daemon Memory Limit (mem_limit) = 60 GB. So how do I now set the min memory limit and the max memory limit?
10-02-2020
11:18 AM
After running the Impala query select distinct(partition_date) parts from mddb_servt; I am getting the error below (execution time 10 seconds):

java.sql.SQLException: [Cloudera][ImpalaJDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:ExecQueryFInstances rpc query_id=937d334667doe010:4967d4b900000111 failed: Failed to get minimum memory reservation of 68.00 MB on daemon ec2-3-128-13.us-east-2:22000 for query 937d334667doe010:4967d4b900000111 because it would exceed an applicable memory limit. Memory is likely oversubscribed. Reducing query concurrency or configuring admission control may help avoid this error.

Memory usage:
Process: Limit=60.00 GB Total=50.68 GB Peak=50.70 GB
Buffer Pool: Free Buffers: Total=0
Buffer Pool: Clean Pages: Total=2.31 GB
Buffer Pool: Unused Reservation: Total=-2.29 GB
Free Disk IO Buffers: Total=1.10 GB Peak=1.15 GB
RequestPool=root.default: Total=48.28 GB Peak=48.37 GB
Query(78befceb1eef47:d33db5f200030000): Reservation=47.49 GB ReservationLimit=48.00 GB OtherMemory=293.93 MB Total=47.78 GB Peak=47.81 GB
Query(e12345ed0a094d14:f4616fb90030000): Reservation=238.00 MB ReservationLimit=48.00 GB OtherMemory=4.27 MB Total=242.27 MB Peak=303.22 MB
Query(be7896564af6f2c:1e675bb00000000): Reservation=272.00 MB ReservationLimit=48.00 GB OtherMemory=4.23 MB Total=276.23 MB Peak=314.22 MB
Query(914d001522ce0e10:264bd4b900000000): Reservation=0 ReservationLimit=48.00 GB OtherMemory=0 Total=0 Peak=0
RequestPool=root.anp: Total=0 Peak=536.50 MB
Untracked Memory: Total=1.28 GB

Please note: Impala Daemon Memory Limit (mem_limit) = 60 GiB. Please let me know what the reason could be.
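For what it's worth, one mitigation the error message itself hints at is capping per-query memory so a single statement cannot reserve most of the 60 GB process limit. A hedged sketch is below; the 10g cap and the file path are illustrative assumptions, not tuned values for this cluster.

```bash
# Cap the query's memory reservation before running it; values are illustrative only.
cat > /tmp/distinct_dates.sql <<'SQL'
SET MEM_LIMIT=10g;
SELECT DISTINCT partition_date AS parts FROM mddb_servt;
SQL

impala-shell -i ec2-3-128-13.us-east-2 -f /tmp/distinct_dates.sql
```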
Labels:
- Apache Impala
08-15-2020
11:10 PM
@GangWar thanks for your quick reply. As per your suggestion I checked the logs in both locations, on the NameNode and the DataNodes. On one of the DataNodes I checked the logs in /var/run/cloudera-scm-agent/process/28-hdfs-DATANODE/logs, and searching by the keyword "Error" I found the following:

++ replace_pid -Xms521142272 -Xmx521142272 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled -XX:+HeapDumpOnOutOfMemoryError '-XX:HeapDumpPath=/tmp/hdfs_hdfs-DATANODE-111b6db5e742dbffe061f0c1d6bc8878_pid{{PID}}.hprof' -XX:OnOutOfMemoryError=/usr/lib64/cmf/service/common/killparent.sh
++ sed 's#{{PID}}#5409#g'

However, I don't see any error message in /var/log/hadoop-hdfs; the files there are:

audit
hadoop-cmf-hdfs-NAMENODE-namenode1.us-east1-b.c.coherent-elf-271314.internal.log.out
hdfs-audit.log
SecurityAuth-hdfs.audit
stacks

Please also suggest which log file I should check to debug this.
08-15-2020
02:56 AM
I am stuck in the middle of cluster setup after installing Cloudera Manager. The first 3 steps completed, however when it comes to starting HDFS it successfully formatted the name directories of the current NameNode but then got stuck on Start HDFS. Please find the error below:

Cluster Setup - First Run Command
Status: Running, Aug 15, 9:16:45 AM
There was an error when communicating with the server. See the log file for more information.
Completed 3 of 8 step(s).
- Ensuring that the expected software releases are installed on hosts. Aug 15, 9:16:45 AM, 90ms
- Deploying Client Configuration (Cluster 1). Aug 15, 9:16:45 AM, 16.13s
- Start Cloudera Management Service, ZooKeeper. Aug 15, 9:17:01 AM, 27.87s
- Start HDFS: 0/1 steps completed. Aug 15, 9:17:29 AM
  - Execute 3 steps in sequence: Waiting for command (Start (77)) to finish. Aug 15, 9:17:29 AM
  - Formatting the name directories of the current NameNode. If the name directories are not empty, this is expected to fail. NameNode (namenode1). Aug 15, 9:17:29 AM, 14.86s
  - Start HDFS: There was an error when communicating with the server. See the log file for more information.

I am unable to check the logs as the cluster is not fully set up. Please suggest what the reason could be and how to fix this. I am installing version 5.16.2.
Labels:
- Cloudera Manager
- HDFS
08-11-2020
04:30 AM
Thanks a lot, @GangWar. You are absolutely correct, I was using OpenJDK instead of Oracle JDK. I thought there was a bug and had no clue how to fix it, but after making the changes you suggested it worked. Thanks a lot; I am accepting it as a solution.
08-06-2020
03:25 AM
I am getting the error below after enabling Kerberos in the CDH cluster; HDFS and YARN are not able to start:

I can't open /run/cloudera-scm-agent/process/256-yarn-NODEMANAGER/container-executor.cfg: Permission denied.
+ perl -pi -e 's#{{CGROUP_GROUP_CPU}}##g' /run/cloudera-scm-agent/process/256-yarn-NODEMANAGER/yarn-site.xml

After checking the YARN NodeManager logs I see the error below:

org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.security.authorize.AuthorizationException: User: cloudera@CLUSTERIE.LOCAL is not allowed to impersonate yarn/ip-10-0-xxxxx@xyz.com

Any suggestion as to why I am getting this error? When I disable Kerberos everything works well. Please assist, as the severity of this is very high.
07-30-2020
09:33 AM
Thanks a lot for your quick reply and for providing a detailed explanation, really appreciated. I was using Cloudera version 5.14.1. Yes, there is a bug in the older version, which I realized soon after, but your explanation has made me understand the reason behind the error.
07-27-2020
03:33 AM
I am setting up a pre-prod cluster using Path B. I ran the command "/usr/share/cmf/schema/scm_prepare_database.sh mysql -h mysqldatabase scm temp password" to check the database status. I am getting a log4j error, although the SCM database is configured correctly. Please find the error below and suggest any workaround:

[root@clouderamanage cloudera-scm-server]# /usr/share/cmf/schema/scm_prepare_database.sh mysql -h webserver scmmysqldatabase password
JAVA_HOME=/usr/java/jdk1.7.0_67-cloudera
Verifying that we can write to /etc/cloudera-scm-server
Creating SCM configuration file in /etc/cloudera-scm-server
Executing: /usr/java/jdk1.7.0_67-cloudera/bin/java -cp /usr/share/java/mysql-connector-java.jar:/usr/share/java/oracle-connector-java.jar:/usr/share/java/postgresql-connector-java.jar:/usr/share/cmf/schema/../lib/* com.cloudera.enterprise.dbutil.DbCommandExecutor /etc/cloudera-scm-server/db.properties com.cloudera.cmf.db.
log4j: ERROR Could not find value for key log4j.appender.A
log4j: ERROR Could not instantiate appender named "A".
[2020-07-27 10:20:19,893] INFO 0[main] - com.cloudera.enterprise.dbutil.DbCommandExecutor.testDbConnection(DbCommandExecutor.java) - Successfully connected to database.
All done, your SCM database is configured correctly!
Labels:
- Cloudera Manager
05-03-2020
05:04 AM
Hi everyone, my name is Hanzala Shaikh. I have 5+ years of experience in core IT, including projects on the Big Data Hadoop platform with a specialization in Cloudera EDH on the AWS cloud, and I have worked in multi-cloud environments on AWS and GCP. My expertise is in Hadoop administration and managing production-grade clusters on the AWS cloud. I have worked on multiple projects in the healthcare domain on secured clusters, and I have good exposure to both the Cloudera and Hortonworks distributions, as I have worked on both platforms. Now I am here to contribute my knowledge to the community in the best possible way, and also to gain more knowledge and get my queries answered.