Member since
01-25-2017
396
Posts
28
Kudos Received
11
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 837 | 10-19-2023 04:36 PM
 | 4372 | 12-08-2018 06:56 PM
 | 5466 | 10-05-2018 06:28 AM
 | 19894 | 04-19-2018 02:27 AM
 | 19916 | 04-18-2018 09:40 AM
04-27-2017
11:26 PM
1 Kudo
@ShilpaSinha What is the total size of your cluster, and how many blocks in total? It seems you are writing too many small files. 1- Adding more nodes or disks to handle the number of blocks isn't the ideal solution. 2- If the NameNode host has enough memory and the cluster's storage is fine, just add more memory to the NameNode service. 3- If you have scheduled jobs, try to figure out which job is writing too many files and reduce its frequency. 4- The best way to handle this issue is to use a compaction process, either as part of the job or as a separate one.
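For point 3, one hedged way to spot which directories are accumulating the most files is to sort `hdfs dfs -count` output by the file-count column. The sample lines below stand in for real HDFS output (the paths are illustrative):

```shell
# Columns of `hdfs dfs -count`: DIRS FILES BYTES PATH.
# On a real cluster you would generate these lines with something like
# `hdfs dfs -count /data/*`; here we fake them with printf.
printf '%s\n' \
  '3 120000 9000000 /data/events' \
  '1 40 5000000000 /data/archive' \
  '2 800 120000 /data/tmp' |
sort -k2,2 -nr | head -5   # directories with the most files first
```

The directory at the top of the list (here `/data/events`, with 120000 files but only ~9 MB of data) is the likely small-file producer and the first candidate for compaction.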
04-19-2017
01:56 PM
@mbigelow That helped me a lot, thanks! I made small additions to the command:

hdfs dfs -ls /liveperson/data | grep -v storage | awk '{system("hdfs dfs -count " $8) }' | awk '{ gsub(/\/liveperson\/data\/server_/,"hadoop.hdfs.",$4); print $4 ".folderscount",$1"\n"$4 ".filescount",$2"\n"$4 ".size",$3;}'

I'm still investigating the "usage" error on the first run, and I also want to add a variable before the "hadoop.hdfs." prefix. Can you help with this? I have a variable called DC, and I want to concatenate it to the path so the result looks like this (for example, if DC is VA): VA.hadoop.hdfs.$4. I have already set $DC.
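One way to prepend the variable is to pass $DC into awk with -v and concatenate it inside the print. This is a sketch, not the thread's confirmed answer; the printf line fakes one record of `hdfs dfs -count` output, and the path `server_foo` is illustrative:

```shell
DC=VA   # hypothetical value; in practice set from your environment
printf '1 2 3 /liveperson/data/server_foo\n' |   # sample count output: DIRS FILES BYTES PATH
awk -v dc="$DC" '{
  gsub(/\/liveperson\/data\/server_/, "hadoop.hdfs.", $4)   # rewrite path to metric prefix
  print dc "." $4 ".folderscount", $1                       # DC code prepended here
}'
# prints: VA.hadoop.hdfs.foo.folderscount 1
```

The same `dc "."` concatenation can be repeated for the filescount and size lines of the original command.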
04-18-2017
06:46 AM
Hi, I need to send the hdfs dfs -count output to Graphite, but I want to do this in one command rather than three (one for the folder count, one for the file count, and one for the size). I can do it with separate commands like this:

hdfs dfs -ls /fawze/data | awk '{system("hdfs dfs -count " $8) }' | awk '{print $4,$2;}'

But I want the output to look like this:

fawze/data/x.folders 20
fawze/data/x.files 200
fawze/data/x.size 2650
fawze/data/y.folders 25
fawze/data/y.files 2450
fawze/data/y.size 23560

I'm not a Linux expert, so I'd appreciate any help.
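The desired reshaping can be done in a single awk pass over the count output. In this sketch the printf lines stand in for the real `hdfs dfs -count` call (its columns are directory count, file count, content size, path):

```shell
# Sample `hdfs dfs -count` output: DIRS FILES BYTES PATH
printf '%s\n' \
  '20 200 2650 /fawze/data/x' \
  '25 2450 23560 /fawze/data/y' |
awk '{ sub(/^\//, "", $4)              # drop the leading slash from the path
       print $4 ".folders", $1         # one metric line per counter
       print $4 ".files",   $2
       print $4 ".size",    $3 }'
# prints: fawze/data/x.folders 20, fawze/data/x.files 200, ... one per line
```

Emitting three print statements from one awk block is what collapses the three commands into one; the same block can be appended to the existing `hdfs dfs -ls | awk '{system(...)}'` pipeline.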
Labels:
- HDFS
04-18-2017
06:30 AM
Hi @adam990, as you can see from the report, you have 1 live DataNode, not 2: "Number of data-nodes: 1". Can you please make sure that you see 2 live DataNodes in the HDFS service?
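A quick way to confirm the live DataNode count from the command line is to grep the NameNode report. The printf below is sample report text standing in for the real `hdfs dfsadmin -report` output:

```shell
# On a real cluster: hdfs dfsadmin -report | grep -o 'Live datanodes ([0-9]*)'
printf 'Configured Capacity: 100\nLive datanodes (2):\n' |
grep -o 'Live datanodes ([0-9]*)'
# prints: Live datanodes (2)
```

If the number in parentheses is lower than the number of DataNodes you expect, check the DataNode logs and heartbeats on the missing host.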
04-17-2017
11:54 AM
1 Kudo
Hi Sanjeev, each role should be migrated/moved separately. For example:

NameNode, JournalNode, and the failover controller: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/admin_nn_migrate_roles.html
Cloudera Manager: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ag_restore_server.html
ZooKeeper: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_mc_zookeeper_service.html#ReplacingZkServerCM

For the DataNode: make sure to decommission the role on the nodes where you want to stop it, wait until the under-replicated blocks are closed, and then delete the role from those nodes. Adding the DataNode role to another node can be done in Cloudera Manager by going to HDFS ---> Instances and adding role instances.

For the NodeManager, just stop and add it, the same as with HDFS.

Regarding Hive and Oozie, you need to stop the services, back up the database, and then create the database on the new node.

If it is not yet a production environment, I would recommend creating it from scratch, as that is safer, easier, and faster.
04-17-2017
10:19 AM
1 Kudo
I would recommend the following:

a- The 2 DataNodes, each with:
1- HDFS DataNode
2- YARN (MR2 Included) NodeManager
3- Impala Daemon
4- Spark Gateway (to work around some bugs in different Cloudera Manager versions)

b- All Cloudera Manager services on a VM; 16-24 GB of memory will be enough.

c- Hive roles, Oozie, Hue, and the database (MySQL) on another VM, as this allows you to create a new one in case of disaster.

d- 2 other physical servers, both with:
1- YARN (MR2 Included) ResourceManager
2- HDFS NameNode
3- ZooKeeper Server

One of them with the following additional roles:
4- YARN (MR2 Included) JobHistory Server
5- Spark History Server
6- HDFS Balancer

And the other with:
7- Impala Catalog Server
8- Impala StateStore

e- I would recommend a small node for the HDFS, Spark, and Hive gateways; if that's not possible, they can be added to the DataNodes.
04-17-2017
12:47 AM
Any new insights?
04-16-2017
10:54 PM
Hi Adam, basically you can move any role you need to the other node. Can you please list the roles on the default node?