Member since
01-25-2017
396
Posts
28
Kudos Received
11
Solutions
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 837 | 10-19-2023 04:36 PM
 | 4372 | 12-08-2018 06:56 PM
 | 5466 | 10-05-2018 06:28 AM
 | 19894 | 04-19-2018 02:27 AM
 | 19916 | 04-18-2018 09:40 AM
04-27-2017
11:26 PM
1 Kudo
@ShilpaSinha What is the total size of your cluster, and how many blocks in total? It seems you are writing too many small files. 1- Adding more nodes or disks to handle the number of blocks isn't the ideal solution. 2- If the NameNode host has enough memory and the cluster's storage is fine, just add more memory to the NameNode service. 3- If you have scheduled jobs, try to figure out which job is writing too many files and reduce its frequency. 4- The best way to handle this issue is to use a compaction process, either as part of the job or as a separate one.
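For point 3, one hedged way to spot which directories are accumulating the most files is to sort `hdfs dfs -count` output by the file-count column. The sample lines below stand in for real HDFS output (the paths are illustrative):

```shell
# Columns of `hdfs dfs -count`: DIRS FILES BYTES PATH.
# On a real cluster you would generate these lines with something like
# `hdfs dfs -count /data/*`; here we fake them with printf.
printf '%s\n' \
  '3 120000 9000000 /data/events' \
  '1 40 5000000000 /data/archive' \
  '2 800 120000 /data/tmp' |
sort -k2,2 -nr | head -5   # directories with the most files first
```

The directory at the top of the list (here `/data/events`, with 120000 files but only ~9 MB of data) is the likely small-file producer and the first candidate for compaction.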
04-19-2017
01:56 PM
@mbigelow That helped me a lot, thanks! I made small additions to the command:

hdfs dfs -ls /liveperson/data | grep -v storage | awk '{system("hdfs dfs -count " $8) }' | awk '{ gsub(/\/liveperson\/data\/server_/,"hadoop.hdfs.",$4); print $4 ".folderscount",$1"\n"$4 ".filescount",$2"\n"$4 ".size",$3;}'

I'm still investigating the "usage" error on the first run, and I also want to add a variable before the "hadoop.hdfs." prefix. Can you help with this? I have a variable called DC, and I want to concatenate it to the path so the result looks like this (for example, if DC is VA): VA.hadoop.hdfs.$4. I have already set $DC.
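One way to prepend the variable is to pass $DC into awk with -v and concatenate it inside the print. This is a sketch, not the thread's confirmed answer; the printf line fakes one record of `hdfs dfs -count` output, and the path `server_foo` is illustrative:

```shell
DC=VA   # hypothetical value; in practice set from your environment
printf '1 2 3 /liveperson/data/server_foo\n' |   # sample count output: DIRS FILES BYTES PATH
awk -v dc="$DC" '{
  gsub(/\/liveperson\/data\/server_/, "hadoop.hdfs.", $4)   # rewrite path to metric prefix
  print dc "." $4 ".folderscount", $1                       # DC code prepended here
}'
# prints: VA.hadoop.hdfs.foo.folderscount 1
```

The same `dc "."` concatenation can be repeated for the filescount and size lines of the original command.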
04-18-2017
06:46 AM
Hi, I need to send the hdfs dfs -count output to Graphite, but I want to do this in one command rather than three (one for the folder count, one for the file count, and one for the size). I can do it with separate commands like this:

hdfs dfs -ls /fawze/data | awk '{system("hdfs dfs -count " $8) }' | awk '{print $4,$2;}'

But I want the output to look like this:

fawze/data/x.folders 20
fawze/data/x.files 200
fawze/data/x.size 2650
fawze/data/y.folders 25
fawze/data/y.files 2450
fawze/data/y.size 23560

I'm not a Linux expert, so I'd appreciate any help.
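The desired reshaping can be done in a single awk pass over the count output. In this sketch the printf lines stand in for the real `hdfs dfs -count` call (its columns are directory count, file count, content size, path):

```shell
# Sample `hdfs dfs -count` output: DIRS FILES BYTES PATH
printf '%s\n' \
  '20 200 2650 /fawze/data/x' \
  '25 2450 23560 /fawze/data/y' |
awk '{ sub(/^\//, "", $4)              # drop the leading slash from the path
       print $4 ".folders", $1         # one metric line per counter
       print $4 ".files",   $2
       print $4 ".size",    $3 }'
# prints: fawze/data/x.folders 20, fawze/data/x.files 200, ... one per line
```

Emitting three print statements from one awk block is what collapses the three commands into one; the same block can be appended to the existing `hdfs dfs -ls | awk '{system(...)}'` pipeline.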
Labels:
- HDFS
04-18-2017
06:30 AM
Hi @adam990, as you can see from the report, you have 1 live DataNode, not 2: "Number of data-nodes: 1". Can you please make sure that you see 2 live DataNodes in the HDFS service?
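A quick way to confirm the live DataNode count from the command line is to grep the NameNode report. The printf below is sample report text standing in for the real `hdfs dfsadmin -report` output:

```shell
# On a real cluster: hdfs dfsadmin -report | grep -o 'Live datanodes ([0-9]*)'
printf 'Configured Capacity: 100\nLive datanodes (2):\n' |
grep -o 'Live datanodes ([0-9]*)'
# prints: Live datanodes (2)
```

If the number in parentheses is lower than the number of DataNodes you expect, check the DataNode logs and heartbeats on the missing host.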
04-17-2017
11:54 AM
1 Kudo
Hi Sanjeev, each role should be migrated/moved separately. For example:

NameNode, JournalNode, and the failover controller: https://www.cloudera.com/documentation/enterprise/5-6-x/topics/admin_nn_migrate_roles.html
Cloudera Manager: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_ag_restore_server.html
ZooKeeper: https://www.cloudera.com/documentation/enterprise/5-8-x/topics/cm_mc_zookeeper_service.html#ReplacingZkServerCM

For the DataNode: make sure to decommission the role on the nodes where you want to stop it, wait until the under-replicated blocks are closed, and then delete the role from those nodes. Adding the DataNode role to another node can be done in Cloudera Manager by going to HDFS ---> Instances and adding role instances.

For the NodeManager, just stop and add it, the same as with HDFS.

Regarding Hive and Oozie, you need to stop the services, back up the database, and then create the database on the new node.

If it is not yet a production environment, I would recommend creating it from scratch, as that is safer, easier, and faster.
04-17-2017
10:19 AM
1 Kudo
I would recommend the following:

a- The 2 DataNodes, each with:
1- HDFS DataNode
2- YARN (MR2 Included) NodeManager
3- Impala Daemon
4- Spark Gateway (to work around some bugs in different Cloudera Manager versions)

b- All Cloudera Manager services on a VM; 16-24 GB of memory will be enough.

c- Hive roles, Oozie, Hue, and the database (MySQL) on another VM, as this allows you to create a new one in case of disaster.

d- 2 other physical servers, both with:
1- YARN (MR2 Included) ResourceManager
2- HDFS NameNode
3- ZooKeeper Server

One of them with the following additional roles:
4- YARN (MR2 Included) JobHistory Server
5- Spark History Server
6- HDFS Balancer

And the other with:
7- Impala Catalog Server
8- Impala StateStore

e- I would recommend a small node for the HDFS, Spark, and Hive gateways; if that's not possible, they can be added to the DataNodes.
04-17-2017
12:47 AM
Any new insights?
04-16-2017
10:54 PM
Hi Adam, basically you can move any role you need to the other node. Can you please list the roles on the default node?