Member since
09-11-2018
76
Posts
7
Kudos Received
5
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 339 | 07-15-2021 06:18 AM |
| | 666 | 06-21-2021 01:36 AM |
| | 763 | 06-21-2021 01:29 AM |
| | 833 | 04-19-2021 10:17 AM |
| | 1857 | 02-09-2020 10:24 PM |
09-17-2021
03:09 AM
Hi @Chetankumar,

Given that you have heterogeneous storage (and that HDFS follows rack topology to balance blocks across DataNodes): the DataNode volume choosing policy currently defaults to Round Robin; changing it to Available Space means the DataNode picks a volume based on free space, so new data is written to the less-used disks. This can help in your case.

You can change the setting in HDFS: CM -> HDFS -> Configuration -> DataNode Volume Choosing Policy -> change to Available Space. Save the changes and restart the DataNodes.

If that helps, please feel free to mark the post as an accepted solution.

Regards,
Vipin
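A quick way to see whether the disks are unevenly used (the symptom the Available Space policy addresses) is to compare usage across the DataNode's data mounts. This is a generic sketch; run it on the DataNode host and look at the Use% column for the dfs.data.dir mount points:

```shell
# List mounted filesystems sorted by usage percentage; uneven Use%
# across the DataNode data disks is what the Available Space volume
# choosing policy helps smooth out for new writes.
df -h | sort -k 5 -h
```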
07-15-2021
06:18 AM
1 Kudo
Hi @Amn_468,

In Kudu, a table is divided into multiple tablets, and those tablets are distributed across the cluster, so the table data is stored across multiple tablet servers (Kudu nodes).

You can get that info from the Kudu master WebUI: CM -> Kudu -> WebUI -> Tables -> select the table

curl -i -k --negotiate -u : "http://Abcde-host:8051/tables"

You can also run the ksck command to get that info: https://kudu.apache.org/docs/command_line_tools_reference.html#table-list

Does that answer your question? If yes, please feel free to mark the post as an accepted solution and give a thumbs up.

Regards,
06-21-2021
03:49 AM
Hi @FEIDAI,

Check in the HDFS trash whether the deleted folder is there (assuming you didn't use -skipTrash). If you manage to find the folder under trash, copy it back to your destination path:

hdfs dfs -cp /user/hdfs/.Trash/Current/<your file> <destination>

Otherwise, the best option is probably a data recovery tool or a backup.

Regards,
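The recovery steps above can be sketched as follows. All paths here are hypothetical examples, and the commands are printed rather than executed (a dry run); remove the leading `echo` to run them against a real cluster:

```shell
# Dry-run sketch of recovering a deleted folder from HDFS trash.
# TRASH is the standard per-user trash root; DELETED and DEST are
# hypothetical paths -- substitute your own.
TRASH="/user/hdfs/.Trash/Current"
DELETED="$TRASH/data/reports"   # hypothetical folder found in trash
DEST="/data/reports"            # hypothetical restore destination
echo hdfs dfs -ls "$TRASH"            # 1. confirm the folder is still in trash
echo hdfs dfs -cp "$DELETED" "$DEST"  # 2. copy it back out
```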
06-21-2021
01:36 AM
Hi @sakitha,

This seems to be a known issue. Is the topic whitelist set to "*"? Can you please try it with a dot, i.e. ".*"?

Let us know if that works for you.

Regards,
~ If the above answers your question, please give a thumbs up and mark the post as an accepted solution.
06-21-2021
01:29 AM
1 Kudo
Hi @wert_1311,

That's right, the balancer only balances tablets across the Kudu cluster. If one host is consuming more space, it could be that its tablets are simply large. And that's right, Kudu can't rebalance based on disk usage the way HDFS does.

One workaround you can try:
- Stop that specific Kudu TS role.
- Run ksck until the cluster reports healthy.
- Once ksck is healthy, rebuild that particular Kudu TS (rebuilding = wiping the data and WAL dirs): https://kudu.apache.org/docs/administration.html#rebuilding_kudu
- Start that specific TS.
- Run the rebalance again.

That should help. Let me know how it goes.

Cheers,
~ If that answers your question, please give a thumbs up and mark the post as an accepted solution.
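The workaround above can be sketched as a dry run. The master addresses, service names, and data/WAL path are hypothetical; step 3 is destructive, so only run it on the stopped tablet server after ksck reports healthy:

```shell
# Dry-run sketch of the TS rebuild workflow (remove 'echo' to execute).
MASTERS="master1:7051,master2:7051,master3:7051"    # hypothetical masters
echo sudo systemctl stop kudu-tserver               # 1. stop the TS role (or stop it via CM)
echo sudo -u kudu kudu cluster ksck "$MASTERS"      # 2. repeat until the cluster is healthy
echo rm -rf /data/kudu/tserver                      # 3. wipe the TS data and WAL dirs (hypothetical path)
echo sudo systemctl start kudu-tserver              # 4. start the TS again
echo sudo -u kudu kudu cluster rebalance "$MASTERS" # 5. run the rebalance again
```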
06-17-2021
12:39 PM
Hi @wert_1311,

Check the tablet distribution across tablet servers. If for some reason one tablet server goes down or becomes unavailable, its data is replicated to the other tablet servers. You can get the number of tablets per tablet server using this command:

sudo -u kudu kudu table list <csv of master addresses> -list_tablets | grep "^ " | cut -d' ' -f6,7 | sort | uniq -c

If you find the tablet distribution is uneven, you can go ahead with the kudu rebalance tool to balance your cluster: https://docs.cloudera.com/runtime/7.2.2/administering-kudu/topics/kudu-running-tablet-rebalancing-tool.html

Let me know how it goes. If that answers your question, please mark this post as an accepted solution.

Regards,
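If the distribution turns out uneven, the rebalancer from the linked doc can be invoked as below. The master addresses are hypothetical and the commands are printed as a dry run; check `kudu cluster rebalance --help` on your version for the supported flags:

```shell
# Dry-run sketch of running the Kudu rebalancer (remove 'echo' to execute).
MASTERS="master1:7051,master2:7051,master3:7051"   # hypothetical masters
echo sudo -u kudu kudu cluster rebalance "$MASTERS"
# Report-only mode shows planned moves without touching any replicas:
echo sudo -u kudu kudu cluster rebalance "$MASTERS" -report_only
```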
04-20-2021
11:08 PM
Hi @sipocootap2,

Answered it here: https://community.cloudera.com/t5/Support-Questions/How-can-I-get-fsimage-with-curl-command/m-p/314859/highlight/false#M226223

Cheers,
04-20-2021
10:11 PM
Hi @sipocootap2,

AFAIK "/getimage" is deprecated in CDH and we suggest not using it. Instead you can use the command "hdfs dfsadmin -fetchImage <dir>" to download and save the latest fsimage.

In earlier versions of CDH the getImage method was available; later the need for a proper command/utility to download the fsimage was recognized, and "hdfs dfsadmin -fetchImage" was born. Once that was in place, getImage was removed.

Does that answer your question? If yes, feel free to mark this post as an accepted solution.

Regards,
04-20-2021
11:43 AM
Hi @ROACH,

Ideally we recommend 1 GB of heap per 1 million blocks. How much memory you actually need also depends on your workload, especially on the number of files, directories, and blocks generated in each namespace; the type of hardware (VM or bare metal, etc.) is also taken into account.

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_nn_memory_config.html

Also have a look at these examples of estimating NameNode heap memory:
https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/hdfs-overview/topics/hdfs-examples-namenode-heap-memory.html

If write-intensive or snapshot operations are performed on the cluster frequently, then 6-9 GB sounds fine. I would suggest grepping for GC in the NameNode logs; if you see long pauses, say more than 3-5 seconds, that's a good starting point for increasing the heap size.

Does that answer your question? Do let us know.

Regards,
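The rule of thumb above is easy to turn into arithmetic. The block count here is a hypothetical example; take yours from the NameNode WebUI or CM:

```shell
# Estimate NameNode heap from block count using the rough rule of
# 1 GB of heap per 1 million blocks (rounded up to a whole GB).
BLOCKS=6500000                                 # hypothetical block count
HEAP_GB=$(( (BLOCKS + 999999) / 1000000 ))     # ceiling division
echo "~${HEAP_GB} GB heap suggested for ${BLOCKS} blocks"
```

With 6.5 million blocks this suggests about 7 GB, which lines up with the 6-9 GB range mentioned above.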
04-20-2021
11:16 AM
Hi @Chetankumar,

I think we answered this in this thread: https://community.cloudera.com/t5/Support-Questions/How-to-move-block-from-one-mount-point-to-other-and-remove/td-p/314861

If that answers all your questions, feel free to mark the post as an accepted solution.

Regards,
04-20-2021
11:05 AM
Hi @rubysimmons63,

Also, Falcon is explained in detail, in contrast with Atlas, here; do check it out for a better understanding:
https://community.cloudera.com/t5/Support-Questions/What-is-the-difference-between-Apache-atlas-and-Apache/m-p/122450
https://www.cloudera.com/products/open-source/apache-hadoop/apache-falcon.html

Regards,
04-20-2021
10:53 AM
Hi @ryu,

We can trigger a manual GC in the DataNode JVM, AFAIK, but the best way to deal with long GC pauses is to allocate the right amount of heap memory. We recommend the formula of 1 GB of heap per 1 million blocks. You can get the block count from the NameNode WebUI -> DataNodes (through Ambari or CM). Increase the heap and that should fix your issue.

Do check for "No GCs detected" in the DataNode logs; if you see those messages, it could be a hardware problem triggering the pauses rather than GC.

Does that answer your question? Let me know.

Regards,
Vipin
04-20-2021
10:43 AM
Hi @Seeker90,

The ERROR message that you see is because you are running ZK in standalone mode; it is more of a warning than an error:

Invalid configuration, only one server specified (ignoring)

Further, I see that ZK started properly; however, it throws an exception while reading snapshots.

Probable causes of canary test failure on a ZooKeeper quorum:
1. Max client connections set too low
2. Long fsyncs (disk writes)
3. Insufficient heap (long GCs)

Try the following:
1. Increase the ZK heap size (if the heap is undersized or the snapshots are huge, this is a good starting point).
2. Increase the maximum number of connections to 300.
3. grep for "fsync" in the ZK logs, and check whether the ZK disk is independent.

Does that answer your question? Do let us know.

Regards,
04-19-2021
10:17 AM
Hi @Chetankumar,

You can perform a disk hot swap on the DN: https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/admin_dn_swap.html

If the replication factor is set to 3 for all files, then taking down one disk shouldn't be a problem, as the NameNode will auto-replicate the under-replicated blocks. As a small test, first stop the DataNode and wait for some time (while the NN copies the blocks to other available DataNodes). Run fsck to confirm that the HDFS file system is healthy. Once it is healthy, you can easily play around with that stopped DataNode. The idea is to ensure the replication factor is 3 so that you don't incur any data loss.

If the replication factor is set to 1 for some files and those blocks are hosted on that /data01 disk, then it could be a potential loss. As long as you have RF=3 you will be good.

Does that answer your questions? Let us know.

Regards,
03-31-2021
03:28 AM
1 Kudo
Hi @rocky_tian,

Although it appears to be https://issues.apache.org/jira/browse/KUDU-2412, you are on el7, so I'm not sure that JIRA applies to your case.

Ensure that all the prerequisite Kudu libraries are installed, as mentioned in the doc: https://kudu.apache.org/docs/installation.html#_install_on_rhel_or_centos_hosts

Related external link: https://stackoverflow.com/questions/52526013/how-to-read-from-kudu-to-python

Regards,
Vipin
03-16-2021
01:38 AM
Hi @rOckChew,

It's a consensus error; we need to validate that all masters are voting for the leader. In a multi-master Kudu environment, if a master is restarted or goes offline for a few minutes, it can occasionally have trouble joining the cluster on startup. For example, if this happens with three Kudu masters, and one of the other two masters is stopped or dies during this time, then the overall Kudu cluster is down because the majority of the masters are not running. This issue is resolved by the KUDU-2748 upstream JIRA.

https://my.cloudera.com/knowledge/TSB-2020-442-Kudu-Masters-unable-to-join-back-after-a?id=304920

Run "sudo -u kudu kudu cluster ksck <master1,master2,master3>" and check whether the '*' is missing from one of the masters' consensus.

>> A quick workaround is to restart all the Kudu masters together.
>> Otherwise, rewrite the Kudu master consensus.

Thanks,
Vipin
03-02-2021
09:15 PM
Hi @JeromeAlbin,

Looks like https://issues.apache.org/jira/browse/IMPALA-9486

The error pops up because you are connecting to Impala anonymously (no user, no password). You can specify a user (even if it's not declared in Kudu), and then it should work.

Please read page 12 of the following document: https://docs.cloudera.com/documentation/other/connectors/impala-jdbc/2-6-15/Cloudera-JDBC-Driver-for-Impala-Install-Guide.pdf

Using User Name
-----------------------
This authentication mechanism requires a user name but does not require a password. The user name labels the session, facilitating database tracking.

Does that answer your question? If yes, then feel free to mark this post as an accepted solution.

Regards,
vipin
02-04-2021
02:24 AM
Hi @Smashedcat32,

To give some background on the ZooKeeper canary: the Service Monitor regularly checks the health of the ZooKeeper service by
1. connecting to the ZooKeeper quorum and locating the leader,
2. creating a znode,
3. reading the znode, and
4. deleting the znode.

If any of these steps fail, the Service Monitor reports that ZOOKEEPER_CANARY_HEALTH has become bad. In the health report above, the reason was "Canary test failed to establish a connection or a client session to the ZooKeeper service", which means it failed on step 1. The problem could lie in three locations:
1. The ZooKeeper quorum: long fsyncs, long GCs, low max client connections
2. The Service Monitor: false reports
3. Network connectivity between the Service Monitor and the ZooKeepers

Now, coming to your query regarding canary test commands: I don't think we have them available in the docs. You can use the commands from the ZK guide to test. For example, to verify whether a ZK instance is the leader:

echo stat | nc ZOOKEEPER_IP ZOOKEEPER_PORT | grep Mode

http://www.corejavaguru.com/bigdata/zookeeper/cli
https://zookeeper.apache.org/doc/r3.3.3/zookeeperStarted.html#sc_ConnectingToZooKeeper
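The four canary steps above can be approximated by hand with the ZooKeeper CLI. The host, port, and znode path below are hypothetical, and the commands are printed as a dry run (remove the leading `echo` to run them against a live quorum); `zookeeper-client` is the CDH wrapper around zkCli.sh:

```shell
# Dry-run sketch of manually reproducing the Service Monitor canary.
ZK="zk-host:2181"                                        # hypothetical quorum member
echo "echo stat | nc zk-host 2181"                       # 1. check who is leader/follower
echo zookeeper-client -server "$ZK" create /sm-canary x  # 2. create a znode
echo zookeeper-client -server "$ZK" get /sm-canary       # 3. read it back
echo zookeeper-client -server "$ZK" delete /sm-canary    # 4. delete it
```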
02-03-2021
08:41 AM
The disk space occupied by a deleted row is only reclaimable via compaction. Given that you have deleted some data and the space has not been reclaimed, you are probably hitting the bug https://issues.apache.org/jira/browse/KUDU-1625, which stands unresolved.

However, if the goal is to delete data and reclaim disk space, you can drop a partition (if the table is range-partitioned) in order to reclaim space.

Tombstoned tablets have all their data removed from disk and don't consume significant resources. These tablets are necessary for the correct operation of Kudu. See https://docs.cloudera.com/runtime/7.1.0/troubleshooting-kudu/topics/kudu-tombstoned-or-stopped-tablet-replicas.html
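Dropping a range partition from Impala looks like the sketch below. The table name and partition value are hypothetical, and the command is printed as a dry run; this only applies to range-partitioned Kudu tables, and the dropped partition's data is gone for good:

```shell
# Dry-run sketch of reclaiming space by dropping a range partition
# (remove 'echo' to execute against a live cluster).
echo impala-shell -q "ALTER TABLE events DROP RANGE PARTITION VALUE = 20200101"
```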
02-03-2021
03:32 AM
Ideally, once you have dropped the table, the data should be deleted immediately. The metrics in CM may take some time to reflect this; we can verify from the backend whether the table was actually deleted.

Verify whether the table still exists on the Kudu FS by using the kudu ksck command with the -tables flag:

kudu cluster ksck <master_addresses> -tables=<tables>

Note: if the table was created through Impala, use "impala::db.tablename".

If you still see the table in ksck, run the command below to delete it from Kudu:

kudu table delete <master_addresses> <table_name>
02-03-2021
12:20 AM
1 Kudo
Hi @vidanimegh,

Ensure that forward and reverse DNS lookups work and that iptables is off. Perform a CM agent hard restart.

What's the Java version? There's this bug, https://bugs.openjdk.java.net/browse/JDK-8215032, wherein servers with Kerberos enabled stop functioning. That could be a possibility.
01-28-2021
12:04 AM
@mike_bronson7 Adding to @GangWar:

To your question "does this action also affect the data itself on the DataNode machines?": no, it doesn't affect data on the DataNodes directly. This is a metadata operation on the NameNode: when the NameNode fails to progress through the edits or fsimage, it may need to be started with the -recover option. Since the metadata has references to the blocks on the DataNodes, this is a critical operation and may incur data loss.
01-27-2021
11:37 PM
Adding to @smdas: this is one of the Kudu limitations: "There is no way to run compaction manually, but dropping the table will reclaim the space immediately."

You can verify the size from the CM graphs:
- Go to the Kudu service and navigate to the Charts Library tab.
- On the left-hand side menu, click Tables to display the list of tables currently stored in Kudu.
- Click on a table name to view the default dashboard for that table. The Total Tablet Size On Disk Across Kudu Replicas chart displays the total size of the table on disk using a time-series chart.
- Hovering over the line on the chart opens a small pop-up window that displays information about that data point. Click the data stream within the chart to display a larger pop-up window with additional information for the table at the point in time where you clicked.

Reference: http://apache.github.io/kudu/docs/known_issues.html#_other_usage_limitations
11-05-2020
12:56 AM
Hi Prash,

By default, CM will throw an alert on the CM -> HDFS -> Status page when the NameNode is in safe mode.

NameNode Safemode Health Test: enables the health test that checks that the NameNode is not in safemode.
https://docs.cloudera.com/documentation/enterprise/6/properties/6.1/topics/cm_props_cdh5150_hdfs.html

You can configure email alerts using the below:
https://docs.cloudera.com/documentation/enterprise/latest/topics/cm_ag_email.html

Let me know if that helps.

Thanks,
Vipin
02-09-2020
10:24 PM
1 Kudo
@Amn_468,

It depends. If you are receiving the file descriptor warning message across all the tablet servers, and you change the fd threshold across the Kudu TServer roles, then you should do it in non-prod hours or ensure no jobs are running. Please be informed that Kudu doesn't offer a rolling restart feature.

Service-level config for the TS: CM -> Kudu -> Configuration -> file descriptors for TS -> requires a restart of all TS.

However, if the warning appears only for one specific TS, and you plan to change the threshold for that TS only, it shouldn't cause any issue: Kudu is a distributed system, and restarting one worker role won't impact your ongoing operations.

CM -> Kudu -> Instances -> select the TS instance that throws the warning -> Configuration -> set the fd value -> restart that specific TS.
02-06-2020
05:58 AM
Hi @AM,

The alert regarding fds is a warning message that you have crossed the warning threshold of open file descriptors for Kudu. There are three main sources that hold fds in Kudu:
1. The file cache
2. Hot replicas
3. Cold replicas

Typically there is no downside to increasing the fd limit on a Kudu tserver. You can increase it from CM -> Kudu -> Configuration -> file descriptors -> increase the value to 64k, or you may tune the threshold and set the warning at 75%. A restart of the TS is required if you change the file descriptor limit.

Hope this helps!!
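To see where a host currently stands relative to the threshold, you can check the limits locally. The pgrep pattern below is a hypothetical way to find the tserver process, and that line is printed as a dry run:

```shell
# Shell-level fd limit on this host (what a child process would inherit):
ulimit -n
# Per-process view for a running tserver, as a dry run (remove 'echo'):
echo 'grep "open files" /proc/$(pgrep -f kudu-tserver)/limits'
```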
02-06-2020
02:21 AM
Hi @dsht6955,

Can you try running the hdfs oiv command with the -t flag, as shown below, and see if that helps:

hdfs oiv -p Delimited -delimiter delimiterString -t temporaryDir -i fsimage -o output.xml

Thanks,
Vipin
02-04-2020
10:24 AM
It seems the NN refuses to connect to the DN. Either the DN is not in the include file on the NN, or it may be a DNS issue (check /etc/hosts). Can you run the command below and see if the error goes away?

hdfs dfsadmin -refreshNodes

Further, check this link: https://stackoverflow.com/questions/17252955/getting-the-following-error-datanode-denied-communication-with-namenode-while/29598059#29598059
01-06-2020
05:49 AM
CDP isn't 100% open source; you may have to purchase a subscription. A trial can be downloaded from here: https://www.cloudera.com/downloads.html

"THE TRIAL VERSION INCLUDES ALL FEATURES OF THE FULL PRODUCT AND IS VALID FOR 60 DAYS FROM THE TIME OF INSTALLATION."
08-12-2019
04:51 AM
Hi Vijaya,

Under-replicated blocks are blocks that do not meet the target replication factor. HDFS has a self-healing mechanism wherein it creates new replicas of under-replicated blocks until they meet the target replication.

If this is a multi-node cluster, verify whether any DataNodes are down; if it is a single-node cluster, please ensure the replication factor is set to 1.

Thanks.
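The two checks above (DataNode liveness and remaining under-replication) can be sketched as a dry run; remove the leading `echo` to run them on a live cluster:

```shell
# Dry-run sketch of verifying DataNode status and under-replicated blocks.
echo "hdfs dfsadmin -report | grep -i 'Live datanodes'"   # how many DNs are up
echo "hdfs fsck / | grep -i 'Under-replicated blocks'"    # self-healing progress
```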