Member since: 04-27-2016
Posts: 26
Kudos Received: 6
Solutions: 6
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1203 | 09-30-2020 10:30 AM
 | 9361 | 07-14-2020 12:40 PM
 | 2468 | 07-14-2020 12:27 PM
 | 5080 | 05-10-2017 03:45 PM
 | 1516 | 04-24-2017 09:14 PM
04-17-2024
03:10 PM
1 Kudo
I was looking for the same information and came across the helpful link below; I hope it helps.
https://hadoop.apache.org/docs/current/hadoop-distcp/DistCp.html
Best,
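For reference, a basic DistCp run looks like the following (the NameNode hostnames and paths here are placeholders, not from this thread):
hadoop distcp hdfs://source-nn:8020/data/src hdfs://target-nn:8020/data/dest
# preserve file attributes and only copy changed files on a re-run
hadoop distcp -p -update hdfs://source-nn:8020/data/src hdfs://target-nn:8020/data/dest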
09-30-2020
10:14 PM
Hello @vincentD Please review the stdout and stderr of the DataNode (DN) that keeps going down. You can navigate to CM > HDFS > Instances > select the DN that went down > Processes > click on stdout/stderr at the bottom of the page. I am asking you to check stdout/stderr because I suspect an OOM error (the Java heap running out of memory) is causing the DN to exit abruptly. If the DN exit is due to an OOM error, please increase the DN heap size to an adequate value to resolve the issue. The DN heap sizing rule of thumb is 1 GB of heap memory per 1 million blocks. You can verify the block count on each DN by navigating to CM > HDFS > NN Web UI > Active NN > DataNodes; that page shows per-DN stats such as block counts and disk usage.
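As a rough worked example of that rule of thumb (the block count and heap values here are hypothetical): a DN carrying about 3 million blocks would need roughly 3 GB of heap, so rounding up to 4 GB is reasonable. On a CM-managed cluster you would change the DataNode Java heap size in the HDFS configuration; on a plain command-line install the equivalent setting usually lives in hadoop-env.sh, for example:
# hadoop-env.sh -- example values only, size to your own block count
export HADOOP_DATANODE_OPTS="-Xms4g -Xmx4g ${HADOOP_DATANODE_OPTS}"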
09-30-2020
10:30 AM
@Koffi Heap plays a vital role in overall HDFS performance, so you should confirm that the usage is legitimate and that you have adequate heap configured for the NameNode and DataNodes. Please refer to the documentation below.
https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/configuring-namenode-heap-size.html
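A quick, generic way to see what heap the NameNode process is actually running with (a plain shell check, not something from the linked document):
# print the -Xmx value of the running NameNode process
ps -ef | grep -i namenode | grep -o -- '-Xmx[^ ]*'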
07-14-2020
12:27 PM
1 Kudo
@mike_bronson7 It is recommended to have a minimum of 3 DataNodes in the cluster to accommodate 3 healthy replicas of each block, since the default replication factor is 3. HDFS will not write two replicas of the same block to the same DataNode. With only 2 DataNodes, blocks will therefore be under-replicated: the 2 healthy replicas will be placed on the 2 available DataNodes, and the third cannot be created. You can check for under-replicated blocks as shown below.
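A standard fsck run reports the under-replicated count in its summary (generic commands, not specific to this cluster):
# the summary includes a line such as "Under-replicated blocks: N"
hdfs fsck /
# or just filter for it
hdfs fsck / | grep -i 'under-replicated'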
05-11-2017
02:35 PM
1 Kudo
The logs and HDFS usage outputs confirm that the growth is related to HBase snapshots. We can verify this by checking whether all the snapshots have the same timestamp.
The "list_snapshots" command from the HBase shell provides output like the following:
hbase> list_snapshots
SYSTEM.CATALOG-ru-20160512 SYSTEM.CATALOG (Thu May 12 01:47:24 +0000 2016)
SYSTEM.FUNCTION-ru-20160512 SYSTEM.FUNCTION (Thu May 12 01:47:24 +0000 2016)
SYSTEM.SEQUENCE-ru-20160512 SYSTEM.SEQUENCE (Thu May 12 01:47:24 +0000 2016)
SYSTEM.STATS-ru-20160512 SYSTEM.STATS (Thu May 12 01:47:32 +0000 2016)
US_1-ru-20160512 US_1 (Thu May 12 01:47:32 +0000 2016)
ambarismoketest-ru-20160512 ambarismoketest (Thu May 12 01:47:32 +0000 2016)
dev.hadoop-ru-20160512 dev.hadoop (Thu May 12 01:47:33 +0000 2016)
prod.hadoop-ru-20160512 prod.hadoop (Thu May 12 01:47:35 +0000 2016)
compact.daily-ru-20160512 compact.daily (Thu May 12 01:47:43 +0000 2016)
compact.hourly-ru-20160512 compact.hourly (Thu May 12 01:47:43 +0000 2016)
test-ru-20160512 test (Thu May 12 01:47:43 +0000 2016)
We can confirm the timestamp of these snapshots from the "hdfs dfs -ls -R /apps/hbase/" output as well:
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:58 /apps/hbase/data/.hbase-snapshot
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.CATALOG-ru-20160512
-rw-r--r-- 3 hbase hdfs 55 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.CATALOG-ru-20160512/.snapshotinfo
-rw-r--r-- 3 hbase hdfs 972 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.CATALOG-ru-20160512/data.manifest
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.FUNCTION-ru-20160512
-rw-r--r-- 3 hbase hdfs 57 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.FUNCTION-ru-20160512/.snapshotinfo
-rw-r--r-- 3 hbase hdfs 1064 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.FUNCTION-ru-20160512/data.manifest
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.SEQUENCE-ru-20160512
-rw-r--r-- 3 hbase hdfs 57 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.SEQUENCE-ru-20160512/.snapshotinfo
-rw-r--r-- 3 hbase hdfs 16813 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.SEQUENCE-ru-20160512/data.manifest
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.STATS-ru-20160512
-rw-r--r-- 3 hbase hdfs 51 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.STATS-ru-20160512/.snapshotinfo
-rw-r--r-- 3 hbase hdfs 928 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.STATS-ru-20160512/data.manifest
HBase snapshots are created as part of the HDP upgrade process: the "snapshot_all" command is triggered from the "hbase_upgrade.py" script during the upgrade, which is why all the snapshots carry the same timestamp. Initially each snapshot is only a reference to the original table. As jobs run after the upgrade and data is inserted into HBase tables, the snapshots keep the older files (the delta) needed to preserve their original state. This causes the gradual increase in snapshot size and hence in HDFS usage.
It is safe to delete the HBase snapshots, since they are just references to the original HBase tables; an example is shown below. Deleting the snapshots will also clean up the corresponding archive files. Please remember not to delete the archive directory directly, or you will corrupt the snapshots.
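For example, from the HBase shell (snapshot names taken from the listing above):
hbase> delete_snapshot 'SYSTEM.CATALOG-ru-20160512'
# later HBase releases also offer a regex-based bulk delete
hbase> delete_all_snapshot 'SYSTEM\..*-ru-20160512'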
05-10-2017
09:47 PM
This issue is caused by the size of the "~/.beeline/history" file in the user's home directory. When big queries are run, they flood the history file and slow beeline down on startup and shutdown. To resolve the issue, move the "~/.beeline/history" file aside and retry beeline.
In older versions (below 2.1.0), you can use a cron job to delete the "history" file periodically. The issue has been fixed in later Hive versions by adding the "--maxHistoryRows" option. Please check the JIRA below for more details; a sketch of both approaches follows the link.
https://issues.apache.org/jira/browse/HIVE-15166
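A minimal sketch of both workarounds (paths, the cron schedule, and the row limit are examples, assuming the default ~/.beeline/history location):
# one-off fix: move the oversized history file aside and retry beeline
mv ~/.beeline/history ~/.beeline/history.bak
# older Hive versions: clear it periodically from cron, e.g. nightly at 01:00
0 1 * * * rm -f /home/<user>/.beeline/history
# newer Hive versions (per HIVE-15166): cap the history size instead
beeline --maxHistoryRows=500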
05-10-2017
06:09 PM
1 Kudo
The error stack is below.
Package Manager failed to install packages. Error: (4, 'Interrupted system call')
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/custom_actions/scripts/install_packages.py", line 376, in install_packages
retry_count=agent_stack_retry_count
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 58, in action_upgrade
self.upgrade_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 56, in upgrade_package
return self.install_package(name, use_repos, skip_repos, is_upgrade)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 51, in install_package
self.checked_call_with_retries(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 86, in checked_call_with_retries
return self._call_with_retries(cmd, is_checked=True, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 98, in _call_with_retries
code, out = func(cmd, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 251, in _call
ready, _, _ = select.select(read_set, [], [], 1)
error: (4, 'Interrupted system call')
1) Navigate to "/etc/yum.repos.d".
   a) Check whether any HDP repositories other than the current-version and target-version repos are present.
   b) Remove all other HDP repositories from the "/etc/yum.repos.d" directory.
   c) Run "yum clean all".
   d) Retry the installation from the Ambari UI.
2) Manually run "yum info <package name>" on the node and verify that the package information is listed within the expected timeframe.
   a) If the response from yum is slow, check the load average on the node using the "top" command.
   b) A high load average can slow the system down and cause yum commands to hang.
   c) Identify the cause using the sysstat utilities.
3) Increase the package installation timeout.
   a) Log in to an Ambari server terminal session.
   b) vi /etc/ambari-server/conf/ambari.properties
   c) Increase the timeout value for "agent.package.install.task.timeout".
   d) Run "ambari-server restart".
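A shell sketch of steps 1 and 3 (the repo file name and the timeout value below are examples only):
# step 1: keep only the current- and target-version HDP repos, then clean yum metadata
ls /etc/yum.repos.d/HDP*.repo
rm /etc/yum.repos.d/HDP-2.5.*.repo      # example: an old version's repo file
yum clean all
yum info <package name>                 # should respond within the expected timeframe
# step 3: raise the install timeout on the Ambari server, then restart it
vi /etc/ambari-server/conf/ambari.properties
#   agent.package.install.task.timeout=3600   (example value, in seconds)
ambari-server restart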
05-09-2017
08:02 AM
1 Kudo
I have a solution for option 1 now. If you add the following custom yarn-site.xml property in Ambari, the SSL NodeManager Web UI starts on port 8042:
yarn.nodemanager.webapp.https.address=0.0.0.0:8042
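After restarting the NodeManagers you can confirm that the HTTPS web UI is answering on 8042 (the hostname below is a placeholder):
# -k skips certificate validation; expect an HTTP response from the NodeManager web UI
curl -k -I https://<nodemanager-host>:8042/node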