Member since
04-27-2016
26
Posts
6
Kudos Received
6
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1199 | 09-30-2020 10:30 AM |
| | 9343 | 07-14-2020 12:40 PM |
| | 2459 | 07-14-2020 12:27 PM |
| | 5048 | 05-10-2017 03:45 PM |
| | 1514 | 04-24-2017 09:14 PM |
09-30-2020
11:07 AM
@vincentD Please look at the DataNode logs and check for any FATAL/ERROR entries before the shutdown. That could shed some light on the root cause of the DataNode failure.
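For example, on a typical HDP install the DataNode log lives under /var/log/hadoop/hdfs (the path and file name below are the usual defaults and may differ in your environment):
# Look for FATAL/ERROR entries leading up to the shutdown
grep -iE "FATAL|ERROR" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | tail -50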
09-30-2020
10:30 AM
@Koffi Heap plays a vital role in overall HDFS performance, so you might need to confirm the usage is legitimate and that you have adequate heap configured for the NameNode and DataNodes. Please refer to the documentation below. https://docs.cloudera.com/HDPDocuments/HDP2/HDP-2.6.5/bk_command-line-installation/content/configuring-namenode-heap-size.html
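As a rough sketch of where the heap is set, assuming a hadoop-env.sh based setup (in an Ambari-managed cluster, change these through the HDFS configuration in Ambari instead of editing the file by hand; the values below are illustrative placeholders, not sizing recommendations):
# hadoop-env.sh -- illustrative heap settings, size them per the documentation above
export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m ${HADOOP_NAMENODE_OPTS}"
export HADOOP_DATANODE_OPTS="-Xms1024m -Xmx1024m ${HADOOP_DATANODE_OPTS}"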
07-14-2020
12:40 PM
1 Kudo
@mike_bronson7 Are you trying to copy local files from your remote machine to the destination HDFS cluster? You could use distcp if the copy is between two HDFS clusters. Please refer to the documentation below. https://docs.cloudera.com/documentation/enterprise/5-5-x/topics/cdh_admin_distcp_data_cluster_migrate.html For local files on the remote machine, you could SCP the files to any cluster node that has an HDFS client installed and then do a "-copyFromLocal" or "-put" to push them to HDFS. Hope this helps.
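A quick sketch of both approaches (cluster names, hosts, and paths below are placeholders):
# HDFS-to-HDFS copy between clusters with distcp
hadoop distcp hdfs://source-nn:8020/data/input hdfs://target-nn:8020/data/input
# Local files: copy them to a node that has an HDFS client, then push into HDFS
scp /tmp/myfile.csv user@edge-node:/tmp/
hdfs dfs -copyFromLocal /tmp/myfile.csv /data/input/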
07-14-2020
12:27 PM
1 Kudo
@mike_bronson7 It is recommended to have a minimum of 3 DataNodes in the cluster to accommodate 3 healthy replicas of each block, since the default replication factor is 3. HDFS will not write replicas of the same block on the same DataNode. In this scenario there will be under-replicated blocks, and the 2 healthy replicas will be placed on the 2 available DataNodes.
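You can verify the resulting under-replicated block count with fsck (a minimal sketch; run it as the hdfs user, and note that the exact label varies slightly between Hadoop versions):
hdfs fsck / | grep -iE "under[- ]replicated"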
05-11-2017
02:35 PM
1 Kudo
The logs and HDFS usage outputs confirm that the growth is related to HBase snapshots. We can verify this by checking whether all the snapshots have the same timestamp.
The "list_snapshots" command from the hbase shell will provide output like the one below. hbase> list_snapshots
SYSTEM.CATALOG-ru-20160512 SYSTEM.CATALOG (Thu May 12 01:47:24 +0000 2016)
SYSTEM.FUNCTION-ru-20160512 SYSTEM.FUNCTION (Thu May 12 01:47:24 +0000 2016)
SYSTEM.SEQUENCE-ru-20160512 SYSTEM.SEQUENCE (Thu May 12 01:47:24 +0000 2016)
SYSTEM.STATS-ru-20160512 SYSTEM.STATS (Thu May 12 01:47:32 +0000 2016)
US_1-ru-20160512 US_1 (Thu May 12 01:47:32 +0000 2016)
ambarismoketest-ru-20160512 ambarismoketest (Thu May 12 01:47:32 +0000 2016)
dev.hadoop-ru-20160512 dev.hadoop (Thu May 12 01:47:33 +0000 2016)
prod.hadoop-ru-20160512 prod.hadoop (Thu May 12 01:47:35 +0000 2016)
compact.daily-ru-20160512 compact.daily (Thu May 12 01:47:43 +0000 2016)
compact.hourly-ru-20160512 compact.hourly (Thu May 12 01:47:43 +0000 2016)
test-ru-20160512 test (Thu May 12 01:47:43 +0000 2016)
We can confirm the timestamp of these snapshots from the "hdfs dfs -ls -R /apps/hbase/" output as well.
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:58 /apps/hbase/data/.hbase-snapshot
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.CATALOG-ru-20160512
-rw-r--r-- 3 hbase hdfs 55 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.CATALOG-ru-20160512/.snapshotinfo
-rw-r--r-- 3 hbase hdfs 972 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.CATALOG-ru-20160512/data.manifest
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.FUNCTION-ru-20160512
-rw-r--r-- 3 hbase hdfs 57 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.FUNCTION-ru-20160512/.snapshotinfo
-rw-r--r-- 3 hbase hdfs 1064 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.FUNCTION-ru-20160512/data.manifest
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.SEQUENCE-ru-20160512
-rw-r--r-- 3 hbase hdfs 57 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.SEQUENCE-ru-20160512/.snapshotinfo
-rw-r--r-- 3 hbase hdfs 16813 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.SEQUENCE-ru-20160512/data.manifest
drwxr-xr-x - hbase hdfs 0 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.STATS-ru-20160512
-rw-r--r-- 3 hbase hdfs 51 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.STATS-ru-20160512/.snapshotinfo
-rw-r--r-- 3 hbase hdfs 928 2016-05-12 01:47 /apps/hbase/data/.hbase-snapshot/SYSTEM.STATS-ru-20160512/data.manifest
HBase snapshots are created as part of the HDP upgrade process. During the upgrade, the "snapshot_all" command is triggered from the "hbase_upgrade.py" script, which is why all the snapshots have the same timestamp. Initially a snapshot is only a reference to the original table. When we run jobs after the upgrade or insert data into HBase tables, these snapshots grow with the delta needed to maintain their original state. This causes the gradual increase in snapshot size and hence HDFS usage.
It is safe to delete the HBase snapshots since they are just references to the original HBase tables. Deleting the snapshots will clear up the respective archive files too. Please remember not to delete the archive directory directly, or the snapshots will be corrupted.
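For example, from the hbase shell, using one of the snapshot names from the listing above:
hbase> delete_snapshot 'SYSTEM.CATALOG-ru-20160512'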
05-10-2017
09:47 PM
This issue is caused by the size of the "~/.beeline/history" file in the user's home directory. When we run big queries, they flood the history file and slow down Beeline on startup and shutdown. To resolve this issue, please move the "~/.beeline/history" file aside and retry Beeline.
In older Hive versions (below 2.1.0), we can have a cron job delete the "history" file periodically. This issue has been fixed in later Hive versions by adding the "--maxHistoryRows" option. Please check the JIRA below for more details.
https://issues.apache.org/jira/browse/HIVE-15166
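A minimal sketch of the workaround described above (the cron schedule is only an example):
# Move the oversized history file aside, then retry beeline
mv ~/.beeline/history ~/.beeline/history.bak
# On Hive versions below 2.1.0, a cron entry like this can truncate it weekly
# 0 3 * * 0 truncate -s 0 $HOME/.beeline/history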
05-10-2017
06:09 PM
1 Kudo
The error stack is below. Package Manager failed to install packages. Error: (4, 'Interrupted system call')
Traceback (most recent call last):
File "/var/lib/ambari-agent/cache/custom_actions/scripts/install_packages.py", line 376, in install_packages
retry_count=agent_stack_retry_count
File "/usr/lib/python2.6/site-packages/resource_management/core/base.py", line 155, in __init__
self.env.run()
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 160, in run
self.run_action(resource, action)
File "/usr/lib/python2.6/site-packages/resource_management/core/environment.py", line 124, in run_action
provider_action()
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 58, in action_upgrade
self.upgrade_package(package_name, self.resource.use_repos, self.resource.skip_repos)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 56, in upgrade_package
return self.install_package(name, use_repos, skip_repos, is_upgrade)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/yumrpm.py", line 51, in install_package
self.checked_call_with_retries(cmd, sudo=True, logoutput=self.get_logoutput())
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 86, in checked_call_with_retries
return self._call_with_retries(cmd, is_checked=True, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/providers/package/__init__.py", line 98, in _call_with_retries
code, out = func(cmd, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 72, in inner
result = function(command, **kwargs)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 102, in checked_call
tries=tries, try_sleep=try_sleep, timeout_kill_strategy=timeout_kill_strategy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 150, in _call_wrapper
result = _call(command, **kwargs_copy)
File "/usr/lib/python2.6/site-packages/resource_management/core/shell.py", line 251, in _call
ready, _, _ = select.select(read_set, [], [], 1)
error: (4, 'Interrupted system call')
To resolve this, please try the following steps:
1) Navigate to "/etc/yum.repos.d".
   a) Check if there are any HDP repositories present other than the current-version and target-version repos.
   b) Remove all other HDP repositories from the "/etc/yum.repos.d" directory.
   c) Run "yum clean all".
   d) Retry the installation from the Ambari UI.
2) Manually run "yum info <package name>" on the node and verify the package information is listed within the expected timeframe.
   a) If the response from yum is slow, check the "load average" on the node using the "top" command.
   b) A high load average could slow the system down and hence cause yum commands to hang.
   c) Identify the cause using the 'Sysstat' utilities.
3) Increase the package installation timeout (see the sketch after this list).
   a) Log in to the Ambari server terminal session.
   b) vi /etc/ambari-server/conf/ambari.properties
   c) Increase the timeout value for "agent.package.install.task.timeout".
   d) Run "ambari-server restart".
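A minimal sketch of step 3 on the Ambari server (the 3600-second value is only an example):
# Check the current timeout
grep agent.package.install.task.timeout /etc/ambari-server/conf/ambari.properties
# Raise it, e.g. to 3600 seconds; if the property is missing, add the line instead
sudo sed -i 's/^agent.package.install.task.timeout=.*/agent.package.install.task.timeout=3600/' /etc/ambari-server/conf/ambari.properties
sudo ambari-server restart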
05-10-2017
03:45 PM
@Avijeet Dash Could you please attach a screenshot of your Hive View configuration page?
05-05-2017
07:13 PM
Hi @Jasper I believe your alerts.json is still pointing to 8042. Could you please try the steps below? A quick sketch of the commands follows the steps. 1. Go to your Ambari server.
2. vi /var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/alerts.json
Please change "2.1.0.2.0" according to your environment.
3. Change "default_port" from 8042 to 8044.
4. Save the file and restart Ambari server.
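As a sketch of steps 2-4 (the "2.1.0.2.0" path segment comes from the steps above; adjust it to your environment):
# Locate the port entry in the YARN alert definitions
grep -n '"default_port"' /var/lib/ambari-server/resources/common-services/YARN/2.1.0.2.0/alerts.json
# Edit the value from 8042 to 8044 in that file, then restart Ambari server
ambari-server restart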
04-24-2017
09:14 PM
@Kumar Veerappan Please find the URL below for HDP 2.6.0. Hope this helps. http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_cluster-planning/content/ch_hardware-recommendations_chapter.html