About Harsh J

Harsh J · ‎10-19-2017

What command are you using to check the block replica count of each DataNode? That information is available either in the DN metrics (SELECT blocks_total WHERE roleType = DATANODE) or in the NameNode Web UI, as a column on its live DataNodes page. The block replica count is not shown by 'hdfs dfsadmin -report', and if you are relying on FSCK, make sure you are counting actual replicas and not just block IDs. Does the information in these sources still indicate that each DataNode has far fewer replicas than its alert threshold?
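To illustrate the difference between counting block IDs and counting replicas, here is a toy sketch in Python. The block IDs and DataNode names are made up for illustration, and the map is not fsck output; it only shows why the two numbers differ.

```python
from collections import Counter

# Hypothetical block -> replica locations map (made-up IDs, not real fsck output).
block_locations = {
    "blk_1001": ["dn1", "dn2", "dn3"],
    "blk_1002": ["dn1", "dn2", "dn4"],
    "blk_1003": ["dn2", "dn3", "dn4"],
}

def replicas_per_datanode(locations):
    """Count how many block replicas each DataNode stores."""
    counts = Counter()
    for datanodes in locations.values():
        counts.update(datanodes)
    return counts

unique_blocks = len(block_locations)           # distinct block IDs
per_dn = replicas_per_datanode(block_locations)
total_replicas = sum(per_dn.values())          # what the DataNodes actually hold
```

With three replicas per block, the replica total is three times the number of block IDs, and the per-DataNode figure is the one to compare against the alert threshold.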

Harsh J · ‎10-16-2017

For remote HDFS clusters, make sure you define the configuration needed to resolve the remote namespace in your HDFS Gateway hdfs-site.xml. Then, in Flume, you can refer to the remote cluster by the nameservice name you defined. See http://community.cloudera.com/t5/Storage-Random-Access-HDFS/distcp-with-same-nameservicename/m-p/49311/highlight/true#M2631 for more details on how to define this.
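As a rough sketch of what that namespace-resolving configuration usually looks like, the Python below generates the client-side HA properties for a remote nameservice. The nameservice names (localns, remotens), NameNode IDs, hostnames, and the Flume path are all hypothetical; the linked thread remains the reference for the exact setup, and the local nameservice's own properties are assumed to be present already.

```python
import xml.etree.ElementTree as ET

def remote_nameservice_properties(local_ns, remote_ns, remote_namenodes):
    """Build client-side HA properties so the remote nameservice resolves.

    remote_namenodes maps a NameNode ID to its RPC host:port.
    """
    props = {
        # Both nameservices must be listed, local and remote.
        "dfs.nameservices": f"{local_ns},{remote_ns}",
        f"dfs.ha.namenodes.{remote_ns}": ",".join(remote_namenodes),
        f"dfs.client.failover.proxy.provider.{remote_ns}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    }
    for nn_id, address in remote_namenodes.items():
        props[f"dfs.namenode.rpc-address.{remote_ns}.{nn_id}"] = address
    return props

def to_hdfs_site_xml(props):
    """Render the properties in Hadoop's <configuration> XML layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Hypothetical names and hosts, for illustration only.
props = remote_nameservice_properties(
    "localns", "remotens",
    {"nn1": "remote-nn1.example.com:8020", "nn2": "remote-nn2.example.com:8020"},
)
snippet = to_hdfs_site_xml(props)

# The Flume HDFS sink's hdfs.path can then point at the remote nameservice.
flume_path = "hdfs://remotens/flume/events"
```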

Harsh J · ‎10-15-2017

Currently the MapReduceIndexerTool appears to hardcode the job names, so it does not appear configurable: https://github.com/cloudera/search/blob/cdh5.13.0-release/search-mr/src/main/java/org/apache/solr/hadoop/MapReduceIndexerTool.java#L812 (and other such setJobName calls in the driver).

Harsh J · ‎10-04-2017

Deleted rows are not erased from disk synchronously with the operation, if I understand your question right - they are 'marked' and only truly erased from disk at the next RowSet compaction. If you haven't yet, read https://kudu.apache.org/kudu.pdf (the section of interest is (4), "Tablet storage"), and https://github.com/apache/kudu/blob/master/docs/design-docs/tablet.md#mvcc-mutations-in-memrowset. The latter link also compares Kudu with some other DB systems that use MVCC/etc., including Postgres, which you may find useful.

Harsh J · ‎09-14-2017

For posterity, would you be willing to share what those config changes were? In the spirit of https://xkcd.com/979/ 🙂

Harsh J · ‎09-06-2017

It appears from your error that your rate of inserts is much higher than the rate of flushing. When you do regular mutations (Puts/Deletes) via the HBase APIs, the data lands in the WAL and the MemStore. The error indicates that the MemStore for the targeted region has exceeded its blocking capacity. Usually, when the MemStore for a region nears its configured limit (such as 256 MB), it triggers a flush to HDFS. Flushing ~256 MB should be quick enough that the MemStore can be trimmed down again. However, in your case the flush is likely blocked (waiting in a queue, or waiting on HDFS I/O) or is taking very long.

Some ideas:

Look in your RegionServer logs (moe-cn05 for example) for "[Ff]lush" related messages around the time of the issue (2017-09-01 ~0700 hours).

If you are observing small flushes taking a long time to complete, the issue may be with HDFS I/O (investigate NN response times, DN connectivity, and network and disk I/O).

If flushes are completing in normal time, then the bottleneck may be the flush request queue (CM has an alert for this). You can look at the metrics of this RS to find out how many flush requests were waiting in the queue at that point. Increasing the total number of parallel flusher worker threads can help drain the request queue faster.

If you're observing no flushes completing at all, it could be a bug, or a hang due to some custom logic (if you use coprocessors). Use a jstack output (or visit /stacks on the RS Web UI) to analyze where the flusher threads are hung, or whether they are waiting to lock some resource that's held by another hung thread.
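The log search suggested above can be sketched in Python as follows. This assumes each log line begins with a "YYYY-MM-DD HH:MM:SS" timestamp; the sample lines are placeholders written for this example and are not real RegionServer output.

```python
import re
from datetime import datetime

FLUSH_RE = re.compile(r"[Ff]lush")
# Assumption: every log line starts with a "YYYY-MM-DD HH:MM:SS" timestamp.
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def flush_lines(lines, start, end):
    """Return lines mentioning flushes whose timestamp falls in [start, end]."""
    matches = []
    for line in lines:
        ts_match = TS_RE.match(line)
        if not ts_match or not FLUSH_RE.search(line):
            continue
        ts = datetime.strptime(ts_match.group(1), "%Y-%m-%d %H:%M:%S")
        if start <= ts <= end:
            matches.append(line.rstrip("\n"))
    return matches

# Placeholder lines for illustration only; not real RegionServer output.
sample = [
    "2017-09-01 06:58:12,101 INFO example: flush requested for region r1",
    "2017-09-01 07:03:40,552 INFO example: Flushed region r1 in 312000ms",
    "2017-09-01 07:04:02,010 INFO example: compaction started",
    "2017-09-01 09:30:00,000 INFO example: Flush finished",
]
window = flush_lines(
    sample,
    datetime(2017, 9, 1, 6, 30),
    datetime(2017, 9, 1, 7, 30),
)
```

In practice you would pass the lines of the RegionServer log file instead of the sample list and widen or narrow the window around the time of the error.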

Harsh J · ‎09-06-2017

The end of an MR job is reported back to Oozie via the callback interface. Depending on your network configuration between the NodeManager and Oozie hosts, or on your Oozie security configuration (such as TLS and load balancers), this callback interaction can break. Could you provide more information on how your cluster is set up? Do you use firewalls, load balancers for Oozie, and/or TLS for Oozie? In the meantime, you should be able to lower the 10 minute recheck interval in the Oozie server's oozie-site.xml via the key "oozie.service.ActionCheckerService.action.check.delay" (specified in seconds; its default value is 600, i.e. 10 minutes).
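As a small sketch of how that key sits in oozie-site.xml, the Python below reads it from a Hadoop-style configuration document and falls back to the 600 second default when it is absent. The value 120 in the example is only illustrative, not a recommendation.

```python
import xml.etree.ElementTree as ET

CHECK_DELAY_KEY = "oozie.service.ActionCheckerService.action.check.delay"
DEFAULT_CHECK_DELAY_SECONDS = 600  # 10 minutes, the default mentioned above

def action_check_delay(oozie_site_xml):
    """Return the configured recheck delay in seconds, or the default."""
    root = ET.fromstring(oozie_site_xml)
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        if name == CHECK_DELAY_KEY:
            return int(prop.findtext("value"))
    return DEFAULT_CHECK_DELAY_SECONDS

# Illustrative oozie-site.xml content; 120 is an arbitrary example value.
example_site = """<configuration>
  <property>
    <name>oozie.service.ActionCheckerService.action.check.delay</name>
    <value>120</value>
  </property>
</configuration>"""

delay = action_check_delay(example_site)
```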

Harsh J · ‎09-06-2017

How are you invoking your job? Do you use 'hadoop jar …' to invoke your jar, or are you triggering it with a more raw 'java -cp …' style CLI? If the latter, ensure you also pass the directory '/etc/hadoop/conf/' as an early element on your -cp/CLASSPATH-env. Also ensure your submitting host has a YARN+MR2 gateway deployed on it: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_primer.html#concept_fgj_tny_jk__section_zjt_fwz_xk
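For the raw 'java -cp …' case, a minimal sketch of placing the configuration directory first on the classpath could look like this in Python. The main class and jar path are hypothetical, and a real invocation would also need the Hadoop client jars on the classpath.

```python
import os

def build_java_command(main_class, app_jars, conf_dir="/etc/hadoop/conf/"):
    """Build a java command line with the Hadoop conf dir as the first classpath entry."""
    classpath = os.pathsep.join([conf_dir, *app_jars])
    return ["java", "-cp", classpath, main_class]

# Hypothetical driver class and jar location, for illustration only.
cmd = build_java_command("com.example.MyDriver", ["/opt/app/my-job.jar"])
```

Putting the directory early means the cluster's *-site.xml files are found before any defaults bundled inside jars.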

Harsh J · ‎09-05-2017

Yes. The YARN APIs allow you to distribute and run any arbitrary command. Spark and MR2 are apps that leverage this to run Java commands with wrapper classes that drive their logic and flow, but there's nothing preventing you from writing your own. Take a look at the Distributed Shell application implementation to understand the raw YARN APIs used to run arbitrary commands in YARN-allocated resource containers: https://github.com/cloudera/hadoop-common/blob/cdh5.12.0-release/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java#L201

If you're asking about a built-in way of running programs over YARN without writing any code, then aside from the DistributedShell there's no other included implementation. Even with the DistributedShell you may not get the tight integration (such as result extraction, status viewing, etc.) you require. There are a few higher-level frameworks that can make things easier when developing custom YARN apps, such as Spring (https://spring.io/guides/gs/yarn-basic/), Kitten (https://github.com/cloudera/kitten), and Cask's CDAP (https://docs.cask.co/cdap/current/en/developers-manual/getting-started/index.html).

Harsh J · ‎08-28-2017

If you're seeing this exception in the Oozie Spark action launcher log, please ignore it as it may be expected in a secure environment. If your launcher/action is truly failing, then the real exception will lie in the logs that follow/other parts of the log - the log4j permission error can be treated as a red herring as you investigate.

Member Since	‎07-31-2013 07:21 AM
Posts	1,924
Kudos received	461

Cloudera Community

Re: S3Guard Suggested to help fix Consistency

Re: Failed to start namenode. java.io.FileNotFound...

Re: sqoop import issue

Re: Efficient ways to store many images files

Re: S3 loading into HDFS

Re: block count warning still shows in cloudera ma...

Re: Flume - HDFS HA

Re: How to set job name with MapReduceIndexerTool

Re: Kudu - deleting data

Re: HBase region servers giving org.apache.hadoop....


Re: Oozie Action Status not updating

Re: Caused by: java.io.IOException: Cannot initial...

Re: Using Yarn as resource manager for standalone ...

Re: Log4j.properties permission denied