Member since 07-31-2013
1924 Posts
462 Kudos Received
311 Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1969 | 07-09-2019 12:53 AM |
| | 11881 | 06-23-2019 08:37 PM |
| | 9147 | 06-18-2019 11:28 PM |
| | 10136 | 05-23-2019 08:46 PM |
| | 4581 | 05-20-2019 01:14 AM |
10-19-2017
10:11 AM
1 Kudo
What command are you using to check the block replica count of each DataNode? That information is available either in the DN metrics (SELECT blocks_total WHERE roleType = DATANODE) or in the NameNode Web UI, as a column on its live DataNodes page. The block replica count is not shown in the output of 'hdfs dfsadmin -report', and if you are relying on FSCK, ensure you're counting actual replicas rather than just block IDs. Does the information in these sources still indicate that each DataNode has far fewer replicas than its alert threshold?
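To illustrate why counting block IDs and counting replicas give different numbers, here is a minimal Python sketch. The `block_locations` mapping is invented sample data (it is not real FSCK output), and the function names are hypothetical helpers for illustration only.

```python
from collections import Counter

# Hypothetical sample: block ID -> DataNodes holding a replica of that block.
# Real data would have to be extracted from FSCK or the NameNode; this is made up.
block_locations = {
    "blk_1001": ["dn1", "dn2", "dn3"],
    "blk_1002": ["dn1", "dn2", "dn3"],
    "blk_1003": ["dn1", "dn3"],
}


def count_block_ids(locations):
    """Number of distinct blocks, ignoring how many copies exist."""
    return len(locations)


def count_replicas_per_datanode(locations):
    """Number of block replicas stored on each DataNode."""
    counts = Counter()
    for datanodes in locations.values():
        counts.update(datanodes)
    return counts
```

With this sample there are 3 block IDs but 8 replicas in total, and it is the per-DataNode replica count that should be compared against a per-DataNode alert threshold.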
10-16-2017
02:20 AM
For remote HDFS clusters, make sure the required namespace-resolving configuration is defined in the hdfs-site.xml of your HDFS Gateway. Flume can then refer to the remote cluster by the namespace name defined there. See http://community.cloudera.com/t5/Storage-Random-Access-HDFS/distcp-with-same-nameservicename/m-p/49311/highlight/true#M2631 for more details on how to define this.
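As a rough sketch of what such a client-side definition usually involves for an HA remote cluster, the Python below assembles the relevant hdfs-site.xml properties. The property names are standard HDFS HA client settings; the nameservice names, NameNode IDs and hostnames are hypothetical, and the linked thread remains the reference for the exact steps.

```python
from xml.sax.saxutils import escape

FAILOVER_PROVIDER = (
    "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
)


def remote_nameservice_properties(local_ns, remote_ns, remote_namenodes):
    """Build client-side properties that let a gateway resolve a remote nameservice.

    remote_namenodes maps a NameNode ID to its "host:port" RPC address.
    The local nameservice's own properties are assumed to exist already.
    """
    props = {
        "dfs.nameservices": f"{local_ns},{remote_ns}",
        f"dfs.ha.namenodes.{remote_ns}": ",".join(remote_namenodes),
        f"dfs.client.failover.proxy.provider.{remote_ns}": FAILOVER_PROVIDER,
    }
    for nn_id, rpc_address in remote_namenodes.items():
        props[f"dfs.namenode.rpc-address.{remote_ns}.{nn_id}"] = rpc_address
    return props


def to_hdfs_site_xml(props):
    """Render the properties as <property> elements for hdfs-site.xml."""
    chunks = []
    for name, value in props.items():
        chunks.append(
            "<property>\n"
            f"  <name>{escape(name)}</name>\n"
            f"  <value>{escape(value)}</value>\n"
            "</property>"
        )
    return "\n".join(chunks)


# Hypothetical names and hosts, for illustration only.
example = remote_nameservice_properties(
    "nameservice1",
    "remotens",
    {"nn1": "remote-nn1.example.com:8020", "nn2": "remote-nn2.example.com:8020"},
)
```

With something like this in place, a Flume HDFS sink path could use the remote name directly, for example hdfs://remotens/flume/events (a made-up path).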
10-15-2017
09:18 PM
Currently the MapReduceIndexerTool appears to hardcode the job names, so it does not appear configurable: https://github.com/cloudera/search/blob/cdh5.13.0-release/search-mr/src/main/java/org/apache/solr/hadoop/MapReduceIndexerTool.java#L812 (and other such setJobName calls in the driver).
10-04-2017
07:57 AM
Deleted rows are not erased from disk synchronously with the operation, if I understand your question right - they are 'marked' and only truly erased from disk at the next RowSet compaction. If you haven't yet, read https://kudu.apache.org/kudu.pdf (the section of interest is (4), "Tablet storage"), and https://github.com/apache/kudu/blob/master/docs/design-docs/tablet.md#mvcc-mutations-in-memrowset. The latter link also compares Kudu with some other DB systems that use MVCC/etc., including Postgres, which you may find useful.
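The mark-then-compact idea can be pictured with a toy Python model. This is only a conceptual sketch with invented class and method names. It is not how Kudu actually stores data; the linked design documents describe that.

```python
class ToyRowSet:
    """Toy model of mark-then-compact deletion. Not Kudu's actual storage code."""

    def __init__(self):
        self._stored = {}      # key -> value; stands in for data held on disk
        self._deleted = set()  # keys marked as deleted, awaiting compaction

    def insert(self, key, value):
        self._stored[key] = value
        self._deleted.discard(key)

    def delete(self, key):
        # Only mark the row; the stored data is left untouched for now.
        if key in self._stored:
            self._deleted.add(key)

    def get(self, key):
        # Readers never see a row that has been marked as deleted.
        if key in self._deleted:
            return None
        return self._stored.get(key)

    def stored_row_count(self):
        return len(self._stored)

    def compact(self):
        # Compaction is the point where marked rows are physically dropped.
        for key in self._deleted:
            del self._stored[key]
        self._deleted.clear()
```

After a `delete`, `get` already returns nothing for that key, but `stored_row_count` only shrinks once `compact` has run.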
09-14-2017
10:34 AM
For posterity, would you be willing to share what those config changes were? In the spirit of https://xkcd.com/979/ 🙂
09-06-2017
02:57 AM
1 Kudo
It appears from your error that your rate of inserts is much higher than the rate of flushing.

When you do regular mutations (Puts/Deletes) via the HBase APIs, the data lands in the WAL and the MemStore. The error indicates that the MemStore for the targeted region has exceeded its blocking capacity. Usually, when the MemStore for a region nears its configured limit (such as 256 MB), it triggers a flush to HDFS. Flushing ~256 MB should be quick enough that the MemStore can be trimmed down again. However, in your case the flush is likely blocked (waiting in a queue, or waiting on HDFS I/O) or is taking very long.

Some ideas:

- Look in your RegionServer logs (moe-cn05 for example) for "[Ff]lush" related messages around the time of the issue (2017-09-01 ~0700 hours).
- If you are observing small flushes taking a long time to complete, the issue may be with HDFS I/O (investigate NN response times, DN connectivity, and network and disk I/O).
- If you are seeing flushes complete in normal time, then the bottleneck may be the flush request queue (CM has an alert for this). You can check the metrics of this RS to find out how many flush requests were waiting in the queue at that point. Increasing the total number of parallel flusher worker threads can help drain the request queue faster.
- If you're observing no flushes complete at all, it could be a bug, or a hang due to some custom logic (if you use coprocessors). Use a jstack output (or visit /stacks on the RS Web UI) to analyze where the flusher threads are hung, or whether they are waiting to lock some resource that's held by another hung thread.
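For the log search suggested above, a small Python helper along these lines could narrow a RegionServer log down to flush-related lines within a time window. It assumes each log entry starts with a `YYYY-MM-DD HH:MM:SS` timestamp; the function name and the sample lines are invented for illustration and are not real HBase log messages.

```python
import re
from datetime import datetime

FLUSH_PATTERN = re.compile(r"[Ff]lush")
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def flush_lines_in_window(lines, start, end):
    """Return lines mentioning flushes whose timestamp lies in [start, end]."""
    matches = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # no leading timestamp, e.g. a stack trace continuation
        if start <= stamp <= end and FLUSH_PATTERN.search(line):
            matches.append(line)
    return matches


# Made-up sample lines, only shaped like typical log4j output.
sample_log = [
    "2017-09-01 06:58:10,123 INFO example: Started memstore flush for region r1",
    "2017-09-01 07:01:42,456 INFO example: Finished memstore flush for region r1",
    "2017-09-01 07:02:00,000 INFO example: Completed compaction for region r1",
    "2017-09-01 09:30:00,000 INFO example: Started memstore flush for region r2",
    "\tat java.lang.Thread.run(Thread.java:748)",
]

around_issue = flush_lines_in_window(
    sample_log,
    datetime(2017, 9, 1, 6, 30),
    datetime(2017, 9, 1, 7, 30),
)
```

Comparing the timestamps of matching start and finish messages for the same region then gives a rough idea of how long each flush took.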
09-06-2017
12:21 AM
Oozie is notified of the end of an MR job via the callback interface. Depending on your network configuration between the NodeManager and Oozie hosts, or on your Oozie security configuration (such as TLS and load balancers), this callback interaction can break. Could you provide more information on how your cluster is set up? Do you use firewalls, load balancers for Oozie, and/or TLS for Oozie? In the meantime, you should be able to lower the 10-minute recheck interval in the Oozie server's oozie-site.xml configuration via the key "oozie.service.ActionCheckerService.action.check.delay" (specified in seconds; its default value is 600, i.e. 10 minutes).
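One quick way to rule out basic network blocking is to test, from a NodeManager host, whether a TCP connection to the Oozie server can be opened at all. The Python sketch below does only that; the function name is a hypothetical helper, and a successful connection does not prove that the callback works end to end (TLS or load balancer issues would not show up here).

```python
import socket


def can_connect(host, port, timeout_seconds=5.0):
    """Return True if a TCP connection to host:port can be opened in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# Example call, with a hypothetical hostname. 11000 is Oozie's default HTTP
# port; your deployment may use a different one.
# can_connect("oozie-host.example.com", 11000)
```

Running this from a few NodeManager hosts against the address Oozie advertises for callbacks would show whether a firewall is dropping the traffic.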
09-06-2017
12:10 AM
How are you invoking your job? Do you use 'hadoop jar …' to invoke your jar, or are you triggering it with a more raw 'java -cp …' style CLI? If the latter, ensure you also pass the directory '/etc/hadoop/conf/' as an early element on your -cp/CLASSPATH-env. Also ensure your submitting host has a YARN+MR2 gateway deployed on it: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_intro_primer.html#concept_fgj_tny_jk__section_zjt_fwz_xk
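If you do launch with a raw 'java -cp' command, the ordering could be sketched as below in Python. The helper names, the jar path and the main class are hypothetical; in practice the other entries would also include the Hadoop client jars (for example, the output of `hadoop classpath`).

```python
import os

HADOOP_CONF_DIR = "/etc/hadoop/conf/"


def build_classpath(other_entries, conf_dir=HADOOP_CONF_DIR):
    """Put the Hadoop configuration directory first so its *-site.xml files win."""
    entries = [conf_dir] + [e for e in other_entries if e != conf_dir]
    return os.pathsep.join(entries)


def build_java_command(main_class, other_entries):
    """Assemble a 'java -cp ...' argument list with the conf dir leading."""
    return ["java", "-cp", build_classpath(other_entries), main_class]


# Hypothetical jar and class names, for illustration only.
command = build_java_command("com.example.MyJobDriver", ["/opt/app/my-job.jar"])
```

Without the configuration directory on the classpath, the client typically falls back to built-in defaults instead of the cluster's settings.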
09-05-2017
11:59 PM
Yes. The YARN APIs allow you to distribute and run any arbitrary command. Spark and MR2 are apps that leverage this to run Java commands with wrapper classes that drive their logic and flow, but there's nothing preventing you from writing your own. Take a look at the Distributed Shell application implementation to understand the raw YARN APIs used to run arbitrary commands via YARN-allocated resource containers: https://github.com/cloudera/hadoop-common/blob/cdh5.12.0-release/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-distributedshell/src/main/java/org/apache/hadoop/yarn/applications/distributedshell/Client.java#L201 If you're asking about a built-in way of running programs over YARN without writing any code, then aside from the DistributedShell there's no other included implementation. Even with the DistributedShell you may not get the tight integration (such as result extraction, status viewing, etc.) you require. There are likely a few higher-level frameworks that can make things easier when developing custom YARN apps, such as Spring (https://spring.io/guides/gs/yarn-basic/), Kitten (https://github.com/cloudera/kitten), and Cask's CDAP (https://docs.cask.co/cdap/current/en/developers-manual/getting-started/index.html).
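For a rough idea of what a DistributedShell invocation looks like, the Python sketch below only assembles the command line. The flag names (-jar, -shell_command, -num_containers) are the client's options as I recall them, so please confirm them against the client's -help output for your version; the jar path is a placeholder and the helper names are hypothetical.

```python
import shlex

CLIENT_CLASS = "org.apache.hadoop.yarn.applications.distributedshell.Client"


def distributed_shell_argv(jar_path, shell_command, num_containers=1):
    """Assemble a 'yarn jar' invocation of the DistributedShell client."""
    if num_containers < 1:
        raise ValueError("num_containers must be at least 1")
    return [
        "yarn", "jar", jar_path, CLIENT_CLASS,
        "-jar", jar_path,                      # jar shipped for the ApplicationMaster
        "-shell_command", shell_command,       # command run inside each container
        "-num_containers", str(num_containers),
    ]


def as_shell_line(argv):
    """Quote the arguments so the line can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


# Placeholder jar path; locate the real distributedshell jar on your cluster.
argv = distributed_shell_argv("/path/to/distributedshell.jar", "date", 2)
```

The resulting list can be passed to `subprocess.run` on a host with a YARN gateway, or printed with `as_shell_line` and run by hand.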
08-28-2017
03:15 AM
1 Kudo
If you're seeing this exception in the Oozie Spark action launcher log, please ignore it as it may be expected in a secure environment. If your launcher/action is truly failing, then the real exception will lie in the logs that follow/other parts of the log - the log4j permission error can be treated as a red herring as you investigate.