Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Views | Posted |
|---|---|
| 1978 | 07-09-2019 12:53 AM |
| 11921 | 06-23-2019 08:37 PM |
| 9178 | 06-18-2019 11:28 PM |
| 10174 | 05-23-2019 08:46 PM |
| 4600 | 05-20-2019 01:14 AM |
01-09-2016 10:03 PM
2 Kudos
What version of CM are you using? We've also made some improvements in 5.5 that should help with this, and more is on the way. As to your question about the CLI, you can certainly utilise the API to manage your replications just as you would on the UI. Look for the "/replications/" endpoints in the API docs at http://cloudera.github.io/cm_api/apidocs/v11/index.html (more on the API and the Java+Python bindings at http://cloudera.github.io/cm_api/).
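For instance, here is a minimal sketch of listing configured replication schedules straight over the REST API with curl; the hostname, credentials, cluster name, and service name are placeholders you would substitute:

```bash
# Sketch: list replication schedules for an HDFS service via the CM API.
# Host, credentials, cluster and service names below are placeholders.
curl -u admin:admin \
  'http://cm-host.example.com:7180/api/v11/clusters/Cluster1/services/hdfs/replications'
```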
01-09-2016 09:46 PM
You may want to read Facebook's experience with that algorithm: https://issues.apache.org/jira/browse/HADOOP-6837?focusedCommentId=13687660&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13687660

It looks like you can try https://github.com/yongtang/hadoop-xz (although it seems to be pure Java rather than a native extension, that is not necessarily a bad thing given LZMA's goals).
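If you do try it, a rough (untested) sketch of compressing job output with it could look like the below; the codec class io.sensesecure.hadoop.xz.XZCodec and the jar paths are assumptions taken from that project's README, so adjust for your install:

```bash
# Untested sketch: XZ-compressed output from a streaming job via hadoop-xz.
# Codec class and jar paths are assumptions from the hadoop-xz README.
hadoop jar /path/to/hadoop-streaming.jar \
  -libjars /path/to/hadoop-xz.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=io.sensesecure.hadoop.xz.XZCodec \
  -input /user/foo/in -output /user/foo/out-xz \
  -mapper cat -numReduceTasks 0
```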
01-03-2016 09:38 PM
For the original error, it appears I misread the stack trace the first time. Hive appears to want to write to the local filesystem (on the NodeManager the task executes on), instead of HDFS, for some part of its work:

> Caused by: java.io.IOException: Mkdirs failed to create file:/tmp/training/hive_2015-12-10_08-07-28_115_8039040536647708382/_task_tmp.-ext-10001
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:434)

Given that, can you ensure the local /tmp directory exists on all your cluster hosts' root filesystems with drwxrwxrwt permissions? Also try clearing out the local directory /tmp/training from every host and re-run the query.

> instead the job is just going in pending state

If you look at your RM screenshot, it shows 0 active nodes. This means your NodeManager is unavailable/dead/not started, and the RM has no resources to allocate (hence the hang in the PENDING state, as it waits for a NodeManager to come along and satisfy the application's requested resources). You may want to restart the NodeManager services, and/or check their logs in case they went down for some FATAL reason.
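A quick sketch of those checks, to be run on every cluster host (assumes sudo access):

```bash
# Verify /tmp carries the sticky-bit, world-writable mode, fix it if not,
# and clear the stale Hive scratch directory.
ls -ld /tmp               # should show drwxrwxrwt
sudo chmod 1777 /tmp      # 1777 is the octal form of drwxrwxrwt
sudo rm -rf /tmp/training

# Confirm the RM has live NodeManagers before re-running the query.
yarn node -list -all
```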
01-03-2016 09:17 PM
FSCK prints the full identifier of a block, which is useful in some contexts depending on what you're about to troubleshoot or investigate. Here's a breakdown:

BP-929597290-192.0.0.2-1439573305237 = This is a BlockPool (BP) ID. It's the mark of a NameNode's ownership of the block in question. You might recall that HDFS now supports federated namespaces, wherein multiple NameNodes may be served by a single DataNode. This ID is how each NameNode is uniquely identified as the owner of a held block ID. Even though you do not explicitly utilise federation, the block-pool concept is now built into the identifier design of HDFS by default. See http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/Federation.html#Multiple_NamenodesNamespaces

blk_1074084574_344316 = This is the block ID (blk_X_Y). Each block under every file is uniquely identified by a number X and a sub-number Y (the generation stamp). More on block IDs and HDFS architecture can be read in the AOS book: http://aosabook.org/en/hdfs.html

DS-730a75d3-046c-4254-990a-4eee9520424f,DISK = This is a storage identifier ID. It tells you which disk (hashed identifier) on the specified DataNode IP:PORT actually holds the data, and what the type of that storage is (DISK). HDFS now supports tiered storage, where this comes in useful (aside from other things): http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
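For reference, this is how you would pull such a line up yourself; the annotations reuse the identifiers quoted in this thread rather than live output:

```bash
# Print per-file block and replica location details.
hdfs fsck /path/to/file -files -blocks -locations

# Reading a reported replica identifier, using this thread's example values:
# BP-929597290-192.0.0.2-1439573305237          -> block pool ID (owning NameNode)
# blk_1074084574_344316                         -> block ID + generation stamp
# DS-730a75d3-046c-4254-990a-4eee9520424f,DISK  -> storage ID on the DataNode + storage type
```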
12-30-2015 09:51 AM
1 Kudo
Could you re-run the command, also with the below env set?

$ export HADOOP_ROOT_LOGGER=TRACE,console
$ export HADOOP_OPTS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl"
$ hadoop fs -ls /

Is this remote host also carrying the Unlimited JCE policy jars under its JDK, so it may use AES-256 if that is in use?
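One way to eyeball the JCE bit, assuming an Oracle JDK 7 style directory layout (the path is an assumption, adjust JAVA_HOME as needed):

```bash
# The unlimited-strength JCE policy files replace these two jars in the JDK;
# directory layout assumed per an Oracle JDK 7 style install.
ls -l "$JAVA_HOME/jre/lib/security/" | grep -i policy
# expect local_policy.jar and US_export_policy.jar (the unlimited versions)
```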
12-30-2015 09:25 AM
Project-history-wise, Apache Hive 0.14 eventually got renamed into Apache Hive 1.0. So yes, whatever is in Apache Hive 1.0.x is already in CDH 5.5.1 (which supplies Apache Hive 1.1.0 plus backports). That said, I've not attempted to use the feature (if it's not enabled by default).
12-21-2015 07:12 PM
1 Kudo
Have you already given http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SLAMonitoring.html#Overview a read?
12-19-2015 10:18 AM
1 Kudo
CDH5 Hive includes a JSON SerDe from the HCatalog component. Please use that instead:

Jar path: /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar

DDL line snippet: ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
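Putting it together, a minimal sketch; the table and column names here are made up purely for illustration:

```bash
# Register the HCatalog SerDe jar and create a JSON-backed table with it.
# Table/column names are illustrative only.
hive -e '
ADD JAR /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
CREATE TABLE events (id INT, payload STRING)
ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
STORED AS TEXTFILE;
'
```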
12-19-2015 10:16 AM
What username are you running the Hive CLI as? And what are your outputs for the below three commands?

hadoop fs -ls /tmp/training
hadoop fs -ls -d /tmp/training
hadoop fs -ls -d /tmp

Note that CDH4 is way past its EOL (End of Life) and is no longer supported by Cloudera. It is recommended to use CDH5 instead.
12-19-2015 10:03 AM
1 Kudo
This feature is limited to use of Solr 5.x, which is not part of CDH 5 yet (the plan is for CDH 6 to carry it). It is noted as such on the blog post behind the referenced video:

"""
Preview of nested Analytics facets
Solr 5.1 is seeing new Analytics Facets. Beta support for them has been added and can be enabled in the hue.ini with:
[search]
latest=true
"""
- http://gethue.com/dynamic-search-dashboard-improvements-3/

P.S. If you're looking for just the field statistics, they can be found under the Stats tab of the dialog that opens when you click the (?) icon next to any shown field in the filter list.