Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Views | Posted |
|---|---|
| 1978 | 07-09-2019 12:53 AM |
| 11921 | 06-23-2019 08:37 PM |
| 9178 | 06-18-2019 11:28 PM |
| 10174 | 05-23-2019 08:46 PM |
| 4600 | 05-20-2019 01:14 AM |
01-09-2016 10:03 PM
2 Kudos
What version of CM are you using? We've also made some improvements in 5.5 that should help with this, and more is on the way. As to your question about the CLI, you can certainly utilise the API to manage your replications just as you would on the UI. Look for the "/replications/" endpoints in the API docs at http://cloudera.github.io/cm_api/apidocs/v11/index.html (more on the API and the Java+Python bindings at http://cloudera.github.io/cm_api/).
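For instance, here is a minimal sketch of listing configured replication schedules straight over the REST API with curl; the hostname, credentials, cluster name, and service name are placeholders you would substitute:

```bash
# Sketch: list replication schedules for an HDFS service via the CM API.
# Host, credentials, cluster and service names below are placeholders.
curl -u admin:admin \
  'http://cm-host.example.com:7180/api/v11/clusters/Cluster1/services/hdfs/replications'
```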
01-09-2016 09:46 PM
You may want to read Facebook's experience with that algorithm: https://issues.apache.org/jira/browse/HADOOP-6837?focusedCommentId=13687660&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13687660

It looks like you can try https://github.com/yongtang/hadoop-xz (although it seems to be pure Java rather than a native extension, that is not necessarily a bad thing given LZMA's goals).
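If you do try it, a rough (untested) sketch of compressing job output with it could look like the below; the codec class io.sensesecure.hadoop.xz.XZCodec and the jar paths are assumptions taken from that project's README, so adjust for your install:

```bash
# Untested sketch: XZ-compressed output from a streaming job via hadoop-xz.
# Codec class and jar paths are assumptions from the hadoop-xz README.
hadoop jar /path/to/hadoop-streaming.jar \
  -libjars /path/to/hadoop-xz.jar \
  -D mapreduce.output.fileoutputformat.compress=true \
  -D mapreduce.output.fileoutputformat.compress.codec=io.sensesecure.hadoop.xz.XZCodec \
  -input /user/foo/in -output /user/foo/out-xz \
  -mapper cat -numReduceTasks 0
```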
01-03-2016 09:38 PM
For the original error, it appears I misread the stack trace the first time. Hive appears to want to write to the local filesystem (on the NodeManager the task executes on), instead of HDFS, for some part of its work:

> Caused by: java.io.IOException: Mkdirs failed to create file:/tmp/training/hive_2015-12-10_08-07-28_115_8039040536647708382/_task_tmp.-ext-10001
> at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:434)

Given that, can you ensure the local /tmp directory exists on all your cluster hosts' root filesystems with drwxrwxrwt permissions? Also try clearing out the local directory /tmp/training from every host and re-run the query.

> instead the job is just going in pending state

If you look at your RM screenshot, it shows 0 active nodes. This means your NodeManager is unavailable/dead/not started, and the RM has no resources to allocate (hence the hang in the PENDING state, as it waits for a NodeManager to come along and satisfy the application's requested resources). You may want to restart the NodeManager services, and/or check their logs in case they went down for some FATAL reason.
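A quick sketch of those checks, to be run on every cluster host (assumes sudo access):

```bash
# Verify /tmp carries the sticky-bit, world-writable mode, fix it if not,
# and clear the stale Hive scratch directory.
ls -ld /tmp               # should show drwxrwxrwt
sudo chmod 1777 /tmp      # 1777 is the octal form of drwxrwxrwt
sudo rm -rf /tmp/training

# Confirm the RM has live NodeManagers before re-running the query.
yarn node -list -all
```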
01-03-2016 09:17 PM
FSCK prints the full identifier of a block, which is useful in some contexts depending on what you're about to troubleshoot or investigate. Here's a breakdown:

BP-929597290-192.0.0.2-1439573305237 = This is a BlockPool (BP) ID. It's the mark of a NameNode's ownership of the block in question. You might recall that HDFS now supports federated namespaces, wherein multiple NameNodes may be served by a single DataNode. This ID is how each NameNode is uniquely identified as the owner of a held block ID. Even though you do not explicitly utilise federation, the block-pool concept is now built into the identifier design of HDFS by default. See http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/Federation.html#Multiple_NamenodesNamespaces

blk_1074084574_344316 = This is the block ID (blk_X_Y). Each block under every file is uniquely identified by a number X and a sub-number Y (the generation stamp). More on block IDs and HDFS architecture can be read in the AOS book: http://aosabook.org/en/hdfs.html

DS-730a75d3-046c-4254-990a-4eee9520424f,DISK = This is a storage identifier ID. It tells you which disk (hashed identifier) on the specified DataNode IP:PORT actually holds the data, and what the type of that storage is (DISK). HDFS now supports tiered storage, where this comes in useful (aside from other things): http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/ArchivalStorage.html
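For reference, this is how you would pull such a line up yourself; the annotations reuse the identifiers quoted in this thread rather than live output:

```bash
# Print per-file block and replica location details.
hdfs fsck /path/to/file -files -blocks -locations

# Reading a reported replica identifier, using this thread's example values:
# BP-929597290-192.0.0.2-1439573305237          -> block pool ID (owning NameNode)
# blk_1074084574_344316                         -> block ID + generation stamp
# DS-730a75d3-046c-4254-990a-4eee9520424f,DISK  -> storage ID on the DataNode + storage type
```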
12-30-2015 09:51 AM
1 Kudo
Could you re-run the command, also with the below env set?

$ export HADOOP_ROOT_LOGGER=TRACE,console
$ export HADOOP_OPTS="-Dsun.security.krb5.debug=true -Djavax.net.debug=ssl"
$ hadoop fs -ls /

Is this remote host also carrying the Unlimited JCE policy jars under its JDK, so it may use AES-256 if that is in use?
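One way to eyeball the JCE bit, assuming an Oracle JDK 7 style directory layout (the path is an assumption, adjust JAVA_HOME as needed):

```bash
# The unlimited-strength JCE policy files replace these two jars in the JDK;
# directory layout assumed per an Oracle JDK 7 style install.
ls -l "$JAVA_HOME/jre/lib/security/" | grep -i policy
# expect local_policy.jar and US_export_policy.jar (the unlimited versions)
```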
12-30-2015 09:25 AM
Project-history-wise, Apache Hive 0.14 eventually got renamed into Apache Hive 1.0. So yes, whatever is in Apache Hive 1.0.x is already in CDH 5.5.1 (which supplies Apache Hive 1.1.0 plus backports). That said, I've not attempted to use the feature (if it's not enabled by default).
12-21-2015 07:12 PM
1 Kudo
Have you already given http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_SLAMonitoring.html#Overview a read?
12-19-2015 10:18 AM
1 Kudo
CDH5 Hive includes a JSON SerDe from the HCatalog component. Please use that instead:

Jar path: /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar

DDL line snippet: ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
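Putting it together, a minimal sketch; the table and column names here are made up purely for illustration:

```bash
# Register the HCatalog SerDe jar and create a JSON-backed table with it.
# Table/column names are illustrative only.
hive -e '
ADD JAR /opt/cloudera/parcels/CDH/lib/hive-hcatalog/share/hcatalog/hive-hcatalog-core.jar;
CREATE TABLE events (id INT, payload STRING)
ROW FORMAT SERDE "org.apache.hive.hcatalog.data.JsonSerDe"
STORED AS TEXTFILE;
'
```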
12-19-2015 10:16 AM
What username are you running the Hive CLI as? And what are your outputs for the below three commands?

hadoop fs -ls /tmp/training
hadoop fs -ls -d /tmp/training
hadoop fs -ls -d /tmp

Note that CDH4 is way past its EOL (End of Life) and is no longer supported by Cloudera. It is recommended to use CDH5 instead.
12-19-2015 10:03 AM
1 Kudo
This feature is limited to use of Solr 5.x, which is not part of CDH 5 yet (the plan is for CDH 6 to carry it). It is noted as such on the blog post behind the referenced video:

"""
Preview of nested Analytics facets
Solr 5.1 is seeing new Analytics Facets. Beta support for them has been added and can be enabled in the hue.ini with:
[search]
latest=true
"""
- http://gethue.com/dynamic-search-dashboard-improvements-3/

P.S. If you're looking for just the field statistics, they can be found under the Stats tab of the dialog that opens when you click the (?) icon next to any shown field in the filter list.