Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 1980 | 07-09-2019 12:53 AM |
|  | 11924 | 06-23-2019 08:37 PM |
|  | 9183 | 06-18-2019 11:28 PM |
|  | 10177 | 05-23-2019 08:46 PM |
|  | 4601 | 05-20-2019 01:14 AM |
08-11-2016 02:19 PM
Please add more notes on how you've installed CDH and how you're managing it. If you use Cloudera Manager, does the host you run this command from have a YARN Gateway role added and deployed on it? Specifically, do you have a /etc/hadoop/conf/ directory with all the required config files? If not, this error would pop up.
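A minimal sketch of what to look for on the failing host (paths assume a standard CM-managed CDH layout):

```
# Check for the deployed client configuration on the host running the command:
ls -l /etc/hadoop/conf/
# Expect core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml here.
# If the directory is missing or empty, add a YARN Gateway role to this host
# in Cloudera Manager and deploy the client configuration.
```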
08-11-2016 01:45 AM
@sgiri - This has already been answered above. If you use unsecured YARN, all your containers and the commands the containers run (such as in a shell action) will run as the "yarn" user (i.e. the user your NodeManager daemon runs as). If you want containers in YARN to run as the actual submitting user, use the LinuxContainerExecutor instead. The caveat is that your user accounts must exist on all NodeManagers so a setuid can be done against them when spawning the container processes.
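A quick way to observe this behaviour is to run a trivial command in a container and check its logs; a sketch using Hadoop's bundled DistributedShell example (the jar path is an assumption and varies by CDH version):

```
# Locate the DistributedShell example jar (path is an assumption; adjust
# for your CDH version and layout):
DSHELL_JAR=$(ls /opt/cloudera/parcels/CDH/jars/hadoop-yarn-applications-distributedshell-*.jar)

# Run "whoami" inside a YARN container:
yarn jar "$DSHELL_JAR" org.apache.hadoop.yarn.applications.distributedshell.Client \
  -jar "$DSHELL_JAR" -shell_command whoami -num_containers 1

# With the DefaultContainerExecutor the container logs print "yarn";
# with the LinuxContainerExecutor they print the submitting user.
```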
08-08-2016 11:53 AM
> Could it be related to region being split during the write? If yes, why?

This is the right question to ask. The NotServingRegionException occurs when the region for a given key is not online yet on a particular RegionServer. Clients reach a specific RS by first querying meta, and if the sought RS claims it's not serving the region, that could mean one of two things: (1) the region was recently moved around by the HBase Balancer, or (2) the region experienced a split and there are now two parts to it. In either case, the client treats the error as a cue to refetch the meta content and retry based on the newer information in it, which leads it to the newer location for the sought row key.

Given your workload is insert-heavy and you only see this happen when you insert, it's likely your problem is (2). It's easy to prove: simply measure how many regions exist at the beginning (before the query runs) and how many at the end (after the query completes or stalls) via the HMaster's Table UI or its equivalent page under CM's HBase -> Tables area.

Splitting is a good thing for scalability, but it can certainly hamper your insert rate, because while the split is occurring all writes are temporarily rejected (usually only for a short period), and an eventual major compaction must also occur later to truly split the parent region's singular data store into two stores on HDFS (additional I/O expense). If your numbers show that splits are indeed occurring, you can prevent them by reviewing and raising the split size attribute on the table, giving you larger but fewer regions. Splits are usually done when a size bound is hit. If your table is not pre-split, consider pre-splitting it. The HBase documentation covers best practices around this: http://archive.cloudera.com/cdh5/cdh/5/hbase/book.html#manual_region_splitting_decisions (and other topics)
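For illustration, a sketch of both knobs from the HBase shell (the table names and the size value are placeholders):

```
# Raise the per-table split threshold to ~20 GB so regions split less often:
echo "alter 'mytable', MAX_FILESIZE => '21474836480'" | hbase shell

# Or pre-split a new table at creation time with explicit split points:
echo "create 'mynewtable', 'cf', SPLITS => ['g', 'n', 'u']" | hbase shell
```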
08-08-2016 11:38 AM
2 Kudos
If you are using the QuickStart VM, please read this note that accompanies it (https://www.cloudera.com/documentation/enterprise/latest/topics/cloudera_quickstart_vm.html):

"The VM uses a package-based install. This allows you to work with or without Cloudera Manager. Parcels do not work with the VM unless you first migrate your CDH installation to use parcels. On your production systems, Cloudera recommends that you use parcels."

In order to use a Kafka parcel, the CDH installation must itself exist as a parcel first. The error you face is telling you that no CDH parcel was detected, which likely means your current CDH is package-based. Follow this guide to migrate your CDH cluster software from packages to parcels first: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_migrating_packages_to_parcels.html, and then you should be able to use your Kafka parcel.
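To confirm which install type you currently have, a quick check (paths assume CM's default parcel directory):

```
# Activated parcels live here on a parcel-based install (default location):
ls /opt/cloudera/parcels/

# A package-based install shows CDH components as OS packages instead:
rpm -qa 'hadoop*'        # RHEL/CentOS
# dpkg -l 'hadoop*'      # Debian/Ubuntu
```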
08-02-2016 02:35 PM
For effectively managing specific usage of different disk hardware types, HDFS offers a Heterogeneous Storage Management feature, documented at https://www.cloudera.com/documentation/enterprise/latest/topics/admin_heterogeneous_storage_oview.html. Is that what you are looking for? The HDFS mover tool available with this feature, along with the storage policies that dictate replica placement selection, will let you manage the differing hardware better. If you'd still like to move blocks over manually and not configure the HSM feature, then please read the method and caveat presented under http://wiki.apache.org/hadoop/FAQ#On_an_individual_data_node.2C_how_do_you_balance_the_blocks_on_the_disk.3F
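As a sketch, applying a storage policy and asking the mover to migrate existing replicas (the path and policy name are examples):

```
# Tag a directory with a storage policy ('/data/archive' and COLD are examples):
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD

# Migrate existing block replicas so they match the policy:
hdfs mover -p /data/archive
```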
07-28-2016 06:15 PM
The Cloudera ODBC connector is available for Windows, and .NET does support ODBC: http://www.cloudera.com/downloads/connectors/hive/odbc.html There's no direct HDFS client for .NET, but HDFS offers a REST API via its WebHDFS component. The REST API is documented at http://archive.cloudera.com/cdh5/cdh/5/hadoop/hadoop-project-dist/hadoop-hdfs/WebHDFS.html, and you can use your .NET HTTP client to make use of it.
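For example, the same calls a .NET HttpClient would make, sketched with curl (host, port and paths are placeholders; 50070 is the default NameNode HTTP port):

```
# List a directory:
curl -i "http://namenode.example.com:50070/webhdfs/v1/user/alice?op=LISTSTATUS"

# Read a file; -L follows the redirect to the DataNode serving the data:
curl -L "http://namenode.example.com:50070/webhdfs/v1/user/alice/file.txt?op=OPEN"
```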
07-28-2016 12:10 AM
I've not tried it, but you should be able to use webhdfs:// instead of hdfs:// in that config. You also need to change the 8020 to 50070 (or your custom NameNode HTTP port).
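For example (the hostname is a placeholder; 8020 and 50070 are the default NameNode RPC and HTTP ports):

```
# Before: hdfs://namenode.example.com:8020/app/logs
# After:  webhdfs://namenode.example.com:50070/app/logs

# You can sanity-check that the webhdfs URI resolves:
hadoop fs -ls webhdfs://namenode.example.com:50070/
```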
07-27-2016 11:43 PM
2 Kudos
On your insecure cluster, even though it does not use security, it may still need to parse a secure username such as foo@REALM. To allow for this, edit, on the insecure cluster's CM, the value of HDFS -> Configuration -> Trusted Realms and add the realm used on the secure cluster. Save and restart the insecure cluster as marked by CM. This change won't alter your security state; it only adds rules to parse such incoming secure usernames and avoid the EOFException (which happens when the connection is closed because the username from the secure accessor cannot be parsed).
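Once restarted, you can verify how a principal gets mapped using Hadoop's built-in name-mapping utility (SECURE.REALM is a placeholder for your secure cluster's realm):

```
# Prints the short name the configured rules resolve the principal to,
# or an error if no rule matches (SECURE.REALM is a placeholder):
hadoop org.apache.hadoop.security.HadoopKerberosName foo@SECURE.REALM
```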
07-27-2016 08:42 PM
Thank you for the update; please keep us posted.
07-27-2016 06:32 PM
1 Kudo
It's actually because you're using an incorrect schema name and version in the xmlns attribute of the hive2 action type. Change the below:

<hive2 xmlns="uri:oozie:hive-action:0.3">

To this valid value instead:

<hive2 xmlns="uri:oozie:hive2-action:0.2">

The hive2 action is documented at http://archive.cloudera.com/cdh5/cdh/5/oozie/DG_Hive2ActionExtension.html
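After editing, you can have the workflow checked against Oozie's bundled schemas with the CLI's validate command (the filename is a placeholder):

```
# Validates the XML against the schemas Oozie ships with; reports the
# offending element/namespace if the xmlns is still wrong:
oozie validate workflow.xml
```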