About Harsh J

Harsh J · ‎02-04-2014

The only thing you'll lose with the fresh approach is the history; the functionality and correctness of newly added jobs are unaffected. I'd pick the fresh MySQL approach if it is an option, because extracting data from the Apache Derby DB and moving it onto MySQL is a rather more painful operation.

Harsh J · ‎02-04-2014

Pig's default PigStorage loader may not know how to use the index files created alongside the LZO files. You'll need to use the ElephantBird loader functions available at https://github.com/kevinweil/elephant-bird to load them in a scalable way (specifically, you need its com.twitter.elephantbird.pig.load.LzoTextLoader loader for indexed LZO text files).
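As a rough sketch of what using that loader could look like, the shell snippet below writes a small Pig script to a temporary file. The jar locations, the input path, and the relation names are placeholders rather than values from this thread, and the exact set of jars to register depends on the elephant-bird build in use.

```shell
# Write a minimal Pig script that reads indexed LZO text via ElephantBird.
# All paths and jar names below are placeholders; adjust them to your install.
script=$(mktemp)
cat > "$script" <<'EOF'
-- Placeholder jar locations; register the elephant-bird jars (and their
-- dependencies) that match your build.
REGISTER /path/to/elephant-bird-core.jar;
REGISTER /path/to/elephant-bird-pig.jar;

-- Assumption: LzoTextLoader yields one chararray field per input line.
lines = LOAD '/path/to/input/*.lzo'
        USING com.twitter.elephantbird.pig.load.LzoTextLoader()
        AS (line:chararray);
first_ten = LIMIT lines 10;
DUMP first_ten;
EOF
echo "Pig script written to $script"
# To run it on a cluster (requires Pig and the jars above):
#   pig -f "$script"
```

Writing the script to a file first keeps the snippet harmless on a machine without Pig; only the commented-out last step touches the cluster.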

Harsh J · ‎01-30-2014

Ah, I didn't notice JD's post when replying by email, my bad.

Harsh J · ‎01-30-2014

> * Is this available in CDH4.5.0 running with hbase 0.94.6?
Yes. See https://github.com/cloudera/hbase/blob/cdh4.5.0-release/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java#L1507.
> * Do I set this in the safety valve for the regionserver settings?
Yes, this goes into the Advanced properties (safety valve for hbase-site.xml) of the Region Servers, as that's where compactions of store files are performed.
> * Can I verify this is the current policy running on major compactions?
There's no direct way AFAICT, but setting the property should suffice to enable that subroutine, per the logic at the link above.

Harsh J · ‎01-30-2014

Hi,
If you use MR1, /usr/lib/hadoop-mapreduce will not be on your classpath, but /usr/lib/hadoop-0.20-mapreduce will. For MR1, this set of classes resides under /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/*.jar, which is typically not on the default classpath. Can you try the below, perhaps?
~> export HADOOP_CLASSPATH="/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/*"
~> hadoop jar your-jar...
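A small guard around that export can catch a wrong directory before the job is submitted. This is only a sketch: the function name use_streaming_classpath is made up for illustration, and the default directory is simply the MR1 location mentioned above.

```shell
# Hypothetical helper: put the MR1 streaming jars on HADOOP_CLASSPATH,
# but only if the directory actually contains at least one jar.
use_streaming_classpath() {
  dir=${1:-/usr/lib/hadoop-0.20-mapreduce/contrib/streaming}
  # ls fails when the glob matches nothing, so it doubles as the check.
  if ls "$dir"/*.jar >/dev/null 2>&1; then
    # The quoted "*" is passed through literally as a classpath wildcard.
    export HADOOP_CLASSPATH="$dir/*${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"
  else
    echo "no streaming jars found under $dir" >&2
    return 1
  fi
}

# Usage on a cluster node (not executed here):
#   use_streaming_classpath && hadoop jar your-jar...
```

Any existing HADOOP_CLASSPATH value is kept and appended after the streaming entry.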

Harsh J · ‎01-20-2014

Many thanks for following up! I do think the directory-or-file case can be checked for before traversal, or at least the error message could be improved, for which I've filed https://issues.apache.org/jira/browse/HDFS-5802. The shell utils handle this in a clearer way, btw:
➜ ~ ls foo/file
ls: foo/file: Not a directory
➜ ~ hadoop fs -ls foo/file
ls: `foobar/file': No such file or directory
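To illustrate the kind of check meant here, the sketch below walks up a path on the local filesystem and reports an ancestor that exists but is not a directory. It is not HDFS code and not what HDFS-5802 implements; the function name first_non_dir_parent is hypothetical.

```shell
# Hypothetical helper: print the ancestor of a path that exists but is not a
# directory, if there is one. This runs against the local filesystem only.
first_non_dir_parent() {
  parent=$(dirname "$1")
  bad=""
  while [ "$parent" != "/" ] && [ "$parent" != "." ]; do
    if [ -e "$parent" ] && [ ! -d "$parent" ]; then
      bad=$parent
    fi
    parent=$(dirname "$parent")
  done
  if [ -n "$bad" ]; then
    printf '%s\n' "$bad"
    return 0
  fi
  return 1
}

# Demo: "foo" is a regular file, so nothing can live beneath it.
demo=$(mktemp -d)
touch "$demo/foo"
if offender=$(first_non_dir_parent "$demo/foo/file"); then
  echo "ls: $demo/foo/file: Not a directory (blocked by $offender)"
fi
```

Naming the offending component is what makes the "Not a directory" style of message easier to act on than a plain "No such file or directory".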

Harsh J · ‎01-14-2014

You simply need to add these role types from CM -> HBase -> Instances tab -> "Add" button -> Select a host for the REST and/or Thrift services.

Harsh J · ‎01-03-2014

Use the Quorum Journal Manager mode. I believe the wizard also clearly recommends this mode. The NFS mode was an early way of doing it, but it requires additional HA-NFS hardware and the maintenance associated with it, and it has not been popular.

Harsh J · ‎12-31-2013

(1) - No, but it's possible that your job is running via the LocalJobRunner. See (2).
(2) - This may mean that your Job in the IDE is not specifying the right URLs for FS and MR access, and hence the Job runs in a special default mode called "LocalJobRunner", which runs the job in the same JVM as the Driver, without submitting it anywhere.
(3) - I believe you should be able to find a variety of examples in the Hue Web UI to run/read/follow.
(4) - The daemons run as service usernames. You need to run jps as root to be able to see all the JVMs running on a machine.
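The fallback described in (2) comes down to a single configuration value. The sketch below classifies a value of mapred.job.tracker, which in MR1 defaults to "local"; the function name and the example address are illustrative only.

```shell
# Hypothetical helper: given the effective value of mapred.job.tracker (MR1),
# say where a job would run. An unset or "local" value means LocalJobRunner.
job_runner_for() {
  case "${1:-local}" in
    local) echo "LocalJobRunner" ;;
    *)     echo "JobTracker at $1" ;;
  esac
}

job_runner_for ""                      # config not picked up by the IDE
job_runner_for "jt.example.com:8021"   # placeholder JobTracker address
```

In practice, one common way to get the right values into an IDE run is to put the cluster's client configuration files on the project classpath, though the details depend on the setup.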

Harsh J · ‎12-30-2013

No. While CM monitors and alerts on critical FS states such as missing or corrupt blocks based on metrics, there's no "fsck" command you can run from it. Typically you want to run fsck on the CLI so you can read or parse its output if you're looking to analyse the state further.
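As a minimal sketch of parsing that output, the function below reads an fsck report on stdin and reduces it to a one-word verdict based on the summary line at the end of the report. The function name fsck_verdict is hypothetical, and the sample line fed to it is illustrative rather than captured from a real cluster.

```shell
# Hypothetical helper: read fsck output on stdin and print a one-word verdict.
# It relies on the summary line fsck prints at the end of its report.
fsck_verdict() {
  report=$(cat)
  case "$report" in
    *"is HEALTHY"*) echo "healthy" ;;
    *"is CORRUPT"*) echo "corrupt" ;;
    *)              echo "unknown" ;;
  esac
}

# Illustrative input only; on a cluster you would pipe real output instead:
#   sudo -u hdfs hdfs fsck / | fsck_verdict
printf "%s\n" "The filesystem under path '/' is HEALTHY" | fsck_verdict
```

A wrapper like this is easy to drop into a cron job or a monitoring script, with the full report still available on the CLI when deeper analysis is needed.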

Member Since	‎07-31-2013 07:21 AM
Posts	1,924
Kudos received	461

Cloudera Community

Re: S3Guard Suggested to help fix Consistency

Re: Failed to start namenode. java.io.FileNotFound...

Re: sqoop import issue

Re: Efficient ways to store many images files

Re: S3 loading into HDFS

Re: Oozie embedded derby to mysql, what is the bes...

Re: Pig LZO Inputsplits

Re: CDH4.5 enable ExploringCompation

Re: CDH4.5 enable ExploringCompation

Re: NoClassDefFoundError thrown when using TypedBy...

Re: Permission denied, access=EXECUTE on getting t...

Re: how to start Hbase REST server

Re: HA for Name Node

Re: CDH4 Eclipse how does it work

Re: How to run FSCK from cloudera manager.