Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2129 | 07-09-2019 12:53 AM |
| | 12449 | 06-23-2019 08:37 PM |
| | 9561 | 06-18-2019 11:28 PM |
| | 10526 | 05-23-2019 08:46 PM |
| | 4895 | 05-20-2019 01:14 AM |
11-21-2017
12:53 AM
To list corrupt file blocks: sudo -u hdfs hdfs fsck / -list-corruptfileblocks
To delete the corrupted files: sudo -u hdfs hdfs fsck / -delete
Thanks.
11-20-2017
09:27 PM
Thanks a lot! This resolved the issue. :) I have one more doubt: if I hit a Java heap size issue like "Caused by: java.lang.OutOfMemoryError: Java heap space" when running a MapReduce job, how do I increase the Java heap size at runtime? Does "-Dmapreduce.map.java.opts=-Xmx2048m" actually do anything there? I didn't notice any change. Could you please advise the best way to increase the Java heap size? Thanks in advance.
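For context, a minimal sketch of how such options are typically passed on the command line, assuming the job's driver runs through ToolRunner/GenericOptionsParser (the jar, class, and paths below are placeholders):

```bash
# Illustrative only: jar, class, and paths are placeholders.
# -D options are consumed by GenericOptionsParser, so they only take
# effect if the driver is run via ToolRunner.
hadoop jar my-job.jar com.example.MyDriver \
  -Dmapreduce.map.memory.mb=3072 \
  -Dmapreduce.map.java.opts=-Xmx2048m \
  /input /output
```

Keeping the container size (mapreduce.map.memory.mb) above the -Xmx value leaves headroom for the JVM's non-heap memory.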
11-14-2017
01:41 AM
@HarshJ Thanks for the inputs. I checked the heap charts on the JobTracker instance, and it is frequently hitting the maximum heap value and then dropping back to a slightly lower value. Also, there has not been any change or increase in load. I checked the JobTracker logs but couldn't find any pauses, as GC logging is not enabled. Can you please let me know what the history retention configurations of the JobTracker are? Can you also suggest how to identify the reason behind the GC taking significant time? Thanks, Priya
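As an aside, a minimal sketch of enabling GC logging for the JobTracker JVM so pauses become visible, assuming an MR1 setup configured through hadoop-env.sh (the log path is illustrative; the flags are the classic pre-Java-9 set):

```bash
# Assumption: MR1 JobTracker whose JVM options come from hadoop-env.sh.
# These flags record each collection with timestamps so long pauses
# can be correlated with the heap charts.
export HADOOP_JOBTRACKER_OPTS="$HADOOP_JOBTRACKER_OPTS \
  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
  -Xloggc:/var/log/hadoop/jobtracker-gc.log"
```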
11-13-2017
11:27 AM
As noted in the previous reply, I did not have any nodes with the Failover Controller role. Importantly, I also had not enabled Automatic Failover despite running in an HA configuration. I went ahead and added the Failover Controller role to both namenodes - the good one and the bad one. After that, I attempted to enable Automatic Failover using the link shown in the screenshot from this post. To do that, however, I first needed to start ZooKeeper. At that point, if I recall correctly, the other namenode was still not active, but I then restarted the entire cluster and the automatic failover kicked in, making the other namenode the active one and leaving the bad namenode in a stopped state.
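For anyone verifying the end state, a quick sketch of checking which NameNode is active (the service IDs nn1 and nn2 are placeholders; the real ones come from dfs.ha.namenodes.&lt;nameservice&gt; in hdfs-site.xml):

```bash
# Service IDs are illustrative; list yours with:
#   hdfs getconf -confKey dfs.ha.namenodes.<nameservice>
hdfs haadmin -getServiceState nn1   # prints "active" or "standby"
hdfs haadmin -getServiceState nn2
```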
11-12-2017
10:26 PM
There may also be a character case issue; you should follow the official Flume documentation when configuring this: http://flume.apache.org/FlumeUserGuide.html#spooling-directory-source
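To illustrate the case sensitivity, a hypothetical minimal spooldir source definition (the agent name a1, the config path, and the directory are placeholders); note that the path property is camelCase (spoolDir) while the type value is lowercase (spooldir):

```bash
# Hypothetical agent/channel names and paths, appended to the
# agent's properties file.
cat >> /etc/flume-ng/conf/flume.conf <<'EOF'
a1.sources = r1
a1.channels = c1
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /var/log/flume-spool
a1.sources.r1.channels = c1
EOF
```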
11-12-2017
05:33 PM
Yes, that is right. In MySQL, though, you can ease user access provisioning by granting wildcard-host login access to allow connections from all hosts: https://dev.mysql.com/doc/refman/5.7/en/adding-users.html (look for the % character example on the page).
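A short sketch of what that looks like (the user, password, and database names are placeholders):

```bash
# Illustrative names only; '%' in the host part permits logins from
# any host.
mysql -u root -p <<'EOF'
CREATE USER 'appuser'@'%' IDENTIFIED BY 'secret';
GRANT ALL PRIVILEGES ON appdb.* TO 'appuser'@'%';
FLUSH PRIVILEGES;
EOF
```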
11-08-2017
12:59 AM
Is there any pattern to this? For example, do the few tasks that hang all run on the same host, or on a specific set of hosts among all the nodes in the cluster? A more detailed root cause can be sought by performing a live jstack on a task that appears hung. This is done by first finding which host the hung task is running on (within the task timeout period, after noticing it hanging), discovering its container ID, finding the associated Java process on the machine, and then running the jstack command on that PID.
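A rough sketch of those steps as shell commands (the host name, container ID, and PID are placeholders):

```bash
# All identifiers below are illustrative.
ssh worker-node-01                       # host where the hung task runs
# Find the JVM for the task's container (the container ID comes from
# the job's web UI or logs):
ps aux | grep container_1510000000000_0001_01_000042
# Dump its stack, running jstack as the user that owns the process;
# a few dumps taken seconds apart show whether it is truly stuck:
sudo -u yarn jstack <PID> > /tmp/hung-task-stack.txt
```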
11-07-2017
11:50 PM
That's helpful, and it's exactly what I missed. Thank you so much; I'm marking your last answer as the solution. Best regards.
11-07-2017
11:02 PM
If you're managing your Hue service via Cloudera Manager, you can do the dump and load via the UI. Stop the Hue service with the SQLite configuration, then click 'Dump Database' under the Hue service page's Actions button. Next, reconfigure the stopped Hue service to use your new MySQL DB, and before starting it go back to the Actions button and click 'Load Database'.
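For reference, a hedged sketch of the manual equivalent using Hue's built-in management commands (the parcel path is an assumption and varies by installation):

```bash
# Assumption: parcel-based install; adjust HUE_HOME for package installs.
HUE_HOME=/opt/cloudera/parcels/CDH/lib/hue
$HUE_HOME/build/env/bin/hue dumpdata > /tmp/hue_database_dump.json
# ...then, after reconfiguring Hue to point at the new MySQL database:
$HUE_HOME/build/env/bin/hue loaddata /tmp/hue_database_dump.json
```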
11-07-2017
10:57 PM
To find what's included in a CDH release, visit the 'CDH Version and Packaging Information' area of the documentation: https://www.cloudera.com/documentation/enterprise/release-notes/topics/rg_cdh_vd.html

Specifically, to find what's in the 5.13.x and 5.12.x releases, visit the following links:
- https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_vd_cdh_package_tarball_513.html
- https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_vd_cdh_package_tarball_512.html

Kafka is currently not part of the base CDH packaging and is available as a separate parcel. Follow the Kafka documentation page for instructions on how to add it to your cluster: https://www.cloudera.com/documentation/kafka/latest/topics/kafka.html

Sqoop and Flume have been included in CDH5 since its inception, and Kudu has been included since CDH 5.13.x (previously it was a separate parcel).