Member since
06-25-2018
10
Posts
1
Kudos Received
0
Solutions
04-13-2019
01:40 AM
Harsh J: Thanks for the help on the previous issue. We finally resolved it: it was caused by an undocumented port required for the CDH 6.2 to CDH 6.2 distcp. Now we are migrating the task to Oozie and running into trouble. Could you elaborate a bit more, or give us some links or pointers? Thanks. We could not find "mapreduce.job.hdfs-servers" anywhere. Where is that documented?
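For context on the property being asked about: mapreduce.job.hdfs-servers is a Hadoop job configuration property, not an Oozie-specific setting, which lists the NameNodes a job will contact so that delegation tokens are obtained for each cluster up front. A minimal sketch of how it might be set inside an Oozie distcp action, with hypothetical host names and paths:

```xml
<action name="copy-data">
    <distcp xmlns="uri:oozie:distcp-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- Hypothetical hosts: list both source and target NameNodes
                 so delegation tokens are fetched for both clusters. -->
            <property>
                <name>mapreduce.job.hdfs-servers</name>
                <value>hdfs://src-nn.example.com:8020,hdfs://dst-nn.example.com:8020</value>
            </property>
        </configuration>
        <arg>hdfs://src-nn.example.com:8020/data/in</arg>
        <arg>hdfs://dst-nn.example.com:8020/data/out</arg>
    </distcp>
    <ok to="end"/>
    <error to="fail"/>
</action>
```

This is a sketch under stated assumptions, not a verified workflow; the exact ports and action schema version depend on the cluster and Oozie release in use.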
04-10-2019
12:31 AM
1 Kudo
One possibility is the fetch size (combined with some unexpectedly wide rows). Does lowering the result fetch size help? From http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#idp774390917888 : "--fetch-size: Number of entries to read from database at once." Also, do you always see it fail with the YARN memory kill (due to pmem exhaustion), or do you occasionally observe an actual java.lang.OutOfMemoryError as well? If it is always the former, another suspect would be off-heap memory use by the JDBC driver, although I've not come across such a problem myself.
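To illustrate the suggestion above, here is a sketch of a Sqoop import invocation with a lowered fetch size; the connection string, table, and target directory are hypothetical placeholders, and the right value depends on row width and mapper heap:

```shell
# Hypothetical connection/table values; a smaller --fetch-size reduces
# how many rows the JDBC driver buffers per round trip.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --table orders \
  --fetch-size 100 \
  --target-dir /user/etl/orders
```

The default fetch behavior varies by JDBC driver, so tuning this is usually a matter of experiment: halve it and re-run until the pmem kills stop.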
01-09-2019
07:39 PM
1 Kudo
In terms of resources, five masters wouldn't strain the cluster much. The big change is that the state that lives on the masters (e.g. catalog metadata) would need to be replicated with a replication factor of 5 in mind (i.e. at least 3 copies must be written before a change is considered committed). While this is possible, the recommended configuration is 3 masters: it is the most well-tested and commonly used.
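The "at least 3 copies" figure above follows from majority quorum: with n replicas, a write is durable once floor(n/2) + 1 of them acknowledge it. A quick sketch of the arithmetic:

```python
def quorum(n: int) -> int:
    """Majority quorum size for n replicas: floor(n/2) + 1 acks
    are required before a write is considered committed."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """Replicas that can fail while a majority still remains."""
    return n - quorum(n)

# 3 masters: quorum of 2, tolerates 1 failure.
# 5 masters: quorum of 3, tolerates 2 failures.
```

So going from 3 to 5 masters buys tolerance of a second master failure, at the cost of a third acknowledgment on every metadata write.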