Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1543 | 07-09-2019 12:53 AM
 | 9297 | 06-23-2019 08:37 PM
 | 8052 | 06-18-2019 11:28 PM
 | 8677 | 05-23-2019 08:46 PM
 | 3474 | 05-20-2019 01:14 AM
04-13-2019
01:40 AM
Harsh J: Thanks for the help on the previous issue; we finally resolved it. It was caused by an undocumented port required for the CDH 6.2 to CDH 6.2 distcp. Now we are migrating the task over to Oozie and having some trouble. Could you elaborate a bit more, or give us some links or pointers? Thanks. We could not find "mapreduce.job.hdfs-servers". Where is that set?
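(For reference, a minimal sketch of where that property is typically placed: inside the distcp action's <configuration> block of the workflow. The action name, hostnames, and paths below are placeholders, not values from this thread.)

```xml
<action name="distcp-copy">
  <distcp xmlns="uri:oozie:distcp-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Lists the namenodes of both clusters so the job can obtain HDFS delegation tokens for each -->
      <property>
        <name>mapreduce.job.hdfs-servers</name>
        <value>hdfs://source-nn:8020,hdfs://target-nn:8020</value>
      </property>
    </configuration>
    <arg>hdfs://source-nn:8020/src/path</arg>
    <arg>hdfs://target-nn:8020/dst/path</arg>
  </distcp>
  <ok to="end"/>
  <error to="fail"/>
</action>
```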
04-10-2019
12:31 AM
1 Kudo
One possibility could be the fetch size (combined with some unexpectedly wide rows). Does lowering the result fetch size help? From http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#idp774390917888: --fetch-size: Number of entries to read from the database at once.

Also, do you always see it fail with the YARN memory kill (due to pmem exhaustion), or do you also observe an actual java.lang.OutOfMemoryError occasionally? If it is always the former, then another suspect would be off-heap memory use by the JDBC driver, although I've not come across such a problem.
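(A hedged illustration of lowering the fetch size; the connection string, table, and target directory are placeholders.)

```bash
sqoop import \
  --connect jdbc:mysql://db-host/mydb \
  --table wide_table \
  --fetch-size 100 \
  --target-dir /user/example/wide_table
```

A smaller --fetch-size limits how many rows the JDBC driver buffers per round trip, which keeps very wide rows from inflating each mapper's memory footprint.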
04-08-2019
10:15 AM
Thank you. I had the same issue. Should some content be added to the tutorial indicating that these services need to be started?
04-03-2019
02:10 AM
For CDH / CDK Kafka users, the command is already in your PATH as "kafka-consumer-groups".
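(For example, assuming a reachable broker at a placeholder host and port:)

```bash
# List the consumer groups known to the cluster
kafka-consumer-groups --bootstrap-server broker1.example.com:9092 --list

# Describe one group to see per-partition offsets and lag
kafka-consumer-groups --bootstrap-server broker1.example.com:9092 --describe --group my-group
```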
03-28-2019
01:25 PM
Can you provide more information on the reporting-load issue (for low-latency operations) when we have a DataNode with 100 TB+ of storage? We need an archive node for HDFS storage purposes only: no YARN/Spark will run on it, and it will only store data based on the storage migration policy. The node's network and storage I/O bandwidth is considered able to handle the larger storage size.
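(Assuming this refers to the HDFS storage-policy based migration, a minimal sketch with a placeholder path:)

```bash
# Tag the archive directory with the all-ARCHIVE (COLD) storage policy
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD

# Move existing block replicas so they satisfy the policy
hdfs mover -p /data/archive
```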
03-20-2019
09:01 AM
OK, I figured it out. There was a mapping rule that translated my Kerberos principal name to a lower-case short name, i.e. USER1@EXAMPLE.COM becomes user1. I had entered both USER1 and USER1@EXAMPLE.COM as HBase superusers, but not user1. Tricky...
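(A quick way to check how the cluster's auth_to_local rules resolve a principal; the principal below simply mirrors the example above:)

```bash
# Prints the short name that the configured mapping rules produce for the given principal
hadoop org.apache.hadoop.security.HadoopKerberosName USER1@EXAMPLE.COM
```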
03-20-2019
04:12 AM
Flume scripts need to be run under a Bash shell environment, but it appears that you are trying to run them from PowerShell on Windows.
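(For instance, a typical invocation from a Bash shell; the paths and agent name are placeholders:)

```bash
flume-ng agent \
  --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/flume.conf \
  --name agent1
```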
03-18-2019
01:26 PM
Your job.properties serves future launches and is very handy when any of your cluster parameters change due to upgrades or other issues. It is not meant only for the initial launch; it can be used for testing as well if you configure your files appropriately.
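(A minimal, hypothetical example; the hostnames, ports, and application path are placeholders:)

```bash
# A job.properties kept alongside the workflow; edit it only when cluster endpoints change
cat > job.properties <<'EOF'
nameNode=hdfs://nn-host:8020
jobTracker=rm-host:8032
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow
EOF

# The same file is reused for every launch (or for test runs, by pointing it at another cluster)
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```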
03-13-2019
06:52 PM
Please create a new thread for distinct questions, instead of bumping an older, resolved thread.

As to your question, the error is clear, as is the documentation, quoted below:

"""
Spooling Directory Source

This source lets you ingest data by placing files to be ingested into a “spooling” directory on disk. This source will watch the specified directory for new files, and will parse events out of new files as they appear. The event parsing logic is pluggable. After a given file has been fully read into the channel, it is renamed to indicate completion (or optionally deleted).

Unlike the Exec source, this source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these problem conditions and will fail loudly if they are violated:

If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
If a file name is reused at a later time, Flume will print an error to its log file and stop processing.
"""
- https://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#spooling-directory-source

It appears that you can get around this by using ExecSource with a script or command that reads the files, but you'll have to sacrifice reliability. It may instead be worth investing in an approach that makes filenames unique (`uuidgen`-named softlinks in another folder, etc.).
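(A rough sketch of that unique-name workaround; the directories are placeholders:)

```bash
# Expose each immutable input file to Flume's spooling directory under a unique name
for f in /data/incoming/*; do
  ln -s "$f" "/var/lib/flume/spool/$(uuidgen)"
done
```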