Member since: 07-31-2013
Posts: 1924
Kudos Received: 462
Solutions: 311

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1543 | 07-09-2019 12:53 AM
 | 9297 | 06-23-2019 08:37 PM
 | 8052 | 06-18-2019 11:28 PM
 | 8677 | 05-23-2019 08:46 PM
 | 3474 | 05-20-2019 01:14 AM
04-13-2019
01:40 AM
Harsh J: Thanks for the help on the previous issue; we finally resolved it. It was caused by an undocumented port required for the CDH 6.2 to CDH 6.2 distcp. Now we are migrating the task over to Oozie and having some trouble. Could you elaborate a bit more, or give us some links or pointers? Thanks. We could not find "mapreduce.job.hdfs-servers". Where is that set?
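(For reference, a minimal sketch of where that property is typically placed: inside the distcp action's <configuration> block of the workflow. The action name, hostnames, and paths below are placeholders, not values from this thread.)

```xml
<action name="distcp-copy">
  <distcp xmlns="uri:oozie:distcp-action:0.2">
    <job-tracker>${jobTracker}</job-tracker>
    <name-node>${nameNode}</name-node>
    <configuration>
      <!-- Lists the namenodes of both clusters so the job can obtain HDFS delegation tokens for each -->
      <property>
        <name>mapreduce.job.hdfs-servers</name>
        <value>hdfs://source-nn:8020,hdfs://target-nn:8020</value>
      </property>
    </configuration>
    <arg>hdfs://source-nn:8020/src/path</arg>
    <arg>hdfs://target-nn:8020/dst/path</arg>
  </distcp>
  <ok to="end"/>
  <error to="fail"/>
</action>
```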
04-10-2019
12:31 AM
1 Kudo
One possibility could be the fetch size (combined with some unexpectedly wide rows). Does lowering the result fetch size help? From http://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html#idp774390917888: --fetch-size: Number of entries to read from the database at once.

Also, do you always see it fail with the YARN memory kill (due to pmem exhaustion), or do you also observe an actual java.lang.OutOfMemoryError occasionally? If it is always the former, then another suspect would be off-heap memory use by the JDBC driver, although I've not come across such a problem.
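(A hedged illustration of lowering the fetch size; the connection string, table, and target directory are placeholders.)

```bash
sqoop import \
  --connect jdbc:mysql://db-host/mydb \
  --table wide_table \
  --fetch-size 100 \
  --target-dir /user/example/wide_table
```

A smaller --fetch-size limits how many rows the JDBC driver buffers per round trip, which keeps very wide rows from inflating each mapper's memory footprint.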
04-08-2019
10:15 AM
Thank you. I had the same issue. Should some content be added to the tutorial indicating that these services need to be started?
04-03-2019
02:10 AM
For CDH / CDK Kafka users, the command is already in your PATH as "kafka-consumer-groups".
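(For example, assuming a reachable broker at a placeholder host and port:)

```bash
# List the consumer groups known to the cluster
kafka-consumer-groups --bootstrap-server broker1.example.com:9092 --list

# Describe one group to see per-partition offsets and lag
kafka-consumer-groups --bootstrap-server broker1.example.com:9092 --describe --group my-group
```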
03-28-2019
01:25 PM
Can you provide more information on the reporting-load issue (for low-latency operations) when we have a DataNode with 100 TB+ of storage? We need an archive node for HDFS storage purposes only: no YARN/Spark will run on it, and it will only store data based on the storage migration policy. The node's network and storage I/O bandwidth is considered able to handle the larger storage size.
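(Assuming this refers to the HDFS storage-policy based migration, a minimal sketch with a placeholder path:)

```bash
# Tag the archive directory with the all-ARCHIVE (COLD) storage policy
hdfs storagepolicies -setStoragePolicy -path /data/archive -policy COLD

# Move existing block replicas so they satisfy the policy
hdfs mover -p /data/archive
```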
03-20-2019
09:01 AM
OK, I figured it out. There was a mapping rule that translated my Kerberos principal name to a lower-case short name, i.e. USER1@EXAMPLE.COM becomes user1. I had entered both USER1 and USER1@EXAMPLE.COM as HBase superusers, but not user1. Tricky...
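(A quick way to check how the cluster's auth_to_local rules resolve a principal; the principal below simply mirrors the example above:)

```bash
# Prints the short name that the configured mapping rules produce for the given principal
hadoop org.apache.hadoop.security.HadoopKerberosName USER1@EXAMPLE.COM
```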
03-20-2019
04:12 AM
Flume scripts need to be run under a Bash shell environment, but it appears that you are trying to run them from PowerShell on Windows.
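(For instance, a typical invocation from a Bash shell; the paths and agent name are placeholders:)

```bash
flume-ng agent \
  --conf /etc/flume-ng/conf \
  --conf-file /etc/flume-ng/conf/flume.conf \
  --name agent1
```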
03-18-2019
01:26 PM
Your job.properties serves future launches and is very handy when any of your cluster parameters change due to upgrades or other issues. It is not meant only for the initial launch; it can be used for testing as well if you configure your files appropriately.
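(A minimal, hypothetical example; the hostnames, ports, and application path are placeholders:)

```bash
# A job.properties kept alongside the workflow; edit it only when cluster endpoints change
cat > job.properties <<'EOF'
nameNode=hdfs://nn-host:8020
jobTracker=rm-host:8032
oozie.wf.application.path=${nameNode}/user/${user.name}/apps/my-workflow
EOF

# The same file is reused for every launch (or for test runs, by pointing it at another cluster)
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```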
03-13-2019
06:52 PM
Please create a new thread for distinct questions, instead of bumping an older, resolved thread.

As to your question, the error is clear, as is the documentation, quoted below:

"""
Spooling Directory Source

This source lets you ingest data by placing files to be ingested into a “spooling” directory on disk. This source will watch the specified directory for new files, and will parse events out of new files as they appear. The event parsing logic is pluggable. After a given file has been fully read into the channel, it is renamed to indicate completion (or optionally deleted).

Unlike the Exec source, this source is reliable and will not miss data, even if Flume is restarted or killed. In exchange for this reliability, only immutable, uniquely-named files must be dropped into the spooling directory. Flume tries to detect these problem conditions and will fail loudly if they are violated:

If a file is written to after being placed into the spooling directory, Flume will print an error to its log file and stop processing.
If a file name is reused at a later time, Flume will print an error to its log file and stop processing.
"""
- https://archive.cloudera.com/cdh5/cdh/5/flume-ng/FlumeUserGuide.html#spooling-directory-source

It appears that you can get around this by using ExecSource with a script or command that reads the files, but you'll have to sacrifice reliability. It may instead be worth investing in an approach that makes filenames unique (`uuidgen`-named softlinks in another folder, etc.).
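(A rough sketch of that unique-name workaround; the directories are placeholders:)

```bash
# Expose each immutable input file to Flume's spooling directory under a unique name
for f in /data/incoming/*; do
  ln -s "$f" "/var/lib/flume/spool/$(uuidgen)"
done
```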