Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1698 | 06-19-2019 07:50 AM
 | 2723 | 05-01-2019 08:07 AM
 | 2772 | 04-10-2019 08:49 AM
 | 2666 | 03-20-2019 09:30 AM
 | 2355 | 01-23-2019 10:58 AM
01-23-2019
10:47 AM
1 Kudo
The problem is usually that the Kafka consumer is not configured properly and is failing silently while running. You can verify whether the Flume consumer group is actually connected to partitions by running the "kafka-consumer-groups" command. You could also set log4j.logger.org.apache.kafka=DEBUG in the broker logging safety valve and review the messages when Flume tries to connect to Kafka. Many "errors" are retryable, meaning they won't throw an exception, but you also won't see any output. -pd
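For example, something like the following should show the partition assignments and lag for the group (the broker host and the group id "flume" here are placeholders; use the kafka.consumer.group.id your agent is configured with):

kafka-consumer-groups --bootstrap-server broker1:9092 --describe --group flume

If the group shows no members or no assigned partitions, the source never actually connected.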
01-17-2019
01:19 PM
The recommended path in this situation is to comment out the sources line that specifies which sources are configured:

# tier1.sources = kafkasource1 kafkasource2 etc

The Flume agent can function without any sources and will then drain the channel through the sinks, without adding any new data to the channel. -pd
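As a rough sketch (the agent and component names are placeholders), a drain-only configuration keeps the channel and sink definitions intact and only disables the sources line:

# tier1.sources = kafkasource1
tier1.channels = channel1
tier1.sinks = sink1
tier1.channels.channel1.type = file
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = /tmp/flume/drain

Once the channel is empty, the agent can be stopped and reconfigured.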
12-26-2018
03:50 PM
2 Kudos
A word of caution: Flume isn't really designed for transferring large files. It would be better to use Oozie, or an NFS gateway with cron, to transfer files on a regular basis, especially if you want each file preserved in its entirety. One thing you will observe is that if Flume hits any temporary transmission errors, it will resend parts of those files, resulting in duplicates (a standard and expected scenario when using Flume), so the resulting files in HDFS would contain those duplicates. Additionally, when you do have interruptions, existing HDFS files are closed and new ones are opened. -pd
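As a rough illustration (mount point, paths, and schedule are all placeholders), with the HDFS NFS gateway mounted at /hdfs, a cron entry could copy completed files over once an hour and preserve each file whole:

0 * * * * cp /data/outgoing/*.csv /hdfs/user/myuser/incoming/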
11-08-2018
01:08 PM
The issue is not whether Kerberos is used, but that the solrctl script expects curl to support it (since it does by default with the standard OS distribution of curl). Because that support is missing, the curl command fails, and thus the solrctl script fails. If you run the following, what is your result?

curl --version

If you are running Red Hat, can you also run:

which curl
yum whatprovides curl

and provide the output? -pd
11-05-2018
04:18 PM
CDH6 has rebased to Solr 7. Given the large set of new features, it is included in a major release and not a minor release. If you need the functionality in Solr 7, the recommendation would be to upgrade to CDH6. -pd
11-05-2018
04:16 PM
That's your problem: you are using a version of curl that doesn't support Kerberos. You should see something like this for the curl --version command:

[root@nightly515-1 ~]# curl --version
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.21 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets

It needs to support "GSS-Negotiate". It's likely you installed a custom version of curl, or updated to a version that doesn't support it. -pd
11-02-2018
12:53 PM
Does the curl command I noted return an actual web page? From the output, it is possible there is something wrong with the curl binaries that you are using... -pd
11-01-2018
09:58 AM
It looks like it's failing to contact the Solr nodes. Are you able to run this successfully from the host where the solrctl command is running?

curl -i --retry 5 -s -L -k --negotiate -u : http://ip-172-31-82-140.ec2.internal:8983/solr

-pd
11-01-2018
08:37 AM
Can you run with the --trace option and see if there's any indication of why the ZK_ENSEMBLE is not being used? -pd
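For example (the subcommand here is arbitrary, just to exercise the script):

solrctl --trace instancedir --list

The shell trace should show how the script resolves the ZooKeeper connection string.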
08-31-2018
09:01 AM
FLUME-3027 has been backported to CDH 5.11.0 and above, so if you are able to upgrade, it would prevent the issue of offsets bouncing back and forth. One thing to consider: if you are getting rebalances, it may be because your sink is taking too long to deliver before polling Kafka again. You may want to lower your sink batch size so that messages are delivered and acked in a timely fashion. Additionally, if you upgrade to CDH 5.14 or higher, the Flume Kafka client is 0.10.2, and you would be able to set max.poll.records to match the batchSize you are using for the Flume sink. You could also increase max.poll.interval.ms, which is decoupled from session.timeout.ms in 0.10.0 and above. This would prevent the rebalancing, since the client would still heartbeat without having to poll for more records before session.timeout.ms expires. -pd
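As a rough sketch (agent and component names are placeholders, and this assumes a Kafka source feeding an HDFS sink; the kafka.consumer. prefix passes properties straight through to the Kafka consumer):

tier1.sources.kafkasource1.kafka.consumer.max.poll.records = 1000
tier1.sources.kafkasource1.kafka.consumer.max.poll.interval.ms = 600000
tier1.sinks.sink1.hdfs.batchSize = 1000

Matching max.poll.records to the sink batch size keeps each poll limited to what the sink can deliver and ack before the next poll.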