Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1698 | 06-19-2019 07:50 AM
 | 2723 | 05-01-2019 08:07 AM
 | 2772 | 04-10-2019 08:49 AM
 | 2666 | 03-20-2019 09:30 AM
 | 2355 | 01-23-2019 10:58 AM
01-23-2019
10:47 AM
1 Kudo
The problem is usually that the Kafka consumer is not configured properly and is failing silently while running. You can verify whether the Flume consumer group is actually connected to partitions by running the "kafka-consumer-groups" command. You could also set log4j.logger.org.apache.kafka=DEBUG in the broker logging safety valve and review the messages when Flume tries to connect to Kafka. Many "errors" are retryable, meaning they won't throw an exception, but you also won't see any output. -pd
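For example, something like the following should show the partition assignments and lag for the group (the broker host and the group id "flume" here are placeholders; use the kafka.consumer.group.id your agent is configured with):

kafka-consumer-groups --bootstrap-server broker1:9092 --describe --group flume

If the group shows no members or no assigned partitions, the source never actually connected.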
01-17-2019
01:19 PM
The recommended path in this situation is to comment out the sources line that specifies which sources are configured:

# tier1.sources = kafkasource1 kafkasource2 etc

The Flume agent can function without any sources and will then drain the channel through the sinks, without adding any new data to the channel. -pd
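As a rough sketch (the agent and component names are placeholders), a drain-only configuration keeps the channel and sink definitions intact and only disables the sources line:

# tier1.sources = kafkasource1
tier1.channels = channel1
tier1.sinks = sink1
tier1.channels.channel1.type = file
tier1.sinks.sink1.type = hdfs
tier1.sinks.sink1.channel = channel1
tier1.sinks.sink1.hdfs.path = /tmp/flume/drain

Once the channel is empty, the agent can be stopped and reconfigured.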
12-26-2018
03:50 PM
2 Kudos
A word of caution: Flume isn't really designed for transferring large files. It would be better to use Oozie, or an NFS gateway with cron, to transfer files on a regular basis, especially if you want each file preserved in its entirety. One thing you will observe is that if Flume hits any temporary transmission errors, it will resend parts of those files, resulting in duplicates (a standard and expected scenario when using Flume), so the resulting files in HDFS would contain those duplicates. Additionally, when you do have interruptions, existing HDFS files are closed and new ones are opened. -pd
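As a rough illustration (mount point, paths, and schedule are all placeholders), with the HDFS NFS gateway mounted at /hdfs, a cron entry could copy completed files over once an hour and preserve each file whole:

0 * * * * cp /data/outgoing/*.csv /hdfs/user/myuser/incoming/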
11-08-2018
01:08 PM
The issue is not whether Kerberos is used, but that the solrctl script expects curl to support it (since it does by default with the standard OS distribution of curl). Because that support is missing, the curl command fails, and thus the solrctl script fails. If you run the following, what is your result?

curl --version

If you are running Red Hat, can you also run:

which curl
yum whatprovides curl

and provide the output? -pd
11-05-2018
04:18 PM
CDH6 has rebased to Solr 7. Given the large set of new features, it is included in a major release and not a minor release. If you need the functionality in Solr 7, the recommendation would be to upgrade to CDH6. -pd
11-05-2018
04:16 PM
That's your problem: you are using a version of curl that doesn't support Kerberos. You should see something like this for the curl --version command:

[root@nightly515-1 ~]# curl --version
curl 7.29.0 (x86_64-redhat-linux-gnu) libcurl/7.29.0 NSS/3.21 Basic ECC zlib/1.2.7 libidn/1.28 libssh2/1.4.3
Protocols: dict file ftp ftps gopher http https imap imaps ldap ldaps pop3 pop3s rtsp scp sftp smtp smtps telnet tftp
Features: AsynchDNS GSS-Negotiate IDN IPv6 Largefile NTLM NTLM_WB SSL libz unix-sockets

It needs to support "GSS-Negotiate". It's likely you installed a custom version of curl, or updated to a version that doesn't support it. -pd
11-02-2018
12:53 PM
Does the curl command I noted return an actual web page? From the output, it is possible there is something wrong with the curl binaries that you are using... -pd
11-01-2018
09:58 AM
It looks like it's failing to contact the Solr nodes. Are you able to run this successfully from the host where the solrctl command is running?

curl -i --retry 5 -s -L -k --negotiate -u : http://ip-172-31-82-140.ec2.internal:8983/solr

-pd
11-01-2018
08:37 AM
Can you run with the --trace option and see if there's any indication of why the ZK_ENSEMBLE is not being used? -pd
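For example (the subcommand here is arbitrary, just to exercise the script):

solrctl --trace instancedir --list

The shell trace should show how the script resolves the ZooKeeper connection string.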
08-31-2018
09:01 AM
FLUME-3027 has been backported to CDH 5.11.0 and above, so if you are able to upgrade, it would prevent the issue of offsets bouncing back and forth. One thing to consider: if you are getting rebalances, it may be because your sink is taking too long to deliver before polling Kafka again. You may want to lower your sink batch size so that messages are delivered and acked in a timely fashion. Additionally, if you upgrade to CDH 5.14 or higher, the Flume Kafka client is 0.10.2, and you would be able to set max.poll.records to match the batchSize you are using for the Flume sink. You could also increase max.poll.interval.ms, which is decoupled from session.timeout.ms in 0.10.0 and above. This would prevent the rebalancing, since the client would still heartbeat without having to poll for more records before session.timeout.ms expires. -pd
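As a rough sketch (agent and component names are placeholders, and this assumes a Kafka source feeding an HDFS sink; the kafka.consumer. prefix passes properties straight through to the Kafka consumer):

tier1.sources.kafkasource1.kafka.consumer.max.poll.records = 1000
tier1.sources.kafkasource1.kafka.consumer.max.poll.interval.ms = 600000
tier1.sinks.sink1.hdfs.batchSize = 1000

Matching max.poll.records to the sink batch size keeps each poll limited to what the sink can deliver and ack before the next poll.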