Member since: 01-09-2014
Posts: 283
Kudos Received: 70
Solutions: 50

My Accepted Solutions
Title | Views | Posted
---|---|---
 | 1698 | 06-19-2019 07:50 AM
 | 2723 | 05-01-2019 08:07 AM
 | 2772 | 04-10-2019 08:49 AM
 | 2666 | 03-20-2019 09:30 AM
 | 2351 | 01-23-2019 10:58 AM
03-12-2018
12:02 PM
1 Kudo
Double-check which version of CDH you are running. Since CDH 5.8, Flume uses the new-style Kafka source configuration, meaning you have to specify the bootstrap servers as:

agent1.sources.kafka-source.kafka.bootstrap.servers = localhost:9092

http://flume.apache.org/FlumeUserGuide.html#kafka-source

-pd
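For context, a minimal new-style Kafka source stanza might look something like the sketch below; the agent, source, channel, and topic names here are illustrative placeholders, not taken from the original question:

agent1.sources = kafka-source
agent1.channels = ch1
agent1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.kafka-source.kafka.bootstrap.servers = localhost:9092
agent1.sources.kafka-source.kafka.topics = my-topic
agent1.sources.kafka-source.kafka.consumer.group.id = flume
agent1.sources.kafka-source.channels = ch1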
02-23-2018
08:48 AM
If you have run solrctl init --force, then the collection configuration in ZooKeeper has unfortunately been deleted (it is a destructive command that removes everything under the /solr znode). You might be able to look at the ZooKeeper data directory and see whether there are any snapshots and transaction logs from before the date you deleted the data. If so, you may be able to copy the data directory to a dev server, remove the current log and snapshot, and start up that ZooKeeper instance to review the data. See here for more information:

https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_filemanagement

More links that may provide useful information:

https://groups.google.com/forum/#!topic/marathon-framework/EfhJ9A_6myc
http://zookeeper-user.578899.n2.nabble.com/How-to-restore-a-snapshot-after-an-accidental-ZKclenup-td7582059.html

I would suggest not trying to restore a snapshot into the running cluster, as you don't know what changes have been made since the solrctl init --force was executed.

-pd
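As an illustration only, the offline review on a throwaway dev box could look roughly like this; the dataDir path comes from your zoo.cfg, and the hostnames and file names below are assumptions:

# copy the ZooKeeper data dir from the affected host to the dev server
scp -r zkhost:/var/lib/zookeeper/version-2 /var/lib/zookeeper/

# list snapshots and transaction logs by date, then remove the ones written
# after the solrctl init --force so only the older state gets replayed
ls -lt /var/lib/zookeeper/version-2
rm /var/lib/zookeeper/version-2/log.<newest> /var/lib/zookeeper/version-2/snapshot.<newest>

# start the local ZooKeeper and browse what is left under /solr
zookeeper-client -server localhost:2181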
01-16-2018
10:06 AM
As Manikumar noted above, the old Flume agent configuration has been deprecated; you can refer to the Flume Kafka source documentation here:

http://flume.apache.org/FlumeUserGuide.html

Also, you can confirm with the ConsumerGroupCommand that the Flume agents are in an active consumer group:

kafka-run-class kafka.admin.ConsumerGroupCommand --bootstrap-server ${HOSTNAME}:9092 --describe --group flume --new-consumer

-pd
12-14-2017
11:05 AM
1 Kudo
Kafka 2.2 uses Sentry to provide authorization for Kafka topics:

https://www.cloudera.com/documentation/kafka/2-2-x/topics/kafka_security.html#using_kafka_with_sentry

If you are using Kerberos, you can add the Sentry service and then follow the documentation for configuring Kafka privileges.

-pd
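To give a rough idea of the shape of those privileges, granting a role write access to a topic with the kafka-sentry tool looks roughly like the lines below; the role, group, and topic names are made up for illustration, and the exact flags and privilege strings should be taken from the linked documentation rather than from this sketch:

kafka-sentry -cr -r flume_writer
kafka-sentry -arg -r flume_writer -g flume
kafka-sentry -gpr -r flume_writer -p "Host=*->Topic=testTopic->action=write"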
09-20-2017
12:30 PM
You shouldn't have to reindex the whole set of documents unless you need that new field to be added to the existing documents. New documents that contain the field will be searchable on it, but older documents will not be returned. Reindexing would consist of removing the existing documents in the Solr collection and re-running your indexing application (MRIT, SolrJ, etc.) to index all the original documents again. Alternatively, you could have a SolrJ application that reads the old documents and adds a value for the newly created field to each of them. Of course, you should test this in a QA environment to confirm the desired behavior.

-pd
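If you go the backfill route, Solr's atomic updates are one way to set the new field on existing documents without resending the whole document; a rough curl sketch is below, where the collection name, document id, field name, and value are all placeholders:

curl 'http://localhost:8983/solr/my_collection/update?commit=true' \
  -H 'Content-Type: application/json' \
  -d '[{"id":"doc1","new_field":{"set":"some value"}}]'

Keep in mind that atomic updates require the other fields in the document to be stored (or have docValues), so check the schema before relying on this approach.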
09-20-2017
12:24 PM
Thanks for the clarification; the original comment said your Flume file channel was running out of space. With regard to the HDFS sink: once Flume delivers to the HDFS sink, it no longer controls those files. Whatever post-processing you are doing with those files should be responsible for cleaning up those directories. There isn't functionality within the Flume sink to clean up old folders or expire data that has already been delivered. You could run a simple cron job that removes directories in HDFS older than a month, or run an Oozie job that does the same.

HTH
-pd
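As a sketch of the cron approach, something like the script below could work; the HDFS path and the 30-day cutoff are assumptions, and it relies on hdfs dfs -ls printing the modification date in the sixth column:

#!/bin/bash
# remove Flume output directories whose modification date is more than 30 days old
CUTOFF=$(date -d "30 days ago" +%Y-%m-%d)
hdfs dfs -ls /data/flume/events | awk '{print $6, $8}' | while read dt path; do
  # skip the "Found N items" header line, which has no path column
  [ -n "$path" ] && [ "$dt" \< "$CUTOFF" ] && hdfs dfs -rm -r -skipTrash "$path"
done

Test it with the rm line replaced by an echo first, since -skipTrash deletes the data permanently.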
09-19-2017
01:41 PM
Flume shouldn't be holding on to old files once all of their events have been delivered to the sinks. If that is happening, there may be some inconsistency in the checkpoints that is causing it. You could resolve this by regenerating the checkpoints, as I noted previously. The suggestion would be to shut down Flume, increase the heap size to a large amount, and add the use-fast-replay = true property to the channel. Delete the checkpoints and then start up Flume. The checkpoints will be recreated and will properly record which events were delivered to the sinks, and any old log files that are no longer needed should then be removed. As a safety measure, you may want to back up the files (data and checkpoints), but regenerating the checkpoints shouldn't negatively affect the Flume channel; it will just take some time to replay.

-pd
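Concretely, the file channel properties involved look something like the lines below; the agent and channel names and the directories are placeholders, and use-fast-replay is the only property being added here:

agent1.channels.ch1.type = file
agent1.channels.ch1.checkpointDir = /var/lib/flume-ng/checkpoint
agent1.channels.ch1.dataDirs = /var/lib/flume-ng/data
agent1.channels.ch1.use-fast-replay = true

The checkpoint directory is the one to clear out before restarting; the data directories are what get replayed to rebuild it.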
09-19-2017
12:16 PM
As I stated in my recent comment, the Flume Kafka client was upgraded as part of CDH 5.8 to use the new consumer API, which supports secure communication with Kerberos. Versions prior to CDH 5.8 use the old API, which doesn't support Kerberos or SSL. You will have to upgrade to get this new functionality, or run Flume outside of Cloudera Manager using tarballs or RPMs.

-pd
09-18-2017
10:01 AM
That documentation is correct. The Flume Kafka client version in CDH 5.7 and lower isn't capable of communicating with secure Kafka. You need to upgrade to CDH 5.8 or a higher version of Flume in order to connect to secure Kafka, and then use these steps to configure Flume:

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_flume_kafka_security_confg.html

-pd
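For reference, once you are on CDH 5.8 or later, the Kerberos-related pieces on a Kafka source typically end up looking something like the lines below; the agent and source names and the JAAS path are placeholders, and the linked Cloudera document is the authoritative set of steps:

agent1.sources.kafka-source.kafka.consumer.security.protocol = SASL_PLAINTEXT
agent1.sources.kafka-source.kafka.consumer.sasl.kerberos.service.name = kafka

# passed to the Flume JVM, for example through flume-env.sh or the Cloudera Manager safety valve
-Djava.security.auth.login.config=/etc/flume-ng/conf/flume_jaas.conf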
09-15-2017
09:18 AM
You need to provide your broker config as well. Are you seeing any errors in the broker logs? Are you able to use kafka-console-consumer and kafka-console-producer to send and receive messages?

-pd
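In case it helps, a quick smoke test with the console tools would look something like the commands below; the broker host, port, and topic name are placeholders:

kafka-console-producer --broker-list broker1:9092 --topic test
kafka-console-consumer --bootstrap-server broker1:9092 --topic test --from-beginning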