Member since
01-09-2014
283
Posts
70
Kudos Received
50
Solutions
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 1730 | 06-19-2019 07:50 AM |
| | 2786 | 05-01-2019 08:07 AM |
| | 2835 | 04-10-2019 08:49 AM |
| | 2734 | 03-20-2019 09:30 AM |
| | 2381 | 01-23-2019 10:58 AM |
03-12-2018
12:02 PM
1 Kudo
Double-check which version of CDH you are running. Since CDH 5.8, Flume uses the new Flume configuration for Kafka sources, which means you have to specify the bootstrap servers as:

agent1.sources.kafka-source.kafka.bootstrap.servers = localhost:9092

See http://flume.apache.org/FlumeUserGuide.html#kafka-source

-pd
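For context, a fuller Kafka source definition using the new property names might look like the fragment below. This is only a sketch: the agent and source names follow the post above, while the channel name, topic, and consumer group are placeholders to replace with your own values.

```properties
# Kafka source using the new-style property names (CDH 5.8+ / Flume 1.7 style).
# mem-channel, my-topic and the group id "flume" are placeholders, not values from the thread.
agent1.sources = kafka-source
agent1.channels = mem-channel
agent1.sources.kafka-source.type = org.apache.flume.source.kafka.KafkaSource
agent1.sources.kafka-source.channels = mem-channel
agent1.sources.kafka-source.kafka.bootstrap.servers = localhost:9092
agent1.sources.kafka-source.kafka.topics = my-topic
agent1.sources.kafka-source.kafka.consumer.group.id = flume
# The old-style properties these replace: zookeeperConnect, topic, groupId
```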
02-23-2018
08:48 AM
If you ran solrctl init --force, the collection configuration in ZooKeeper has unfortunately been deleted. This is a destructive command that removes everything under the /solr znode.

You might be able to recover it from the ZooKeeper data dir: check whether there are any snapshots and logs from before the date you deleted the data. You may be able to copy the data dir to a dev server, remove the current log and snapshot, and start up a ZooKeeper instance there to review the data. See here for more information:
https://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html#sc_filemanagement

More links that may provide useful information:
https://groups.google.com/forum/#!topic/marathon-framework/EfhJ9A_6myc
http://zookeeper-user.578899.n2.nabble.com/How-to-restore-a-snapshot-after-an-accidental-ZKclenup-td7582059.html

I would suggest not trying to restore a snapshot on the running cluster, as you don't know what changes have been made since solrctl init --force was executed.

-pd
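To see which ZooKeeper snapshots and transaction logs predate the deletion, a small helper along these lines may be useful. It is only a sketch: it assumes the files sit in the version-2 subdirectory of the ZooKeeper data dir with the usual snapshot.* and log.* names, and it goes by file modification time. The function name files_before is made up for this example.

```python
import os
from datetime import datetime


def files_before(data_dir, cutoff):
    """Return snapshot.* and log.* file names in data_dir last modified before cutoff."""
    selected = []
    for name in sorted(os.listdir(data_dir)):
        # Skip anything that is not a ZooKeeper snapshot or transaction log.
        if not name.startswith(("snapshot.", "log.")):
            continue
        path = os.path.join(data_dir, name)
        modified = datetime.fromtimestamp(os.path.getmtime(path))
        if modified < cutoff:
            selected.append(name)
    return selected
```

You would call it with your own path and the time the init was run, for example files_before("/var/lib/zookeeper/version-2", datetime(2018, 2, 20)), where the path is an assumed location. Work on a copy of the data dir, not the live one.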
01-16-2018
10:06 AM
As Manikumar noted above, the old Flume agent configuration has been deprecated. You can refer to the Flume Kafka source documentation here: http://flume.apache.org/FlumeUserGuide.html

You can also use the ConsumerGroupCommand to confirm that the Flume agents are in an active consumer group:

kafka-run-class kafka.admin.ConsumerGroupCommand --bootstrap-server ${HOSTNAME}:9092 --describe --group flume --new-consumer

-pd
12-14-2017
11:05 AM
1 Kudo
Kafka 2.2 uses Sentry to provide authorization for Kafka topics: https://www.cloudera.com/documentation/kafka/2-2-x/topics/kafka_security.html#using_kafka_with_sentry

If you are using Kerberos, you can add the Sentry service and then follow the documentation for configuring Kafka privileges.

-pd
09-20-2017
12:30 PM
You shouldn't have to reindex the whole set of documents unless you need the new field added to the existing documents. New documents that have the field will be searchable on it, but older documents will not be returned by queries on that field.

Reindexing would consist of removing the existing documents from the Solr collection and rerunning your indexing application (MRIT, SolrJ, etc.) to index all the original documents again. Alternatively, you could write a SolrJ application that reads the old documents and adds a value for the newly created field to each one.

Of course, you should test this in a QA environment to confirm the desired behavior.

-pd
09-20-2017
12:24 PM
Thanks for the clarification. The original comments said your Flume file channel was running out of space.

With regard to the HDFS sink: once Flume has written files through the HDFS sink, it no longer controls them. Whatever post-processing you run on those files should be responsible for cleaning up the folders. The Flume sink has no functionality to clean up old folders or expire data that has already been delivered. You could run a simple cron job that removes HDFS directories older than a month, or run an Oozie job that does the same.

HTH

-pd
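As a rough illustration of the cron job idea, the sketch below picks out directories older than a given age from the text that hdfs dfs -ls prints. It assumes the usual eight-column listing (permissions, replication, owner, group, size, date, time, path). The function name old_directories and the 30-day default are choices made for this example, not anything Flume provides.

```python
from datetime import datetime, timedelta


def old_directories(ls_output, now, max_age_days=30):
    """Return directory paths from `hdfs dfs -ls` output older than max_age_days."""
    cutoff = now - timedelta(days=max_age_days)
    stale = []
    for line in ls_output.splitlines():
        fields = line.split(None, 7)
        # Keep only directory entries: 8 columns, permissions starting with "d".
        if len(fields) != 8 or not fields[0].startswith("d"):
            continue
        modified = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M")
        if modified < cutoff:
            stale.append(fields[7])
    return stale
```

A cron-driven script could feed it the output of hdfs dfs -ls for the sink's base path and then pass each returned path to hdfs dfs -rm -r. Print the list first and check it before deleting anything.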
09-19-2017
01:41 PM
Flume shouldn't be holding on to old files once all of their events have been delivered to the sinks. If it is, there may be an inconsistency in the checkpoints. You could resolve this by regenerating the checkpoints, as I noted previously.

The suggestion would be to shut down Flume, increase the heap size to a large amount, and add the use-fast-replay=true property to the channel. Then delete the checkpoints and start Flume. The checkpoints will be recreated and will properly record which events were delivered to the sinks, after which any old log files that are no longer needed should be removed.

As a safety measure, you may want to back up the files (data and checkpoints) first. Regenerating the checkpoints shouldn't negatively affect the Flume channel; it will just take some time to replay.

-pd
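For reference, the channel setting mentioned above would sit alongside the rest of the file channel definition roughly as follows. The agent name, channel name, and directory paths are placeholders for this sketch; use the values from your own configuration.

```properties
# File channel with fast replay enabled for the checkpoint rebuild.
# agent1, file-channel and both paths are placeholders.
agent1.channels.file-channel.type = file
agent1.channels.file-channel.checkpointDir = /var/lib/flume-ng/checkpoint
agent1.channels.file-channel.dataDirs = /var/lib/flume-ng/data
agent1.channels.file-channel.use-fast-replay = true
```

Fast replay loads the channel's events into memory while it rebuilds the checkpoint, which is why the heap needs to be raised before starting the agent.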
09-19-2017
12:16 PM
As I stated in my recent comment, the Flume Kafka client was upgraded as part of CDH 5.8 to use the new consumer API, which supports secure communication with Kerberos. Versions prior to CDH 5.8 use the old API, which doesn't support Kerberos or SSL. You will have to upgrade to get this new functionality, or run Flume outside of Cloudera Manager, using tarballs or RPMs.

-pd
09-18-2017
10:01 AM
That documentation is correct. The Flume Kafka client in CDH 5.7 and lower isn't capable of communicating with secure Kafka. You need to upgrade to the version of Flume in CDH 5.8 or higher to connect to secure Kafka, and then use these steps to configure Flume: https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_flume_kafka_security_confg.html

-pd
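On a Flume version that has the new consumer API, the Kerberos-related source settings look roughly like the fragment below. This is a sketch only: the agent and source names are placeholders, and the linked steps remain the reference for the full setup, including the JAAS configuration the agent needs.

```properties
# Kafka source consumer settings for a Kerberos-secured cluster.
# agent1 and kafka-source are placeholder names.
agent1.sources.kafka-source.kafka.consumer.security.protocol = SASL_PLAINTEXT
agent1.sources.kafka-source.kafka.consumer.sasl.mechanism = GSSAPI
agent1.sources.kafka-source.kafka.consumer.sasl.kerberos.service.name = kafka
# Use SASL_SSL instead of SASL_PLAINTEXT if the brokers also require TLS.
```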
09-15-2017
09:18 AM
You need to provide your broker config as well. Are you seeing any errors in the broker logs? Are you able to send and receive messages with kafka-console-producer and kafka-console-consumer?

-pd