09-24-2016
12:11 PM
Great, and thanks for posting back your resolution, as it may help others down the road. Thanks Jeff
09-24-2016
11:00 AM
Hi, I don't see any reason why CDH Kafka wouldn't work with Storm, but it's not something we test at all. You can see the underlying versions here: https://www.cloudera.com/documentation/kafka/latest/topics/kafka_packaging.html Thanks Jeff
08-05-2016
09:24 PM
1 Kudo
Hi Mark, I think Jeremy summed up the differences reasonably well. We both offer management capabilities for Kafka and probably have similar strategies for what gets released in terms of back-ports, etc. One interesting thing that's coming soon is Sentry integration with Kafka. Thanks Jeff
10-21-2015
09:58 AM
Quaie, I think you are actually OK on tier 2 (Kafka channel to sink; I do this all the time), even though CM does use the "warning" CSS, which is not ideal if you're trying to edit the file directly. These warnings will be dismissible in CM 5.5. For tier 1, this currently isn't possible, but you can pretty easily accomplish it by hacking around the agent file. When Flume starts, if a particular component doesn't have its configuration correct, the agent will still run; it will just omit the misconfigured component. So, perhaps you could just add the sink component but make its configuration wrong, as in the sketch below. Give that a shot and let me know how it goes. Thanks Jeff
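Here's a rough sketch of what I mean. All the names in it are hypothetical (the agent and component names, hosts, and the netcat source are just for illustration), and the Kafka channel keys shown are the Flume 1.6-era ones, so check the docs for your Flume version:

```
# Hypothetical agent "tier1" with one source, one Kafka channel, and one sink.
tier1.sources = src1
tier1.channels = kchannel
tier1.sinks = badsink

# Example source, for illustration only.
tier1.sources.src1.type = netcat
tier1.sources.src1.bind = 0.0.0.0
tier1.sources.src1.port = 44444
tier1.sources.src1.channels = kchannel

# Kafka channel: events are persisted to a Kafka topic even with no working sink.
tier1.channels.kchannel.type = org.apache.flume.channel.kafka.KafkaChannel
tier1.channels.kchannel.brokerList = kafkabroker:9092
tier1.channels.kchannel.topic = tier1-events
tier1.channels.kchannel.zookeeperConnect = zkhost:2181

# Deliberately broken sink: Flume fails to instantiate it, logs the error,
# omits just this component, and the agent keeps running source -> channel.
tier1.sinks.badsink.type = this.class.does.not.Exist
tier1.sinks.badsink.channel = kchannel
```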
06-19-2015
06:04 AM
1 Kudo
Scott, I'll refer you to the documentation on this topic here: http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_refresh.html and http://www.cloudera.com/content/cloudera/en/documentation/cloudera-impala/latest/topics/impala_invalidate_metadata.html

In terms of "best practice": use the REFRESH statement to load the latest metastore metadata and block location data for a particular table in these scenarios:

- After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why metadata needs to be refreshed.)
- After issuing ALTER TABLE, INSERT, LOAD DATA, or another table-modifying SQL statement in Hive.

INVALIDATE METADATA and REFRESH are counterparts:

- INVALIDATE METADATA waits to reload the metadata until it is needed for a subsequent query, but it reloads all the metadata for the table, which can be an expensive operation, especially for large tables with many partitions.
- REFRESH reloads the metadata immediately, but only loads the block location data for newly added data files, making it a less expensive operation overall.

If data was altered in some more extensive way, such as being reorganized by the HDFS balancer, use INVALIDATE METADATA to avoid a performance penalty from reduced local reads. If you used Impala version 1.0, the INVALIDATE METADATA statement works just like the Impala 1.0 REFRESH statement did, while the Impala 1.1 REFRESH is optimized for the common use case of adding new data files to an existing table, which is why the table name argument is now required.

Let me know if this doesn't answer your question. Thanks Jeff
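To make the two statements concrete, here's a quick sketch; the name sales_db.transactions is just a placeholder, not from your question:

```
-- New data files were added under the table's HDFS directory (e.g. by an ETL
-- job): cheap and immediate, only picks up new files and their block locations.
REFRESH sales_db.transactions;

-- The table was changed structurally in Hive, or blocks were moved by the HDFS
-- balancer: discards all cached metadata for the table, which is then reloaded
-- lazily the next time a query touches it.
INVALIDATE METADATA sales_db.transactions;
```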