About rtempleton

rtempleton · ‎11-14-2016

Hi Ambud, this section of the Storm documentation gives an excellent explanation of how acking and fault tolerance work together to guarantee at-least-once processing: http://storm.apache.org/releases/1.0.2/Guaranteeing-message-processing.html

rtempleton · ‎11-25-2015

Yes, I ran both. I executed the explain command from within the Ambari Hive view (I realize this is not optimal for the result it produces), so I checked the hiveserver2.log. I don't see any info there about why it thinks the stats are missing or incomplete.

rtempleton · ‎11-24-2015

I'm running an explain on a query and the result includes "Plan not optimized by CBO due to missing statistics. Please check log for more details".

1. All the tables in the query have had compute statistics run on them, and the describe formatted output shows that stats are present and up to date. What is missing?
2. Which log file is the message referring to? I looked in the Hive hiveserver2.log and can see the log entries for the explain command there, but there's no explanation of which stats it thinks are missing.

rtempleton · ‎11-03-2015

Thanks @Neeraj - The 200GB guideline is something I can share with customers.

rtempleton · ‎11-02-2015

rtempleton · ‎10-26-2015

In my mind the two biggest considerations for ORC over Parquet are:

1. Many of the performance improvements provided by the Stinger initiative depend on features of the ORC format, including a block-level index for each column. This can make I/O more efficient, because Hive can skip reading entire blocks of data when it determines that the predicate values are not present there. The Cost Based Optimizer can also use the column-level metadata present in ORC files to generate the most efficient execution graph.
2. ACID transactions are only possible when using ORC as the file format.
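
To make the block-skipping point concrete, here is a minimal sketch of the idea in plain Java. It is not the ORC reader or any Hive API; `BlockSkippingSketch`, `BlockStats`, and `blocksToRead` are hypothetical names, and it assumes the index simply stores a minimum and maximum value per block for one column.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of min/max block skipping. Hypothetical names, not ORC code.
class BlockSkippingSketch {

    /** Assumed per-block statistics for one column: smallest and largest value stored. */
    static final class BlockStats {
        final long min;
        final long max;

        BlockStats(long min, long max) {
            this.min = min;
            this.max = max;
        }

        /** True if the block's value range could include the wanted value. */
        boolean mightContain(long value) {
            return value >= min && value <= max;
        }
    }

    /** Returns the positions of the blocks that must be read for "column = wanted". */
    static List<Integer> blocksToRead(List<BlockStats> index, long wanted) {
        List<Integer> toRead = new ArrayList<>();
        for (int i = 0; i < index.size(); i++) {
            if (index.get(i).mightContain(wanted)) {
                toRead.add(i);
            }
        }
        return toRead;
    }

    public static void main(String[] args) {
        List<BlockStats> index = new ArrayList<>();
        index.add(new BlockStats(1, 100));
        index.add(new BlockStats(101, 200));
        index.add(new BlockStats(201, 300));

        // Only the middle block can hold the value 150, so the other two are skipped.
        System.out.println(blocksToRead(index, 150));
    }
}
```

The fewer blocks whose range overlaps the predicate, the less data has to be read, which is why data that is sorted or clustered on the filtered column tends to benefit most from this kind of index.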

rtempleton · ‎10-21-2015

Hi Artem, the default stream id is "default". Check whether the mapper bolt you mention declares a new output stream. This would occur in the declareOutputFields method of that class. Also, check the collector.emit call within the execute method. The default behavior is to emit on the "default" stream, so this will confirm which stream the bolt is emitting on.

rtempleton · ‎10-01-2015

My question was unclear. I have a topic with 4 partitions, and I wanted to know how to wire up a spout to read from all partitions simultaneously. I now know that if I set my spout parallelism to match the number of partitions, this happens automatically. I had incorrectly assumed that more configuration was required to achieve this.
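
For anyone wondering why matching the parallelism to the partition count works, here is a minimal sketch in plain Java. It is a simplified model, not the KafkaSpout source: `PartitionAssignment` and `partitionsForTask` are hypothetical names, and it assumes partitions are split across spout tasks in a round-robin fashion by task index.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of splitting topic partitions across spout tasks.
// Hypothetical names; assumes a round-robin split by task index.
class PartitionAssignment {

    /** Returns the partition ids that the task at taskIndex would own. */
    static List<Integer> partitionsForTask(int taskIndex, int totalTasks, int numPartitions) {
        List<Integer> owned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += totalTasks) {
            owned.add(p);
        }
        return owned;
    }

    public static void main(String[] args) {
        int numPartitions = 4;
        // Compare parallelism hints of 1, 2, and 4 for a 4-partition topic.
        for (int totalTasks : new int[] {1, 2, 4}) {
            System.out.println("parallelism = " + totalTasks);
            for (int task = 0; task < totalTasks; task++) {
                System.out.println("  task " + task + " reads partitions "
                        + partitionsForTask(task, totalTasks, numPartitions));
            }
        }
    }
}
```

Under this model, a parallelism of 4 on a 4-partition topic gives each task exactly one partition, while a lower parallelism still covers every partition but makes some tasks read more than one.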

rtempleton · ‎09-30-2015

The cluster was initially installed with the broker only on one node. Is there a way to install and register additional brokers in Ambari after the fact?

rtempleton · ‎09-30-2015

Yes, to be clear, I meant having multiple instances of a Kafka spout reading from the multiple partitions of a single topic. As long as the parallelism hint for the KafkaSpout matches the number of partitions, is this handled automatically?

Last Visited	‎10-31-2018 10:40 PM

Member Since	‎09-25-2015 06:37 PM
Posts	33
Kudos received	47

Cloudera Community
