Member since: 09-25-2015
Posts: 33
Kudos Received: 49
Solutions: 2
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
|  | 76381 | 10-26-2015 04:42 PM |
|  | 1431 | 10-21-2015 08:03 PM |
11-14-2016
09:32 PM
Hi Ambud, this section of the Storm documentation gives an excellent explanation of how acking and fault tolerance work together to guarantee at-least-once processing: http://storm.apache.org/releases/1.0.2/Guaranteeing-message-processing.html
11-25-2015
12:00 AM
1 Kudo
Yes, I ran both. I executed the EXPLAIN command from within the Ambari Hive view (I realize that isn't ideal for the output it produces), so I checked hiveserver2.log. I don't see anything there explaining why it thinks the stats are missing or incomplete.
11-24-2015
11:10 PM
1 Kudo
I'm running EXPLAIN on a query, and the result includes "Plan not optimized by CBO due to missing statistics. Please check log for more details". All the tables in the query have had COMPUTE STATISTICS run on them, and the DESCRIBE FORMATTED output shows that stats are present and up to date.
1. What is missing?
2. Which log file is the message referring to? I looked in the Hive hiveserver2.log and can see the log entries for the EXPLAIN command there, but there's no explanation of which stats it thinks are missing.
Labels: Apache Hive
11-03-2015
11:12 PM
1 Kudo
Thanks @Neeraj, the 200 GB guideline is something I can share with customers.
11-02-2015
01:50 PM
Labels: Apache Hadoop
10-26-2015
04:42 PM
6 Kudos
In my mind, the two biggest considerations for choosing ORC over Parquet are:
1. Many of the performance improvements delivered by the Stinger initiative depend on features of the ORC format, including a block-level index for each column. This can make I/O more efficient, because Hive can skip reading entire blocks of data when it determines that the predicate values are not present there. The cost-based optimizer (CBO) can also use the column-level metadata stored in ORC files to generate the most efficient execution graph.
2. ACID transactions are only possible when using ORC as the file format.
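The block skipping in point 1 can be illustrated with a toy model: each group of rows keeps the minimum and maximum of a column, and the reader skips any group whose range cannot contain the value in the predicate. This is a sketch under stated assumptions, not ORC itself: `RowGroup`, `Result` and `findEquals` are hypothetical names, the tiny groups are made up for the demo, and a real ORC reader keeps richer statistics and applies them through its own API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of min/max index skipping. RowGroup, Result and findEquals are
// hypothetical names for illustration, not the ORC reader API.
public class MinMaxSkipDemo {

    // One group of rows for a single column, plus its min/max "index" entry.
    record RowGroup(int[] values, int min, int max) {
        static RowGroup of(int... values) {
            int lo = Integer.MAX_VALUE, hi = Integer.MIN_VALUE;
            for (int v : values) {
                lo = Math.min(lo, v);
                hi = Math.max(hi, v);
            }
            return new RowGroup(values, lo, hi);
        }

        // false means the key is definitely absent, so the group need not be read.
        boolean mightContain(int key) { return key >= min && key <= max; }
    }

    record Result(List<Integer> matches, int groupsScanned) {}

    // Evaluates "column = key", reading only groups whose min/max range admits the key.
    static Result findEquals(List<RowGroup> groups, int key) {
        List<Integer> matches = new ArrayList<>();
        int scanned = 0;
        for (RowGroup g : groups) {
            if (!g.mightContain(key)) continue;   // skipped using index metadata alone
            scanned++;
            for (int v : g.values()) {
                if (v == key) matches.add(v);
            }
        }
        return new Result(matches, scanned);
    }

    public static void main(String[] args) {
        List<RowGroup> groups = List.of(
                RowGroup.of(1, 5, 9),
                RowGroup.of(10, 12, 20),
                RowGroup.of(21, 30, 35));
        Result r = findEquals(groups, 12);
        System.out.println("matches = " + r.matches() + ", groups scanned = "
                + r.groupsScanned() + " of " + groups.size());
    }
}
```

Running it prints `matches = [12], groups scanned = 1 of 3`: two of the three groups are ruled out from their min/max entries without touching their values. The benefit depends on data layout; if values are spread randomly, most ranges overlap the predicate and little is skipped.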
10-21-2015
08:03 PM
Hi Artem, the default stream ID is "default". Check whether the mapper bolt you mention declares a new output stream; this would happen in that class's declareOutputFields method. Also check the collector.emit call within the execute method: by default, emit sends on the "default" stream, so that call tells you which stream the bolt is emitting on.
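The routing rule above can be modeled in a few lines: an emit without a stream ID lands on "default", while an emit with an explicit ID lands only on that stream, so a downstream bolt subscribed to the other one never sees it. `DefaultStreamDemo` and `MiniCollector` are hypothetical stand-ins, not Storm's `OutputCollector`; only the "default" ID matches Storm. In real Storm code the corresponding declarations are `declarer.declare(...)` for the default stream and `declarer.declareStream(...)` for a named one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of Storm's stream-id routing; MiniCollector is not Storm's
// OutputCollector. Only the default id, the string "default", matches Storm.
public class DefaultStreamDemo {
    static final String DEFAULT_STREAM_ID = "default";

    static class MiniCollector {
        private final Map<String, List<List<Object>>> streams = new HashMap<>();

        // Like collector.emit(values): no stream id given, so "default" is used.
        void emit(List<Object> values) { emit(DEFAULT_STREAM_ID, values); }

        // Like collector.emit(streamId, values): the tuple goes to the named stream only.
        void emit(String streamId, List<Object> values) {
            streams.computeIfAbsent(streamId, k -> new ArrayList<>()).add(values);
        }

        // What a downstream bolt subscribed to streamId would receive.
        List<List<Object>> received(String streamId) {
            return streams.getOrDefault(streamId, List.of());
        }
    }

    public static void main(String[] args) {
        MiniCollector collector = new MiniCollector();
        collector.emit(List.of("a", 1));            // lands on "default"
        collector.emit("mapped", List.of("b", 2));  // lands on "mapped" only

        System.out.println("default: " + collector.received("default"));
        System.out.println("mapped:  " + collector.received("mapped"));
    }
}
```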
10-01-2015
10:16 PM
My question was unclear. I have a topic with 4 partitions and wanted to know how to wire up a spout to read from all partitions simultaneously. I now know that setting the spout parallelism to match the number of partitions accomplishes this automatically. I had incorrectly assumed that more configuration was required.
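One way to see why matching parallelism to partitions works is a round-robin assignment, where spout task i takes partitions i, i + numTasks, i + 2 × numTasks, and so on. The sketch below assumes that scheme; it is similar in spirit to how the storm-kafka spout splits partitions across its tasks, but `PartitionAssignmentDemo` and `partitionsForTask` are hypothetical names, not the library's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical round-robin assignment of topic partitions to spout tasks.
public class PartitionAssignmentDemo {

    // Task taskIndex takes partitions taskIndex, taskIndex + numTasks, ...
    static List<Integer> partitionsForTask(int numPartitions, int numTasks, int taskIndex) {
        if (taskIndex < 0 || taskIndex >= numTasks) {
            throw new IllegalArgumentException("taskIndex must be in [0, numTasks)");
        }
        List<Integer> assigned = new ArrayList<>();
        for (int p = taskIndex; p < numPartitions; p += numTasks) {
            assigned.add(p);
        }
        return assigned;
    }

    public static void main(String[] args) {
        int partitions = 4;
        for (int tasks : new int[] {4, 2, 6}) {
            System.out.println("spout parallelism " + tasks + ":");
            for (int t = 0; t < tasks; t++) {
                System.out.println("  task " + t + " -> partitions "
                        + partitionsForTask(partitions, tasks, t));
            }
        }
    }
}
```

Under this assumed scheme, 4 tasks get one partition each, 2 tasks get two each, and with 6 tasks the last two get nothing and sit idle, so a parallelism equal to the partition count gives each spout instance exactly one partition.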
09-30-2015
02:01 PM
1 Kudo
The cluster was initially installed with the Kafka broker on only one node. Is there a way to install and register additional brokers in Ambari after the fact?
Labels: Apache Ambari, Apache Kafka
09-30-2015
01:50 PM
Yes, to be clear, I meant multiple instances of a Kafka spout reading from the partitions of a single topic. As long as the parallelism hint for the KafkaSpout matches the number of partitions, is this handled automatically?