Member since: 11-03-2014
Posts: 46
Kudos Received: 8
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6074 | 11-01-2016 11:02 PM
 | 2061 | 01-05-2016 07:27 PM
 | 3665 | 05-27-2015 09:15 PM
 | 1683 | 04-01-2015 11:39 PM
 | 3678 | 03-17-2015 09:31 PM
10-11-2017
07:07 PM
Hi Athtsang, did you find proper instructions for changing the IP addresses of an entire Hadoop cluster managed by Cloudera Manager?
07-05-2017
08:38 AM
1 Kudo
I believe the 5.11 RPM should work ok.
04-19-2017
09:44 PM
1 Kudo
You are correct, this is a known limitation: https://issues.cloudera.org/browse/IMPALA-2108. Impala would basically need to infer the IN predicate that you are using in your workaround. You are welcome to take a stab at contributing a patch!
04-05-2017
07:28 PM
1 Kudo
The cause in my case was described in messages 4-5 of the thread. Here are some possible solutions:
- Set spark.local.dir to somewhere outside /tmp (see the sketch after this list). Refer to the Spark Configuration documentation for how to configure the value.
- Disable the periodic housekeeping of /tmp/spark-...
- Periodically restart your Spark Streaming job.
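For illustration, a minimal sketch of the first option; the directory path is just a placeholder and should point at a volume with enough space:
# in spark-defaults.conf (placeholder path):
spark.local.dir /data/spark-local
# or per job on the command line:
spark-submit --conf spark.local.dir=/data/spark-local ...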
11-01-2016
11:02 PM
Answering my own question... The source code of org.apache.hadoop.hdfs.server.blockmanagement.BlockManager says ...
if (numCurrentReplica > expectedReplication) {
if (num.replicasOnStaleNodes() > 0) {
// If any of the replicas of this block are on nodes that are
// considered "stale", then these replicas may in fact have
// already been deleted. So, we cannot safely act on the
// over-replication until a later point in time, when
// the "stale" nodes have block reported.
return MisReplicationResult.POSTPONE;
}
... So the key point is whether the DataNodes are "stale". I don't know how to force the nodes to send a block report other than restarting them, so I restarted all DataNodes and the over-replicated blocks were gone.
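For what it's worth, newer Hadoop releases (2.7 and later; availability depends on your CDH version) include a dfsadmin subcommand that asks a DataNode to send a block report without a restart. Roughly like this, with a hypothetical hostname and the default DataNode IPC port:
# request a full block report from one DataNode (hypothetical host)
hdfs dfsadmin -triggerBlockReport datanode01.example.com:50020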
10-05-2016
12:43 AM
1 Kudo
For (1), the answer right now is no. Once dead-node detection occurs, the NameNode will swiftly act to re-replicate the identified lost replicas. Something along the lines of what you need is being worked on upstream via https://issues.apache.org/jira/browse/HDFS-7877, but the work is still in progress and will only arrive in a future, as yet undetermined, CDH release. For (2), you can hunt for files with a replication factor of 1, raise them to 2, and wait for the under-replicated block count to reach 0 before you take the DN down. Changing the replication factor is done with the 'hadoop fs -setrep' command.
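As a rough sketch of (2), assuming the files live under a hypothetical /user/mydata directory (and that paths contain no spaces):
# list files recursively; the second column of the listing is the replication factor
hadoop fs -ls -R /user/mydata | awk '$2 == 1 {print $NF}'
# raise a file to replication factor 2; -w waits until the new replicas exist
hadoop fs -setrep -w 2 /user/mydata/somefile
# check the cluster-wide under-replicated block count
hdfs dfsadmin -report | grep "Under replicated"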
04-27-2016
06:22 PM
That's it. Thanks.
01-05-2016
07:27 PM
Replying to myself: I worked around this with a sink group and a null sink. The relevant settings in flume.conf:
a1.sinks = hdfssink avrosink nullsink
a1.sinkgroups = avrosinkgroup
a1.sinkgroups.avrosinkgroup.sinks = avrosink nullsink
a1.sinkgroups.avrosinkgroup.processor.type = failover
a1.sinkgroups.avrosinkgroup.processor.priority.avrosink = 100
a1.sinkgroups.avrosinkgroup.processor.priority.nullsink = 10
a1.sinks.nullsink.type = null
a1.sinks.nullsink.channel = avrochannel
a1.sinks.nullsink.batchsize = 10000
The end result is that avrochannel uses the high-priority avrosink (priority=100) under normal conditions. If that sink fails, it fails over to the low-priority nullsink, which simply discards the events.
PS: I upgraded to CDH 5.5.1, which bundles Flume 1.6. This works with the Spark Streaming "Flume-style Push-based Approach" (sink type=avro), but not with the "Pull-based Approach using a Custom Sink" (sink type=org.apache.spark.streaming.flume.sink.SparkSink). I guess the custom sink refuses to admit failure because of its fault-tolerance guarantees. Reference: http://spark.apache.org/docs/latest/streaming-flume-integration.html
06-25-2015
03:05 PM
Makes sense. I appreciate your thorough question, and I completely agree that we should point out this expression-substitution behavior in the performance guide. It's not the first time it has come up, and I'd imagine it will not be the last 🙂
Btw, if you really, really want to get the materialization behavior with an inline view without an ORDER BY, then you can apply the following terrible hack.
Original query:
select a, b, c from (select f(x) as a, f(y) as b, f(z) as c from mytable) v
Modified query to force materialization of the inline view:
select a, b, c from (select f(x) as a, f(y) as b, f(z) as c from mytable union all select NULL, NULL, NULL from mytable where false) v
The "union all" will force materialization, but the second union operand will be dropped due to the "false" predicate. Obviously, that behavior is implementation-defined and subject to change at any time, so it would be wise not to rely on it.
05-27-2015
09:17 PM
Thanks for the update!