Member since: 11-03-2014
Posts: 46
Kudos Received: 8
Solutions: 7
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 6074 | 11-01-2016 11:02 PM
 | 2061 | 01-05-2016 07:27 PM
 | 3665 | 05-27-2015 09:15 PM
 | 1683 | 04-01-2015 11:39 PM
 | 3678 | 03-17-2015 09:31 PM
10-11-2017
07:07 PM
Hi Athtsang, did you find proper instructions for changing the IP addresses of an entire Hadoop cluster managed by Cloudera Manager?
07-05-2017
08:38 AM
1 Kudo
I believe the 5.11 RPM should work ok.
04-19-2017
09:44 PM
1 Kudo
You are correct, this is a known limitation: https://issues.cloudera.org/browse/IMPALA-2108. Impala would basically need to infer the IN predicate that you are using in your workaround. You are welcome to take a stab at contributing a patch!
04-05-2017
07:28 PM
1 Kudo
The cause in my case was described in messages 4-5 of the thread. Here are some possible solutions:
- Set spark.local.dir to somewhere outside /tmp (see the sketch after this list). Refer to the Spark Configuration documentation for how to configure the value.
- Disable the periodic housekeeping of /tmp/spark-...
- Periodically restart your Spark Streaming job.
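For illustration, a minimal sketch of the first option; the directory path is just a placeholder and should point at a volume with enough space:
# in spark-defaults.conf (placeholder path):
spark.local.dir /data/spark-local
# or per job on the command line:
spark-submit --conf spark.local.dir=/data/spark-local ...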
11-01-2016
11:02 PM
Answering my own question... The source code of org.apache.hadoop.hdfs.server.blockmanagement.BlockManager says ...
if (numCurrentReplica > expectedReplication) {
if (num.replicasOnStaleNodes() > 0) {
// If any of the replicas of this block are on nodes that are
// considered "stale", then these replicas may in fact have
// already been deleted. So, we cannot safely act on the
// over-replication until a later point in time, when
// the "stale" nodes have block reported.
return MisReplicationResult.POSTPONE;
}
... So the key point is whether the DataNodes are "stale". I don't know how to force the nodes to send a block report other than restarting them, so I restarted all DataNodes and the over-replicated blocks were gone.
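For what it's worth, newer Hadoop releases (2.7 and later; availability depends on your CDH version) include a dfsadmin subcommand that asks a DataNode to send a block report without a restart. Roughly like this, with a hypothetical hostname and the default DataNode IPC port:
# request a full block report from one DataNode (hypothetical host)
hdfs dfsadmin -triggerBlockReport datanode01.example.com:50020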
10-05-2016
12:43 AM
1 Kudo
For (1), the answer right now is no. Once dead-node detection occurs, the NameNode will swiftly act to re-replicate the identified lost replicas. Something along the lines of what you need is being worked on upstream via https://issues.apache.org/jira/browse/HDFS-7877, but the work is still in progress and will only arrive in a future, as yet undetermined, CDH release. For (2), you can hunt for files with a replication factor of 1, raise them to 2, and wait for the under-replicated block count to reach 0 before you take the DN down. Changing the replication factor is done with the 'hadoop fs -setrep' command.
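As a rough sketch of (2), assuming the files live under a hypothetical /user/mydata directory (and that paths contain no spaces):
# list files recursively; the second column of the listing is the replication factor
hadoop fs -ls -R /user/mydata | awk '$2 == 1 {print $NF}'
# raise a file to replication factor 2; -w waits until the new replicas exist
hadoop fs -setrep -w 2 /user/mydata/somefile
# check the cluster-wide under-replicated block count
hdfs dfsadmin -report | grep "Under replicated"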
04-27-2016
06:22 PM
That's it. Thanks.
01-05-2016
07:27 PM
Replying to myself: I worked around this with a sink group and a null sink. The relevant settings in flume.conf:
a1.sinks = hdfssink avrosink nullsink
a1.sinkgroups = avrosinkgroup
a1.sinkgroups.avrosinkgroup.sinks = avrosink nullsink
a1.sinkgroups.avrosinkgroup.processor.type = failover
a1.sinkgroups.avrosinkgroup.processor.priority.avrosink = 100
a1.sinkgroups.avrosinkgroup.processor.priority.nullsink = 10
a1.sinks.nullsink.type = null
a1.sinks.nullsink.channel = avrochannel
a1.sinks.nullsink.batchsize = 10000
The end result is that avrochannel uses the high-priority avrosink (priority=100) under normal conditions. If that sink fails, it fails over to the low-priority nullsink, which simply discards the events.
PS: I upgraded to CDH 5.5.1, which bundles Flume 1.6. This works with the Spark Streaming "Flume-style Push-based Approach" (sink type=avro), but not with the "Pull-based Approach using a Custom Sink" (sink type=org.apache.spark.streaming.flume.sink.SparkSink). I guess the custom sink refuses to admit failure because of its fault-tolerance guarantees. Reference: http://spark.apache.org/docs/latest/streaming-flume-integration.html
06-25-2015
03:05 PM
Makes sense. I appreciate your thorough question, and I completely agree that we should point out this expression-substitution behavior in the performance guide. It's not the first time it has come up, and I'd imagine it will not be the last 🙂
Btw, if you really, really want to get the materialization behavior with an inline view without an ORDER BY, then you can apply the following terrible hack.
Original query:
select a, b, c from (select f(x) as a, f(y) as b, f(z) as c from mytable) v
Modified query to force materialization of the inline view:
select a, b, c from (select f(x) as a, f(y) as b, f(z) as c from mytable union all select NULL, NULL, NULL from mytable where false) v
The "union all" will force materialization, but the second union operand will be dropped due to the "false" predicate. Obviously, that behavior is implementation-defined and subject to change at any time, so it would be wise not to rely on it.
05-27-2015
09:17 PM
Thanks for the update!