New Contributor
Posts: 6
Registered: ‎06-23-2016
Accepted Solution

Problem with kudu flume sink

Hey,

I'm using Kudu version 0.9.9.
I'm currently testing Kudu and trying to ingest data with the kudu-flume-sink module, but it doesn't work.

The Flume agent logs show the messages below.
Hosts 192.168.80.21 through 192.168.80.23 were removed from the cluster, but the client still tries to connect to them.

What do I need to do to fix this?


No leader provided for tablet ce411790aee5412e84ca78c69552f6d6
No leader provided for tablet cbb4af8271844725abbd86e1390815dd
No leader provided for tablet 6cdad68755c84c64ba93eeee28d91248
No leader provided for tablet 6cdad68755c84c64ba93eeee28d91248
No leader provided for tablet 40b0084706d340a88e3017bf839dde30
[Peer 36db5ba67e56407a8249a7f9084e42db] Unexpected exception from downstream on [id: 0x830d23d0]
java.net.ConnectException: Connection refused: /192.168.80.22:7050
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.kududb.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.kududb.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
No leader provided for tablet 2122c8e655904bad86ffe247b50c21a1
No leader provided for tablet 6498fbaa5b64403d8b33ebfab32ee72d
[Peer b28a9662cfcd48189fc075491f106d91] Unexpected exception from downstream on [id: 0x718714b0]
java.net.ConnectException: Connection refused: /192.168.80.23:7050
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.kududb.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.kududb.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[Peer ac3c70ef86f047878153e6d4c3a61296] Unexpected exception from downstream on [id: 0x1f90039c]
java.net.ConnectException: Connection refused: /192.168.80.21:7050
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.connect(NioClientBoss.java:152)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.processSelectedKeys(NioClientBoss.java:105)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.process(NioClientBoss.java:79)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:337)
at org.kududb.client.shaded.org.jboss.netty.channel.socket.nio.NioClientBoss.run(NioClientBoss.java:42)
at org.kududb.client.shaded.org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.kududb.client.shaded.org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Cloudera Employee
Posts: 70
Registered: ‎04-08-2014

Re: Problem with kudu flume sink

Hi kspark,

Please provide your Flume configuration.

 

Thanks,

Mike

New Contributor
Posts: 6
Registered: ‎06-23-2016

Re: Problem with kudu flume sink

Hi mpercy,

Here is my Flume configuration.

# Apache Log Source
tier3.sources.rr4.type = org.apache.flume.source.kafka.KafkaSource
tier3.sources.rr4.channels = cc4
tier3.sources.rr4.zookeeperConnect = mn1.igloosecurity.co.kr:2181,mn2.igloosecurity.co.kr:2181,mn3.igloosecurity.co.kr:2181
tier3.sources.rr4.topic = ApacheJson1
tier3.sources.rr4.groupId = ApacheJson1_group2
tier3.sources.rr4.kafka.consumer.timeout.ms = 100


tier3.sinks.kk4.channel = cc4
tier3.sinks.kk4.type = org.kududb.flume.sink.KuduSink
tier3.sinks.kk4.batchSize = 2000
#tier3.sinks.kk4.type = null
tier3.sinks.kk4.tableName = apachekudu
tier3.sinks.kk4.masterAddresses = mn1.igloosecurity.co.kr:7051


# Use a channel which buffers events in memory
tier3.channels.cc4.type = memory
tier3.channels.cc4.capacity = 10000000
tier3.channels.cc4.transactionCapacity = 2000000

I also changed the getOperations method in SimpleKuduEventProducer.java:

@Override
public List<Operation> getOperations() throws FlumeException {
    try {
        JSONObject data = (JSONObject) JSONValue.parse(new String(this.payload));
        Insert insert = this.table.newInsert();
        PartialRow row = insert.getRow();

        long event_time_long = 0L;
        long s_addr = 0L;
        long d_addr = 0L;
        short status = 0;
        short evt_size = 0;

        // Fall back to the current time when the event timestamp is missing or malformed.
        try {
            event_time_long = Long.parseLong(data.get("event_time_long").toString());
        } catch (NumberFormatException|NullPointerException e) {
            LOGGER.warn(e.getMessage());
            LOGGER.warn(data.toJSONString());
            event_time_long = new Date().getTime();
        }

        // The remaining numeric fields keep their default of 0 when parsing fails.
        try {
            s_addr = Long.parseLong(data.get("s_addr").toString());
        } catch (NumberFormatException|NullPointerException e) {
            LOGGER.warn(e.getMessage());
            LOGGER.warn(data.toJSONString());
        }

        try {
            d_addr = Long.parseLong(data.get("d_addr").toString());
        } catch (NumberFormatException|NullPointerException e) {
            LOGGER.warn(e.getMessage());
            LOGGER.warn(data.toJSONString());
        }

        try {
            status = Short.parseShort(data.get("status").toString());
        } catch (NumberFormatException|NullPointerException e) {
            LOGGER.warn(e.getMessage());
            LOGGER.warn(data.toJSONString());
        }

        try {
            evt_size = Short.parseShort(data.get("evt_size").toString());
        } catch (NumberFormatException|NullPointerException e) {
            LOGGER.warn(e.getMessage());
            LOGGER.warn(data.toJSONString());
        }

        // Copy the parsed values and the string fields into the Kudu row.
        row.addLong("event_time_long", event_time_long);
        row.addLong("s_addr", s_addr);
        row.addLong("d_addr", d_addr);
        row.addString("origin", data.get("origin").toString());
        row.addString("body", data.get("body").toString());
        row.addString("s_info", data.get("s_info").toString());
        row.addString("d_info", data.get("d_info").toString());
        row.addString("host", data.get("host").toString());
        row.addString("event_time", data.get("event_time").toString());
        row.addString("ext2", data.get("ext2").toString());
        row.addShort("status", status);
        row.addShort("evt_size", evt_size);
        return Collections.singletonList((Operation) insert);
    } catch (Exception e) {
        LOGGER.warn(new String(payload));
        LOGGER.warn(e.getMessage(), e);
        throw new FlumeException("Failed to create Kudu Insert object!", e);
    }
}
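
The repeated try/catch blocks above could probably be folded into one small helper. The following is only a sketch: FieldParsing and parseLongOrDefault are hypothetical names that do not exist in kudu-flume-sink, and it assumes the parsed JSON object can be read through the java.util.Map interface.

```java
import java.util.Map;

final class FieldParsing {
    /**
     * Returns the value stored under key parsed as a long, or fallback
     * when the key is absent or the value is not a valid number.
     * Hypothetical helper, not part of any Kudu or Flume API.
     */
    static long parseLongOrDefault(Map<String, ?> data, String key, long fallback) {
        Object value = data.get(key);
        if (value == null) {
            return fallback;
        }
        try {
            return Long.parseLong(value.toString().trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration.
        Map<String, String> data = Map.of("s_addr", "3232256021", "status", "abc");
        System.out.println(parseLongOrDefault(data, "s_addr", 0L));  // valid number
        System.out.println(parseLongOrDefault(data, "status", 0L));  // malformed, falls back
        System.out.println(parseLongOrDefault(data, "d_addr", 0L));  // missing, falls back
    }
}
```

With a helper like this, each numeric field becomes a single call, and a wrong key name is easier to spot.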

 

 

Thanks,

Park

 

New Contributor
Posts: 6
Registered: ‎06-23-2016

Re: Problem with kudu flume sink

Hey,

I think the problem is on the tablet servers.

They show the logs below.

T 9ca44a1b2c074db28d6a24598e5b9dfc P 1d455c0043af4fc8a7ba01c860c6d33d [term 49381 FOLLOWER]: Advancing to term 49382

T 9ca44a1b2c074db28d6a24598e5b9dfc P 1d455c0043af4fc8a7ba01c860c6d33d [term 49382 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 14.414s

T 9ca44a1b2c074db28d6a24598e5b9dfc P 1d455c0043af4fc8a7ba01c860c6d33d [term 49382 FOLLOWER]: Starting election with config: local: false peers { permanent_uuid: "36db5ba67e56407a8249a7f9084e42db" member_type: VOTER last_known_addr { host: "sn5.igloosecurity.co.kr" port: 7050 } } peers { permanent_uuid: "1d455c0043af4fc8a7ba01c860c6d33d" member_type: VOTER last_known_addr { host: "sn3.igloosecurity.co.kr" port: 7050 } }

T 9ca44a1b2c074db28d6a24598e5b9dfc P 1d455c0043af4fc8a7ba01c860c6d33d [CANDIDATE]: Term 49382 election: Requesting vote from peer 36db5ba67e56407a8249a7f9084e42db

T 9ca44a1b2c074db28d6a24598e5b9dfc P 1d455c0043af4fc8a7ba01c860c6d33d [CANDIDATE]: Term 49382 election: RPC error from VoteRequest() call to peer 36db5ba67e56407a8249a7f9084e42db: Network error: Client connection negotiation failed: client connection to 192.168.80.22:7050: connect: Connection refused (error 111)

T 9ca44a1b2c074db28d6a24598e5b9dfc P 1d455c0043af4fc8a7ba01c860c6d33d [CANDIDATE]: Term 49382 election: Election decided. Result: candidate lost.

T 9ca44a1b2c074db28d6a24598e5b9dfc P 1d455c0043af4fc8a7ba01c860c6d33d [term 49382 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 8.725s

T 9ca44a1b2c074db28d6a24598e5b9dfc P 1d455c0043af4fc8a7ba01c860c6d33d [term 49382 FOLLOWER]: Leader election lost for term 49382. Reason: None given

T e1ae3c1935ef4d0096e1d8e9df5e26c6 P 1d455c0043af4fc8a7ba01c860c6d33d [term 42720 FOLLOWER]: No leader contacted us within the election timeout. Triggering leader election

T e1ae3c1935ef4d0096e1d8e9df5e26c6 P 1d455c0043af4fc8a7ba01c860c6d33d [term 42720 FOLLOWER]: Advancing to term 42721

T e1ae3c1935ef4d0096e1d8e9df5e26c6 P 1d455c0043af4fc8a7ba01c860c6d33d [term 42721 FOLLOWER]: Snoozing failure detection for election timeout plus an additional 18.405s

T e1ae3c1935ef4d0096e1d8e9df5e26c6 P 1d455c0043af4fc8a7ba01c860c6d33d [term 42721 FOLLOWER]: Starting election with config: opid_index: 7452 local: false peers { permanent_uuid: "36db5ba67e56407a8249a7f9084e42db" member_type: VOTER last_known_addr { host: "sn5.igloosecurity.co.kr" port: 7050 } } peers { permanent_uuid: "1d455c0043af4fc8a7ba01c860c6d33d" member_type: VOTER last_known_addr { host: "sn3.igloosecurity.co.kr" port: 7050 } } peers { permanent_uuid: "b28a9662cfcd48189fc075491f106d91" member_type: VOTER last_known_addr { host: "sn6.igloosecurity.co.kr" port: 7050 } }

T e1ae3c1935ef4d0096e1d8e9df5e26c6 P 1d455c0043af4fc8a7ba01c860c6d33d [CANDIDATE]: Term 42721 election: Requesting vote from peer 36db5ba67e56407a8249a7f9084e42db

T e1ae3c1935ef4d0096e1d8e9df5e26c6 P 1d455c0043af4fc8a7ba01c860c6d33d [CANDIDATE]: Term 42721 election: Requesting vote from peer b28a9662cfcd48189fc075491f106d91

T e1ae3c1935ef4d0096e1d8e9df5e26c6 P 1d455c0043af4fc8a7ba01c860c6d33d [CANDIDATE]: Term 42721 election: RPC error from VoteRequest() call to peer 36db5ba67e56407a8249a7f9084e42db: Network error: Client connection negotiation failed: client connection to 192.168.80.22:7050: connect: Connection refused (error 111)

 

 

 

They keep trying to elect a leader.

I think the tablet servers are so busy trying to elect a leader that they cannot store any data.

Please take a look at these logs.

Cloudera Employee
Posts: 70
Registered: ‎04-08-2014

Re: Problem with kudu flume sink

Hi Park,

Sorry, I went out of town for a while and then forgot to check back here.

 

You are right that the problem is not with Flume. The problem is with your Kudu tablet servers.

 

It appears that you have only defined a replication factor of 2, and one of the replicas is down. Kudu cannot function in that way. You need a strict majority online to do anything. That means that if you have 3 replicas, at least 2 must be online to make progress. If you have 2 replicas, you still need 2 replicas online to make progress.
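
As a rough illustration of the strict-majority rule described above, here is a minimal sketch of the arithmetic. QuorumMath and its method names are made up for this example and are not part of the Kudu client.

```java
final class QuorumMath {
    /** Smallest number of replicas that forms a strict majority. */
    static int majority(int replicas) {
        if (replicas < 1) {
            throw new IllegalArgumentException("replicas must be >= 1");
        }
        return replicas / 2 + 1;
    }

    /** Number of replicas that can be offline while a majority remains. */
    static int tolerableFailures(int replicas) {
        return replicas - majority(replicas);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 5; n++) {
            System.out.printf("replicas=%d majority=%d tolerable failures=%d%n",
                    n, majority(n), tolerableFailures(n));
        }
    }
}
```

With 2 replicas the majority is 2, so no replica can be lost. With 3 replicas the majority is still 2, so one replica can be lost.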

 

Hope that helps, and apologies for the delay.

 

Mike

New Contributor
Posts: 6
Registered: ‎06-23-2016

Re: Problem with kudu flume sink

Hey Mike,

Thanks for the reply.

I'll look into the replica issue.

Thanks,

Park

New Contributor
Posts: 6
Registered: ‎06-23-2016

Re: Problem with kudu flume sink

[ Edited ]

Hey Mike,

Sorry for the late update.

I solved the problem by dropping the tables that were created when the cluster had 6 tablet servers. (The cluster now has 3 tablet servers.)

I think each table keeps metadata about its tablets, which is why the client kept trying to connect to the nodes that had been removed.

So I'm curious whether I need to change some of a table's metadata when the number of nodes changes.

Can you explain this?

Thanks!

Cloudera Employee
Posts: 70
Registered: ‎04-08-2014

Re: Problem with kudu flume sink

Hi Park, you cannot simply remove half of the nodes and expect Kudu to keep running. It's likely that you will lose data. Imagine you have the following scenario:

 

2 tablets (I, II), 6 tablet servers (A, B, C, D, E, F)

 

Imagine also you have 3 replicas each. Imagine that, by luck, you have the following replicas hosted on tablet servers:

 

tablet I (A, B, C) and tablet II (D, E, F)

 

Now, take tablet servers D, E, F offline. There are no copies of tablet II and Kudu will not be able to operate. That table cannot recover.

 

Kudu is even more strict than that. It can only operate if a majority of the replicas remain online. One replica is not enough to recover if the replication factor is higher than 1.
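
The scenario above can be sketched in a few lines. This is a toy model only: ReplicaAvailability and hasMajority are hypothetical names, and the code does not talk to a Kudu cluster. It just counts which replicas of each tablet are still online.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReplicaAvailability {
    /** True when a strict majority of the tablet's replica hosts is still online. */
    static boolean hasMajority(List<String> replicaHosts, Set<String> offline) {
        long online = replicaHosts.stream()
                .filter(host -> !offline.contains(host))
                .count();
        return online >= replicaHosts.size() / 2 + 1;
    }

    public static void main(String[] args) {
        // Placement from the example: tablet I on A, B, C and tablet II on D, E, F.
        Map<String, List<String>> placement = new LinkedHashMap<>();
        placement.put("I", List.of("A", "B", "C"));
        placement.put("II", List.of("D", "E", "F"));

        // Tablet servers D, E and F are taken offline at the same time.
        Set<String> offline = Set.of("D", "E", "F");

        placement.forEach((tablet, hosts) ->
                System.out.println("tablet " + tablet + " has majority: "
                        + hasMajority(hosts, offline)));
    }
}
```

In this model tablet I keeps all three replicas, while tablet II has none left and cannot make progress.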

 

We still need to create safe decommission tools for Kudu. Right now, you will have to shut down one machine every 5 minutes (by default) to get the effect of removing nodes from the cluster permanently in a safe manner. To work around this, you could set --follower_unavailable_considered_failed_sec on the master to a shorter value, say 2 or 3 minutes or something, to speed up the process: http://kudu.apache.org/docs/configuration_reference.html#kudu-master_follower_unavailable_considered...

 

Hope this helps,

Mike

New Contributor
Posts: 6
Registered: ‎06-23-2016

Re: Problem with kudu flume sink

Hey Mike,
Thanks for the fast reply.