Index topology stats failed to ACK

Contributor

Hi,

I had a working Metron cluster, but I'm not sure what change caused the indexing topology to start failing acks. In the indexing topology stats, about 90% of tuples failed and only about 10% were acked. However, when I look at indexingBolt, it shows 0 failed. I checked the Storm logs on all my servers, and it seems that the connections between nodes are not being established. Why is this?

o.a.s.m.n.Client [ERROR] connection attempt 5 to Netty-Client-hadoop-master/<ip>:6705 failed: java.net.ConnectException: Connection refused: hadoop-master/<ip>:6705

o.a.s.m.n.Client [ERROR] connection attempt 5 to Netty-Client-hadoop-master/<ip>:6702 failed: java.net.ConnectException: Connection refused: hadoop-master/<ip>:6702

I tried restarting the cluster servers, but that hasn't helped yet. Some documents still make it to ES, but not enough.

Any feedback is greatly appreciated.

Re: Index topology stats failed to ACK

Super Collaborator

Hi Arian,

A number of things you could check:

1. Have you configured more than 0 ackers for your topology (via Ambari > Metron > Indexing)?

2. Indexing has two outputs (HDFS and ES), and both channels have to succeed before a tuple is acked. Because they are mutually dependent, set the HDFS indexing batch size to a small number (maybe 5, or even 1 to disable batching) and the HDFS file rotation policy to a small size as well (100 KB?).

3. To simplify the setup, you can also disable HDFS indexing with the following config:

{
  "elasticsearch": {
    "index": "foo",
    "enabled": true
  },
  "hdfs": {
    "index": "foo",
    "batchSize": 100,
    "batchTimeout": 0,
    "enabled": false
  }
}

This way, acking depends only on the ES writer.
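
To make the ack dependency in points 2 and 3 concrete, here is a simplified Python model. It is not Metron's actual writer code: it assumes a tuple is acked only when every enabled writer reports success, and it reuses the sample config from this post ("foo" is just a placeholder index name). The writer functions are hypothetical stand-ins.

```python
import json
from typing import Callable, Dict

# Sample indexing config from this post: ES enabled, HDFS disabled.
CONFIG = json.loads("""
{
  "elasticsearch": {"index": "foo", "enabled": true},
  "hdfs": {"index": "foo", "batchSize": 100, "batchTimeout": 0, "enabled": false}
}
""")

# Hypothetical writer type: returns True if the message was written successfully.
Writer = Callable[[dict], bool]


def ack_or_fail(message: dict, writers: Dict[str, Writer], config: dict) -> str:
    """Simplified model: 'ack' only if every enabled writer succeeds."""
    for name, write in writers.items():
        if not config.get(name, {}).get("enabled", True):
            continue  # a disabled writer no longer takes part in the ack decision
        if not write(message):
            return "fail"  # one failing writer fails the whole tuple
    return "ack"


# Hypothetical writers: ES succeeds, HDFS keeps failing.
writers = {"elasticsearch": lambda m: True, "hdfs": lambda m: False}

both_enabled = {"elasticsearch": {"enabled": True}, "hdfs": {"enabled": True}}
print(ack_or_fail({"msg": 1}, writers, both_enabled))  # fail
print(ack_or_fail({"msg": 1}, writers, CONFIG))        # ack
```

With both writers enabled, the failing HDFS writer fails every tuple even though ES succeeds; with "enabled": false on HDFS, only the ES result matters.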

Re: Index topology stats failed to ACK

Contributor

Thank you @Jasper

1. I have 16 ackers, 8 workers, 12 kafkaSpout, 12 indexingBolt, and 12 hdfsIndexingBolt.

2. I have now disabled the HDFS channel and am focusing only on ES. I noticed there are three places where the indexing topology settings can be modified: flux/indexing/remote.yaml, config/elasticsearch.properties, and Ambari > Metron > Indexing. I assume Ambari takes precedence and overwrites the other two? However, I still need to look up the configuration names in Ambari.

3. I modified a few of the data types, such as bro and yaf, to disable HDFS indexing. I assume that once it is disabled, I don't need to set batchSize or batchTimeout.
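
A quick back-of-the-envelope check on the numbers in point 1, assuming the 12s are executor counts (parallelism hints): 16 ackers plus 12 executors each for kafkaSpout, indexingBolt and hdfsIndexingBolt gives 52 executors spread over 8 workers. The sketch below assumes an even round-robin spread (roughly what Storm's default scheduler aims for) and leaves out indexingErrorBolt, whose parallelism isn't given in this thread.

```python
def executors_per_worker(executors: dict, num_workers: int) -> list:
    """Spread executors as evenly as possible over workers (round-robin assumption)."""
    total = sum(executors.values())
    base, extra = divmod(total, num_workers)
    # The first `extra` workers get one executor more than the others.
    return [base + 1 if w < extra else base for w in range(num_workers)]


# Counts from this post; indexingErrorBolt is left out because its count is unknown.
counts = {"acker": 16, "kafkaSpout": 12, "indexingBolt": 12, "hdfsIndexingBolt": 12}
print(sum(counts.values()))             # 52
print(executors_per_worker(counts, 8))  # [7, 7, 7, 7, 6, 6, 6, 6]
```

Under that assumption each worker JVM hosts six or seven executors, so a single worker that keeps crashing takes roughly an eighth of the topology's executors down with it.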

My indexing topology got worse: now everything fails and there are 0 acks. I turned on DEBUG mode and hopefully I'll understand the issue better.

I'll be back with more questions. As always, thank you for your help!

Re: Index topology stats failed to ACK

Contributor

I'm struggling with the indexing topology. On all of my servers, I keep seeing "connection refused" errors for this topology. It looks like the topology receives the data from Kafka, but fails to ack or do anything else with it. Why?

o.a.k.c.p.ProducerConfig [WARN] The configuration request.required.acks = 1 was supplied but isn't a known config.
o.a.k.c.u.AppInfoParser [INFO] Kafka version : 0.10.0.2.5.0.0-1245
o.a.k.c.u.AppInfoParser [INFO] Kafka commitId : dae559f56f07e2cd
o.a.s.d.executor [INFO] Prepared bolt indexingErrorBolt:(42)
o.a.s.m.n.Client [ERROR] connection attempt 9 to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/<ip>:6706
o.a.s.m.n.Client [ERROR] connection attempt 10 to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/<ip>:6706
o.e.plugins [INFO] [Plasma] modules [], plugins [], sites []
o.a.s.m.n.Client [ERROR] connection attempt 11 to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/<ip>:6706
o.a.s.m.n.Client [ERROR] connection attempt 12 to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/<ip>:6706
o.a.s.m.n.Client [ERROR] connection attempt 13 to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/<ip>:6706
o.a.s.m.n.Client [ERROR] connection attempt 14 to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/<ip>:6706
o.a.s.m.n.Client [ERROR] connection attempt 15 to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/<ip>:6706

Re: Index topology stats failed to ACK

Super Collaborator

@Arian Trayen

The o.a.s.m.n.Client [ERROR] message seems to be benign (see https://issues.apache.org/jira/browse/STORM-1382); it just indicates that the workers cannot communicate with each other. The real problem may be that the workers keep crashing for some other reason.

The message actually points to a higher-level problem in Storm. I suggest you look at supervisor.log first. The supervisor may keep spawning new instances of the indexing workers because those workers keep shutting down for some reason; it is the supervisor's job to keep retrying.
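
One way to narrow this down is to check which worker ports keep refusing connections, since each port (6700, 6701, ...) is a worker slot on that host. Below is a small Python sketch that counts failed Netty connection attempts per host and port. It assumes the log line format posted in this thread, and the sample lines are based on those posts; the function name is just illustrative.

```python
import re
from collections import Counter

# Matches e.g. "... to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException ..."
# The IP part may be redacted or empty, so it is matched loosely.
REFUSED = re.compile(r"Netty-Client-([^/\s]+)/[^:\s]*:(\d+) failed: java\.net\.ConnectException")


def count_refused(lines):
    """Count failed connection attempts per (host, port) worker slot."""
    counts = Counter()
    for line in lines:
        match = REFUSED.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts


sample = [
    "o.a.s.m.n.Client [ERROR] connection attempt 5 to Netty-Client-hadoop-master/<ip>:6705 failed: java.net.ConnectException: Connection refused: hadoop-master/<ip>:6705",
    "o.a.s.m.n.Client [ERROR] connection attempt 9 to Netty-Client-hadoop-slave-1/<ip>:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/<ip>:6706",
    "o.a.s.m.n.Client [ERROR] connection attempt 12 to Netty-Client-hadoop-slave-1/:6706 failed: java.net.ConnectException: Connection refused: hadoop-slave-1/:6706",
    "o.e.plugins [INFO] [Plasma] modules [], plugins [], sites []",
]
for (host, port), n in count_refused(sample).most_common():
    print(f"{host}:{port} refused {n} time(s)")
```

Running it over the worker logs from every node would show whether the refusals cluster on a few ports, which points at the specific worker logs to read alongside supervisor.log.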

Re: Index topology stats failed to ACK

Contributor

@Jasper

Thank you for your response. I looked into supervisor.log and nimbus.log.

The odd thing is that only the indexing topology has this issue (see the attached screenshot, screen-shot-2017-12-07-at-95232-am.png).

I used to be able to check the Kafka consumer group "indexing" as you suggested. Then I increased the number of partitions, because I thought it would help process the backlog indicated by the LAG. After increasing the partitions, I somehow lost my "indexing" consumer group.
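
For context on the partition change: consumers in the same Kafka group split the topic's partitions between them, and each kafkaSpout task runs its own consumer, so the number of spout executors (12 in this setup) caps how many partitions are read in parallel. The sketch below is a simplified model of Kafka's default range assignment for a single topic; the partition counts and consumer names are hypothetical.

```python
def range_assign(num_partitions: int, consumers: list) -> dict:
    """Simplified model of Kafka's range assignor for one topic."""
    per_consumer, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(sorted(consumers)):
        # The first `extra` consumers each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + count))
        start += count
    return assignment


# Hypothetical: 12 spout consumers, before and after adding partitions.
spouts = [f"spout-{i:02d}" for i in range(12)]
for partitions in (8, 24):
    assigned = range_assign(partitions, spouts)
    idle = sum(1 for parts in assigned.values() if not parts)
    busiest = max(len(parts) for parts in assigned.values())
    print(partitions, "partitions:", idle, "idle spouts, max", busiest, "per spout")
```

With fewer partitions than spouts, some spouts sit idle; with more, each spout just reads more partitions. The "(Re-)joining group" and "Revoking previously assigned partitions" log lines in this thread are rebalances, where an assignment like this is recomputed.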

The supervisor log repeats this message over and over. What does it mean? I assume the ID refers to a host/worker?

o.a.s.d.supervisor [INFO] 17112828-b2a1-4db1-8a7e-880a618477ce still hasn't started
o.a.s.d.supervisor [INFO] 63c44874-7c97-4c75-905f-f8e845625700 still hasn't started

The nimbus log has messages like:

o.a.s.d.nimbus [INFO] Executor indexing-3-1512599375:[32 32] not alive
o.a.s.d.nimbus [INFO] Executor indexing-3-1512599375:[64 64] not alive
o.a.s.d.nimbus [INFO] Executor indexing-3-1512599375:[56 56] not alive
o.a.s.d.nimbus [INFO] Executor indexing-3-1512599375:[24 24] not alive
o.a.s.d.nimbus [INFO] Executor indexing-3-1512599375:[40 40] not alive
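
The nimbus lines identify each executor by topology ID plus a task range [start end]. As far as I know, the UUIDs in the supervisor's "still hasn't started" lines are worker IDs, i.e. worker processes the supervisor launched that never came up. A small Python sketch that collects the executors nimbus reports as not alive, per topology, makes it easier to see how much of the indexing topology is down. It assumes exactly the line format shown above (stray "<br>" separators are tolerated), and the function name is illustrative.

```python
import re
from collections import defaultdict

# Matches e.g. "Executor indexing-3-1512599375:[32 32] not alive"
NOT_ALIVE = re.compile(r"Executor ([^:\s]+):\[(\d+) (\d+)\] not alive")


def dead_executors(log_text: str) -> dict:
    """Map topology ID -> sorted list of (start_task, end_task) reported not alive."""
    found = defaultdict(set)
    for topology, start, end in NOT_ALIVE.findall(log_text):
        found[topology].add((int(start), int(end)))
    return {topo: sorted(ranges) for topo, ranges in found.items()}


sample = (
    "o.a.s.d.nimbus [INFO] Executor indexing-3-1512599375:[32 32] not alive<br>"
    "o.a.s.d.nimbus [INFO] Executor indexing-3-1512599375:[64 64] not alive<br>"
    "o.a.s.d.nimbus [INFO] Executor indexing-3-1512599375:[24 24] not alive"
)
print(dead_executors(sample))
# {'indexing-3-1512599375': [(24, 24), (32, 32), (64, 64)]}
```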

In my indexing logs, I get a lot of messages like "(Re-)joining group indexing", but they are not errors. Also, as my screenshot of the indexing topology shows, kafkaSpout has a large number of FAILED tuples, but none of my Storm bolts have any FAILED. I tried to read up on Storm topology stats, but it's still not very clear to me. There is a gap between these numbers, and I don't know where those tuples went.
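
One likely explanation for the gap: in Storm, a spout tuple whose tuple tree is not fully acked within topology.message.timeout.secs (30 seconds by default) is failed at the spout, but no bolt calls fail() for it, so the bolt FAILED counters stay at 0. The toy simulation below uses hypothetical latencies (not real stats) to show how timeouts produce spout failures with zero bolt failures.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    spout_acked: int = 0
    spout_failed: int = 0
    bolt_failed: int = 0


def simulate(ack_latencies, timeout_secs=30.0, bolt_errors=()):
    """Toy model of Storm's ack tracking.

    ack_latencies[i] is how long tuple i takes to be fully acked downstream
    (None means it is never acked). Indices in bolt_errors are tuples that
    a bolt explicitly fails.
    """
    stats = Stats()
    for i, latency in enumerate(ack_latencies):
        if i in bolt_errors:
            stats.bolt_failed += 1   # explicit fail() in a bolt...
            stats.spout_failed += 1  # ...is also reported back to the spout
        elif latency is None or latency > timeout_secs:
            stats.spout_failed += 1  # timeout: counted as failed at the spout only
        else:
            stats.spout_acked += 1
    return stats


# Hypothetical: a slow or stuck writer pushes most tuples past the timeout.
print(simulate([2.0, 45.0, None, 60.0, 5.0]))
# Stats(spout_acked=2, spout_failed=3, bolt_failed=0)
```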

https://stackoverflow.com/questions/38891740/kafka-consumer-stuck-in-re-joining-group

o.a.s.k.s.KafkaSpout [INFO] Initialization complete
o.a.s.k.s.KafkaSpout [INFO] Initialization complete
o.a.k.c.c.i.ConsumerCoordinator [INFO] Revoking previously assigned partitions [indexing-7, indexing-6] for group indexing
o.a.s.k.s.KafkaSpout [INFO] Partitions revoked. [consumer-group=indexing, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@26e55ff, topic-partitions=[indexing-7, indexing-6]]
o.a.k.c.c.i.AbstractCoordinator [INFO] (Re-)joining group indexing
o.a.k.c.c.i.ConsumerCoordinator [INFO] Revoking previously assigned partitions [indexing-10] for group indexing
o.a.s.k.s.KafkaSpout [INFO] Partitions revoked. [consumer-group=indexing, consumer=org.apache.kafka.clients.consumer.KafkaConsumer@63e19a8d, topic-partitions=[indexing-10]]

As always, thank you for your help and insights.

Re: Index topology stats failed to ACK

Super Collaborator

@Arian Trayen

Based on the screenshot you provided, I can tell that HDFS indexing is not disabled. In fact, it is throwing an error that is just off the screen.

Can you post that error?

I suggest you restart the indexing topology with the first poll offset strategy set to 'EARLIEST' to force the topology to start from the very beginning of your Kafka input topic. You can do that from Ambari. To answer your earlier questions: you don't want to change anything in the flux remote.yaml file. The config/elasticsearch.properties file is rewritten from the Ambari settings, so yes, Ambari takes precedence for all of these settings.
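
For reference, here is a rough Python model of the first-poll offset strategies, based on my understanding of the FirstPollOffsetStrategy values in storm-kafka-client (EARLIEST, LATEST, UNCOMMITTED_EARLIEST, UNCOMMITTED_LATEST). It only computes where one partition would start being read; the offsets are hypothetical, and the real spout logic has more cases (for example, the strategy only governs the first poll after the topology is deployed).

```python
from typing import Optional


def first_poll_offset(strategy: str, committed: Optional[int],
                      log_start: int, log_end: int) -> int:
    """Rough model of where the first poll starts for one partition.

    committed: the consumer group's committed offset (None if nothing was committed).
    log_start / log_end: earliest and latest offsets still available in Kafka.
    """
    if strategy == "EARLIEST":
        return log_start  # ignore commits, replay everything still retained
    if strategy == "LATEST":
        return log_end    # ignore commits, read only new messages
    if strategy == "UNCOMMITTED_EARLIEST":
        return committed if committed is not None else log_start
    if strategy == "UNCOMMITTED_LATEST":
        return committed if committed is not None else log_end
    raise ValueError(f"unknown strategy: {strategy}")


# Hypothetical partition: offsets 1000..50000 retained, group committed at 42000.
for s in ("EARLIEST", "LATEST", "UNCOMMITTED_EARLIEST", "UNCOMMITTED_LATEST"):
    print(s, first_poll_offset(s, committed=42000, log_start=1000, log_end=50000))
```

In this model, if the consumer group has no committed offsets (for example, after the "indexing" group disappeared), the UNCOMMITTED_* variants fall back to the earliest or latest offset.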

Re: Index topology stats failed to ACK

@Arian Trayen

Hi Arian,

I am facing a somewhat similar issue. The hdfsIndexingBolt is not acking. Because of this, the number of pending uncommitted tuples in the Kafka spout exceeds 10000000 and the topology hangs.

Can you please tell me how to increase the number of executors for each bolt?

Thanks,

Bharath
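
A possible mechanism for the hang, sketched as a toy Python model: my understanding of storm-kafka-client is that the spout can only commit offsets up to the first one that has not been acked, and it stops polling once the number of emitted-but-uncommitted offsets reaches its maxUncommittedOffsets limit (which appears to match the 10000000 figure above). So a bolt that never acks eventually stalls the whole spout. The function name, limit and offsets below are hypothetical and scaled down.

```python
def run_spout(acked: set, max_uncommitted: int, total: int):
    """Toy model of a Kafka spout emitting offsets 0..total-1.

    acked: offsets the downstream bolts eventually ack (others never are).
    Returns (committed_up_to, emitted, stalled).
    """
    committed = 0  # next offset to commit; everything below it is committed
    emitted = 0
    while emitted < total:
        # Stop polling once too many emitted offsets are still uncommitted.
        if emitted - committed >= max_uncommitted:
            return committed, emitted, True
        emitted += 1
        # Commits can only advance over a contiguous run of acked offsets.
        while committed < emitted and committed in acked:
            committed += 1
    return committed, emitted, False


# Hypothetical: every offset except 5 is acked (one tuple stuck in a bolt).
print(run_spout(acked=set(range(100)) - {5}, max_uncommitted=10, total=100))
# (5, 15, True)
print(run_spout(acked=set(range(100)), max_uncommitted=10, total=100))
# (100, 100, False)
```

In this model, adding executors does not clear the stall by itself; it only clears once the stuck offsets are acked, so the non-acking bolt is the part to look at.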

Re: Index topology stats failed to ACK

Contributor

@Bharath Phatak

Hi Bharath,

Sorry I didn't see your question earlier. To change the Storm topology settings, you can modify two files:

$METRON_HOME/config/<topology>.properties (where <topology> is, e.g., enrichment, or elasticsearch for indexing)

$METRON_HOME/flux/indexing/remote.yaml

With the latest Metron release, 0.4.1, you can easily modify these settings from Ambari. For version 0.4.0, you can modify the indexing topology settings from Ambari -> Metron -> Config -> Advanced tab.

Good luck!
