
Atlas falling behind the Kafka workload

Contributor

We are having a problem with lag building up on the ATLAS_HOOK Kafka topic. I can understand why, as we ingest a large number of tables into the cluster every day. We create roughly 165,000 entries in the ATLAS_HOOK topic per day, primarily from Sqoop jobs and create/drop table operations in Hive. The problem is that Atlas only processes around 35,000-40,000 entries per day, so the backlog keeps growing.
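For anyone checking the same thing, the lag can be inspected with the standard Kafka tooling, roughly like this (the consumer group name "atlas" and port 6667 are assumptions based on a default HDP install; older Kafka versions may also need the --new-consumer flag):

/usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh \
  --bootstrap-server <broker-host>:6667 \
  --describe --group atlas

The LAG column per partition of ATLAS_HOOK shows how many messages are still waiting to be processed.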

Many of the tables we import are quite wide, so it is pretty common for the messages in the Kafka topic to be between 600 and 800 KB each.

I have verified that I can consume the messages in the topic with a normal Kafka client, so it is not a problem with Kafka. I have also cleared the two HBase tables and the Kafka topic just to start over from the beginning, but the problem remains.
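For example, something along these lines reads the topic without any issues (broker host/port are placeholders; on older Kafka versions you may need --zookeeper instead of --bootstrap-server):

/usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
  --bootstrap-server <broker-host>:6667 \
  --topic ATLAS_HOOK --from-beginning --max-messages 5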

I would like to get some help with what kind of performance tuning I can do to make sure that Atlas can consume at least 200,000 entries from the ATLAS_HOOK topic per day (we are planning to add a lot more data sources over the next couple of months). What options do I have to make this happen?

The HDP version we are running is 2.6.3

//Berry

2 REPLIES

Expert Contributor

Can you please try setting this in atlas-application.properties (via Ambari):

atlas.use.index.query.to.find.entity.by.unique.attributes=true

This has shown a significant improvement in one of our environments. I would suggest trying this in a pre-production environment first and verifying the results before updating production.

Contributor

Thanks for the suggestion, but that actually made the processing a lot slower. I have performance monitoring enabled, so I can see that each ENTITY_FULL_UPDATE takes between 7 and 8 seconds. With the suggested parameter set, they were running for over 30 seconds each.
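For reference, those timings come from the Atlas performance log; as far as I understand it is controlled by a log4j logger named org.apache.atlas.perf in atlas-log4j (the snippet below is only a sketch of what that configuration looks like, and in some versions it ships commented out):

<appender name="perf_appender" class="org.apache.log4j.DailyRollingFileAppender">
  <param name="file" value="${atlas.log.dir}/atlas_perf.log"/>
  <param name="datePattern" value="'.'yyyy-MM-dd"/>
  <param name="append" value="true"/>
  <layout class="org.apache.log4j.PatternLayout">
    <param name="ConversionPattern" value="%d|%t|%m%n"/>
  </layout>
</appender>

<logger name="org.apache.atlas.perf" additivity="false">
  <level value="debug"/>
  <appender-ref ref="perf_appender"/>
</logger>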

After debugging what happens during the ingestion of data into Atlas, it seems that the calls to HBase are the ones that take time. After a quick look in HBase, I see that the atlas_titan table is running with just one region. According to the documentation, atlas.graph.storage.hbase.region-count or atlas.graph.storage.hbase.regions-per-server should change the number of regions the table is created with. So I tried setting those, dropped the HBase table and started everything again, but the table is still created with just one region. Does anybody have any info on how to create the initial atlas_titan HBase table with more than one region?
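In case it helps anyone else hitting this: one workaround I am considering (an assumption on my part, not something verified in this thread) is to split the existing table manually from the HBase shell instead of relying on the creation-time properties:

hbase shell
# splits every region of the table at its midpoint; repeat to get more regions
split 'atlas_titan'
# make sure the balancer is enabled so the new regions get spread across region servers
balance_switch true

That only redistributes the existing data, though, so the question about getting the table pre-split at creation time still stands.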