Ingesting data from large volumes of dynamically created tables


As part of a current ingestion project, I'm looking for suggestions on tool selection for a specific case...

We have an enterprise system that performs analytical runs triggered by users. For each run, the system can create between 100 and 3,000 individual tables in MS SQL Server. These tables have low volumes (50-500 records each), but are generated fairly rapidly. They do not necessarily have a common structure, though they do share many common fields.

Tables for a specific run are named with a common prefix.

A run status table lists these prefixes, with a run completion datetime.

We're looking for a solution which will poll the run status table for run completion, then ingest from all tables which have names matching the run prefix.
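To make the requirement concrete, here is a minimal sketch of that poll-then-ingest loop. All table and column names (run_status, run_prefix, completed_at) are assumptions, not the actual schema, and sqlite3 stands in for SQL Server so the sketch stays self-contained; with pyodbc against SQL Server you would query INFORMATION_SCHEMA.TABLES instead of sqlite_master.

```python
import sqlite3


def completed_runs(conn, already_ingested):
    """Return run prefixes whose completion datetime is set
    and which have not been ingested yet."""
    rows = conn.execute(
        "SELECT run_prefix FROM run_status WHERE completed_at IS NOT NULL"
    ).fetchall()
    return [r[0] for r in rows if r[0] not in already_ingested]


def tables_for_run(conn, prefix):
    """Find all tables created for one run by name prefix.
    (On SQL Server this would be:
     SELECT TABLE_NAME FROM INFORMATION_SCHEMA.TABLES
     WHERE TABLE_NAME LIKE @prefix + '%')"""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' AND name LIKE ?",
        (prefix + "%",),
    ).fetchall()
    return [r[0] for r in rows if r[0] != "run_status"]


def ingest_run(conn, prefix):
    """Pull every row from every table belonging to the run.
    A real pipeline would hand the rows to the downstream sink
    (Kafka, HDFS, etc.) instead of collecting them in memory."""
    ingested = {}
    for table in tables_for_run(conn, prefix):
        ingested[table] = conn.execute(f"SELECT * FROM {table}").fetchall()
    return ingested
```

In practice this would run on a schedule, carrying the set of already-ingested prefixes between polls so each completed run is picked up exactly once.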

I'm thinking this is likely to be a Kafka-based solution, but I haven't been able to find examples of ingesting from a dynamically changing list of input tables.

Any suggestions on where to start investigating would be much appreciated!
