Support Questions

alexmc6 · ‎07-18-2018

I am trying to get Hive replication working and am not yet fully sure I understand the options.

I can see that if you specify an individual database, you can either leave the table specification blank or specify a regular expression for the tables. I found that "*" was not an acceptable regular expression, so I wonder what rules they are using for that.

However, I really need wildcards to specify databases as well as tables. Is this possible?

For instance, imagine that I have 100 databases called

area1_something_db

and another 100 each called

area2_something_db

area3_something_db

area4_something_db

area5_something_db

My choices right now are to replicate all of them at once, or to replicate them one database at a time. The latter is a nightmare due to the large number of databases. Ideally I want a replication job that covers one specific area, which I can schedule according to some business decision.

Am I right in thinking that I cannot have multiple Hive replications going on at the same time even if they are totally different databases?

bgooley · ‎07-18-2018

@alexmc6

"*" is not a valid regex. ".*" may be what you were going for...
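To illustrate why the bare "*" is rejected: in regex syntax "*" is a quantifier, so it needs something in front of it to repeat, and "." (any character) is the usual choice. Here is a minimal sketch using Python's `re` module as a stand-in. It assumes the replication UI uses a conventional regex engine that treats these two patterns the same way; `is_valid_regex` is a hypothetical helper, not part of any Cloudera tool.

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """Return True if the pattern compiles, False if the engine rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # A bare "*" fails here: there is nothing before it to repeat.
        return False


print(is_valid_regex("*"))   # False
print(is_valid_regex(".*"))  # True
```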

I am not quite clear on your business requirement, but I think you are saying that you want to maybe create 10 replication schedules that will replicate chunks of 10 of your area databases... akin to this:

area([0-9]|1[0])_.*db
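If it helps, a pattern like this can be checked offline against a list of database names before you put it into a schedule. The sketch below again uses Python's `re` module; the database names are made up, `select_databases` is a hypothetical helper, and it assumes the pattern must match the whole database name, which may not be how the replication tool applies it.

```python
import re


def select_databases(pattern: str, names: list[str]) -> list[str]:
    """Return the names that the pattern matches in full."""
    compiled = re.compile(pattern)
    return [name for name in names if compiled.fullmatch(name)]


# Hypothetical database names, for illustration only.
databases = [
    "area1_sales_db",
    "area2_sales_db",
    "area10_hr_db",
    "area11_hr_db",
    "other_db",
]

# Areas 0 through 10: a single digit, or the literal "10".
print(select_databases(r"area([0-9]|1[0])_.*db", databases))
# ['area1_sales_db', 'area2_sales_db', 'area10_hr_db']

# A single area per schedule, as described in the question.
print(select_databases(r"area1_.*db", databases))
# ['area1_sales_db']
```

Note that "area1_.*db" does not pick up area10 or area11 here, because the underscore has to follow the "1" directly.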

Cloudera Community

Hive Replication: regular expressions on table names?