I am trying to get Hive replication working and am not yet sure I fully understand the options.
I can see that if you specify an individual database, you can either leave the table specification blank or specify a regular expression for the tables. I found that "*" was not an acceptable regular expression, so I wonder what rules are being used for that.
However, I really need wildcards to specify databases as well as tables. Is this possible?
For instance, imagine that I have 100 databases called
and another 100 each called
My choices right now are to replicate all of them at once or to replicate them one database at a time, which is a nightmare given the large number of databases. Ideally, I want a replication job that covers one specific area and that I can schedule according to some business decision.
Am I right in thinking that I cannot have multiple Hive replications going on at the same time even if they are totally different databases?
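"*" is not a valid regex: the asterisk is a quantifier, so it needs something before it to repeat. ".*" (any character, zero or more times) may be what you were going for...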
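If it helps, here is a quick way to sanity-check a pattern before pasting it into the table field. This is only a sketch in Python: I am assuming the replication tool follows ordinary regex rules and matches the whole table name, which I have not confirmed, and the table names and the helper functions `is_valid_regex` and `select_tables` are made up for illustration.

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def select_tables(pattern, tables):
    """Return the table names that the pattern matches in full.

    Full-name matching is an assumption about how the tool applies the pattern.
    """
    compiled = re.compile(pattern)
    return [t for t in tables if compiled.fullmatch(t)]


# Hypothetical table names, invented for illustration only.
tables = ["sales_2019", "sales_2020", "customers"]

print(is_valid_regex("*"))    # a bare "*" has nothing to repeat
print(is_valid_regex(".*"))   # "." gives "*" something to repeat
print(select_tables(".*", tables))
print(select_tables("sales_.*", tables))
```

The same reasoning applies to any regex engine I know of: a bare "*" is rejected, while ".*" matches every name and a prefix such as "sales_.*" narrows the selection.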
I am not quite clear on your business requirement, but I think you are saying that you want to create perhaps 10 replication schedules, each replicating a chunk of 10 of your area databases... akin to this: