
Hive Replication: regular expressions on table names?

I am trying to get Hive replication working and am not yet sure I fully understand the options.

 

I can see that if you specify an individual database, you can either leave the table specification blank or specify a regular expression for the tables. I found that "*" was not accepted as a regular expression, so I wonder which regex rules are being applied.

 

However, I really need wildcards to specify databases as well as tables. Is this possible?

 

For instance imagine that I have 100 databases called

area1_something_db

and another 100 each called

area2_something_db
area3_something_db
area4_something_db
area5_something_db

 

As far as I can tell, my choices right now are to replicate all of them at once or to replicate them one database at a time. The latter is a nightmare given the large number of databases. Ideally I want a replication job that covers one specific area, which I can schedule according to some business decision.

 

Am I right in thinking that I cannot have multiple Hive replications going on at the same time even if they are totally different databases?

 

 


Re: Hive Replication: regular expressions on table names?

@alexmc6

 

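"*" is not a valid regex on its own: it is a quantifier and needs something before it to repeat. ".*" (any character, zero or more times) may be what you were going for...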
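
A quick way to see the difference is to try compiling both patterns. This is only a sketch using Python's `re` module; the replication tool may use a different regex engine (that is an assumption on my part), but a bare "*" is rejected by most common engines for the same reason. The helper name `is_valid_regex` is made up for this illustration.

```python
import re


def is_valid_regex(pattern: str) -> bool:
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        # A bare "*" lands here: there is nothing before it to repeat.
        return False


# "*" alone is a quantifier with no preceding element.
print(is_valid_regex("*"))
# ".*" means "any character, zero or more times".
print(is_valid_regex(".*"))
# ".*" therefore matches any table name in full.
print(bool(re.fullmatch(".*", "any_table_name")))
```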

 

I am not quite clear on your business requirement, but I think you are saying that you may want to create 10 replication schedules, each replicating a chunk of 10 of your area databases... akin to this:

 

area([0-9]|1[0])_.*db
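
To check which database names a pattern like this would pick up before putting it in a schedule, you could test it against a list of names. This is a sketch only: the sample names and the `select_databases` helper are invented for illustration, and it assumes the pattern has to match the whole database name, which may not be how the replication tool applies it.

```python
import re

# The pattern from the post above, unchanged.
AREA_PATTERN = re.compile(r"area([0-9]|1[0])_.*db")


def select_databases(names, pattern=AREA_PATTERN):
    """Return the names that the pattern matches in full."""
    return [name for name in names if pattern.fullmatch(name)]


# Hypothetical database names, for illustration only.
sample = [
    "area1_sales_db",
    "area10_sales_db",
    "area11_sales_db",
    "area2_hr_db",
    "other_db",
]

# Areas 0 through 10 are selected; area11 and unrelated names are not.
print(select_databases(sample))

# A narrower pattern for a single area, e.g. one schedule for area 1 only.
# The underscore right after the digit keeps area10, area11, ... out.
print(select_databases(sample, re.compile(r"area1_.*")))
```

Note that `[0-9]` also covers `area0`; if the areas are numbered from 1, something like `([1-9]|10)` would be a tighter fit.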

 

 
