I am looking for best practices for working with Hive, especially for database modeling, software development and, if possible, version control. At the moment I am struggling at the point where the logical model meets the code.
I have found tools that assist with modeling databases in Hive, e.g. Embarcadero (Hortonworks Partner?). So I could model my databases there and generate DDL scripts, I guess. To get version control I could add those scripts to git or something similar. But what happens if many users want to work on the logical model at the same time? How do you handle that? And is jumping back and forth between versions of the database model only possible through git, not through the modeling tool itself, or is it?
All other scripts for the Hive databases and tables (ingestion and so on) live in a git repository, so they are under proper version control. But if something changes, many adaptations have to be made (at least one in the config file and possibly in INSERT statements etc.). What I am missing in the code world is a clear view of the databases and tables, or even an entity-relationship diagram, although that is not my main concern.
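To make the coupling between the config file and the INSERT statements concrete, here is a minimal sketch of one way it might be reduced, assuming the table definition is kept in a single place and both the DDL and the INSERT statement are generated from it. All names here (the `TABLE` dict, `render_ddl`, `render_insert`, the `sales.orders` table and the `staging.orders_raw` source) are hypothetical and only illustrate the idea:

```python
# Hypothetical table definition kept in one place, e.g. loaded from the
# config file that already lives in the git repository.
TABLE = {
    "database": "sales",
    "name": "orders",
    "columns": [
        ("order_id", "BIGINT"),
        ("customer_id", "BIGINT"),
        ("amount", "DOUBLE"),
    ],
    "stored_as": "ORC",
}


def render_ddl(table):
    """Render a Hive CREATE TABLE statement from the definition."""
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in table["columns"])
    return (
        f"CREATE TABLE IF NOT EXISTS {table['database']}.{table['name']} (\n"
        f"  {cols}\n"
        f")\n"
        f"STORED AS {table['stored_as']};"
    )


def render_insert(table, source):
    """Render an INSERT that selects the same columns from a source table."""
    cols = ", ".join(f"`{name}`" for name, _ in table["columns"])
    target = f"{table['database']}.{table['name']}"
    return f"INSERT INTO TABLE {target}\nSELECT {cols} FROM {source};"


if __name__ == "__main__":
    print(render_ddl(TABLE))
    print(render_insert(TABLE, "staging.orders_raw"))
```

With something like this, adding a column would be a one-line change to the definition, and the diff in git would show the model change directly. It does not replace a diagram, though.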
What is, in your opinion, a good way to tackle these problems? I mean, a company like Facebook surely does not manage all of its tables and databases through a Hive View or purely in code, or does it? How do you keep an overview in the big data world?
Any help is really appreciated! Thanks in advance!