We built a data warehouse around much the same idea you describe in your article: integrating Salesforce and Google Analytics into a data warehouse (@infocaptor, http://www.infocaptor.com). The benefit is that you can also correlate this data with your financial data.
When you design around the GA API, you first need to load the historical data for a given date range. This has its own complications: you might run into segmentation issues or lose data, and you need to handle pagination. Once the initial load is complete, you run the extract in incremental mode, bringing in only new data. That data is appended to the same data warehouse tables and should not create duplicates when date ranges overlap.
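A minimal sketch of the initial-load and incremental logic described above, assuming a hypothetical `fetch_page(start, end, offset, size)` callable that stands in for a GA query and returns at most `size` rows. The names `date_chunks`, `fetch_all_pages`, `load_range` and `incremental_window` are illustrative, not part of any GA client library. The historical range is split into smaller date windows, each window is paged until a short page comes back, and incremental mode only asks for dates after the last successful load.

```python
from datetime import date, timedelta


def date_chunks(start, end, days):
    """Split [start, end] (inclusive) into consecutive windows of at most `days` days."""
    cur = start
    while cur <= end:
        chunk_end = min(cur + timedelta(days=days - 1), end)
        yield cur, chunk_end
        cur = chunk_end + timedelta(days=1)


def fetch_all_pages(fetch_page, start, end, page_size):
    """Request pages until a short (or empty) page signals the end of the result set."""
    rows, offset = [], 0
    while True:
        page = fetch_page(start, end, offset, page_size)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


def load_range(fetch_page, start, end, chunk_days=7, page_size=1000):
    """Initial historical load: small date windows, each one fully paginated."""
    rows = []
    for s, e in date_chunks(start, end, chunk_days):
        rows.extend(fetch_all_pages(fetch_page, s, e, page_size))
    return rows


def incremental_window(last_loaded, today):
    """Dates after the last successful load, up to yesterday.

    Assumption: today's data is still incomplete, so it is left for the next run.
    """
    start, end = last_loaded + timedelta(days=1), today - timedelta(days=1)
    return (start, end) if start <= end else None


# Hypothetical stand-in for a GA query: 3 rows per day, served in pages.
def fake_fetch_page(start, end, offset, size):
    days = (end - start).days + 1
    data = [(start + timedelta(days=d), i) for d in range(days) for i in range(3)]
    return data[offset:offset + size]


rows = load_range(fake_fetch_page, date(2024, 1, 1), date(2024, 1, 10),
                  chunk_days=4, page_size=5)
print(len(rows))  # 30
```

Smaller windows keep each request's result set manageable; the chunk size and page size here are arbitrary and would need tuning against the real API's limits.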
At a minimum, you need some kind of background daemon that runs daily or at some other fixed frequency. You also need job tables to record the success or failure of each extract so that a failed run can resume from where the error occurred. Some other considerations:
1. What happens if you run the extract for the same date range twice?
2. What if a job fails for certain dates?
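One way to sketch such a job table, assuming one job row per extracted date and using SQLite purely for illustration. The table `extract_jobs` and the functions below are hypothetical. A rerun skips every date already marked `success`, so it resumes at the dates that failed. The upsert syntax (`ON CONFLICT ... DO UPDATE`) needs SQLite 3.24 or later.

```python
import sqlite3
from datetime import date, timedelta


def init_jobs(conn):
    # One row per extracted date; status is 'success' or 'failed'.
    conn.execute("""CREATE TABLE IF NOT EXISTS extract_jobs (
        run_date TEXT PRIMARY KEY,
        status   TEXT NOT NULL,
        attempts INTEGER NOT NULL,
        error    TEXT)""")


def pending_dates(conn, start, end):
    """Dates in [start, end] that have not yet been extracted successfully."""
    done = {r[0] for r in conn.execute(
        "SELECT run_date FROM extract_jobs WHERE status = 'success'")}
    days = (start + timedelta(days=i) for i in range((end - start).days + 1))
    return [d for d in days if d.isoformat() not in done]


def run_extracts(conn, start, end, extract_one):
    """Run `extract_one(d)` for each pending date and record the outcome."""
    for d in pending_dates(conn, start, end):
        try:
            extract_one(d)
            status, err = "success", None
        except Exception as exc:
            status, err = "failed", str(exc)
        conn.execute("""INSERT INTO extract_jobs (run_date, status, attempts, error)
            VALUES (?, ?, 1, ?)
            ON CONFLICT(run_date) DO UPDATE SET
                status = excluded.status,
                attempts = extract_jobs.attempts + 1,
                error = excluded.error""", (d.isoformat(), status, err))
        conn.commit()


conn = sqlite3.connect(":memory:")
init_jobs(conn)
calls = []


def flaky(d):
    # Hypothetical extract that fails for 2024-01-02 on its first attempt only.
    calls.append(d)
    if d == date(2024, 1, 2) and calls.count(d) == 1:
        raise RuntimeError("quota exceeded")


run_extracts(conn, date(2024, 1, 1), date(2024, 1, 3), flaky)  # 2024-01-02 fails
run_extracts(conn, date(2024, 1, 1), date(2024, 1, 3), flaky)  # retries only that date
print(calls)
```

Keying the job table on the date answers both questions above in this sketch: rerunning a range does no extra work for completed dates, and a failed date is simply picked up again on the next run.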
It is important to define primary keys on your DW target tables. The extracted data is stored as CSV files, which can easily be pushed to the Hadoop Distributed File System (HDFS).
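To illustrate why the primary key matters, here is a small sketch assuming a hypothetical target table `ga_sessions` keyed on `(ga_date, source)`, again using SQLite only as a stand-in for the warehouse. Loading with `INSERT OR REPLACE` means a second extract over an overlapping date range overwrites rows instead of duplicating them. The table is then exported to CSV with the standard `csv` module.

```python
import csv
import io
import sqlite3


def init_target(conn):
    # Hypothetical natural key: one row per (date, traffic source).
    conn.execute("""CREATE TABLE IF NOT EXISTS ga_sessions (
        ga_date  TEXT NOT NULL,
        source   TEXT NOT NULL,
        sessions INTEGER NOT NULL,
        PRIMARY KEY (ga_date, source))""")


def load_rows(conn, rows):
    # A primary-key conflict replaces the existing row rather than adding a duplicate.
    conn.executemany("INSERT OR REPLACE INTO ga_sessions VALUES (?, ?, ?)", rows)
    conn.commit()


def export_csv(conn, out):
    writer = csv.writer(out)
    writer.writerow(["ga_date", "source", "sessions"])
    writer.writerows(conn.execute(
        "SELECT ga_date, source, sessions FROM ga_sessions ORDER BY ga_date, source"))


conn = sqlite3.connect(":memory:")
init_target(conn)
load_rows(conn, [("2024-01-01", "google", 120), ("2024-01-02", "google", 95)])
# A second run overlaps on 2024-01-02 and carries an updated value.
load_rows(conn, [("2024-01-02", "google", 97), ("2024-01-03", "google", 110)])

buf = io.StringIO()
export_csv(conn, buf)
print(buf.getvalue())
```

In practice the CSV would be written to a local file rather than a `StringIO`, and that file could then be copied into HDFS, for example with `hdfs dfs -put`.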