Member since 02-28-2016
9 Posts
1 Kudos Received
1 Solution
My Accepted Solutions
Title | Views | Posted |
---|---|---|
 | 2694 | 03-10-2016 03:24 PM |
03-10-2016 03:24 PM
I agree. It's just that some references claimed that flat (fully denormalised) file structures are more efficient for Hadoop than star schema structures in terms of I/O performance. But as you said, it's very important to model the data in a way that works with BI tools.
03-10-2016 06:29 AM
1 Kudo
I've set up a Cloudera Live instance on AWS, and it now works. But when I connect to Impala through the Cloudera ODBC driver for Impala, the connection fails with the error below, and I'm not sure how to resolve it. I used the ODBC Administrator to configure the connection to the Impala daemon server.

FAILED!
[Cloudera][ImpalaODBC] (100) Error from the Impala Thrift API: connect() failed: errno = 10060

Thanks,
Suresh
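In Windows Sockets, errno 10060 is WSAETIMEDOUT ("connection timed out"), which usually means the client never reached the daemon at all. That points at the network (for example, the port not being open to the client machine in the AWS security group) rather than at the driver. A minimal sketch, assuming Python is available on the client machine, for checking plain TCP reachability from that machine before debugging the DSN; the host name and port in the usage line are hypothetical placeholders for whatever the DSN is configured with.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    A False result from the same machine that runs the ODBC driver suggests
    a network or firewall problem rather than a driver or DSN problem.
    """
    try:
        # create_connection resolves the host name and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (socket.timeout is a subclass of OSError),
        # refused connections, and DNS resolution failures.
        return False
```

Usage, with a hypothetical host: `can_reach("impala-host.example.com", 21050)`, where 21050 is assumed to be the port configured in the DSN (it is the usual default for Impala's ODBC/JDBC client port). If this returns False, fixing the inbound rule for that port is the first thing to try.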
Labels:
- Apache Impala
03-09-2016 10:06 AM
What's the best approach for structuring data files for Impala tables: a flat structure, fully denormalised into a single file, or a star schema model? The use case is integration with BI tools such as MicroStrategy on top of Impala.
Labels:
- Apache Impala
03-08-2016 09:26 AM
Thanks, Alex. I can see some references to swapping tables/views and so on, but I think that would be complex to maintain. What if we use two different HDFS locations, say Location1 and Location2, and, once the data load into one of them is complete, switch the reporting tables over to that location? It looks like we can use the ALTER TABLE command to change the HDFS location of a table, so the swap would be just a metadata operation. What do you think?

1st run:
- Location1: used for data processing
- Location2: used for reporting

2nd run:
- Location1: used for reporting
- Location2: used for data processing

Thanks,
Suresh
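The alternating-location scheme above can be sketched as a small planner that, for a given run number, says which directory Spark should write to and which statements to issue once the load is finished. This is a minimal sketch under stated assumptions: the table name and paths are hypothetical, the table is assumed to be unpartitioned (existing partitions of a partitioned table keep their own locations), and the `REFRESH` after `ALTER TABLE ... SET LOCATION` is included as a precaution rather than as a confirmed requirement.

```python
from dataclasses import dataclass


@dataclass
class RunPlan:
    processing_location: str  # where Spark writes this run's data
    reporting_location: str   # what the Impala table serves while the load runs
    swap_sql: list[str]       # statements to run once the load has finished


def plan_run(table: str, run: int, location1: str, location2: str) -> RunPlan:
    """Alternate two HDFS directories between processing and reporting.

    Odd runs load into location1 while the table serves location2;
    even runs do the opposite, matching the 1st/2nd run layout above.
    """
    if run < 1:
        raise ValueError("run numbers start at 1")
    if run % 2 == 1:
        processing, reporting = location1, location2
    else:
        processing, reporting = location2, location1
    swap_sql = [
        # Metadata-only change: repoint the table at the freshly loaded directory.
        f"ALTER TABLE {table} SET LOCATION '{processing}'",
        # Precaution (assumption): reload file metadata for the new location.
        f"REFRESH {table}",
    ]
    return RunPlan(processing, reporting, swap_sql)
```

For example, `plan_run("reporting.transactions", 1, "/data/txn/location1", "/data/txn/location2")` (hypothetical names) directs the load into Location1 and, afterwards, points the table at it; run 2 then reuses Location2 for processing.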
03-07-2016 02:20 PM
We have a use case where we reload all transaction data every month for a defined set of years. We are going to use Spark to create the required reporting tables, and Impala for analytical workloads from a BI tool. How do we separate the data processing tables from the reporting tables and then swap them in Impala? We want to minimise the impact on BI system availability for users and to ensure read consistency. Any ideas?

I've come across a couple of options:
- Partitioning: this still requires copying or moving the files to the reporting directory location, so that the data processing tables can be reused for the next run.
- Locking the table, removing the files, and moving files from the working directory to the reporting directory: again, this affects users for as long as the file removal and movement take.

Ideally, it would work if we could have two databases and swap them once the data load completes.

Thanks,
Suresh
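One way to approximate the two-database idea is to have the BI tool query only views in a stable reporting database, each defined as `SELECT *` over the same-named table in one of two load databases, and to repoint the views with `ALTER VIEW` after each load. A minimal sketch, assuming hypothetical database names `txn_a`, `txn_b`, and `reporting`: it picks the idle database for the next Spark load and builds the statements that switch the views over.

```python
from typing import Sequence

# Hypothetical database names: Spark loads into whichever one is not live.
DATABASES = ("txn_a", "txn_b")


def next_load_db(live_db: str) -> str:
    """Return the database Spark should load into while live_db serves BI users."""
    if live_db not in DATABASES:
        raise ValueError(f"unknown database: {live_db}")
    return DATABASES[1] if live_db == DATABASES[0] else DATABASES[0]


def view_swap_sql(
    tables: Sequence[str], loaded_db: str, reporting_db: str = "reporting"
) -> list[str]:
    """Build ALTER VIEW statements that repoint each reporting view at loaded_db."""
    return [
        f"ALTER VIEW {reporting_db}.{t} AS SELECT * FROM {loaded_db}.{t}"
        for t in tables
    ]
```

One design caveat: each `ALTER VIEW` is a separate metadata change, so the views switch one at a time rather than atomically, and a query that joins several views during the swap could briefly see a mix of old and new data.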
Labels:
- Apache Impala
- Apache Spark