Member since: 02-28-2016
Posts: 9
Kudos Received: 1
Solutions: 1
My Accepted Solutions
Title | Views | Posted
--- | --- | ---
 | 2629 | 03-10-2016 03:24 PM
03-10-2016 03:24 PM
I agree - just that some references suggest flat file structures are more efficient for Hadoop than star schema structures in terms of IO performance. But as you said, it's very important to model the data in a way that works with BI tools.
03-10-2016 06:29 AM
1 Kudo
I've set up a Cloudera Live instance on AWS and it is running successfully now. But when I connect to Impala via the Cloudera ODBC Impala driver, it throws the error below and I'm not sure how to resolve it. I used the ODBC Administrator to configure the connection to the Impala daemon server.

FAILED! [Cloudera][ImpalaODBC] (100) Error from the Impala Thrift API: connect() failed: errno = 10060

Thanks, Suresh
Labels:
- Apache Impala
03-09-2016 10:06 AM
What's the best approach to structuring data files for Impala tables: a flat file, fully denormalised into a single file, or a star schema model? The use case is integration with BI tools like MicroStrategy via Impala.
Labels:
- Apache Impala
03-08-2016 09:26 AM
Thanks Alex. I can see some references to swapping tables/views and so on, but that looks very complex to maintain, I think. How about using two different HDFS locations, say Location1 and Location2, and swapping them so that the reporting tables point at whichever location has just finished its data load, while the other is used for data processing? It looks like we can use the ALTER TABLE command to change the HDFS location for a table, so the swap would be just a metadata operation. What do you think?

1st run:
- Location1 - used for data processing
- Location2 - used for reporting

2nd run:
- Location1 - used for reporting
- Location2 - used for data processing

Thanks, Suresh
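For illustration, a minimal sketch of how that location swap might look in Impala SQL, assuming a hypothetical unpartitioned reporting table sales_reporting and placeholder HDFS paths for Location1 and Location2; the table and path names are illustrative, not from the original post:

```sql
-- Sketch only: point the reporting table at whichever HDFS location
-- just finished loading. ALTER TABLE ... SET LOCATION is a metadata-only
-- change in Impala; the data files themselves are not moved.

-- 1st run: reporting reads Location2 while processing writes to Location1
ALTER TABLE sales_reporting SET LOCATION 'hdfs:///data/location2';

-- Once the load into Location1 completes, swap the reporting table to it
ALTER TABLE sales_reporting SET LOCATION 'hdfs:///data/location1';

-- Pick up the new files so queries see the freshly loaded data
REFRESH sales_reporting;
```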
03-07-2016 02:20 PM
We have a use case to reload all transactions data every month for a defined set of years. We are going to use Spark to create the required reporting tables, and Impala for analytical workloads with a BI tool. How do we separate the data processing tables from the reporting tables and then swap tables in Impala? We want to minimise the impact to users in terms of BI system availability and to ensure read consistency. Any ideas?

I've come across a couple of options:
- Partitioning, but the files need to be copied or moved to the reporting directory location so that the next run can reuse the data processing tables.
- Lock the table, remove the old files, and move files from the working to the reporting directory, which again impacts users for the duration of the file removal and movement.

Ideally it would work if we could have two databases and swap them once the data load completes. Thanks, Suresh
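As one illustration of the "two databases and swap them" idea, here is a minimal Impala SQL sketch that flips a reporting view between two processing databases; the database, table, and view names (proc_a, proc_b, rpt.transactions_rpt) are hypothetical placeholders, not a tested recommendation:

```sql
-- Sketch only: keep two copies of the reload target and flip a view
-- between them when a monthly load finishes, so BI users always query
-- a complete copy. ALTER VIEW is a metadata-only operation.

-- Initially the reporting view reads proc_a while Spark reloads proc_b
CREATE VIEW IF NOT EXISTS rpt.transactions_rpt AS
SELECT * FROM proc_a.transactions;

-- After the reload into proc_b is complete, repoint the view
ALTER VIEW rpt.transactions_rpt AS
SELECT * FROM proc_b.transactions;
```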
Labels:
- Apache Impala
- Apache Spark