Member since: 02-11-2019
Posts: 81
Kudos Received: 3
Solutions: 0
05-19-2024
04:29 AM
1 Kudo
We have 3 regional intake tables partitioned by date, and a client-view table partitioned by date and region. Is there a way to populate the client-view table with data from the 3 source tables in one atomic transaction instead of three separate INSERT commands? Currently we run multiple INSERT statements like:

insert into client_view_tbl
(
  col, col2, col3...
) partition(cobdate='20240915', region='region1')
select col, col2, col3... from region1_table where cobdate='20240915';

insert into client_view_tbl
(
  col, col2, col3...
) partition(cobdate='20240915', region='region2')
select col, col2, col3... from region2_table where cobdate='20240915';

insert into client_view_tbl
(
  col, col2, col3...
) partition(cobdate='20240915', region='region3')
select col, col2, col3... from region3_table where cobdate='20240915';
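One way to collapse this into a single statement (a sketch, not tested against your schema) is to UNION ALL the three regional selects, tag each row with its region literal, and let dynamic partitioning route the rows:

```sql
-- Sketch: one INSERT with a static cobdate partition and a dynamic region partition.
-- Column names are copied from the post; verify against the real schema.
-- In Hive this may require: SET hive.exec.dynamic.partition.mode=nonstrict;
insert into client_view_tbl partition (cobdate='20240915', region)
select col, col2, col3, 'region1' as region from region1_table where cobdate='20240915'
union all
select col, col2, col3, 'region2' from region2_table where cobdate='20240915'
union all
select col, col2, col3, 'region3' from region3_table where cobdate='20240915';
```

Note that this makes it one statement, but true atomicity across partitions is only guaranteed on transactional (ACID) Hive tables; on plain tables a mid-statement failure can still leave partial data behind.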
Labels: Apache Impala
05-11-2024
04:03 AM
Is there a way to do something like:

case
  when (select count(*) from table1) > 0 then (select * from table1)
  when (select count(*) from table2) > 0 and (select count(*) from table3) > 0 then (select * from table3)
end
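A portable way to express that branching in plain SQL is a sketch like the following (table names from the post; the counts are computed in the FROM clause because Hive and Impala restrict where subqueries may appear):

```sql
-- Return table1 if it has rows; otherwise return table3 when both table2
-- and table3 have rows; otherwise return nothing.
select t.*
from table1 t
cross join (select count(*) as c1 from table1) g1
where g1.c1 > 0
union all
select t.*
from table3 t
cross join (select count(*) as c1 from table1) g1
cross join (select count(*) as c2 from table2) g2
cross join (select count(*) as c3 from table3) g3
where g1.c1 = 0 and g2.c2 > 0 and g3.c3 > 0;
```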
Labels: Apache Impala
05-11-2024
03:35 AM
1 Kudo
Given 6 tables with identical columns, all partitioned by date: I need the union of the results from all 6 tables for a given date (i.e. partition), but only if all 6 tables have data for that date; otherwise return nothing.
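One sketch, assuming tables t1..t6 with a partition column dt (both names are hypothetical): compute each table's row count for the date, take the minimum as a guard, and cross join it onto the union so the query returns nothing when any table is empty.

```sql
with counts as (
  select count(*) as c from t1 where dt = '20240915'
  union all select count(*) from t2 where dt = '20240915'
  union all select count(*) from t3 where dt = '20240915'
  union all select count(*) from t4 where dt = '20240915'
  union all select count(*) from t5 where dt = '20240915'
  union all select count(*) from t6 where dt = '20240915'
),
guard as (select min(c) as min_c from counts)
select u.*
from (
  select * from t1 where dt = '20240915'
  union all select * from t2 where dt = '20240915'
  union all select * from t3 where dt = '20240915'
  union all select * from t4 where dt = '20240915'
  union all select * from t5 where dt = '20240915'
  union all select * from t6 where dt = '20240915'
) u
cross join guard
where guard.min_c > 0;
```

Since dt is the partition column, each count should be pruned to a single partition and stay cheap.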
Labels: Apache Hive, Apache Impala
11-13-2020
09:35 PM
Could you give a working example of this in Spark 2.4 using the Scala DataFrame API? I can't seem to find the correct syntax:

val result = dataFrame.select(count(when(col("col_1") === "val_1" && col("col_2") === "val_2", 1)))
11-13-2020
10:55 AM
Please explain this: "... E.g. the following would require only a single scan of the table (although it might be more expensive because you don't have filtering from the where clause) ...". I'm not sure what you mean by the comment in brackets: how will it be more expensive, and what can we do to fix that?
11-13-2020
10:49 AM
Where clause relies on multiple columns
11-11-2020
08:15 AM
I have a list of 100+ SQL count queries to run against a Hive data table, and I'm looking for the most efficient way to run them.
The queries are accessed at runtime from a list stored in another Hive table, generated by a different process.
They look like these, each with a different, complex where clause:
1. select count(1) as count1 from MyTable where (... complex where clause here ...)
2. select count(1) as count1 from MyTable where (... where clause here ...)
3. etc.
Environment:
Cloudera CDH 6.2
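If the stored where clauses can be inlined into one statement, the usual trick is conditional aggregation: one scan of the table, one conditional count per query. A sketch with placeholder predicates (the real ones would be substituted in from the queries table):

```sql
-- Each count(case ...) evaluates one query's where clause per row;
-- the table is scanned once for all of them.
select
  count(case when (1 = 1 /* where clause 1 here */) then 1 end) as count1,
  count(case when (1 = 1 /* where clause 2 here */) then 1 end) as count2
  -- ... one column per stored query
from MyTable;
```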
04-15-2020
12:27 PM
Thanks @pauldefusco. I would like to do it in Spark (Scala).
04-15-2020
11:30 AM
I have a source table like:

ID  USER   DEPT
1   User1  Admin
2   User1  Accounts
3   User2  Finance
4   User3  Sales
5   User3  Finance

I want to generate a DataFrame like this:

ID  USER   DEPARTMENT
1   User1  Admin,Accounts
2   User2  Finance
3   User3  Sales,Finance
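A sketch in Hive/Spark SQL (the table name source_table is assumed; in Spark you could register the DataFrame as a temp view and run this through spark.sql): group by user, concatenate the departments, and renumber the groups.

```sql
select row_number() over (order by `user`) as id,  -- `user` is reserved in some engines
       `user`,
       department
from (
  select `user`,
         concat_ws(',', collect_list(dept)) as department
  from source_table
  group by `user`
) g;
```

Note that collect_list does not guarantee the order of values within the concatenated string.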
Labels: Apache Spark
01-18-2020
10:39 PM
What is the most efficient way to get counts of records meeting different search criteria from a Hive table?
1. count all records where column-a is null
2. count all records where column-b in [1, 3, 5]
3. count all records where column-c = 'xxx'
etc.
There are a couple hundred of these counts, in groups of 3 or 4.
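The criteria above can be folded into a single scan with conditional counts (a sketch; the hyphenated identifiers are adapted to valid SQL names and the table name my_table is assumed):

```sql
select
  count(case when column_a is null then 1 end)      as cnt_a_null,
  count(case when column_b in (1, 3, 5) then 1 end) as cnt_b_in_135,
  count(case when column_c = 'xxx' then 1 end)      as cnt_c_xxx
from my_table;
```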
Labels: Apache Hive