Member since: 02-11-2019
Posts: 81
Kudos Received: 3
Solutions: 0
05-19-2024
04:29 AM
1 Kudo
We have 3 regional intake tables partitioned by date, and a client-view table partitioned by date and region. Is there a way to populate the client-view table with data from the 3 source tables in one atomic transaction instead of three separate INSERT commands? Currently we run multiple INSERT statements like:

insert into client_view_tbl
(
  col, col2, col3...
) partition(cobdate='20240915', region='region1')
select col, col2, col3... from region1_table where cobdate='20240915';

insert into client_view_tbl
(
  col, col2, col3...
) partition(cobdate='20240915', region='region2')
select col, col2, col3... from region2_table where cobdate='20240915';

insert into client_view_tbl
(
  col, col2, col3...
) partition(cobdate='20240915', region='region3')
select col, col2, col3... from region3_table where cobdate='20240915';
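One way to collapse this into a single statement (a sketch, not tested against your schema) is to UNION ALL the three regional selects, tag each row with its region literal, and let dynamic partitioning route the rows:

```sql
-- Sketch: one INSERT with a static cobdate partition and a dynamic region partition.
-- Column names are copied from the post; verify against the real schema.
-- In Hive this may require: SET hive.exec.dynamic.partition.mode=nonstrict;
insert into client_view_tbl partition (cobdate='20240915', region)
select col, col2, col3, 'region1' as region from region1_table where cobdate='20240915'
union all
select col, col2, col3, 'region2' from region2_table where cobdate='20240915'
union all
select col, col2, col3, 'region3' from region3_table where cobdate='20240915';
```

Note that this makes it one statement, but true atomicity across partitions is only guaranteed on transactional (ACID) Hive tables; on plain tables a mid-statement failure can still leave partial data behind.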
Labels: Apache Impala
05-11-2024
04:03 AM
Is there a way to do something like:

case
  when (select count(*) from table1) > 0 then (select * from table1)
  when (select count(*) from table2) > 0 and (select count(*) from table3) > 0 then (select * from table3)
end
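A portable way to express that branching in plain SQL is a sketch like the following (table names from the post; the counts are computed in the FROM clause because Hive and Impala restrict where subqueries may appear):

```sql
-- Return table1 if it has rows; otherwise return table3 when both table2
-- and table3 have rows; otherwise return nothing.
select t.*
from table1 t
cross join (select count(*) as c1 from table1) g1
where g1.c1 > 0
union all
select t.*
from table3 t
cross join (select count(*) as c1 from table1) g1
cross join (select count(*) as c2 from table2) g2
cross join (select count(*) as c3 from table3) g3
where g1.c1 = 0 and g2.c2 > 0 and g3.c3 > 0;
```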
Labels: Apache Impala
05-11-2024
03:35 AM
1 Kudo
Given 6 tables with identical columns, all partitioned by date: I need the union of the results from all 6 tables for a given date (i.e. partition), but only if all 6 tables have data for that date; otherwise return nothing.
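One sketch, assuming tables t1..t6 with a partition column dt (both names are hypothetical): compute each table's row count for the date, take the minimum as a guard, and cross join it onto the union so the query returns nothing when any table is empty.

```sql
with counts as (
  select count(*) as c from t1 where dt = '20240915'
  union all select count(*) from t2 where dt = '20240915'
  union all select count(*) from t3 where dt = '20240915'
  union all select count(*) from t4 where dt = '20240915'
  union all select count(*) from t5 where dt = '20240915'
  union all select count(*) from t6 where dt = '20240915'
),
guard as (select min(c) as min_c from counts)
select u.*
from (
  select * from t1 where dt = '20240915'
  union all select * from t2 where dt = '20240915'
  union all select * from t3 where dt = '20240915'
  union all select * from t4 where dt = '20240915'
  union all select * from t5 where dt = '20240915'
  union all select * from t6 where dt = '20240915'
) u
cross join guard
where guard.min_c > 0;
```

Since dt is the partition column, each count should be pruned to a single partition and stay cheap.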
Labels: Apache Hive, Apache Impala
11-13-2020
09:35 PM
Could you give a working example of this in Spark 2.4 using the Scala DataFrame API? I can't seem to find the correct syntax:

val result = dataFrame.select(count(when(col("col_1") === "val_1" && col("col_2") === "val_2", 1)))
11-13-2020
10:55 AM
Please explain this: "... E.g. the following would require only a single scan of the table (although it might be more expensive because you don't have filtering from the where clause) ...". I'm not sure what you mean by the comment in brackets: how will it be more expensive, and what can we do to fix that?
11-13-2020
10:49 AM
Where clause relies on multiple columns
11-11-2020
08:15 AM
I have a list of 100+ SQL count queries to run against a Hive data table, and I'm looking for the most efficient way to run them.
The queries are accessed at runtime from a list stored in another Hive table, generated by a different process.
They look like these, each with a different, complex where clause:
1. select count(1) as count1 from MyTable where (... complex where clause here ...)
2. select count(1) as count1 from MyTable where (... where clause here ...)
3. etc.
Environment:
Cloudera CDH 6.2
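If the stored where clauses can be inlined into one statement, the usual trick is conditional aggregation: one scan of the table, one conditional count per query. A sketch with placeholder predicates (the real ones would be substituted in from the queries table):

```sql
-- Each count(case ...) evaluates one query's where clause per row;
-- the table is scanned once for all of them.
select
  count(case when (1 = 1 /* where clause 1 here */) then 1 end) as count1,
  count(case when (1 = 1 /* where clause 2 here */) then 1 end) as count2
  -- ... one column per stored query
from MyTable;
```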
04-15-2020
12:27 PM
Thanks @pauldefusco. I would like to do it in Spark (Scala).
04-15-2020
11:30 AM
I have a source table like:

ID  USER   DEPT
1   User1  Admin
2   User1  Accounts
3   User2  Finance
4   User3  Sales
5   User3  Finance

I want to generate a DataFrame like this:

ID  USER   DEPARTMENT
1   User1  Admin,Accounts
2   User2  Finance
3   User3  Sales,Finance
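A sketch in Hive/Spark SQL (the table name source_table is assumed; in Spark you could register the DataFrame as a temp view and run this through spark.sql): group by user, concatenate the departments, and renumber the groups.

```sql
select row_number() over (order by `user`) as id,  -- `user` is reserved in some engines
       `user`,
       department
from (
  select `user`,
         concat_ws(',', collect_list(dept)) as department
  from source_table
  group by `user`
) g;
```

Note that collect_list does not guarantee the order of values within the concatenated string.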
Labels: Apache Spark
01-18-2020
10:39 PM
What is the most efficient way to get counts of records meeting different search criteria from a Hive table?
1. count all records where column-a is null
2. count all records where column-b in [1, 3, 5]
3. count all records where column-c = 'xxx'
etc.
There are a couple hundred of these counts, in groups of 3 or 4.
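The criteria above can be folded into a single scan with conditional counts (a sketch; the hyphenated identifiers are adapted to valid SQL names and the table name my_table is assumed):

```sql
select
  count(case when column_a is null then 1 end)      as cnt_a_null,
  count(case when column_b in (1, 3, 5) then 1 end) as cnt_b_in_135,
  count(case when column_c = 'xxx' then 1 end)      as cnt_c_xxx
from my_table;
```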
Labels: Apache Hive