Member since
03-04-2021
5
Posts
2
Kudos Received
0
Solutions
04-07-2021
12:31 AM
Hi @Magudeswaran , I can rewrite the query using sub-queries, but I have the same error: select count(*) as num_all_changes from ( select s.cod_pers, count(*) as num_changes from ( select t.cod_pers, t.cod_address, count(*) as num_address from be_prd_prt.test_case as t group by t.cod_pers, t.cod_address ) as s group by s.cod_pers having count(*)>1 ) as gg ; +------------------+ | num_all_changes | +------------------+ | 63 | | 58 | | 64 | | 59 | +------------------+ 4 rows selected (18.077 seconds) As you can see, always 4 rows...
... View more
03-31-2021
07:59 AM
1 Kudo
The "msck repair table ..." command does not really read new data files, but adds new partitions (subdirectories in HDFS) in table metadata. What you could do is to create in advance all the partitions (for month or more) - initially empty- and run the "repair" command just once: hdfs dfs -mkdir /user/data/year=2021/month=04/day=1 ... hdfs dfs -mkdir /user/data/year=2021/month=04/day=30 hive>msck repair table <table_name> When You put your log files inside one of these directories, they will be immediately visible from Hive (just set the correct permissions using Ranger or hdfs). Maybe You can repeat this operations (create directories and "repair table") during logs maintenance, as you should have some policies to remove old logs Hope this helps
... View more