Huge number of RECORDS_OUT_OPERATOR_RS_22

Hi, I have a SQL query:

SELECT *
FROM
(
  SELECT a, b, ROW_NUMBER() OVER (PARTITION BY x, y ORDER BY create_time DESC) as rn
  FROM huge_table h
  LEFT JOIN small_table s ON h.c = s.id
  WHERE s.dt='2020-02-02'
)
WHERE rn=1

  • huge_table has 8 billion rows; small_table has 1.5 million rows after the dt filter

But in the Tez counters I frequently see:

  •  RECORDS_OUT_INTERMEDIATE_Map_1 and RS_22 RECORDS_OUT_OPERATOR_RS_27 go to 300+ billion

Why is this happening?
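
One possible explanation, offered as an assumption that the counters alone do not confirm, is that small_table.id is not unique within the dt partition, so each matching row of huge_table is emitted several times by the join before it reaches the reduce sink. The sketch below uses Python's sqlite3 module as a stand-in for Hive, with invented toy tables, to show that fan-out and a query that detects duplicate join keys:

```python
import sqlite3

# Toy stand-ins for huge_table and small_table. The rows are invented
# purely to illustrate join fan-out; they are not data from this thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE huge_table (c INTEGER, x INTEGER);
    CREATE TABLE small_table (id INTEGER, dt TEXT);
    INSERT INTO huge_table VALUES (1, 10), (1, 11), (2, 12);
    -- id = 1 appears three times for the same dt: a non-unique join key
    INSERT INTO small_table VALUES
        (1, '2020-02-02'), (1, '2020-02-02'), (1, '2020-02-02'),
        (2, '2020-02-02');
""")

rows_in = conn.execute("SELECT COUNT(*) FROM huge_table").fetchone()[0]

# Same join shape as the query in the question, counting the joined rows.
rows_joined = conn.execute("""
    SELECT COUNT(*)
    FROM huge_table h
    LEFT JOIN small_table s ON h.c = s.id
    WHERE s.dt = '2020-02-02'
""").fetchone()[0]

# Join keys that occur more than once on the small side.
duplicate_keys = conn.execute("""
    SELECT id, COUNT(*) AS cnt
    FROM small_table
    WHERE dt = '2020-02-02'
    GROUP BY id
    HAVING COUNT(*) > 1
""").fetchall()

print(rows_in, rows_joined, duplicate_keys)
```

In this toy case 3 input rows become 7 joined rows. The GROUP BY ... HAVING query is plain SQL, so a similar check against the real small_table would show whether duplicate ids could account for part of the jump from 8 billion to 300+ billion records.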

 

I also see ADDITIONAL_SPILLS_BYTES_WRITTEN at 500422346859 (~500 GB). Considering that the ORC files in huge_table total only ~500 GB, is this weird? There are 4345 files and 5472 mappers; why does it require so many additional spills?
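
As a quick sanity check on the counter value, the arithmetic can be sketched in Python. The per-mapper figure assumes the spilled bytes are spread evenly across the 5472 mappers, which the counters quoted here do not confirm:

```python
# Counter values quoted in the question.
spill_bytes = 500_422_346_859
mappers = 5472

spill_gb = spill_bytes / 1e9      # decimal gigabytes
spill_gib = spill_bytes / 2**30   # binary gibibytes

# Average spill per mapper, assuming an even spread (an assumption).
per_mapper_mb = spill_bytes / mappers / 1e6

print(f"{spill_gb:.1f} GB, {spill_gib:.1f} GiB, "
      f"{per_mapper_mb:.1f} MB per mapper on average")
```

The value works out to roughly 500 GB in decimal units, or about 466 GiB, so the spills are about the same size as the ORC input itself.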

 

Thanks!

1 REPLY

Please share the output of the commands below so we can identify the exact output record details:

explain formatted <query>
explain extended <query>
explain analyze <query>