query very slow, does HDFS Local Bytes Read Percentage matter?

New Contributor

I have a very simple query:

 

create table test2 as select * from mytable F where F.scandate_key between 20160316 and 20160318

 

The column scandate_key is the partition key.

 

The query above takes a very long time to run, even though the 3 partitions it covers contain fewer than 4 million rows in total.

 

Here's what I noticed: 89% of the data was read remotely. Is this why it's slow?

 

  • HDFS Bytes Read: 862.6 MiB
  • HDFS Bytes Read From Cache: 0 B
  • HDFS Bytes Read From Cache Percentage: 0
  • HDFS Bytes Written: 456.2 GiB
  • HDFS Local Bytes Read: 96.1 MiB
  • HDFS Local Bytes Read Percentage: 11
  • HDFS Remote Bytes Read: 766.4 MiB
  • HDFS Remote Bytes Read Percentage: 89
  • HDFS Scanner Average Read Throughput: 1.1 GiB/s

 


Re: query very slow, does HDFS Local Bytes Read Percentage matter?

Expert Contributor

Hi tsusanto,

 

Have you been able to solve your issue? Remote reads can indeed slow down query execution. What file format is your data in?

 

Cheers, Lars

Re: query very slow, does HDFS Local Bytes Read Percentage matter?

New Contributor

Yes, I resolved it. It turned out that one of the columns had values with up to 60K characters. We truncated most of it, since it wasn't needed at all.
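
For anyone hitting the same symptom, here is a rough sketch of how an oversized string column might be spotted and trimmed. It assumes a SQL engine that provides length() and substr(), as Impala and Hive do. The column name wide_col and the 255-character limit are hypothetical placeholders, not the actual values from this thread.

```sql
-- Step 1: check how long the values in a suspect string column get.
-- wide_col is a hypothetical name; substitute each string column in turn.
select
  max(length(F.wide_col)) as max_len,
  avg(length(F.wide_col)) as avg_len
from mytable F
where F.scandate_key between 20160316 and 20160318;

-- Step 2: rebuild the target table with the column truncated.
-- test2_trimmed is a hypothetical table name and 255 is an arbitrary
-- limit chosen for illustration.
create table test2_trimmed as
select
  F.scandate_key,
  -- any other columns that are needed would be listed here
  substr(F.wide_col, 1, 255) as wide_col
from mytable F
where F.scandate_key between 20160316 and 20160318;
```

If max_len comes back in the tens of thousands for a column nobody uses, dropping or truncating it in the select list is probably a cheaper fix than chasing read locality.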