Unable to cache full data in cache pool ?

Explorer

Hello,

 

I am trying to use HDFS caching to get a performance improvement, but I am unable to use the full cache pool I defined.

 

I tried a REFRESH statement on the table, and I tried creating a new table with caching defined from the start, but neither worked.
I also removed the cache setting and assigned it again, but it made no difference.

 

My table stats-

 

Query: show table stats tbl_parq_123
+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------+
| year  | #Rows | #Files | Size     | Bytes Cached | Cache Replication | Format  | Incremental stats | Location                                                                   |
+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------+
| 1990  | -1    | 2      | 338.45MB | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1990 |
| 1993  | -1    | 6      | 1.32GB   | 0B           | 1                 | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1993 |
| 1994  | -1    | 6      | 1.32GB   | 1010.95MB    | 1                 | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1994 |
| 1995  | -1    | 14     | 3.24GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1995 |
| 1996  | -1    | 14     | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1996 |
| 1997  | -1    | 14     | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1997 |
| 1998  | -1    | 27     | 6.60GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1998 |
| 1999  | -1    | 14     | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=1999 |
| 2000  | -1    | 14     | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=2000 |
| 2001  | -1    | 14     | 3.30GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=2001 |
| 2002  | -1    | 23     | 5.48GB   | NOT CACHED   | NOT CACHED        | PARQUET | false             | hdfs://quickstart.cloudera:8020/user/hive/warehouse/tbl_parq_123/year=2002 |
| Total | -1    | 148    | 34.79GB  | 1010.95MB    |                   |         |                   |                                                                            |
+-------+-------+--------+----------+--------------+-------------------+---------+-------------------+----------------------------------------------------------------------------+

Pool size-

[root@quickstart ~]# hdfs cacheadmin -listPools
Found 1 result.
NAME            OWNER   GROUP  MODE             LIMIT  MAXTTL
three_gig_pool  impala  hdfs   rwxr-xr-x   3000000000   never

My method-

 

[quickstart.cloudera:21000] > alter table tbl_parq_123 set cached in 'three_gig_pool';
Query: alter table tbl_parq_123 set cached in 'three_gig_pool'
+---------------+
| summary       |
+---------------+
| Cached table. |
+---------------+
Fetched 1 row(s) in 1.98s

Sometimes the cached data is 500 MB or 800 MB, but it never exceeds 1 GB. Is there a parameter or setting I need to check?

 

Thanks

 

Re: Unable to cache full data in cache pool ?

Expert Contributor

Hi @punshi 

How much cache space have you configured? Please try this HDFS command to display the configured and used cache capacity:

hdfs dfsadmin -report
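
If it helps, here is a minimal sketch for pulling the cache figures out of that report's text. It assumes the per-DataNode field names the report prints (`Configured Cache Capacity`, `Cache Used`, `Cache Remaining`) and handles a single DataNode section. The function name `parse_cache_fields` and the sample text are only illustrative, not part of any Hadoop tooling.

```python
import re

# Sample text in the shape of one DataNode section of `hdfs dfsadmin -report`.
SAMPLE_REPORT = """\
Configured Cache Capacity: 1073741824 (1 GB)
Cache Used: 1060061184 (1010.95 MB)
Cache Remaining: 13680640 (13.05 MB)
Cache Used%: 98.73%
"""

# Match "<field>: <bytes> (" for the three byte-valued cache fields.
# The percentage lines ("Cache Used%: ...") do not match because of the "%".
CACHE_FIELD = re.compile(
    r"^(Configured Cache Capacity|Cache Used|Cache Remaining):\s+(\d+)\s+\(",
    re.MULTILINE,
)


def parse_cache_fields(report_text):
    """Return {field name: bytes} for the cache lines found in the text."""
    return {name: int(value) for name, value in CACHE_FIELD.findall(report_text)}


fields = parse_cache_fields(SAMPLE_REPORT)
for name, value in fields.items():
    print(f"{name}: {value} bytes")
```

In practice you would feed it the captured output of the command instead of the sample string. With several DataNodes the report repeats these fields once per node, so the text would need to be split per node first.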

Re: Unable to cache full data in cache pool ?

Explorer

Thanks @AcharkiMed 

 

That command gives a clear picture now, but I still don't know why the value is set to 1 GB.

Configured Cache Capacity: 1 GB

 

[root@quickstart ~]# sudo -u hdfs hdfs dfsadmin -report
Configured Capacity: 250717949952 (233.50 GB)
Present Capacity: 208179929088 (193.88 GB)
DFS Remaining: 88135348224 (82.08 GB)
DFS Used: 120044580864 (111.80 GB)
DFS Used%: 57.66%
Under replicated blocks: 4
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (1):

Name: 172.16.8.177:50010 (quickstart.cloudera)
Hostname: quickstart.cloudera
Rack: /default
Decommission Status : Normal
Configured Capacity: 250717949952 (233.50 GB)
DFS Used: 120044580864 (111.80 GB)
Non DFS Used: 29259272192 (27.25 GB)
DFS Remaining: 88135348224 (82.08 GB)
DFS Used%: 47.88%
DFS Remaining%: 35.15%
Configured Cache Capacity: 1073741824 (1 GB)
Cache Used: 1060061184 (1010.95 MB)
Cache Remaining: 13680640 (13.05 MB)
Cache Used%: 98.73%
Cache Remaining%: 1.27%
Xceivers: 2
Last contact: Thu Jul 04 12:04:21 IST 2019

I checked hdfs-default.xml for a parameter that defines this value but couldn't find one.

I saw one parameter, dfs.datanode.max.locked.memory=0, but I think it is something different.

Is it set automatically depending on RAM, or can I configure it?

 

Thanks

Re: Unable to cache full data in cache pool ?

Expert Contributor

Hi @punshi 

Yes, you can change it by editing this parameter:

Maximum Memory Used for Caching
dfs.datanode.max.locked.memory

 

But keep in mind that HDFS caching holds the cached data in the DataNode's memory (RAM), not on disk, so you cannot increase it considerably!
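
To make the relationship between the two limits concrete, here is a rough sketch of the arithmetic using the numbers posted in this thread. It is a simplified model, not how HDFS computes anything internally: it assumes cache replication 1 and treats the usable cache as the smaller of the pool's LIMIT and the total Configured Cache Capacity across DataNodes. The function name `effective_cache_limit` is made up for illustration.

```python
def effective_cache_limit(pool_limit_bytes, datanode_cache_capacities):
    """Upper bound on bytes a pool can cache, assuming cache replication 1.

    Simplified model: the pool LIMIT caps what directives may request,
    while the DataNodes' configured cache capacity caps what can actually
    be held in memory. The smaller of the two wins.
    """
    cluster_capacity = sum(datanode_cache_capacities)
    return min(pool_limit_bytes, cluster_capacity)


POOL_LIMIT = 3_000_000_000        # LIMIT of three_gig_pool (hdfs cacheadmin -listPools)
DATANODE_CACHE = [1_073_741_824]  # Configured Cache Capacity of the single DataNode

limit = effective_cache_limit(POOL_LIMIT, DATANODE_CACHE)
print(f"Usable cache: {limit} bytes ({limit / 1024**3:.2f} GiB)")
```

Under this model the 3 GB pool can never hold more than the 1 GB the DataNode is configured to cache. The reported Cache Used (1010.95 MB) sits slightly below that figure, which would be consistent with whole blocks being cached, though that part is an inference rather than something shown in the report.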

Re: Unable to cache full data in cache pool ?

Explorer

Thanks @AcharkiMed 

 

I am still wondering: if I give the cache pool 3 GB using hdfs cacheadmin, why does it allocate only 1 GB? Does it have to do with RAM size? I had assumed the HDFS cache used space on my hard disk, not RAM.

Re: Unable to cache full data in cache pool ?

Expert Contributor

Hi @punshi
Read this for more information about HDFS caching in Impala:
https://www.cloudera.com/documentation/enterprise/5-16-x/topics/impala_perf_hdfs_caching.html
