New Contributor
Posts: 3
Registered: 11-22-2014

WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance

Hi all,
I used both Hive and Impala to query 10000 rows from a table that has 80000 rows. The results are as follows:

1) First query using Hive:

9995 name9995
9996 name9996
9997 name9997
9998 name9998
9999 name9999
Time taken: 1.242 seconds, Fetched: 10000 row(s)
hive>

2) Second query using Hive:

9995 name9995
9996 name9996
9997 name9997
9998 name9998
9999 name9999
Time taken: 0.092 seconds, Fetched: 10000 row(s)

3) First query using impala:

| 9995 | name9995 |
| 9996 | name9996 |
| 9997 | name9997 |
| 9998 | name9998 |
| 9999 | name9999 |
+------+----------+
WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata.

Fetched 10000 row(s) in 2.45s

4) Second query using impala:

| 9995 | name9995 |
| 9996 | name9996 |
| 9997 | name9997 |
| 9998 | name9998 |
| 9999 | name9999 |
+------+----------+
WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata.

Fetched 10000 row(s) in 2.45s

I noticed the warnings, and I have already set dfs.datanode.hdfs-blocks-metadata.enabled to true.
This is the relevant part of the configuration, pasted from the page https://192.168.10.11:25000/hadoop-varz:
dfs.datanode.hdfs-blocks-metadata.enabled true
Sorry that I can't adjust the formatting of the table. I don't know why Impala still reports this warning when my HDFS settings enable block location metadata.
By the way, the Impala cluster has 4 machines: 192.168.10.11 runs statestore, impalad, and catalogd, and 192.168.10.12 through 192.168.10.14 run impalad. I built Impala from the impala-2.0.0-cdh5.2.0 tarball downloaded from GitHub, and I replaced CDH Hadoop with Apache Hadoop.
Am I using Impala correctly?
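Since the /hadoop-varz page only shows what one impalad has loaded, it may also be worth checking the hdfs-site.xml that each DataNode actually reads. A minimal sketch in Python, assuming the usual Hadoop `<configuration>/<property>` layout; the helper name `get_hadoop_property` and the inline sample are illustrative and not part of Impala or Hadoop:

```python
import xml.etree.ElementTree as ET


def get_hadoop_property(xml_text, name):
    """Return the value of a Hadoop property from hdfs-site.xml text, or None."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", default="").strip() == name:
            value = prop.findtext("value")
            return value.strip() if value is not None else None
    return None


# Inline sample standing in for a real hdfs-site.xml (assumption: on a real
# node you would read the file from your Hadoop configuration directory).
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
  </property>
</configuration>"""

print(get_hadoop_property(SAMPLE, "dfs.datanode.hdfs-blocks-metadata.enabled"))
```

If this returns None or "false" on any DataNode, that node is probably not serving block location metadata, whatever the impalad page reports.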

Regards

Cloudera Employee
Posts: 25
Registered: 11-12-2014

Re: WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance

Hi,

Have you computed stats for the table you're querying? Also, make sure you refresh the table's metadata. Can you also post the query that you're running?

Thanks

Dimitris

Explorer
Posts: 12
Registered: 01-24-2017

Re: WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance

We run "compute stats" fairly regularly but still get this message. Besides, it doesn't seem like a very relevant message for outdated stats. :p

Has this JIRA been resolved?

https://issues.apache.org/jira/browse/IMPALA-1427

We need to confirm whether this is a "rogue" message. Our Impala daemon versions are below.

[root@hostname~]# impalad --version
impalad version 2.5.0-cdh5.7.2 RELEASE (build 1140f8289dc0d2b1517bcf70454bb4575eb8cc70)
Built on Fri, 22 Jul 2016 12:30:57 PST

 

[root@hostname~]# catalogd --version
catalogd version 2.5.0-cdh5.7.2 RELEASE (build 1140f8289dc0d2b1517bcf70454bb4575eb8cc70)
Built on Fri, 22 Jul 2016 12:30:57 PST

Cloudera Employee
Posts: 268
Registered: 07-29-2015

Re: WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance

Yes, that JIRA has been resolved.

Another common cause is HDFS rebalancing data and moving blocks around. In that case you'd need to "refresh" the table (or partitions of the table) to pick up the new block locations.
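To make the "refresh" suggestion concrete, here is a small Python sketch that builds the REFRESH statement text to run in impala-shell. The table name `my_db.my_table` and its partition columns are hypothetical placeholders. As far as I know, the PARTITION form of REFRESH only exists in newer Impala releases (2.7 and later), so on older versions only the table-level form applies:

```python
def refresh_statement(table, partition=None):
    """Build an Impala REFRESH statement for a table or for one partition.

    `partition` is an optional dict mapping partition column -> value.
    """
    stmt = "REFRESH " + table
    if partition:
        parts = []
        for col, val in partition.items():
            if isinstance(val, str):
                # Keep the sketch simple: refuse values that would need escaping.
                if "'" in val:
                    raise ValueError("quote in partition value: " + val)
                literal = "'" + val + "'"
            else:
                literal = str(val)
            parts.append(col + "=" + literal)
        stmt += " PARTITION (" + ", ".join(parts) + ")"
    return stmt


# Hypothetical table and partition columns, for illustration only.
print(refresh_statement("my_db.my_table"))
print(refresh_statement("my_db.my_table", {"year": 2017, "region": "ca"}))
```

Refreshing a single partition is cheaper than refreshing the whole table, which matters for tables with many files.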

Cloudera Employee
Posts: 19
Registered: 02-20-2015

Re: WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance

@BellRizz Adding to Tim's comment, we have seen this warning pop up when the Catalog server has trouble connecting to DataNodes to get block locations (for example, when the DNs are busy or not responding for some reason). It usually goes away when the load on the DNs is low. It might be worth checking this scenario as well (usually something is logged in the Catalog server logs around the time the warning occurs).

Later versions of Impala (shipped with CDH 5.12 and later) have a new way of fetching these block locations and a lower likelihood of showing this warning.

Explorer
Posts: 12
Registered: 01-24-2017

Re: WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance

Thanks everyone for the replies.

@Tim The HDFS rebalance process finishes in under a minute, so the data seems to be balanced.

@Bharathv Regarding the DN being busy: do you mean the DataNode daemon or the machine?

For the coordinator host:
- JVM heap memory of the DN process on the coordinator is 50% occupied.
- CPU usage of the DN process is close to zero.
- Host CPU usage is around 30%. The load average reported by uptime is low for a 24-core machine: 10:47:49 up 364 days, 22:34, 1 user, load average: 3.38, 5.46, 5.20
- Host physical memory used is around 130 GB out of 250 GB.
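To put the load average in perspective, it can be divided by the core count. A quick Python sketch using the uptime line above; the helper name `load_per_core` is made up for illustration, and the 24 cores come from the figure quoted in the list:

```python
import re


def load_per_core(uptime_line, cores):
    """Return the 1-, 5- and 15-minute load averages divided by the core count."""
    match = re.search(
        r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: " + uptime_line)
    return tuple(float(v) / cores for v in match.groups())


line = "10:47:49 up 364 days, 22:34, 1 user, load average: 3.38, 5.46, 5.20"
one, five, fifteen = load_per_core(line, cores=24)
print("per-core load: %.2f %.2f %.2f" % (one, five, fifteen))
```

All three values come out well below 1.0 per core, which is consistent with the host not being CPU-bound. It says nothing about disk or network pressure on the DataNode, though.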

I am inclined to rule out the possibility of the DataNode being busy.

 

 

 

Explorer
Posts: 12
Registered: 01-24-2017

Re: WARNINGS: Backend 0:Unknown disk id. This will negatively affect performance

Thanks everyone for their input. :-)

My peer, Rohit Narra, suggested that we *may* get "WARNINGS: Unknown disk id" when the stats on the partitions are NOT computed incrementally, and that it is reproducible even if we limit the query to 10 rows. However, when we compute stats incrementally, the warning goes away. Strange as it may seem, I can confirm that I was able to reproduce both the problem and the solution, as shared below. I masked some info for anonymity. It is a scary but really non-specific warning: it could be anything from catalogd being unable to communicate with impalad to some host problems. Cloudera might as well say that "something may be wrong" :p

Anyway, in this case the warning was a false alarm. @Bharathv, thanks for confirming that the warning has improved with CDH 5.12.x.

 

= process =
I simplified the query to generate the "WARNINGS: Unknown disk id" while querying a single partition.

 

[prodserver.ca:21000] > select dt_skey from serv_video_esd.cdn_integrated where dt_skey=20170824 and org_nm='media' and service='media' and cdn_nm='nakami' limit 3;
Query: select dt_skey from serv_video_esd.cdn_integrated where dt_skey=20170824 and org_nm='media' and service='media' and cdn_nm='nakami' limit 3
+----------+
| dt_skey  |
+----------+
| 20170824 |
| 20170824 |
| 20170824 |
+----------+
WARNINGS: Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata.

 

The table is partitioned on four attributes:
PARTITIONED BY (
  dt_skey INT,
  org_nm STRING,
  service STRING,
  cdn_nm STRING
)

 

Notice that "Incremental stats" is already "true" for this partition:
| dt_skey | org_nm | service | cdn_nm | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats | Location |
| 20170824 | media | media | nakami | 257006421 | 1549 | 12.91GB | NOT CACHED | NOT CACHED | PARQUET | true | hdfs://nameservice1/user/hive/warehouse/serv_video_esd.db/cdn_integrated/dt_skey=20170824/org_nm=media/service=media/cdn_nm=nakami |
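With this many columns the row is hard to read by eye. A rough Python sketch for pairing each value with its column name, assuming the header and data lines are pasted as plain pipe-delimited text as in the post; `parse_row` is an illustrative helper, not an Impala tool:

```python
def parse_row(header_line, data_line):
    """Map column names to values for one pipe-delimited impala-shell row."""
    def cells(line):
        # Drop the outer pipes, then split on the inner ones.
        return [c.strip() for c in line.strip().strip("|").split("|")]
    return dict(zip(cells(header_line), cells(data_line)))


HEADER = ("| dt_skey | org_nm | service | cdn_nm | #Rows | #Files | Size "
          "| Bytes Cached | Cache Replication | Format | Incremental stats | Location")
ROW = ("| 20170824 | media | media | nakami | 257006421 | 1549 | 12.91GB "
       "| NOT CACHED | NOT CACHED | PARQUET | true "
       "| hdfs://nameservice1/user/hive/warehouse/serv_video_esd.db/cdn_integrated/"
       "dt_skey=20170824/org_nm=media/service=media/cdn_nm=nakami")

partition = parse_row(HEADER, ROW)
print(partition["Incremental stats"], partition["#Rows"], partition["#Files"])
```

This only works while no cell value contains a pipe character, which holds for the row shown here.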

 

I recomputed stats anyway.

COMPUTE INCREMENTAL STATS serv_video_esd.cdn_integrated PARTITION (dt_skey=20170824,org_nm='media',service='media',cdn_nm='nakami');
Query: compute INCREMENTAL STATS serv_video_esd.cdn_integrated PARTITION (dt_skey=20170824,org_nm='media',service='media',cdn_nm='nakami')
+------------------------------------------+
| summary                                  |
+------------------------------------------+
| Updated 1 partition(s) and 26 column(s). |
+------------------------------------------+
Fetched 1 row(s) in 27.21s

 

[prodserver.ca:21000] > select dt_skey from serv_video_esd.cdn_integrated where dt_skey=20170824 and org_nm='media' and service='media' and cdn_nm='nakami' limit 3;
Query: select dt_skey from serv_video_esd.cdn_integrated where dt_skey=20170824 and org_nm='media' and service='media' and cdn_nm='nakami' limit 3
+----------+
| dt_skey  |
+----------+
| 20170824 |
| 20170824 |
| 20170824 |
+----------+
Fetched 3 row(s) in 0.12s
<================================ "WARNINGS: Unknown disk id." disappears!!!
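For anyone who wants to repeat this before/after check across many partitions, the comparison can be scripted against captured impala-shell output. A minimal Python sketch, assuming the output has been saved as text; the function name and the trimmed samples are illustrative, with the warning text copied from the runs above:

```python
WARNING_MARKER = "Unknown disk id"


def has_unknown_disk_id_warning(shell_output):
    """Return True if any WARNINGS line in the output mentions the marker."""
    for line in shell_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("WARNINGS:") and WARNING_MARKER in stripped:
            return True
    return False


# Trimmed copies of the two runs shown in this post.
BEFORE = """| 20170824 |
+----------+
WARNINGS: Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata.
"""
AFTER = """| 20170824 |
+----------+
Fetched 3 row(s) in 0.12s
"""

print(has_unknown_disk_id_warning(BEFORE), has_unknown_disk_id_warning(AFTER))
```

Matching on the "Unknown disk id" text rather than the full line also catches the older "Backend 0:Unknown disk id" variant from the first post.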
