
BlockMissingException hive query over tez or mr

Contributor

I am running an insert into table select from... query in Hive. Whether I set the execution engine to Tez or MR, I get BlockMissingException errors, all similar to this one:

Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-300459168-127.0.1.1-1478287363661:blk_1073741827_1003 file=/hdp/apps/2.5.0.0-1245/tez/tez.tar.gz

When I go into HDFS, the files are there. They exist. So I thought maybe it was a permissions issue, but all my related proxyuser settings are already hosts=* and groups=* just to rule that out.

I have an HDP 2.5 cluster hosted on Ubuntu 12.04.

Can anyone point me in a direction of what I might be missing here?
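One detail worth noticing in errors like this: the block pool ID (the BP-... prefix) embeds the IP address the NameNode registered with when the filesystem was formatted. A quick sketch for pulling it out of the message above (the parsing is just an illustration, using the error string copied from the post):

```shell
# The block pool ID has the form BP-<random>-<namenode-ip>-<timestamp>.
# Extract the embedded NameNode IP; 127.0.1.1 is Ubuntu's default
# loopback alias, which would suggest the NameNode registered on the
# loopback interface rather than a real network address.
msg='Could not obtain block: BP-300459168-127.0.1.1-1478287363661:blk_1073741827_1003'
bp_ip=$(echo "$msg" | sed -n 's/.*BP-[0-9]*-\([0-9.]*\)-[0-9]*:.*/\1/p')
echo "NameNode registered as: $bp_ip"
```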

1 ACCEPTED SOLUTION

Contributor

I finally figured this out and thought it would be friendly of me to post the solution. It was one of those problems where, once you finally get it, you think, "Ugh, that was so obvious." One important note: if you are having trouble with Hive, make sure to check the YARN logs too!

My solution to this (and many other issues) was ensuring every node has all the other nodes' IP addresses in its hosts file. This ensures Ambari picks up the correct IP for each hostname.

I am on Ubuntu so I did the following:

$ vim /etc/hosts 

And then the file came out looking like this:

127.0.0.1       localhost
#127.0.1.1      ambarihost.com ambarihost
# Assigning static IP here so ambari gets it right
192.168.0.20    ambarihost.com ambarihost 

#Other hadoop nodes
192.168.0.21    kafkahost.com kafkahost
192.168.0.22    hdfshost.com hdfshost
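A quick way to sanity-check a node's hosts file for the problematic Ubuntu default is to look for hostname entries still actively mapped to 127.0.1.1. This is only a sketch run against a sample file like the one above; on a real node you would point it at /etc/hosts:

```shell
# Build a sample hosts file like the one above (the 127.0.1.1 line
# is commented out, as in the fix).
hosts_file=$(mktemp)
cat > "$hosts_file" <<'EOF'
127.0.0.1       localhost
#127.0.1.1      ambarihost.com ambarihost
192.168.0.20    ambarihost.com ambarihost
EOF

# Any hostname still actively mapped to 127.0.1.1 is a problem:
# HDFS daemons on that node may register with the loopback address.
loopback_hosts=$(awk '$1 == "127.0.1.1" {print $2}' "$hosts_file")
if [ -z "$loopback_hosts" ]; then
  echo "OK: no active 127.0.1.1 entries"
else
  echo "WARNING: still mapped to 127.0.1.1: $loopback_hosts"
fi
rm -f "$hosts_file"
```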


4 REPLIES

Contributor

It turns out the permissions issues were ones I had created myself, because the original issue is back. It is this one:

org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-300459168-127.0.1.1-1478287363661:blk_1073741947_1125 file=/tmp/hive/hiveuser/_tez_session_dir/34896acf-3209-4aa0-a244-5a28b5b15b92/hive-hcatalog-core.jar

So that is my real question, I guess!

Expert Contributor

Is there an HDFS Balancer process running?

Contributor

No, there isn't. That would have been nice, wouldn't it?
