CREATE TABLE AS SELECT returns error 'Failed to open HDFS file for writing'

Explorer

I created a Cloudera cluster for Impala.

  • Cloudera version: Cloudera Express 5.8.1
  • Impala version: 2.6.0-cdh5.8.0 RELEASE

If I run the following command via impala-shell:

create table test as select 1;


The following error is returned:

WARNINGS: Failed to open HDFS file for writing: hdfs://[DNhostname]:8020/user/hive/warehouse/test/_impala_insert_staging/[...]/[...].0.
Error(255): Unknown error 255

 

However, if I run:

create table test (testcol int);
insert into test select 1;


...The table is created and data is inserted without a hitch.

 


Any ideas on why the first statement might fail while the second set of commands succeeds, and what I could do to fix it? I might have messed something up with directory permissions, either locally or on HDFS; however, I've set 'dfs.permissions' to false to turn off HDFS permission checking. I don't know what to check on the local folders to ensure the correct user(s) have the right permissions. In either case, I don't know why permissions would cause the 'CREATE TABLE AS SELECT' statement to fail but not the 'CREATE ... INSERT'.
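For reference, the property I'm referring to is the HDFS setting below, shown as a generic hdfs-site.xml sketch rather than a copy from my cluster (on newer Hadoop releases the canonical name is 'dfs.permissions.enabled'; in Cloudera Manager it's the corresponding permission-checking option):

<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>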

 


I should also mention that 'DNhostname' is the hostname of the HDFS datanode/impala daemon that I'm SSHed into, not the hostname of the namenode. This worries me because 'DNhostname' was originally where my namenode was located; I moved it to a different host for reasons outside the scope of this question. Is it possible that 'CREATE TABLE AS SELECT' is still expecting the namenode to be 'DNhostname' for some reason?

1 ACCEPTED SOLUTION

Master Collaborator

@ski309

 

Was the action below performed after moving the namenode?

 

https://www.cloudera.com/documentation/enterprise/5-7-x/topics/admin_nn_migrate_roles.html#concept_f...

 

The Hive Metastore database maintains the locations of tables and databases, so after moving the namenode it is necessary to perform the above step to update those locations in the HMS.
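If the metastore still records locations under the old namenode URI, the Hive metatool can rewrite them in bulk; a rough sketch (the hostnames below are placeholders, and -dryRun previews the change without applying it):

# List the filesystem root(s) currently recorded in the Hive Metastore
hive --service metatool -listFSRoot

# Preview, then apply, the namenode URI change
hive --service metatool -updateLocation hdfs://new-nn-host:8020 hdfs://old-nn-host:8020 -dryRun
hive --service metatool -updateLocation hdfs://new-nn-host:8020 hdfs://old-nn-host:8020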


14 REPLIES

Champion

@ski309

 

This has nothing to do with Impala. If I am correct, the query "create table test as select 1" will not work in most databases (at least 95% of them), because "select 1" returns the data with the column name '1', and that is not a valid column name.

 

create table test (1 int); -- this is an invalid column name

 

 

Also, I put the data type 'int' in myself, but "select 1" will not return any data type. As everyone knows, a column name and a data type are mandatory to create any table, but "select 1" returns neither a valid column name nor a data type.

 

But the query below will work, because it gets the column names and data types from the base table:

create table db.table2 as select * from db.table1
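If the missing column name were the issue, one way to sidestep it (just a sketch, not something run on the cluster in question) would be to alias the literal and cast it so both the column name and data type are explicit:

create table test as select cast(1 as int) as testcol;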

 

 

Hope this will help you!!

 

Thanks

Kumar

Explorer
Thanks for the quick reply. The query 'create table test as select 1' works in a second Impala cluster I have in place, which uses the same Cloudera and Impala versions. It appears the unnamed column in the create statement is automatically named '_c0'. The query 'create table db.table2 as select * from db.table1' fails in the same way in my problem cluster but works correctly in my second, working cluster.

Explorer

After messing around a little more, I found something strange.

 

If I switch to a different database before running the 'create table as select' statement, the table is created and populated without error:

 

create database testdb;
use testdb; create table test as select 1;

 

Champion
This may be a silly question, but does the test table exist prior to running the CTAS statement?

Explorer

I think I've found the problem but I'm unsure how to best fix it.

 

If I run:

desc database default;

The 'location' property shows the hdfs:// URL of the datanode and not the namenode. My second, working cluster's 'location' property shows the URL of the namenode.

 

The problem is there doesn't appear to be an ALTER DATABASE statement. How can I easily change the location property of the default database? The database already has many internal and external tables; I can try to back them up and recreate the default database from scratch, but if there's a way to modify the location property directly, that would be great.

Champion
Ah, that will do it, as all new tables inherit the DB path unless a location is specified in the CREATE TABLE statement. There is no way to alter it through Hive/Impala. You will need to log into the metastore DB and change it there.

You can find it in the <metastore_db_name>.DBS table, and I believe the column is just called LOCATION. Find the id for the default DB and run something like: update DBS set LOCATION = 'hdfs://NN_URI:8020/user/hive/warehouse' where id = <default_db_id>;
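A rough sketch of that update, assuming a MySQL-backed metastore; note that in many Hive metastore schema versions the column is named DB_LOCATION_URI rather than LOCATION, so verify the schema and back up the metastore database before editing anything:

use metastore;                                  -- substitute your metastore database name
describe DBS;                                   -- confirm the column names (often DB_ID and DB_LOCATION_URI)
select DB_ID, NAME, DB_LOCATION_URI from DBS;   -- find the id of the 'default' database
update DBS set DB_LOCATION_URI = 'hdfs://NN_URI:8020/user/hive/warehouse' where DB_ID = <default_db_id>;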


Are you running with HDFS HA (i.e. a setup with an active and a standby NN)? It is a known issue that Impala CTAS may not work with HDFS HA. As a workaround, CREATE+INSERT should work.

Explorer
@alex.behm

No, I'm not. I just missed a final step when I moved our primary NN, which @venkatsambath pointed out.

Thanks!