[CDH 5.4 Upgrade] cannot select from Hive tables

Contributor

I have upgraded from CDH 5.3 to CDH 5.4.

While executing a simple select statement in Hive, it errors out:

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

hive> select * from employees;
FAILED: SemanticException Unable to determine if hdfs://<hive-master>:8020:8020/user/hive/warehouse/employees is encrypted:
java.lang.IllegalArgumentException: Wrong FS: hdfs://<hive-master>:8020:8020/user/hive/warehouse/employees, expected: hdfs://<hive-master>:8020

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

 

1 ACCEPTED SOLUTION

Contributor
If you don't see the problem with tables created after the upgrade, the bad
locations were most likely recorded in the metastore when something went
wrong during the upgrade itself.


13 REPLIES

Contributor

Port 8020 is used by the NameNode.

I am not sure why the HDFS path has it twice!

Does Hive pick up the "fs.defaultFS" property from the HDFS service?
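
One quick check (just a sketch, assuming the Hive CLI picks up the same client configuration): print the property from inside the Hive session and compare it with the Location that the metastore reports for the table.

-- show the value the Hive session actually resolves
SET fs.defaultFS;

-- compare it with the location recorded for the table
DESCRIBE FORMATTED employees;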

 

 

Contributor
What's your setting for "hive.metastore.warehouse.dir"?

Contributor

/user/hive/warehouse

Contributor
What's your setting for "fs.defaultFS"? What's the output of "describe formatted employees"?

Contributor

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://<FQN-host-name>:8020</value>
  </property>

  ...

 

hive> describe formatted employees;
OK
# col_name              data_type               comment

emp_id                  int
name                    string
salary                  double

# Detailed Table Information
Database:               default
Owner:                  dast
CreateTime:             Thu Apr 09 14:57:46 EDT 2015
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://<host-name>:8020:8020/user/hive/warehouse/employees
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        comment                 This is the employees table
        numFiles                1
        numRows                 0
        rawDataSize             0
        totalSize               142
        transient_lastDdlTime   1428607184

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             ,
        serialization.format    ,
Time taken: 0.405 seconds, Fetched: 35 row(s)

 

Contributor
For this table, you can use alter table employees set location
'hdfs://<host-name>:8020/user/hive/warehouse/employees' to fix it. Do you
see the problem for other tables? If you create a new table, does it have
this problem too?
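
For example, something along these lines (a sketch only; substitute your actual NameNode host and warehouse path):

-- point the table back at the correct single-port URI
ALTER TABLE employees SET LOCATION 'hdfs://<host-name>:8020/user/hive/warehouse/employees';

-- verify: the Location line should now show the port only once
DESCRIBE FORMATTED employees;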

Contributor

I have just created a brand new table.

Its HDFS location/path has the port 8020 only ONE time!

How can I revert the existing tables so their locations report port 8020 once instead of twice?

 

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

hive> select * from users;
OK
100      User1   passwd1
200      User2   passwd2
300      User3   passwd3
400      User4   passwd4
500      User5   passwd5
600      User6   passwd6
Time taken: 0.073 seconds, Fetched: 6 row(s)
hive> describe formatted users;
OK
# col_name              data_type               comment

user_id                 int
username                string
passwd                  string

# Detailed Table Information
Database:               default
Owner:                  dast
CreateTime:             Wed May 06 13:32:05 EDT 2015
LastAccessTime:         UNKNOWN
Protect Mode:           None
Retention:              0
Location:               hdfs://<host-name>:8020/user/hive/warehouse/users
Table Type:             MANAGED_TABLE
Table Parameters:
        COLUMN_STATS_ACCURATE   true
        comment                 This is the users table
        numFiles                1
        totalSize               121
        transient_lastDdlTime   1430933556

# Storage Information
SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:            org.apache.hadoop.mapred.TextInputFormat
OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:             No
Num Buckets:            -1
Bucket Columns:         []
Sort Columns:           []
Storage Desc Params:
        field.delim             ,
        serialization.format    ,
Time taken: 0.067 seconds, Fetched: 33 row(s)

<><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><><>

Contributor

All the existing Hive tables are showing the 8020 port twice in their HDFS Location!

What might have caused this during the CDH 5.4 upgrade?

 

Thanks for your assistance!
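
If it helps, I assume the affected locations could also be listed with a read-only query against the metastore backing database (a sketch against the standard metastore schema, i.e. the SDS and DBS tables; column names may differ between schema versions):

-- table/partition storage locations recorded by the metastore
SELECT SD_ID, LOCATION
FROM SDS
WHERE LOCATION LIKE '%:8020:8020%';

-- database-level default locations
SELECT DB_ID, NAME, DB_LOCATION_URI
FROM DBS
WHERE DB_LOCATION_URI LIKE '%:8020:8020%';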

 

Contributor
If you don't see the problem with tables created after the upgrade, the bad
locations were most likely recorded in the metastore when something went
wrong during the upgrade itself.
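
For the tables that already carry the bad location, the per-table alter table ... set location mentioned above is the safest route. If there are too many tables for that, the locations could in principle be corrected directly in the metastore backing database, but only as a last resort: stop the metastore service and back up its database first. A rough sketch, assuming a MySQL-backed metastore with the standard schema (verify the pattern against your own locations before running anything):

-- !! back up the metastore database and stop the metastore service first !!

-- fix table/partition storage locations
UPDATE SDS
SET LOCATION = REPLACE(LOCATION, ':8020:8020/', ':8020/')
WHERE LOCATION LIKE '%:8020:8020%';

-- fix database-level default locations as well
UPDATE DBS
SET DB_LOCATION_URI = REPLACE(DB_LOCATION_URI, ':8020:8020/', ':8020/')
WHERE DB_LOCATION_URI LIKE '%:8020:8020%';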