Support Questions
Find answers, ask questions, and share your expertise

Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

Hi,

When creating Hive table with LOCATION clause, the location is transformed to the full URI:

hive> create table a1 (i int) location 'hdfs:/tmp/a1';

Describing the table, the location is:

hdfs://sandbox.hortonworks.com:8020/tmp/a1

But, when changing the location with SET LOCATION command:

hive> alter table a1 set location 'hdfs:/tmp/a2';

The location remain as typed in the command, i.e. hdfs:/tmp/a2

Now, running SELECT with ORDER BY fails:

hive> select * from a1 order by i;

Query ID = root_20170604162926_0ef4e586-0eae-4db6-9b4f-0bc35a993187 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> java.io.IOException: cannot find dir = hdfs://sandbox.hortonworks.com:8020/tmp/a2/000000_0_copy_1 in pathToPartitionInfo: [hdfs:/tmp/a2] at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:352) at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getPartitionDescFromPathRecursively(HiveFileFormatUtils.java:314) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat$CombineHiveInputSplit.<init>(CombineHiveInputFormat.java:161) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getCombineSplits(CombineHiveInputFormat.java:490) at org.apache.hadoop.hive.ql.io.CombineHiveInputFormat.getSplits(CombineHiveInputFormat.java:571) at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:328) at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320) at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290) at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575) at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:422) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866) at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561) at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:403) at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:140) at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160) at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:89) at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1748) at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1494) at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1291) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1158) at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1148) at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217) at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169) at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380) at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740) at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685) at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.hadoop.util.RunJar.run(RunJar.java:233) at org.apache.hadoop.util.RunJar.main(RunJar.java:148) Job Submission failed with exception 'java.io.IOException(cannot find dir = hdfs://sandbox.hortonworks.com:8020/tmp/a2/000000_0_copy_1 in pathToPartitionInfo: [hdfs:/tmp/a2])' FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. cannot find dir = hdfs://sandbox.hortonworks.com:8020/tmp/a2/000000_0_copy_1 in pathToPartitionInfo: [hdfs:/tmp/a2]

What is the recommended way to enter the location in ALTER TABLE?

The full location is not always available, e.g. when using Knox or HDFS high availability.

Thanks

8 REPLIES 8

Re: Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

When using ALTER TABLE ... SET LOCATION the target directory must already exist (your syntax above should be fine). Please try creating the HDFS directory first and then running the ALTER TABLE command.

Re: Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

The directory already exists when I run the alter table.

BTW, I can run INSERT, it will create new file in the directory.

Also, SELECT without ORDER BY works.

Re: Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

Expert Contributor

@ozac

When you alter the location for hive table, it will not move the data from old location to new location (if any data present). It will only load the data from the new location.

The error shows that the required data file is not present. Make sure that the data file is also present in the new location.

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. cannot find dir =hdfs://sandbox.hortonworks.com:8020/tmp/a2/000000_0_copy_1 in pathToPartitionInfo: [hdfs:/tmp/a2]

Re: Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

Hi, I renamed the HDFS directory before running the "alter table" command, so a directory with the new name exists and it has files. The problem is that given the same location string (e.g. 'hdfs:/tmp/a1'), CREATE TABLE and ALTER TABLE behave differently. - The create table expands the location to full URI (e.g. 'hdfs://sandbox.hortonworks.com:8020/tmp/a1'), and everything works fine. - The alter table does not expand the location. Now, with the short location, some operations are failing. For example, I can run INSERT and SELECT commands, but SELECT with ORDER BY fails with the above error. Also, if I use Hue, navigate to a table that was altered and click on "View File Location" - it fails.

So the short location works for some operations and doesn't work for others. Thanks

Re: Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

Expert Contributor

@ozac

Can you try without renaming the directory from a1 to a2? Create a separate directory with name as a2 and then alter the table.

Re: Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

I'm getting the same results as before.

I used an empty directory for alter table. I can do insert and select, but not select order by.

Re: Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

New Contributor

I' am facing exactly the same problem. Changing location (ALTER TABLE ... SET LOCATION) to an exisiting (renamed directory) allows me to run "SELECT * FROM [table]" but attaching an "ORDER BY" fails with "cannot find dir" (location pointing to a file, not directory). Dropping the external table and creating with the new location works fine.

Any hint on this would be highly appreciated, if anyone already found the cause or a solution/ workaround .

Re: Table location - CREATE TABLE vs. ALTER TABLE...SET LOCATION

New Contributor

Adding the clustername in the ALTER table statement seems to solve it in my case:

So create table seems to work without cluster name, but ALTER TABLE requires the clustername in the SET LOCATION.