Support Questions

cjervis · ‎01-23-2015

Good morning. After my local Hadoop User Group meeting last night, I decided to switch over from using the native "hive" shell to "beeline." I can't remember the exact reasons, but the wonderful speaker made a point of saying users needed to back away from using the native "hive" shell for some very good reasons that I've forgotten after three beers last night.

In any case, I took the advice and fired up beeline this morning. Everything seems to be working well, but when trying to load data, I get an "Invalid path" error. Below you can see that when the file name is not fully qualified, the working directory is set to where the properties and such are stored. That's fine.

Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> load data local inpath 'hourly_TEMP_2014.csv' into table temps_txt;
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''hourly_TEMP_2014.csv'': No files matching path file:/var/run/cloudera-scm-agent/process/24-hive-HIVESERVER2/hourly_TEMP_2014.csv (state=42000,code=40000)

This is fine, I can fully qualify the file. However, even when I do that, I still get that the file is not found.

0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> load data local inpath '/home/cloud/hourly_TEMP_2014.csv' into table temps_txt;
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/home/cloud/hourly_TEMP_2014.csv'': No files matching path file:/home/cloud/hourly_TEMP_2014.csv (state=42000,code=40000)
0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> !quit
Closing: 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud:10000/default
cloud@c-10-206-76-8:~> ls -l /home/cloud/* | grep TEMP
-rw-rw-r-- 1 cloud cloud 1180101268 Jan 22 21:28 /home/cloud/hourly_TEMP_2014.csv

When I issue these commands via the "hive" shell, the file location resolves fine - both relatively and fully qualified. I'm going to upgrade my small cluster to CDH 5.3.0 to see if the Hive version + backports change the behavior, but figured I'd post this to see if anyone has seen this issue with the 5.2.0 release.

Thanks for your time.

(Oh, also in the "Labels" section of this forum, there are only CDH 4.x options to choose from and it's a required field. So I selected 4.6.x, even though this is related to CDH 5.2.0. Just thought I'd note that. Or it could be that I'm in the wrong area. Wouldn't be the first time.)

szehon · ‎01-23-2015

Hi, if it's a local file, then it has to be on the machine that HiveServer2 is running on, as that's the process that is going to do the loading. Can you check that? Thanks.

mcdonaldn · ‎01-23-2015

Thanks Szehon. The file is not on the HiveServer2 machine. I will try to move the file into HDFS and then load from there to see if that works instead. It might be that "LOCAL" is not an option for beeline clients that are not running on the HiveServer2 machine.
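
For anyone landing here later, the plan above (copy the file into HDFS, then load without LOCAL) could be sketched as a small shell helper. This is only a sketch under assumptions, not an official procedure: `stage_and_load`, `run` and the `DRY_RUN` switch are hypothetical names made up for illustration, and it assumes a Hadoop client and beeline are installed on the machine that holds the file.

```shell
#!/usr/bin/env bash
# Sketch of the HDFS workaround. stage_and_load, run and DRY_RUN are
# hypothetical helpers for illustration, not part of Hive or Hadoop.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: stage_and_load LOCAL_FILE HDFS_DIR TABLE JDBC_URL USER
stage_and_load() {
  local local_file="$1" hdfs_dir="$2" table="$3" jdbc_url="$4" user="$5"
  local hdfs_file="${hdfs_dir%/}/$(basename "$local_file")"

  # 1. Copy the file from the client machine into HDFS.
  run hadoop fs -put "$local_file" "$hdfs_dir" || return 1

  # 2. Load from HDFS (no LOCAL keyword), so HiveServer2 never has to
  #    look for the file on its own local file system.
  run beeline -u "$jdbc_url" -n "$user" \
    -e "LOAD DATA INPATH '$hdfs_file' INTO TABLE $table;"
}
```

With the values from this thread, the call would look like `stage_and_load /home/cloud/hourly_TEMP_2014.csv /tmp temps_txt jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default meee`. Setting `DRY_RUN=1` first only prints the two commands. Keep in mind that LOAD DATA INPATH without LOCAL moves the HDFS file into the table's directory rather than copying it, so the staged file is gone from /tmp after a successful load.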

mcdonaldn · ‎01-26-2015

Just wanted to finalize this thread. I was able to successfully reference a file in HDFS for the data load, so that seems like a good option in place of LOCAL. Not sure what you wanted to do about the nuance, but maybe add an indicator to the 'file not found' error message saying that beeline won't be able to accept LOCAL if the beeline client is not running on the HiveServer2 machine? (At my level of understanding, I'd imagine most beeline clients won't be running on the same node as HiveServer2.)

"....file not found. NOTE: LOCAL is not supported in beeline unless the beeline client is running on the same node as the HIVESERVER2. A workaround is to load the file to HDFS and then load from there." Just a thought.

Thanks for your help and prompt response.

cloud@c-192-199-76-8:~> hadoop fs -ls /tmp
Found 7 items
drwxrwxrwx - hdfs supergroup 0 2015-01-26 14:54 /tmp/.cloudera_health_monitoring_canary_files
drwxr-xr-x - cloud supergroup 0 2015-01-22 21:47 /tmp/hive-cloud
drwxrwxrwx - hive supergroup 0 2014-10-28 21:34 /tmp/hive-hive
-rw-r--r-- 3 meee meee 1180101268 2015-01-23 19:52 /tmp/hourly_TEMP_2014.csv
drwxr-xr-x - hdfs supergroup 0 2014-11-07 18:06 /tmp/input
drwxrwxrwt - mapred hadoop 0 2014-11-19 16:45 /tmp/logs
drwxr-xr-x - hdfs supergroup 0 2014-11-07 18:33 /tmp/output
cloud@c-10-206-76-8:~> beeline -u jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default --verbose=true -n meee
issuing: !connect jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default meee''
scan complete in 3ms
Connecting to jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default
Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://c-192-199-76-8.int.cis.trcloud> load data inpath '/tmp/hourly_TEMP_2014.csv' into table temps_txt;
No rows affected (0.556 seconds)
0: jdbc:hive2://c-192-199-76-8.int.cis.trcloud> select avg(degrees) from temps_txt;
+--------------------+--+
| _c0 |
+--------------------+--+
| 56.87016100866962 |
+--------------------+--+
1 row selected (77.389 seconds)

srini.ramineni · ‎06-26-2015

McDonald,

I get the same error even in the Cloudera QuickStart VM 5.3.

As you know, it is a pseudo-distributed Hadoop cluster, meaning all services, including HiveServer2, are on one machine.

So it seems more like a beeline bug than the explanation you provided.

The same statement works with the hive CLI but fails with beeline on the Cloudera 5.3 QuickStart VM.

beeline

0: jdbc:hive2://localhost:10000> LOAD DATA LOCAL INPATH '/home/cloudera/datasets/ml-100k/u.data' INTO TABLE u_data;
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/home/cloudera/datasets/ml-100k/u.data'': No files matching path file:/home/cloudera/datasets/ml-100k/u.data (state=42000,code=40000)
0: jdbc:hive2://localhost:10000> !quit
Closing: 0: jdbc:hive2://localhost:10000

The file /home/cloudera/datasets/ml-100k/u.data is owned by cloudera:cloudera. I even ran chmod o+r on it, but the problem remains.

hive

Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
hive> LOAD DATA LOCAL INPATH '/home/cloudera/datasets/ml-100k/u.data' INTO TABLE u_data;
Copying data from file:/home/cloudera/datasets/ml-100k/u.data
Copying file: file:/home/cloudera/datasets/ml-100k/u.data
Loading data to table default.u_data
Table default.u_data stats: [numFiles=1, numRows=0, totalSize=1979173, rawDataSize=0]
OK
Time taken: 3.136 seconds
hive>

-Thanks

Srini

HARUMUG · ‎07-09-2015

I have the same issue with beeline when loading a local file. The file is local to HiveServer2. It works with Apache Hive and other distributions, so this seems to be a bug with beeline in Cloudera. Is any fix available for this?

udaymannam · ‎11-19-2015

I am also seeing the same exception even though Hive, Beeline, etc. are all on the same machine!

HARUMUG · ‎11-19-2015

This error could be due to missing execute permission on the file path (including all directories in the path). If the permission for the complete path is set to 755, it will work.
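
One way to check this is to walk the path and look at the permission bits for "other" users on each component. A rough sketch, assuming HiveServer2 runs as a different user (for example hive) that is neither the owner nor in the group of the file. `check_path_access` is a hypothetical helper name, and it only parses `ls -ld` output, so ACLs are not considered.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report every component of an absolute path that
# "other" users cannot read (the file) or traverse (its parent directories).
check_path_access() {
  local p="$1" perms status=0

  [ -e "$p" ] || { echo "no such file: $p"; return 2; }

  # Mode string layout: 1 type char, then rwx for user, group, other.
  # The file itself needs the "other" read bit (8th character).
  perms="$(ls -ld "$p" | cut -c1-10)"
  if [ "$(printf '%s' "$perms" | cut -c8)" != "r" ]; then
    echo "not readable by others: $perms $p"
    status=1
  fi

  # Every parent directory needs the "other" execute bit (10th character).
  # "t" means sticky bit plus execute, as on /tmp.
  p="$(dirname "$p")"
  while :; do
    perms="$(ls -ld "$p" | cut -c1-10)"
    case "$(printf '%s' "$perms" | cut -c10)" in
      x|t) ;;
      *) echo "not traversable by others: $perms $p"; status=1 ;;
    esac
    # Stop at the root (or at "." if a relative path was given).
    if [ "$p" = "/" ] || [ "$p" = "." ]; then
      break
    fi
    p="$(dirname "$p")"
  done
  return "$status"
}
```

For example, `check_path_access /home/cloudera/datasets/ml-100k/u.data` would list any directory in that path that the HiveServer2 user could not enter. An empty result with exit status 0 suggests that the "other" permission bits are not the cause.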

priyanka1_munja · ‎09-20-2016

Hi

Loading the data from HDFS or from a local file works, but every column shows NULL instead of the data. Kindly help:

1) While loading the data from HDFS:

2) While loading the data from local:

NaveenGangam · ‎09-21-2016

Hey,

I just posted a reply to the other thread you created.

Cloudera Community

beeline "Invalid Path" "No files matching path file"