beeline "Invalid Path" "No files matching path file"

Good morning.  After my local Hadoop User Group meeting last night, I decided to switch from the native "hive" shell to "beeline."  I can't remember the exact reasons, but the excellent speaker made a point of saying users needed to back away from the native "hive" shell for some very good reasons that I forgot after three beers last night.

In any case, I took the advice and fired up beeline this morning.  Everything seems to work well, but when I try to load data, I get an "Invalid Path" error.  Below you can see that when I don't fully qualify the file name, the relative path is resolved against the HiveServer2 process directory, where the properties and such are stored.  That's fine.

Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> load data local inpath 'hourly_TEMP_2014.csv' into table temps_txt;
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''hourly_TEMP_2014.csv'': No files matching path file:/var/run/cloudera-scm-agent/process/24-hive-HIVESERVER2/hourly_TEMP_2014.csv (state=42000,code=40000)

That's fine; I can fully qualify the path.  However, even when I do, the file is still not found.

0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> load data local inpath '/home/cloud/hourly_TEMP_2014.csv' into table temps_txt;
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/home/cloud/hourly_TEMP_2014.csv'': No files matching path file:/home/cloud/hourly_TEMP_2014.csv (state=42000,code=40000)
0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> !quit
Closing: 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud:10000/default
cloud@c-10-206-76-8:~> ls -l /home/cloud/* | grep TEMP
-rw-rw-r-- 1 cloud cloud 1180101268 Jan 22 21:28 /home/cloud/hourly_TEMP_2014.csv

When I issue these commands via the "hive" shell, the file location resolves fine, with both relative and fully qualified paths.  I'm going to upgrade my small cluster to CDH 5.3.0 to see if the newer Hive version and its backports change the behavior, but figured I'd post this to see if anyone has seen this issue with the 5.2.0 release.

Thanks for your time.

(Oh, also: in the "Labels" section of this forum, there are only CDH 4.x options to choose from, and it's a required field.  So I selected 4.6.x, even though this is related to CDH 5.2.0.  Just thought I'd note that.  Or it could be that I'm in the wrong area.  Wouldn't be the first time.)

Accepted solution

Just wanted to finalize this thread.  I was able to successfully reference a file in HDFS for the data load, so that seems like a good option in place of LOCAL.  Not sure what you want to do about this nuance, but maybe the 'file not found' error message could indicate that beeline can't accept LOCAL if the beeline client is not running on the HiveServer2 host?  (At my level of understanding, I'd imagine most beeline clients won't be running on the same node as HiveServer2.)

"....file not found.  NOTE:  LOCAL is not supported in beeline unless the beeline client is running on the same node as HiveServer2.  A workaround is to copy the file to HDFS and then load it from there."  Just a thought.

Thanks for your help and prompt response.

cloud@c-192-199-76-8:~> hadoop fs -ls /tmp
Found 7 items
drwxrwxrwx - hdfs supergroup 0 2015-01-26 14:54 /tmp/.cloudera_health_monitoring_canary_files
drwxr-xr-x - cloud supergroup 0 2015-01-22 21:47 /tmp/hive-cloud
drwxrwxrwx - hive supergroup 0 2014-10-28 21:34 /tmp/hive-hive
-rw-r--r-- 3 meee meee 1180101268 2015-01-23 19:52 /tmp/hourly_TEMP_2014.csv
drwxr-xr-x - hdfs supergroup 0 2014-11-07 18:06 /tmp/input
drwxrwxrwt - mapred hadoop 0 2014-11-19 16:45 /tmp/logs
drwxr-xr-x - hdfs supergroup 0 2014-11-07 18:33 /tmp/output
cloud@c-10-206-76-8:~> beeline -u jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default --verbose=true -n meee
issuing: !connect jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default meee''
scan complete in 3ms
Connecting to jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default
Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://c-192-199-76-8.int.cis.trcloud> load data inpath '/tmp/hourly_TEMP_2014.csv' into table temps_txt;
No rows affected (0.556 seconds)
0: jdbc:hive2://c-192-199-76-8.int.cis.trcloud> select avg(degrees) from temps_txt;
+--------------------+--+
| _c0 |
+--------------------+--+
| 56.87016100866962 |
+--------------------+--+
1 row selected (77.389 seconds)
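
For anyone hitting the same error: with beeline, a LOCAL path is resolved on the HiveServer2 host rather than on the client, which is why the relative path in the question landed in the HIVESERVER2 process directory. The HDFS workaround above can be scripted roughly as follows. This is a minimal sketch, not an official tool: the names `stage_and_load`, `run`, and the `DRY_RUN` switch are hypothetical, and it assumes `hadoop` and `beeline` are on the client's PATH and that this beeline build accepts `-e` for an inline statement.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hive or CDH): stage a client-side file in
# HDFS, then have HiveServer2 load it with LOAD DATA INPATH (no LOCAL).
# Set DRY_RUN=1 to print the commands instead of executing them.

run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"     # dry run: show the command on one line
  else
    "$@"                   # real run: execute the command as given
  fi
}

# Usage: stage_and_load <local_file> <hdfs_dir> <table> <jdbc_url> <user>
stage_and_load() {
  local local_file="$1" hdfs_dir="$2" table="$3" jdbc_url="$4" user="$5"
  # HDFS path the file will have after the put (trailing slash trimmed).
  local hdfs_path="${hdfs_dir%/}/$(basename "$local_file")"

  # 1. Copy from the client's local disk into HDFS; this runs on the client.
  run hadoop fs -put "$local_file" "$hdfs_dir"

  # 2. Ask HiveServer2 to load from HDFS, so the path is looked up in HDFS
  #    instead of on the HiveServer2 host's local filesystem.
  run beeline -u "$jdbc_url" -n "$user" \
    -e "LOAD DATA INPATH '$hdfs_path' INTO TABLE $table"
}
```

For example, `DRY_RUN=1 stage_and_load /home/cloud/hourly_TEMP_2014.csv /tmp temps_txt jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default meee` prints the two commands without running them. Keep in mind that LOAD DATA INPATH moves (rather than copies) the file from the staging directory into the table's location, so the staging copy is gone afterwards, and the user HiveServer2 acts as may need write access to that staging directory.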

Replies

Thanks

Replied to the other thread regarding the questions you posted. Hope this helps. Let's move this conversation to the other thread. Please accept the solution to close this thread. Thanks.

This (NULL issue) can be caused by data schema mismatch.