Good morning. After my local Hadoop User Group meeting last night, I decided to switch from the native "hive" shell to "beeline." I can't remember the exact reasons, but the wonderful speaker made a point of saying users needed to back away from the native "hive" shell for some very good reasons that I've forgotten after three beers last night.
In any case, I took the advice and fired up beeline this morning. Everything seems to be working well, but when trying to load data, I get an "Invalid Path" error. Below you can see that when I don't fully qualify the file name, the path is resolved against a working directory where the properties and such are stored. That's fine.
Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> load data local inpath 'hourly_TEMP_2014.csv' into table temps_txt;
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''hourly_TEMP_2014.csv'': No files matching path file:/var/run/cloudera-scm-agent/process/24-hive-HIVESERVER2/hourly_TEMP_2014.csv (state=42000,code=40000)
This is fine; I can fully qualify the file. However, even when I do that, I still get an error saying the file is not found.
0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> load data local inpath '/home/cloud/hourly_TEMP_2014.csv' into table temps_txt;
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/home/cloud/hourly_TEMP_2014.csv'': No files matching path file:/home/cloud/hourly_TEMP_2014.csv (state=42000,code=40000)
0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud> !quit
Closing: 0: jdbc:hive2://c-10-206-76-8.int.cis.trcloud:10000/default
cloud@c-10-206-76-8:~> ls -l /home/cloud/* | grep TEMP
-rw-rw-r-- 1 cloud cloud 1180101268 Jan 22 21:28 /home/cloud/hourly_TEMP_2014.csv
When I issue these commands via the "hive" shell, the file location resolves fine - both relatively and fully qualified. I'm going to upgrade my small cluster to CDH 5.3.0 to see if the Hive version + backports change the behavior, but figured I'd post this to see if anyone has seen this issue with the 5.2.0 release.
Thanks for your time.
(Oh, also, in the "Labels" section of this forum, there are only CDH 4.x options to choose from, and it's a required field. So I selected 4.6.x, even though this is related to CDH 5.2.0. Just thought I'd note that. Or it could be that I'm in the wrong area. Wouldn't be the first time.)
Created 01-26-2015 07:56 AM
Just wanted to finalize this thread. I was able to successfully reference a file in HDFS for the data load, so that seems like a good option in place of LOCAL. Not sure what you want to do about the nuance, but maybe the 'file not found' error message could indicate that beeline won't be able to accept LOCAL if the beeline client is not running on the HiveServer2 host? (At my level of understanding, I'd imagine most beeline clients won't be running on the same node as HiveServer2.)
"....file not found. NOTE: LOCAL is not supported in beeline unless the beeline client is running on the same node as HiveServer2. A workaround is to load the file to HDFS and then load from there." Just a thought.
Thanks for your help and prompt response.
cloud@c-192-199-76-8:~> hadoop fs -ls /tmp
Found 7 items
drwxrwxrwx - hdfs supergroup 0 2015-01-26 14:54 /tmp/.cloudera_health_monitoring_canary_files
drwxr-xr-x - cloud supergroup 0 2015-01-22 21:47 /tmp/hive-cloud
drwxrwxrwx - hive supergroup 0 2014-10-28 21:34 /tmp/hive-hive
-rw-r--r-- 3 meee meee 1180101268 2015-01-23 19:52 /tmp/hourly_TEMP_2014.csv
drwxr-xr-x - hdfs supergroup 0 2014-11-07 18:06 /tmp/input
drwxrwxrwt - mapred hadoop 0 2014-11-19 16:45 /tmp/logs
drwxr-xr-x - hdfs supergroup 0 2014-11-07 18:33 /tmp/output
cloud@c-10-206-76-8:~> beeline -u jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default --verbose=true -n meee
issuing: !connect jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default meee''
scan complete in 3ms
Connecting to jdbc:hive2://c-192-199-76-8.int.cis.trcloud:10000/default
Connected to: Apache Hive (version 0.13.1-cdh5.2.0)
Driver: Hive JDBC (version 0.13.1-cdh5.2.0)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 0.13.1-cdh5.2.0 by Apache Hive
0: jdbc:hive2://c-192-199-76-8.int.cis.trcloud> load data inpath '/tmp/hourly_TEMP_2014.csv' into table temps_txt;
No rows affected (0.556 seconds)
0: jdbc:hive2://c-192-199-76-8.int.cis.trcloud> select avg(degrees) from temps_txt;
+--------------------+--+
| _c0 |
+--------------------+--+
| 56.87016100866962 |
+--------------------+--+
1 row selected (77.389 seconds)
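For anyone following this workaround, the one step the session above does not show is copying the file from the client machine into HDFS. Here is a minimal sketch of both steps, assuming the hadoop and beeline CLIs are on the client's PATH. The function names run and stage_and_load and the DRY_RUN switch are made up for illustration; they are not part of Hive or Hadoop.

```shell
#!/usr/bin/env bash
# Sketch of the "stage in HDFS, then load without LOCAL" workaround.
# Hypothetical helpers: run, stage_and_load, and the DRY_RUN variable.

# Print the command instead of executing it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: stage_and_load LOCAL_FILE HDFS_DIR TABLE JDBC_URL USER
stage_and_load() {
  local local_file="$1"
  local hdfs_dir="$2"
  local table="$3"
  local url="$4"
  local user="$5"
  local hdfs_path
  hdfs_path="${hdfs_dir%/}/$(basename "$local_file")"

  # Step 1: copy from the client's local filesystem into HDFS.
  run hadoop fs -put "$local_file" "$hdfs_path" || return 1

  # Step 2: load from HDFS. There is no LOCAL keyword, so HiveServer2
  # reads the path from HDFS rather than from its own local disk.
  run beeline -u "$url" -n "$user" \
    -e "load data inpath '$hdfs_path' into table $table;"
}
```

Running it with DRY_RUN=1 first prints the two commands without touching the cluster. One thing to keep in mind: LOAD DATA INPATH moves the HDFS file into the table's directory rather than copying it, so the staged file will no longer be in /tmp afterwards.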
Created 01-23-2015 12:04 PM
Hi, if it's a local file, then it has to be on the machine that HiveServer2 is running on, as that's the process that is going to do the loading. Can you check that? Thanks.
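Since HiveServer2 is the process that opens the file, the file has to be readable by whatever OS account that process runs as, not just by the user who launched beeline. The sketch below is one way to check that, under an assumption this thread does not confirm: that HiveServer2 runs as a different user with no group membership or ACL granting access, so only the "other" permission bits apply. The function names other_perms and check_world_access are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report path components that an unrelated OS user could not use.
# Assumption: only the "other" permission bits matter (no group/ACL access).

# Print the three "other" permission characters of a path, e.g. "r-x".
other_perms() {
  ls -ld "$1" | cut -c8-10
}

# Usage: check_world_access /absolute/path/to/file
# Prints one line per blocking component; prints nothing if none block.
check_world_access() {
  local file="$1"
  local dir

  if [ ! -e "$file" ]; then
    printf 'no such file: %s\n' "$file"
    return 1
  fi

  # The file itself needs the read bit for "other".
  case "$(other_perms "$file")" in
    r*) ;;
    *)  printf 'not world-readable: %s\n' "$file" ;;
  esac

  # Every parent directory needs the execute (search) bit for "other".
  # A "t" in that position means sticky bit plus execute.
  dir=$(dirname "$file")
  while :; do
    case "$(other_perms "$dir")" in
      ??x|??t) ;;
      *) printf 'not world-traversable: %s\n' "$dir" ;;
    esac
    if [ "$dir" = "/" ] || [ "$dir" = "." ]; then
      break
    fi
    dir=$(dirname "$dir")
  done
}
```

For example, check_world_access /home/cloud/hourly_TEMP_2014.csv, run on the HiveServer2 host, would list the file or any parent directory that blocks access under that assumption. An empty result does not prove the load will succeed; it only rules out this one cause.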
Created 01-23-2015 01:06 PM
Created 06-26-2015 09:25 AM
McDonald,
I get the same error even in the Cloudera QuickStart VM 5.3.
As you know, it is a pseudo-distributed Hadoop cluster, meaning all services, including HiveServer2, run on one machine.
So it seems more like a beeline bug than the explanation you provided.
The same statement works with the hive CLI but fails with beeline on the Cloudera 5.3 QuickStart VM.
beeline
0: jdbc:hive2://localhost:10000> LOAD DATA LOCAL INPATH '/home/cloudera/datasets/ml-100k/u.data' INTO TABLE u_data;
Error: Error while compiling statement: FAILED: SemanticException Line 1:23 Invalid path ''/home/cloudera/datasets/ml-100k/u.data'': No files matching path file:/home/cloudera/datasets/ml-100k/u.data (state=42000,code=40000)
0: jdbc:hive2://localhost:10000> !quit
Closing: 0: jdbc:hive2://localhost:10000
The file /home/cloudera/datasets/ml-100k/u.data is owned by cloudera:cloudera. I even ran chmod o+r on it, but I still get the same error.
hive
Logging initialized using configuration in file:/etc/hive/conf.dist/hive-log4j.properties
hive> LOAD DATA LOCAL INPATH '/home/cloudera/datasets/ml-100k/u.data' INTO TABLE u_data;
Copying data from file:/home/cloudera/datasets/ml-100k/u.data
Copying file: file:/home/cloudera/datasets/ml-100k/u.data
Loading data to table default.u_data
Table default.u_data stats: [numFiles=1, numRows=0, totalSize=1979173, rawDataSize=0]
OK
Time taken: 3.136 seconds
hive>
-Thanks
Srini
Created 07-09-2015 11:05 AM
I have the same issue with beeline when loading a local file. The file is local to HiveServer2. It works with Apache Hive and other distributions, so this seems to be a bug with beeline in Cloudera. Is there a fix available for this?
Created 11-19-2015 04:09 AM
I am also seeing the same exception, even though Hive, Beeline, etc. are all on the same machine!
Created 11-19-2015 11:26 AM
Created 09-20-2016 05:51 PM
Hi
Loading the data from HDFS or from local works, but every column shows NULL instead of the data. Kindly help.
1) While loading the data from HDFS:
0: jdbc:hive2://localhost:10000> create table transaction(sr int,id int,amount int,product string,city string,date string);
No rows affected (0.114 seconds)
0: jdbc:hive2://localhost:10000> show tables;
+---------------+--+
| tab_name |
+---------------+--+
| transaction |
| transaction1 |
+---------------+--+
2 rows selected (0.046 seconds)
0: jdbc:hive2://localhost:10000> load data inpath '/priyanka/txn' into table transaction;
No rows affected (0.568 seconds)
0: jdbc:hive2://localhost:10000> select * from transaction;
+-----------------+-----------------+---------------------+----------------------+-------------------+-------------------+--+
| transaction.sr | transaction.id | transaction.amount | transaction.product | transaction.city | transaction.date |
+-----------------+-----------------+---------------------+----------------------+-------------------+-------------------+--+
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
+-----------------+-----------------+---------------------+----------------------+-------------------+-------------------+--+
7 rows selected (0.154 seconds)
2) While loading the data from local:
0: jdbc:hive2://localhost:10000> LOAD DATA LOCAL INPATH 'home/cloudera/txn' INTO table transaction1;
No rows affected (2.394 seconds)
0: jdbc:hive2://localhost:10000> select * from transaction1;
+------------------+------------------+----------------------+-----------------------+--------------------+--------------------+--+
| transaction1.sr | transaction1.id | transaction1.amount | transaction1.product | transaction1.city | transaction1.date |
+------------------+------------------+----------------------+-----------------------+--------------------+--------------------+--+
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
| NULL | NULL | NULL | NULL | NULL | NULL |
+------------------+------------------+----------------------+-----------------------+--------------------+--------------------+--+
7 rows selected (1.279 seconds)
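One possible cause, offered as an assumption because the contents of the txn file are not shown here: the CREATE TABLE statement above has no ROW FORMAT clause, so Hive expects its default field delimiter, Ctrl-A (\001). If the file is actually comma- or tab-separated, the lines will not split into the declared columns and the values can come back as NULL. The sketch below inspects the first line of a local copy of the file; guess_delimiter and row_format_hint are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: guess the field separator of a text file from its first line,
# then print a matching ROW FORMAT clause for a Hive CREATE TABLE.

# Usage: guess_delimiter FILE
# Prints one of: ctrl-a, tab, comma, unknown
guess_delimiter() {
  local first_line ctrl_a tab
  first_line=$(head -n 1 "$1")
  ctrl_a=$(printf '\001')
  tab=$(printf '\t')
  case "$first_line" in
    *"$ctrl_a"*) echo 'ctrl-a' ;;
    *"$tab"*)    echo 'tab' ;;
    *,*)         echo 'comma' ;;
    *)           echo 'unknown' ;;
  esac
}

# Usage: row_format_hint DELIMITER_NAME
# Maps the guess to the clause to append to CREATE TABLE.
row_format_hint() {
  case "$1" in
    comma)  printf '%s\n' "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','" ;;
    tab)    printf '%s\n' "ROW FORMAT DELIMITED FIELDS TERMINATED BY '\\t'" ;;
    ctrl-a) printf '%s\n' "no ROW FORMAT clause needed (matches Hive's default)" ;;
    *)      printf '%s\n' "delimiter not recognised; inspect the file by hand" ;;
  esac
}
```

For instance, row_format_hint "$(guess_delimiter /home/cloudera/txn)" prints the clause to add when recreating the table. Because the clause only takes effect at table creation here, the table would need to be dropped and recreated with it before reloading the data.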
Created 09-21-2016 01:47 PM
Hey,
I just posted a reply to the other thread you created.