Created 10-01-2017 01:47 PM
I created a table using the Java client:
CREATE TABLE csvdemo (id Int, name String, email String) STORED AS PARQUET
I used the Java Hadoop FileSystem API to copy the CSV file from local into HDFS.
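The copy step looks roughly like this (a minimal sketch, not my exact code; the host and paths match the ones below, and error handling is omitted):

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CsvUpload {
    public static void main(String[] args) throws Exception {
        // Connect to the cluster's default filesystem
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(URI.create("hdfs://my-host:8020"), conf);
        // Copy the local CSV into the admin user's HDFS home directory
        fs.copyFromLocalFile(new Path("MOCK_DATA.csv"), new Path("/user/admin/MOCK_DATA.csv"));
        fs.close();
    }
}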
When I run this load command it looks successful (running from Ambari):
load data inpath '/user/admin/MOCK_DATA.csv' into table csvdemo;
But when I try to read from it using:
select * from csvdemo limit 1;
I get this error:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: hdfs://my-host:8020/apps/hive/warehouse/csvdemo/MOCK_DATA.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 111, 118, 10]
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
	at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)
	at org.apache.ambari.view.hive20.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)
	at org.apache.ambari.view.hive20.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:78)
	at org.apache.ambari.view.hive20.actor.HiveActor.onReceive(HiveActor.java:38)
	at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
	at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: hdfs://my-host:8020/apps/hive/warehouse/csvdemo/MOCK_DATA.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 111, 118, 10]
	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:414)
	at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:233)
	at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:784)
	at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy29.fetchResults(Unknown Source)
	at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:520)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:709)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1557)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1542)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.RuntimeException: hdfs://my-host:8020/apps/hive/warehouse/csvdemo/MOCK_DATA.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 111, 118, 10]
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:520)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:427)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1765)
	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:409)
	... 24 more
Caused by: java.lang.RuntimeException: hdfs://my-host:8020/apps/hive/warehouse/csvdemo/MOCK_DATA.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 111, 118, 10]
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:423)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:372)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:255)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:97)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:83)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:694)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:332)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:458)
	... 28 more
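For context on the error itself: [80, 65, 82, 49] is the ASCII "PAR1" magic footer that every Parquet file ends with, and LOAD DATA only moves the file into the table's warehouse directory without converting it, so the Parquet reader is hitting raw CSV bytes. A common way to get CSV data into a Parquet table (a sketch using the names from this thread; the staging table csvdemo_staging is hypothetical) is to load the CSV into a plain text table first and then insert into the Parquet table:

-- Hypothetical staging table matching the CSV layout
CREATE TABLE csvdemo_staging (id INT, name STRING, email STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- LOAD DATA just moves the file, which is fine for a text table
LOAD DATA INPATH '/user/admin/MOCK_DATA.csv' INTO TABLE csvdemo_staging;

-- INSERT ... SELECT rewrites the rows as real Parquet files
INSERT INTO TABLE csvdemo SELECT * FROM csvdemo_staging;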
Created 10-02-2017 12:34 PM
Thanks @Yair Ogen,
1. Please share the output of the command below:
hadoop fs -ls /apps/hive/
2. Run the insert statement again and attach hiveserver2.log.
Created 10-02-2017 01:10 PM
hadoop fs -ls /apps/hive/
Yields
Found 1 items
drwxrwxrwx   - hive hadoop          0 2017-10-02 14:00 /apps/hive/warehouse
You mean the Load statement and not the insert, right?
Attached are the error and the last lines from the log.
Created 10-03-2017 04:27 AM
In this case we see that the user "hive" is actually trying to write inside the "/user/admin" directory:
Permission denied: user=hive, access=WRITE, inode="/user/admin/MOCK_DATA.csv":admin:hadoop:drwxr-xr-x
LOAD DATA INPATH moves the source file into the warehouse, so the user running the statement needs WRITE permission on the source directory. So either you should give the "hive" user write access to "/user/admin" (as the error shows, it does not have WRITE permission there), or you should run the job as the "admin" user with the following setup.
If you want to run as "admin":
# hdfs dfs -chown admin:hadoop /user/admin
# hdfs dfs -chmod 777 /user/admin
Or, if you want to run the Hive job as the "hive" user, change the ownership to the "hive" user and set the permissions accordingly:
# hdfs dfs -chown hive:hadoop /user/admin
# hdfs dfs -chmod 777 /user/admin
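In either case, you can confirm the resulting ownership and permissions on the directory itself with:
# hdfs dfs -ls -d /user/admin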
Created 10-03-2017 04:36 AM
So, now it works. I just connected as the admin user instead of the hive user, and everything works fine.
Thanks for your help.
Created 10-03-2017 04:59 AM
Good to know that it works now. It would be great if you could mark this HCC thread as "Accepted" (Answered) so that other HCC users can quickly find the solution for this issue instead of reading the whole thread.