Support Questions

Error loading csv file using hive

Contributor

I created a table using the Java client:

CREATE TABLE csvdemo (id Int, name String, email String) STORED AS PARQUET

I used the Java Hadoop FileSystem API to copy the CSV file from the local filesystem into HDFS.

When I run this load command, it looks successful (running from Ambari):

load data inpath '/user/admin/MOCK_DATA.csv' into table csvdemo;

But when I try to read from it using:

select * from csvdemo limit 1;

I get this error:

org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: hdfs://my-host:8020/apps/hive/warehouse/csvdemo/MOCK_DATA.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 111, 118, 10]
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:264)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:250)
	at org.apache.hive.jdbc.HiveQueryResultSet.next(HiveQueryResultSet.java:373)
	at org.apache.ambari.view.hive20.actor.ResultSetIterator.getNext(ResultSetIterator.java:119)
	at org.apache.ambari.view.hive20.actor.ResultSetIterator.handleMessage(ResultSetIterator.java:78)
	at org.apache.ambari.view.hive20.actor.HiveActor.onReceive(HiveActor.java:38)
	at akka.actor.UntypedActor$$anonfun$receive$1.applyOrElse(UntypedActor.scala:167)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:467)
	at akka.actor.UntypedActor.aroundReceive(UntypedActor.scala:97)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
	at akka.dispatch.Mailbox.run(Mailbox.scala:220)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:397)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: hdfs://my-host:8020/apps/hive/warehouse/csvdemo/MOCK_DATA.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 111, 118, 10]
	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:414)
	at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:233)
	at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:784)
	at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:78)
	at org.apache.hive.service.cli.session.HiveSessionProxy.access$000(HiveSessionProxy.java:36)
	at org.apache.hive.service.cli.session.HiveSessionProxy$1.run(HiveSessionProxy.java:63)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
	at org.apache.hive.service.cli.session.HiveSessionProxy.invoke(HiveSessionProxy.java:59)
	at com.sun.proxy.$Proxy29.fetchResults(Unknown Source)
	at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:520)
	at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:709)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1557)
	at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1542)
	at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
	at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
	at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
	at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.RuntimeException: hdfs://my-host:8020/apps/hive/warehouse/csvdemo/MOCK_DATA.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 111, 118, 10]
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:520)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:427)
	at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:146)
	at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1765)
	at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:409)
	... 24 more
Caused by: java.lang.RuntimeException: hdfs://my-host:8020/apps/hive/warehouse/csvdemo/MOCK_DATA.csv is not a Parquet file. expected magic number at tail [80, 65, 82, 49] but found [103, 111, 118, 10]
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:423)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:386)
	at org.apache.parquet.hadoop.ParquetFileReader.readFooter(ParquetFileReader.java:372)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.getSplit(ParquetRecordReaderWrapper.java:255)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:97)
	at org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.<init>(ParquetRecordReaderWrapper.java:83)
	at org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat.getRecordReader(MapredParquetInputFormat.java:71)
	at org.apache.hadoop.hive.ql.exec.FetchOperator$FetchInputFormatSplit.getRecordReader(FetchOperator.java:694)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:332)
	at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:458)
	... 28 more
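
The magic numbers in the message decode the problem directly: a valid Parquet file ends with the 4-byte magic `PAR1` (bytes [80, 65, 82, 49]), while the bytes found here, [103, 111, 118, 10], are the ASCII text `gov` plus a newline, i.e. the tail of a plain CSV file. A minimal local sanity check (a sketch; the file path is hypothetical):

```python
import os

PARQUET_MAGIC = b"PAR1"  # bytes [80, 65, 82, 49]; Parquet files end with this

def looks_like_parquet(path: str) -> bool:
    """Return True if the file ends with the Parquet magic bytes."""
    if os.path.getsize(path) < 4:
        return False
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)  # read only the last four bytes
        return f.read(4) == PARQUET_MAGIC

# e.g. looks_like_parquet("MOCK_DATA.csv") is False for any plain CSV
```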

1 ACCEPTED SOLUTION

Contributor

@Jay SenSharma @Shu

So - now it works. I just connected as the admin user instead of the hive user, and now it all works fine.

Thanks for your help.


14 REPLIES

Master Guru

Thanks @Yair Ogen,

1. Please share the output of the command below:

hadoop fs -ls /apps/hive/

2. Run the insert statement again and attach hiveserver2.log.

Contributor

@Shu

hadoop fs -ls /apps/hive/

Yields

Found 1 items
drwxrwxrwx   - hive hadoop          0 2017-10-02 14:00 /apps/hive/warehouse

You mean the Load statement and not the insert, right?

Attached are the error and the last lines from the log.


Master Mentor

@Yair Ogen

In this case we see that the user "hive" is actually trying to write data inside the "/user/admin" directory:

Permission denied: user=hive, access=WRITE, inode="/user/admin/MOCK_DATA.csv":admin:hadoop:drwxr-xr-x


So you should either give write access to the "hive" user on the mentioned directory "/user/admin/" (as we can see, it currently lacks the "WRITE" permission), or run the job as the "admin" user with the following setup.

If you want to run as "admin"

# hdfs dfs -chown admin:hadoop /user/admin
# hdfs dfs -chmod 777 /user/admin


Or, if you want to run the Hive job as the "hive" user, then change the ownership to the "hive" user and adjust the permissions accordingly:

# hdfs dfs -chown hive:hadoop /user/admin
# hdfs dfs -chmod 777 /user/admin
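
As a side note on what these mode bits mean: `drwxr-xr-x` on the inode in the error grants write access only to the owner (`admin`), which is why the `hive` user is rejected, while `chmod 777` opens write access to everyone. HDFS uses the same POSIX-style permission bits as a local filesystem, so the mapping can be sketched locally in Python (not HDFS-specific):

```python
import stat

def describe(mode: int, is_dir: bool = True) -> str:
    """Render octal permission bits as an ls-style mode string."""
    file_type = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(file_type | mode)

print(describe(0o755))  # drwxr-xr-x : only the owner may write
print(describe(0o777))  # drwxrwxrwx : owner, group, and others may write
```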

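
One more point worth flagging, since the stack trace complains about the file format rather than permissions: LOAD DATA only moves the file into the table's warehouse directory and does not convert CSV to Parquet. The usual pattern is to load the CSV into a TEXTFILE staging table and insert from there. A sketch reusing the question's table and columns (the staging table name is made up):

```sql
-- Staging table whose storage format matches the raw CSV
CREATE TABLE csvdemo_staging (id INT, name STRING, email STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Moves the file as-is into the staging table's directory
LOAD DATA INPATH '/user/admin/MOCK_DATA.csv' INTO TABLE csvdemo_staging;

-- Hive rewrites the rows as Parquet during this insert
INSERT INTO TABLE csvdemo SELECT id, name, email FROM csvdemo_staging;
```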


Master Mentor
@Yair Ogen

Good to know that it works now. It would be great if you could mark this HCC thread as "Accepted" (answered) so that other HCC users can quickly find the solution to this issue instead of reading the whole thread.