Hive INSERT-SELECT: cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
Labels: Apache Hive
Created 04-04-2017 07:53 AM
Scenario
I've created three tables with different SerDes; the intention is to insert records from table test_7 into tables test_8 and test_9.
CREATE TABLE default.test_7 (col INT);

CREATE TABLE default.test_8 (col INT)
STORED AS ORC
tblproperties ('orc.compress'='ZLIB');

CREATE TABLE default.test_9 (col INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
I managed to insert a record from test_7 into test_8, but not from test_7 into test_9, which fails with the error (Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow).
-- Successful
INSERT INTO default.test_7 VALUES (1);

-- Successful
INSERT INTO default.test_8 SELECT * FROM default.test_7;

-- Failed with Error: Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text
-- cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
INSERT INTO default.test_9 SELECT * FROM default.test_7;

-- Failed with the same ClassCastException
INSERT INTO default.test_9 SELECT CAST(col AS INT) FROM default.test_7;
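A quick way to compare what each table is actually wired to is DESCRIBE FORMATTED; the SerDe Library line is where test_8 and test_9 differ (the full DESCRIBE EXTENDED output is further below):

-- The Storage Information section lists SerDe, InputFormat and OutputFormat.
-- The SerDe values in the comments below come from the describe output later in this post:
DESCRIBE FORMATTED default.test_8;  -- SerDe Library: org.apache.hadoop.hive.ql.io.orc.OrcSerde
DESCRIBE FORMATTED default.test_9;  -- SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe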
Questions
1. What is the relationship between STORED AS and ROW FORMAT?
2. Are there any compatible/incompatible combinations of SerDe and storage format? (e.g. table default.test_9 uses LazySimpleSerDe with OrcInputFormat/OrcOutputFormat as the storage format)
Describe Table default.test_8
hive> describe extended default.test_8;
OK
col                     int

Detailed Table Information
Table(tableName:test_8, dbName:default, owner:SRV-TEST, createTime:1491286688, lastAccessTime:0, retention:0,
  sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)],
    location:hdfs://DEVHDP/apps/hive/warehouse/test_8,
    inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,
    outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,
    compressed:false, numBuckets:-1,
    serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}),
    bucketCols:[], sortCols:[], parameters:{},
    skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}),
    storedAsSubDirectories:false),
  partitionKeys:[],
  parameters:{totalSize=206, rawDataSize=4, numRows=1, COLUMN_STATS_ACCURATE=true, numFiles=1, orc.compress=ZLIB, transient_lastDdlTime=1491286709},
  viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.336 seconds, Fetched: 3 row(s)
Describe Table default.test_9
hive> describe extended default.test_9;
OK
col                     int

Detailed Table Information
Table(tableName:test_9, dbName:default, owner:SRV-TEST, createTime:1491286787, lastAccessTime:0, retention:0,
  sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)],
    location:hdfs://DEVHDP/apps/hive/warehouse/test_9,
    inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat,
    outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat,
    compressed:false, numBuckets:-1,
    serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=, field.delim=}),
    bucketCols:[], sortCols:[], parameters:{},
    skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}),
    storedAsSubDirectories:false),
  partitionKeys:[],
  parameters:{transient_lastDdlTime=1491286787},
  viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.337 seconds, Fetched: 3 row(s)
Complete Error
], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col":1}
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
    at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
    at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col":1}
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:328)
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
    ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col":1}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
    ... 17 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
    at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
    at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
    at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
    at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:117)
    at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:167)
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
    ... 18 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1488529549431_0188_8_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
Created 03-20-2018 11:27 AM
Hi Yong,
Were you able to resolve the exception and insert the data into table test_9?
Created 03-20-2018 02:15 PM
The problem is with your CREATE TABLE statement for test_9. Have a look at your syntax:
CREATE TABLE default.test_9 (col INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';
Now when you describe this table, you will see something like:
| # Storage Information | NULL                                               | NULL |
| SerDe Library:        | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
| InputFormat:          | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat    | NULL |
| OutputFormat:         | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat   | NULL |
That's not right! Your SerDe library is LazySimpleSerDe while your InputFormat and OutputFormat are ORC, and that combination cannot work: on insert, the FileSinkOperator serializes each row with the table's SerDe, so LazySimpleSerDe produces a Text object, but OrcOutputFormat's record writer expects an OrcSerde$OrcSerdeRow. That mismatch is exactly the ClassCastException in your stack trace.
Now let's say you tweak your CREATE TABLE statement to look something like:
CREATE TABLE default.test_9 (col INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'
STORED AS ORC;
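That also answers your first question: ROW FORMAT only configures the SerDe side, while STORED AS <format> sets the SerDe, InputFormat, and OutputFormat as one consistent bundle. Naming the input/output formats explicitly, as in your original DDL, leaves whichever SerDe the ROW FORMAT clause implies (here LazySimpleSerDe). Roughly, STORED AS ORC expands to spelling out all three:

-- A sketch of what STORED AS ORC is shorthand for: all three storage
-- components come from the ORC family, so serialization and file format agree.
CREATE TABLE default.test_9 (col INT)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';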
A DESCRIBE FORMATTED statement will then show the storage information as:
| # Storage Information | NULL                                               | NULL |
| SerDe Library:        | org.apache.hadoop.hive.ql.io.orc.OrcSerde          | NULL |
| InputFormat:          | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat    | NULL |
| OutputFormat:         | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat   | NULL |
And now if you try to write data into test_9, it will work, because OrcSerde produces the OrcSerdeRow objects that OrcOutputFormat expects.
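Alternatively, if you would rather keep the existing table than drop and recreate it, pointing it at the ORC SerDe should also line the pieces up. A sketch, safe here only because test_9 is still empty (SET SERDE changes table metadata, not any files already in the table directory):

-- Alternative fix (sketch): switch only the SerDe on the existing table.
ALTER TABLE default.test_9 SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

-- After either fix, the original insert should succeed:
INSERT INTO default.test_9 SELECT * FROM default.test_7;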
Hope that helps!
