Hive INSERT-SELECT: cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow

New Contributor

Scenario

I've created three tables with different SerDes; the intention is to insert records from table test_7 into tables test_8 and test_9.

CREATE TABLE default.test_7 (col INT);

CREATE TABLE default.test_8 (col INT) STORED AS ORC TBLPROPERTIES ('orc.compress'='ZLIB');

CREATE TABLE default.test_9 (col INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

I managed to insert records from test_7 into test_8, but not from test_7 into test_9, which fails with the error below (Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow).

-- Successful
INSERT INTO default.test_7 VALUES (1);

-- Successful
INSERT INTO default.test_8 SELECT * FROM default.test_7;

-- Failed with: Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
INSERT INTO default.test_9 SELECT * FROM default.test_7;

-- Failed with the same ClassCastException
INSERT INTO default.test_9 SELECT CAST(col AS INT) FROM default.test_7;

Questions

1. What is the relationship between STORED AS and ROW FORMAT?

2. Are there any compatible/incompatible combinations of SerDe and storage format? (e.g. table default.test_9 uses LazySimpleSerDe with OrcInputFormat/OrcOutputFormat as its storage format)

Describe Table default.test_8

hive> describe extended default.test_8;
OK
col                     int
Detailed Table Information      Table(tableName:test_8, dbName:default, owner:SRV-TEST, createTime:1491286688, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)], location:hdfs://DEVHDP/apps/hive/warehouse/test_8, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.ql.io.orc.OrcSerde, parameters:{serialization.format=1}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=206, rawDataSize=4, numRows=1, COLUMN_STATS_ACCURATE=true, numFiles=1, orc.compress=ZLIB, transient_lastDdlTime=1491286709}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.336 seconds, Fetched: 3 row(s)


Describe Table default.test_9

hive> describe extended default.test_9;
OK
col                     int
Detailed Table Information      Table(tableName:test_9, dbName:default, owner:SRV-TEST, createTime:1491286787, lastAccessTime:0, retention:0, sd:StorageDescriptor(cols:[FieldSchema(name:col, type:int, comment:null)], location:hdfs://DEVHDP/apps/hive/warehouse/test_9, inputFormat:org.apache.hadoop.hive.ql.io.orc.OrcInputFormat, outputFormat:org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat, compressed:false, numBuckets:-1, serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, parameters:{serialization.format=, field.delim=}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[], skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{transient_lastDdlTime=1491286787}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE)
Time taken: 0.337 seconds, Fetched: 3 row(s)


Complete Error

], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col":1}
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:173)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:139)
        at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:344)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:181)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$1.run(TezTaskRunner.java:172)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:172)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(TezTaskRunner.java:168)
        at org.apache.tez.common.CallableWithNdc.call(CallableWithNdc.java:36)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col":1}
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:91)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.pushRecord(MapRecordSource.java:68)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:328)
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(TezProcessor.java:150)
        ... 14 more
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"col":1}
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:562)
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(MapRecordSource.java:83)
        ... 17 more
Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.ql.io.orc.OrcSerde$OrcSerdeRow
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:81)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:753)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.SelectOperator.process(SelectOperator.java:88)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:117)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:167)
        at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:552)
        ... 18 more
]], Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:0, Vertex vertex_1488529549431_0188_8_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0


3 Replies

Super Guru

@Yong Boon Lim

What were you trying to accomplish by creating that table with ORC storage but a delimited row format? Unrelated, but why would you cast an INT to an INT?

New Contributor

Hi Yong,

Were you able to resolve the exception and insert the data into table test_9?


@Yong Boon Lim

The problem is with your CREATE TABLE statement for test_9. Take a look at the syntax:

CREATE TABLE default.test_9 (col INT) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';

Now describe this table (DESCRIBE FORMATTED default.test_9;) and you will see something like:

| # Storage Information | NULL | NULL | 
| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL | 
| InputFormat:   | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat    | NULL | 
| OutputFormat:  | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat   | NULL |

That's not right! Your SerDe library is LazySimpleSerDe while your InputFormat and OutputFormat are ORC, and that combination cannot work: on the write path Hive serializes each row with the table's SerDe and hands the result to the OutputFormat, so LazySimpleSerDe produces a Text object while OrcOutputFormat only accepts OrcSerde$OrcSerdeRow. That mismatch is exactly the ClassCastException you are hitting.
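
If you want to keep the existing table rather than recreate it, an in-place repair may also work. A minimal sketch, assuming your Hive version supports ALTER TABLE ... SET SERDE and that no delimited-format files have been written to the table yet:

-- In-place fix: point the table at the ORC SerDe so it matches the ORC formats
ALTER TABLE default.test_9 SET SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde';

That said, the cleaner fix is to recreate the table with matching DDL, as below.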

Now let's say you tweak your CREATE TABLE statement to look like this:

CREATE TABLE default.test_9 (col INT) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\u001C'
STORED AS ORC;

A DESCRIBE FORMATTED statement will now show the storage information as:

| # Storage Information | NULL | NULL | 
| SerDe Library: | org.apache.hadoop.hive.ql.io.orc.OrcSerde        | NULL | 
| InputFormat:   | org.apache.hadoop.hive.ql.io.orc.OrcInputFormat  | NULL | 
| OutputFormat:  | org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat | NULL |

And now you will be able to write data into test_9 from anywhere.
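
For completeness, a quick end-to-end check, sketched on the assumption that you drop and recreate test_9 first:

DROP TABLE IF EXISTS default.test_9;
CREATE TABLE default.test_9 (col INT) STORED AS ORC;

INSERT INTO default.test_9 SELECT * FROM default.test_7;
-- Should return the single row inserted earlier: 1
SELECT * FROM default.test_9;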

Hope that helps!