
ORC does not support type conversion from VARCHAR to STRING


I copied data from one cluster to another, then took the DDL from the existing cluster and ran the same DDL on the cluster holding the newly copied data. When I try to query the data, I get the error message below.

Any help is highly appreciated. Thanks in advance.

Error Message:

 Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:17, Vertex vertex_1487041727386_3755_1_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1487041727386_3755_1_00, diagnostics=[Task failed, taskId=task_1487041727386_3755_1_00_000016, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: ORC does not support type conversion from VARCHAR to STRING
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable$
        at Method)
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(
        at org.apache.tez.runtime.task.TezTaskRunner$TaskRunnerCallable.callInternal(
        at java.util.concurrent.ThreadPoolExecutor.runWorker(
        at java.util.concurrent.ThreadPoolExecutor$
Caused by: java.lang.RuntimeException: ORC does not support type conversion from VARCHAR to STRING
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader(
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.<init>(
        at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat.getRecordReader(
        at org.apache.tez.mapreduce.lib.MRReaderMapred.setupOldRecordReader(
        at org.apache.tez.mapreduce.lib.MRReaderMapred.setSplit(
        at org.apache.tez.mapreduce.input.MRInput.initFromEventInternal(
        at org.apache.tez.mapreduce.input.MRInput.initFromEvent(
        at org.apache.tez.mapreduce.input.MRInputLegacy.checkAndAwaitRecordReaderInitialization(
        at org.apache.tez.mapreduce.input.MRInputLegacy.init(
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.getMRInput(
        at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.init(
        at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.initializeAndRunProcessor(
        ... 14 more

Super Mentor

Sometimes the ORC input files have their columns stored as VARCHAR instead of STRING. This can be identified easily by running Hive's ORC file dump utility on the input files.
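A minimal sketch of inspecting the file schema with the built-in dump utility (the warehouse path below is a placeholder for your table's actual data files):

```shell
# Print ORC file metadata, including the file's type description.
# If the dump shows varchar(...) where the table DDL declares string,
# that mismatch is what triggers the conversion error.
hive --orcfiledump /apps/hive/warehouse/mydb.db/mytable/000000_0
```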

Often the input files are generated by a MapReduce job. It is recommended to check the MapReduce program so that it consistently generates files with STRING-type columns.
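Alternatively, if regenerating the files is not an option, one possible workaround is to align the table DDL with what the files actually contain. A hedged sketch, assuming the files hold a varchar(255) column (database, table, and column names are placeholders; whether the metastore allows the change can depend on settings such as hive.metastore.disallow.incompatible.col.type.changes):

```shell
# Change the table's column metadata to match the varchar type
# found in the ORC files, so reads no longer require a conversion.
beeline -u "jdbc:hive2://localhost:10000" -e \
  "ALTER TABLE mydb.mytable CHANGE col1 col1 VARCHAR(255);"
```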

@Jay SenSharma

We have blocked the Hive CLI for security reasons. Do you have something similar for Beeline?
