
Issues after HDP 3.1 migration

New Contributor

We use UNION ALL to combine two datasets whose columns have different data types. This worked before the migration to HDP 3.1, but after the migration it fails with an UnsupportedOperationException. I understand that the data types should match for UNION or UNION ALL, but then how was it working before the migration? Even after the migration, it works if we write the result from the driver class, and it also works if we cache the intermediate result. It fails only when we write the result set outside the driver class. I don't understand the difference.
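
Since a plain union matches columns by position, one way to narrow this down is to compare the two schemas position by position before combining them. The following is only a minimal sketch in plain Python, assuming each schema is available as a list of (column name, type string) pairs, which is the shape PySpark's `DataFrame.dtypes` returns. The function name `find_type_mismatches` and the sample schemas are hypothetical and not taken from the job described above.

```python
from typing import List, Tuple

# A schema is represented as (column name, type string) pairs.
# Assumption: in PySpark these could come from `df.dtypes`.
Schema = List[Tuple[str, str]]


def find_type_mismatches(left: Schema, right: Schema) -> List[Tuple[int, str, str, str, str]]:
    """Compare two schemas position by position, the way a positional union pairs columns.

    Returns one tuple per mismatch:
    (position, left name, left type, right name, right type).
    """
    if len(left) != len(right):
        raise ValueError(
            f"Column count differs: {len(left)} on the left, {len(right)} on the right"
        )

    mismatches = []
    for position, ((l_name, l_type), (r_name, r_type)) in enumerate(zip(left, right)):
        if l_type != r_type:
            mismatches.append((position, l_name, l_type, r_name, r_type))
    return mismatches


# Hypothetical schemas, for illustration only.
left_schema = [("id", "bigint"), ("amount", "double")]
right_schema = [("id", "string"), ("amount", "double")]

for mismatch in find_type_mismatches(left_schema, right_schema):
    print(mismatch)
```

Each column reported this way could then be cast explicitly on one side (for example with `Column.cast`) before the union, so that the query no longer depends on implicit type coercion. Whether that avoids the exception in this particular setup is untested.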


New Contributor

Here is the error we are getting:

{"level":"ERROR","timestamp":"2020-03-06 03:18:50,038","thread":"Executor task launch worker for task 14","className":"Logging.scala", "line":"91","ApplicationInfo":Exception in task 0.0 in stage 13.0 (TID 14)}
at org.apache.spark.sql.vectorized.ArrowColumnVector$ArrowVectorAccessor.getLong(
at org.apache.spark.sql.vectorized.ArrowColumnVector.getLong(
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage3.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
at org.apache.spark.executor.Executor$
at java.util.concurrent.ThreadPoolExecutor.runWorker(
at java.util.concurrent.ThreadPoolExecutor$

New Contributor


I'm facing the same problem. Have you found a solution?