How to compress with bzip2 and insert into Hive

Hi,

I am trying to write my DataFrame to a Hive table as ORC with bzip2 compression, but it throws this error:

java.lang.IllegalArgumentException: Codec [bzip2] is not available. Available codecs are uncompressed, lzo, snappy, zlib, none.
  at org.apache.spark.sql.hive.orc.OrcOptions.<init>(OrcOptions.scala:49)
  at org.apache.spark.sql.hive.orc.OrcOptions.<init>(OrcOptions.scala:32)
  at org

My code is:

fields.write.format("orc").option("compression","bzip2").saveAsTable("prasadtest.descargatest")

I am using Spark 2 for this.

3 REPLIES

Hi Prasad

You need to import the BZip2Codec class in your code. Simply add the following line to your code and it should work fine.

import org.apache.hadoop.io.compress.BZip2Codec;

Expert Contributor

Hi, @prasad raju

Unfortunately, ORC doesn't support BZip2, so neither Hive nor Spark can write ORC with it.

- ORC Source Code

- HIVE-5067

Master Guru

Use Snappy as your compression codec.
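
As a rough sketch of that suggestion, the snippet below checks a requested codec name against the codecs listed in the error message from the question and falls back to Snappy when the name is not among them. The object `OrcCodec` and its `pick` method are hypothetical helpers made up for illustration, not part of Spark, and the supported set is simply copied from that error message, so it may differ on other Spark versions.

```scala
// Hypothetical helper (not part of Spark): choose a codec that the
// Spark 2 ORC writer accepts, based on the list in the error message.
object OrcCodec {
  // Copied from: "Available codecs are uncompressed, lzo, snappy, zlib, none."
  val supported: Set[String] = Set("uncompressed", "lzo", "snappy", "zlib", "none")

  // Return the requested codec if it is supported, otherwise the fallback.
  // The default fallback of "snappy" is an assumption taken from the reply above.
  def pick(requested: String, fallback: String = "snappy"): String = {
    val name = requested.trim.toLowerCase
    if (supported.contains(name)) name else fallback
  }
}
```

With that in place, the write from the question could look like `fields.write.format("orc").option("compression", OrcCodec.pick("bzip2")).saveAsTable("prasadtest.descargatest")`, which would end up using Snappy. If a smaller file matters more than write speed, passing `"zlib"` instead may be worth trying, since it is also in the supported list.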