how to compress bzip2 format and insert into hive

New Contributor

Hi,

I am trying to write my DataFrame to a Hive table as ORC with bzip2 compression, but it throws the following error:

java.lang.IllegalArgumentException: Codec [bzip2] is not available. Available codecs are uncompressed, lzo, snappy, zlib, none.
  at org.apache.spark.sql.hive.orc.OrcOptions.<init>(OrcOptions.scala:49)
  at org.apache.spark.sql.hive.orc.OrcOptions.<init>(OrcOptions.scala:32)
  at org

My code is

fields.write.format("orc").option("compression","bzip2").saveAsTable("prasadtest.descargatest")

I am using Spark 2 for this.

Re: how to compress bzip2 format and insert into hive

Hi Prasad

You need to import the BZip2Codec class in your code. Simply add the following line to your code and it should work fine.

import org.apache.hadoop.io.compress.BZip2Codec;

Re: how to compress bzip2 format and insert into hive

Expert Contributor

Hi, @prasad raju

Unfortunately, ORC doesn't support BZip2, so neither Hive nor Spark can write ORC files with that codec.

- ORC Source Code

- HIVE-5067
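
As a rough sketch of how to guard against this error before writing, the codec name can be checked against the list printed in the exception. The helper name `pickOrcCodec` and the `zlib` fallback are assumptions made for illustration, not part of Spark's API; the codec names are copied from the error message in the question.

```scala
// Codec names copied from the error message in the question (Spark 2 ORC writer).
val availableOrcCodecs: Set[String] =
  Set("uncompressed", "lzo", "snappy", "zlib", "none")

// Hypothetical helper: return the requested codec if the ORC writer accepts it,
// otherwise return a fallback that is on the list above.
def pickOrcCodec(requested: String, fallback: String = "zlib"): String = {
  val name = requested.toLowerCase(java.util.Locale.ROOT)
  if (availableOrcCodecs.contains(name)) name else fallback
}
```

With this in place, `pickOrcCodec("bzip2")` yields the fallback instead of a name the writer would reject, and the result can be passed to `.option("compression", ...)`.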

Re: how to compress bzip2 format and insert into hive

Super Guru

Use Snappy as your compression codec.
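
A minimal sketch of what that could look like, reusing the table name from the question. The `SparkSession` setup, the app name, and the two-row stand-in for `fields` are assumptions added so the snippet is self-contained; it also assumes a Spark 2 build with Hive support and that the `prasadtest` database already exists.

```scala
import org.apache.spark.sql.SparkSession

// Assumed setup: Spark 2 with Hive support; the app name is a placeholder.
val spark = SparkSession.builder()
  .appName("orc-snappy-example")
  .enableHiveSupport()
  .getOrCreate()
import spark.implicits._

// Stand-in for the `fields` DataFrame from the question.
val fields = Seq((1, "a"), (2, "b")).toDF("id", "value")

// Same write as in the question, with a codec the ORC writer accepts.
fields.write
  .format("orc")
  .option("compression", "snappy")
  .saveAsTable("prasadtest.descargatest")
```

By default `saveAsTable` fails if the table already exists, so a save mode may need to be set when rerunning this.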