Installing LZO compression broke Hive completely

Expert Contributor

After following the directions here (I'm on Linux, but could not locate an equivalent page for HDP on Linux):

http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.2-Win/bk_HDP_Install_Win/content/LZOCompressio...

All attempts at inserting into existing Hive tables (which are NOT set up for LZO compression) yield a long traceback featuring this:

Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.LzoCodec not found.
   at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)

   at org.apache.hadoop.io.compress.CompressionCodecFactory.&lt;init&gt;(CompressionCodecFactory.java:179)
   at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
   ... 21 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.LzoCodec not found
   at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
   at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
   ... 23 more

Why on earth is Hive even trying to use LZO? Very frustrating to find this level of fragility. Any way to get LZO to coexist with a functional Hive?

Update: I removed any and all mention of LZO from core-site.xml and Hive is still blowing up while searching for codecs. Looks like we now have a completely hosed cluster.
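For reference, the two entries I added to core-site.xml (reconstructed here from the HDP docs; the exact codec list will vary by cluster) were of this form:

```xml
<!-- Appends the LZO codec classes to the codec list. A truncated or
     malformed value here is exactly what produces the
     "Compression codec ... not found" errors. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```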

1 ACCEPTED SOLUTION

Expert Contributor

@Timothy Spann

I was finally able to get running again:

- Removed references to lzo from all configurations (using Ambari)

- Manually removed all RPM packages on all machines that match *lzo*

Then I re-read the Ambari instructions for about the 40th time and realized where communication was breaking down. The only package installation I had ever observed from Ambari happened during initial cluster installation, and all of it was triggered from dialogs. It may be obvious to some folks that Ambari uses the presence of the codec in the io.compression.codecs list as a trigger for a silent package install on restart, but it certainly wasn't to me, since NOTHING else I've encountered in the system works that way (all other installs show a progress indication and a function test). Once I added the configuration (without having manually installed the packages first), Ambari did install them itself during the restart, and everything worked when it completed.

I would strongly suggest adding a short paragraph to the LZO configuration page explicitly and clearly explaining that this process physically installs the packages with no visual indication that it is occurring.


9 REPLIES

Master Guru

For manually installing LZO, see:

https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.4.3/bk_installing_manually_book/content/install...

3.2. Install LZO

Execute the following command on all the nodes in your cluster:

  • RHEL/CentOS/Oracle Linux:

    yum install lzo lzo-devel hadooplzo hadooplzo-native

  • For SLES:

    zypper install lzo lzo-devel hadooplzo hadooplzo-native

  • For Ubuntu/Debian:

    HDP support for Debian 6 is deprecated with HDP 2.4.2. Future versions of HDP will no longer be supported on Debian 6.

    apt-get install liblzo2-2 liblzo2-dev hadooplzo

If you install with Ambari, you don't need to manually install LZO; that should be done by the wizard.

Expert Contributor

@Timothy Spann

And that is precisely what I had done, to the letter. If there is an option for installation from Ambari, it is not evident. Where exactly is this "wizard" you refer to? Your web documentation states clearly that Ambari neither installs nor configures LZO.

The proximate issue is that Hive is totally broken now - even after removal of the two changes made to core-site.xml. Why is Hive even TRYING to use LZO? I did not configure that - I did not so much as touch Hive.

Master Guru

It's on a table-by-table basis; Hive just adds it to the list of available compression codecs. It is not trying to use it, just making it available if you need it.

Did you run from Beeline or just the Hive CLI?

Make sure the LZO jar is in your path

In Hive CLI https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli

Use the ADD JAR syntax to point to that LZO jar, then run your query.
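A hedged example (the jar path is illustrative, and `my_table` is a placeholder; locate the actual hadoop-lzo jar under your HDP install):

```sql
-- In the Hive CLI: register the LZO jar for this session, then run the query.
-- The path below is an example only; find the real hadoop-lzo jar on your nodes.
ADD JAR /usr/hdp/current/hadoop-client/lib/hadoop-lzo.jar;
SELECT * FROM my_table LIMIT 10;
```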

You can also make sure Hive, the Hive Thrift server, HDFS, and other tools that may be referencing it are fully stopped (out of memory) and restarted.

Make sure settings are done on all nodes.

Without knowing your full environment (how it was set up, directory structure, cluster structure), it is very difficult to troubleshoot. If things were set up in a non-standard way, it can be hard to find things in the PATH that may be needed for Hive.

Master Guru

Restart the servers.

Is this installed through Ambari? What version are you running?

Any other messages in the logs after you restarted?

Expert Contributor

@Timothy Spann

The cluster itself was installed through Ambari and has been running for about a year. One of my users needed LZO compression enabled several days ago. Your web site told me that Ambari does not install or configure LZO, so I followed the instructions as you entered them above. I added two changes to core-site.xml that were similarly documented in the HDP 2.3.2 web pages. After fixing an initial typo, we had working LZO and could explicitly invoke LzoIndexer on files in HDFS. Shortly after that I started receiving reports about Hive being broken. Originally it was complaining that it could not find the LzoCodec. I never told it to use the LzoCodec, and I did not change the Hive configuration. After removing the entries in core-site.xml, the Hive problems continued, but it now tells me it cannot find "com" - a nonsense class name.

I did restart everything that needed to be restarted - several times, in fact. The only thing amiss in the Hive logs is the same traceback the user gets on a failed query:

2016-11-02 12:57:18,278 WARN  [HiveServer2-Handler-Pool: Thread-2740]: thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(681)) - Error fetching results: 
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: Error in configuring object
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:352)
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:221)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:685)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1672)
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:347)
    ... 13 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hive.common.util.ReflectionUtil.setJobConf(ReflectionUtil.java:115)
    at org.apache.hive.common.util.ReflectionUtil.setConf(ReflectionUtil.java:103)
    at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:87)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:207)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:361)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:295)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:446)
    ... 17 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor194.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hive.common.util.ReflectionUtil.setJobConf(ReflectionUtil.java:112)
    ... 23 more
Caused by: java.lang.IllegalArgumentException: Compression codec com not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 27 more
Caused by: java.lang.ClassNotFoundException: Class com not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    ... 29 more
2016-11-02 12:57:18,281 INFO  [HiveServer2-Handler-Pool: Thread-2740]: exec.ListSinkOperator (Operator.java:close(613)) - 10800 finished. closing... 

We are running HDP-2.3.2 on CentOS 6.7.

I do not know where to start troubleshooting this, particularly since it's not deterministic. Only some queries are blowing up with no obvious common denominator across them. Again, we made no changes to Hive and my users have made no changes in the way they are querying it.
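Digging into the traceback: CompressionCodecFactory splits the io.compression.codecs value on commas and tries to load each fragment as a class, so "Class com not found" suggests some cached copy of that property still contains a truncated fragment of the LZO entry. A rough sketch of that parsing (illustrative Python, not the actual Hadoop code):

```python
def parse_codec_list(value):
    """Illustrative sketch of how Hadoop's CompressionCodecFactory handles
    io.compression.codecs: split on commas, trim, and load each fragment
    as a class name."""
    return [name.strip() for name in value.split(",") if name.strip()]

# A clean value parses into loadable class names:
good = ("org.apache.hadoop.io.compress.GzipCodec,"
        "com.hadoop.compression.lzo.LzoCodec")
print(parse_codec_list(good))

# A truncated edit leaves a bare "com" fragment, which Hadoop then
# tries (and fails) to load as a class -- "Class com not found":
bad = "org.apache.hadoop.io.compress.GzipCodec,com"
print(parse_codec_list(bad)[-1])  # prints: com
```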

Master Guru

You need the configuration, and you also need to make sure the LZO software is installed:

  • RHEL/CentOS/Oracle Linux:

    yum install lzo lzo-devel hadooplzo hadooplzo-native

  • For SLES:

    zypper install lzo lzo-devel hadooplzo hadooplzo-native

  • For Ubuntu/Debian:

    HDP support for Debian 6 is deprecated with HDP 2.4.2. Future versions of HDP will no longer be supported on Debian 6.

    apt-get install liblzo2-2 liblzo2-dev hadooplzo

http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/Ambari-1.6.0.0/bk_ambari_reference/content/...

https://docs.hortonworks.com/HDPDocuments/Ambari-2.2.1.1/bk_ambari_reference_guide/content/_configur...

http://dev.hortonworks.com.s3.amazonaws.com/HDPDocuments/Ambari-1.6.0.0/bk_ambari_reference/content/...

You must Stop, then Start the HDFS service for Ambari to install the necessary LZO packages. Performing a Restart or a Restart All will not start the required package install.

Expert Contributor

@Timothy Spann

- Ambari on our system does not provide any facility to install lzo. You keep referring to this, but it isn't there. If you believe it should be, please tell me where I might find the dialog?

- I followed ALL the steps you outlined above, except for Hive. I DO NOT WANT LZO COMPRESSION ON HIVE. If that's not optional, then it should be documented as such.

- I did have things stopped when I installed the RPMs and updated configuration.

We're in a real mess here and currently trying to find someone to help us recover. I wish your company provided per-incident support, but that doesn't seem to be the case.


Master Guru

I'll pass this on to the documentation team.