Created 11-02-2016 02:13 PM
After following the directions here (I'm on Linux, but could not locate the page pertinent to the Linux HDP), all attempts at inserting into existing Hive tables (which are NOT set up for LZO compression) yield a long traceback featuring this:
Caused by: java.lang.IllegalArgumentException: Compression codec com.hadoop.compression.LzoCodec not found.
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179)
at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
... 21 more
Caused by: java.lang.ClassNotFoundException: Class com.hadoop.compression.LzoCodec not found
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
... 23 more
Why on earth is Hive even trying to use LZO? Very frustrating to find this level of fragility. Any way to get LZO to coexist with a functional Hive?
Update: I removed any and all mention of LZO from core-site.xml and Hive is still blowing up while searching for codecs. It looks like we now have a completely hosed cluster.
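For context: enabling LZO per the HDP docs comes down to core-site.xml properties like the following. This is a sketch, not the exact configuration from this cluster; the codec list varies by install. The key point is that every class named in the list must be loadable by every component that reads core-site.xml, which is why one bad entry breaks Hive too.

```xml
<!-- core-site.xml (sketch). CompressionCodecFactory loads every class in
     this comma-separated list, so a single unresolvable entry fails all
     consumers of the file (HDFS, Hive, MapReduce, ...). -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec</value>
</property>
<property>
  <name>io.compression.codec.lzo.class</name>
  <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
```

Note that the traceback above complains about com.hadoop.compression.LzoCodec, while the class shipped with hadoop-lzo is com.hadoop.compression.lzo.LzoCodec; if the configured value literally omitted the lzo package segment, that alone would produce the ClassNotFoundException.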
Created 11-02-2016 03:03 PM
For manually installing LZO, execute the appropriate command for your operating system on every node in the cluster:

RHEL/CentOS/Oracle Linux:
yum install lzo lzo-devel hadooplzo hadooplzo-native

SLES:
zypper install lzo lzo-devel hadooplzo hadooplzo-native

Ubuntu/Debian (note that HDP support for Debian 6 is deprecated as of HDP 2.4.2, and future HDP versions will no longer be supported on Debian 6):
apt-get install liblzo2-2 liblzo2-dev hadooplzo

If you install with Ambari, you don't need to install LZO manually; the wizard takes care of it.
Created 11-02-2016 03:10 PM
And that is precisely what I had done - to the letter. If there was an option for installation from Ambari, it is not evident. Where exactly is this "wizard" you refer to? Your web documentation states clearly that Ambari neither installs nor configures LZO.
The proximate issue is that Hive is totally broken now - even after removal of the two changes made to core-site.xml. Why is Hive even TRYING to use LZO? I did not configure that - I did not so much as touch Hive.
Created 11-02-2016 06:09 PM
It's enabled on a table-by-table basis; Hive just adds LZO to the list of available compression codecs. It is not trying to use it, only making it available if you need it.
Did you run from Beeline or just the Hive CLI?
Make sure the LZO jar is on your path.
In the Hive CLI (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli), use the add jar syntax to point to the LZO jar, then run your query.
You can also make sure Hive, the Hive Thrift service, HDFS, and any other components that may be referencing the codec have been fully stopped (cleared from memory) and restarted.
Make sure the settings are applied on all nodes.
Without knowing your full environment and how it was set up (directory structure, cluster structure), it is very difficult to troubleshoot. If things were set up in a non-standard way, it can be hard to find things on the PATH that may be needed by Hive.
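As an illustration of the add jar suggestion, a Hive CLI session would look roughly like this. The jar path is hypothetical and depends on your HDP version and install layout; locate the actual hadoop-lzo jar on your node first.

```sql
-- Hive CLI session fragment; the jar path below is an assumption,
-- not a known location on this cluster.
ADD JAR /usr/hdp/current/hadoop-client/lib/hadoop-lzo.jar;
LIST JARS;   -- verify the jar is registered for this session
-- then re-run the failing query
```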
Created 11-02-2016 03:06 PM
Restart the servers.
Was this installed through Ambari? What version are you running?
Any other messages in the logs after you restarted?
Created 11-02-2016 05:09 PM
The cluster itself was installed through Ambari and has been running about a year. One of my users needed LZO compression enabled several days ago. Your web site told me that Ambari does not install or configure LZO, so I followed the instructions as you entered them above. I added two changes to core-site.xml that were similarly documented in the HDP 2.3.2 web pages. After fixing an initial typo, we had working LZO and could explicitly invoke LzoIndexer on files in HDFS. Shortly after that I started receiving reports about Hive being broken. Originally it was complaining that it could not find the LzoCodec. I never told it to use the LzoCodec. I did not change Hive configuration. After removing the entries in core-site.xml, the Hive problems continued, but it now tells me it cannot find "com" - a nonsense class name.
I did restart everything that needed to be restarted - several times, in fact. The only thing amiss in the Hive logs is the same traceback the user gets on a failed query:
2016-11-02 12:57:18,278 WARN [HiveServer2-Handler-Pool: Thread-2740]: thrift.ThriftCLIService (ThriftCLIService.java:FetchResults(681)) - Error fetching results:
org.apache.hive.service.cli.HiveSQLException: java.io.IOException: java.lang.RuntimeException: Error in configuring object
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:352)
    at org.apache.hive.service.cli.operation.OperationManager.getOperationNextRowSet(OperationManager.java:221)
    at org.apache.hive.service.cli.session.HiveSessionImpl.fetchResults(HiveSessionImpl.java:685)
    at org.apache.hive.service.cli.CLIService.fetchResults(CLIService.java:454)
    at org.apache.hive.service.cli.thrift.ThriftCLIService.FetchResults(ThriftCLIService.java:672)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1553)
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$FetchResults.getResult(TCLIService.java:1538)
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:285)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:508)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:415)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:140)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1672)
    at org.apache.hive.service.cli.operation.SQLOperation.getNextRowSet(SQLOperation.java:347)
    ... 13 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hive.common.util.ReflectionUtil.setJobConf(ReflectionUtil.java:115)
    at org.apache.hive.common.util.ReflectionUtil.setConf(ReflectionUtil.java:103)
    at org.apache.hive.common.util.ReflectionUtil.newInstance(ReflectionUtil.java:87)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getInputFormatFromCache(FetchOperator.java:207)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextSplits(FetchOperator.java:361)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getRecordReader(FetchOperator.java:295)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:446)
    ... 17 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.GeneratedMethodAccessor194.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hive.common.util.ReflectionUtil.setJobConf(ReflectionUtil.java:112)
    ... 23 more
Caused by: java.lang.IllegalArgumentException: Compression codec com not found.
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:139)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:179)
    at org.apache.hadoop.mapred.TextInputFormat.configure(TextInputFormat.java:45)
    ... 27 more
Caused by: java.lang.ClassNotFoundException: Class com not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2101)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    ... 29 more
2016-11-02 12:57:18,281 INFO [HiveServer2-Handler-Pool: Thread-2740]: exec.ListSinkOperator (Operator.java:close(613)) - 10800 finished. closing...
We are running HDP-2.3.2 on CentOS 6.7.
I do not know where to start troubleshooting this, particularly since it's not deterministic. Only some queries are blowing up with no obvious common denominator across them. Again, we made no changes to Hive and my users have made no changes in the way they are querying it.
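One plausible explanation for the nonsense class name: Hadoop's CompressionCodecFactory builds its codec list by splitting the io.compression.codecs value on commas and loading each entry by name, so a leftover fragment such as a bare "com" in that value (for example, from a partial edit) would reproduce "Class com not found" exactly. Below is a simplified Python model of that lookup; this is a sketch of the behavior, not the actual Hadoop Java code, and the class names are illustrative.

```python
# Simplified model (NOT the actual Hadoop code) of how CompressionCodecFactory
# resolves io.compression.codecs: the value is split on commas and each entry
# is loaded as a class by name.
KNOWN_CLASSES = {
    "org.apache.hadoop.io.compress.GzipCodec",
    "org.apache.hadoop.io.compress.DefaultCodec",
    "com.hadoop.compression.lzo.LzoCodec",
}

def resolve_codecs(codec_conf, available=KNOWN_CLASSES):
    """Return the codec class names from the config string, failing on the
    first entry that cannot be resolved (mirroring ClassNotFoundException)."""
    resolved = []
    for name in (n.strip() for n in codec_conf.split(",") if n.strip()):
        if name not in available:
            raise ValueError(f"Compression codec {name} not found.")
        resolved.append(name)
    return resolved

# A clean list resolves; a partial edit that leaves a fragment like "com"
# behind reproduces the nonsense "Class com not found" failure.
good = "org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec"
bad = good + ",com"
```

If something like this happened here, it would also explain why only some queries blow up: only queries whose plan constructs a CompressionCodecFactory (for example, fetches over text-format tables) ever touch the codec list.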
Created 11-02-2016 05:22 PM
You need the configuration, and you also need to make sure the LZO software is installed on every node:

RHEL/CentOS/Oracle Linux:
yum install lzo lzo-devel hadooplzo hadooplzo-native

SLES:
zypper install lzo lzo-devel hadooplzo hadooplzo-native

Ubuntu/Debian (HDP support for Debian 6 is deprecated as of HDP 2.4.2; future HDP versions will no longer be supported on Debian 6):
apt-get install liblzo2-2 liblzo2-dev hadooplzo
You must Stop, then Start the HDFS service for Ambari to install the necessary LZO packages. Performing a Restart or a Restart All will not start the required package install.
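Before the Stop/Start, it can help to check what is actually present on each node. Here is a sketch for RHEL/CentOS using the package names from the commands above; run it on every node (on SLES or Debian, substitute the zypper or dpkg equivalents).

```shell
# Report the install status of each package named in the instructions above.
status=$(
  for pkg in lzo lzo-devel hadooplzo hadooplzo-native; do
    if rpm -q "$pkg" >/dev/null 2>&1; then
      echo "$pkg: installed"
    else
      echo "$pkg: MISSING"
    fi
  done
)
echo "$status"
```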
Created 11-02-2016 05:54 PM
- Ambari on our system does not provide any facility to install lzo. You keep referring to this, but it isn't there. If you believe it should be, please tell me where I might find the dialog?
- I followed ALL the steps you outlined above, except for Hive. I DO NOT WANT LZO COMPRESSION ON HIVE. If that's not optional, then it should be documented as such.
- I did have things stopped when I installed the RPMs and updated configuration.
We're in a real mess here and currently trying to find someone to help us recover. I wish your company provided per-incident support, but that doesn't seem to be the case.
Created 11-03-2016 01:22 PM
I was finally able to get running again:
- Removed references to lzo from all configurations (using Ambari)
- Manually removed all RPM packages on all machines that match *lzo*
Then I re-read the Ambari instructions for about the 40th time and realized where communication was breaking down. The only installation of packages I ever observed from Ambari was during initial installation, and all of it was triggered from dialogs. It may be obvious to some folks that Ambari uses the presence of the codec in the io.compression.codecs list as a trigger for a silent package install on restart, but it certainly wasn't to me, since NOTHING else I've encountered in the system works in that manner (all other installs have a progress indication and a function test). Once I added the configuration (without having manually installed the packages first), Ambari indeed installed them itself during the restart, and everything worked when it was complete.
I would strongly suggest adding a small paragraph to the lzo configuration page to explicitly and clearly explain that this process physically installs the packages with no visual indication that this is occurring.
Created 11-03-2016 01:23 PM
I'll pass this on to documentation team.