Problems after upgrade from CDH4.2.1 (rpms) to CDH4.6.0 (parcels)

Explorer

We recently upgraded from CDH4.2.1 (rpm installation) to CDH4.6.0 (parcel installation).

 

We removed the rpms using the script from the install manual. 

 

For the most part it was seamless. Everything worked without a problem.

 

But after the upgrade, Pig's illustrate command fails.

 

It seems to fail in the ExampleGenerator class at the getExamples method. 

 

I've tried it in both local and mapreduce modes. 

 

I even tried putting a symlink in /usr/lib/pig to the /opt/cloudera/parcels/CDH/lib/pig dir.

 

I've tried setting the PIG_INSTALL environment variable too.

 

I think it is a basic config missing somewhere. The scripts are the same as before the upgrade.

 

I simplified the problem and here are the steps to reproduce:

 

1. Create a file test.txt containing:

              3,4,5,6
              t,6,7,7
              7,t,r,6

2. Start Pig in local mode:   pig -x local

3. Run this command:     A = load 'test.txt' using PigStorage(',') as (f1,f2,f3,f4);

4. Then run:  illustrate A;
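
For anyone who wants to repeat the steps above without typing them into grunt, here is a minimal sketch as a shell script. The script file name illustrate_test.pig is made up for this example, and it assumes illustrate behaves the same when run from a script file as it does interactively.

```shell
#!/bin/sh
# Step 1: write the three-line sample input.
printf '%s\n' '3,4,5,6' 't,6,7,7' '7,t,r,6' > test.txt

# Steps 3 and 4: put the two statements in a Pig script.
# The name illustrate_test.pig is arbitrary (an assumption of this sketch).
cat > illustrate_test.pig <<'EOF'
A = load 'test.txt' using PigStorage(',') as (f1,f2,f3,f4);
illustrate A;
EOF

# Step 2: run in local mode, but only if pig is actually on the PATH.
if command -v pig >/dev/null 2>&1; then
    pig -x local illustrate_test.pig
else
    echo "pig not found on PATH; run manually: pig -x local illustrate_test.pig"
fi
```

Running it from an empty directory keeps test.txt and the Pig log file out of the way of other work.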

 

 

 

 

7 REPLIES 7

Re: Problems after upgrade from CDH4.2.1 (rpms) to CDH4.6.0 (parcels)

Can you include the failure message?

 

Also, try restarting all of your CM agents (on each host, service cloudera-scm-agent restart). This is helpful whenever you have packages and parcels installed at the same time, then remove packages. Restarting the agents will populate your symlinks to point to the new parcel binaries.
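
A rough sketch of doing that restart across many hosts from one machine follows. The hosts file, the DRY_RUN switch, and the use of ssh with sudo are all assumptions for illustration; none of them come from Cloudera Manager itself. Only the command service cloudera-scm-agent restart is taken from the advice above.

```shell
#!/bin/sh
# restart_agents FILE: restart the CM agent on every host listed in FILE
# (one hostname per line). FILE and DRY_RUN are hypothetical conveniences.
restart_agents() {
    hosts_file=$1
    while IFS= read -r host; do
        # Skip blank lines and lines starting with '#'.
        case $host in ''|'#'*) continue ;; esac
        if [ "${DRY_RUN:-1}" = "1" ]; then
            # Default: only print what would be run.
            echo "ssh $host sudo service cloudera-scm-agent restart"
        else
            # -n stops ssh from consuming the rest of the host list on stdin.
            ssh -n "$host" sudo service cloudera-scm-agent restart
        fi
    done < "$hosts_file"
}
```

Calling restart_agents hosts.txt prints the commands; DRY_RUN=0 restart_agents hosts.txt would actually run them. Afterwards, something like readlink -f "$(command -v pig)" on a host is one way to see where the pig command now resolves; if the symlinks were repopulated, it would be expected to land somewhere under /opt/cloudera/parcels.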

 

Thanks,

Darren

Re: Problems after upgrade from CDH4.2.1 (rpms) to CDH4.6.0 (parcels)

Explorer
grunt> a = load 'test.txt' using PigStorage(',') as (f1,f2,f3,f4);
2014-04-21 16:56:01,940 [main] WARN  org.apache.hadoop.conf.Configuration - dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
2014-04-21 16:56:01,941 [main] WARN  org.apache.hadoop.conf.Configuration - topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
2014-04-21 16:56:01,941 [main] WARN  org.apache.hadoop.conf.Configuration - dfs.df.interval is deprecated. Instead, use fs.df.interval
2014-04-21 16:56:01,941 [main] WARN  org.apache.hadoop.conf.Configuration - topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
2014-04-21 16:56:01,942 [main] WARN  org.apache.hadoop.conf.Configuration - hadoop.native.lib is deprecated. Instead, use io.native.lib.available
grunt> illustrate a;
2014-04-21 16:56:05,676 [main] WARN  org.apache.hadoop.conf.Configuration - io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
2014-04-21 16:56:05,678 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2014-04-21 16:56:05,680 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2014-04-21 16:56:06,026 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2014-04-21 16:56:06,050 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2014-04-21 16:56:06,050 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2014-04-21 16:56:06,063 [main] INFO  org.apache.pig.tools.pigstats.ScriptState - Pig script settings are added to the job
2014-04-21 16:56:06,069 [main] ERROR org.apache.pig.pen.ExampleGenerator - Error reading data. Internal error creating job configuration.
java.lang.RuntimeException: Internal error creating job configuration.
        at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:160)
        at org.apache.pig.PigServer.getExamples(PigServer.java:1182)
        at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:739)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.Illustrate(PigScriptParser.java:626)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:323)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:69)
        at org.apache.pig.Main.run(Main.java:538)
        at org.apache.pig.Main.main(Main.java:157)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
2014-04-21 16:56:06,072 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Encountered IOException. Exception
Details at logfile: /home/jhendric/pig_1398117340397.log

 I restarted the cluster and I thought that would include the agents. I'll restart the agents too. 

 

Re: Problems after upgrade from CDH4.2.1 (rpms) to CDH4.6.0 (parcels)

Explorer

I restarted the cloudera-scm-agent on the box where I was running pig in -x local mode.

 

I get the exact same results. No improvement. Same errors.

Re: Problems after upgrade from CDH4.2.1 (rpms) to CDH4.6.0 (parcels)

CM Agents manage other daemons, and are not restarted when you restart the cluster.

I would try restarting the agent on all hosts, but this sounds like it's not an issue with Cloudera Manager, but rather with the underlying pig code you're running. Still worth doing the agent restart to be safe.

Try posting in the pig forum:
http://community.cloudera.com/t5/Data-Ingestion-Integration/bd-p/SqoopFlume

Re: Problems after upgrade from CDH4.2.1 (rpms) to CDH4.6.0 (parcels)

Explorer

Thanks for the input. I'll be restarting the agents across the cluster today. It takes a bit to set that up.

 

Three more questions about this:  

 

1. After restarting the agents, do I need to restart the server OS and/or cluster?

2. If I need to roll back to prior parcels, is there a procedure for that?

3. If I need to run a different version of Pig to fix this, what is the Cloudera parcel process for this?

Re: Problems after upgrade from CDH4.2.1 (rpms) to CDH4.6.0 (parcels)

cssh is handy for accessing all machines, assuming your cluster security is set up in a convenient fashion.

 

The goal of the restart is to fix symlinks. You should not restart the OS. I'm not 100% sure if restarting the cluster is necessary. I'd only do that if your problem doesn't go away after you've restarted all agents.

 

To roll back to prior parcels, just activate a parcel of an older version. This is fine for minor version changes, such as 4.6.0 to 4.2.1.

 

If you need to run a custom binary for pig, this will probably be tricky. The best thing to do would be to make your own custom parcel with the pig binaries swapped out. Let's hope that isn't needed: it is difficult, and it could expose you to unexpected compatibility issues, because Cloudera only tests the combinations of binaries that we ship in a CDH release.


Re: Problems after upgrade from CDH4.2.1 (rpms) to CDH4.6.0 (parcels)

Explorer

Just an update:

 

I was able to replicate the entire issue in our dev cluster. 

 

I installed parcels on the v4.3.0 dev cluster. 

Retested Pig and the illustrate command worked.

Upgraded parcels to 4.6.0 (latest) 

Pig illustrate does not work.

Downgraded to v4.5.0 

Pig illustrate works.

 

I'm actually starting to diff the 2 Pig code bases to see if I can spot the change.
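
As a first pass at that comparison, a sketch along these lines lists which files differ between two Pig directory trees. The function name and both directory arguments are placeholders; the real parcel directory names are not given here, so they are left for the reader to fill in.

```shell
#!/bin/sh
# compare_pig_dirs OLD NEW: report files that differ or exist on one side only.
# OLD and NEW are placeholders for the two Pig directories being compared.
compare_pig_dirs() {
    old_dir=$1
    new_dir=$2
    # -r recurses into subdirectories; -q names differing files
    # without printing line-by-line changes.
    diff -rq "$old_dir" "$new_dir"
}
```

Note that diff exits with status 1 when it finds differences, and that jar files will only show up as differing, with no detail. Seeing the actual code change would need the unpacked sources of both versions.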

 

I don't know how I could tell whether there is a general config change, but it looks like this is isolated to the v4.6.0 Pig distro.

 

 

Thanks for your help.