New Contributor
Posts: 2
Registered: ‎08-21-2013

Can't connect w/ Amazon S3

I am having a horrible time trying to connect with Amazon S3 public datasets.

 

When I do the following in Pig, I get an error.

 

 hdfs@ip-172-31-35-24:~$ pig
2013-08-22 00:43:00,343 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:40:22
2013-08-22 00:43:00,344 [main] INFO  org.apache.pig.Main - Logging error messages to: /var/lib/hadoop-hdfs/pig_1377132180339.log
2013-08-22 00:43:00,371 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-hdfs/.pigbootup not found
2013-08-22 00:43:00,696 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-08-22 00:43:00,696 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://ip-172-31-35-24.us-west-2.compute.internal:8020
2013-08-22 00:43:01,633 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: ip-172-31-35-24.us-west-2.compute.internal:8021
2013-08-22 00:43:01,635 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt> cd s3://datasets.elasticmapreduce/ngrams/books
2013-08-22 00:43:13,413 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal error. Wrong FS: s3://datasets.elasticmapreduce/ngrams/books, expected: hdfs://ip-172-31-35-24.us-west-2.compute.internal:8020
Details at logfile: /var/lib/hadoop-hdfs/pig_1377132180339.log
grunt>
grunt>
grunt>

 

Can someone please help? I'm a newbie to this stuff.

 

Thanks.

Expert Contributor
Posts: 63
Registered: ‎08-06-2013

Re: Can't connect w/ Amazon S3

Use the Pig version packaged with EMR (Elastic MapReduce). Was your Pig installed separately?

New Contributor
Posts: 2
Registered: ‎08-21-2013

Re: Can't connect w/ Amazon S3

The version of Pig that I'm using is the one installed with CDH.

 

hdfs@ip-172-31-35-24:/tmp/pig-0.11.1$ pig
13/08/22 17:23:25 WARN pig.Main: Cannot write to log file: /tmp/pig-0.11.1/pig_1377192205317.log
2013-08-22 17:23:25,323 [main] INFO  org.apache.pig.Main - Apache Pig version 0.11.0-cdh4.3.0 (rexported) compiled May 27 2013, 20:40:22
2013-08-22 17:23:25,351 [main] INFO  org.apache.pig.impl.util.Utils - Default bootup file /var/lib/hadoop-hdfs/.pigbootup not found
2013-08-22 17:23:25,681 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
2013-08-22 17:23:25,681 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://ip-172-31-35-24.us-west-2.compute.internal:8020
2013-08-22 17:23:26,625 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: ip-172-31-35-24.us-west-2.compute.internal:8021
2013-08-22 17:23:26,627 [main] WARN  org.apache.hadoop.conf.Configuration - fs.default.name is deprecated. Instead, use fs.defaultFS
grunt>
grunt>

I looked at the Amazon EMR page ( http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/Pig_SupportedVersions.html ) and it looks like the version shipped with CDH is not compatible?

 

BTW, I do appreciate the help. I'm a total newbie at this stuff.

 

Thanks.

 

-brad w.

Posts: 1,760
Kudos: 379
Solutions: 282
Registered: ‎07-31-2013

Re: Can't connect w/ Amazon S3

Are you facing issues only with commands such as "cd", or when running actual Pig queries too? If you have tried the latter, can you post the error you get from statements such as LOAD or STORE?
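
For context on why "cd" in particular might fail: as far as I understand it, the "Wrong FS" message comes from a check that compares the URI you pass against the default filesystem (the hdfs:// address from fs.defaultFS) and rejects anything with a different scheme or host. Below is a rough Python sketch of that kind of check. It is an illustration only, not Hadoop's actual code, and the names check_path and DEFAULT_FS are made up for this example.

```python
from urllib.parse import urlparse

# Default filesystem as reported in the grunt startup log above.
DEFAULT_FS = "hdfs://ip-172-31-35-24.us-west-2.compute.internal:8020"


def check_path(path: str, default_fs: str = DEFAULT_FS) -> None:
    """Raise ValueError if `path` names a different filesystem than `default_fs`.

    Simplified assumption: a path without a scheme is treated as belonging
    to the default filesystem and always passes.
    """
    target = urlparse(path)
    expected = urlparse(default_fs)

    if not target.scheme:
        return  # e.g. /user/hdfs/data is resolved against the default filesystem

    same_scheme = target.scheme.lower() == expected.scheme.lower()
    same_authority = target.netloc.lower() == expected.netloc.lower()
    if not (same_scheme and same_authority):
        raise ValueError(f"Wrong FS: {path}, expected: {default_fs}")


if __name__ == "__main__":
    try:
        check_path("s3://datasets.elasticmapreduce/ngrams/books")
    except ValueError as err:
        print(err)
```

If that is what is happening here, the error says nothing yet about whether S3 itself is reachable, which is why it is worth testing LOAD or STORE with a full URI separately: those may resolve the filesystem from the URI itself rather than from the default.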
