Member since: 10-01-2015
Posts: 3933
Kudos Received: 1150
Solutions: 374
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 3574 | 05-03-2017 05:13 PM |
| | 2945 | 05-02-2017 08:38 AM |
| | 3196 | 05-02-2017 08:13 AM |
| | 3158 | 04-10-2017 10:51 PM |
| | 1632 | 03-28-2017 02:27 AM |
01-29-2016
06:06 PM
@John Smith the code workaround works. I was running in Tez mode, by the way.
outputSet = foreach outputSet generate $0 as (name:chararray), $1 as (customerId:chararray), $2 as (VIN:chararray), $3 as (Birthdate:chararray), $4 as (Mileage:chararray), $5 as (Fuel_Consumption:chararray);
store outputSet into 'avroout2' using AvroStorage();
Input(s):
Successfully read 100 records (15099 bytes) from: "/user/root/Test-20160129-1401822-lake.avro"
Successfully read 100 records (12703 bytes) from: "/user/root/Test-20160129-1401822-ttp.avro"
Output(s):
Successfully stored 100 records (7703 bytes) in: "hdfs://sandbox.hortonworks.com:8020/user/root/avroout2"
grunt> 2016-01-29 18:04:19,978 [main] INFO org.apache.pig.Main - Pig script completed in 1 minute, 52 seconds and 249 milliseconds (112249 ms)
2016-01-29 18:04:19,978 [main] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezLauncher - Shutting down thread pool
2016-01-29 18:04:20,008 [Thread-1] ERROR org.apache.pig.impl.io.FileLocalizer - java.io.IOException: Filesystem closed
2016-01-29 18:04:20,025 [Thread-23] INFO org.apache.pig.backend.hadoop.executionengine.tez.TezSessionManager - Shutting down Tez session org.apache.tez.client.TezClient@2c8b16b6
2016-01-29 18:04:20,025 [Thread-23] INFO org.apache.tez.client.TezClient - Shutting down Tez Session, sessionName=PigLatin:DefaultJobName, applicationId=application_1454090472993_0001
[root@sandbox pig-upload]# hdfs dfs -ls avroout2
Found 2 items
-rw-r--r-- 3 root hdfs 0 2016-01-29 18:03 avroout2/_SUCCESS
-rw-r--r-- 3 root hdfs 7703 2016-01-29 18:03 avroout2/part-v003-o000-r-00000.avro
[root@sandbox pig-upload]# hdfs dfs -cat avroout2/part-v003-o000-r-00000.avro | less
01-29-2016
05:54 PM
@Ram D run "df -h" to find the mount points with the most used disk space, then run "du -hs *" inside those mount points to find the directories taking the most space. Then purge whatever is not necessary, move it to another directory, or compress it.
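The same two steps can be sketched in Python, for cases where you want to script the check. This is only a rough equivalent, not a replacement for "df" and "du": the function names `usage_percent`, `dir_size` and `largest_children` are made up for this example, and the sizes are apparent file sizes, so they will not match the block counts that "du" reports.

```python
import os
import shutil


def usage_percent(path="/"):
    """Percentage of the filesystem holding `path` that is in use (like one row of df)."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total if total else 0.0


def dir_size(path):
    """Sum of apparent file sizes under `path`; symlinks and unreadable files are skipped."""
    size = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.islink(full):
                continue
            try:
                size += os.path.getsize(full)
            except OSError:
                # File vanished or is not readable; ignore it, as this is only an estimate.
                pass
    return size


def largest_children(parent, top=5):
    """Return up to `top` (size_in_bytes, path) pairs for entries in `parent`, largest first."""
    entries = []
    for entry in os.scandir(parent):
        if entry.is_dir(follow_symlinks=False):
            entries.append((dir_size(entry.path), entry.path))
        elif entry.is_file(follow_symlinks=False):
            entries.append((entry.stat(follow_symlinks=False).st_size, entry.path))
    return sorted(entries, reverse=True)[:top]
```

Calling `usage_percent("/")` and then `largest_children("/var")` (the path is just an example) mirrors running "df -h" followed by "du -hs *" in the fullest mount point.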
01-29-2016
04:54 PM
@John Smith time to file a Jira. Great job investigating this.
01-29-2016
04:17 PM
Yes, I got the same. One more thing to try is to store as PigStorage, then in another script load that dataset using PigStorage and store it as Avro. I'm wondering if that will work. @John Smith
01-29-2016
04:17 PM
@John Smith excellent, you went to the source code. It's actually [A-Za-z_][A-Za-z0-9_]*, so plus the asterisk. So if you checked your names against it and found nothing wrong, perhaps you discovered a bug? Once you're 100% sure, I suggest you file a Jira with the Pig project.
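A quick way to check names against that pattern is sketched below in Python. The helper `invalid_avro_names` and the sample alias list are made up for illustration; the list assumes the joined fields keep their `relation::field` prefix in the Pig schema, which is a guess about this case rather than something confirmed in the thread.

```python
import re

# The name pattern quoted in the post above.
AVRO_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def invalid_avro_names(field_names):
    """Return the names that do not match the pattern from start to end."""
    return [name for name in field_names if AVRO_NAME.fullmatch(name) is None]


# Hypothetical aliases: the first two mimic join-prefixed Pig fields.
aliases = ["nonSensSet::name", "sensitiveSet::VIN", "Fuel_Consumption", "customerId"]
print(invalid_avro_names(aliases))  # prints ['nonSensSet::name', 'sensitiveSet::VIN']
```

If the prefixed names really are what AvroStorage sees, the "::" would fail the pattern, which would be consistent with plain aliases being accepted.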
01-29-2016
03:56 PM
@John Smith like I said, I tried with PigStorage and it worked fine. Take a look at OrcStorage; ORC is a pretty good columnar format for Pig, Hive and Spark (meaning you can query the same table from any of these tools natively). There are many formats, and I can't recommend one unless we know your use case. I do like Avro, but sometimes it drives me insane :). Try looking at the schemas; you can probably still get it working, I just don't have time to look at it. If you do find a solution, post it here so we can all learn!
01-29-2016
03:52 PM
Same error as the one I pasted. @John Smith
01-29-2016
03:39 PM
@Suresh Bonam let me know if that works for you and close the thread :).
01-29-2016
03:38 PM
@John Smith I just read the AvroStorage wiki. They do say they have limited support for union schemas and record types. I guess the only thing I can comment on is that AvroStorage is limited in its functionality. Perhaps you'd want to look at other storage formats.
01-29-2016
03:31 PM
@John Smith your issue is with some reserved word in the Avro schema. Here's what I'm getting:
grunt> nonSensSet = load '/user/root/Test-20160129-1401822-lake.avro' USING AvroStorage();
grunt> sensitiveSet = load '/user/root/Test-20160129-1401822-ttp.avro' using AvroStorage();
grunt> outputSet = join sensitiveSet by Row_ID, nonSensSet by Row_ID;
grunt> outputSet = distinct outputSet;
grunt> outputSet = foreach outputSet generate nonSensSet::name,nonSensSet::customerId,nonSensSet::Mileage,nonSensSet::Fuel_Consumption,sensitiveSet::VIN,sensitiveSet::Birthdate;
grunt> store outputSet into 'avrostorage' using AvroStorage();
2016-01-29 15:27:00,682 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2116:
<line 6, column 0> Output Location Validation Failed for: 'hdfs://sandbox.hortonworks.com:8020/user/root/avrostorage More info to follow:
Pig Schema contains a name that is not allowed in Avro
Details at logfile: /root/pig-upload/pig_1454081182813.log
I saved outputSet successfully using PigStorage(','), so I can't say what the issue is. Something intricate about Avro.