Pig data load problem

Expert Contributor

Below is my source in HDFS: /abc/

Hadoop is an open source

MR is to process data in hadoop.

Hadoop has a good eco system.

I want to run the following operation:
filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';
but the load command is unsuccessful. Could anybody advise on the load statement?

grunt> ya = load '/abc/' USING TextLoader();
2016-11-17 21:00:14,470 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> yab = load '/abc/';
2016-11-17 21:00:50,199 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> 
1 ACCEPTED SOLUTION

Guru

From the information given, there is no load problem, just a warning telling you that the loaded data is being implicitly cast to chararray (string) during the filter operation.

A few points:

  1. If you do not specify a type on load, the default is bytearray. When you filter with MATCHES, you are treating the field as a string (the chararray type in Pig), so Pig converts the bytearray to chararray during this operation.
  2. TextLoader() loads each line of input as a single field; it does not split on delimiters.
  3. If you want to load a delimited file with multiple fields (e.g. a CSV), use PigStorage(). You can specify the delimiter, e.g. PigStorage(','); if none is specified, it defaults to tab.

http://pig.apache.org/docs/r0.16.0/basic.html#Data+Types+and+More
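
To make the three points concrete, here is a minimal sketch of the load variants. The /abc/ path is the one from the question; the CSV path, the relation names, and the field names (line, id, name) are hypothetical and only there for illustration.

```pig
-- No schema: the single field defaults to bytearray, so using it with
-- MATCHES later makes Pig insert an implicit cast to chararray.
raw = LOAD '/abc/' USING TextLoader();

-- Schema declared on load: the field is a chararray from the start.
typed = LOAD '/abc/' USING TextLoader() AS (line:chararray);

-- Delimited data: PigStorage splits each line into fields.
-- The path and column names below are assumptions, not from the question.
csv = LOAD '/path/to/data.csv' USING PigStorage(',') AS (id:int, name:chararray);
```

Running DESCRIBE on each relation should show the difference in declared types.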

Not sure if that is what you were looking for ... if so, let me know by accepting the answer; if not, let me know more specifics.

Expert Contributor

Hi @Greg Keys

Thanks for the input; it is always appreciated. One clarification:

In that case I would expect the warning on the filter statement below, so why did I get it on the load statement? The load statement does not convert bytearray to chararray.

filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';

Guru

Yes, I saw that. In my environment I got the warning only on the filter, not on the load.

  • What Pig version are you using?
  • What happens when you load with: USING PigStorage() as (str:chararray);

In any case, it is just a warning, there so that no conversion happens invisibly behind the scenes.
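
For reference, the suggestion above written out as a full script might look like the sketch below. It assumes the /abc/ path and the relation names from the question, and that the input lines contain no tab characters (PigStorage() splits on tab by default, so only the text before the first tab would land in str).

```pig
-- Declare the field as chararray on load so the filter needs no implicit cast.
ya = LOAD '/abc/' USING PigStorage() AS (str:chararray);

-- Check the declared schema before filtering.
DESCRIBE ya;

-- Same filter as in the question; $0 and str refer to the same field here.
filter_records = FILTER ya BY str MATCHES '.*Hadoop.*';

DUMP filter_records;
```

Note that MATCHES uses Java regular expressions and is case-sensitive, so a line containing only lowercase "hadoop" would not pass this filter.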

Expert Contributor

Hi @Greg Keys.

After using USING PigStorage() as (str:chararray); the issue is resolved. Thanks for your valuable time.

Guru

Glad it worked out Vamsi 🙂