Support Questions

vamsi123 · ‎11-18-2016

Below is my source in HDFS: /abc/

Hadoop is an open source

MR is to process data in hadoop.

Hadoop has a good eco system.

I want to do the operation below:
filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';
but the load command is unsuccessful. Could anybody provide input on the load statement?

grunt> ya = load '/abc/' USING TextLoader();
2016-11-17 21:00:14,470 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> yab = load '/abc/';
2016-11-17 21:00:50,199 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt>

gkeys · ‎11-18-2016

From the information given, there is no load problem, just an explicit warning that the loaded data is being cast to chararray (string) during the filter operation.

A couple of points:

If you do not specify a type on load, the default is bytearray. When you filter with MATCHES, you are treating the field as a string (chararray type in Pig), so Pig converts the bytearray to chararray during this operation.
TextLoader() loads each line of input as a record with a single field (no delimiters are applied).
If you want to load a delimited file (fields, e.g. a CSV), use PigStorage(). You can specify the delimiter, e.g. PigStorage(','); if none is specified, it uses the default, which is tab.
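
To make the typing points above concrete, here is a minimal sketch in Pig Latin. It assumes the /abc/ path from the question; the CSV path and its field names (id, text) are hypothetical and only illustrate the delimiter option. Writing the cast explicitly should make the conversion visible in the script instead of leaving it to Pig.

```pig
-- No schema on load: the field defaults to bytearray.
yab = LOAD '/abc/';

-- Cast explicitly to chararray before applying the regex,
-- rather than relying on Pig's implicit cast.
filter_records = FILTER yab BY (chararray)$0 MATCHES '.*Hadoop.*';

-- Delimited input: hypothetical CSV path and field names, comma as delimiter.
csv = LOAD '/abc_csv/' USING PigStorage(',') AS (id:int, text:chararray);
```

Note that MATCHES uses Java regular expressions and is case-sensitive, so with the sample data only the lines containing "Hadoop" with a capital H would pass the filter.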

http://pig.apache.org/docs/r0.16.0/basic.html#Data+Types+and+More

Not sure if that is what you were looking for ... if so, let me know by accepting the answer; if not, let me know more specifics.

vamsi123 · ‎11-18-2016

Hi @Greg Keys

Thanks for the input; it is always appreciated. One clarification:

In that case I would expect the warning at the filter statement below, so why did I get it at the load statement? In the load statement I am not converting bytearray to chararray.

filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';

gkeys · ‎11-18-2016

Yes, I saw that. In my environment I got it only during the filter and not the load.

What Pig version are you using?
What happens when you do: USING PigStorage() as (str:chararray);

In any case, it is just a warning, there to tell you about the cast so that nothing happens invisibly behind the scenes.
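
Spelled out in full, the suggestion above might look like the following sketch. It reuses the /abc/ path and the relation names from the question; the field name str comes from the suggested schema.

```pig
-- Declare the single field as chararray at load time,
-- so the filter needs no implicit cast.
ya = LOAD '/abc/' USING PigStorage() AS (str:chararray);

-- Refer to the field by name instead of by position ($0).
filter_records = FILTER ya BY str MATCHES '.*Hadoop.*';

-- Print the matching records to the console.
DUMP filter_records;
```

Because PigStorage() splits on tab by default, str holds the whole line only when the input contains no tab characters, as in the sample data.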

vamsi123 · ‎11-21-2016

Hi @Greg Keys.

After adding USING PigStorage() as (str:chararray); the issue is resolved. Thanks for your valuable time.

gkeys · ‎11-21-2016

Glad it worked out Vamsi 🙂

Cloudera Community

Pig data load problem