Pig data load problem

Contributor

Below is my source data in HDFS, under /abc/:

Hadoop is an open source

MR is to process data in hadoop.

Hadoop has a good eco system.

I want to do the operation below:
filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';
but the load command is unsuccessful. Could anybody provide input on the load statement?

grunt> ya = load '/abc/' USING TextLoader();
2016-11-17 21:00:14,470 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> yab = load '/abc/';
2016-11-17 21:00:50,199 [main] WARN  org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> 
1 ACCEPTED SOLUTION

Re: Pig data load problem

Guru

From the information given, there is no load problem, only an explicit warning that the loaded data is being cast to chararray (string) during the filter operation.

A couple points:

  1. If you do not specify a type on load, the default is bytearray. When you filter with MATCHES, you are treating the field as a string (chararray type in Pig), so Pig converts the bytearray to chararray during this operation.
  2. TextLoader() loads each line of input as a tuple with a single field; no delimiters are applied.
  3. If you want to load a delimited file (fields, e.g. a CSV), use PigStorage(). You can specify the delimiter, e.g. PigStorage(','); if none is specified, it uses the default tab delimiter.
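
A minimal sketch of points 1 and 3, not a definitive fix. The /abc/ path and the filter come from the question; the CSV path and the field names name and age are hypothetical, chosen only for illustration. Writing the cast explicitly should mean Pig no longer has to insert an implicit one.

```pig
-- No schema declared, so the single field $0 defaults to bytearray.
raw = LOAD '/abc/' USING TextLoader();

-- Cast explicitly instead of relying on the implicit bytearray-to-chararray cast.
hits = FILTER raw BY (chararray)$0 MATCHES '.*Hadoop.*';

-- Delimited data: declare the delimiter and the field types on load.
-- The path and field names below are hypothetical.
users = LOAD '/path/to/users.csv' USING PigStorage(',') AS (name:chararray, age:int);

-- Print the schema Pig has recorded for the relation.
DESCRIBE users;
```

Running DESCRIBE on a relation is a quick way to check which types Pig assigned before writing filters against it.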

http://pig.apache.org/docs/r0.16.0/basic.html#Data+Types+and+More

Not sure if that is what you were looking for ... if so, let me know by accepting the answer; if not, let me know more specifics.

Re: Pig data load problem

Contributor

Hi @Greg Keys

Thanks for the input. Your input is always appreciated. One clarification:

In that case I should get the warning at the filter statement below, so why did I get it at the load statement? In the load statement I am not converting bytearray to chararray.

filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';

Re: Pig data load problem

Guru

Yes, I saw that. In my environment I got it only during filter and not the load.

  • What Pig version are you using?
  • What happens when you do: USING PigStorage() as (str:chararray);

In any case, it is just a warning, there so that nothing happens invisibly behind the scenes.
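
Spelled out in full, that suggestion might look like the sketch below. It assumes the /abc/ path and the ya and filter_records relations from the question, and the field name str from the suggestion above; treat it as an illustration rather than the exact statements that were run.

```pig
-- Declare the field as chararray at load time, so no implicit cast is needed later.
-- PigStorage() with no argument splits on tabs; a line without tabs lands whole in str.
ya = LOAD '/abc/' USING PigStorage() AS (str:chararray);

-- MATCHES must match the entire string, hence the leading and trailing '.*'.
filter_records = FILTER ya BY str MATCHES '.*Hadoop.*';

DUMP filter_records;
```

Note that the pattern is case-sensitive, so a line that only contains lowercase "hadoop" would not be selected.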

Re: Pig data load problem

Contributor

Hi @Greg Keys.

After using USING PigStorage() as (str:chararray); the issue is resolved. Thanks for your valuable time.

Re: Pig data load problem

Guru

Glad it worked out Vamsi :)
