Pig data load problem
- Labels: Apache Hadoop, Apache Pig
Created ‎11-18-2016 09:44 AM
Below is my source in HDFS: /abc/
Hadoop is an open source
MR is to process data in hadoop.
Hadoop has a good eco system.
I want to run the operation below, but the load command is unsuccessful. Could anybody provide input on the load statement?
filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';
grunt> ya = load '/abc/' USING TextLoader();
2016-11-17 21:00:14,470 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt> yab = load '/abc/';
2016-11-17 21:00:50,199 [main] WARN org.apache.pig.newplan.BaseOperatorPlan - Encountered Warning IMPLICIT_CAST_TO_CHARARRAY 1 time(s).
grunt>
Created ‎11-18-2016 02:16 PM
From the information given, there is no load problem, just a warning that the loaded data is being implicitly cast to chararray (string) during the filter operation.
A couple of points:
- If you do not specify a type on load, the default is bytearray. When you filter with MATCHES, you are treating the field as a string (chararray type in Pig), so Pig converts the bytearray to chararray during this operation.
- TextLoader() loads each line of input as a single field; it does not split the line on delimiters.
- If you want to load a delimited file (fields, e.g. a CSV), use PigStorage(). You can specify the delimiter, e.g. PigStorage(','); if none is specified, it defaults to tab.
http://pig.apache.org/docs/r0.16.0/basic.html#Data+Types+and+More
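The points above can be sketched roughly as follows. This is only an illustration, not something tested against your setup: it assumes the same /abc/ input, and the CSV path and field names in the last statement are hypothetical placeholders.

```pig
-- No AS clause, so the single field ($0) is a bytearray by default.
ya = LOAD '/abc/' USING TextLoader();

-- An explicit cast states the conversion to chararray in the script
-- instead of leaving Pig to insert it implicitly.
filter_records = FILTER ya BY (chararray)$0 MATCHES '.*Hadoop.*';

-- Hypothetical delimited file: the path and the field names are
-- illustrative only and do not come from the question.
csv = LOAD '/path/to/data.csv' USING PigStorage(',') AS (name:chararray, total:int);
```

Writing the cast yourself should keep the script's behavior the same while making the type conversion visible.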
Not sure if that is what you were looking for ... if so, let me know by accepting the answer; if not, let me know more specifics.
Created ‎11-18-2016 05:20 PM
Hi @Greg Keys
Thanks for the input; it is always appreciated. One clarification:
I would expect the warning on the filter statement below, so why did I get it on the load statement? In the load statement I am not converting bytearray to chararray.
filter_records = FILTER ya BY $0 MATCHES '.*Hadoop.*';
Created ‎11-18-2016 05:55 PM
Yes, I saw that. In my environment I got it only during the filter and not the load.
- What Pig version are you using?
- What happens when you use: USING PigStorage() as (str:chararray);
In any case, it is just a warning, there so that nothing invisible happens behind the scenes.
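Spelled out in full, the suggestion might look like the sketch below. It assumes the same /abc/ input and that the lines contain no tab characters, so with PigStorage's default tab delimiter each whole line lands in the single str field.

```pig
-- Declare the field as chararray at load time so that no implicit cast is needed.
ya = LOAD '/abc/' USING PigStorage() AS (str:chararray);

-- MATCHES must match the entire string, hence the leading and trailing '.*'.
filter_records = FILTER ya BY str MATCHES '.*Hadoop.*';

DUMP filter_records;
```

Note that the pattern is case-sensitive, so a line that only contains lowercase "hadoop" would not be expected to pass the filter.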
Created ‎11-21-2016 10:29 AM
Hi @Greg Keys.
After using USING PigStorage() as (str:chararray); the issue is resolved. Thanks for your valuable time.
Created ‎11-21-2016 12:10 PM
Glad it worked out Vamsi 🙂
