12-06-2016
04:41 PM
@Pooja Sahu In the source file, the original '\001' control character has been replaced with its string representation "^A" (a caret followed by the letter A). One way to process the file is to convert it back to '\001':
CREATE EXTERNAL TABLE fix_raw (line string)
ROW FORMAT DELIMITED
LOCATION '/user/pooja/fix/';
CREATE TABLE fix_map (tag MAP<STRING, STRING>)
STORED AS ORC;
INSERT INTO TABLE fix_map
SELECT str_to_map(replace(line, '^A', '\001'), '\001', '=') AS tag
FROM fix_raw;
-- query (the map keys are strings, so the tag number is quoted)
SELECT tag['49'] FROM fix_map;
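
To make the transformation above easier to follow, here is a minimal Python sketch of what the replace and str_to_map calls do to a single line. It is only an illustration, not Hive code: the function name fix_line_to_map and the sample line are hypothetical, and it assumes the file holds tag=value pairs separated by the literal text "^A". Unlike Hive, the sketch simply skips the empty piece left by a trailing delimiter.

```python
SOH = "\x01"  # the control character written as '\001' in the Hive query


def fix_line_to_map(line):
    """Mimic replace(line, '^A', '\\001') followed by str_to_map(..., '\\001', '=')."""
    # Step 1: turn the two-character text "^A" back into the real delimiter.
    restored = line.replace("^A", SOH)
    # Step 2: split into pairs on the delimiter, then split each pair on '='.
    tags = {}
    for pair in restored.split(SOH):
        if not pair:
            continue  # simplification: ignore the empty piece after a trailing delimiter
        key, _, value = pair.partition("=")
        tags[key] = value
    return tags


# Hypothetical sample line, invented for illustration only.
sample = "8=FIX.4.2^A35=D^A49=SENDER^A56=TARGET^A"
tags = fix_line_to_map(sample)
print(tags["49"])
```

Note that the keys of the resulting map are strings ("49", not 49), which mirrors the MAP<STRING, STRING> column type in the table definition.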
11-14-2016
05:48 PM
2 Kudos
PigStorage

PigStorage needs to know the delimiter of your fields. The default delimiter is tab, which is what you get when you call PigStorage() with no argument. You can specify a different delimiter, as you did with PigStorage(','). If your file is comma-delimited and you use PigStorage(), it will not split on the commas and will see only one field per line (because it cannot find a tab); the commas are just ordinary characters in a string. When you correctly specify PigStorage(','), it breaks each line into fields separated by commas. https://pig.apache.org/docs/r0.9.1/func.html#pigstorage

Register

The link you mention (https://pig.apache.org/docs/r0.15.0/basic.html#register) is about registering UDFs. There are two types of functions in Pig: native functions and user-defined functions (UDFs). Native functions ship with the Pig binaries, and you do not need to do anything to call them. UDFs are functions you build yourself into a jar file and then register so that the script can find them. Since PigStorage is a native function, you do not need to register it; Pig will find it. (Thus the link is not relevant to your script.)

If this is what you were looking for, please let me know by accepting the answer; otherwise, let me know what gaps remain.
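
The delimiter behaviour of PigStorage described above can be illustrated with a small Python sketch. This is not Pig code and not how Pig is implemented; the helper split_fields and the sample line are hypothetical and only show why a comma-delimited line looks like a single field when it is split on the default tab.

```python
def split_fields(line, delimiter="\t"):
    """Split one input line into fields, defaulting to tab like PigStorage()."""
    return line.rstrip("\n").split(delimiter)


# Hypothetical comma-delimited line, invented for illustration only.
line = "alice,30,london"

default_fields = split_fields(line)     # analogous to PigStorage()
comma_fields = split_fields(line, ",")  # analogous to PigStorage(',')

print(default_fields)
print(comma_fields)
```

With the default delimiter the whole line comes back as one field, because it contains no tab; with ',' it comes back as three separate fields.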