
LoadFunction UDF PIG


Hi guys,

I need to create a UDF that defines a custom load location.

Before attempting a UDF, I tried parameter substitution inside the Pig script, which does not work:

 

--myscript.pig
time = LOAD 'hdfs:/home/raw/report/last_process_time/part-r-00000' AS DATE;
start_ts = foreach time generate startTS(DATE);
raw = LOAD '/home/raw/report/$END' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
run -param PATH='/home/raw/reports/$END/*' hdfs:/home/raw/pig-script/update_test.pig

I expected PATH to take the value of start_ts.
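Pig parameter substitution is a preprocessing step applied to the script text before any statement runs, so a value that a relation such as start_ts produces at run time is not available to it. The toy model below (not Pig's actual preprocessor; the class and method names are hypothetical) shows the idea: `$NAME` tokens are replaced from a map of values that must already be known as text.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model of textual parameter substitution (hypothetical, not Pig's code)
public class ParamSubstitutionDemo {
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    // Replaces every $NAME with params.get(NAME); unknown names are left untouched
    static String substitute(String script, Map<String, String> params) {
        Matcher m = PARAM.matcher(script);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = params.get(m.group(1));
            String replacement = value != null ? value : m.group(0);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String line = "raw = LOAD '/home/raw/report/$END' USING JsonLoader();";
        // END has to be supplied up front as a literal string (e.g. via -param);
        // nothing computed later by a relation can feed into this step
        System.out.println(substitute(line, Map.of("END", "2017/12/31/11")));
    }
}
```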

So here is the solution I have in mind:
- Create a customLoad() UDF that accepts a tuple as input.
- Constructor:

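public customLoad(Tuple input) throws ExecException {
    // first field of the tuple: epoch timestamp in seconds
    String str = input.get(0).toString();
    // seconds -> milliseconds, plus one hour
    Date date = new Date(Long.parseLong(str) * 1000 + 60 * 60 * 1000);
    // "yyyy" is the calendar year; "YYYY" would be the week-based year
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH");
    newpath = sdf.format(date);
}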
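The path computation in the constructor can be checked on its own, outside Pig. This is a minimal sketch, assuming the timestamp arrives as a string of epoch seconds (which is also the form in which Pig hands constructor arguments to a UDF: as string constants written in the DEFINE or USING clause) and assuming the hour directories are laid out in UTC, whereas the original code uses the JVM's default time zone. `HourPath`, `format`, and `toHourPath` are hypothetical names. It also shows why `YYYY` (week-based year) is risky: for a date on 31 December it can print the following year.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class HourPath {
    // Assumption: directories use UTC hours (the original relies on the default zone)
    private static final TimeZone ZONE = TimeZone.getTimeZone("UTC");

    // Formats (epochSeconds + 1 hour) with the given SimpleDateFormat pattern
    static String format(long epochSeconds, String pattern) {
        // seconds -> milliseconds, then add one hour
        long millis = epochSeconds * 1000L + 60L * 60L * 1000L;
        // Locale fixed to US so the week-year behaviour is reproducible
        SimpleDateFormat sdf = new SimpleDateFormat(pattern, Locale.US);
        sdf.setTimeZone(ZONE);
        return sdf.format(new Date(millis));
    }

    // Same computation as the constructor, using the calendar-year pattern
    static String toHourPath(String epochSeconds) {
        return format(Long.parseLong(epochSeconds.trim()), "yyyy/MM/dd/HH");
    }

    public static void main(String[] args) {
        // 1514714400 is 2017-12-31T10:00:00Z, so +1 hour gives 11:00
        System.out.println(toHourPath("1514714400"));              // 2017/12/31/11
        System.out.println(format(1514714400L, "YYYY/MM/dd/HH")); // 2018/12/31/11
    }
}
```

In the US locale the week containing 1 January 2018 starts on Sunday 31 December 2017, so `YYYY` reports 2018 for that day while `yyyy` correctly reports 2017.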

Then update the load location in setLocation(), assuming the default location is /home/raw/report:

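@Override
public void setLocation(String location, Job job) throws IOException {
    // location is the path from the LOAD statement; append the hour directory and a glob
    FileInputFormat.setInputPaths(job, location + newpath + "/*");
}

Then call it from the script:

raw = LOAD '/home/raw/report/' USING customLoad(start_ts);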
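The concatenation in setLocation() only yields a valid directory if `location` ends with a slash. The string Pig passes to setLocation() may be rewritten from the literal in the script (for example, made absolute), so it may be safer not to rely on the trailing slash. A minimal sketch of such a join, where `InputPathJoin` and `buildInputPath` are hypothetical names:

```java
// Hypothetical helper: joins the load location and the hour path into the
// glob that would be passed to FileInputFormat.setInputPaths
public class InputPathJoin {
    static String buildInputPath(String location, String newpath) {
        // add a separator only when the location does not already end with one
        String base = location.endsWith("/") ? location : location + "/";
        return base + newpath + "/*";
    }

    public static void main(String[] args) {
        // both calls print /home/raw/report/2017/12/31/11/*
        System.out.println(buildInputPath("/home/raw/report/", "2017/12/31/11"));
        System.out.println(buildInputPath("/home/raw/report", "2017/12/31/11"));
    }
}
```

Plain concatenation without the check would turn `/home/raw/report` into `/home/raw/report2017/12/31/11/*`.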

But this gives me an error:

ERROR 1200: <line 7, column 51> mismatched input 'start_ts' expecting RIGHT_PAREN

What have I done wrong?

Thanks a lot