LoadFunction UDF PIG


Hi guys,

I need to create a UDF that defines a custom load location.

Before attempting the UDF, I tried parameter substitution inside the Pig script, which does not work:


time = LOAD 'hdfs:/home/raw/report/last_process_time/part-r-00000' AS DATE;
start_ts = foreach time generate startTS(DATE);
raw = LOAD '/home/raw/report/$END' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS (json:map[]);
run -param PATH='/home/raw/reports/$END/*' hdfs:/home/raw/pig-script/update_test.pig

I expected PATH to be replaced by the content of start_ts.

So here is the solution I have in mind:
- create a customLoad() UDF that accepts a tuple as input:

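public customLoad(Tuple input) throws ExecException {
    // The first field holds the last process time in epoch seconds.
    String str = input.get(0).toString();
    // Convert to milliseconds and move forward by one hour.
    Date date = new Date((Long.parseLong(str) * 1000) + (60 * 60 * 1000));
    // "yyyy" is the calendar year; "YYYY" is the week year and can be off around New Year.
    SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH");
    newpath = sdf.format(date);
}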

- update the input path in setLocation(), assuming the default location is /home/raw/report:

public void setLocation(String location, Job job) throws IOException {
    FileInputFormat.setInputPaths(job, location + newpath + "/*");
}

The script then calls the loader like this:

raw = LOAD '/home/raw/report/' USING customLoad(start_ts);

But this gives me an error:

ERROR 1200: <line 7, column 51> mismatched input 'start_ts' expecting RIGHT_PAREN

What have I done wrong?
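
For reference, the path computation described above can be tried on its own, outside of Pig. This is only a minimal sketch: the class name ReportPath and the method hourlyPath are made up for illustration, the timestamp is assumed to be in epoch seconds, and the time zone is assumed to be UTC (the original snippet uses the JVM default). It uses the "yyyy" pattern (calendar year) rather than "YYYY" (week year).

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

// Hypothetical helper, not part of Pig or of the loader itself.
final class ReportPath {

    // Turns an epoch-seconds string into a "yyyy/MM/dd/HH" path fragment
    // for the hour that follows the given timestamp.
    static String hourlyPath(String epochSeconds) {
        long millis = Long.parseLong(epochSeconds.trim()) * 1000L + 60L * 60L * 1000L;
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy/MM/dd/HH");
        // Assumption: directories are named in UTC.
        sdf.setTimeZone(TimeZone.getTimeZone("UTC"));
        return sdf.format(new Date(millis));
    }

    public static void main(String[] args) {
        // 1420066800 is 2014-12-31 23:00:00 UTC, so the next hour is 2015/01/01/00.
        System.out.println("/home/raw/report/" + hourlyPath("1420066800") + "/*");
    }
}
```

Running main should print the glob that the loader is expected to read for that timestamp, which makes it easy to check the date logic separately from the LOAD statement.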

Thanks a lot.

