Oozie coordinator for a Hive job: create an external table whose data directory changes with time


Hi! I want to build a process that takes data from a directory (hdfs// /%Y%m%D/%H) that Flume creates. The problem is that the data directory changes with time, so when I try to make the Oozie workflow coordinator work I get an error.

I followed the instructions from this article: http://blog.cloudera.com/blog/2013/01/how-to-schedule-recurring-hadoop-jobs-with-apache-oozie/ .

The hiveQuery.hql is:

CREATE EXTERNAL TABLE IF NOT EXISTS default.log_files2 (Id STRING, PortalId INT, CampaignId INT, BannerAssetId INT, BannerAssetFileId INT, Time_stamp INT, EventType INT, Price FLOAT, Msisdn STRING, UserId STRING, hitDate STRING, hitTime INT)
COMMENT "log_files"
ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
STORED AS TEXTFILE
LOCATION "${filepath}";

And the filepath takes the value:

filepath=${coord:dataIn('logs_input')}
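For context, `coord:dataIn('logs_input')` only resolves if `logs_input` is declared as a dataset and wired into an `input-events`/`data-in` element in the coordinator definition, and the resolved value has to be forwarded to the workflow as a property. The coordinator I am working from looks roughly like this (a minimal sketch following the pattern in the linked article; the HDFS path, dates, and `workflowAppUri` are placeholders for my actual values):

```xml
<coordinator-app name="hive-coord" frequency="${coord:hours(1)}"
                 start="2018-09-25T00:00Z" end="2018-09-26T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <datasets>
    <!-- One dataset instance per hour; ${YEAR}${MONTH}${DAY}/${HOUR} matches the Flume layout -->
    <dataset name="logs_input" frequency="${coord:hours(1)}"
             initial-instance="2018-09-25T00:00Z" timezone="UTC">
      <uri-template>hdfs://sandbox-hdp.hortonworks.com:8020/flume/${YEAR}${MONTH}${DAY}/${HOUR}</uri-template>
      <!-- Empty done-flag: consider the directory ready as soon as it exists -->
      <done-flag></done-flag>
    </dataset>
  </datasets>
  <input-events>
    <data-in name="logs_input" dataset="logs_input">
      <instance>${coord:current(0)}</instance>
    </data-in>
  </input-events>
  <action>
    <workflow>
      <app-path>${workflowAppUri}</app-path>
      <configuration>
        <!-- Pass the resolved directory to the workflow as ${filepath} -->
        <property>
          <name>filepath</name>
          <value>${coord:dataIn('logs_input')}</value>
        </property>
      </configuration>
    </workflow>
  </action>
</coordinator-app>
```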

But I end up with an error:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: hdfs://sandbox-hdp.hortonworks.com:8020./$%7Bworkflowfilepath%7D)

I also tried to declare the path as:

filepath=${coord:formatTime(coord:dateOffset(coord:nominalTime(), tzOffset, 'HOUR'), 'yyyyMMdd/HH')}
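My understanding is that `coord:*` EL functions are evaluated by the coordinator engine, so they can only appear inside the coordinator XML, not in a job.properties file; if I am right, that would explain why defining `filepath` this way keeps the coordinator from being submitted. The equivalent placement inside the coordinator's workflow configuration would be something like this (a sketch; `tzOffset` is assumed to be a numeric property I would define elsewhere):

```xml
<!-- Inside <action><workflow><configuration> of the coordinator -->
<property>
  <name>filepath</name>
  <!-- Nominal time shifted by tzOffset hours, formatted as yyyyMMdd/HH -->
  <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), tzOffset, 'HOUR'), 'yyyyMMdd/HH')}</value>
</property>
```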

But then the coordinator doesn't even get submitted.

I wanted the query to resolve to something like this:

CREATE EXTERNAL TABLE IF NOT EXISTS default.log_files2 (Id STRING, PortalId INT, CampaignId INT, BannerAssetId INT, BannerAssetFileId INT, Time_stamp INT, EventType INT, Price FLOAT, Msisdn STRING, UserId STRING, hitDate STRING, hitTime INT)
COMMENT "log_files"
ROW FORMAT DELIMITED FIELDS TERMINATED BY "|"
STORED AS TEXTFILE
LOCATION "hdfs//.../flume/20180925/10";

So I can submit a coordinator that runs this query every hour, each time taking data from a different directory.

Is this possible? Any suggestions?