About daniel_perry

daniel_perry · ‎07-30-2018

Hi Nikhil, Can you perform any commands on that table? Perhaps try dropping the partition. It seems that the data was removed at some point on HDFS but the hive tables metadata still thinks those partitions exist ALTER TABLE *tableName* drop if exists PARTITION(date_part="2018022313");

daniel_perry · ‎04-11-2018

daniel_perry · ‎10-13-2017

Thought as much, thanks for confirming

daniel_perry · ‎10-13-2017

Hi all, Sorry for the basic question i've had little success searching online, i just need clarification whether something is possible. Can i run a beeline command against a file that is in HDFS. I know we use -f in beeline to specify a file when its on the local file system but can this also be done against a file on HDFS. My use case is that i'd like to run a beeline command through a shell action in Oozie. I'm hitting some issues using Hive2 actions so i wanted to try using a shell action instead Any help is much appreciated, Thanks

daniel_perry · ‎09-15-2017

Hi all, I was hoping someone could confirm whether something i'm trying to do is possible because i'm currently hitting multiple issues. I would like to add a record into a hive table using and insert statement, within this insert statement I have one column which should add a count value to the table based off of the result of a query. My Hive SQL is below.... use ${database}; set hivevar:deltaCount = select count(*) from ${database}.${hive_table}; DROP TABLE IF EXISTS ${database}.process_status_stg_${hive_table}; create table ${database}.process_status_stg_${hive_table} ( taskName varchar(50) COMMENT 'Name of the task being run to populate data', starttime varchar(50) COMMENT 'time of record addition', status varchar(50) COMMENT 'status of the task', workflowID varchar(50) COMMENT 'workflow ID that is running the task', oozieErrorCode varchar(50) COMMENT 'Error code returned by Oozie', recordsLoadedCount varchar(50) COMMENT 'records pulled in previous load') ; insert into table ${database}.process_status_stg_${hive_table} values ('${hive_table}','${current_time}','${taskStatus}','${workflowID}','${errorCode}', (CASE ${taskStatus} WHEN 'COMPLETED' THEN '${hiveconf:deltaCount}' ELSE 'N/A' end as recordsLoadedCount)); Any help is much appreciated, Thanks

daniel_perry · ‎06-13-2016

Thanks for the response Ben. The changes you have suggested have worked to a degree and I know due to the use of partitions that there should be no further degradation to the speed of the query, but this query can still take up to a minute to complete. I shall continue to look into other solutions to this issue and post them if I find any.

daniel_perry · ‎06-08-2016

Hi all, I am currently pulling the max value of a timestamp column from my tables in Hive and using this to pull data after this date using Sqoop, i am using Oozie in order to perform these steps. This is currently done by running a query against the Hive table to put this value into hdfs and then this is picked up in another Oozie action before being passed to the Sqoop action. This all run perfectly fine, however retrieving the Max timestamp value and putting this into HDFS is currently very slow and I can only see this getting slower as more data is inserted into the table. The Hive SQL I am using to pull this value is as below: INSERT OVERWRITE DIRECTORY '${lastModifiedDateSaveLocation}' select max(${timestamp_column}) from ${hive_table}; Can anyone suggest a more optimized solution to retrieve this max timestamp? Thanks for your help, Dan

daniel_perry · ‎06-03-2016

I am trying to obtain the date for which the falcon process is running in my oozie workflow. Any idea on how this can be passed to the workflow or obtained directly? Any help is much appreciated.

daniel_perry · ‎04-15-2016

I ended up going with your approach Ben as it suited what I was trying to do a bit better and after much fiddling around I managed to get it working. However, I am getting the value from my query back like this lastModified=+------------------------+--+ | 2016-03-31 21:59:57.0 | +------------------------+--+ Whereas all I really want is the date value not the extra jargon, is this something i can use regex for to get rid of? Thanks

daniel_perry · ‎04-12-2016

Hi all, I was hoping someone might be able to detail whether what I am attempting to do is currently possible in Oozie and if so how it could be done. I have seen many sources about getting an output from a shell action and inputting it into a Hive action I have however not seen much on whether this can be done the other way around. So my issue is that I would like to run a hive action which will capture the most recent field in a table based off of the max timestamp. I would then like to pass this timestamp value over to a shell action which will take this value and put it in the where statement for a Sqoop Extract. How would I go about passing this value from the Hive action to the Shell action? Is this possible? Please let me know if you need any additional information, thanks in advance.

Online	Offline
Last Visited	‎12-10-2019 07:59 AM

Member Since	‎01-28-2016 05:08 PM
Last Visited	‎12-10-2019 07:59 AM
Posts	38
Kudos received	14

Cloudera Community

Re: Convert sqoop job upper bound and lower bound ...

Re: Hive query error on partition table file not ...

What is causing Hive tables to become locked?

Re: How to run beeline command against file on HDF...

How to run beeline command against file on HDFS?

Insert result of select statement into a Hive tabl...

Re: What is the fastest way to get the most recent...

What is the fastest way to get the most recent rec...

How can we pass or obtain the nominal time from fa...

Re: Capture output from Hive action and use that a...

Capture output from Hive action and use that as in...