Created 01-18-2016 10:31 AM
Hi everyone,
I started working with Hadoop a few weeks ago and have already run into a date-format issue that I cannot solve, even though it is probably quite simple.
The input looks as follows:
102,2009-10-08 12:00:00,3,3000
100,2009-10-08 15:11:00,3,1500
101,2009-11-20 23:59:00,2,1560
103,2008-05-20 01:00:00,4,2060
The code I am using:
a = LOAD '/user/xyz/Orders2.txt' USING PigStorage(',') as (id:int, date:chararray, kid:int, volume:double);
b = Foreach a Generate ToDate(date, 'yyyy/MM/dd HH:mm:ss') as dateString;
DUMP b;
After running it, I get the following error:
2016-01-18 11:14:07,743 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-01-18 11:14:07,766 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias e. Backend error : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-4 Operator Key: scope-4) children: null at []]: java.lang.IllegalArgumentException: Invalid format: "2009-10-08 12:00:00" is malformed at "-10-08 12:00:00"
Do I have to use a UDF to fix this, or can it be solved with plain Pig commands?
I am grateful for any advice.
Created 01-18-2016 12:26 PM
Thanks for sharing the data format. I was able to reproduce it:
2016-01-18 12:20:55,528 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2016-01-18 12:20:55,537 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias b. Backend error : org.apache.pig.backend.executionengine.ExecException: ERROR 0: Exception while executing [POUserFunc (Name: POUserFunc(org.apache.pig.builtin.ToDate2ARGS)[datetime] - scope-4 Operator Key: scope-4) children: null at []]: java.lang.IllegalArgumentException: Invalid format: "2009-10-08 12:00:00" is malformed at "-10-08 12:00:00"
Details at logfile: /home/hdfs/pig_1453119614848.log
grunt>
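The pattern has to match the separators in the data. The input uses hyphens (2009-10-08), while 'yyyy/MM/dd' expects slashes, so parsing fails right after the year; that is why the error says the value is malformed at "-10-08 12:00:00". It worked with this pattern: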
b = Foreach a Generate ToDate(date, 'yyyy-MM-dd HH:mm:ss') as dateString;
2016-01-18 12:25:57,495 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-01-18 12:25:57,496 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
2016-01-18 12:25:57,500 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-01-18 12:25:57,604 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://sandbox.hortonworks.com:8188/ws/v1/timeline/
2016-01-18 12:25:57,604 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at sandbox.hortonworks.com/10.0.2.15:8050
2016-01-18 12:25:57,609 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=SUCCEEDED. Redirecting to job history server
2016-01-18 12:25:57,647 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2016-01-18 12:25:57,650 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code.
2016-01-18 12:25:57,663 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2016-01-18 12:25:57,664 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(2009-10-08T12:00:00.000Z)
grunt> b = Foreach a Generate ToDate(date, 'yyyy-MM-dd HH:mm:ss') as dateString;
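Why the separators matter can be checked outside Pig as well. The Pig documentation describes the format argument of ToDate as a Joda-Time DateTimeFormat pattern; the sketch below uses the JDK's java.time DateTimeFormatter as a stand-in, on the assumption (which holds for these letters) that yyyy, MM, dd, HH, mm and ss mean the same in both. PatternCheck and failurePoint are hypothetical names used only for illustration. The method returns the text starting at the first character that fails to match, which mirrors the "is malformed at" part of the Pig error.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;

class PatternCheck {
    /**
     * Parses input with the given pattern. Returns null if the whole string
     * parses, otherwise the remainder starting at the first character that
     * did not match (analogous to "is malformed at" in the Pig error).
     */
    static String failurePoint(String input, String pattern) {
        DateTimeFormatter fmt = DateTimeFormatter.ofPattern(pattern, Locale.ROOT);
        try {
            LocalDateTime.parse(input, fmt);
            return null;
        } catch (DateTimeParseException e) {
            return input.substring(e.getErrorIndex());
        }
    }

    public static void main(String[] args) {
        String value = "2009-10-08 12:00:00"; // date field of the first input row
        // Slashes in the pattern, hyphens in the data: parsing stops after "2009".
        System.out.println(failurePoint(value, "yyyy/MM/dd HH:mm:ss"));
        // Separators match the data: the whole string parses.
        System.out.println(failurePoint(value, "yyyy-MM-dd HH:mm:ss"));
    }
}
```

The first call should print -10-08 12:00:00, the same remainder Pig reported, and the second prints null.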
Created 01-18-2016 01:22 PM
Hi Neeraj,
Thank you for the quick reply and help. It worked and I was able to convert the format.
Big thanks. 🙂
Created 07-23-2016 07:20 PM
How can I insert data into a Hive table in a particular date format (DD/MM/YY) from the Hive table below?
1904287 | Christopher Rodriguez | Jan 11, 2003 | |
96391595 | Thomas Stewart | 6/17/1969 | |
2236067 | John Nelson | 08/22/54 |
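This follow-up has no answer in the thread. One way to approach it, sketched here in Java (Hive also runs on the JVM) rather than HiveQL, is to try each source format in turn and re-render the date as dd/MM/yy. Assumptions: the three input patterns are guessed from the three sample rows only, and NormalizeDates and toDdMmYy are hypothetical names. Two caveats: use lower-case dd, because upper-case DD means day-of-year in these pattern languages; and java.time reads a two-digit year such as "54" as 2054, which does not change the yy output but would matter if the full date were kept.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;
import java.util.Locale;

class NormalizeDates {
    // Input patterns guessed from the sample rows (an assumption, not a known spec).
    private static final List<DateTimeFormatter> INPUT_FORMATS = List.of(
        DateTimeFormatter.ofPattern("MMM d, yyyy", Locale.ENGLISH), // e.g. Jan 11, 2003
        DateTimeFormatter.ofPattern("M/d/yyyy", Locale.ENGLISH),    // e.g. 6/17/1969
        DateTimeFormatter.ofPattern("MM/dd/yy", Locale.ENGLISH));   // e.g. 08/22/54

    // Lower-case dd (day-of-month); upper-case DD would be day-of-year.
    private static final DateTimeFormatter OUTPUT_FORMAT =
        DateTimeFormatter.ofPattern("dd/MM/yy", Locale.ENGLISH);

    /** Returns the date re-rendered as dd/MM/yy, or null if no pattern matches. */
    static String toDdMmYy(String raw) {
        String s = raw.trim();
        for (DateTimeFormatter f : INPUT_FORMATS) {
            try {
                return LocalDate.parse(s, f).format(OUTPUT_FORMAT);
            } catch (DateTimeParseException e) {
                // Not this format; try the next one.
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (String d : List.of("Jan 11, 2003", "6/17/1969", "08/22/54")) {
            System.out.println(d + " -> " + toDdMmYy(d));
        }
    }
}
```

The three sample values should come out as 11/01/03, 17/06/69 and 22/08/54. Porting the same pattern list into a Hive UDF or into Hive's built-in date functions is a separate step not shown here.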