I have two Impala tables. The first table, T1, has additional columns, but I am only interested in the date and the day type (Weekday):

date        day_type
04/06/2020  Weekday

The second table, T2:

process  date        status
A        04/01/2020  finished
A        04/02/2020  finished
A        04/03/2020  run_again

Using Impala queries, I have to get the maximum date from T2 along with its status. In the table above, 04/03 is the maximum date. If the status on that date is finished, my query should return the next available weekday date from T1, which is 04/06/2020. If the status is run_again, the query should return the same date. Here 04/03 has the status run_again, so the output should be 04/03/2020 and not 04/06/2020.

What I have tried so far: I ran a subquery against T2 and got the maximum date and its status. I then tried a CASE expression in my main query with a subselect on T1 inside it, but that does not work. It looks like the CASE expression does not allow a SELECT statement within it. Is it possible to achieve this with an Impala query?
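One possible way to avoid a subselect inside CASE is to compute both candidate dates in derived tables, join them, and let CASE choose between two plain columns. The sketch below runs that query shape against an in-memory SQLite database purely so the logic can be checked; it is not Impala-specific and has not been run on Impala. The column names `cal_date` and `run_date`, the ISO date strings (so that text comparison matches date order), and the extra T1 rows are all assumptions made for illustration.

```python
import sqlite3

# Query shape: no SELECT inside CASE, only joins between derived tables.
# Column names cal_date / run_date are hypothetical stand-ins for "date".
QUERY = """
WITH latest AS (
    -- the T2 row carrying the maximum date, together with its status
    SELECT t2.run_date, t2.status
    FROM t2
    JOIN (SELECT MAX(run_date) AS max_date FROM t2) m
      ON t2.run_date = m.max_date
),
next_weekday AS (
    -- earliest T1 weekday strictly after that maximum date
    SELECT MIN(t1.cal_date) AS next_date
    FROM t1
    JOIN latest l ON t1.cal_date > l.run_date
    WHERE t1.day_type = 'Weekday'
)
SELECT CASE WHEN l.status = 'finished' THEN n.next_date
            ELSE l.run_date
       END AS result_date
FROM latest l
CROSS JOIN next_weekday n
"""

# T1 sample: only 2020-04-06 comes from the question; the other rows are
# made up to exercise the "strictly after" and MIN logic.
T1_ROWS = [
    ("2020-04-03", "Weekday"),
    ("2020-04-06", "Weekday"),
    ("2020-04-07", "Weekday"),
]

# T2 sample from the question, with dates rewritten as ISO strings.
T2_ROWS = [
    ("A", "2020-04-01", "finished"),
    ("A", "2020-04-02", "finished"),
    ("A", "2020-04-03", "run_again"),
]


def resolve_date(t2_rows, t1_rows=T1_ROWS):
    """Load the sample rows into SQLite and return the date picked by QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t1 (cal_date TEXT, day_type TEXT)")
        conn.execute("CREATE TABLE t2 (process TEXT, run_date TEXT, status TEXT)")
        conn.executemany("INSERT INTO t1 VALUES (?, ?)", t1_rows)
        conn.executemany("INSERT INTO t2 VALUES (?, ?, ?)", t2_rows)
        return conn.execute(QUERY).fetchone()[0]
    finally:
        conn.close()


if __name__ == "__main__":
    print(resolve_date(T2_ROWS))
```

If the dates are stored as MM/DD/YYYY strings, they would need converting to a sortable form before MAX and the `>` comparison give the intended result. If several T2 rows share the maximum date, `latest` returns one row per tie, so the final query would need an extra rule for choosing among them.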
I am trying to import data from Oracle to HDFS using Sqoop 1.4.6 from Java.
Hadoop version: 2.6.0-cdh5.14.4
Sqoop version: 1.4.6-cdh5.14.4
Dependencies I have used in the pom:
sqoop 1.4.6-cdh5.14.4
hadoop-mapreduce-client-jobclient 2.6.0-cdh5.14.4
hadoop-mapreduce-client-common 2.6.0-cdh5.14.4
kite-data-mapreduce 1.0.0-cdh5.14.4
kite-data-code 1.0.0-cdh5.14.4
kite-hadoop-compatibility 1.0.0-cdh5.14.4
The error I get when writing the output as a Parquet file:
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.RuntimeException: java.lang.ClassNotFoundException: class org.kitesdk.data.mapreduce.DatasetKeyOutputFormat not found
Please note that the import succeeds when I pass the text file argument (--as-textfile).
The error only occurs when I try to write the output as a Parquet file (--as-parquetfile).
Kindly let me know how to correct this.