I am new to the Hadoop ecosystem. For data ingestion into Hive, we use Sqoop import commands, which populate staging tables. We then need to clean up the data and insert it into the production Hive tables. I have written a Hive UDF to simulate an auto-increment feature, and it works fine in the Hive shell. However, the Hive query takes very long to clean up the data and generate the auto-incremented numbers, while the equivalent Impala queries perform well. I am wondering if I can use the same Hive UDF in Impala.
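For context, here is a minimal sketch of the counter logic such an auto-increment UDF typically wraps (the Hive UDF boilerplate and class names are illustrative, not from the original post). Note that in a distributed job each task gets its own UDF instance, so values are unique only within a task unless offset by something like a task id.

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative sketch: the core state an auto-increment Hive UDF keeps.
// In Hive this class would extend org.apache.hadoop.hive.ql.exec.UDF;
// that dependency is omitted here to keep the sketch self-contained.
public class AutoIncrement {
    // One counter per instance; each map/reduce task has its own,
    // so this alone does not give globally unique values.
    private final AtomicLong counter = new AtomicLong(0);

    public long evaluate() {
        return counter.incrementAndGet();
    }

    public static void main(String[] args) {
        AutoIncrement udf = new AutoIncrement();
        for (int i = 0; i < 3; i++) {
            System.out.println(udf.evaluate());
        }
    }
}
```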
Is there any way to use the Hive UDF in impala-shell to generate the auto-incremented numbers?
The answer is yes; see this Link.
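Registering an existing Hive Java UDF in Impala looks roughly like the following (the jar path, function name, and class name below are placeholders for your own):

```sql
-- Run in impala-shell; point LOCATION at the jar in HDFS and SYMBOL
-- at the UDF class. Names here are illustrative.
CREATE FUNCTION auto_increment() RETURNS BIGINT
LOCATION '/user/hive/udfs/auto-increment-udf.jar'
SYMBOL='com.example.AutoIncrementUDF';

-- Then use it like any built-in function:
SELECT auto_increment(), t.* FROM staging_table t;
```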
We don't support Impala here, so I suggest asking this question in the CDH community forum.
I highly recommend sticking with Hive on Tez, and buckle up for LLAP :)
Java UDFs in Impala work differently than in Hive. Since Hive is Java-based, it loads all the jars in the aux directory onto the classpath, so any dependency jars are picked up as well. Impala backends are C++-based, and Impala only picks up the specific jar configured when the UDF is defined, so you will need to bundle all dependency jars inside the UDF jar so that the UDF is self-contained in a single jar. There are certain restrictions on using Hive UDFs in Impala; they are listed under the "Using Hive UDFs with Impala" section in the link.
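One common way to produce such a self-contained jar is the Maven Shade plugin; a minimal sketch, assuming a standard Maven build (plugin version is illustrative):

```xml
<!-- pom.xml: bundle the UDF and its dependencies into one jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.4.1</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
    </execution>
  </executions>
</plugin>
```

Mark Hive/Hadoop dependencies (e.g. hive-exec) as `provided` scope so the shaded jar contains only the UDF's own dependencies, not the platform classes already on the cluster.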