03-20-2019 05:07 AM
Any solution to this issue? Using --append in place of --incremental lastmodified is not the correct solution, because it does not update the existing record but creates a new record in Hive. --delete-target-dir defeats the purpose of updating data, since it recreates the target directory every time, which is the same as importing the entire source table into HDFS/Hive on every run. I tried using --merge-key, but it gives the following error:

```
19/03/20 07:07:41 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not load jar /tmp/sqoop-gfctwnsg/compile/c63dd58c7ae7aa383d4fe8e795fd8604/FRESH.EMPLOYEERUSHI.jar into JVM. (Could not find class FRESH.EMPLOYEERUSHI.)
    at org.apache.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:92)
    at com.cloudera.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:36)
    at org.apache.sqoop.tool.ImportTool.loadJars(ImportTool.java:120)
    at org.apache.sqoop.tool.ImportTool.lastModifiedMerge(ImportTool.java:456)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: FRESH.EMPLOYEERUSHI
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:88)
```

My Sqoop command is as follows:

```
sqoop import \
  --connect "jdbc:oracle:thin:@oraasmwd17-scan.nam.nsroot.net:8889/GENIFRD" \
  --username FRESH \
  --password C1T12016 \
  --table FRESH.EMPLOYEERUSHI \
  --merge-key id \
  --target-dir /data/gfctwnsg/staging/hive/gfctwnsg_staging/rp86813/sqoopimportdir \
  --incremental lastmodified \
  --check-column MODIFIED_DATE \
  --last-value '2019-03-20 06:43:59.0'
```

My source Oracle table is as follows:

```
1  Rushi Pradhan  engineer  30000  18-MAR-19
2  abc xyz        doctor    20000  18-MAR-19
```

I changed the salary of id = 1 and updated the corresponding date manually. Now I want this change reflected on the Hive side as well, but it only lets me append new records, not update the existing one.
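For what it's worth, the stack trace suggests the generated record class is being named after the schema-qualified table (FRESH.EMPLOYEERUSHI), and the dot in that name then looks like a package separator to the class loader. A workaround I would try, sketched below and not yet verified, is to force a dot-free class name with Sqoop's --class-name codegen option (everything else is the same command as above):

```
# Unverified sketch: same import, but with an explicit, dot-free class name
# so the generated jar does not contain a class called FRESH.EMPLOYEERUSHI.
# "EmployeeRushi" is an arbitrary placeholder name.
sqoop import \
  --connect "jdbc:oracle:thin:@oraasmwd17-scan.nam.nsroot.net:8889/GENIFRD" \
  --username FRESH \
  --password C1T12016 \
  --table FRESH.EMPLOYEERUSHI \
  --class-name EmployeeRushi \
  --merge-key id \
  --target-dir /data/gfctwnsg/staging/hive/gfctwnsg_staging/rp86813/sqoopimportdir \
  --incremental lastmodified \
  --check-column MODIFIED_DATE \
  --last-value '2019-03-20 06:43:59.0'
```

Any valid Java class name without a dot should do for the placeholder.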
09-09-2014 04:36 AM
So, I managed to fix my problem. The first hint was the "GC overhead limit exceeded" message. I quickly found out that this can be caused by a lack of heap space for the JVM. After digging a bit into the YARN configuration in Cloudera Manager, and comparing it to the settings in an Amazon Elastic MapReduce cluster (where my Pig scripts did work), I found out that, even though each node had 30 GB of memory, most YARN components had very low heap-space settings. I increased the heap space for the NodeManagers, the ResourceManager and the containers, and I also set the maximum heap space for mappers and reducers somewhat higher, keeping in mind the total amount of memory available on each node (and the other services running there, like Impala). Now my Pig scripts work again!

Two issues I want to mention in case a Cloudera engineer reads this:

- I find it a bit strange that Cloudera Manager doesn't set saner heap-space amounts based on the total amount of RAM available.
- The fact that not everything runs under YARN yet makes it harder to manage memory; you actually have to manage it manually. If Impala ran under YARN, there would be less manual memory management, I think 🙂
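For reference, here is a sketch of the kind of settings involved. The property names are the standard YARN/MR2 ones, but the values are purely illustrative for a ~30 GB node that also runs Impala; my exact numbers (and yours) will differ:

```
# Illustrative values only, not my exact configuration.
yarn.nodemanager.resource.memory-mb  = 24576      # RAM YARN may hand out per node, leaving headroom for Impala/OS
yarn.scheduler.maximum-allocation-mb = 8192       # largest single container
mapreduce.map.memory.mb              = 2048       # container size per map task
mapreduce.reduce.memory.mb           = 4096       # container size per reduce task
mapreduce.map.java.opts              = -Xmx1638m  # JVM heap, roughly 80% of the container
mapreduce.reduce.java.opts           = -Xmx3276m
```

A common rule of thumb is to keep the JVM heap (-Xmx) around 80% of the container size, and the sum of all containers below the node's RAM minus whatever Impala and the other services need.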
08-12-2014 01:04 PM
Thanks Marcel! That does indeed seem to work, at least with Tableau and Impyla. Apparently the instructions on the Amazon website for setting up a tunnel don't work that well. Tomorrow I'm going to try whether this tunnel also works with Squirrel and other generic JDBC DB tools.
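In case it helps anyone else, the tunnel I mean is a plain SSH local port forward to the Impala daemon. A sketch, assuming Impala's usual HiveServer2-compatible port 21050 and with placeholder hostnames and key file:

```
# Placeholder hostnames/key: forwards localhost:21050 to the Impala daemon
# so Tableau, Impyla and JDBC tools can connect to localhost instead.
ssh -i mykey.pem -N \
    -L 21050:impala-worker.internal:21050 \
    hadoop@ec2-xx-xx-xx-xx.compute.amazonaws.com
```

With the tunnel up, the client tools connect to host localhost, port 21050, rather than the cluster address.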