03-20-2019 05:07 AM
Any solution to this issue? Using --append in place of --incremental lastmodified is not the correct solution, because it does not update the existing record but creates a new record in Hive. --delete-target-dir defeats the purpose of updating data, since it recreates the target directory every time, which is the same as importing the entire source table into HDFS/Hive on every run. I tried using --merge-key, but it gives the following error:

```
19/03/20 07:07:41 ERROR tool.ImportTool: Import failed: java.io.IOException: Could not load jar /tmp/sqoop-gfctwnsg/compile/c63dd58c7ae7aa383d4fe8e795fd8604/FRESH.EMPLOYEERUSHI.jar into JVM. (Could not find class FRESH.EMPLOYEERUSHI.)
    at org.apache.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:92)
    at com.cloudera.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:36)
    at org.apache.sqoop.tool.ImportTool.loadJars(ImportTool.java:120)
    at org.apache.sqoop.tool.ImportTool.lastModifiedMerge(ImportTool.java:456)
    at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:522)
    at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:621)
    at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
    at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
    at org.apache.sqoop.Sqoop.main(Sqoop.java:252)
Caused by: java.lang.ClassNotFoundException: FRESH.EMPLOYEERUSHI
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:814)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.sqoop.util.ClassLoaderStack.addJarFile(ClassLoaderStack.java:88)
```

My Sqoop command is as follows:

```
sqoop import \
  --connect "jdbc:oracle:thin:@oraasmwd17-scan.nam.nsroot.net:8889/GENIFRD" \
  --username FRESH \
  --password C1T12016 \
  --table FRESH.EMPLOYEERUSHI \
  --merge-key id \
  --target-dir /data/gfctwnsg/staging/hive/gfctwnsg_staging/rp86813/sqoopimportdir \
  --incremental lastmodified \
  --check-column MODIFIED_DATE \
  --last-value '2019-03-20 06:43:59.0'
```

My source Oracle table is as follows:

```
1  Rushi Pradhan  engineer  30000  18-MAR-19
2  abc xyz        doctor    20000  18-MAR-19
```

I changed the salary of id = 1 and updated the corresponding date manually. Now I want this change reflected on the Hive side as well, but it only lets me append new records, not update the existing one.
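For what it's worth, the stack trace suggests the generated record class is being named after the schema-qualified table (FRESH.EMPLOYEERUSHI), and the dot in that name then looks like a package separator to the class loader. A workaround I would try, sketched below and not yet verified, is to force a dot-free class name with Sqoop's --class-name codegen option (everything else is the same command as above):

```
# Unverified sketch: same import, but with an explicit, dot-free class name
# so the generated jar does not contain a class called FRESH.EMPLOYEERUSHI.
# "EmployeeRushi" is an arbitrary placeholder name.
sqoop import \
  --connect "jdbc:oracle:thin:@oraasmwd17-scan.nam.nsroot.net:8889/GENIFRD" \
  --username FRESH \
  --password C1T12016 \
  --table FRESH.EMPLOYEERUSHI \
  --class-name EmployeeRushi \
  --merge-key id \
  --target-dir /data/gfctwnsg/staging/hive/gfctwnsg_staging/rp86813/sqoopimportdir \
  --incremental lastmodified \
  --check-column MODIFIED_DATE \
  --last-value '2019-03-20 06:43:59.0'
```

Any valid Java class name without a dot should do for the placeholder.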
09-09-2014 04:36 AM
So, I managed to fix my problem. The first hint was the "GC overhead limit exceeded" message. I quickly found out that this can be caused by a lack of heap space for the JVM. After digging a bit into the YARN configuration in Cloudera Manager, and comparing it to the settings in an Amazon Elastic MapReduce cluster (where my Pig scripts did work), I found out that, even though each node had 30 GB of memory, most YARN components had very low heap-space settings. I increased the heap space for the NodeManagers, the ResourceManager and the containers, and I also set the maximum heap space for mappers and reducers somewhat higher, keeping in mind the total amount of memory available on each node (and the other services running there, like Impala). Now my Pig scripts work again!

Two issues I want to mention in case a Cloudera engineer reads this:

- I find it a bit strange that Cloudera Manager doesn't set saner heap-space amounts based on the total amount of RAM available.
- The fact that not everything runs under YARN yet makes it harder to manage memory; you actually have to manage it manually. If Impala ran under YARN, there would be less manual memory management, I think 🙂
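For reference, here is a sketch of the kind of settings involved. The property names are the standard YARN/MR2 ones, but the values are purely illustrative for a ~30 GB node that also runs Impala; my exact numbers (and yours) will differ:

```
# Illustrative values only, not my exact configuration.
yarn.nodemanager.resource.memory-mb  = 24576      # RAM YARN may hand out per node, leaving headroom for Impala/OS
yarn.scheduler.maximum-allocation-mb = 8192       # largest single container
mapreduce.map.memory.mb              = 2048       # container size per map task
mapreduce.reduce.memory.mb           = 4096       # container size per reduce task
mapreduce.map.java.opts              = -Xmx1638m  # JVM heap, roughly 80% of the container
mapreduce.reduce.java.opts           = -Xmx3276m
```

A common rule of thumb is to keep the JVM heap (-Xmx) around 80% of the container size, and the sum of all containers below the node's RAM minus whatever Impala and the other services need.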
08-12-2014 01:04 PM
Thanks Marcel! That does indeed seem to work, at least with Tableau and Impyla. Apparently the instructions on the Amazon website for setting up a tunnel don't work that well. Tomorrow I'm going to try whether this tunnel also works with Squirrel and other generic JDBC DB tools.
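In case it helps anyone else, the tunnel I mean is a plain SSH local port forward to the Impala daemon. A sketch, assuming Impala's usual HiveServer2-compatible port 21050 and with placeholder hostnames and key file:

```
# Placeholder hostnames/key: forwards localhost:21050 to the Impala daemon
# so Tableau, Impyla and JDBC tools can connect to localhost instead.
ssh -i mykey.pem -N \
    -L 21050:impala-worker.internal:21050 \
    hadoop@ec2-xx-xx-xx-xx.compute.amazonaws.com
```

With the tunnel up, the client tools connect to host localhost, port 21050, rather than the cluster address.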