IllegalArgumentException and Illegal partition for 'val' in sqoop
Created on 07-02-2016 05:40 AM - edited 09-16-2022 03:28 AM
I have asked this question repeatedly and am starting to believe that I am missing something really basic here. Not many people seem to have come across this, and I am really stuck on this one:
I get this error when I specify the --merge-key argument with an incremental lastmodified import in Sqoop. If I run the job through the command line it works fine, but not when I submit it to Oozie. I submit my jobs through Oozie (via Hue). I am not sure whether Oozie or Hue is the problem, but the Sqoop job itself is not, since it works fine when executed through the command line, including the merge step.
My sqoop job looks like this:
sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --create test_table -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username USER_NAME --password 'PASSWORD' --table test_table --merge-key id --split-by id --target-dir LOCATION --incremental lastmodified --last-value 0 --check-column updated_at
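For completeness, the Oozie Sqoop action that runs this saved job is essentially the following (simplified sketch; the action name and the ${jobTracker}/${nameNode} placeholders stand in for my real workflow values):
<!-- simplified sketch of the workflow action; names and placeholders are illustrative -->
<action name="sqoop-incremental-import">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <command>job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --exec test_table</command>
    </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>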
The first import works fine. From the second import onwards I get the error shown below.
I created a small test table to test with an int, a datetime and a varchar column, with no NULLs or invalid characters in the data, and I still hit the same issue:
# id, updated_at, name
'1', '2016-07-02 17:16:53', 'l'
'3', '2016-06-29 14:12:53', 'f'
There were only 2 rows in the data and yet I got this:
Error: java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:51)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1848)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1508)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Error: java.io.IOException: Illegal partition for 3 (-2)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1083)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.sqoop.mapreduce.MergeMapperBase.processRecord(MergeMapperBase.java:82)
    at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:58)
    at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Error: java.lang.IllegalArgumentException
    at java.nio.ByteBuffer.allocate(ByteBuffer.java:330)
    at org.apache.hadoop.mapred.SpillRecord.<init>(SpillRecord.java:51)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.mergeParts(MapTask.java:1848)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1508)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.close(MapTask.java:723)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:793)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Error: java.io.IOException: Illegal partition for 1 (-2)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1083)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.sqoop.mapreduce.MergeMapperBase.processRecord(MergeMapperBase.java:82)
    at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:58)
    at org.apache.sqoop.mapreduce.MergeTextMapper.map(MergeTextMapper.java:34)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1693)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
I get this error only in Oozie (I submit the job through Hue); when I run the Sqoop job through the command line it works just fine, including the merge MapReduce step.
Taken from the Oozie launcher, this is what my MapReduce job logs look like:
>>> Invoking Sqoop command line now >>>
5373 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.SqoopTool - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5407 [uber-SubtaskRunner] INFO org.apache.sqoop.Sqoop - Running Sqoop version: 1.4.6-cdh5.7.0
5702 [uber-SubtaskRunner] WARN org.apache.sqoop.tool.BaseSqoopTool - Setting your password on the command-line is insecure. Consider using -P instead.
5715 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - $SQOOP_CONF_DIR has not been set in the environment. Cannot check for additional configuration.
5740 [uber-SubtaskRunner] WARN org.apache.sqoop.ConnFactory - Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
5754 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Using default fetchSize of 1000
5754 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.CodeGenTool - Beginning code generation
6091 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM test_table AS t WHERE 1=0
6098 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM test_table AS t WHERE 1=0
6118 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - HADOOP_MAPRED_HOME is /opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/hadoop-mapreduce
8173 [uber-SubtaskRunner] INFO org.apache.sqoop.orm.CompilationManager - Writing jar file: /tmp/sqoop-yarn/compile/454902ac78d49b783a1f51b7bfe0a2be/test_table.jar
8185 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM test_table AS t WHERE 1=0
8192 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Incremental import based on column updated_at
8192 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Lower bound value: '2016-07-02 17:13:24.0'
8192 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Upper bound value: '2016-07-02 17:16:56.0'
8194 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Beginning import of test_table
8214 [uber-SubtaskRunner] INFO org.apache.sqoop.manager.SqlManager - Executing SQL statement: SELECT t.* FROM test_table AS t WHERE 1=0
8230 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
8716 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DBInputFormat - Using read commited transaction isolation
8717 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.DataDrivenDBInputFormat - BoundingValsQuery: SELECT MIN(id), MAX(id) FROM test_table WHERE ( updated_at >= '2016-07-02 17:13:24.0' AND updated_at < '2016-07-02 17:16:56.0' )
8721 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.db.IntegerSplitter - Split size: 0; Num splits: 4 from: 1 to: 1
25461 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Transferred 26 bytes in 17.2192 seconds (1.5099 bytes/sec)
25471 [uber-SubtaskRunner] INFO org.apache.sqoop.mapreduce.ImportJobBase - Retrieved 1 records.
25536 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.ExportJobBase - IOException checking input file header: java.io.EOFException
25550 [uber-SubtaskRunner] WARN org.apache.sqoop.mapreduce.JobBase - SQOOP_HOME is unset. May not be able to find all job dependencies.
Heart beat
Heart beat
70628 [uber-SubtaskRunner] ERROR org.apache.sqoop.tool.ImportTool - Merge MapReduce job failed!
70628 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Saving incremental import state to the metastore
70831 [uber-SubtaskRunner] INFO org.apache.sqoop.tool.ImportTool - Updated data for job: test_table
Created 07-05-2016 06:18 AM
Sorry for the late response! I am currently on holiday, so I can only check this problem occasionally.
It doesn't make sense that the merge job doesn't have a partitioner class, because it does need a reduce phase to merge the records. Could you please:
1) Make sure that you are checking the merge job, not the import job.
2) Please check the number of reducers of the failed merge job.
3) Please try adding the parameter "-Dmapred.reduce.tasks=1" to your Sqoop import job and see whether it helps.
4) If nothing above helps, let's try to narrow down the problem a bit by removing the impact of the Sqoop metastore. Please run the command directly in an Oozie Sqoop action instead of storing it in the metastore (see the sketch after the command below). The command should be something like the following (I have removed the --driver parameter as well, to use the MySQL connector instead of the generic JDBC connector, although it shouldn't make any difference for our problem).
sqoop import -Dmapred.reduce.tasks=1 --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username USER_NAME --password 'PASSWORD' --table test_table --merge-key id --split-by id --target-dir LOCATION --incremental lastmodified --last-value 0 --check-column updated_at
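If it helps, running it directly could look roughly like the action below (the action name and the placeholders are only examples; note that the <command> element splits on whitespace, so values containing spaces or quotes would need to be passed as individual <arg> elements instead):
<!-- illustrative sketch only; adjust the name, placeholders and transitions to your workflow -->
<action name="sqoop-import-direct">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <command>import -Dmapred.reduce.tasks=1 --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull --username USER_NAME --password PASSWORD --table test_table --merge-key id --split-by id --target-dir LOCATION --incremental lastmodified --last-value 0 --check-column updated_at</command>
    </sqoop>
    <ok to="end"/>
    <error to="kill"/>
</action>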
When I have some time, I will try to reproduce your problem. Please let me know the CDH and CM version you are using.
Created 07-03-2016 07:26 PM
Hi sim6,
Thanks for reporting this problem!
This problem is weird. The first thing I have noticed is that we are using uber mode for the Oozie launchers. Could you please try disabling it and see whether it helps? To do this, please set the property below in the oozie-site.xml of the Oozie Server:
<property>
    <name>oozie.action.launcher.mapreduce.job.ubertask.enable</name>
    <value>false</value>
</property>
If you are using Cloudera Manager, please set it in the Oozie Server safety valve and then restart the Oozie Server. If you are not using CM, please add it to oozie-site.xml manually and restart the Oozie Server.
If you still have this problem after disabling Oozie launcher uber mode, please try to collect the configuration of one failed and one successful Sqoop merge job. The stack trace shows that the mapper output collector cannot write the data because it cannot find the partition for the given output key (1 and 3 for your data). I suspect that the partitioners passed to the two jobs are different.
Created on 07-03-2016 10:09 PM - edited 07-03-2016 10:33 PM
Thanks. I feel a bit relieved after hearing something on this. I am using CM and I tried turning uber mode off. It still has the same problem. 😞 I haven't seen any of my merge jobs succeed. I have tried it with different data from different databases as well, but it never worked. @yshi
Created 07-04-2016 12:39 AM
This is really weird. Could you please upload the diagnostic data of one failed merge job together with one successful job?
To get the diagnostic data of a job, please go to CM > YARN > Applications, find the job, click the button to its right and choose "Collect Diagnostic Data". In the pop-up dialog, please confirm the operation and download the resulting data after it finishes.
Created on 07-04-2016 12:47 AM - edited 07-04-2016 12:50 AM
Hey, I tried collecting the diagnostics and got an error: @yshi
Path: http://warehouse.swtched.com:7180/cmf/services/51/yarnDiagnosticsCollection
Version: Cloudera Express 5.7.0 (#76 built by jenkins on 20160401-1334 git: ec0e7e69444280aa311511998bd83e8e6572f61c)
at YarnController.java line 463 in com.cloudera.server.web.cmf.YarnController.collectYarnApplicationDiagnostics()
at <generated> line -1 in com.cloudera.server.web.cmf.YarnController$$FastClassByCGLIB$$ac91d355.invoke()
at MethodProxy.java line 191 in net.sf.cglib.proxy.MethodProxy.invoke()
at Cglib2AopProxy.java line 688 in org.springframework.aop.framework.Cglib2AopProxy$CglibMethodInvocation.invokeJoinpoint()
at ReflectiveMethodInvocation.java line 150 in org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()
at MethodSecurityInterceptor.java line 61 in org.springframework.security.access.intercept.aopalliance.MethodSecurityInterceptor.invoke()
at ReflectiveMethodInvocation.java line 172 in org.springframework.aop.framework.ReflectiveMethodInvocation.proceed()
at Cglib2AopProxy.java line 621 in org.springframework.aop.framework.Cglib2AopProxy$DynamicAdvisedInterceptor.intercept()
at <generated> line -1 in com.cloudera.server.web.cmf.YarnController$$EnhancerByCGLIB$$f035552a.collectYarnApplicationDiagnostics()
at NativeMethodAccessorImpl.java line -2 in sun.reflect.NativeMethodAccessorImpl.invoke0()
at NativeMethodAccessorImpl.java line 57 in sun.reflect.NativeMethodAccessorImpl.invoke()
at DelegatingMethodAccessorImpl.java line 43 in sun.reflect.DelegatingMethodAccessorImpl.invoke()
at Method.java line 606 in java.lang.reflect.Method.invoke()
at HandlerMethodInvoker.java line 176 in org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod()
at AnnotationMethodHandlerAdapter.java line 436 in org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod()
at AnnotationMethodHandlerAdapter.java line 424 in org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle()
at DispatcherServlet.java line 790 in org.springframework.web.servlet.DispatcherServlet.doDispatch()
at DispatcherServlet.java line 719 in org.springframework.web.servlet.DispatcherServlet.doService()
at FrameworkServlet.java line 669 in org.springframework.web.servlet.FrameworkServlet.processRequest()
at FrameworkServlet.java line 585 in org.springframework.web.servlet.FrameworkServlet.doPost()
at HttpServlet.java line 727 in javax.servlet.http.HttpServlet.service()
at HttpServlet.java line 820 in javax.servlet.http.HttpServlet.service()
at ServletHolder.java line 511 in org.mortbay.jetty.servlet.ServletHolder.handle()
at ServletHandler.java line 1221 in org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter()
at UserAgentFilter.java line 78 in org.mortbay.servlet.UserAgentFilter.doFilter()
at GzipFilter.java line 131 in org.mortbay.servlet.GzipFilter.doFilter()
at ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter()
at JAMonServletFilter.java line 48 in com.jamonapi.http.JAMonServletFilter.doFilter()
at ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter()
at JavaMelodyFacade.java line 109 in com.cloudera.enterprise.JavaMelodyFacade$MonitoringFilter.doFilter()
at ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter()
at FilterChainProxy.java line 311 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at FilterSecurityInterceptor.java line 116 in org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke()
at FilterSecurityInterceptor.java line 83 in org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at ExceptionTranslationFilter.java line 113 in org.springframework.security.web.access.ExceptionTranslationFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at SessionManagementFilter.java line 101 in org.springframework.security.web.session.SessionManagementFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at AnonymousAuthenticationFilter.java line 113 in org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at RememberMeAuthenticationFilter.java line 146 in org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at SecurityContextHolderAwareRequestFilter.java line 54 in org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at RequestCacheAwareFilter.java line 45 in org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at AbstractAuthenticationProcessingFilter.java line 182 in org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at LogoutFilter.java line 105 in org.springframework.security.web.authentication.logout.LogoutFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at SecurityContextPersistenceFilter.java line 87 in org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at ConcurrentSessionFilter.java line 125 in org.springframework.security.web.session.ConcurrentSessionFilter.doFilter()
at FilterChainProxy.java line 323 in org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter()
at FilterChainProxy.java line 173 in org.springframework.security.web.FilterChainProxy.doFilter()
at DelegatingFilterProxy.java line 237 in org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate()
at DelegatingFilterProxy.java line 167 in org.springframework.web.filter.DelegatingFilterProxy.doFilter()
at ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter()
at CharacterEncodingFilter.java line 88 in org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal()
at OncePerRequestFilter.java line 76 in org.springframework.web.filter.OncePerRequestFilter.doFilter()
at ServletHandler.java line 1212 in org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter()
at ServletHandler.java line 399 in org.mortbay.jetty.servlet.ServletHandler.handle()
at SecurityHandler.java line 216 in org.mortbay.jetty.security.SecurityHandler.handle()
at SessionHandler.java line 182 in org.mortbay.jetty.servlet.SessionHandler.handle()
at SecurityHandler.java line 216 in org.mortbay.jetty.security.SecurityHandler.handle()
at ContextHandler.java line 767 in org.mortbay.jetty.handler.ContextHandler.handle()
at WebAppContext.java line 450 in org.mortbay.jetty.webapp.WebAppContext.handle()
at HandlerWrapper.java line 152 in org.mortbay.jetty.handler.HandlerWrapper.handle()
at StatisticsHandler.java line 53 in org.mortbay.jetty.handler.StatisticsHandler.handle()
at HandlerWrapper.java line 152 in org.mortbay.jetty.handler.HandlerWrapper.handle()
at Server.java line 326 in org.mortbay.jetty.Server.handle()
at HttpConnection.java line 542 in org.mortbay.jetty.HttpConnection.handleRequest()
at HttpConnection.java line 945 in org.mortbay.jetty.HttpConnection$RequestHandler.content()
at HttpParser.java line 756 in org.mortbay.jetty.HttpParser.parseNext()
at HttpParser.java line 218 in org.mortbay.jetty.HttpParser.parseAvailable()
at HttpConnection.java line 404 in org.mortbay.jetty.HttpConnection.handle()
at SelectChannelEndPoint.java line 410 in org.mortbay.io.nio.SelectChannelEndPoint.run()
at QueuedThreadPool.java line 582 in org.mortbay.thread.QueuedThreadPool$PoolThread.run()
Created 07-04-2016 04:40 AM
Hello Sim6,
My understanding (will confirm) is as follows:
- The --merge-key option is part of the Sqoop1 merge command and not available to be used with the Sqoop1 import command
- The Sqoop1 merge command is intended to be run afterwards (as a separate job) to flatten the two datasets that exist in HDFS as a result of multiple Sqoop1 import commands (commonly created with --incremental, though not exclusively); a rough example follows below
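For illustration only, a standalone merge along those lines would look roughly like this (the directories, jar file and class name are placeholders for whatever your import generated, not values taken from your job):
# placeholders only: the --new-data/--onto/--target-dir paths, jar file and class name are hypothetical
sqoop merge --new-data LOCATION_NEW --onto LOCATION_OLD --target-dir LOCATION_MERGED --jar-file test_table.jar --class-name test_table --merge-key id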
If you remove "--merge-key id" from your command, does it work without error?
Thank you,
Markus Kemper
Created on 07-04-2016 04:47 AM - edited 07-04-2016 05:34 AM
@Markus Kemper: That's right. But then we already have the files in the target dir when running the import a second time. It throws an error saying to either import using incremental append or specify a merge key, because the target dir already exists and incremental lastmodified is being used.
Using the sqoop merge command in a separate job sounds okay, but I would like --merge-key to work when it's available 😄 .
Also, the link https://community.hortonworks.com/questions/10710/sqoop-incremental-import-working-fine-now-i-want-k... suggests using --merge-key with the import command only, and it has been confirmed there that it works.
This link here, http://stackoverflow.com/questions/34400973/apache-sqoop-incremental-import?rq=1, also confirms that it works.
Also, it works when I run the Sqoop job through the CLI. It gives this problem only when running it through Oozie.
Created 07-04-2016 05:28 AM
@Markus Kemper Although it is not documented, we can use "--merge-key" in the Sqoop import command if "--incremental lastmodified" is used. If the target dir is not empty, the Sqoop import tool will automatically run a merge job after the import job has finished to merge the data; see the Sqoop ImportTool source code for details.
I have just realized that you are using the Express edition of Cloudera Manager. It may not have the function of collecting diagnostic data for jobs.
Please check the following manually:
1) Please add the parameters "--verbose" and "-Dmapreduce.map.log.level=DEBUG" to your Sqoop import command and run the Sqoop job again. After it fails, please upload ALL of the Sqoop command output and the failed MR job logs (use the command "yarn logs -applicationId <app_id>" to get them).
2) The value of the property "mapreduce.job.partitioner.class" in both the successful and failed jobs (see the illustrative snippet after this list).
3) The value of the property "sqoop.merge.key.col" in both the successful and failed jobs.
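For reference, these two properties show up in a job's job.xml (visible from the JobHistory/ResourceManager job configuration page) roughly as below; the values are only illustrative assumptions (HashPartitioner is the stock MapReduce default), so please report whatever your real failed and successful jobs actually contain:
<!-- illustrative job.xml entries; the values below are assumptions, not your jobs' actual values -->
<property>
    <name>mapreduce.job.partitioner.class</name>
    <value>org.apache.hadoop.mapreduce.lib.partition.HashPartitioner</value>
</property>
<property>
    <name>sqoop.merge.key.col</name>
    <value>id</value>
</property>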
Created 07-04-2016 05:56 AM
To enable the logs, I added this to my job:
sqoop job -Dmapreduce.map.log.level=DEBUG --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop --verbose --create test_table -- import --driver com.mysql.jdbc.Driver --connect jdbc:mysql://ip3/testdb?zeroDateTimeBehavior=convertToNull --username root --password 'password' --table test_table --merge-key id --split-by id --target-dir location --incremental lastmodified --last-value 0 --check-column updated_at
But trying to get the logs, I still get:
/tmp/logs/root/logs/application_1463739226103_5116 does not exist.
Log aggregation has not completed or is not enabled.
As a side note: I have also tried to change the time zone of the Sqoop job in a similar manner using the -D option, and that did not work either. Am I adding it to the Sqoop job in an incorrect way? @yshi
Created 07-04-2016 06:14 AM
Could you please let me know which user you submitted the Oozie Sqoop job as? Is it the same user you used to run the job on the command line? If you are using different users for the Oozie and command-line jobs, please try switching to the same user and see whether it brings any change.
As for the application logs, we may need to specify the application owner to be able to get them. Please try the command below:
sudo -u mapred yarn logs -applicationId <app_id> -appOwner <owner>
