Member since: 08-18-2014
Posts: 35
Kudos Received: 8
Solutions: 8
My Accepted Solutions
| Title | Views | Posted |
|---|---|---|
| | 2669 | 06-27-2017 02:32 PM |
| | 3580 | 12-01-2016 03:41 PM |
| | 6278 | 11-03-2016 05:04 PM |
| | 2638 | 09-01-2016 08:32 PM |
| | 8011 | 07-21-2016 06:05 PM |
07-19-2016
06:34 PM
1 Kudo
@sim6 Sorry for the delay in response! Based on the two parameters below:

--driver com.mysql.jdbc.Driver --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull

it looks like you are importing data from MySQL instead of Oracle. As a result, the workaround mentioned in another thread (-D oracle.sessionTimeZone=GMT) won't work, because it is actually an Oracle database parameter; Sqoop just passes it to the Oracle JDBC driver. I am not very familiar with the MySQL JDBC driver, but according to its documentation [1], we may be able to achieve your goal with the JDBC parameter "useTimezone".

To use GMT time, please try the command below:

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
  --create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
  --connect 'jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull&useTimezone=false' \
  --username root --password 'PASSWORD' --table TABLE_NAME \
  --incremental lastmodified --check-column updated_at --last-value 0 \
  --merge-key entity_id --split-by entity_id \
  --target-dir LOCATION_SPECIFIED --hive-database Magento \
  --hive-drop-import-delims --null-string '\\N' \
  --null-non-string '\\N' --fields-terminated-by '\001' \
  --input-null-string '\\N' --input-null-non-string '\\N' \
  --input-fields-terminated-by '\001'

To use your local time, please try the command below:

sqoop job --meta-connect jdbc:hsqldb:hsql://FQDN:16000/sqoop \
  --create JOB_NAME -- import --driver com.mysql.jdbc.Driver \
  --connect 'jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull&useTimezone=true&serverTimezone=GMT' \
  --username root --password 'PASSWORD' --table TABLE_NAME \
  --incremental lastmodified --check-column updated_at --last-value 0 \
  --merge-key entity_id --split-by entity_id \
  --target-dir LOCATION_SPECIFIED --hive-database Magento \
  --hive-drop-import-delims --null-string '\\N' \
  --null-non-string '\\N' --fields-terminated-by '\001' \
  --input-null-string '\\N' --input-null-non-string '\\N' \
  --input-fields-terminated-by '\001'

Notice:
1) Please use the latest MySQL JDBC driver (5.1.34).
2) Please make sure that the timezone used in the database is GMT (see the check below).

Please also ask your MySQL team for suggestions; this is more a MySQL problem than a Sqoop problem.
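A quick way to verify point 2 is sketched below. This is only a rough check, assuming the mysql command-line client is available on a host that can reach the database; IP and root are the same placeholders as in the Sqoop command above.

```bash
# Show the MySQL server's global, session and system time zone settings
mysql -h IP -u root -p -e "SELECT @@global.time_zone, @@session.time_zone, @@system_time_zone;"
```

If the global/session values report SYSTEM, the effective zone is the system_time_zone value.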
07-19-2016
12:07 AM
1 Kudo
@sim6 Sorry, I have been very busy these days catching up with my work after the holiday. I will find some time to look at the other problem tomorrow.
07-14-2016
06:57 PM
@Vinod369 Sqoop doesn't directly support the JSON file format. The only way I can think of to import data as JSON is to use the HCatalog import function. This can be done with the following steps:

1) Create a table in HCatalog with the JSON serde. Below is just an example:

hcat -e "create table json_test (id int, value string) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe'"

2) Use Sqoop to import data into that table:

sqoop import \
  --connect 'jdbc:mysql://nightly57-unsecure-1.gce.cloudera.com:3306/sqoop1' \
  --username sqoop1 --password cloudera --table test -m 1 \
  --hcatalog-database default --hcatalog-table json_test

3) Check the imported data:

[root@nightly57-unsecure-1 ~]# hadoop fs -ls /user/hive/warehouse/json_test
Found 1 items
-rw-rw-rw- 3 root hive 46 2016-07-14 18:35 /user/hive/warehouse/json_test/part-m-00000
[root@nightly57-unsecure-1 ~]# hadoop fs -cat /user/hive/warehouse/json_test/part-m-00000
{"id":1,"value":"abcd"}
{"id":2,"value":"cd"}

The downside of this method is that you have to create a Hive/HCatalog table before the import and clean it up if it is not needed in future (see the cleanup sketch below). Please let me know if you have any further questions.
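For the cleanup, a minimal sketch is below, assuming the json_test example table from step 1. Since it is a managed table, dropping it also removes the files under /user/hive/warehouse/json_test, so copy the data elsewhere first if you still need it.

```bash
# Drop the example HCatalog table once the JSON output is no longer needed
# (managed table: its warehouse directory is removed as well)
hcat -e "drop table if exists json_test"
```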
07-11-2016
12:40 AM
@sim6 I cannot reproduce your problem on my local cluster, so I am not sure how this happened to you. Please remember to increase the number of reducers when your data grows; one reducer is definitely not enough for big tables (see the sketch below). I will have a look at the other problem you asked here when I have some time.
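As a rough sketch only (reusing the placeholders from the earlier commands, not your exact job), the reducer count can be raised with the same generic -D option as the workaround. It only matters for the merge phase, since the import itself is map-only, and generic -D options must come right after the tool name; the value 4 is just illustrative.

```bash
sqoop import -Dmapred.reduce.tasks=4 \
  --connect 'jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull' \
  --username USER_NAME --password 'PASSWORD' --table TABLE_NAME \
  --incremental lastmodified --check-column updated_at --last-value 0 \
  --merge-key entity_id --split-by entity_id \
  --target-dir LOCATION_SPECIFIED
```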
07-07-2016
01:55 AM
1 Kudo
@sim6 I am not 100% sure how my solution works at this moment, and because I am on holiday I cannot reproduce the problem to figure out the exact root cause. I worked out this workaround from the following observations:

1) The Sqoop import job has no reducers, but the Sqoop merge job does have a reduce phase.
2) According to all the information you have provided, your Oozie Sqoop job seems to fail because of a missing partitioner class definition in the job configuration.
3) I suspected that Oozie Sqoop reuses the jobConf object between the Sqoop import and merge jobs, and thus accidentally sets the number of reducers in the merge job to 0.

By explicitly setting the number of reducers to 1, we might be able to work around this problem, and we are lucky that this does work. 🙂 I will later try to reproduce this problem and check whether my theory is correct or not.
07-05-2016
06:18 AM
@sim6 Sorry for the late response! I am currently on holiday and can only check this problem occasionally. It doesn't make sense that the merge job has no partitioner class, because it does need a reduce phase to merge the records. Could you please:

1) Make sure that you are checking the merge job, not the import job.
2) Check the number of reducers of the failed merge job.
3) Try adding the parameter "-Dmapred.reduce.tasks=1" to your Sqoop import job and see whether it helps.
4) If nothing above helps, let's narrow down the problem a bit by removing the impact of the Sqoop metastore: please run the command directly in the Oozie Sqoop action instead of storing it in the metastore. The command should be something like below (I have removed the --driver parameter as well, to use the MySQL connector instead of the generic JDBC connector, although it shouldn't make any difference for our problem):

sqoop import -Dmapred.reduce.tasks=1 \
  --connect jdbc:mysql://IP/DB?zeroDateTimeBehavior=convertToNull \
  --username USER_NAME --password 'PASSWORD' --table test_table \
  --merge-key id --split-by id --target-dir LOCATION \
  --incremental lastmodified --last-value 0 --check-column updated_at

When I have some time, I will try to reproduce your problem. Please let me know the CDH and CM version you are using (see the note below if you are not sure of the exact version).
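If you are not sure of the exact CDH version, a quick check (run on any cluster host) is:

```bash
# On CDH builds the version string usually carries the release suffix,
# e.g. "Hadoop 2.6.0-cdh5.x.y"
hadoop version
```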
07-04-2016
06:14 AM
@sim6 Could you please let me know which user you used to submit the Oozie Sqoop job? Is it the same user you used to run the job on the command line? If you are using different users for the Oozie and command-line jobs, please switch to the same user and see whether it brings any change. As for the application logs, we may need to specify the application owner to be able to get them. Please try the command below (see the sketch that follows if you need to find the application id or owner):

sudo -u mapred yarn logs -applicationId <app_id> -appOwner <owner>
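If you need to find the application id and its owner first, something like the sketch below should work; the state filter is just an example.

```bash
# List completed applications; the "User" column is the owner to pass to -appOwner
yarn application -list -appStates FINISHED,FAILED,KILLED
```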
07-04-2016
05:28 AM
1 Kudo
@Markus Kemper Although it is not documented, we can use "--merge-key" in a Sqoop import command if "--incremental lastmodified" is used. If the target dir is not empty, the Sqoop import tool will automatically run a merge job after the import job has finished to merge the data. Please see the source code below for details:

https://github.com/cloudera/sqoop/blob/cdh5-1.4.6_5.7.1/src/java/org/apache/sqoop/tool/ImportTool.java#L515-L517
https://github.com/cloudera/sqoop/blob/cdh5-1.4.6_5.7.1/src/java/org/apache/sqoop/tool/ImportTool.java#L432-L433

@sim6 I have just realized that you are using the Express edition of Cloudera Manager. It may not have the function of collecting diagnostic data for jobs, so please help check manually:

1) Add the parameters "--verbose" and "-Dmapreduce.map.log.level=DEBUG" to your Sqoop import command and run the Sqoop job. After it fails, please upload ALL of the Sqoop command output and the failed MR job logs (use the command "yarn logs -applicationId <app_id>" to get them).
2) The value of the property "mapreduce.job.partitioner.class" in both the successful and failed jobs.
3) The value of the property "sqoop.merge.key.col" in both the successful and failed jobs (a sketch for pulling these from the JobHistory server follows below).
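For 2) and 3), a minimal sketch for pulling those properties out of a finished job's configuration via the MapReduce JobHistory server REST API is below. HISTORY_HOST and the job id are placeholders (19888 is the default history server web port, and the job id is just the application id with the "application_" prefix replaced by "job_").

```bash
# Fetch the stored job configuration and filter the two properties of interest
curl -s "http://HISTORY_HOST:19888/ws/v1/history/mapreduce/jobs/job_XXXXXXXXXXXXX_YYYY/conf" \
  | grep -E 'mapreduce\.job\.partitioner\.class|sqoop\.merge\.key\.col'
```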
07-04-2016
12:39 AM
This is really weird. Could you please upload the diagnostic data of one failed merge job together with one successful job? To get the diagnostic data of a job, please go to CM > YARN > Applications, find the job, click the button to its right and choose "Collect Diagnostic Data". In the pop-up dialogue, please confirm the operation and download the resulting data after it finishes.
07-03-2016
07:26 PM
Hi sim6, Thanks for reporting this problem! This problem is weird. The first thing I have noticed is that we are using uber mode with the Oozie launchers. Could you please try to disable it and see whether it helps? To do this, please set the following in the oozie-site.xml of the Oozie server:

<property>
  <name>oozie.action.launcher.mapreduce.job.ubertask.enable</name>
  <value>false</value>
</property>

If you are using Cloudera Manager, please set this in the Oozie server safety valve and then restart the Oozie server. If you are not using CM, please manually add it to oozie-site.xml and restart the Oozie server. If you still have this problem after disabling the Oozie launcher uber mode, please try to collect the configuration of one failed and one successful Sqoop merge job. The stack trace shows that the mapper output collector cannot write the data because it cannot find the partition for the given output keys (1 and 3 for your data). I suspect that the partitioners passed to the two jobs are different.