Member since: 08-18-2014
Posts: 35
Kudos Received: 8
Solutions: 8
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2523 | 06-27-2017 02:32 PM
 | 3391 | 12-01-2016 03:41 PM
 | 6033 | 11-03-2016 05:04 PM
 | 2541 | 09-01-2016 08:32 PM
 | 7718 | 07-21-2016 06:05 PM
06-27-2017
02:32 PM
This problem occurred because the Sqoop tables hadn't been created in PostgreSQL.
To solve it, please go to CM > Sqoop 2 service, click the Actions button, and choose "Create Sqoop Database". After that, please try to start the Sqoop 2 service again.
02-04-2017
02:03 AM
This feature is not supported in CDH yet. We consider it experimental and incomplete (it only works with the ORC file format).
12-01-2016
03:41 PM
1 Kudo
@Saransh Could you please re-upload the error screenshot? I cannot open it in your post. I am also confused by your question. You mentioned that column d_rel_issd is defined as a Date type, but what I can see here is that this column is just an alias for the CASE expression. The ELSE clause of your CASE expression returns a string value. Could you please try casting it to a date and see whether that helps? cast('9999-12-31' as date)
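For illustration only, here is a minimal HiveQL sketch of what the corrected expression could look like; the table and column names (orders, rel_issd_dt) are hypothetical stand-ins for the ones in the original query:
SELECT
  CASE
    WHEN rel_issd_dt IS NOT NULL THEN rel_issd_dt
    ELSE cast('9999-12-31' AS date)  -- cast the literal so both branches return a date
  END AS d_rel_issd
FROM orders;
With the cast in place, d_rel_issd is consistently typed as a date instead of falling back to a string.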
11-03-2016
05:04 PM
Please try clicking on the job, then open its "metadata" tab. In this tab, search for the property "query.string". Its value should contain the full SQL.
09-01-2016
08:32 PM
1 Kudo
@Rekonn At this moment, Sqoop doesn't support specifying a custom upper bound value for lastmodified-mode incremental import. Please create a JIRA to track this requirement. For now, could you try specifying a smaller value for --last-value so that the old data is re-imported? With the merge job running after the import, the duplicate records will be dropped. This way, all the missing records can be imported into Hadoop.
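As a rough sketch only (the JDBC URL, table, and column names below are hypothetical), re-importing the missed rows with an older --last-value and letting the merge step drop the duplicates might look like this:
# --last-value is set earlier than the real cutoff so the old rows are re-imported;
# --merge-key makes Sqoop run a merge after the import to de-duplicate them.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --incremental lastmodified \
  --check-column last_update_ts \
  --last-value "2016-07-01 00:00:00" \
  --merge-key order_id \
  --target-dir /user/etl/orders
Because the merge runs on the key column, the re-imported rows replace the earlier copies rather than creating duplicates.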
08-23-2016
06:35 AM
Thanks for the insight, @Harsh J! Having a quick look at the document you pasted here, it looks like Spark needs extra information (the low/high bounds) to do the partitioning, while Sqoop can figure this out by itself. Yes, I agree with you that Spark is very convenient if only an ad-hoc import is needed in a coding environment, but otherwise Sqoop is a better tool for importing/exporting data to and from an RDBMS.
08-21-2016
10:36 PM
@Aditya This Spark code looks awesome! I have some questions: 1) Does Spark read the data from Oracle in the driver or in the executors? 2) If Spark reads from Oracle in the executors, how is the import split among the different executors? That is, how does each executor know which part of the data it should read? Sqoop does a lot of optimization in this area. Its Oracle connectors, especially the direct connector, can read metadata from Oracle and use it to split the import among different mappers, and its performance is quite good. Does Spark have a similar optimization?
07-27-2016
09:05 PM
@sim6 Sorry, I cannot make that time. Could you please paste the information here?
07-25-2016
05:09 AM
1 Kudo
@Harsh J Thanks! I made a mistake with the command -- I forgot the single quotes. @sim6 Thanks for your offer! I would like to hear a review from you. Would you please let me know your availability?
07-21-2016
06:05 PM
@JananiViswa1 At first glance at this problem, I can see that you have a lot of LIKE operators in your query. A LIKE operator incurs regular-expression matching, which is very costly and may slow the query down. Have you noticed where the slowness happens? Is it within Hive itself, or is it the MR job that runs for a long time? If it is the MR job that slows everything down, please consider reducing the split size of the job and thus using more mappers to process the input data. To do this, please run the commands below before the query:
set mapred.min.split.size=63000000;
set mapred.max.split.size=64000000;
If my assumption is wrong, or you still have the problem after applying the change above, please give me more information so that I can investigate further:
1) The Hive log. If you use the Hive CLI, please give us the command output; if you use HS2, please give us the HS2 log file or the relevant information in it.
2) The MapReduce job configuration and log file.
3) The definition of your source table (the output of "show create table <tbl_name>").