Member since: 08-18-2014
Posts: 35
Kudos Received: 8
Solutions: 8
My Accepted Solutions
Title | Views | Posted
---|---|---
 | 2523 | 06-27-2017 02:32 PM
 | 3391 | 12-01-2016 03:41 PM
 | 6033 | 11-03-2016 05:04 PM
 | 2541 | 09-01-2016 08:32 PM
 | 7718 | 07-21-2016 06:05 PM
06-27-2017
02:32 PM
This problem occurred because the Sqoop tables hadn't been created in PostgreSQL.
To solve it, please go to CM > Sqoop 2 service, click the Actions button, and choose "Create Sqoop Database". After that, please try to start the Sqoop 2 service again.
02-04-2017
02:03 AM
This feature is not supported in CDH yet. We consider it experimental and incomplete (it only works with the ORC file format).
12-01-2016
03:41 PM
1 Kudo
@Saransh Could you please re-upload the error screenshot? I cannot open it in your post. I am also confused by your question. You mentioned that column d_rel_issd is defined as a Date type, but what I can see here is that this column is just an alias for the CASE expression. The ELSE clause of your CASE expression returns a string value. Could you please try casting it to a date and see whether that helps? cast('9999-12-31' as date)
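For illustration only, here is a minimal HiveQL sketch of what the corrected expression could look like; the table and column names (orders, rel_issd_dt) are hypothetical stand-ins for the ones in the original query:
SELECT
  CASE
    WHEN rel_issd_dt IS NOT NULL THEN rel_issd_dt
    ELSE cast('9999-12-31' AS date)  -- cast the literal so both branches return a date
  END AS d_rel_issd
FROM orders;
With the cast in place, d_rel_issd is consistently typed as a date instead of falling back to a string.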
11-03-2016
05:04 PM
Please try clicking on the job, then open its "metadata" tab. In this tab, search for the property "query.string". Its value should contain the full SQL.
09-01-2016
08:32 PM
1 Kudo
@Rekonn At this moment, Sqoop doesn't support specifying a custom upper bound value for lastmodified-mode incremental import. Please create a JIRA to track this requirement. For now, could you try specifying a smaller value for --last-value so that the old data is re-imported? With the merge job running after the import, the duplicate records will be dropped. This way, all the missing records can be imported into Hadoop.
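As a rough sketch only (the JDBC URL, table, and column names below are hypothetical), re-importing the missed rows with an older --last-value and letting the merge step drop the duplicates might look like this:
# --last-value is set earlier than the real cutoff so the old rows are re-imported;
# --merge-key makes Sqoop run a merge after the import to de-duplicate them.
sqoop import \
  --connect jdbc:mysql://db.example.com/sales \
  --username etl_user -P \
  --table orders \
  --incremental lastmodified \
  --check-column last_update_ts \
  --last-value "2016-07-01 00:00:00" \
  --merge-key order_id \
  --target-dir /user/etl/orders
Because the merge runs on the key column, the re-imported rows replace the earlier copies rather than creating duplicates.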
08-23-2016
06:35 AM
Thanks for the insight, @Harsh J! Having a quick look at the document you pasted here, it looks like Spark needs extra information (the low/high bounds) to do the partitioning, while Sqoop can figure this out by itself. Yes, I agree with you that Spark is very convenient if only an ad-hoc import is needed in a coding environment, but otherwise Sqoop is a better tool for importing/exporting data to and from an RDBMS.
08-21-2016
10:36 PM
@Aditya This Spark code looks awesome! I have some questions: 1) Does Spark read the data from Oracle in the driver or in the executors? 2) If Spark reads from Oracle in the executors, how is the import split among the different executors? That is, how does each executor know which part of the data it should read? Sqoop does a lot of optimization in this area. Its Oracle connectors, especially the direct connector, can read metadata from Oracle and use it to split the import among different mappers, and its performance is quite good. Does Spark have a similar optimization?
07-27-2016
09:05 PM
@sim6 Sorry, I cannot make that time. Could you please paste the information here?
07-25-2016
05:09 AM
1 Kudo
@Harsh J Thanks! I made a mistake with the command -- I forgot the single quotes. @sim6 Thanks for your offer! I would like to hear a review from you. Would you please let me know your availability?
07-21-2016
06:05 PM
@JananiViswa1 At first glance at this problem, I can see that you have a lot of LIKE operators in your query. A LIKE operator incurs regular-expression matching, which is very costly and may slow the query down. Have you noticed where the slowness happens? Is it within Hive itself, or is it the MR job that runs for a long time? If it is the MR job that slows everything down, please consider reducing the split size of the job and thus using more mappers to process the input data. To do this, please run the commands below before the query:
set mapred.min.split.size=63000000;
set mapred.max.split.size=64000000;
If my assumption is wrong, or you still have the problem after applying the change above, please give me more information so that I can investigate further:
1) The Hive log. If you use the Hive CLI, please give us the command output; if you use HS2, please give us the HS2 log file or the relevant information in it.
2) The MapReduce job configuration and log file.
3) The definition of your source table (the output of "show create table <tbl_name>").