
5th attempt at getting an answer to this question

Solved

Master Collaborator

I have a strong feeling Hortonworks is purposely not answering it, since all of my other questions were answered. So I'm hoping one brave person will step forward and tell me the truth; after all, that's the purpose of this forum.

My question, which is still not answered, is: is sqoop incremental load into a Hive ORC table supported, and has anyone done it?

I am sure many people will benefit from this answer.

1 ACCEPTED SOLUTION


Re: 5th attempt at getting an answer to this question

@Sami Ahmad

1) NiFi is definitely an option.

If CDC is important for you, be aware that only the MySQL CDC processor is supported. Other CDC processors are not available due to licensing restrictions from vendors such as Oracle. So, if you use NiFi, you need to write your queries smartly to catch the changes while limiting the impact on the source databases.
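A sketch of the query-based change capture this suggests, as it might be issued against the source database (the table `orders`, column `modified_date`, host name, and watermark value are all hypothetical):

```shell
# Query-based incremental capture without a CDC processor: keep a
# watermark from the previous run and select only rows changed since
# then, so each poll touches a small slice of the source table.
LAST_TS='2018-01-01 00:00:00'   # stored from the previous run

mysql -h sourcedb -e "
  SELECT * FROM orders
  WHERE modified_date > '${LAST_TS}'
  ORDER BY modified_date;"
```

After each run, the maximum `modified_date` seen would be saved as the next watermark; an indexed check column keeps the per-poll load on the source database low.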

2) Another good option is Attunity, but that comes at higher cost.

3) I have seen others using Spark for your use case.

4) I am doing some guesswork here, because I don't have info from your 4 previous questions. As I recall, incremental data import is supported via a sqoop job and not directly via sqoop import. I see that you did it as a job, but could it be a typo in your syntax? I see a space between "dash-dash" and "import" :) Joking, but you may want to check; I have seen strange messages out of Sqoop.

https://community.hortonworks.com/questions/131541/is-sqoop-incremental-load-possible-for-hive-orc-t...

Could you point to the previous questions' URLs, or clarify/add more info to the current question?
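A minimal sketch of the job-based incremental import described above (the connection string, table, check column, and paths are hypothetical):

```shell
# Incremental imports are persisted as a saved sqoop job so that the
# last imported value of the check column is remembered between runs.
sqoop job --create orders_incr -- import \
  --connect jdbc:oracle:thin:@sourcedb:1521/ORCL \
  --username scott --password-file /user/scott/.pw \
  --table ORDERS \
  --incremental append \
  --check-column ORDER_ID \
  --last-value 0 \
  --target-dir /staging/orders

# Each execution imports only rows with ORDER_ID greater than the
# stored last-value, then updates the saved state.
sqoop job --exec orders_incr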

5) Regarding "Also i use "--hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")'" for non incremental loads and i was told that soon this would be working , still not ?": I need a little more context, or maybe this was already another question you submitted and I can check.


7 REPLIES

Re: 5th attempt at getting an answer to this question

@Sami Ahmad

I doubt this is a conspiracy :). I found one of those 4 previous attempts: https://community.hortonworks.com/questions/131541/is-sqoop-incremental-load-possible-for-hive-orc-t...

It would have been good if you could have referenced the URLs of the previous 4 attempts so we could get some historical information. It is not clear which version of Hive you use (1.2.1 or 2.1.0), or whether you created the target Hive table as transactional. Long story short, the following is the practice that Hortonworks recommends on HDP 2.6.0, assuming that is your HDP version since your question was tagged with it: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.0/bk_data-access/content/incrementally-updati...


Re: 5th attempt at getting an answer to this question

Master Collaborator

It must be my lucky day :)

Stanca, this will introduce a lot of delay in the data; we wanted near-real-time data. Is it not possible using sqoop?

What about NiFi?

Also, I use --hcatalog-storage-stanza 'stored as orc tblproperties ("orc.compress"="SNAPPY")' for non-incremental loads, and I was told that this would be working soon. Still not?


Re: 5th attempt at getting an answer to this question

Master Collaborator

I read your DATA ACCESS document and I think it's for cases where you don't have a date check column, since it does many operations like merging tables, purging, compacting, deleting, etc.

Why would I do all this when I can just:

1- import the whole base table as ORC

2- bring in the incrementals as text to an externally mapped table

3- insert into the base ORC table selecting everything from the incremental table

4- delete all the files in the external table folder.

I tested this method and it's working fine.

Is there any flaw in this method that I am not seeing?
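The steps above can be sketched as follows (the table names `orders_orc` and `orders_incr_ext`, the staging path, and the HiveServer2 URL are all hypothetical):

```shell
# Assumes orders_orc is the base ORC table and orders_incr_ext is an
# external text table mapped over /staging/orders_incr.

# 3) append the new rows from the external staging table to the ORC base
beeline -u jdbc:hive2://hiveserver:10000 -e "
  INSERT INTO TABLE orders_orc
  SELECT * FROM orders_incr_ext;"

# 4) clear the staging folder so the next incremental batch starts empty
hdfs dfs -rm -r -skipTrash '/staging/orders_incr/*'
```

One caveat worth noting: this append-only pattern assumes source rows are only ever inserted, never updated; handling updates is why the referenced Hortonworks procedure goes through the merge/compaction steps.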


Re: 5th attempt at getting an answer to this question

Master Collaborator

I just noticed there is a small note on top saying:

"Note: This procedure requires change data capture from the operational database that has a primary key and modified date field where you pulled the records from since the last update."

We don't have CDC on our database, so we can't do incremental imports? It should be possible by looking at the date field, as that's constantly increasing?
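Sqoop can drive incrementals off such a date column without database-side CDC, using lastmodified mode; a hedged sketch (connection string, table, and column names are hypothetical):

```shell
# 'lastmodified' mode re-imports any row whose MODIFIED_DATE is newer
# than the saved watermark; --merge-key deduplicates re-imported rows
# by primary key. Saving it as a job persists the watermark between runs.
sqoop job --create orders_lastmod -- import \
  --connect jdbc:oracle:thin:@sourcedb:1521/ORCL \
  --username scott --password-file /user/scott/.pw \
  --table ORDERS \
  --incremental lastmodified \
  --check-column MODIFIED_DATE \
  --merge-key ORDER_ID \
  --target-dir /staging/orders
```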



Re: 5th attempt at getting an answer to this question

Master Collaborator

Here is a post showing that Hive incremental import into ORC is possible using sqoop, so why is it not working for me? I was using the correct syntax, no space between dash-dash.

https://community.hortonworks.com/questions/58015/sqoop-hcataloghive-incremental-import-in-orc-forma...

Kindly check my other post; I am trying to follow the link you posted earlier but am getting errors.

https://community.hortonworks.com/questions/223262/malformed-orc-file-format.html


Re: 5th attempt at getting an answer to this question

Master Collaborator

Please see the syntax in the attached screenshot. It's not complaining about the dash-dash, but it does not like append mode with HCatalog.

91527-capture1.jpg
