- Subscribe to RSS Feed
- Mark Question as New
- Mark Question as Read
- Float this Question for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
Is it possible to load hive table parallely?
- Labels:
-
Apache Hadoop
-
Apache Hive
Created ‎09-25-2016 12:40 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Im trying to load hive table.
I have two different source which has to loaded into same target. Is it ok If run that job in parallel?
Created ‎09-26-2016 05:48 AM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
Unfortunately, you cannot run multiple insert commands on the same destination table at the same time (technically you can, but the job will get executed one after the other)
however, if you are using external file, you can achieve parallelism by writing multiple files into your destination folder and creating a hive external table on top of your destination folder.
It will look something like this:
CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT, page_url STRING, referrer_url STRING, ip STRING COMMENT 'IP Address of the User', country STRING COMMENT 'country of origination') LOCATION '/logs/mywebapp/'
where '/logs/mywebapp/' will be your hdfs directory and you will write multiple files (one for each of your parallel jobs) into this directory.
** If this answers your question, please don't forget to upvote and Accept the answer **
Created ‎09-26-2016 05:52 PM
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
If your table is an actual Hive table (not an external table) and it is ACID-enabled (require ORC file format) and Hive/Tez is enabled globally for parallelism and you write those SQL statements as separate jobs, then YES. The assumption is that you run one of the versions of Hive capable of ACID which most likely you do if you use anything released in the last 1.5-2 years.
