Is it possible to load a Hive table in parallel?

I'm trying to load a Hive table.

I have two different sources that have to be loaded into the same target table. Is it OK if I run those jobs in parallel?

2 REPLIES

Super Collaborator

@Bala Vignesh N V

Unfortunately, you cannot run multiple insert commands against the same destination table at the same time. (Technically you can submit them together, but the jobs will be executed one after the other.)

However, if you are using external files, you can achieve parallelism by writing multiple files into your destination folder and creating a Hive external table on top of that folder.

It will look something like this:

CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User',
     country STRING COMMENT 'country of origination')
LOCATION '/logs/mywebapp/';

Here '/logs/mywebapp/' is your HDFS directory, and each of your parallel jobs writes its own file into this directory.


Super Guru

@Bala Vignesh N V

If your table is a managed Hive table (not an external table), it is ACID-enabled (which requires the ORC file format), Hive on Tez is enabled globally for parallelism, and you submit those SQL statements as separate jobs, then yes. This assumes you are running a version of Hive that supports ACID, which is most likely the case if you use anything released in the last 1.5 to 2 years.
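
As a rough illustration of the conditions above, here is a minimal sketch of what the two parallel loads might look like. The table name page_view_acid, the staging tables staging_source_a and staging_source_b, and the bucket count are all hypothetical; the column list is borrowed from the external table example earlier in the thread. It also assumes transactions and compaction are already enabled on the metastore side, so check the settings against your own Hive version.

```sql
-- Session settings commonly needed for Hive ACID.
-- Assumption: the metastore is already configured for transactions/compaction.
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;

-- Hypothetical managed, transactional target table stored as ORC.
CREATE TABLE page_view_acid (
  viewTime     INT,
  userid       BIGINT,
  page_url     STRING,
  referrer_url STRING,
  ip           STRING,
  country      STRING
)
CLUSTERED BY (userid) INTO 4 BUCKETS   -- bucketing is required for ACID tables before Hive 3
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Job 1, submitted from its own session (staging_source_a is hypothetical).
INSERT INTO TABLE page_view_acid
SELECT viewTime, userid, page_url, referrer_url, ip, country
FROM staging_source_a;

-- Job 2, submitted from a separate session at the same time
-- (staging_source_b is hypothetical).
INSERT INTO TABLE page_view_acid
SELECT viewTime, userid, page_url, referrer_url, ip, country
FROM staging_source_b;
```

Each INSERT runs as its own transaction and writes its own delta files under the table directory, which is what lets the two jobs proceed without overwriting each other's output.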