
Is it possible to load a Hive table in parallel?

I'm trying to load a Hive table.

I have two different sources that have to be loaded into the same target table. Is it OK if I run those jobs in parallel?



@Bala Vignesh N V

Unfortunately, you cannot run multiple INSERT commands against the same destination table at the same time. (Technically you can submit them, but the jobs will be executed one after the other.)

However, if you use an external table, you can achieve parallelism by having each job write its own files into the destination folder and creating a Hive external table on top of that folder.

It will look something like this:

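CREATE EXTERNAL TABLE page_view(viewTime INT, userid BIGINT,
     page_url STRING, referrer_url STRING,
     ip STRING COMMENT 'IP Address of the User',
     country STRING COMMENT 'country of origination')
-- the delimiter must match the files your jobs write (comma is only an example)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/logs/mywebapp/';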

where '/logs/mywebapp/' is your HDFS directory. Each of your parallel jobs writes its own file(s) into this directory, using file names that do not collide, and the table reads every file it finds there.
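A minimal sketch of one way to organize this, assuming the two sources are already available as Hive tables named `source_a` and `source_b` (hypothetical names). Instead of a flat folder, this variant uses a partitioned external table with one partition directory per source, so each job writes only to its own directory and can never overwrite the other's files. The table name `page_view_by_src`, the partition column `src`, and the path `/logs/mywebapp_by_src/` are all illustrative assumptions.

```sql
-- Hypothetical variant of the table above: one partition directory per source.
CREATE EXTERNAL TABLE page_view_by_src (
  viewTime     INT,
  userid       BIGINT,
  page_url     STRING,
  referrer_url STRING,
  ip           STRING COMMENT 'IP Address of the User',
  country      STRING COMMENT 'country of origination')
PARTITIONED BY (src STRING)                    -- hypothetical partition column
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/logs/mywebapp_by_src/';

-- Job A: writes plain files into its own directory; it does not insert into
-- the target table, so it can run at the same time as job B.
INSERT OVERWRITE DIRECTORY '/logs/mywebapp_by_src/src=source_a'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT viewTime, userid, page_url, referrer_url, ip, country
FROM source_a;                                 -- hypothetical source table

-- Job B: same shape, different directory.
INSERT OVERWRITE DIRECTORY '/logs/mywebapp_by_src/src=source_b'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
SELECT viewTime, userid, page_url, referrer_url, ip, country
FROM source_b;                                 -- hypothetical source table

-- Once both jobs have finished, register the directories as partitions.
ALTER TABLE page_view_by_src ADD IF NOT EXISTS
  PARTITION (src='source_a') LOCATION '/logs/mywebapp_by_src/src=source_a'
  PARTITION (src='source_b') LOCATION '/logs/mywebapp_by_src/src=source_b';
```

Because the directories follow the `key=value` naming convention, `MSCK REPAIR TABLE page_view_by_src;` should also be able to discover them instead of the explicit `ALTER TABLE`. The jobs that write the files do not have to be Hive queries; any process that writes files in the declared format into its own directory fits this pattern.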


@Bala Vignesh N V

If your table is a managed Hive table (not an external table), it is ACID-enabled (which requires the ORC file format), Hive/Tez is configured globally for parallel execution, and you submit the SQL statements as separate jobs, then YES. This assumes you are running a version of Hive that supports ACID, which you most likely are if you use anything released in the last 1.5 to 2 years.
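A rough sketch of what that setup might look like, assuming the metastore side of ACID (for example the compactor) has already been enabled by your administrator, and that the sources are Hive tables named `source_a` and `source_b` (hypothetical). The table name `page_view_acid` and the choice of 4 buckets on `userid` are illustrative only. Before Hive 3.0, ACID tables also had to be bucketed, which is why `CLUSTERED BY` appears here. With the `DbTxnManager`, an `INSERT INTO` on an ACID table should take a shared lock rather than an exclusive one, which is what lets the two jobs proceed concurrently.

```sql
-- Client/session settings for ACID (many clusters set these in hive-site.xml).
SET hive.support.concurrency=true;
SET hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
-- SET hive.enforce.bucketing=true;  -- Hive 1.x only; the property was removed in 2.0

-- Hypothetical ACID target: managed, ORC, bucketed, flagged transactional.
CREATE TABLE page_view_acid (
  viewTime     INT,
  userid       BIGINT,
  page_url     STRING,
  referrer_url STRING,
  ip           STRING,
  country      STRING)
CLUSTERED BY (userid) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- Job A, submitted from one session.
INSERT INTO TABLE page_view_acid
SELECT viewTime, userid, page_url, referrer_url, ip, country
FROM source_a;                                 -- hypothetical source table

-- Job B, submitted from a second session at the same time as job A.
INSERT INTO TABLE page_view_acid
SELECT viewTime, userid, page_url, referrer_url, ip, country
FROM source_b;                                 -- hypothetical source table
```

While both jobs are running, `SHOW LOCKS page_view_acid;` can be used to check that neither insert is waiting on the other.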
