
INSERT INTO Hive Storage Handlers


I have a few technical questions about how INSERT INTO works with Hive storage handlers.

For reference, here is how I declare my OutputFormat:

 public class MyOutputFormat extends OutputFormat implements HiveOutputFormat { 

1) When hive.execution.engine=mr, the checkOutputSpecs(FileSystem fs, JobConf job) method in my OutputFormat is called once, before the multi-threaded RecordWriter code runs. When hive.execution.engine=tez, however, the method is not called at all. Is this expected? If so, is a different method called instead?

2) When is the getOutputCommitter() method in my OutputFormat called? I have not been able to get it invoked during my INSERT query.

3) How is the parallelism determined for an INSERT INTO query against a Hive storage handler? Does Hive automatically decide how many mappers/containers to use based on the size of the data to be inserted? Do we have any control over that?

Thank you in advance for any help with any of these questions.
