Difference between Spark insert and Hive insert

I have a process that runs on Spark, splits the required data into different DataFrames based on the input, and stores it in 8 to 10 different tables (a single input file contains data for multiple tables).

Now I am trying to run an UPDATE statement on a table that Spark inserts data into, and this is causing a lot of issues (array index out of bounds errors).

So I would like to understand whether there is a difference between a Hive insert and a Spark insert (Spark version 1.6.3) that could be causing this issue.

I tried inserting the same data into a different table that was created in Hive and loaded through a Hive insert. When I ran the update on that table, it finished without any issues (the bucketing and partitioning are the same as on the table that Spark inserts into).

Please share any insights or thoughts on this.

Let me know if any further details are needed.

Thanks in advance.