02-05-2023
10:00 PM
Hi, my question is not about the connector. My question is how I can dynamically work with a Spark DataFrame that has to handle multiple different schemas; see the attachment. Nevertheless, let me add some context: on Spark 3 we have the allowMissingColumns parameter for the unionByName command. What is the equivalent on Spark 2.x?
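A minimal sketch of one way to emulate unionByName with allowMissingColumns on Spark 2.x: fill each DataFrame's missing columns with typed nulls, then union on a fixed column order. The function and DataFrame names here are illustrative, not from the original post.

```python
# Sketch: emulate Spark 3's unionByName(..., allowMissingColumns=True)
# on Spark 2.x by padding each side with typed null columns.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("union-missing-columns").getOrCreate()

def union_allow_missing(left: DataFrame, right: DataFrame) -> DataFrame:
    left_cols = set(left.columns)
    right_cols = set(right.columns)

    # Add columns that exist only on the other side as nulls of the right type.
    for name in right_cols - left_cols:
        left = left.withColumn(name, F.lit(None).cast(right.schema[name].dataType))
    for name in left_cols - right_cols:
        right = right.withColumn(name, F.lit(None).cast(left.schema[name].dataType))

    # Select a common column order so a plain union (available on 2.0)
    # lines the columns up correctly.
    ordered = sorted(left_cols | right_cols)
    return left.select(ordered).union(right.select(ordered))

# Example: two DataFrames with partially overlapping schemas.
df_a = spark.createDataFrame([(1, "x")], ["id", "name"])
df_b = spark.createDataFrame([(2, 3.5)], ["id", "score"])
union_allow_missing(df_a, df_b).show()
```

On Spark 2.3+ you could replace the final select/union with unionByName after the null padding; the padding step is what allowMissingColumns does for you on Spark 3.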
02-01-2023
08:48 AM
Hi all, I work with Cloudera 7.4.4, and our solution works with Hive over Spark. As I load text files into Hive, the schema may change in three ways:
1. The source data has added a column - this causes data loss on insertion until the column is added in Hive.
2. The source data has omitted a column - this fails the insert, since that column was not dropped in Hive.
3. A data type has widened - if Hive is not updated to the new type (for example, int to bigint), the result will be null.
Also, Spark's inferSchema may change numeric fields to string and vice versa. Is there a reliable way to make a non-external (managed) Hive table comply with these changes? I did manage to write a program that fills omitted columns in the DataFrame with nulls, automatically adds new columns, and widens the data types (a sketch of this approach is below), but is there a built-in method? For converting alphanumeric to numeric and vice versa I don't have a solution. Or would you suggest making the Hive table an external table over HBase/MongoDB/Cassandra (or something better)? And would a "refresh" of the structure then be as quick as a metadata update, or would it lock my table until the data is rebalanced? The attachment shows the initial schema and the changes I need to accommodate. Thanks in advance.
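For reference, a minimal sketch of the reconciliation approach described above, assuming a managed Hive table (the table name "db.events" and the DataFrame name incoming_df are hypothetical): extend the Hive table for genuinely new columns, pad omitted columns with nulls, and cast everything to the table's current types before inserting.

```python
# Sketch: align an incoming DataFrame with the current Hive table schema.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

def reconcile_with_hive(incoming: DataFrame, table: str) -> DataFrame:
    target_fields = {f.name: f.dataType for f in spark.table(table).schema.fields}

    # 1. Columns the source added but Hive lacks: extend the table first,
    #    so their data is not silently dropped on insert.
    new_fields = [f for f in incoming.schema.fields if f.name not in target_fields]
    if new_fields:
        cols_ddl = ", ".join(f"`{f.name}` {f.dataType.simpleString()}" for f in new_fields)
        spark.sql(f"ALTER TABLE {table} ADD COLUMNS ({cols_ddl})")
        target_fields.update({f.name: f.dataType for f in new_fields})

    # 2. Columns Hive has but the source omitted: fill with typed nulls.
    for name, dtype in target_fields.items():
        if name not in incoming.columns:
            incoming = incoming.withColumn(name, F.lit(None).cast(dtype))

    # 3. Type drift (e.g. int -> bigint): cast to whatever the table expects,
    #    and keep the table's column order for insertInto.
    return incoming.select(
        [F.col(name).cast(dtype).alias(name) for name, dtype in target_fields.items()]
    )

# Hypothetical usage:
# reconcile_with_hive(incoming_df, "db.events").write.insertInto("db.events")
```

This does not solve the string/numeric inference flips you mention; those still need an explicit cast rule (or a fixed schema instead of inferSchema) because a cast of non-numeric text yields null.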
Labels:
- Apache Hive
- Apache Spark