02-05-2023
10:00 PM
Hi, my question is not about the connector. My question is how I can dynamically work with a Spark DataFrame that has to handle multiple different schemas; see the attachment. Nevertheless, let me add some context: on Spark 3 we have the allowMissingColumns parameter for the unionByName command. What is the equivalent on Spark 2.x?
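A minimal sketch of one way to emulate unionByName with allowMissingColumns on Spark 2.x: fill each DataFrame's missing columns with typed nulls, then union on a fixed column order. The function and DataFrame names here are illustrative, not from the original post.

```python
# Sketch: emulate Spark 3's unionByName(..., allowMissingColumns=True)
# on Spark 2.x by padding each side with typed null columns.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("union-missing-columns").getOrCreate()

def union_allow_missing(left: DataFrame, right: DataFrame) -> DataFrame:
    left_cols = set(left.columns)
    right_cols = set(right.columns)

    # Add columns that exist only on the other side as nulls of the right type.
    for name in right_cols - left_cols:
        left = left.withColumn(name, F.lit(None).cast(right.schema[name].dataType))
    for name in left_cols - right_cols:
        right = right.withColumn(name, F.lit(None).cast(left.schema[name].dataType))

    # Select a common column order so a plain union (available on 2.0)
    # lines the columns up correctly.
    ordered = sorted(left_cols | right_cols)
    return left.select(ordered).union(right.select(ordered))

# Example: two DataFrames with partially overlapping schemas.
df_a = spark.createDataFrame([(1, "x")], ["id", "name"])
df_b = spark.createDataFrame([(2, 3.5)], ["id", "score"])
union_allow_missing(df_a, df_b).show()
```

On Spark 2.3+ you could replace the final select/union with unionByName after the null padding; the padding step is what allowMissingColumns does for you on Spark 3.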
02-01-2023
08:48 AM
Hi all, I work with Cloudera 7.4.4, and our solution works with Hive over Spark. As I load text files into Hive, the schema may change in three ways:
1. The source data has added a column - this causes data loss on insertion until the column is added in Hive.
2. The source data has omitted a column - this fails the insert, since that column was not dropped in Hive.
3. A data type has widened - if Hive is not updated to the new type (for example, int to bigint), the result will be null.
Also, Spark's inferSchema may change numeric fields to string and vice versa. Is there a reliable way to make a non-external (managed) Hive table comply with these changes? I did manage to write a program that fills omitted columns in the DataFrame with nulls, automatically adds new columns, and widens the data types (a sketch of this approach is below), but is there a built-in method? For converting alphanumeric to numeric and vice versa I don't have a solution. Or would you suggest making the Hive table an external table over HBase/MongoDB/Cassandra (or something better)? And would a "refresh" of the structure then be as quick as a metadata update, or would it lock my table until the data is rebalanced? The attachment shows the initial schema and the changes I need to accommodate. Thanks in advance.
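For reference, a minimal sketch of the reconciliation approach described above, assuming a managed Hive table (the table name "db.events" and the DataFrame name incoming_df are hypothetical): extend the Hive table for genuinely new columns, pad omitted columns with nulls, and cast everything to the table's current types before inserting.

```python
# Sketch: align an incoming DataFrame with the current Hive table schema.
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

def reconcile_with_hive(incoming: DataFrame, table: str) -> DataFrame:
    target_fields = {f.name: f.dataType for f in spark.table(table).schema.fields}

    # 1. Columns the source added but Hive lacks: extend the table first,
    #    so their data is not silently dropped on insert.
    new_fields = [f for f in incoming.schema.fields if f.name not in target_fields]
    if new_fields:
        cols_ddl = ", ".join(f"`{f.name}` {f.dataType.simpleString()}" for f in new_fields)
        spark.sql(f"ALTER TABLE {table} ADD COLUMNS ({cols_ddl})")
        target_fields.update({f.name: f.dataType for f in new_fields})

    # 2. Columns Hive has but the source omitted: fill with typed nulls.
    for name, dtype in target_fields.items():
        if name not in incoming.columns:
            incoming = incoming.withColumn(name, F.lit(None).cast(dtype))

    # 3. Type drift (e.g. int -> bigint): cast to whatever the table expects,
    #    and keep the table's column order for insertInto.
    return incoming.select(
        [F.col(name).cast(dtype).alias(name) for name, dtype in target_fields.items()]
    )

# Hypothetical usage:
# reconcile_with_hive(incoming_df, "db.events").write.insertInto("db.events")
```

This does not solve the string/numeric inference flips you mention; those still need an explicit cast rule (or a fixed schema instead of inferSchema) because a cast of non-numeric text yields null.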
Labels:
- Apache Hive
- Apache Spark