
Loading multiple CSV files (different layouts) into a single Hive table

Hi,

I have to process about 8K CSV files, each with a different schema, and load them into a Hive table with a fixed format (1,500 columns). The files have different numbers of columns: columns missing from a file should be loaded as NULL, and extra columns should be dropped. What would be the best approach using PySpark, without Pandas or Scala?

Thanks
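A minimal PySpark sketch of one possible approach, assuming Spark with Hive support enabled (`SparkSession.builder.enableHiveSupport()`), an existing unpartitioned target table, and CSV files that each start with a header row. The helper names `align_columns`, `group_by_header`, `conform` and `load_csvs` are hypothetical. The idea: read each file with its own header, select the table's columns in table order (NULL where the file lacks a column, extras simply never selected), cast to the table's types, and append with `insertInto`. Files that share an identical header are grouped so they are read in one pass.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def align_columns(target_columns: List[str],
                  source_columns: List[str]) -> List[Tuple[str, Optional[str]]]:
    """Pair each target column with the matching source header, or None if absent.

    Matching ignores case and surrounding whitespace (Hive column names are
    case-insensitive). Source columns with no target counterpart are never
    referenced, which is how extra columns get dropped.
    """
    lookup: Dict[str, str] = {}
    for col in source_columns:
        # Keep the first header if two differ only in case/whitespace.
        lookup.setdefault(col.strip().lower(), col)
    return [(tgt, lookup.get(tgt.strip().lower())) for tgt in target_columns]


def group_by_header(headers: Dict[str, Tuple[str, ...]]) -> Dict[Tuple[str, ...], List[str]]:
    """Group file paths that share an identical header so each group is read once."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for path, header in headers.items():
        groups[tuple(header)].append(path)
    return dict(groups)


def conform(df, target_schema):
    """Select the target columns in table order: NULL for missing ones, cast to target types."""
    # Imported lazily so the pure-Python helpers above work without Spark installed.
    from pyspark.sql import functions as F

    plan = align_columns([f.name for f in target_schema.fields], df.columns)
    types = {f.name: f.dataType for f in target_schema.fields}
    exprs = []
    for tgt, src in plan:
        if src is not None:
            # Backtick-quote the header (doubling any embedded backticks) so dots,
            # spaces, etc. in CSV headers are not parsed as nested field access.
            value = F.col("`" + src.replace("`", "``") + "`")
        else:
            value = F.lit(None)
        exprs.append(value.cast(types[tgt]).alias(tgt))
    return df.select(*exprs)


def load_csvs(spark, paths: Iterable[str], table: str) -> None:
    """Append every CSV in `paths` to the existing, unpartitioned Hive `table`."""
    target_schema = spark.table(table).schema
    # One small read per file just to learn its header (all columns are read as strings).
    headers = {p: tuple(spark.read.option("header", "true").csv(p).columns) for p in paths}
    for group in group_by_header(headers).values():
        df = spark.read.option("header", "true").csv(group)
        # insertInto matches columns by position, so conform() emits the table's order.
        conform(df, target_schema).write.insertInto(table)
```

Usage would look like `load_csvs(spark, list_of_csv_paths, "db.target_table")`. Two caveats under these assumptions: discovering 8K headers still launches one small job per file, and with Spark's default (non-ANSI) casting, a value that cannot be cast to the column type becomes NULL silently rather than failing the load.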
