Data Ingestion From Hive to GP

What is the best and fastest way to move data from Hive to Greenplum: the gpfdist protocol or the gphdfs protocol?



gpfdist serves files from a POSIX filesystem, while gphdfs connects directly to HDFS. Both protocols serve files in parallel, so both are very fast. I would use gphdfs so that you don't have to export the data from HDFS to a POSIX filesystem before loading.
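
To make the gphdfs route a bit more concrete, here is a minimal sketch in Python that only builds the `CREATE EXTERNAL TABLE` statement you would run on Greenplum. It assumes a Greenplum release that still includes the gphdfs protocol and a Hive table stored as comma-delimited text files. The helper name `gphdfs_external_table_ddl`, the table name `ext_sales`, the host `namenode.example.com`, port 8020, and the HDFS path are all made-up placeholders, not values from this thread.

```python
def gphdfs_external_table_ddl(table, columns, namenode, hdfs_path,
                              port=8020, delimiter=","):
    """Build a CREATE EXTERNAL TABLE statement that reads text files over gphdfs.

    Hypothetical helper for illustration only: identifiers and values are
    interpolated as-is, so pass trusted input.
    """
    # columns is a list of (name, sql_type) pairs, e.g. [("id", "int")]
    column_list = ", ".join(f"{name} {sql_type}" for name, sql_type in columns)
    # gphdfs locations have the form gphdfs://<namenode>:<port>/<path>
    location = f"gphdfs://{namenode}:{port}/{hdfs_path.lstrip('/')}"
    return (
        f"CREATE EXTERNAL TABLE {table} ({column_list})\n"
        f"LOCATION ('{location}')\n"
        f"FORMAT 'TEXT' (DELIMITER '{delimiter}');"
    )


# Placeholder values: adjust to your own cluster and Hive warehouse layout.
ddl = gphdfs_external_table_ddl(
    table="ext_sales",
    columns=[("id", "int"), ("amount", "numeric")],
    namenode="namenode.example.com",
    hdfs_path="/user/hive/warehouse/sales",
)
print(ddl)
```

Once the external table exists, a plain `INSERT INTO sales SELECT * FROM ext_sales;` (with `sales` being your regular Greenplum table) loads the data, and the segments read from HDFS in parallel. If the Hive table uses a different delimiter or file format, the `FORMAT` clause would need to change accordingly.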