Stream data to HDFS, convert it to a binary format, append to an existing file, and update Hive

Hi!

I'm kinda new to the world of big data and the Hadoop ecosystem. I'm currently learning it on my own for my university thesis.

I'd like to analyse and store public transportation data and meteorological data. I can access this data through web services.

I have written a service in .NET Core that collects this data, preprocesses and cleans it, keeps only the useful parts, and saves the result as CSV and JSON files. It produces about 4-5 MB of data every minute.
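For scale, here is a rough back-of-the-envelope estimate of what that rate adds up to per day. It is only a sketch: the 128 MB block size is an assumption (the usual HDFS default, a given cluster may be configured differently), and the function names are purely illustrative.

```python
# Rough volume estimate for a feed of 4-5 MB per minute.
MINUTES_PER_DAY = 24 * 60
HDFS_BLOCK_MB = 128  # assumption: common default HDFS block size, may differ per cluster


def daily_volume_mb(mb_per_minute: float) -> float:
    """Total data produced in one day, in MB."""
    return mb_per_minute * MINUTES_PER_DAY


def blocks_per_day(mb_per_minute: float, block_mb: int = HDFS_BLOCK_MB) -> float:
    """Number of HDFS blocks one day of data would fill if stored contiguously."""
    return daily_volume_mb(mb_per_minute) / block_mb


if __name__ == "__main__":
    for rate in (4, 5):
        print(
            f"{rate} MB/min -> {daily_volume_mb(rate):.0f} MB/day, "
            f"{blocks_per_day(rate):.2f} blocks of {HDFS_BLOCK_MB} MB"
        )
```

So a day of data is roughly 5.8 to 7.2 GB, while each one-minute file is far smaller than a single block under that assumption. That is the reason appending or otherwise combining the files matters here.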

Can I stream these files to HDFS with Flume, Kafka, or NiFi, append them to an existing file, and load them into Hive automatically?

Can it also automatically convert these files to a binary format (Avro, Parquet) for HDFS storage, with the ability to append?

Later I would like to process this data with Spark as well as Hive.
