I'm fairly new to the world of big data and the Hadoop ecosystem; I'm currently learning it by myself for my university thesis.
I'd like to collect, store, and analyse public transportation data and meteorological data, which I can access from web services.
I have made a service in .NET Core that collects the data, preprocesses and cleans it, keeps just the useful fields, and saves it as CSV and JSON — about 4-5 MB of data every minute.
Can I stream these files to HDFS with Flume, Kafka, or NiFi, append them to an existing file, and load them into Hive automatically?
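To make the question concrete, this is roughly the kind of Flume setup I have in mind — a spooling-directory source watching the folder my service writes to, streaming into HDFS (the agent name, directories, and roll settings here are just placeholders I made up, not a working config from my environment):

```properties
# Hypothetical Flume agent: watch a local spool directory for the
# CSV/JSON files my service writes, and stream them into HDFS.
agent.sources  = spool
agent.channels = mem
agent.sinks    = hdfs-sink

agent.sources.spool.type     = spooldir
agent.sources.spool.spoolDir = /var/data/transport-out
agent.sources.spool.channels = mem

agent.channels.mem.type     = memory
agent.channels.mem.capacity = 10000

agent.sinks.hdfs-sink.type      = hdfs
agent.sinks.hdfs-sink.channel   = mem
agent.sinks.hdfs-sink.hdfs.path = /data/transport/%Y-%m-%d
# Write events as plain data rather than SequenceFiles
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
# Roll a new HDFS file by time/size instead of appending forever
agent.sinks.hdfs-sink.hdfs.rollInterval = 3600
agent.sinks.hdfs-sink.hdfs.rollSize     = 134217728
agent.sinks.hdfs-sink.hdfs.rollCount    = 0
```

Is rolling files like this the idiomatic way to do it, or is true appending to a single HDFS file actually practical?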
Can the tool also convert these files automatically to a binary format such as Avro or Parquet for HDFS storage, while still allowing new data to be appended?
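For context, the end state I'm hoping for is something like an external Hive table over the converted Parquet files, partitioned so new data can keep arriving without touching old files (the table name, columns, and path below are made up for illustration):

```sql
-- Hypothetical target: a Hive external table over Parquet files in HDFS,
-- partitioned by day so each new batch lands as new files in a partition.
CREATE EXTERNAL TABLE transport_events (
  vehicle_id STRING,
  line_id    STRING,
  event_time TIMESTAMP,
  latitude   DOUBLE,
  longitude  DOUBLE
)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
LOCATION '/data/transport';
```

My understanding is that Parquet files on HDFS are effectively immutable, so "append" would really mean adding new files to a partition rather than appending to an existing file — is that correct?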
Later I would like to process the data with both Spark and Hive.