Hi! What are the benefits of using a database system on top of HDFS and the Hadoop ecosystem (Hive, HBase, Cassandra, MongoDB, MySQL)? In my view:
- ACID transactions
- Access to the data through standard JDBC/ODBC connections (which may be of little use if it relies on big data queries that can take minutes to run, and which Presto could also provide)
- A CAP trade-off (CAP-proofness) that depends on the system and the ecosystem
- A well-thought-out query system (which Spark and Hive could easily substitute)
What are the pros and cons?
Hi! I'm kind of new to the world of big data and the Hadoop ecosystem, and I'm currently learning it on my own for my university thesis. I'd like to store and analyse public transportation data and meteorological data, which I can access from web services. I have written a service in .NET Core that collects the data, preprocesses and cleans it, keeps only the useful parts, and saves them as CSV and JSON files: about 4-5 MB of data every minute. Can I stream these files to HDFS with Flume, Kafka, or NiFi, append them to an existing file, and load them into Hive automatically? Can the files also be converted automatically to a binary format (Avro, Parquet) for HDFS storage, with the ability to append? Later I would process the data with Spark as well as Hive.
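To give a feel for the volumes involved, here is a rough sizing sketch in Python. It only does arithmetic on the 4-5 MB per minute figure above. The 128 MB block size is an assumption (the usual HDFS default for `dfs.blocksize` in Hadoop 2.x and later), and the function names are made up for illustration. The numbers refer to the raw CSV/JSON size; a compressed Avro or Parquet file would likely be smaller, so it would take longer to fill a block.

```python
import math

# Assumption: default HDFS block size (dfs.blocksize) of 128 MB.
HDFS_BLOCK_MB = 128


def daily_volume_gb(mb_per_minute: float) -> float:
    """Raw data collected per day, in GB (1 GB = 1024 MB)."""
    return mb_per_minute * 60 * 24 / 1024


def minutes_to_fill_block(mb_per_minute: float, block_mb: int = HDFS_BLOCK_MB) -> int:
    """Minutes of raw data needed to fill one HDFS block."""
    return math.ceil(block_mb / mb_per_minute)


def files_per_day(batch_minutes: int) -> int:
    """Number of files written per day if one file is written every batch_minutes."""
    return math.ceil(24 * 60 / batch_minutes)


if __name__ == "__main__":
    for rate in (4.0, 5.0):
        batch = minutes_to_fill_block(rate)
        print(
            f"{rate} MB/min: {daily_volume_gb(rate):.2f} GB/day, "
            f"about {batch} min per {HDFS_BLOCK_MB} MB block, "
            f"{files_per_day(batch)} files/day when batched "
            f"(vs {files_per_day(1)} files/day with one file per minute)"
        )
```

So writing one file per minute would mean 1440 small files per day, while batching roughly half an hour of data per file would keep the file count much lower, which is part of why I am asking about appending to an existing file.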