Support Questions

hinxx · ‎07-06-2021

I'm looking at a possible use of the Hadoop ecosystem for high-rate, high-volume scientific data.

The data that needs to be stored arrives in streams that update ~20 times a second, with each update delivering anywhere from a single value to an array of 500k values (integers, doubles, etc.). Each stream has a name, the data is binary (not text), and it comes with timestamps. There could be millions of such streams to handle. I would look into storing these input streams in HDFS with the help of Avro. From the client perspective I would prefer to work with Python (not really looking for SQL-like access at the moment). A user should be able to query for data by stream name and fetch data from different time slices.
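To make the access pattern concrete, here is a minimal sketch of the kind of lookup described above: a named stream, timestamped binary payloads, and a query by time slice. It is an in-memory stand-in only, not HDFS or Avro; `StreamStore`, `time_slice`, the stream name, the millisecond integer timestamps, and the 8-byte double payloads are all hypothetical choices for illustration.

```python
import bisect
from array import array
from dataclasses import dataclass, field


@dataclass
class StreamStore:
    """Hypothetical in-memory stand-in for one named stream (not HDFS or Avro)."""
    name: str
    timestamps: list = field(default_factory=list)  # milliseconds, kept sorted
    payloads: list = field(default_factory=list)    # raw bytes, one entry per update

    def append(self, ts_ms, values):
        # Assumes updates arrive in timestamp order; "d" packs 8-byte doubles.
        self.timestamps.append(ts_ms)
        self.payloads.append(array("d", values).tobytes())

    def time_slice(self, start_ms, end_ms):
        """Return (timestamp, values) pairs with start_ms <= timestamp < end_ms."""
        lo = bisect.bisect_left(self.timestamps, start_ms)
        hi = bisect.bisect_left(self.timestamps, end_ms)
        out = []
        for ts, raw in zip(self.timestamps[lo:hi], self.payloads[lo:hi]):
            values = array("d")
            values.frombytes(raw)  # decode the binary payload back into doubles
            out.append((ts, values.tolist()))
        return out


store = StreamStore("detector/ch0")
for i in range(5):
    store.append(i * 50, [float(i)] * 3)  # one update every 50 ms, i.e. 20 per second

result = store.time_slice(50, 150)  # the updates at 50 ms and 100 ms
```

Keeping the timestamps sorted is what makes the slice a pair of binary searches rather than a scan; any real storage layout would need some equivalent index on (stream name, time).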

Assuming I would be able to scale up the node count and storage space as required, is this use case something that the Hadoop ecosystem would be good at? Are there any similar use cases out there? Any benchmarks I can look at?
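For a rough sense of scale, here is a back-of-envelope sketch of the raw data rate for a single stream. The update rate and array size come from the description above; the 8 bytes per value is an assumption (doubles), and the estimate ignores timestamps, serialization overhead, and HDFS replication. The constant and function names are illustrative.

```python
UPDATES_PER_SECOND = 20           # from the description above
MAX_VALUES_PER_UPDATE = 500_000   # from the description above
BYTES_PER_VALUE = 8               # assumption: every value is an 8-byte double
SECONDS_PER_DAY = 86_400


def bytes_per_second(values_per_update):
    """Raw payload rate for one stream, excluding timestamps and overhead."""
    return values_per_update * UPDATES_PER_SECOND * BYTES_PER_VALUE


largest = bytes_per_second(MAX_VALUES_PER_UPDATE)  # stream of 500k-value arrays
smallest = bytes_per_second(1)                     # stream of single values

print(f"largest stream:  {largest / 1e6:.0f} MB/s, "
      f"{largest * SECONDS_PER_DAY / 1e12:.3f} TB/day")
print(f"smallest stream: {smallest} B/s, "
      f"{smallest * SECONDS_PER_DAY / 1e6:.3f} MB/day")
```

Under these assumptions a single stream ranges from 160 B/s to 80 MB/s, so the total depends heavily on how the millions of streams are distributed between those two extremes.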

Thank you in advance!

Daming Xue · ‎07-11-2021

Hello

You are welcome to try the Cloudera CDP platform. You can find more details here:

https://docs.cloudera.com/cdp-private-cloud/latest/release-guide/topics/cdpdc-trial-download-informa...

Cloudera Community

scientific data in hadoop