Scientific data in Hadoop

New Contributor

I'm looking at a possible use of the Hadoop ecosystem for high-rate, high-volume scientific data.

The data arrives as streams, each updating about 20 times per second, with each update delivering anywhere from a single value to an array of 500k values (integers, doubles, and so on). Each stream has a name; the data is binary (not text) and comes with timestamps. There could be millions of such streams to handle. I am considering storing these input streams in HDFS using Avro. On the client side I would prefer to work with Python (I am not really looking for SQL-like access at the moment). A user should be able to query for data by stream name and fetch data for different time slices.
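To make the record layout concrete, here is a minimal sketch of how one sample and a time-slice lookup might look. It is only an illustration under my own assumptions: the field names, the `StreamSample` class, and the `encode_values`, `decode_values`, and `time_slice` helpers are hypothetical, the schema dict is just one possible Avro record definition, and the lookup runs on an in-memory list rather than on HDFS.

```python
import struct
from bisect import bisect_left
from dataclasses import dataclass

# Hypothetical Avro record schema for one sample (field names are assumptions).
SAMPLE_SCHEMA = {
    "type": "record",
    "name": "StreamSample",
    "fields": [
        {"name": "stream", "type": "string"},
        {"name": "timestamp_ns", "type": "long"},
        {"name": "dtype", "type": {"type": "enum", "name": "DType",
                                   "symbols": ["INT64", "FLOAT64"]}},
        {"name": "payload", "type": "bytes"},
    ],
}

# struct format codes for the two illustrative value types.
_FORMATS = {"INT64": "q", "FLOAT64": "d"}


@dataclass(frozen=True)
class StreamSample:
    stream: str        # stream name used as the query key
    timestamp_ns: int  # sample time in nanoseconds
    dtype: str         # "INT64" or "FLOAT64"
    payload: bytes     # little-endian packed values


def encode_values(values, dtype):
    """Pack a sequence of numbers into a little-endian binary payload."""
    return struct.pack(f"<{len(values)}{_FORMATS[dtype]}", *values)


def decode_values(sample):
    """Unpack the binary payload of a sample back into a list of numbers."""
    code = _FORMATS[sample.dtype]
    count = len(sample.payload) // struct.calcsize("<" + code)
    return list(struct.unpack(f"<{count}{code}", sample.payload))


def time_slice(samples, start_ns, end_ns):
    """Return samples with start_ns <= timestamp_ns < end_ns.

    Assumes `samples` belongs to one stream and is sorted by timestamp.
    """
    keys = [s.timestamp_ns for s in samples]
    return samples[bisect_left(keys, start_ns):bisect_left(keys, end_ns)]


# One stream sampled 20 times per second (every 50 ms) for one second.
samples = [
    StreamSample("temp", t * 50_000_000, "FLOAT64",
                 encode_values([float(t)], "FLOAT64"))
    for t in range(20)
]
window = time_slice(samples, 100_000_000, 250_000_000)
```

For a sense of scale, assuming 8-byte values, a single stream at the upper bound would produce 20 × 500,000 × 8 bytes = 80 MB per second.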

 

Assuming I can scale up the node count and storage space as required, is this use case something the Hadoop ecosystem would be good at? Are there any similar use cases out there? Are there any benchmarks I can look at?

 

Thank you in advance!

1 ACCEPTED SOLUTION

Expert Contributor

Hello,

You are welcome to try the Cloudera CDP platform; you can find more details here:

https://docs.cloudera.com/cdp-private-cloud/latest/release-guide/topics/cdpdc-trial-download-informa...
