
Solution for the problem

Hello all,

I need some ideas for the following problem statement.

Problem statement: fetch the top 100 trading values to date for a given stock symbol.

More info: any number of users can log in to the trading portal at the same time and enter a stock symbol; the back end has to fetch the top 100 trading values to date.

What tools are required to accomplish this, and are there any limitations on the suggested tools?

3 REPLIES

Super Guru
@Gobi Subramani

I would solve this with HBase, using a tall and narrow table. I worked on an application that stored ticker data in HBase and recorded every change. Our HBase row key was the stock symbol plus a timestamp plus some additional fields we needed to search on. This gave us keys like the following:

AAPL<epoch time>

AAPL<epoch time - 1>

AAPL<epoch time - 2>

and so on

This is a trillion-plus-row table. Now, given a symbol, in this case AAPL, you run a prefix scan and limit it to 100 rows.
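The row-key design above can be sketched as follows. This is a minimal in-memory simulation, not the original application's code (which used the HBase Java client); `make_row_key` and `scan_top_n` are hypothetical helper names, and the reversed timestamp is one common way to make the newest rows sort first in HBase's lexicographic key order.

```python
import struct

MAX_LONG = 2**63 - 1

def make_row_key(symbol: str, epoch_millis: int) -> bytes:
    # Symbol prefix plus a big-endian reversed timestamp: the newest
    # trade for a symbol gets the lexicographically smallest key,
    # so it comes back first from a scan.
    return symbol.encode("utf-8") + struct.pack(">q", MAX_LONG - epoch_millis)

def scan_top_n(sorted_keys, symbol: str, n: int = 100):
    # Simulate an HBase prefix scan with a row limit: take the first
    # n keys that start with the symbol prefix.
    prefix = symbol.encode("utf-8")
    return [k for k in sorted_keys if k.startswith(prefix)][:n]

# Usage: three AAPL trades and one GOOG trade, stored in key order
keys = sorted([
    make_row_key("AAPL", 1_700_000_000_000),
    make_row_key("AAPL", 1_700_000_000_001),
    make_row_key("AAPL", 1_700_000_000_002),
    make_row_key("GOOG", 1_700_000_000_000),
])
top = scan_top_n(keys, "AAPL", n=2)
# The newest AAPL trade (largest timestamp) sorts first
assert top[0] == make_row_key("AAPL", 1_700_000_000_002)
```

With a real table, the same effect comes from a `Scan` with a row-prefix filter and a limit of 100.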

You can also use a short and wide table, where all the data for AAPL is stored in one row; then do a Get and read only the first hundred columns. This should be easy to implement in HBase.
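A sketch of that short-wide alternative, again as an in-memory stand-in rather than real HBase calls: one row per symbol, one column per trade, with qualifiers chosen so the newest trade sorts first. In the Java client the equivalent would be a `Get` with `setMaxResultsPerColumnFamily(100)`.

```python
def first_n_columns(row: dict, n: int = 100) -> list:
    # HBase returns columns in sorted qualifier order; emulate a Get
    # that stops after the first n columns of the row.
    return sorted(row.items())[:n]

# Usage: qualifiers are zero-padded reversed sequence numbers, so
# higher sequence numbers (newer trades) sort first
row = {f"{999 - seq:03d}": f"trade-{seq}" for seq in range(5)}
top = first_n_columns(row, n=3)
# → the three newest trades: trade-4, trade-3, trade-2
```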

@mqureshi, interesting...

Could you please elaborate on a few points?

1. I don't get the jargon "AAPL". Why does the HBase key have the symbol and other info? (A bit confused; I thought it should have only the stock symbol.)

2. What is the data ingestion rate into your cluster, and how many nodes handle this data? (This will help me decide on the cluster configuration.)

3. If you want to process the data rather than putting it directly into HBase, what would you recommend?

Super Guru

1. I don't get the jargon "AAPL". Why does the HBase key have the symbol and other info? (A bit confused; I thought it should have only the stock symbol.)

AAPL is just an example; it is the stock ticker symbol for Apple. I put the ticker symbol at the front of the key so we can fetch rows by stock symbol.

2. What is the data ingestion rate into your cluster, and how many nodes handle this data? (This will help me decide on the cluster configuration.)

We were doing mini-batches, ingesting around 10-15 GB every 15 minutes. When the project first launched, the cluster had 70 nodes.

3. If you want to process the data rather than putting it directly into HBase, what would you recommend?

I would pass it through Storm (Streaming Analytics Manager today) and do analytics before landing the data in HBase. That is: do your analytics in real time, land the results in Kafka, and then batch-pull from Kafka into HBase.
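The batch-pull step of that pipeline can be sketched as follows. This uses an in-memory queue as a stand-in for a Kafka topic and yields batches as a periodic job would before bulk-writing to HBase; `batch_pull` is a hypothetical helper name, not a Kafka or HBase API.

```python
from collections import deque

def batch_pull(queue: deque, batch_size: int):
    # Drain the queue in fixed-size batches, the way a scheduled job
    # would poll a Kafka topic and bulk-put each batch into HBase.
    while queue:
        batch = [queue.popleft() for _ in range(min(batch_size, len(queue)))]
        yield batch

# Usage: 5 analytics records pulled in batches of 2
q = deque(range(5))
batches = list(batch_pull(q, 2))
# → [[0, 1], [2, 3], [4]]
```

Batching the writes keeps HBase put pressure steady even when the real-time analytics stage is bursty.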