How to process a large volume of data (e.g., 100 GB) in Apache Hadoop?

New Contributor

I want to process 100 GB of RFID data on Apache Hadoop. Can anyone explain how to do it using the Hortonworks Sandbox? Thanks in advance.

2 REPLIES


Master Guru

Regarding the "how", refer to Sunile's answer. Pig is nice and flexible, Hive is good if you know SQL and your RFID data is already basically in flat-table format, and Spark also works well ...
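To make that concrete, here is a minimal PySpark sketch of the kind of flat-table aggregation I mean. The HDFS path and the column names (tag_id, read_time) are just illustrative assumptions, not something from your data:

```python
# Minimal PySpark sketch -- assumes the RFID reads are landed in HDFS as CSV
# with columns like tag_id and read_time; adjust to your actual schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("rfid-aggregation").getOrCreate()

reads = (spark.read
         .option("header", "true")
         .option("inferSchema", "true")
         .csv("hdfs:///data/rfid/reads/*.csv"))   # hypothetical path

# Count reads per tag per day as a simple flat-table aggregation
daily_counts = (reads
                .withColumn("read_date", F.to_date("read_time"))
                .groupBy("tag_id", "read_date")
                .count())

daily_counts.write.mode("overwrite").parquet("hdfs:///data/rfid/daily_counts")
spark.stop()
```

The same aggregation is a one-line GROUP BY in Hive or a couple of relations in Pig; pick whichever fits your team's skills.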

But the question is whether you really want to process 100 GB of data on the sandbox. The memory settings are tiny, there is a single drive, and data is not replicated ... If you do it like that, you might as well just use Python on a local machine. If you want a decent environment, you might want to set up 3-4 nodes on a VMware server, perhaps with 32 GB of RAM each. That would give you a nice little environment, and you could actually do some fast processing.
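If you do take the single-machine route for a one-off job, a chunked read with pandas is one plausible sketch; the file name and tag_id column below are placeholders I am assuming, not part of your dataset:

```python
# Minimal single-machine sketch -- processes a large CSV in chunks so the
# full 100 GB never has to fit in memory. Column names are hypothetical.
import pandas as pd
from collections import Counter

counts = Counter()
for chunk in pd.read_csv("rfid_reads.csv", chunksize=1_000_000):
    # Count reads per tag within each chunk, then merge into the running total
    counts.update(chunk["tag_id"].value_counts().to_dict())

result = pd.Series(counts, name="read_count").sort_values(ascending=False)
result.to_csv("tag_read_counts.csv")
```

That works for simple aggregations, but anything involving joins or repeated passes over the data is where a small real cluster starts to pay off.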