Created on 07-09-2016 11:19 AM - edited 09-16-2022 03:29 AM
I want to process 100 GB of RFID data on Apache Hadoop. Can anyone explain how to do it using the Hortonworks Sandbox? Thanks in advance.
Created 07-11-2016 03:50 AM
@SANTOSH DASH You can process data in Hadoop using many different services. If your data has a schema, you can start by processing it with Hive. Full tutorial here. My preference is to do ELT logic with Pig. Full tutorial here. There are many ways to skin a cat here. The full list of tutorials is here.
Created 07-11-2016 10:31 AM
Regarding how, refer to Sunile's answer: Pig is nice and flexible, Hive is good if you know SQL and your RFID data is already basically in a flat table format, and Spark also works well.
But the question is whether you really want to process 100 GB of data on the sandbox. The memory settings are tiny, there is a single drive, and data is not replicated. If you do it like this, you could just as well use Python on a local machine. If you want a decent environment, you might want to set up 3-4 nodes on a VMware server, with perhaps 32 GB of RAM each. That would give you a nice little environment, and you could actually do some fast processing.
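To illustrate the "just use Python on a local machine" point: here is a minimal sketch that streams a hypothetical RFID read log and aggregates reads per reader. The column names (tag_id, reader_id, timestamp) and the CSV layout are assumptions for the example, not from the original question; adapt them to your actual schema.

```python
import csv
import io
from collections import Counter

# Hypothetical RFID read log (in practice, open your 100 GB file instead
# of this in-memory sample). Columns are an assumed schema.
sample = io.StringIO(
    "tag_id,reader_id,timestamp\n"
    "TAG001,DOCK-1,2016-07-11T09:00:00\n"
    "TAG002,DOCK-1,2016-07-11T09:00:05\n"
    "TAG001,DOCK-2,2016-07-11T09:01:10\n"
)

# Count reads per reader, streaming row by row so memory use stays flat
# even for files far larger than RAM.
reads_per_reader = Counter()
for row in csv.DictReader(sample):
    reads_per_reader[row["reader_id"]] += 1

print(dict(reads_per_reader))  # {'DOCK-1': 2, 'DOCK-2': 1}
```

Because the loop streams one row at a time, the same pattern works on a single machine for data much larger than memory; the cluster only becomes necessary when you need parallelism or replication.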