Created on 07-13-2015 02:55 AM - edited 09-16-2022 02:33 AM
Hello,
I'm trying to work with Spark and Cassandra to extract data from the data lake and transform it. The transformation may be done before or after loading into Cassandra.
Regarding transformations, I'm wondering: what transformation tool would let me write transformations without worrying about data storage? I mean that if tomorrow I no longer want to use Cassandra but Hadoop instead, my transformations should remain valid. So I would like a transformation tool that works with Spark directly and does not care about the storage Spark works with.
Could you please recommend a tool that works with Spark and is independent of underlying stores like Cassandra and Hadoop?
Thanks 🙂
Created 07-15-2015 10:36 PM
Do these transformations not work for you? Anything that you write in Spark can be adjusted to work with different storage underneath.
What else would you be looking for?
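To illustrate the point about adjusting storage underneath: the usual approach is to keep the transformation logic separate from the read/write code, so only the I/O layer changes when you swap Cassandra for HDFS. Here is a minimal sketch in plain Python (no Spark dependency; the record fields and the idea of separate load/save functions are hypothetical placeholders):

```python
# Pattern sketch: storage-agnostic transformation logic.
# transform() knows nothing about Cassandra or Hadoop; it works on any
# iterable of records. Swapping storage only changes the surrounding I/O.

def transform(records):
    """Business logic only: add a computed 'total' field to each record."""
    for r in records:
        yield {**r, "total": r["price"] * r["qty"]}

# Stand-in for data loaded from any store (Cassandra, HDFS, ...):
records = [{"price": 2.0, "qty": 3}, {"price": 5.0, "qty": 1}]
result = list(transform(records))
```

The same separation applies to Spark code: keep your map/filter logic in functions that take and return RDDs (or plain collections), and isolate the connector-specific reads and writes at the edges of the job.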
Wilfred
Created 07-16-2015 01:28 AM
Hello,
Thanks for your answer.
Spark lets me do some transformations, but that is not its main goal. A dedicated transformation tool would offer me more capabilities and be more productive when tons of transformation rules have to be produced, case by case.
Created 07-19-2015 05:54 PM
In Spark, a transformation works directly on the RDD. Transformations are evaluated lazily and are closely coupled to the RDDs; you cannot use them separately.
What you are looking for is a tool that can generate Spark code for you based on transformation rules. I don't think something like that exists.
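The lazy-evaluation point above can be illustrated without Spark at all. This is a rough sketch using plain Python generators (not actual Spark RDDs): defining the pipeline does no work, and nothing executes until something consumes it, which is analogous to Spark running transformations only when an action is called.

```python
# Rough analogy for lazy transformations: building the pipeline records
# nothing in `log`; only consuming it (the "action") triggers the work.

log = []

def mapper(x):
    log.append(x)          # records when the work actually happens
    return x * 2

data = range(3)
pipeline = (mapper(x) for x in data)   # "transformation": nothing runs yet
before = list(log)                     # still empty at this point

result = list(pipeline)                # "action": forces evaluation
```

In real Spark, `rdd.map(f)` behaves like building the generator, and an action such as `collect()` or `count()` plays the role of the final `list()` call.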
Wilfred