Created on 12-08-2015 05:26 AM - edited 09-16-2022 02:51 AM
What are common use cases for Spark and data science across different verticals?
Created 12-08-2015 05:44 AM
Some of the common use cases for Spark:
To make it more concrete, here are some examples from actual customers:
Created 09-23-2017 06:20 AM
Five common use cases for Apache Spark:
Streaming ingest and analytics
Spark isn’t the first big data tool to handle streaming ingest, but it is the first to integrate streaming with the rest of the analytic environment. Spark plays well with the broader streaming-data ecosystem, supporting sources including Flume, Kafka, ZeroMQ, and HDFS.
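As a rough sketch, here is what a streaming word count over a Kafka topic might look like with Structured Streaming. The broker address and the "events" topic are placeholders, and the job assumes the spark-sql-kafka connector is on the classpath:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

object KafkaWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("KafkaWordCount").getOrCreate()
    import spark.implicits._

    // Subscribe to a Kafka topic; broker and topic names are placeholders.
    val lines = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "events")
      .load()
      .selectExpr("CAST(value AS STRING) AS line")

    // Maintain a running word count across the stream.
    val counts = lines
      .select(explode(split($"line", " ")).as("word"))
      .groupBy("word")
      .count()

    // Emit the full updated counts to the console after each micro-batch.
    counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

The same DataFrame operations used here work unchanged on batch data, which is exactly the integration with the rest of the analytic environment described above.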
Exploratory analytics
One of the headline benefits of using Spark is that you no longer need to maintain separate environments for exploratory and production work. The relatively long execution times of a Hadoop MapReduce job make hands-on exploration of data difficult: data scientists typically still have to sample the data if they want to move quickly. Thanks to Spark’s in-memory speed, interactive exploration can now happen entirely within Spark, with no need for Java engineering or data sampling.
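For example, in an interactive spark-shell session (where a SparkSession named `spark` and its implicits are already in scope), a hypothetical web-log dataset can be explored directly, with no compile step and no sampling:

```scala
// The path and column names here are hypothetical.
val logs = spark.read.json("hdfs:///data/web_logs")

logs.printSchema()              // inspect the inferred structure on the fly

logs.filter($"status" === 500)  // drill into server errors interactively
  .groupBy($"url")
  .count()
  .orderBy($"count".desc)
  .show(10)
```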
Model building and machine learning
Spark’s status as a big data tool that data scientists find easy to use makes it ideal for building analytical models. In a pre-Spark world, big data modelers typically built their models in a language such as R or SAS, then handed them off to data engineers to re-implement in Java for production on Hadoop. With Spark, the same model can be built and deployed in one environment, eliminating that re-implementation step.
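As a minimal sketch of that workflow using Spark’s MLlib Pipeline API (the input path, column names, and model location are made up, and `spark` is the active SparkSession):

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.VectorAssembler

val training = spark.read.parquet("hdfs:///data/training")

// Combine raw columns into the single feature vector MLlib expects.
val assembler = new VectorAssembler()
  .setInputCols(Array("age", "income", "visits"))
  .setOutputCol("features")

val lr = new LogisticRegression()
  .setLabelCol("churned")
  .setFeaturesCol("features")

// Fit the whole pipeline and persist it; the saved model can be
// reloaded and served later without a rewrite in another language.
val model = new Pipeline().setStages(Array(assembler, lr)).fit(training)
model.write.overwrite().save("hdfs:///models/churn")
```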
Graph analysis
By incorporating the GraphX component, Spark brings all the benefits of its environment to graph computation, enabling use cases such as social network analysis, fraud detection, and recommendations.
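A small illustration of the idea, run in spark-shell (where `sc` is the SparkContext): rank users in a toy follower graph with PageRank. The vertices and edges below are invented.

```scala
import org.apache.spark.graphx.{Edge, Graph}

// A toy follower graph; IDs and names are invented.
val users = sc.parallelize(Seq((1L, "alice"), (2L, "bob"), (3L, "carol")))
val follows = sc.parallelize(Seq(Edge(1L, 2L, 1), Edge(2L, 3L, 1), Edge(3L, 1L, 1)))

val graph = Graph(users, follows)

// PageRank surfaces influential vertices, a building block for
// recommendations and fraud detection alike.
val ranks = graph.pageRank(0.001).vertices

ranks.join(users)
  .sortBy(_._2._1, ascending = false)
  .collect()
  .foreach { case (_, (rank, name)) => println(f"$name%-8s $rank%.4f") }
```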
Simpler, faster ETL
Though less glamorous than the analytical applications, ETL often accounts for the lion’s share of data workloads. If the rest of your data pipeline is built on Spark, using Spark for ETL as well is an obvious win, with consequent gains in maintainability and code reuse.
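To make that concrete, a minimal Spark ETL job might read raw CSV from a landing zone, clean it, and write partitioned Parquet for downstream jobs. All paths and column names below are placeholders:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date}

val spark = SparkSession.builder.appName("DailyEtl").getOrCreate()

// Extract: raw CSV from a landing zone (path is a placeholder).
val raw = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("hdfs:///landing/transactions/*.csv")

// Transform: drop malformed rows, fix types, derive a partition key.
val cleaned = raw
  .na.drop(Seq("transaction_id"))
  .withColumn("amount", col("amount").cast("double"))
  .withColumn("day", to_date(col("timestamp")))

// Load: columnar Parquet, partitioned by day for efficient downstream reads.
cleaned.write
  .mode("overwrite")
  .partitionBy("day")
  .parquet("hdfs:///warehouse/transactions")
```

Because the ETL logic is ordinary Spark code, the same DataFrames and helper functions can be shared with the analytics jobs downstream, which is where the maintainability and reuse benefits come from.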