Member since: 06-29-2017
Posts: 6
Kudos Received: 1
Solutions: 0
09-27-2017
10:12 AM
The Hadoop Connector Guide provides a brief introduction to cloud connectors and their features, explains how to set up the connector and run Data Synchronization tasks, and gives an overview of the supported features and task operations that can be performed with the Hadoop Connector. Docs for the Hadoop connector for Informatica: https://kb.informatica.com/proddocs/Product%20Documentation/6/IC_Spring2017_HadoopConnectorGuide_en.pdf
09-23-2017
06:20 AM
5 common use cases for Apache Spark:

1. Streaming ingest and analytics. Spark isn't the first big data tool for handling streaming ingest, but it is the first one to integrate it with the rest of the analytic environment. Spark plays well with the rest of the streaming data ecosystem, supporting sources including Flume, Kafka, ZeroMQ, and HDFS.

2. Exploratory analytics. One of the headline benefits of using Spark is that you no longer need to maintain separate environments for exploratory and production work. The relatively long execution times of a Hadoop MapReduce job make hands-on exploration of data difficult: data scientists typically still have to sample data if they want to move quickly. Thanks to Spark's in-memory speed, interactive exploration can now happen entirely within Spark, without the need for Java engineering or sampling of the data.

3. Model building and machine learning. Spark's status as a big data tool that data scientists find easy to use makes it ideal for building models for analytical purposes. In a pre-Spark world, big data modelers typically built their models in a language such as R or SAS, then handed them to data engineers to re-implement in Java for production on Hadoop.

4. Graph analysis. Through the GraphX component, Spark brings all the benefits of its environment to graph computation, enabling use cases such as social network analysis, fraud detection, and recommendations.

5. Simpler, faster ETL. Though less glamorous than the analytical applications, ETL often accounts for the lion's share of data workloads. If the rest of your data pipeline is built on Spark, the benefits of using Spark for ETL are obvious, with consequent gains in maintainability and code reuse. A minimal sketch of such a job follows this list.
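To make the exploratory and ETL points concrete, here is a minimal Scala sketch, not a production job. It assumes a local Spark 2.x (or later) session and a hypothetical newline-delimited JSON file named events.json with user, action, and amount fields; the file name and columns are placeholders for illustration only.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object SimpleEtlSketch {
  def main(args: Array[String]): Unit = {
    // Local session for illustration; on a cluster the master is set by spark-submit.
    val spark = SparkSession.builder()
      .appName("simple-etl-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical input: newline-delimited JSON events with user, action, amount fields.
    val events = spark.read.json("events.json")

    // Exploratory step: take a quick interactive look at the data without sampling.
    events.printSchema()
    events.groupBy("action").count().show()

    // ETL step: filter, aggregate, and persist the result in a columnar format.
    events
      .filter(col("amount") > 0)
      .groupBy("user")
      .agg(sum("amount").alias("total_amount"))
      .write
      .mode("overwrite")
      .parquet("output/user_totals")

    spark.stop()
  }
}

The same DataFrame code runs unchanged in the spark-shell for exploration and in a packaged job for production, which is exactly the single-environment benefit described above.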
01-09-2018
06:06 PM
IBM offers free courses in Scala and other languages through Cognitive Class. Each course ends with a test; once you pass, you earn badges that you can showcase. https://cognitiveclass.ai/