Apache Spark and Scala Certification Training

Overview

Simplilearn’s Apache Spark and Scala certification training advances your expertise in the Big Data Hadoop ecosystem. With this Apache Spark certification you will master essential skills such as Spark Streaming, Spark SQL, machine learning programming, GraphX programming, and Spark shell scripting. A real-life industry project, coupled with 30 demos, prepares you to take up a Hadoop developer job requiring Apache Spark expertise.

Price: 400 USD (ask for a discount)
Access period: 180 days

Prerequisite list

  • Fundamental knowledge of any programming language
  • Basic understanding of any database, SQL, and query language for databases
  • Working knowledge of Linux- or Unix-based systems (not mandatory)
  • Completing a Big Data Hadoop Developer certification training beforehand is recommended, as it provides an excellent foundation for the Apache Spark and Scala certification.

Audience list

  • Professionals aspiring to a career in real-time Big Data analytics
  • Analytics professionals
  • Research professionals
  • IT developers and testers
  • Data scientists
  • BI and reporting professionals
  • Students who wish to gain a thorough understanding of Apache Spark

What is included

  • 15 hours of self-paced video
  • Topics on Spark Streaming, Spark ML, and GraphX programming
  • 1 industry project for submission and 2 for hands-on practice
  • Downloadable ebooks and 30 demos

Certification Info

  • How to earn: Complete 85% of the course, and complete 1 project and 1 simulation test with a minimum score of 60%.
  • How to maintain: N/A

Certification Exam Format

  • N/A

Retake policy

  • N/A

Enrollment Policy

  • Online course access is granted within 1 week of receiving payment of the course fee.
  • The course fee is non-refundable.

Frequently Asked Questions

Course Outline

Introduction to Spark
  • Evolution of Distributed Systems
  • Need for New-Generation Distributed Systems
  • Limitations of MapReduce in Hadoop
  • Batch vs. Real-Time Processing
  • Application of Stream Processing
  • Application of In-Memory Processing
  • Introduction to Apache Spark
  • Components of a Spark Project
  • History of Spark
  • Language Flexibility in Spark
  • Spark Execution Architecture
  • Automatic Parallelization of Complex Flows
  • Automatic Parallelization of Complex Flows-Important Points
  • APIs That Match User Goals
  • Apache Spark-A Unified Platform of Big Data Apps
  • More Benefits of Apache Spark
  • Running Spark in Different Modes
  • Installing Spark as a Standalone Cluster-Configurations
  • Demo-Install Apache Spark
  • Overview of Spark on a Cluster
  • Tasks of Spark on a Cluster
  • Companies Using Spark-Use Cases
  • Hadoop Ecosystem vs. Apache Spark
  • Quiz
  • Summary
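The modes listed above differ mainly in the master URL handed to Spark. A minimal sketch of running Spark in local mode, assuming Spark is on the classpath (the application name is arbitrary):

```scala
// A minimal sketch of creating a SparkContext in local mode.
import org.apache.spark.{SparkConf, SparkContext}

object LocalModeExample {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark on all local cores; "local[2]" would use two
    // threads, and a URL such as "spark://host:7077" would select
    // standalone-cluster mode instead.
    val conf = new SparkConf().setAppName("LocalModeExample").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // A trivial job to confirm the context works: parallelize and sum.
    val total = sc.parallelize(1 to 100).sum()
    println(s"Sum of 1..100 = $total")  // 5050.0

    sc.stop()
  }
}
```

The same program runs unchanged on a cluster; only the master setting (or the `--master` flag of `spark-submit`) changes.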
Introduction to Programming in Scala
  • Introduction
  • Objectives
  • Introduction to Scala
  • Features of Scala
  • Basic Data Types
  • Basic Literals
  • Introduction to Operators
  • Types of Operators
  • Use Basic Literals and the Arithmetic Operator
  • Demo Use Basic Literals and the Arithmetic Operator
  • Use the Logical Operator
  • Demo Use the Logical Operator
  • Introduction to Type Inference
  • Type Inference for Recursive Methods
  • Type Inference for Polymorphic Methods and Generic Classes
  • Unreliability of the Type Inference Mechanism
  • Mutable Collection vs. Immutable Collection
  • Functions
  • Anonymous Functions
  • Objects
  • Classes
  • Use Type Inference, Functions, Anonymous Function, and Class
  • Demo Use Type Inference, Functions, Anonymous Function and Class
  • Traits as Interfaces
  • Traits-Example
  • Collections
  • Types of Collections
  • Lists
  • Perform Operations on Lists
  • Demo Use Data Structures
  • Maps
  • Maps-Operations
  • Pattern Matching
  • Implicits
  • Streams
  • Use Data Structures
  • Demo Perform Operations on Lists
  • Quiz
  • Summary
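Several of the Scala features listed above fit in a few lines. A small, self-contained sketch (all names are illustrative) of type inference, anonymous functions, lists, maps, and pattern matching:

```scala
// Illustrations of core Scala features: inference, anonymous functions,
// immutable collections, and pattern matching.
object ScalaBasics extends App {
  // Type inference: 'n' is inferred as Int, 'greeting' as String.
  val n = 42
  val greeting = "hello"

  // Anonymous function assigned to a value.
  val square = (x: Int) => x * x
  println(square(5))             // 25

  // Immutable List and common operations.
  val xs = List(1, 2, 3, 4)
  println(xs.map(square))        // List(1, 4, 9, 16)
  println(xs.filter(_ % 2 == 0)) // List(2, 4)

  // Immutable Map and safe lookup.
  val ages = Map("ann" -> 30, "bob" -> 25)
  println(ages.getOrElse("ann", 0)) // 30

  // Pattern matching on a value, with type patterns.
  def describe(x: Any): String = x match {
    case 0         => "zero"
    case i: Int    => s"int: $i"
    case s: String => s"string: $s"
    case _         => "something else"
  }
  println(describe(7))           // int: 7
}
```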
Using RDD for Creating Applications in Spark
  • Introduction
  • Objectives
  • RDDs API
  • Features of RDDs
  • Creating RDDs
  • Creating RDDs—Referencing an External Dataset
  • Referencing an External Dataset—Text Files
  • Referencing an External Dataset—Sequence Files
  • Referencing an External Dataset—Other Hadoop Input Formats
  • Creating RDDs—Important Points
  • RDD Operations
  • RDD Operations—Transformations
  • Features of RDD Persistence
  • Storage Levels of RDD Persistence
  • Choosing the Correct RDD Persistence Storage Level
  • Invoking the Spark Shell
  • Importing Spark Classes
  • Creating the SparkContext
  • Loading a File in Shell
  • Performing Some Basic Operations on Files in Spark Shell RDDs
  • Packaging a Spark Project with SBT
  • Running a Spark Project with SBT
  • Demo-Build a Scala Project
  • Build a Scala Project
  • Demo-Build a Spark Java Project
  • Build a Spark Java Project
  • Shared Variables—Broadcast
  • Shared Variables—Accumulators
  • Writing a Scala Application
  • Demo-Run a Scala Application
  • Run a Scala Application
  • Demo-Write a Scala Application Reading the Hadoop Data
  • Write a Scala Application Reading the Hadoop Data
  • Demo-Run a Scala Application Reading the Hadoop Data
  • Run a Scala Application Reading the Hadoop Data
  • Scala RDD Extensions
  • DoubleRDD Methods
  • PairRDD Methods—Join
  • PairRDD Methods—Others
  • Java PairRDD Methods
  • General RDD Methods
  • Java RDD Methods
  • Common Java RDD Methods
  • Spark Java Function Classes
  • Method for Combining JavaPairRDD Functions
  • Transformations in RDD
  • Other Methods
  • Actions in RDD
  • Key-Value Pair RDD in Scala
  • Key-Value Pair RDD in Java
  • Using MapReduce and Pair RDD Operations
  • Reading Text File from HDFS
  • Reading Sequence File from HDFS
  • Writing Text Data to HDFS
  • Writing Sequence File to HDFS
  • Using GroupBy
  • Demo-Run a Scala Application Performing GroupBy Operation
  • Run a Scala Application Performing GroupBy Operation
  • Demo-Run a Scala Application Using the Scala Shell
  • Run a Scala Application Using the Scala Shell
  • Demo-Write and Run a Java Application
  • Write and Run a Java Application
  • Quiz
  • Summary
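The RDD topics above (creation, lazy transformations, pair-RDD operations, persistence, actions) come together in a classic word count. A hedged sketch, assuming Spark is on the classpath (the input strings are made up; `sc.textFile("hdfs://...")` would reference an external dataset instead):

```scala
// Core RDD operations: create, transform, persist, and act.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object RddExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("RddExample").setMaster("local[*]"))

    // Creating an RDD from an in-memory collection.
    val lines = sc.parallelize(Seq("spark makes big data simple", "spark is fast"))

    // Transformations are lazy: nothing runs until an action is called.
    val words  = lines.flatMap(_.split(" "))
    val pairs  = words.map(w => (w, 1))
    val counts = pairs.reduceByKey(_ + _)   // a pair-RDD transformation

    // Persist an RDD that will be reused (MEMORY_ONLY is the default level).
    counts.persist(StorageLevel.MEMORY_ONLY)

    // Actions trigger execution.
    println(counts.collect().toMap)  // e.g. Map(spark -> 2, is -> 1, ...)
    println(counts.count())

    sc.stop()
  }
}
```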
Running SQL Queries Using Spark SQL
  • Introduction
  • Objectives
  • Importance of Spark SQL
  • Benefits of Spark SQL
  • DataFrames
  • SQLContext
  • Creating a DataFrame
  • Using DataFrame Operations
  • Demo-Run Spark SQL with a DataFrame
  • Run Spark SQL with a DataFrame
  • Interoperating with RDDs
  • Using the Reflection-Based Approach
  • Using the Programmatic Approach
  • Demo-Run Spark SQL Programmatically
  • Run Spark SQL Programmatically
  • Data Sources
  • Save Modes
  • Saving to Persistent Tables
  • Parquet Files
  • Partition Discovery
  • Schema Merging
  • JSON Data
  • Hive Table
  • DML Operation-Hive Queries
  • Demo-Run Hive Queries Using Spark SQL
  • Run Hive Queries Using Spark SQL
  • JDBC to Other Databases
  • Supported Hive Features
  • Supported Hive Data Types
  • Case Classes
  • Quiz
  • Summary
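A sketch of the Spark SQL workflow above: building a DataFrame from an RDD of case classes (the reflection-based approach), then querying it both with DataFrame operations and with SQL. The `SQLContext` API matches the Spark 1.x style used in this outline; later versions replace it with `SparkSession`. All data is illustrative:

```scala
// Reflection-based DataFrame creation and querying with Spark SQL.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// The case class fields become the DataFrame schema.
case class Person(name: String, age: Int)

object SparkSqlExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("SparkSqlExample").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._   // enables rdd.toDF() and $"col" syntax

    val people = sc.parallelize(Seq(Person("Ann", 34), Person("Bob", 19))).toDF()

    // DataFrame operations...
    people.filter($"age" > 21).show()

    // ...or plain SQL against a registered temporary table.
    people.registerTempTable("people")
    sqlContext.sql("SELECT name FROM people WHERE age > 21").show()

    sc.stop()
  }
}
```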
Spark Streaming
  • Introduction
  • Objectives
  • Introduction to Spark Streaming
  • Working of Spark Streaming
  • Features of Spark Streaming
  • Streaming Word Count
  • Micro Batch
  • DStreams
  • Input DStreams and Receivers
  • Basic Sources
  • Advanced Sources
  • Advanced Sources-Twitter
  • Transformations on DStreams
  • Output Operations on DStreams
  • Design Patterns for Using ForeachRDD
  • DataFrame and SQL Operations
  • Checkpointing
  • Enabling Checkpointing
  • Socket Stream
  • File Stream
  • Stateful Operations
  • Window Operations
  • Types of Window Operations
  • Join Operations-Stream-Dataset Joins
  • Join Operations-Stream-Stream Joins
  • Monitoring Spark Streaming Application
  • Performance Tuning-High Level
  • Performance Tuning-Detail Level
  • Demo-Capture and Process the Netcat Data
  • Capture and Process the Netcat Data
  • Demo-Capture and Process the Flume Data
  • Capture and Process the Flume Data
  • Demo-Capture the Twitter Data
  • Capture the Twitter Data
  • Quiz
  • Summary
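The streaming word count and window operations above can be sketched against a socket source, the same source the netcat demo uses. Hostname, port, and checkpoint path are illustrative:

```scala
// Streaming word count over a TCP source, with a windowed variant.
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    // At least 2 threads locally: one for the receiver, one for processing.
    val conf = new SparkConf().setAppName("StreamingWordCount").setMaster("local[2]")
    // Micro-batch interval of 5 seconds: each DStream batch covers 5s of data.
    val ssc = new StreamingContext(conf, Seconds(5))
    ssc.checkpoint("/tmp/streaming-checkpoint")  // enables stateful operations

    // Input DStream from a TCP source (run `nc -lk 9999` to feed it).
    val lines  = ssc.socketTextStream("localhost", 9999)
    val words  = lines.flatMap(_.split(" ")).map((_, 1))
    val counts = words.reduceByKey(_ + _)

    // Window operation: counts over the last 30s, recomputed every 10s.
    val windowed = words.reduceByKeyAndWindow(_ + _, Seconds(30), Seconds(10))

    counts.print()
    windowed.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```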
Spark ML Programming
  • Introduction
  • Objectives
  • Introduction to Machine Learning
  • Common Terminologies in Machine Learning
  • Applications of Machine Learning
  • Machine Learning in Spark
  • Spark ML API
  • DataFrames
  • Transformers and Estimators
  • Pipeline
  • Working of a Pipeline
  • DAG Pipelines
  • Runtime Checking
  • Parameter Passing
  • General Machine Learning Pipeline-Example
  • Model Selection via Cross-Validation
  • Supported Types, Algorithms, and Utilities
  • Data Types
  • Feature Extraction and Basic Statistics
  • Clustering
  • K-Means
  • Demo-Perform Clustering Using K-Means
  • Perform Clustering Using K-Means
  • Gaussian Mixture
  • Power Iteration Clustering (PIC)
  • Latent Dirichlet Allocation (LDA)
  • Collaborative Filtering
  • Classification
  • Regression
  • Example of Regression
  • Demo-Perform Classification Using Linear Regression
  • Perform Classification Using Linear Regression
  • Demo-Run Linear Regression
  • Run Linear Regression
  • Demo-Perform Recommendation Using Collaborative Filtering
  • Perform Recommendation Using Collaborative Filtering
  • Demo-Run Recommendation System
  • Run Recommendation System
  • Quiz
  • Summary
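As one concrete instance of the algorithms above, K-Means clustering can be sketched with the RDD-based MLlib API; the data points below are made up to form two obvious clusters:

```scala
// K-Means clustering with the RDD-based MLlib API.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("KMeansExample").setMaster("local[*]"))

    // Two well-separated clusters of 2-D points.
    val points = sc.parallelize(Seq(
      Vectors.dense(0.0, 0.0), Vectors.dense(0.1, 0.1),
      Vectors.dense(9.0, 9.0), Vectors.dense(9.1, 9.2)
    )).cache()

    // Train with k = 2 clusters and up to 20 iterations.
    val model = KMeans.train(points, 2, 20)

    model.clusterCenters.foreach(println)            // the two centroids
    println(model.predict(Vectors.dense(8.9, 9.0)))  // cluster index of a new point

    sc.stop()
  }
}
```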
Spark GraphX Programming
  • Introduction
  • Objectives
  • Introduction to Graph-Parallel System
  • Limitations of Graph-Parallel System
  • Introduction to GraphX
  • Importing GraphX
  • The Property Graph
  • Features of the Property Graph
  • Creating a Graph
  • Demo-Create a Graph Using GraphX
  • Create a Graph Using GraphX
  • Triplet View
  • Graph Operators
  • List of Operators
  • Property Operators
  • Structural Operators
  • Subgraphs
  • Join Operators