Big Data Hadoop and Spark Developer Certification Training

Overview

The Big Data Hadoop and Spark Developer course is designed to give you in-depth knowledge of the Big Data framework using Hadoop and Spark, including HDFS, YARN, and MapReduce. You will learn to use Pig, Hive, and Impala to process and analyze large datasets stored in HDFS, and to use Sqoop and Flume for data ingestion. You will master real-time data processing using Spark, including functional programming in Spark, implementing Spark applications, understanding parallel processing in Spark, and using Spark RDD optimization techniques. You will also learn the various iterative algorithms in Spark and use Spark SQL for creating, transforming, and querying data forms. As part of the course, you will be required to execute real-life industry-based projects using CloudLab in the domains of banking, telecommunication, social media, insurance, and e-commerce. This Big Data Hadoop training course will prepare you for the Cloudera CCA175 certification.

Price (*ask for discount) 400 USD
Access period 180 days

Prerequisite list

  • As knowledge of Java is necessary for this course, we provide complimentary access to the “Java Essentials for Hadoop” course.
  • For Spark, we use Python and Scala. An e-book is provided for support.
  • Knowledge of an operating system such as Linux is useful for this course.

Audience list

  • Software Developers and Architects
  • Analytics Professionals
  • Senior IT professionals
  • Testing and Mainframe Professionals
  • Data Management Professionals
  • Business Intelligence Professionals
  • Project Managers
  • Aspiring Data Scientists
  • Graduates looking to build a career in Big Data Analytics

What is included

  • 24 hours of self-paced video
  • 5 real-life industry projects using Hadoop and Spark
  • Hands-on practice on CloudLab
  • Training on YARN, MapReduce, Pig, Hive, Impala, HBase, and Apache Spark
  • Aligned to Cloudera CCA175 certification exam

Certification Info

  • How To Earn?  Complete 85% of the course. Complete one project and one simulation test with a minimum score of 80%.
  • How To Maintain?  N/A

Certification Exam Format

  • No Exam

Retake policy

  • N/A

Enrollment Policy

  • Pay the online course fee; access to the online course will be granted within 1 week after payment is received.
  • Course fee payment is not refundable.

Course Outline

Introduction to Big Data and Hadoop Ecosystem
  • Introduction
  • Overview of Big Data and Hadoop
  • Hadoop Ecosystem
  • Quiz
  • Key Takeaways
HDFS and YARN
  • Introduction
  • HDFS Architecture and Components
  • Pop Quiz
  • Block Replication Architecture
  • YARN Introduction
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
MapReduce and Sqoop
  • Introduction
  • Why MapReduce
  • Small Data and Big Data
  • Pop Quiz
  • Data Types in Hadoop
  • Joins in MapReduce
  • What is Sqoop
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Basics of Hive and Impala
  • Introduction
  • Pop Quiz
  • Interacting with Hive and Impala
  • Quiz
  • Key Takeaways
Working with Hive and Impala
  • Working with Hive and Impala
  • Pop Quiz
  • Data Types in Hive
  • Validation of Data
  • What is HCatalog and Its Uses
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Types of Data Formats
  • Introduction
  • Types of File Format
  • Pop Quiz
  • Data Serialization
  • Importing MySQL and Creating hivetb
  • Parquet With Sqoop
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Advanced Hive Concept and Data File Partitioning
  • Introduction
  • Pop Quiz
  • Overview of the Hive Query Language
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Apache Flume and HBase
  • Introduction
  • Pop Quiz
  • Introduction to HBase
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Pig
  • Introduction
  • Pop Quiz
  • Getting Datasets for Pig Development
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Basics of Apache Spark
  • Introduction
  • Spark - Architecture, Execution, and Related Concepts
  • Pop Quiz
  • RDD Operations
  • Functional Programming in Spark
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
RDDs in Spark
  • Introduction
  • RDD Data Types and RDD Creation
  • Pop Quiz
  • Operations in RDDs
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Implementation of Spark Applications
  • Introduction
  • Running Spark on YARN
  • Pop Quiz
  • Running a Spark Application
  • Dynamic Resource Allocation
  • Configuring Your Spark Application
  • Quiz
  • Key Takeaways
Spark Parallel Processing
  • Introduction
  • Pop Quiz
  • Parallel Operations on Partitions
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Spark RDD Optimization Techniques
  • Introduction
  • Pop Quiz
  • RDD Persistence
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Spark Algorithm
  • Introduction
  • Spark: An Iterative Algorithm
  • Introduction To Graph Parallel System
  • Pop Quiz
  • Introduction To Machine Learning
  • Introduction To Three C's
  • Quiz
  • Key Takeaways
  • What is next?
Spark SQL
  • Introduction
  • Pop Quiz
  • Interoperating with RDDs
  • Quiz
  • Key Takeaways
  • Hands-on Exercise
Projects and Simulation Test
  • Project For Submission
  • Projects with solutions
  • Simulation Test Paper Instructions