Data Engineering Course - Data Engineer Job Oriented Program

Data Engineer Job Oriented Program

#No.1 Data Engineer Course

Prepzee’s Data Engineering Course has been curated to help you master skills like PySpark, Databricks, Snowflake, Airflow, Kafka and Cosmos DB, and to prepare for the Azure Data Engineer certification (DP-203). This Data Engineering Bootcamp is designed to help you land your dream job in the data engineering domain.

Master Databricks, Snowflake, Airflow, Kafka

Hands-On PySpark Training for Data Engineering

Clear the Azure Data Engineer Certification (DP-203)

What You Will Learn in the Program?

Module 1
PySpark/Kafka for Data Engineering
24 Hours
- Introduction to Python for Apache Spark
- Deep Dive into Apache Spark Framework
- Mastering Spark RDDs
- DataFrames and Spark SQL
- Apache Spark Streaming Data Sources
- Core Concepts of Kafka
- Kafka Architecture
- Where Is Kafka Used
- Understanding the Components of a Kafka Cluster
- Configuring a Kafka Cluster
Module 2
Data Warehousing Fundamentals
6 Hours
- OLAP vs OLTP
- What is a Data Warehouse?
- Difference between Data Warehouse, Data Lake and Data Mart
- Fact Tables
- Dimension Tables
- Slowly Changing Dimensions
- Types of SCDs
- Star Schema Design
- Snowflake Schema Design
- Data Warehousing Case Studies
Module 3
Data Engineering with Cloud (Azure)
36 Hours
- Introduction to Microsoft Azure
- Azure Databricks Introduction
- Read and Write Data in Azure Databricks
- Data Processing in Azure Databricks
- Work with DataFrames in Azure Databricks
- Platform Architecture, Security and Data Protection in Databricks
- Introduction to Azure Synapse Analytics
- Design a multidimensional schema to optimize analytical workloads
- Azure Synapse serverless SQL pool
- Ingest and Load Data into the Data Warehouse
- Transform Data with Azure Data Factory or Azure Synapse Pipelines
- Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics
- Configure Azure Synapse Link with Azure Cosmos DB
Module 4
Orchestration with Apache Airflow
12 Hours
- Airflow Introduction
- Different Components of Airflow
- Installing Airflow
- Understanding the Airflow Web UI
- DAGs, Operators & Tasks in Airflow Jobs
- Create & Schedule Airflow Jobs for Data Processing
- Create Plugins to Add Functionality to Apache Airflow
Module 5
Compute with Snowflake
20 Hours
- Introduction to the Snowflake Data Warehousing Service
- Snowflake Architecture
- Complete Setup of Snowflake
- Create a Data Warehouse on Snowflake
- Analytical Queries on a Snowflake Data Warehouse
- Understand the Entire Snowflake Workflow End to End
- Understanding Snowpark (Execute PySpark Applications on Snowflake)
Module 6
Industry Scenario based Labs
40 Hours
- Lab 1: Explore Compute & Storage Options for Data Engineering Workloads
- Lab 2: Load and Save Data through RDDs in PySpark
- Lab 3: Configuring a Single Node Single Broker Cluster in Kafka
- Lab 4: Run Interactive Queries Using Azure Synapse Analytics Serverless SQL Pools
- Lab 5: Data Exploration and Transformation in Azure Databricks
- Lab 6: Explore, Transform and Load Data into the Data Warehouse Using Spark
- Lab 7: Ingest and Load Data into the Data Warehouse
- Lab 8: Transform Data with Azure Data Factory or Azure Synapse Pipelines
- Lab 9: Real-Time Stream Processing with Stream Analytics
- Lab 10: Create a Stream Processing Solution with Event Hubs and Databricks

Data Engineer Job Oriented Program Learning Path

Course 1

Online Classroom Pass

Data Engineer Job Oriented Program

Embark on your journey towards a thriving career in data engineering with this comprehensive program, crafted to give you the skills and expertise needed to excel in the dynamic world of data engineering. Throughout the program, you’ll explore a wide array of essential tools and technologies, including industry favorites like Databricks, Snowflake, PySpark, Azure, Azure Synapse Analytics, and more. Work on industry projects, strengthen your CV and LinkedIn presence, and master data engineering technologies under the mentorship of seasoned experts.

1.1 Overview of Python

1.2 Different Applications where Python is Used

1.3 Values, Types, Variables

1.4 Operands and Expressions

1.5 Conditional Statements

1.6 Loops

1.7 Command Line Arguments

1.8 Writing to the Screen

1.9 Python File I/O Functions

1.10 Numbers

1.11 Strings and related operations

1.12 Tuples and related operations

1.13 Lists and related operations

1.14 Dictionaries and related operations

1.15 Sets and related operations

Hands-On:

Creating “Hello World” code
Demonstrating Conditional Statements
Demonstrating Loops
Tuple – properties, related operations, compared with list
List – properties, related operations
Dictionary – properties, related operations
Set – properties, related operations
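The sketch below shows the kind of exercise listed in the hands-on section above. It contrasts an immutable tuple with a mutable list and runs a few basic dictionary and set operations. The data is made up purely for illustration.

```python
# Tuples are immutable: item assignment raises TypeError
point = (3, 4)
tuple_is_immutable = False
try:
    point[0] = 10
except TypeError:
    tuple_is_immutable = True

# Lists are mutable: they can grow and be changed in place
scores = [70, 85, 90]
scores.append(95)
scores[0] = 75

# Dictionaries map keys to values; max() with key= finds the largest value's key
ages = {"asha": 31, "ravi": 28}
ages["meera"] = 35
oldest = max(ages, key=ages.get)

# Sets hold unique items and support intersection (&) and union (|)
batch_a = {"spark", "kafka", "airflow"}
batch_b = {"spark", "snowflake"}
common = batch_a & batch_b
everything = batch_a | batch_b

print(tuple_is_immutable, scores, oldest, common, sorted(everything))
```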

2.1 Functions

2.2 Function Parameters

2.3 Global Variables

2.4 Variable Scope and Returning Values

2.5 Lambda Functions

2.6 Object-Oriented Concepts

2.7 Standard Libraries

2.8 Modules Used in Python

2.9 The Import Statements

2.10 Module Search Path

2.11 Package Installation Ways

Hands-On:

Functions – Syntax, Arguments, Keyword Arguments, Return Values
Lambda – Features, Syntax, Options, Compared with the Functions
Sorting – Sequences, Dictionaries, Limitations of Sorting
Errors and Exceptions – Types of Issues, Remediation
Packages and Modules – Modules, Import Options, sys.path
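Here is a minimal sketch of the hands-on topics above. It covers default and keyword-only arguments, a lambda used as a sort key, catching an exception instead of crashing, and inspecting `sys.path`. The `apply_discount` helper is hypothetical and exists only for this illustration.

```python
import sys

def apply_discount(price, rate=0.1, *, round_to=2):
    """Return price reduced by rate (hypothetical helper).

    rate has a default value; round_to is keyword-only because of the bare *.
    """
    if not 0 <= rate <= 1:
        raise ValueError(f"rate out of range: {rate}")
    return round(price * (1 - rate), round_to)

regular = apply_discount(200)             # uses the default rate=0.1
festive = apply_discount(200, rate=0.25)  # passes rate as a keyword argument

# Lambda as a sort key: order (name, score) pairs by score, highest first
results = [("asha", 82), ("ravi", 91), ("meera", 77)]
ranked = sorted(results, key=lambda pair: pair[1], reverse=True)

# Handle the error raised for an invalid rate
error_message = None
try:
    apply_discount(100, rate=1.5)
except ValueError as err:
    error_message = str(err)

# sys.path is the list of directories Python searches when importing modules
print(regular, festive, ranked, error_message, len(sys.path))
```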

3.1 Spark Components and Architecture

3.2 Spark Deployment Modes

3.3 Introduction to PySpark Shell

3.4 Submitting PySpark Job

3.5 Spark Web UI

3.6 Writing your first PySpark Job Using Jupyter Notebook

3.7 Data Ingestion using Sqoop

Hands-On:

Building and Running Spark Application
Spark Application Web UI
Understanding different Spark Properties

4.1 Challenges in Existing Computing Methods

4.2 Probable Solution & How RDD Solves the Problem

4.3 What is an RDD: Its Operations, Transformations & Actions

4.4 Data Loading and Saving Through RDDs

4.5 Key-Value Pair RDDs

4.6 Other Pair RDDs, Two Pair RDDs

4.7 RDD Lineage

4.8 RDD Persistence

4.9 WordCount Program Using RDD Concepts

4.10 RDD Partitioning & How it Helps Achieve Parallelization

4.11 Passing Functions to Spark

Hands-On:

Loading data in RDDs
Saving data through RDDs
RDD Transformations
RDD Actions and Functions
RDD Partitions
WordCount through RDDs
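In PySpark, the WordCount exercise is commonly written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a + b)`. The plain-Python sketch below mimics that flow without Spark to show why partitioning helps. Each partition is counted on its own, which is the work Spark would run in parallel. The partial results are then merged, much as `reduceByKey` combines values within each partition before the shuffle. The round-robin `partition` helper is a hypothetical stand-in and is not how Spark actually splits its input.

```python
from collections import Counter
from functools import reduce

lines = [
    "spark makes big data simple",
    "kafka streams data",
    "spark and kafka",
]

def partition(items, num_partitions):
    """Split items round-robin into lists (a stand-in for RDD partitions)."""
    parts = [[] for _ in range(num_partitions)]
    for i, item in enumerate(items):
        parts[i % num_partitions].append(item)
    return parts

def count_partition(part):
    """flatMap lines into words, map each word to a count of 1, combine locally."""
    local = Counter()
    for line in part:
        for word in line.split():
            local[word] += 1
    return local

partitions = partition(lines, 2)
partial_counts = [count_partition(p) for p in partitions]  # independent per partition
# Merge the per-partition results, like the shuffle/reduce stage of reduceByKey
word_counts = reduce(lambda a, b: a + b, partial_counts, Counter())
print(word_counts.most_common(3))
```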

5.1 Need for Spark SQL

5.2 What is Spark SQL

5.3 Spark SQL Architecture

5.4 SQL Context in Spark SQL

5.5 Schema RDDs

5.6 User-Defined Functions

5.7 DataFrames & Datasets

5.8 Interoperating with RDDs

6.1 Need for Kafka

6.2 What is Kafka

6.3 Core Concepts of Kafka

6.4 Kafka Architecture

6.5 Where is Kafka Used

6.6 Understanding the Components of Kafka Cluster

6.7 Configuring Kafka Cluster

Hands-On:

Configuring Single Node Single Broker Cluster
Configuring Single Node Multi-Broker Cluster
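The toy in-memory model below makes the core Kafka concepts concrete: topics, partitions, offsets, and key-based routing. It is plain Python and not the Kafka client API. `Topic`, `produce`, and `consume` are hypothetical names, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to record keys. The model does mirror three real behaviors:
- Records with the same key land in the same partition.
- Each partition is an append-only log addressed by offsets.
- A consumer keeps track of the next offset it should read.

```python
import zlib
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Toy in-memory model of a Kafka topic: N append-only partition logs."""
    name: str
    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self):
        self.partitions = [[] for _ in range(self.num_partitions)]

    def produce(self, key, value):
        """Append a record and return (partition, offset)."""
        # Same key -> same partition, which keeps per-key ordering.
        # crc32 is a stand-in; Kafka's default partitioner uses murmur2.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        log = self.partitions[partition]
        log.append((key, value))
        return partition, len(log) - 1

    def consume(self, partition, from_offset):
        """Return records at or after from_offset and the next offset to read."""
        log = self.partitions[partition]
        return log[from_offset:], len(log)

orders = Topic("orders", num_partitions=3)
for i, customer in enumerate(["u1", "u2", "u1", "u3", "u1"]):
    orders.produce(customer, f"order-{i}")

u1_partition, _ = orders.produce("u1", "order-5")
records, next_offset = orders.consume(u1_partition, from_offset=0)
print(u1_partition, records, next_offset)
```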

7.1 Drawbacks in Existing Computing Methods

7.2 Why Streaming is Necessary

7.3 What is Spark Streaming

7.4 Spark Streaming Features

7.5 Spark Streaming Workflow

7.6 How Uber Uses Streaming Data

7.7 Streaming Context & DStreams

7.8 Transformations on DStreams

7.9 Windowed Operators and Why They Are Useful

7.10 Important Windowed Operators

7.11 Slice, Window and ReduceByWindow Operators

7.12 Stateful Operators

Hands-On:

WordCount Program using Spark Streaming
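DStream windowed operators such as `window` and `reduceByKeyAndWindow` combine data from several consecutive batches. A window is defined by a window length and a slide interval. The plain-Python sketch below, which does not use Spark, counts words over sliding windows of micro-batches. To keep it simple, it measures both lengths in batches; Spark uses durations that must be multiples of the batch interval. Early windows that are not yet full simply cover fewer batches. The function name `windowed_counts` is hypothetical.

```python
from collections import Counter

def windowed_counts(batches, window_length, slide_interval):
    """Return word counts for each sliding window over a list of micro-batches.

    batches: list of micro-batches, each a list of text lines.
    window_length / slide_interval: measured in batches (a simplification).
    """
    results = []
    # Each window ends at a multiple of the slide interval
    for end in range(slide_interval, len(batches) + 1, slide_interval):
        start = max(0, end - window_length)  # early windows cover fewer batches
        window = batches[start:end]
        counts = Counter(
            word for batch in window for line in batch for word in line.split()
        )
        results.append(dict(counts))
    return results

# Four one-line micro-batches, a window of 3 batches sliding by 1 batch
batches = [["spark kafka"], ["spark"], ["kafka kafka"], ["airflow"]]
for number, counts in enumerate(windowed_counts(batches, 3, 1), start=1):
    print(f"window {number}: {counts}")
```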

8.1 Apache Spark Streaming Data Sources

8.2 Streaming Data Source Overview

8.3 Example Using a Kafka Direct Data Source

Hands-On:

Various Spark Streaming Data Sources

9.1 OLAP vs OLTP

9.2 What is a Data Warehouse?

9.3 Difference between Data Warehouse, Data Lake and Data Mart

9.4 Fact Tables

9.5 Dimension Tables

9.6 Slowly Changing Dimensions

9.7 Types of SCDs

9.8 Star Schema Design

9.9 Snowflake Schema Design

9.10 Data Warehousing Case Studies
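A Type 2 slowly changing dimension preserves history. Whenever a tracked attribute changes, it closes the current row and inserts a new version. The sketch below shows that pattern with Python's built-in `sqlite3` module, so it runs without a warehouse. The `dim_customer` table, its columns, and the `upsert_scd2` helper are hypothetical. In a production warehouse, the same logic is usually written as set-based SQL over whole batches rather than row by row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id TEXT NOT NULL,                      -- business key
        city TEXT NOT NULL,                             -- tracked attribute
        valid_from TEXT NOT NULL,
        valid_to TEXT,                                  -- NULL while current
        is_current INTEGER NOT NULL
    )
""")

def upsert_scd2(conn, customer_id, city, effective_date):
    """Apply a Type 2 change: expire the current row, then insert a new version."""
    row = conn.execute(
        "SELECT customer_sk, city FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row is not None and row[1] == city:
        return  # tracked attribute unchanged: keep the current version
    if row is not None:
        conn.execute(
            "UPDATE dim_customer SET valid_to = ?, is_current = 0 WHERE customer_sk = ?",
            (effective_date, row[0]),
        )
    conn.execute(
        "INSERT INTO dim_customer (customer_id, city, valid_from, valid_to, is_current) "
        "VALUES (?, ?, ?, NULL, 1)",
        (customer_id, city, effective_date),
    )

upsert_scd2(conn, "C001", "Pune", "2024-01-01")
upsert_scd2(conn, "C001", "Pune", "2024-02-01")       # unchanged: ignored
upsert_scd2(conn, "C001", "Bengaluru", "2024-03-15")  # moved: new version

history = conn.execute(
    "SELECT customer_id, city, valid_from, valid_to, is_current "
    "FROM dim_customer ORDER BY customer_sk"
).fetchall()
for version in history:
    print(version)
```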

10.1 Introduction to Cloud Computing

10.2 Types of Cloud Models

10.3 Types of Cloud Service Models

10.4 IaaS

10.5 SaaS

10.6 PaaS

10.7 Creation of Microsoft Azure Account

10.8 Microsoft Azure Portal Overview

11.1 Introduction to Azure Synapse Analytics

11.2 Work with data streams by using Azure Stream Analytics

11.3 Design a multidimensional schema to optimize analytical workloads
11.4 Code-free transformation at scale with Azure Data Factory
11.5 Populate slowly changing dimensions in Azure Synapse Analytics pipelines

11.6 Design a Modern Data Warehouse using Azure Synapse Analytics
11.7 Secure a data warehouse in Azure Synapse Analytics

12.1 Explore Azure Synapse serverless SQL pool capabilities

12.2 Query data in the lake using Azure Synapse serverless SQL pools

12.3 Create metadata objects in Azure Synapse serverless SQL pools

12.4 Secure data and manage users in Azure Synapse serverless SQL pools

13.1 Understand big data engineering with Apache Spark in Azure Synapse Analytics

13.2 Ingest data with Apache Spark notebooks in Azure Synapse Analytics

13.3 Transform data with DataFrames in Apache Spark Pools in Azure Synapse Analytics

13.4 Integrate SQL and Apache Spark pools in Azure Synapse Analytics

14.1 Describe Azure Databricks
14.2 Read and write data in Azure Databricks
14.3 Work with DataFrames in Azure Databricks
14.4 Work with DataFrames advanced methods in Azure Databricks

15.1 Use data loading best practices in Azure Synapse Analytics
15.2 Petabyte-scale ingestion with Azure Data Factory or Azure Synapse Pipelines

16.1 Data integration with Azure Data Factory or Azure Synapse Pipelines

16.2 Code-free transformation at scale with Azure Data Factory or Azure Synapse Pipelines

16.3 Orchestrate data movement and transformation in Azure Data Factory or Azure Synapse Pipelines

17.1 Optimize data warehouse query performance in Azure Synapse Analytics
17.2 Understand data warehouse developer features of Azure Synapse Analytics

17.3 Analyze and optimize data warehouse storage in Azure Synapse Analytics

18.2 Configure Azure Synapse Link with Azure Cosmos DB
18.3 Query Azure Cosmos DB with Apache Spark for Azure Synapse Analytics
18.4 Query Azure Cosmos DB with SQL serverless for Azure Synapse Analytics

19.1 Secure a data warehouse in Azure Synapse Analytics
19.2 Configure and manage secrets in Azure Key Vault
19.3 Implement compliance controls for sensitive data

20.1 Enable reliable messaging for Big Data applications using Azure Event Hubs
20.2 Work with data streams by using Azure Stream Analytics
20.3 Ingest data streams with Azure Stream Analytics

21.1 Process streaming data with Azure Databricks structured streaming

22.1 Create reports with Power BI using its integration with Azure Synapse Analytics

23.1 Use the integrated machine learning process in Azure Synapse Analytics

24.1 Introduction to Airflow

24.2 Different Components of Airflow

24.3 Installing Airflow

24.4 Understanding Airflow Web UI

24.5 DAG Operators & Tasks in Airflow Job

24.6 Create & Schedule Airflow Jobs For Data Processing
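An Airflow DAG declares tasks and their dependencies, for example with the `>>` operator between operators. By default, the scheduler runs a task only after its upstream tasks succeed. The plain-Python sketch below shows that ordering idea with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). It is not Airflow code, and the task names are hypothetical. Tasks in the same stage do not depend on each other, so they could run in parallel. A cycle is rejected, which mirrors why a DAG must be acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical ETL pipeline: each task maps to the set of tasks it depends on
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "notify": {"load_warehouse"},
}

def run_order(dag):
    """Group tasks into stages; every task's dependencies sit in earlier stages."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished, unlocking dependents
    return stages

stages = run_order(pipeline)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {', '.join(stage)}")

cycle_detected = False
try:
    run_order({"a": {"b"}, "b": {"a"}})
except CycleError:
    cycle_detected = True
print("cycle rejected:", cycle_detected)
```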

25.1 Snowflake Overview and Architecture
25.2 Connecting to Snowflake
25.3 Data Protection Features
25.4 SQL Support in Snowflake
25.5 Caching in Snowflake
Query Performance
25.6 Data Loading and Unloading
25.7 Functions and Procedures
Using Tasks
25.8 Managing Security
Access Control and User Management
25.9 Semi-Structured Data
25.10 Introduction to Data Sharing
25.11 Virtual Warehouse Scaling
25.12 Account and Resource Management

Why should I enroll in the Data Engineer Job Oriented Program?

Enroll in our Data Engineer Job Oriented Program to start building a career in data engineering. The program is designed to equip you with the skills and knowledge needed to excel in this fast-evolving field. You'll work with a diverse set of tools and technologies that are central to a data engineer's job, including Databricks, Snowflake, PySpark, Azure, and Azure Synapse Analytics, among others.

Can I request a support session to understand the topic better?

Prepzee offers 24/7 support to resolve queries. You can raise an issue with the support team at any time, or use email assistance for any of your requests. If needed, a one-on-one session can also be arranged with the team. These sessions, however, are available only for six months from your course start date.

Who are the instructors at Prepzee?

All instructors at Prepzee are Microsoft certified experts with more than twelve years of industry experience. They have been actively working in the domain as consultants, which makes them subject-matter experts. You can watch the sample videos to judge for yourself.

Which projects are included in this training?

The projects in the training program are kept up to date and are highly relevant to real-world work. They help you apply what you have learned to realistic industry scenarios, and they test your practical knowledge, understanding, and skills. Project subjects span domains such as e-commerce, networking, marketing, insurance, banking, and sales. Completing them gives you hands-on experience comparable to months of industry work.

Will the job support program guarantee me a job?

No. Our job assistance program is intended to help you land the job of your dreams. It gives you opportunities to explore competitive openings at companies and to find a well-paying role that matches your profile and skill set. The final hiring decision always depends on your interview performance and the recruiter's requirements.

Data Engineer Job Oriented Program

05-05-2024

90 Hrs

No Interest EMI

Online

Career Transition

A journey from 'Zero Cloud Knowledge' to cloud excellence!

Immense Job opportunities

From Non-IT to IT Journey

This Data Engineering Training program is for you if

Data Engineer Classes Overview

90+ Hours of Live Training

80+ Hours Hands-on & Exercises

10+ Projects & Case Studies

24/7 Technical Support

Learn from the Top 1% of Experts

Lifetime Live Training Access

Program Creators

Neeraj

Amazon Authorised Instructor

Sidharth

Amazon Authorised Instructor

Nagarjuna

Microsoft Certified Trainer

KK Rathi

Microsoft Certified Trainer

Where Will Your Career Take Off?

Data Engineer

Data Integration Specialist

Cloud Data Warehouse Engineer

Cloud Data Engineer

Data Consultant

Data Architect

Unlock Bonuses Worth ₹20,000

AWS Cloud Practitioner Course

Linux Fundamentals Course

Azure DP-203 Master Cheat Sheet

Playbook of 97 Things Every Data Engineer Should Know

Designing Data Intensive Applications PlayBook

Time is Running Out. Grab Your Spot Fast!

Placement Overview

500+

9 Days

Up to 350%

Course Content

Module 1 – Introduction to Python for Apache Spark

Module 2 – Functions, OOP and Modules in Python

Module 3 – Deep Dive into Apache Spark Framework

Module 4 – Mastering Spark RDDs

Module 5 – DataFrames and Spark SQL

Module 6 – Kafka

Module 7 – Apache Spark Streaming – Processing Multiple Batches

Module 8 – Apache Spark Streaming Data Sources

Module 9 – Data Warehousing

Module 10 – Introduction to Cloud Computing and Microsoft Azure

Module 11 – Serving Layer Design and Implementation

Module 12 – Azure Synapse Analytics Serverless SQL Pool

Module 13 – Work on Data Using Apache Spark in Azure Synapse Analytics

Module 14 – Data Exploration and Transformation in Azure Databricks

Module 15 – Ingest and Load Data into the Data Warehouse

Module 16 – Transform Data with Azure Data Factory or Azure Synapse Pipelines

Module 17 – Optimize Query Performance with Dedicated SQL Pools in Azure Synapse Analytics

Module 18 – Azure Cosmos DB

Module 19 – End-to-End Security with Azure Synapse Analytics

Module 20 – Real-Time Stream Processing with Azure Stream Analytics

Module 21 – Create a Stream Processing Solution with Event Hubs and Azure Databricks

Module 22 – Power BI Integration with Azure Synapse Analytics