Microsoft MCSA: Data Engineering with Azure (70-775 & 70-776)

Duration:
From 5 days

Study Mode:
Classroom

Next Date:
20/05/2025 (Tuesday)

Overview

Your 5-day accelerated MCSA: Data Engineering with Azure course will develop your skills in designing and implementing big data engineering workflows with the Microsoft cloud ecosystem and Microsoft HDInsight, so you can extract strategic value from your data.

Your expert Microsoft Certified Trainer (MCT) will immerse you in Microsoft Official Curriculum (MOC) at our distraction-free training centre, allowing you to focus 100% on learning. You'll experience Firebrand's Lecture | Lab | Review technique, combining hands-on practical labs, theory and review sessions to reinforce learning and develop skills and knowledge faster.

You'll cover a range of exciting topics including:

Deploying and securing multi-user HDInsight clusters
Implementing batch solutions with Hive and Apache Pig
Creating Spark Streaming applications using the DStream API
Developing big data real-time processing solutions with Apache Storm
Performing exploratory data analysis by using Spark SQL

You'll be prepared for exams 70-775: Perform Data Engineering on Microsoft HDInsight and 70-776: Engineering Data with Microsoft Cloud Services. You'll sit these on-site during the course, covered by your Certification Guarantee.

Your MCSA: Data Engineering with Azure certification will validate your skills in implementing big data engineering workflows with Microsoft Cloud Services and Microsoft HDInsight. It's ideal if you're a data engineer, data architect, data scientist or data developer. Earning this certification is your first step towards achieving the MCSE: Data Management and Analytics credential.

Curriculum

20775A: Perform Data Engineering on Microsoft Azure HDInsight

Module 1: Getting Started with HDInsight

This module introduces Hadoop, the MapReduce paradigm, and HDInsight.

Lessons

What is Big Data?
Introduction to Hadoop
Working with the MapReduce function
Introducing HDInsight

Lab : Working with HDInsight

Provision an HDInsight cluster and run MapReduce jobs

After completing this module, students will be able to:

Describe Hadoop, MapReduce and HDInsight.
Use scripts to provision an HDInsight Cluster.
Run a word-counting MapReduce program using PowerShell.
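To give a feel for the word-counting job mentioned above, here is a minimal sketch of the MapReduce pattern in plain Python. It is not the course's lab code and does not run on Hadoop or use PowerShell; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and only illustrate the map, shuffle and reduce steps on a single machine.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group every emitted value under its key (the word)."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a single total."""
    return word, sum(counts)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence (hypothetical helper)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data on Hadoop", "big clusters"]))
```

On a real cluster the map and reduce steps run in parallel across many nodes, and the framework performs the shuffle; the sketch keeps everything in one process so the data flow is easy to follow.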

Module 2: Deploying HDInsight Clusters

This module provides an overview of the Microsoft Azure HDInsight cluster types, in addition to the creation and maintenance of the HDInsight clusters. The module also demonstrates how to customise clusters by using script actions through the Azure Portal, Azure PowerShell, and the Azure command-line interface (CLI). This module includes labs that provide the steps to deploy and manage the clusters.

Lessons

Identifying HDInsight cluster types
Managing HDInsight clusters by using the Azure portal
Managing HDInsight Clusters by using Azure PowerShell

Lab : Managing HDInsight clusters with the Azure Portal

Create an HDInsight cluster that uses Data Lake Store storage
Customise HDInsight by using script actions
Delete an HDInsight cluster

After completing this module, students will be able to:

Identify HDInsight cluster types.
Manage HDInsight clusters by using the Azure Portal.
Manage HDInsight clusters by using Azure PowerShell.

Module 3: Authorising Users to Access Resources

This module provides an overview of non-domain and domain-joined Microsoft HDInsight clusters, in addition to the creation and configuration of domain-joined HDInsight clusters. The module also demonstrates how to manage domain-joined clusters using the Ambari management UI and the Ranger Admin UI. This module includes the labs that will provide the steps to create and manage domain-joined clusters.

Lessons

Non-domain-joined clusters
Configuring domain-joined HDInsight clusters
Manage domain-joined HDInsight clusters

Lab : Authorising Users to Access Resources

Prepare the Lab Environment
Manage a non-domain joined cluster

After completing this module, students will be able to:

Identify the characteristics of non-domain and domain-joined HDInsight clusters.
Create and configure domain-joined HDInsight clusters through Azure PowerShell.
Manage the domain-joined cluster using the Ambari management UI and the Ranger Admin UI.
Create Hive policies and manage user permissions.

Module 4: Loading data into HDInsight

This module provides an introduction to loading data into Microsoft Azure Blob storage and Microsoft Azure Data Lake storage. At the end of this module, you will know how to use multiple tools to transfer data to an HDInsight cluster. You will also learn how to load and transform data to decrease your query run time.

Lessons

Storing data for HDInsight processing
Using data loading tools
Maximising value from stored data

Lab : Loading Data into your Azure account

Load data for use with HDInsight

After completing this module, students will be able to:

Discuss the architecture of key HDInsight storage solutions.
Use tools to upload data to HDInsight clusters.
Compress and serialise uploaded data for decreased processing time.
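As a rough illustration of the last objective, the sketch below serialises records and compresses them before upload, using only the Python standard library. The choice of JSON Lines and gzip is an assumption made for the example; the course materials may use other formats and tools. The function names and the sample records are hypothetical.

```python
import gzip
import json
from typing import Dict, List


def serialise(records: List[Dict]) -> bytes:
    """Serialise records as JSON Lines: one JSON object per line."""
    lines = (json.dumps(record, sort_keys=True) for record in records)
    return "\n".join(lines).encode("utf-8")


def serialise_and_compress(records: List[Dict]) -> bytes:
    """Serialise the records, then gzip the resulting bytes."""
    return gzip.compress(serialise(records))


def decompress_and_parse(blob: bytes) -> List[Dict]:
    """Reverse the process: gunzip, then parse each JSON line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample data.
    sample = [{"sensor": "s1", "reading": i % 10} for i in range(1000)]
    raw = serialise(sample)
    packed = serialise_and_compress(sample)
    print(len(raw), "bytes raw;", len(packed), "bytes compressed")
```

Smaller files mean less data to move and read, which is one reason compressed, serialised data can shorten processing time.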

Module 5: Troubleshooting HDInsight

In this module, you will learn how to interpret logs associated with the various services of a Microsoft Azure HDInsight cluster to troubleshoot any issues you might have with these services. You will also learn about Operations Management Suite (OMS) and its capabilities.

Lessons

Analyse HDInsight logs
YARN logs
Heap dumps
Operations Management Suite

Lab : Troubleshooting HDInsight

Analyse HDInsight logs
Analyse YARN logs
Monitor resources with Operations Management Suite

After completing this module, students will be able to:

Locate and analyse HDInsight logs.
Use YARN logs for application troubleshooting.
Understand and enable heap dumps.
Describe how OMS can be used with Azure resources.

Module 6: Implementing Batch Solutions

In this module, you will look at implementing batch solutions in Microsoft Azure HDInsight by using Hive and Pig. You will also discuss the approaches for data pipeline operationalisation that are available for big data workloads on an HDInsight stack.

Lessons

Apache Hive storage
HDInsight data queries using Hive and Pig
Operationalise HDInsight

Lab : Implement Batch Solutions

Deploy HDInsight cluster and data storage
Use data transfers with HDInsight clusters
Query HDInsight cluster data

After completing this module, students will be able to:

Understand Apache Hive and the scenarios where it can be used.
Run batch jobs using Apache Hive and Apache Pig.
Explain the capabilities of the Microsoft Azure Data Factory and Apache Oozie and how they can orchestrate and automate big data workflows.

Module 7: Design Batch ETL solutions for big data with Spark

This module provides an overview of Apache Spark, describing its main characteristics and key features. Before you start, it's helpful to understand the basic architecture of Apache Spark and the different components that are available. The module also explains how to design batch Extract, Transform, Load (ETL) solutions for big data with Spark on HDInsight. The final lesson includes some guidelines to improve Spark performance.

Lessons

What is Spark?
ETL with Spark
Spark performance

Lab : Design Batch ETL solutions for big data with Spark

Create an HDInsight cluster with access to Data Lake Store
Use HDInsight Spark cluster to analyse data in Data Lake Store
Analysing website logs using a custom library with Apache Spark cluster on HDInsight
Managing resources for Apache Spark cluster on Azure HDInsight

After completing this module, students will be able to:

Describe the architecture of Spark on HDInsight.
Describe the different components required for a Spark application on HDInsight.
Identify the benefits of using Spark for ETL processes.
Create Python and Scala code in a Spark program to ingest or process data.
Identify cluster settings for optimal performance.
Track and debug jobs running on an Apache Spark cluster in HDInsight.

Module 8: Analyse Data with Spark SQL

This module describes how to analyse data by using Spark SQL. You will learn to explain the differences between RDDs, Datasets and DataFrames, identify the use cases for iterative and interactive queries, and describe best practices for caching, partitioning and persistence. You will also look at how to use Apache Zeppelin and Jupyter notebooks, carry out exploratory data analysis, then submit Spark jobs remotely to a Spark cluster.

Lessons

Implementing iterative and interactive queries
Perform exploratory data analysis

Lab : Performing exploratory data analysis by using iterative and interactive queries

Build a machine learning application
Use Zeppelin for interactive data analysis
View and manage Spark sessions by using Livy

After completing this module, students will be able to:

Implement interactive queries.
Perform exploratory data analysis.

Module 9: Analyse Data with Hive and Phoenix

In this module, you will learn about running interactive queries using Interactive Hive (also known as Hive LLAP or Live Long and Process) and Apache Phoenix. You will also learn about the various aspects of running interactive queries using Apache Phoenix with HBase as the underlying query engine.

Lessons

Implement interactive queries for big data with interactive Hive
Perform exploratory data analysis by using Hive
Perform interactive processing by using Apache Phoenix

Lab : Analyse data with Hive and Phoenix

Implement interactive queries for big data with interactive Hive
Perform exploratory data analysis by using Hive
Perform interactive processing by using Apache Phoenix

After completing this module, students will be able to:

Implement interactive queries with interactive Hive.
Perform exploratory data analysis using Hive.
Perform interactive processing by using Apache Phoenix.

Module 10: Stream Analytics

The Microsoft Azure Stream Analytics service has built-in features and capabilities that make it easy to use as a flexible stream processing service in the cloud. You will discuss the advantages of using Stream Analytics for your streaming solutions in detail, and compare its features to other services available within the Microsoft Azure HDInsight stack, such as Apache Storm. You will learn how to deploy a Stream Analytics job, connect it to Microsoft Azure Event Hubs to ingest real-time data, and execute a Stream Analytics query to gain low-latency insights. After that, you will learn how Stream Analytics jobs can be monitored when deployed and used in production settings.

Lessons

Stream Analytics
Process streaming data with Stream Analytics
Managing Stream Analytics jobs

Lab : Implement Stream Analytics

Process streaming data with Stream Analytics
Managing Stream Analytics jobs

After completing this module, students will be able to:

Describe Stream Analytics and its capabilities.
Process streaming data with Stream Analytics.
Manage Stream Analytics jobs.

Module 11: Implementing Streaming Solutions with Kafka and HBase

In this module, you will learn how to use Kafka to build streaming solutions. You will also see how to use Kafka to persist data to HDFS by using Apache HBase, and then query this data.

Lessons

Building and Deploying a Kafka Cluster
Publishing, Consuming, and Processing data using the Kafka Cluster
Using HBase to store and Query Data

Lab : Implementing Streaming Solutions with Kafka and HBase

Create a virtual network and gateway
Create a Storm cluster for Kafka
Create a Kafka producer
Create a streaming processor client topology
Create a Power BI dashboard and streaming dataset
Create an HBase cluster
Create a streaming processor to write to HBase

After completing this module, students will be able to:

Build and deploy a Kafka Cluster.
Publish data to a Kafka Cluster, consume data from a Kafka Cluster, and perform stream processing using the Kafka Cluster.
Save streamed data to HBase, and perform queries using the HBase API.

Module 12: Develop big data real-time processing solutions with Apache Storm

This module explains how to develop big data real-time processing solutions with Apache Storm.

Lessons

Persist long term data
Stream data with Storm
Create Storm topologies
Configure Apache Storm

Lab : Developing big data real-time processing solutions with Apache Storm

Stream data with Storm
Create Storm Topologies

After completing this module, students will be able to:

Persist long term data.
Stream data with Storm.
Create Storm topologies.
Configure Apache Storm.

Module 13: Create Spark Streaming Applications

This module describes Spark Streaming; explains how to use discretised streams (DStreams); and explains how to apply the concepts to develop Spark Streaming applications.

Lessons

Working with Spark Streaming
Creating Spark Structured Streaming Applications
Persistence and Visualisation

Lab : Building a Spark Streaming Application

Installing Required Software
Building the Azure Infrastructure
Building a Spark Streaming Pipeline

After completing this module, students will be able to:

Describe Spark Streaming and how it works.
Use discretised streams (DStreams).
Work with sliding window operations.
Apply the concepts to develop Spark Streaming applications.
Describe Structured Streaming.
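To illustrate the idea behind sliding window operations, here is a minimal sketch in plain Python. It does not use the Spark Streaming API: each inner list stands in for one micro-batch of a discretised stream, and the function name and the parameters window_length and slide_interval (both counted in batches here, rather than in seconds) are assumptions made for the example.

```python
from collections import Counter, deque
from typing import Deque, Dict, Iterable, Iterator, List


def sliding_window_counts(
    batches: Iterable[List[str]],
    window_length: int,
    slide_interval: int,
) -> Iterator[Dict[str, int]]:
    """Count events over the last `window_length` batches,
    emitting a result every `slide_interval` batches."""
    # The deque silently drops the oldest batch once it is full.
    window: Deque[List[str]] = deque(maxlen=window_length)
    for index, batch in enumerate(batches, start=1):
        window.append(batch)
        if index % slide_interval == 0:
            totals: Counter = Counter()
            for held_batch in window:
                totals.update(held_batch)
            yield dict(totals)


if __name__ == "__main__":
    stream = [["a"], ["a", "b"], ["b"], ["c"]]
    for result in sliding_window_counts(stream, window_length=3, slide_interval=2):
        print(result)
```

Because the window is longer than the slide interval in this usage, consecutive results overlap: the second batch contributes to both emitted counts.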

20776A: Performing Big Data Engineering on Microsoft Cloud Services

Module 1: Architectures for Big Data Engineering with Azure

This module describes common architectures for processing big data using Azure tools and services.

Lessons

Understanding Big Data
Architectures for Processing Big Data
Considerations for designing Big Data solutions

Lab : Designing a Big Data Architecture

Design a big data architecture

After completing this module, students will be able to:

Explain the concept of Big Data.
Describe the Lambda and Kappa architectures.
Describe design considerations for building Big Data Solutions with Azure.

Module 2: Processing Event Streams using Azure Stream Analytics

This module describes how to use Azure Stream Analytics to design and implement stream processing over large-scale data.

Lessons

Introduction to Azure Stream Analytics
Configuring Azure Stream Analytics jobs

Lab : Processing Event Streams with Azure Stream Analytics

Create an Azure Stream Analytics job
Create another Azure Stream Analytics job
Add an Input
Edit the ASA job
Determine the nearest Patrol Car

After completing this module, students will be able to:

Describe the purpose and structure of Azure Stream Analytics.
Configure Azure Stream Analytics jobs for scalability, reliability and security.

Module 3: Performing custom processing in Azure Stream Analytics

This module describes how to include custom functions and incorporate machine learning activities into an Azure Stream Analytics job.

Lessons

Implementing Custom Functions
Incorporating Machine Learning into an Azure Stream Analytics Job

Lab : Performing Custom Processing with Azure Stream Analytics

Add logic to the analytics
Detect consistent anomalies
Determine consistencies using machine learning and ASA

After completing this module, students will be able to:

Describe how to create and use custom functions in Azure Stream Analytics.
Describe how to use Azure Machine Learning models in an Azure Stream Analytics job.

Module 4: Managing Big Data in Azure Data Lake Store

This module describes how to use Azure Data Lake Store as a large-scale repository of data files.

Lessons

Using Azure Data Lake Store
Monitoring and protecting data in Azure Data Lake Store

Lab : Managing Big Data in Azure Data Lake Store

Update the ASA Job
Upload details to ADLS

After completing this module, students will be able to:

Describe how to create an Azure Data Lake Store, create folders, and upload data.
Explain how to monitor an Azure Data Lake account, and protect the data that it contains.

Module 5: Processing Big Data using Azure Data Lake Analytics

This module describes how to use Azure Data Lake Analytics to examine and process data held in Azure Data Lake Store.

Lessons

Introduction to Azure Data Lake Analytics
Analysing Data with U-SQL
Sorting, grouping, and joining data

Lab : Processing Big Data using Azure Data Lake Analytics

Add functionality
Query against Database
Calculate average speed

After completing this module, students will be able to:

Describe the purpose of Azure Data Lake Analytics, and how to create and run jobs.
Describe how to use U-SQL to process and analyse data.
Describe how to use windowing to sort data and perform aggregated operations, and how to join data from multiple sources.
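The grouping, aggregating and joining described above are written in U-SQL on the course. As a language-neutral illustration only, the sketch below shows the same two ideas in plain Python: averaging a value per key, then joining the result with a second data source. The camera identifiers, speeds and location names are hypothetical sample data, and the function names are not part of any Azure API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def average_speed_by_camera(
    observations: Iterable[Tuple[str, float]],
) -> Dict[str, float]:
    """Group (camera_id, speed) rows by camera and average the speeds."""
    totals: Dict[str, List[float]] = defaultdict(lambda: [0.0, 0.0])
    for camera_id, speed in observations:
        totals[camera_id][0] += speed   # running sum of speeds
        totals[camera_id][1] += 1       # running row count
    return {cid: total / count for cid, (total, count) in totals.items()}


def join_with_locations(
    averages: Dict[str, float],
    locations: Dict[str, str],
) -> List[Tuple[str, str, float]]:
    """Inner join: keep only cameras present in both sources, sorted by id."""
    return [
        (cid, locations[cid], avg)
        for cid, avg in sorted(averages.items())
        if cid in locations
    ]


if __name__ == "__main__":
    rows = [("c1", 30.0), ("c1", 50.0), ("c2", 60.0)]
    averages = average_speed_by_camera(rows)
    print(join_with_locations(averages, {"c1": "High Street"}))
```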

Module 6: Implementing custom operations and monitoring performance in Azure Data Lake Analytics

This module describes how to create and deploy custom functions and operations, integrate with Python and R, and protect and optimise jobs.

Lessons

Incorporating custom functionality into Analytics jobs
Managing and Optimising jobs

Lab : Implementing custom operations and monitoring performance in Azure Data Lake Analytics

Custom extractor
Custom processor
Integration with R/Python
Monitor and optimise a job

After completing this module, students will be able to:

Describe how to incorporate custom features and assemblies into U-SQL.
Describe how to implement security to protect jobs, and how to monitor and optimise jobs to ensure efficient operations.

Module 7: Implementing Azure SQL Data Warehouse

This module describes how to use Azure SQL Data Warehouse to create a repository that can support large-scale analytical processing over data at rest.

Lessons

Introduction to Azure SQL Data Warehouse
Designing tables for efficient queries
Importing Data into Azure SQL Data Warehouse

Lab : Implementing Azure SQL Data Warehouse

Create a new data warehouse
Design and create tables and indexes
Import data into the warehouse

After completing this module, students will be able to:

Describe the purpose and structure of Azure SQL Data Warehouse.
Describe how to design tables to optimise the processing performed by the data warehouse.
Describe tools and techniques for importing data into a warehouse at scale.

Module 8: Performing Analytics with Azure SQL Data Warehouse

This module describes how to query data in Azure SQL Data Warehouse, how to maintain performance, and how to protect this data.

Lessons

Querying Data in Azure SQL Data Warehouse
Maintaining Performance
Protecting Data in Azure SQL Data Warehouse

Lab : Performing Analytics with Azure SQL Data Warehouse

Performing queries and tuning performance
Integrating with Power BI and Azure Machine Learning
Configuring security and analysing threats

After completing this module, students will be able to:

Describe how to perform queries and use the data held in a data warehouse to perform analytics and generate reports.
Describe how to configure and monitor a data warehouse to maintain good performance.
Describe how to protect data and manage security in a data warehouse.

Module 9: Automating the Data Flow with Azure Data Factory

This module describes how to use Azure Data Factory to import, transform, and transfer data between repositories and services.

Lessons

Introduction to Azure Data Factory
Transferring Data
Transforming Data
Monitoring Performance and Protecting Data

Lab : Automating the Data Flow with Azure Data Factory

Automate the Data Flow with Azure Data Factory

After completing this module, students will be able to:

Describe the purpose of Azure Data Factory, and explain how it works.
Describe how to create Azure Data Factory pipelines that can transfer data efficiently.
Describe how to perform transformations using an Azure Data Factory pipeline.
Describe how to monitor Azure Data Factory pipelines, and how to protect the data flowing through these pipelines.

Exam Track

You'll sit the following exams at the Firebrand training centre during the course, covered by your Certification Guarantee:

70-775: Perform Data Engineering on Microsoft HDInsight

Skills measured:

Administer and Provision HDInsight Clusters
Implement Big Data Batch Processing Solutions
Implement Big Data Interactive Processing Solutions
Implement Big Data Real-Time Processing Solutions

70-776: Engineering Data with Microsoft Cloud Services - currently in beta

Skills measured:

Design and Implement Complex Event Processing By Using Azure Stream Analytics (15-20%)
Design and Implement Analytics by Using Azure Data Lake (25-30%)
Design and Implement Azure SQL Data Warehouse Solutions (15-20%)
Design and Implement Cloud-Based Integration by using Azure Data Factory (15-20%)
Manage and Maintain Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics (20-25%)

This exam is currently in beta. If you pass it, you will receive full credit toward applicable certifications, but you will not receive a score report or pass/fail notification until 8-12 weeks following the conclusion of the beta period.

What's Included

Microsoft Official Curriculum

MOC 20775A: Perform Data Engineering on Microsoft Azure HDInsight
MOC 20776A: Performing Big Data Engineering on Microsoft Cloud Services

Prerequisites

Before attending this course, you must have knowledge of and skills in:

Azure data services
Microsoft Windows operating system and its core functionality
Relational databases
Programming using R, and familiarity with common R packages
Common statistical methods and data analysis best practices

Benefits

Seven reasons why you should sit your course with Firebrand Training

Two training options. Choose between residential classroom-based and online courses
You'll be certified fast. With us, you’ll be trained in record time
Our course is all-inclusive. A one-off fee covers all course materials, exams**, accommodation* and meals*. No hidden extras.
Pass the first time or train again for free. This is our guarantee. We’re confident you’ll pass your course the first time. But if not, come back within a year and only pay for accommodation, exams and incidental costs
You’ll learn more. A day with a traditional training provider generally runs from 9am–5pm, with a nice long break for lunch. With Firebrand, you’ll get at least 12 hours/day of quality learning time with your instructor
You’ll learn faster. Chances are, you’ll have a different learning style to those around you. We combine visual, auditory and tactile styles to deliver the material in a way that ensures you will learn faster and more easily
You’ll be studying with the best. We’ve been named in the Training Industry’s “Top 20 IT Training Companies of the Year” every year since 2010. As well as winning many more awards, we’ve trained and certified over 135,000 professionals

*For residential training only. Doesn't apply for online courses
**Some exceptions apply. Please refer to the Exam Track or speak with our experts

Are you ready for the course?

Get access to free practice tests for your course.

Course Dates

Sorry, there are currently no dates available for this course. Please submit an enquiry and one of our team will contact you about potential future dates or alternative options.
