Only 5 days
Classroom
16/12/2024 (Monday)
Overview
Your 5-day accelerated MCSA: Data Engineering with Azure course will develop the skills to design and implement big data engineering workflows with the Microsoft cloud ecosystem and Microsoft HDInsight to extract strategic value from your data.
Your expert Microsoft Certified Trainer (MCT) will immerse you in Microsoft Official Curriculum (MOC) at our distraction-free training centre, allowing you to focus 100% on learning. You'll experience Firebrand's Lecture | Lab | Review technique, combining hands-on practical labs, theory and review sessions to reinforce learning and develop skills and knowledge faster.
You'll cover a range of exciting topics including:
- Deploying and securing multi-user HDInsight clusters
- Implementing batch solutions with Hive and Apache Pig
- Creating Spark streaming applications using DStream API
- Developing big data real-time processing solutions with Apache Storm
- Performing exploratory data analysis by using Spark SQL
You'll be prepared for exams 70-775: Perform Data Engineering on Microsoft HDInsight and 70-776: Engineering Data with Microsoft Cloud Services. You'll sit these on-site during the course, covered by your Certification Guarantee.
Your MCSA: Data Engineering with Azure certification will validate skills in implementing big data engineering workflows with Microsoft Cloud Services and Microsoft HDInsight. Ideal if you're a data engineer, data architect, data scientist or data developer. Earning this certification acts as your first step towards achieving the MCSE: Data Management and Analytics credential.
Curriculum
20775A: Perform Data Engineering on Microsoft Azure HDInsight
Module 1: Getting Started with HDInsight
This module introduces Hadoop, the MapReduce paradigm, and HDInsight.
Lessons
- What is Big Data?
- Introduction to Hadoop
- Working with MapReduce Function
- Introducing HDInsight
Lab : Working with HDInsight
- Provision an HDInsight cluster and run MapReduce jobs
After completing this module, students will be able to:
- Describe Hadoop, MapReduce and HDInsight.
- Use scripts to provision an HDInsight Cluster.
- Run a word-counting MapReduce program using PowerShell.
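To give a flavour of the word-counting job mentioned above, here is a minimal sketch of the MapReduce pattern in plain Python. It is not the course's lab code and does not run on HDInsight or use PowerShell; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key (the word)."""
    grouped = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the grouped values to get a total per word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data on Azure", "big clusters process big data"]
    print(word_count(sample))
```

On a real cluster the map and reduce phases would run in parallel across nodes; this single-process version only shows how the three phases fit together.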
Module 2: Deploying HDInsight Clusters
This module provides an overview of the Microsoft Azure HDInsight cluster types, in addition to the creation and maintenance of the HDInsight clusters. The module also demonstrates how to customise clusters by using script actions through the Azure Portal, Azure PowerShell, and the Azure command-line interface (CLI). This module includes labs that provide the steps to deploy and manage the clusters.
Lessons
- Identifying HDInsight cluster types
- Managing HDInsight clusters by using the Azure portal
- Managing HDInsight Clusters by using Azure PowerShell
Lab : Managing HDInsight clusters with the Azure Portal
- Create an HDInsight cluster that uses Data Lake Store storage
- Customise HDInsight by using script actions
- Delete an HDInsight cluster
After completing this module, students will be able to:
- Identify HDInsight cluster types.
- Manage HDInsight clusters by using the Azure Portal.
- Manage HDInsight clusters by using Azure PowerShell.
Module 3: Authorising Users to Access Resources
This module provides an overview of non-domain and domain-joined Microsoft HDInsight clusters, in addition to the creation and configuration of domain-joined HDInsight clusters. The module also demonstrates how to manage domain-joined clusters using the Ambari management UI and the Ranger Admin UI. This module includes the labs that will provide the steps to create and manage domain-joined clusters.
Lessons
- Non-domain Joined clusters
- Configuring domain-joined HDInsight clusters
- Manage domain-joined HDInsight clusters
Lab : Authorising Users to Access Resources
- Prepare the Lab Environment
- Manage a non-domain joined cluster
After completing this module, students will be able to:
- Identify the characteristics of non-domain and domain-joined HDInsight clusters.
- Create and configure domain-joined HDInsight clusters through the Azure PowerShell.
- Manage the domain-joined cluster using the Ambari management UI and the Ranger Admin UI.
- Create Hive policies and manage user permissions.
Module 4: Loading data into HDInsight
This module provides an introduction to loading data into Microsoft Azure Blob storage and Microsoft Azure Data Lake storage. At the end of this module, you will know how to use multiple tools to transfer data to an HDInsight cluster. You will also learn how to load and transform data to decrease your query run time.
Lessons
- Storing data for HDInsight processing
- Using data loading tools
- Maximising value from stored data
Lab : Loading Data into your Azure account
- Load data for use with HDInsight
After completing this module, students will be able to:
- Discuss the architecture of key HDInsight storage solutions.
- Use tools to upload data to HDInsight clusters.
- Compress and serialize uploaded data for decreased processing time.
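As a rough illustration of the last objective, the sketch below serializes some records and compresses them before they would be uploaded. It assumes JSON Lines as the serialization format and gzip as the codec, neither of which is named in the course outline; the helper names and the sample sensor records are hypothetical.

```python
import gzip
import json


def serialize_and_compress(records):
    """Serialize records as JSON Lines, then gzip the resulting bytes."""
    raw = "\n".join(json.dumps(r, sort_keys=True) for r in records).encode("utf-8")
    return raw, gzip.compress(raw)


def decompress_and_parse(blob):
    """Reverse the steps: gunzip, then parse one JSON object per line."""
    text = gzip.decompress(blob).decode("utf-8")
    return [json.loads(line) for line in text.splitlines()]


if __name__ == "__main__":
    # Hypothetical, highly repetitive sample data (compresses well).
    records = [{"sensor": "s1", "reading": i % 10} for i in range(1000)]
    raw, packed = serialize_and_compress(records)
    print("raw bytes:", len(raw), "compressed bytes:", len(packed))
```

Smaller files mean less data to move and read at query time, which is the general idea behind the objective; the formats used in the actual labs may differ.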
Module 5: Troubleshooting HDInsight
In this module, you will learn how to interpret logs associated with the various services of Microsoft Azure HDInsight cluster to troubleshoot any issues you might have with these services. You will also learn about Operations Management Suite (OMS) and its capabilities.
Lessons
- Analyse HDInsight logs
- YARN logs
- Heap dumps
- Operations management suite
Lab : Troubleshooting HDInsight
- Analyse HDInsight logs
- Analyse YARN logs
- Monitor resources with Operations Management Suite
After completing this module, students will be able to:
- Locate and analyse HDInsight logs.
- Use YARN logs for application troubleshooting.
- Understand and enable heap dumps.
- Describe how the OMS can be used with Azure resources.
Module 6: Implementing Batch Solutions
In this module, you will look at implementing batch solutions in Microsoft Azure HDInsight by using Hive and Pig. You will also discuss the approaches for data pipeline operationalisation that are available for big data workloads on an HDInsight stack.
Lessons
- Apache Hive storage
- HDInsight data queries using Hive and Pig
- Operationalise HDInsight
Lab : Implement Batch Solutions
- Deploy HDInsight cluster and data storage
- Use data transfers with HDInsight clusters
- Query HDInsight cluster data
After completing this module, students will be able to:
- Understand Apache Hive and the scenarios where it can be used.
- Run batch jobs using Apache Hive and Apache Pig.
- Explain the capabilities of the Microsoft Azure Data Factory and Apache Oozie and how they can orchestrate and automate big data workflows.
Module 7: Design Batch ETL solutions for big data with Spark
This module provides an overview of Apache Spark, describing its main characteristics and key features. Before you start, it's helpful to understand the basic architecture of Apache Spark and the different components that are available. The module also explains how to design batch Extract, Transform, Load (ETL) solutions for big data with Spark on HDInsight. The final lesson includes some guidelines to improve Spark performance.
Lessons
- What is Spark?
- ETL with Spark
- Spark performance
Lab : Design Batch ETL solutions for big data with Spark
- Create a HDInsight Cluster with access to Data Lake Store
- Use HDInsight Spark cluster to analyse data in Data Lake Store
- Analysing website logs using a custom library with Apache Spark cluster on HDInsight
- Managing resources for Apache Spark cluster on Azure HDInsight
After completing this module, students will be able to:
- Describe the architecture of Spark on HDInsight.
- Describe the different components required for a Spark application on HDInsight.
- Identify the benefits of using Spark for ETL processes.
- Create Python and Scala code in a Spark program to ingest or process data.
- Identify cluster settings for optimal performance.
- Track and debug jobs running on an Apache Spark cluster in HDInsight.
Module 8: Analyse Data with Spark SQL
This module describes how to analyse data by using Spark SQL. You will learn to explain the differences between RDDs, Datasets and DataFrames, identify the use cases for iterative and interactive queries, and describe best practices for caching, partitioning and persistence. You will also look at how to use Apache Zeppelin and Jupyter notebooks, carry out exploratory data analysis, then submit Spark jobs remotely to a Spark cluster.
Lessons
- Implementing iterative and interactive queries
- Perform exploratory data analysis
Lab : Performing exploratory data analysis by using iterative and interactive queries
- Build a machine learning application
- Use Zeppelin for interactive data analysis
- View and manage Spark sessions by using Livy
After completing this module, students will be able to:
- Implement interactive queries.
- Perform exploratory data analysis.
Module 9: Analyse Data with Hive and Phoenix
In this module, you will learn about running interactive queries using Interactive Hive (also known as Hive LLAP or Live Long and Process) and Apache Phoenix. You will also learn about the various aspects of running interactive queries using Apache Phoenix with HBase as the underlying query engine.
Lessons
- Implement interactive queries for big data with interactive Hive
- Perform exploratory data analysis by using Hive
- Perform interactive processing by using Apache Phoenix
Lab : Analyse data with Hive and Phoenix
- Implement interactive queries for big data with interactive Hive
- Perform exploratory data analysis by using Hive
- Perform interactive processing by using Apache Phoenix
After completing this module, students will be able to:
- Implement interactive queries with interactive Hive.
- Perform exploratory data analysis using Hive.
- Perform interactive processing by using Apache Phoenix.
Module 10: Stream Analytics
The Microsoft Azure Stream Analytics service has built-in features and capabilities that make it easy to use as a flexible stream processing service in the cloud. You will see that there are a number of advantages to using Stream Analytics for your streaming solutions, which you will discuss in more detail. You will also compare features of Stream Analytics to other services available within the Microsoft Azure HDInsight stack, such as Apache Storm. You will learn how to deploy a Stream Analytics job, connect it to the Microsoft Azure Event Hub to ingest real-time data, and execute a Stream Analytics query to gain low-latency insights. After that, you will learn how Stream Analytics jobs can be monitored when deployed and used in production settings.
Lessons
- Stream analytics
- Process streaming data from stream analytics
- Managing stream analytics jobs
Lab : Implement Stream Analytics
- Process streaming data with stream analytics
- Managing stream analytics jobs
After completing this module, students will be able to:
- Describe stream analytics and its capabilities.
- Process streaming data with stream analytics.
- Manage stream analytics jobs.
Module 11: Implementing Streaming Solutions with Kafka and HBase
In this module, you will learn how to use Kafka to build streaming solutions. You will also see how to use Kafka to persist data to HDFS by using Apache HBase, and then query this data.
Lessons
- Building and Deploying a Kafka Cluster
- Publishing, Consuming, and Processing data using the Kafka Cluster
- Using HBase to store and Query Data
Lab : Implementing Streaming Solutions with Kafka and HBase
- Create a virtual network and gateway
- Create a Storm cluster for Kafka
- Create a Kafka producer
- Create a streaming processor client topology
- Create a Power BI dashboard and streaming dataset
- Create an HBase cluster
- Create a streaming processor to write to HBase
After completing this module, students will be able to:
- Build and deploy a Kafka Cluster.
- Publish data to a Kafka Cluster, consume data from a Kafka Cluster, and perform stream processing using the Kafka Cluster.
- Save streamed data to HBase, and perform queries using the HBase API.
Module 12: Develop big data real-time processing solutions with Apache Storm
This module explains how to develop big data real-time processing solutions with Apache Storm.
Lessons
- Persist long term data
- Stream data with Storm
- Create Storm topologies
- Configure Apache Storm
Lab : Developing big data real-time processing solutions with Apache Storm
- Stream data with Storm
- Create Storm Topologies
After completing this module, students will be able to:
- Persist long term data.
- Stream data with Storm.
- Create Storm topologies.
- Configure Apache Storm.
Module 13: Create Spark Streaming Applications
This module describes Spark Streaming; explains how to use discretised streams (DStreams); and explains how to apply the concepts to develop Spark Streaming applications.
Lessons
- Working with Spark Streaming
- Creating Spark Structured Streaming Applications
- Persistence and Visualisation
Lab : Building a Spark Streaming Application
- Installing Required Software
- Building the Azure Infrastructure
- Building a Spark Streaming Pipeline
After completing this module, students will be able to:
- Describe Spark Streaming and how it works.
- Use discretised streams (DStreams).
- Work with sliding window operations.
- Apply the concepts to develop Spark Streaming applications.
- Describe Structured Streaming.
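The sliding window operations listed above can be pictured with a small plain-Python sketch. It does not use Spark or the DStream API; it only imitates the idea of summing event counts over the most recent micro-batches. The function name `sliding_window_sums` and its parameters are hypothetical, with `window_length` and `slide_interval` measured in batches rather than seconds.

```python
from collections import deque


def sliding_window_sums(batches, window_length, slide_interval):
    """Sum event counts over a window of recent micro-batches.

    batches: list of per-interval event counts (one number per micro-batch).
    window_length: how many batches each window covers.
    slide_interval: how many batches pass between emitted results.
    """
    window = deque(maxlen=window_length)  # oldest batch drops off automatically
    results = []
    for index, batch in enumerate(batches, start=1):
        window.append(batch)
        if index % slide_interval == 0:
            # Early windows may hold fewer than window_length batches.
            results.append(sum(window))
    return results


if __name__ == "__main__":
    # Six micro-batches, a window of three batches, a result every two batches.
    print(sliding_window_sums([1, 2, 3, 4, 5, 6], window_length=3, slide_interval=2))
```

Because the window is longer than the slide interval, consecutive results overlap: each batch can contribute to more than one emitted sum.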
20776A: Performing Big Data Engineering on Microsoft Cloud Services
Module 1: Architectures for Big Data Engineering with Azure
This module describes common architectures for processing big data using Azure tools and services.
Lessons
- Understanding Big Data
- Architectures for Processing Big Data
- Considerations for designing Big Data solutions
Lab : Designing a Big Data Architecture
- Design a big data architecture
After completing this module, students will be able to:
- Explain the concept of Big Data.
- Describe the Lambda and Kappa architectures.
- Describe design considerations for building Big Data Solutions with Azure.
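For readers new to the Lambda architecture named above, the following plain-Python sketch shows its three layers in miniature. It is an illustration only, not an Azure implementation: the page-view events, the `page` field, and the function names are all hypothetical.

```python
from collections import Counter


def build_batch_view(master_events):
    """Batch layer: recompute totals from the full, immutable event history."""
    return Counter(event["page"] for event in master_events)


def build_speed_view(recent_events):
    """Speed layer: count only events that arrived after the last batch run."""
    return Counter(event["page"] for event in recent_events)


def query(batch_view, speed_view, page):
    """Serving layer: merge both views to answer with up-to-date totals."""
    return batch_view[page] + speed_view[page]


if __name__ == "__main__":
    master = [{"page": "home"}, {"page": "home"}, {"page": "docs"}]
    recent = [{"page": "home"}]
    batch_view = build_batch_view(master)
    speed_view = build_speed_view(recent)
    print(query(batch_view, speed_view, "home"))
```

A Kappa architecture, by contrast, drops the separate batch layer and treats all data as a single stream that can be replayed when results need to be recomputed.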
Module 2: Processing Event Streams using Azure Stream Analytics
This module describes how to use Azure Stream Analytics to design and implement stream processing over large-scale data.
Lessons
- Introduction to Azure Stream Analytics
- Configuring Azure Stream Analytics jobs
Lab : Processing Event Streams with Azure Stream Analytics
- Create an Azure Stream Analytics job
- Create another Azure Stream job
- Add an Input
- Edit the ASA job
- Determine the nearest Patrol Car
After completing this module, students will be able to:
- Describe the purpose and structure of Azure Stream Analytics.
- Configure Azure Stream Analytics jobs for scalability, reliability and security.
Module 3: Performing custom processing in Azure Stream Analytics
This module describes how to include custom functions and incorporate machine learning activities into an Azure Stream Analytics job.
Lessons
- Implementing Custom Functions
- Incorporating Machine Learning into an Azure Stream Analytics Job
Lab : Performing Custom Processing with Azure Stream Analytics
- Add logic to the analytics
- Detect consistent anomalies
- Determine consistencies using machine learning and ASA
After completing this module, students will be able to:
- Describe how to create and use custom functions in Azure Stream Analytics.
- Describe how to use Azure Machine Learning models in an Azure Stream Analytics job.
Module 4: Managing Big Data in Azure Data Lake Store
This module describes how to use Azure Data Lake Store as a large-scale repository of data files.
Lessons
- Using Azure Data Lake Store
- Monitoring and protecting data in Azure Data Lake Store
Lab : Managing Big Data in Azure Data Lake Store
- Update the ASA Job
- Upload details to ADLS
After completing this module, students will be able to:
- Describe how to create an Azure Data Lake Store, create folders, and upload data.
- Explain how to monitor an Azure Data Lake account, and protect the data that it contains.
Module 5: Processing Big Data using Azure Data Lake Analytics
This module describes how to use Azure Data Lake Analytics to examine and process data held in Azure Data Lake Store.
Lessons
- Introduction to Azure Data Lake Analytics
- Analysing Data with U-SQL
- Sorting, grouping, and joining data
Lab : Processing Big Data using Azure Data Lake Analytics
- Add functionality
- Query against Database
- Calculate average speed
After completing this module, students will be able to:
- Describe the purpose of Azure Data Lake Analytics, and how to create and run jobs.
- Describe how to use U-SQL to process and analyse data.
- Describe how to use windowing to sort data and perform aggregated operations, and how to join data from multiple sources.
Module 6: Implementing custom operations and monitoring performance in Azure Data Lake Analytics
This module describes how to create and deploy custom functions and operations, integrate with Python and R, and protect and optimise jobs.
Lessons
- Incorporating custom functionality into Analytics jobs
- Managing and Optimising jobs
Lab : Implementing custom operations and monitoring performance in Azure Data Lake Analytics
- Custom extractor
- Custom processor
- Integration with R/Python
- Monitor and optimise a job
After completing this module, students will be able to:
- Describe how to incorporate custom features and assemblies into U-SQL.
- Describe how to implement security to protect jobs, and how to monitor and optimise jobs to ensure efficient operations.
Module 7: Implementing Azure SQL Data Warehouse
This module describes how to use Azure SQL Data Warehouse to create a repository that can support large-scale analytical processing over data at rest.
Lessons
- Introduction to Azure SQL Data Warehouse
- Designing tables for efficient queries
- Importing Data into Azure SQL Data Warehouse
Lab : Implementing Azure SQL Data Warehouse
- Create a new data warehouse
- Design and create tables and indexes
- Import data into the warehouse.
After completing this module, students will be able to:
- Describe the purpose and structure of Azure SQL Data Warehouse.
- Describe how to design tables to optimise the processing performed by the data warehouse.
- Describe tools and techniques for importing data into a warehouse at scale.
Module 8: Performing Analytics with Azure SQL Data Warehouse
This module describes how to import data in Azure SQL Data Warehouse, and how to protect this data.
Lessons
- Querying Data in Azure SQL Data Warehouse
- Maintaining Performance
- Protecting Data in Azure SQL Data Warehouse
Lab : Performing Analytics with Azure SQL Data Warehouse
- Performing queries and tuning performance
- Integrating with Power BI and Azure Machine Learning
- Configuring security and analysing threats
After completing this module, students will be able to:
- Describe how to perform queries and use the data held in a data warehouse to perform analytics and generate reports.
- Describe how to configure and monitor a data warehouse to maintain good performance.
- Describe how to protect data and manage security in a data warehouse.
Module 9: Automating the Data Flow with Azure Data Factory
This module describes how to use Azure Data Factory to import, transform, and transfer data between repositories and services.
Lessons
- Introduction to Azure Data Factory
- Transferring Data
- Transforming Data
- Monitoring Performance and Protecting Data
Lab : Automating the Data Flow with Azure Data Factory
- Automate the Data Flow with Azure Data Factory
After completing this module, students will be able to:
- Describe the purpose of Azure Data Factory, and explain how it works.
- Describe how to create Azure Data Factory pipelines that can transfer data efficiently.
- Describe how to perform transformations using an Azure Data Factory pipeline.
- Describe how to monitor Azure Data Factory pipelines, and how to protect the data flowing through these pipelines.
Exam Track
You'll sit the following exams at the Firebrand training centre during the course, covered by your Certification Guarantee:
70-775: Perform Data Engineering on Microsoft HDInsight
Skills measured:
- Administer and Provision HDInsight Clusters
- Implement Big Data Batch Processing Solutions
- Implement Big Data Interactive Processing Solutions
- Implement Big Data Real-Time Processing Solutions
70-776: Engineering Data with Microsoft Cloud Services - currently in beta
Skills measured:
- Design and Implement Complex Event Processing By Using Azure Stream Analytics (15-20%)
- Design and Implement Analytics by Using Azure Data Lake (25-30%)
- Design and Implement Azure SQL Data Warehouse Solutions (15-20%)
- Design and Implement Cloud-Based Integration by using Azure Data Factory (15-20%)
- Manage and Maintain Azure SQL Data Warehouse, Azure Data Lake, Azure Data Factory, and Azure Stream Analytics (20-25%)
This exam is currently in beta. If you pass it, you will receive full credit toward applicable certifications, but you will not receive a score report or pass/fail notification until 8-12 weeks following the conclusion of the beta period.
What's Included
Microsoft Official Curriculum
- MOC 20775A: Perform Data Engineering on Microsoft HDInsight
- MOC 20776A: Engineering Data with Microsoft Cloud Services
Prerequisites
Before attending this course, you must have knowledge of and skills in the following:
- Azure data services
- Microsoft Windows operating system and its core functionality
- Relational databases
- Programming using R, and familiarity with common R packages
- Common statistical methods and data analysis best practices.
Benefits
Seven reasons why you should sit your course with Firebrand Training
- Two options of training. Choose between residential classroom-based, or online courses
- You'll be certified fast. With us, you’ll be trained in record time
- Our course is all-inclusive. A one-off fee covers all course materials, exams**, accommodation* and meals*. No hidden extras.
- Pass the first time or train again for free. This is our guarantee. We’re confident you’ll pass your course the first time. But if not, come back within a year and only pay for accommodation, exams and incidental costs
- You’ll learn more. A day with a traditional training provider generally runs from 9 am – 5 pm, with a nice long break for lunch. With Firebrand Training you’ll get at least 12 hours/day of quality learning time, with your instructor
- You’ll learn faster. Chances are, you’ll have a different learning style to those around you. We combine visual, auditory and tactile styles to deliver the material in a way that ensures you will learn faster and more easily
- You’ll be studying with the best. We’ve been named in the Training Industry’s “Top 20 IT Training Companies of the Year” every year since 2010. As well as winning many more awards, we’ve trained and certified over 135,000 professionals
*For residential training only. Doesn't apply for online courses
**Some exceptions apply. Please refer to the Exam Track or speak with our experts
Think you are ready for the course? Take a FREE practice test to assess your knowledge!