From 3 days
Classroom
14/04/2025 (Monday)
Overview
On this accelerated 3-day Cloudera CCA Data Analyst course, you'll learn to apply traditional data analytics and business intelligence skills to big data.
Your expert instructor will introduce you to the tools and techniques you need to access, manipulate, transform, and analyse complex data sets using SQL and familiar scripting languages.
You'll learn topics such as:
- The features that Pig, Hive, and Impala offer for data acquisition, storage, and analysis
- The fundamentals of Apache Hadoop and data ETL (extract, transform, load), ingestion, and processing with Hadoop
- How Pig, Hive, and Impala improve productivity for typical analysis tasks
- Joining diverse datasets to gain valuable business insight
- Performing real-time, complex queries on datasets
Access to 24/7 labs means that you can test your hands-on skills in navigating the Hadoop ecosystem whenever you like. Through our unique Lecture | Lab | Review technique, you'll gain Apache Hadoop skills faster.
On this course, you'll prepare for and sit the CCA Data Analyst exam, covered by your Certification Guarantee.
If you're a data analyst, business intelligence specialist, developer, system architect or database administrator, this course is ideal for you.
Curriculum
Introduction
Apache Hadoop Fundamentals
- The Motivation for Hadoop
- Hadoop Overview
- Data Storage: HDFS
- Distributed Data Processing: YARN, MapReduce, and Spark
- Data Processing and Analysis: Pig, Hive, and Impala
- Database Integration: Sqoop
- Other Hadoop Data Tools
- Exercise Scenarios
Introduction to Apache Pig
- What is Pig?
- Pig's Features
- Pig Use Cases
- Interacting with Pig
Basic Data Analysis with Apache Pig
- Pig Latin Syntax
- Loading Data
- Simple Data Types
- Field Definitions
- Data Output
- Viewing the Schema
- Filtering and Sorting Data
- Commonly Used Functions
Processing Complex Data with Apache Pig
- Storage Formats
- Complex/Nested Data Types
- Grouping
- Built-In Functions for Complex Data
- Iterating Grouped Data
Multi-Dataset Operations with Apache Pig
- Techniques for Combining Datasets
- Joining Datasets in Pig
- Set Operations
- Splitting Datasets
Apache Pig Troubleshooting and Optimisation
- Troubleshooting Pig
- Logging
- Using Hadoop's Web UI
- Data Sampling and Debugging
- Performance Overview
- Understanding the Execution Plan
- Tips for Improving the Performance of Pig Jobs
Introduction to Apache Hive and Impala
- What is Hive?
- What is Impala?
- Why Use Hive and Impala?
- Schema and Data Storage
- Comparing Hive and Impala to Traditional Databases
- Use Cases
Querying with Apache Hive and Impala
- Databases and Tables
- Basic Hive and Impala Query Language Syntax
- Data Types
- Using Hue to Execute Queries
- Using Beeline (Hive's Shell)
- Using the Impala Shell
Apache Hive and Impala Data Management
- Data Storage
- Creating Databases and Tables
- Loading Data
- Altering Databases and Tables
- Simplifying Queries with Views
- Storing Query Results
Data Storage and Performance
- Partitioning Tables
- Loading Data into Partitioned Tables
- When to Use Partitioning
- Choosing a File Format
- Using Avro and Parquet File Formats
Relational Data Analysis with Apache Hive and Impala
- Joining Datasets
- Common Built-In Functions
- Aggregation and Windowing
Complex Data with Apache Hive and Impala
- Complex Data with Hive
- Complex Data with Impala
Analysing Text with Apache Hive and Impala
- Using Regular Expressions with Hive and Impala
- Processing Text Data with SerDes in Hive
- Sentiment Analysis and n-grams in Hive
Apache Hive Optimisation
- Understanding Query Performance
- Bucketing
- Indexing Data
- Hive on Spark
Apache Impala Optimisation
- How Impala Executes Queries
- Improving Impala Performance
Extending Apache Hive and Impala
- Custom SerDes and File Formats in Hive
- Data Transformation with Custom Scripts in Hive
- User-Defined Functions
- Parameterised Queries
Choosing the Best Tool for the Job
- Comparing Pig, Hive, Impala, and Relational Databases
Exam Track
On this course, you'll prepare for and take the following exam at the Firebrand Training centre, covered by your Certification Guarantee.
CCA Data Analyst Exam (CCA159)
- Number of questions: 8-12
- Format: performance-based
- Duration: 120 minutes
- Passing Score: 70%
What's Included
On this course, you'll receive:
- Official Cloudera Data Analyst courseware
Prerequisites
Before attending this course, you should have knowledge of:
- SQL
- Linux command line
- At least one scripting language (e.g., Bash, Perl, Python, Ruby)
You don't need prior experience with Apache Hadoop.
Benefits
Seven reasons why you should sit your course with Firebrand Training
- Two training options. Choose between residential classroom-based and online courses
- You'll be certified fast. With us, you’ll be trained in record time
- Our course is all-inclusive. A one-off fee covers all course materials, exams**, accommodation* and meals*. No hidden extras.
- Pass the first time or train again for free. This is our guarantee. We’re confident you’ll pass your course the first time. But if not, come back within a year and only pay for accommodation, exams and incidental costs
- You’ll learn more. A day with a traditional training provider generally runs from 9am–5pm, with a nice long break for lunch. With Firebrand, you’ll get at least 12 hours/day of quality learning time with your instructor
- You’ll learn faster. Chances are, you’ll have a different learning style to those around you. We combine visual, auditory and tactile styles to deliver the material in a way that ensures you will learn faster and more easily
- You’ll be studying with the best. We’ve been named in the Training Industry’s “Top 20 IT Training Companies of the Year” every year since 2010. As well as winning many more awards, we’ve trained and certified over 135,000 professionals
*For residential training only. Doesn't apply to online courses
**Some exceptions apply. Please refer to the Exam Track or speak with our experts
Are you ready for the course?
Get access to free practice tests for your course.
Course Dates
Sorry, there are currently no dates available for this course. Please submit an enquiry and one of our team will contact you about potential future dates or alternative options.