HDP Developer: Quick Start - Hortonworks Official Curriculum
Details
This 4-day training course is designed for developers who need to create applications that analyze Big Data stored in Apache Hadoop using Apache Pig and Apache Hive, and to develop applications on Apache Spark.
Topics include: an essential understanding of HDP and its capabilities, Hadoop, YARN, HDFS, MapReduce/Tez, data ingestion, using Pig and Hive to perform data analytics on Big Data, and an introduction to Spark Core, Spark SQL, Apache Zeppelin, and additional Spark features.
Outline
DAY 1: AN INTRODUCTION TO APACHE HADOOP AND HDFS
OBJECTIVES
- The Case for Hadoop
- The Hadoop Ecosystem
- The HDFS Architecture
- Ingesting Data Into HDFS
- Parallel Processing Fundamentals
- YARN Architecture
- Introduction to Apache Pig
LABS
- Starting an HDP Cluster
- Using HDFS Commands
- Demonstration: Understanding Apache Pig
- Getting Started with Apache Pig
- Exploring Data with Pig
DAY 2: ADVANCED APACHE PIG PROGRAMMING
OBJECTIVES
- Advanced Apache Pig Programming
- Introduction to Apache Hive
- Using HCatalog
LABS
- Splitting a Dataset
- Joining Datasets
- Preparing Data for Apache Hive
- Understanding Apache Hive Tables
- Demonstration: Understanding Partitions and Skew
- Analyzing Big Data with Apache Hive
- Demonstration: Computing Ngrams
- Joining Datasets in Apache Hive
- Computing NGrams of Emails in Avro Format
- Using HCatalog with Apache Pig
DAY 3: ADVANCED APACHE HIVE PROGRAMMING
OBJECTIVES
- Advanced Apache Hive Programming
- An Overview of Apache Zeppelin and Apache Spark
- An Introduction to RDD Programming
- An Introduction to Pair RDDs
LABS
- Advanced Apache Hive Programming
- Introduction to Apache Spark REPLs and Apache Zeppelin
- Creating and Manipulating RDDs
- Creating and Manipulating Pair RDDs
DAY 4: WORKING WITH PAIR RDDS AND BUILDING YARN APPLICATIONS
OBJECTIVES
- An Introduction to Pair RDDs (Continued)
- An Introduction to Spark SQL
- Caching and Persisting
- Building and Submitting Applications to YARN
LABS
- Creating and Saving DataFrames and Tables
- Working with DataFrames
- Building and Submitting Applications to YARN