Nanodegree key: nd029
Version: 3.0.0
Locale: en-us
Learn the latest skills to process data in real time by building fluency in modern data engineering tools, such as Apache Spark, Kafka, Spark Streaming, and Kafka Streams.
Content
Part 01 : Welcome to the Data Streaming Nanodegree Program
-
Module 01: Welcome to the Data Streaming Nanodegree Program
-
Lesson 01: Data Streaming Nanodegree Program Introduction
You are starting a challenging but rewarding journey! Take a few minutes to read how to get help with projects and content.
-
Lesson 02: Introduction to Data Streaming
Learn how to process data in real time by building fluency in modern data engineering tools, such as Apache Spark, Kafka, Spark Streaming, and Kafka Streams.
-
Lesson 03: Nanodegree Career Services
The Careers team at Udacity is here to help you move forward in your career - whether it's finding a new job, exploring a new career path, or applying new skills to your current job.
-
Part 02 : Data Ingestion with Kafka & Kafka Streaming
Learn to use REST Proxy, Kafka Connect, KSQL, and Faust Python stream processing, then apply these tools to stream public transit statuses through Kafka and the Kafka ecosystem, building a stream processing application that shows the status of trains in real time.
-
Module 01: Data Ingestion with Kafka & Kafka Streaming
-
Lesson 01: Introduction to Stream Processing
In this lesson, students will learn what data streaming is, weigh its pros and cons, and see how it compares to traditional data strategies.
- Concept 01: Welcome
- Concept 02: Lesson Glossary of Key Terms
- Concept 03: Intro to Stream Processing
- Concept 04: Stream Processing Examples
- Concept 05: Stream vs. Batch Processing
- Concept 06: Review: Stream Processing
- Concept 07: Append-Only Logs
- Concept 08: Log-structured storage
- Concept 09: Kafka: A Stream Processing Tool
- Concept 10: Kafka in Industry
- Concept 11: Kafka in Action
- Concept 12: What is a Kafka Topic?
- Concept 13: What is a Kafka Producer?
- Concept 14: What is a Kafka Consumer?
- Concept 15: Using the Kafka CLI tools
- Concept 16: Kafka Python Library
- Concept 17: Lesson Summary
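The append-only log introduced in Concepts 07-08 is the storage idea underlying Kafka topics. A minimal sketch in plain Python, assuming an illustrative `AppendOnlyLog` class and transit-style records invented for this example (not Kafka's actual implementation):

```python
class AppendOnlyLog:
    """Records are only ever appended; each gets a monotonically
    increasing offset, and reads never modify the log."""

    def __init__(self):
        self._records = []

    def append(self, value):
        offset = len(self._records)
        self._records.append(value)
        return offset

    def read(self, offset):
        # Consumers read from a given offset onward, much like a
        # Kafka consumer resuming from its committed offset.
        return self._records[offset:]

log = AppendOnlyLog()
log.append({"station": "union", "status": "on_time"})
log.append({"station": "grant", "status": "delayed"})

# Two independent "consumers" can read from different offsets
# without affecting each other or the stored data.
print(log.read(0))  # both records
print(log.read(1))  # only the second record
```

Because reads are just slices from an offset, many consumers can share one log, which is the property Kafka builds on.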
-
Lesson 02: Apache Kafka
In this lesson we’ll review the architecture and configuration of Apache Kafka.
- Concept 01: Lesson Overview
- Concept 02: Lesson Glossary of Key Terms
- Concept 03: Kafka Runs Anywhere
- Concept 04: How Kafka Stores Data
- Concept 05: Kafka Data Partitioning
- Concept 06: Kafka Data Replication
- Concept 07: Explore How Kafka Works
- Concept 08: Summary: How Kafka Works
- Concept 09: Kafka Topics in Depth
- Concept 10: Partitioning Topics
- Concept 11: Review: Topic Partitioning
- Concept 12: Naming Kafka Topics
- Concept 13: Topic Data Management
- Concept 14: Topic Creation
- Concept 15: Kafka Topics Summary
- Concept 16: Kafka Producers
- Concept 17: Synchronous Production
- Concept 18: Asynchronous Production
- Concept 19: Message Serialization
- Concept 20: Synchronous or Asynchronous?
- Concept 21: Producer Configuration
- Concept 22: Review: Producer Configuration
- Concept 23: Batching Configuration
- Concept 24: Configure a Producer
- Concept 25: Kafka Producers Summary
- Concept 26: Kafka Consumers
- Concept 27: Consumer Offsets
- Concept 28: Practice with Consumer Offsets
- Concept 29: Consumer Groups
- Concept 30: Consumer Subscriptions
- Concept 31: Consumer Deserializers
- Concept 32: Retrieving Data from Kafka
- Concept 33: Build a Kafka Consumer
- Concept 34: Kafka Consumers Summary
- Concept 35: Consumer Performance
- Concept 36: Producer Performance
- Concept 37: Broker Performance
- Concept 38: Performance Summary
- Concept 39: Record Removal; Data Privacy
- Concept 40: Per-User Key Encryption
- Concept 41: Review: Record Removal & Data Privacy
- Concept 42: Summary: Record Removal & Data Privacy
- Concept 43: Lesson Summary
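Partitioning (Concepts 05 and 10) can be illustrated by hashing a message key to pick a partition. Kafka's Java client defaults to a murmur2 hash; the sketch below substitutes `hashlib.md5` purely for a deterministic illustration, and the key names are hypothetical:

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    # Hash the message key and take it modulo the partition count.
    # (Kafka's default partitioner uses murmur2; md5 here is only
    # for a self-contained, deterministic example.)
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

# All messages with the same key land on the same partition,
# which is what preserves per-key ordering within a topic.
p1 = partition_for("train.42", 10)
p2 = partition_for("train.42", 10)
assert p1 == p2
print(p1)
```

Unkeyed messages are instead spread across partitions (round-robin or sticky batching, depending on the client), trading per-key ordering for balance.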
-
Lesson 03: Data Schemas and Apache Avro
This lesson covers data schemas and data schema management, with a focus on Apache Avro.
- Concept 01: Intro to Data Schemas
- Concept 02: Lesson Glossary of Key Terms
- Concept 03: Understanding Data Schemas
- Concept 04: Real-world Usage
- Concept 05: Data Streaming w/o Schemas
- Concept 06: Data Streaming with Schemas
- Concept 07: Summary: Data Schemas
- Concept 08: Apache Avro
- Concept 09: What is Apache Avro?
- Concept 10: How Avro Schemas are Defined
- Concept 11: Practice: Defining an Avro Record
- Concept 12: Apache Avro Data Types
- Concept 13: Complex Records in Avro
- Concept 14: Apache Avro Summary
- Concept 15: Apache Avro and Kafka
- Concept 16: Schema Registry
- Concept 17: Kafka - Schema Registry Integration
- Concept 18: Review: Kafka - Schema Registry Integration
- Concept 19: Integrating Schema Registry
- Concept 20: Schema Registry Summary
- Concept 21: Schema Evolution; Compatibility
- Concept 22: Understanding Schema Evolution
- Concept 23: Schema Compatibility
- Concept 24: Backward Compatibility
- Concept 25: Forward Compatibility
- Concept 26: Full Compatibility
- Concept 27: No Compatibility
- Concept 28: Summary: Schema Evolution & Compatibility
- Concept 29: Lesson Summary
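An Avro schema (Concept 10) is itself a JSON document, so it can be written and inspected with the standard library alone. The `train_arrival` record below is a hypothetical schema invented for illustration; real Avro libraries would use it to drive binary (de)serialization:

```python
import json

schema = {
    "type": "record",
    "name": "train_arrival",
    "namespace": "com.example.transit",
    "fields": [
        {"name": "station", "type": "string"},
        {"name": "delay_seconds", "type": "int"},
        # A union with null plus a default makes the field optional,
        # which is central to backward-compatible evolution (Concept 24).
        {"name": "line", "type": ["null", "string"], "default": None},
    ],
}

print(json.dumps(schema, indent=2))

# A naive structural check; an Avro library would validate types too.
field_names = {f["name"] for f in schema["fields"]}
record = {"station": "union", "delay_seconds": 120, "line": "red"}
assert set(record) <= field_names
```

Adding new optional fields with defaults, as `line` does here, is the usual path to evolving a schema without breaking existing consumers.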
-
Lesson 04: Kafka Connect and REST Proxy
This lesson covers producing data into and consuming data from Kafka with Kafka Connect and REST Proxy.
- Concept 01: Lesson Overview
- Concept 02: Lesson Glossary of Key Terms
- Concept 03: Kafka Connect
- Concept 04: Kafka Connect Architecture
- Concept 05: Kafka Connect Summary
- Concept 06: Kafka Connect Connectors
- Concept 07: Reviewing Kafka Connectors
- Concept 08: The Kafka Connect API
- Concept 09: Using the Kafka Connect API
- Concept 10: Summary: Kafka Connect Connectors
- Concept 11: Key Connectors
- Concept 12: Kafka Connect FileStream Source
- Concept 13: JDBC Sinks and Sources
- Concept 14: Kafka Connect JDBC Source
- Concept 15: Key Connectors Summary
- Concept 16: Kafka REST Proxy
- Concept 17: REST Proxy Architecture
- Concept 18: Review: REST Proxy Architecture
- Concept 19: Practice: REST Proxy Metadata API
- Concept 20: REST Proxy Summary
- Concept 21: Using REST Proxy
- Concept 22: Review of REST Proxy Usage
- Concept 23: Producing JSON Data via REST Proxy
- Concept 24: Producing Avro Data via REST Proxy
- Concept 25: Consuming Data with REST Proxy
- Concept 26: Practice: Consuming Avro Data via REST Proxy
- Concept 27: Summary: Using REST Proxy
- Concept 28: Lesson Summary
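Producing JSON data via REST Proxy (Concept 23) means POSTing a JSON envelope to a topic endpoint. The sketch below only builds the request, it does not send it; the topic name, host, and record contents are hypothetical, while the envelope shape and `application/vnd.kafka.json.v2+json` content type follow the v2 API:

```python
import json

topic = "transit.train.status"
url = f"http://localhost:8082/topics/{topic}"  # assumed local REST Proxy
headers = {"Content-Type": "application/vnd.kafka.json.v2+json"}

# The v2 produce body wraps one or more records; an optional "key"
# routes a record to a consistent partition.
body = json.dumps({
    "records": [
        {"value": {"station": "union", "status": "on_time"}},
        {"key": "train.42",
         "value": {"station": "grant", "status": "delayed"}},
    ]
})

print(url)
print(body)
```

Sending it is then a plain HTTP POST (with `requests` or `urllib.request`), which is exactly why REST Proxy suits clients that cannot speak the native Kafka protocol.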
-
Lesson 05: Stream Processing Fundamentals
Learn to build real-time applications that process events as they arrive, covering stream processing state storage, windowed processing, and stateful and stateless stream processing.
- Concept 01: Lesson Overview
- Concept 02: Glossary of Terms for Lesson
- Concept 03: Stream Processing Basics
- Concept 04: Stream Processing Strategies
- Concept 05: Combining Streams
- Concept 06: Filtering Streams
- Concept 07: Remapping Streams
- Concept 08: Aggregating Streams
- Concept 09: Handling Time
- Concept 10: Tumbling Window
- Concept 11: Hopping Window
- Concept 12: Sliding Window
- Concept 13: Streams and Tables
- Concept 14: Streams
- Concept 15: Tables
- Concept 16: Streams vs Tables
- Concept 17: Data Storage
- Concept 18: Lesson Summary
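A tumbling window (Concept 10) slices time into fixed, non-overlapping buckets. The per-window count below is a minimal plain-Python sketch of the idea, using invented `(timestamp, payload)` events:

```python
from collections import defaultdict

def tumbling_counts(events, window_size):
    """Group (timestamp, value) events into fixed, non-overlapping
    windows of `window_size` seconds and count events per window."""
    counts = defaultdict(int)
    for ts, _value in events:
        # Floor the timestamp to its window's start boundary.
        window_start = ts - (ts % window_size)
        counts[window_start] += 1
    return dict(counts)

# Hypothetical events: (epoch_seconds, payload)
events = [(0, "a"), (5, "b"), (12, "c"), (14, "d"), (29, "e")]
print(tumbling_counts(events, 10))  # {0: 2, 10: 2, 20: 1}
```

Because windows never overlap, each event contributes to exactly one bucket; hopping and sliding windows relax that property.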
-
Lesson 06: Stream Processing with Faust
Students will learn how to use the Python stream processing library Faust to rapidly create powerful stream processing applications.
- Concept 01: Lesson Overview
- Concept 02: Glossary of Terms in Lesson
- Concept 03: Stream Processing with Faust
- Concept 04: Introduction to Faust
- Concept 05: Your First Faust Application
- Concept 06: Serialization and Deserialization in Faust
- Concept 07: Deserialization in Faust
- Concept 08: Serialization in Faust
- Concept 09: Summary: Serialization & Deserialization
- Concept 10: Storage in Faust
- Concept 11: Streams Basics in Faust
- Concept 12: Practice: Streams
- Concept 13: Stream Processors & Operations
- Concept 14: Practice: Processors & Operations
- Concept 15: Streams Summary
- Concept 16: Tables in Faust
- Concept 17: Windowing in Faust
- Concept 18: Practice: Tumbling Windows
- Concept 19: Practice: Hopping Windows
- Concept 20: Faust Windowing Review
- Concept 21: Lesson Summary
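Faust expresses hopping windows (Concept 19) through windowed tables; before reaching for the library, the underlying window-membership math can be sketched in plain Python. The function name and parameters below are illustrative, not Faust's API:

```python
def hopping_windows(ts, size, hop):
    """Return the start times of every hopping window that contains
    timestamp `ts` (window length `size`, advancing every `hop`)."""
    # The latest window containing ts starts at floor(ts / hop) * hop;
    # earlier windows still contain ts while their start > ts - size.
    last_start = (ts // hop) * hop
    starts = []
    start = last_start
    while start > ts - size:
        starts.append(start)
        start -= hop
    return sorted(starts)

# A 10-second window hopping every 5 seconds: each event
# falls into two overlapping windows.
print(hopping_windows(12, size=10, hop=5))  # [5, 10]
```

When `hop == size` the windows stop overlapping and this degenerates into the tumbling case, with every event in exactly one window.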
-
Lesson 07: KSQL
Learn how to write simple SQL queries to turn Kafka topics into KSQL streams and tables, and then write those tables back out to Kafka.
- Concept 01: Lesson Overview
- Concept 02: Glossary of Terms for Lesson
- Concept 03: Stream Processing with KSQL
- Concept 04: Introduction to KSQL
- Concept 05: KSQL Architecture
- Concept 06: KSQL vs Traditional Frameworks
- Concept 07: Topics --> Tables & Streams
- Concept 08: Creating a Stream
- Concept 09: Practice: Creating a Stream
- Concept 10: Creating a Table
- Concept 11: Practice: Creating a Table
- Concept 12: Querying
- Concept 13: Hopping and Tumbling Windowing
- Concept 14: Session Windowing
- Concept 15: Practice: Session Windowing
- Concept 16: Aggregating Data
- Concept 17: Practice: Aggregating Data
- Concept 18: Joins
- Concept 19: Practice: Joins
- Concept 20: Summary
-
Lesson 08: Optimizing Public Transportation
For your first project, you’ll be streaming public transit statuses using Kafka and the Kafka ecosystem to build a stream processing application that shows the status of trains in real time.
-
Lesson 09: Optimize Your GitHub Profile
Other professionals are collaborating on GitHub and growing their network. Submit your profile to ensure it is on par with leaders in your field.
- Concept 01: Prove Your Skills With GitHub
- Concept 02: Introduction
- Concept 03: GitHub profile important items
- Concept 04: Good GitHub repository
- Concept 05: Interview with Art - Part 1
- Concept 06: Identify fixes for example “bad” profile
- Concept 07: Quick Fixes #1
- Concept 08: Quick Fixes #2
- Concept 09: Writing READMEs with Walter
- Concept 10: Interview with Art - Part 2
- Concept 11: Commit messages best practices
- Concept 12: Reflect on your commit messages
- Concept 13: Participating in open source projects
- Concept 14: Interview with Art - Part 3
- Concept 15: Participating in open source projects 2
- Concept 16: Starring interesting repositories
- Concept 17: Next Steps
-
Part 03 : Apache Spark and Spark Streaming
-
Module 01: Apache Spark and Spark Streaming
-
Lesson 01: The Power of Spark
In this lesson, you will learn about the problems that Apache Spark is designed to solve. You'll also learn about the greater Big Data ecosystem and how Spark fits into it.
- Concept 01: Introduction
- Concept 02: What is Big Data?
- Concept 03: Numbers Everyone Should Know
- Concept 04: Hardware: CPU
- Concept 05: Hardware: Memory
- Concept 06: Hardware: Storage
- Concept 07: Hardware: Network
- Concept 08: Hardware: Key Ratios
- Concept 09: Small Data Numbers
- Concept 10: Big Data Numbers
- Concept 11: Medium Data Numbers
- Concept 12: History of Distributed Computing
- Concept 13: The Hadoop Ecosystem
- Concept 14: MapReduce
- Concept 15: Hadoop MapReduce [Demo]
- Concept 16: The Spark Cluster
- Concept 17: Spark Use Cases
- Concept 18: Summary
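The MapReduce model (Concept 14) can be shown end to end as a toy word count in plain Python: map emits `(word, 1)` pairs, shuffle groups the pairs by key, and reduce sums each group. The phase names and sample lines are illustrative:

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Emit one (word, 1) pair per word in the line.
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    # Group all values for the same key together.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Combine each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}

lines = ["spark makes big data simple", "big data big results"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = reduce_phase(shuffle(pairs))
print(counts)  # 'big' appears 3 times across both lines
```

In Hadoop or Spark the same three phases run in parallel across a cluster, with the shuffle moving data between machines; the logic per phase is unchanged.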
-
Lesson 02: Data Wrangling with Spark
In this lesson, we'll dive into how to use Spark for cleaning and aggregating data.
- Concept 01: Introduction
- Concept 02: Functional Programming
- Concept 03: Why Use Functional Programming
- Concept 04: Procedural Example
- Concept 05: Procedural [Example Code]
- Concept 06: Pure Functions in the Bread Factory
- Concept 07: The Spark DAGs: Recipe for Data
- Concept 08: Maps and Lambda Functions
- Concept 09: Maps and Lambda Functions [Example Code]
- Concept 10: Data Formats
- Concept 11: Distributed Data Stores
- Concept 12: SparkSession
- Concept 13: Reading and Writing Data into Spark Data Frames
- Concept 14: Read and Write Data into Spark Data Frames [example code]
- Concept 15: Imperative vs Declarative programming
- Concept 16: Data Wrangling with DataFrames
- Concept 17: Data Wrangling with DataFrames Extra Tips
- Concept 18: Data Wrangling with Spark [Example Code]
- Concept 19: Quiz - Data Wrangling with DataFrames
- Concept 20: Quiz - Data Wrangling with DataFrames Jupyter Notebook
- Concept 21: Quiz [Solution Code]
- Concept 22: Spark SQL
- Concept 23: Example Spark SQL
- Concept 24: Example Spark SQL [Example Code]
- Concept 25: Quiz - Data Wrangling with SparkSQL
- Concept 26: Quiz [Spark SQL Solution Code]
- Concept 27: RDDs
- Concept 28: Summary
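The functional style this lesson builds on (Concepts 02-09) works in plain Python before it works in Spark: pure functions, `map`, and lambdas. The song titles below are just sample data for illustration:

```python
songs = ["despacito", "nice for what", "no tears left to cry"]

# A pure function: its output depends only on its input,
# and it never mutates shared state - which is what lets
# Spark run it safely on many machines at once.
def uppercase(name):
    return name.upper()

upper = list(map(uppercase, songs))
lengths = list(map(lambda name: len(name), songs))

print(upper)
print(lengths)  # [9, 13, 20]
```

Spark's `rdd.map(...)` takes exactly this kind of function; the difference is that Spark distributes the calls across a cluster instead of a single list.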
-
Lesson 03: Intro to Spark Streaming
In this lesson, students will learn what Apache Spark Streaming is, review the core architecture of Spark, and distinguish between Spark Streaming and Structured Streaming.
- Concept 01: Welcome to Data Streaming
- Concept 02: Lesson Overview
- Concept 03: Introduce Spark Ecosystem
- Concept 04: Review: Spark Ecosystem
- Concept 05: Intro to Spark RDDs
- Concept 06: Partitioning in Spark
- Concept 07: DataFrames and Datasets
- Concept 08: Create RDD and DataFrame
- Concept 09: Intro to Spark Streaming
- Concept 10: Review: Spark Streaming & DStream
- Concept 11: Spark Structured Streaming
- Concept 12: State Management
- Concept 13: Spark UI / DAGs
- Concept 14: More on the Spark UI
- Concept 15: Spark Stages
- Concept 16: Establishing Schema
- Concept 17: Lesson Recap
-
Lesson 04: Structured Streaming APIs
In this lesson, we’ll go over commonly used functions in RDD/DataFrame/Dataset. We’ll continue to learn about Spark Streaming APIs and how you can use them to solve real-time analytic problems.
- Concept 01: Lesson Overview
- Concept 02: RDD/DataFrame Functions
- Concept 03: Lazy Evaluation
- Concept 04: Transformations
- Concept 05: Actions
- Concept 06: Transformations in DAGs
- Concept 07: Structured Streaming APIs
- Concept 08: Watermark
- Concept 09: Spark UI for Query Plans
- Concept 10: Output Sinks
- Concept 11: Checkpointing Data
- Concept 12: State Store and Checkpointing
- Concept 13: Configurations/Tuning of Spark Clusters
- Concept 14: Lesson Recap
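Lazy evaluation (Concept 03) has a close analogue in Python generators: like Spark transformations, chaining them does no work until an "action" consumes the result. A small sketch, with an illustrative `log` list to make the deferred execution visible:

```python
log = []

def numbers():
    # Each value records that it was produced, so we can see
    # exactly when computation actually happens.
    for n in range(5):
        log.append(f"produced {n}")
        yield n

# Chained "transformations": building the pipeline does no work yet,
# just like chaining filter/map on a Spark DataFrame.
pipeline = (n * n for n in numbers() if n % 2 == 0)
assert log == []  # nothing has been produced so far

# The "action" (materializing into a list) triggers the whole chain.
result = list(pipeline)
print(result)   # [0, 4, 16]
print(len(log)) # 5 - all inputs were produced only now
```

Spark exploits the same deferral to fuse chained transformations into one optimized DAG before any data moves.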
-
Lesson 05: Integration of Spark Streaming and Kafka
In this lesson, students will learn core components in integrating Spark Streaming and Kafka.
- Concept 01: Lesson Overview
- Concept 02: Create a Kafka Producer Server
- Concept 03: Kafka Data Source API
- Concept 04: Kafka Offsets in Spark
- Concept 05: Exercise: Offsets & Triggers
- Concept 06: Integrating Spark and Kafka
- Concept 07: Logs with Spark Console & UI
- Concept 08: Progress Reports in Spark Console
- Concept 09: Review: Progress Report Fields
- Concept 10: Kafka-Spark Integration Pt. 1
- Concept 11: Kafka-Spark Integration Pt. 2
- Concept 12: Performance Tuning
- Concept 13: Lesson and Course Wrap-Up
-
Lesson 06: SF Crime Statistics with Spark Streaming
In this project, you will analyze a real-world dataset of the SF crime rate, extracted from Kaggle, to provide statistical analysis using Apache Spark Structured Streaming.
Project Description - SF Crime Statistics with Spark Streaming
-
Lesson 07: Take 30 Min to Improve your LinkedIn
Find your next job or connect with industry peers on LinkedIn. Ensure your profile attracts relevant leads that will grow your professional network.
- Concept 01: Get Opportunities with LinkedIn
- Concept 02: Use Your Story to Stand Out
- Concept 03: Why Use an Elevator Pitch
- Concept 04: Create Your Elevator Pitch
- Concept 05: Use Your Elevator Pitch on LinkedIn
- Concept 06: Create Your Profile With SEO In Mind
- Concept 07: Profile Essentials
- Concept 08: Work Experiences & Accomplishments
- Concept 09: Build and Strengthen Your Network
- Concept 10: Reaching Out on LinkedIn
- Concept 11: Boost Your Visibility
- Concept 12: Up Next
-
Part 04 (Elective): Apache Spark and Spark Streaming (Updated)
-
Module 01: Apache Spark and Spark Streaming (Updated)
-
Lesson 01: Welcome
Introduction to Data Streaming with Spark
- Concept 01: Meet Your Instructor
- Concept 02: Course Outline
- Concept 03: Lesson Outline
- Concept 04: Introduction to Data Streaming
- Concept 05: Why Data Streaming Is Important
- Concept 06: Business Stakeholders
- Concept 07: When To Use Data Streaming
- Concept 08: History of Data Streaming
- Concept 09: Tools & Environment
- Concept 10: Workspace Setup Outside of Classroom
- Concept 11: Project: Evaluate Human Balance with Spark Streaming
- Concept 12: Lesson Recap
- Concept 13: Let's Get Started!
-
Lesson 02: Streaming Dataframes, Views, and Spark SQL
In this lesson, you'll learn about working with Spark Dataframes and views.
- Concept 01: Lesson Overview
- Concept 02: Importance of Spark
- Concept 03: Spark Clusters
- Concept 04: Walkthrough 1: Start a Spark Cluster
- Concept 05: Quiz: Spark Clusters
- Concept 06: Exercise 1: Spark Clusters
- Concept 07: Solution: Spark Clusters
- Concept 08: Create a Spark Streaming Dataframe with a Kafka Source
- Concept 09: Walkthrough 2: Create a Spark Streaming Dataframe
- Concept 10: Quiz: Create a Spark Streaming Dataframe
- Concept 11: Exercise : Create a Spark Streaming Dataframe
- Concept 12: Solution 2: Create a Spark Streaming Dataframe
- Concept 13: Spark Views
- Concept 14: Walkthrough 3: Query a Spark View
- Concept 15: Quiz: Query a Spark View
- Concept 16: Exercise: Spark Views
- Concept 17: Solution 3: Spark Views
- Concept 18: Edge Cases
- Concept 19: Glossary
- Concept 20: Further Reading
- Concept 21: Lesson Review
-
Lesson 03: Joins and JSON
In this lesson, you'll learn how to work with JSON and perform joins for data streaming.
- Concept 01: Lesson Overview
- Concept 02: Why is JSON Important?
- Concept 03: Why JSON and Joins are Important
- Concept 04: Parse a JSON Payload
- Concept 05: Walkthrough 1: Parse a JSON Payload
- Concept 06: Quiz: Parse a JSON payload into separate fields for analysis
- Concept 07: Exercise: Parse a JSON Payload
- Concept 08: Solution 1: Parse a JSON Payload
- Concept 09: Join Streaming Dataframes from Different Datasources
- Concept 10: Walkthrough 2: Join Streaming Dataframes
- Concept 11: Quiz: Join Streaming Dataframes from Different Datasources
- Concept 12: Exercise 2: Join Streaming Dataframes
- Concept 13: Solution: Join Streaming Dataframes
- Concept 14: Sink to Kafka
- Concept 15: Walkthrough 3: Sink to Kafka
- Concept 16: Quiz: Sink to Kafka
- Concept 17: Exercise: Sink to Kafka
- Concept 18: Solution 3: Sink to Kafka
- Concept 19: Edge Cases
- Concept 20: Glossary
- Concept 21: Further Reading
- Concept 22: Lesson Review
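Parsing a JSON payload and joining two streams by key (Concepts 04 and 09) can be sketched over plain lists with the standard library; Spark performs the same join over streaming dataframes. The payloads and field names below are invented for illustration:

```python
import json

trains_json = '[{"train_id": 42, "status": "delayed"}]'
stations_json = '[{"train_id": 42, "station": "union"}]'

# Parse each "stream" from its raw JSON payload.
trains = json.loads(trains_json)
stations = json.loads(stations_json)

# An inner join on train_id, the same shape a streaming
# dataframe join produces.
by_id = {s["train_id"]: s for s in stations}
joined = [
    {**t, "station": by_id[t["train_id"]]["station"]}
    for t in trains
    if t["train_id"] in by_id
]
print(joined)
```

The streaming version adds what the in-memory sketch does not need: watermarks and time bounds, so the join's state does not grow forever.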
-
Lesson 04: Redis, Base64 and JSON
This lesson will focus on working with Redis, Base64, and JSON in Data Streaming.
- Concept 01: Lesson Outline
- Concept 02: Why are Redis and Base64 Important?
- Concept 03: How an Expert Thinks About Redis and Base64
- Concept 04: Manually Save and Read with Redis and Kafka
- Concept 05: Walkthrough 1: Manually Save and Read with Redis and Kafka
- Concept 06: Quiz: Manually Save and Read with Redis and Kafka
- Concept 07: Exercise: Manually Save and Read with Redis and Kafka
- Concept 08: Solution 1: Manually Save and Read with Redis and Kafka
- Concept 09: Parse Base64 Encoded Information
- Concept 10: Walkthrough 2: Parse Base64 Encoded Information
- Concept 11: Quiz: Parse Base64 Encoded Information
- Concept 12: Exercise 2: Parse Base64 Encoded Information
- Concept 13: Solution: Parse Base64 Encoded Information
- Concept 14: Sink a Subset of JSON
- Concept 15: Walkthrough 3: Sink a Subset of JSON Fields
- Concept 16: Quiz: Sink a Subset of JSON
- Concept 17: Exercise 3: Sink a Subset of JSON
- Concept 18: Solution 3: Sink a Subset of JSON
- Concept 19: Edge Cases
- Concept 20: Quiz: Edge Cases
- Concept 21: Bringing it all Together Walkthrough
- Concept 22: Final Exercise
- Concept 23: Solution 4: Final Exercise
- Concept 24: Glossary
- Concept 25: Further Reading
- Concept 26: Lesson and Course Recap
- Concept 27: Congratulations
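Parsing base64-encoded information (Concepts 09-13) is a round trip the standard library covers on its own. The payload below is invented for illustration; the pattern matches events whose values arrive base64-encoded over Kafka:

```python
import base64
import json

original = {"customer": "Sam.Test@example.com", "score": 4.5}

# Producer side: serialize to JSON, then base64-encode the bytes.
encoded = base64.b64encode(
    json.dumps(original).encode("utf-8")
).decode("ascii")
print(encoded)

# Consumer side: base64-decode first, then parse the JSON inside.
decoded = json.loads(base64.b64decode(encoded).decode("utf-8"))
assert decoded == original
print(decoded["customer"])
```

The order matters: base64 wraps the JSON, so decoding must unwrap base64 before handing the bytes to the JSON parser.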
-