A structured 3-month data engineering program focused on building practical, job-ready skills through a mix of core concepts, hands-on tools, and real-world projects. The curriculum covers data pipelines, ETL and ELT processes, modern data infrastructure, data modelling, orchestration, data quality, and testing. Learners gain experience with industry-relevant tools and complete an end-to-end capstone project to demonstrate their ability to design, build, and maintain data pipelines.
Duration: 3 months
Month 1: Foundations and Tools
Week 1: Introduction to Data Engineering and Ecosystem
• Day 1: Roles and Responsibilities
o What is a Data Engineer?
▪ Overview of the role within the data ecosystem.
▪ Difference from Data Scientists and Analysts. (Optional)
▪ Key deliverables: pipelines, infrastructure, and scalability.
o Core Responsibilities:
▪ Building and maintaining data pipelines.
▪ Data integration, transformation, and storage.
▪ Supporting downstream analytics and ML workflows.
o Scope in the Real World:
▪ Demand for Data Engineers in the industry.
▪ Career paths and growth opportunities.
• Day 2: Key Concepts
o What is Data?
▪ Types: Structured, Semi-structured, Unstructured.
▪ Formats: JSON, CSV, Parquet, Avro.
o Data Pipelines Overview:
▪ What are data pipelines and their role in the data ecosystem?
o ETL (Extract, Transform, Load):
▪ Why ETL is foundational to data workflows.
▪ Example: Moving data from APIs to databases.
o Hands-On Introduction:
▪ Setting up Airbyte for simple data extraction.
▪ Setting up Minio as a destination.
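Airbyte and Minio are configured through their own UIs and connectors; as a hedged stand-in for that setup, the following minimal Python sketch shows the same extract-and-land pattern. The API URL, bucket name, and credentials are placeholders, not part of the course material: the sketch pulls JSON from an HTTP API with requests and writes it into a Minio bucket through the S3-compatible boto3 client.

import json

import boto3          # S3-compatible client; Minio speaks the S3 API
import requests

# Hypothetical source API and Minio settings -- placeholders, replace with real values.
API_URL = "https://api.example.com/orders"
MINIO_ENDPOINT = "http://localhost:9000"
BUCKET = "raw-zone"

# Extract: pull raw records from the API.
response = requests.get(API_URL, timeout=30)
response.raise_for_status()
records = response.json()

# Load: write the raw JSON into Minio under a date-based key.
s3 = boto3.client(
    "s3",
    endpoint_url=MINIO_ENDPOINT,
    aws_access_key_id="minioadmin",      # default Minio dev credentials
    aws_secret_access_key="minioadmin",
)
s3.put_object(
    Bucket=BUCKET,
    Key="orders/2024-01-01/orders.json",
    Body=json.dumps(records).encode("utf-8"),
)

In the actual exercise, Airbyte performs the extraction and Minio provides the bucket; the sketch only illustrates the flow of data.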
Week 2: ETL vs ELT
• Day 1: ELT and Reverse ETL
o What is ELT?
▪ Difference in process (Transformation post-load).
▪ Use cases: Modern data platforms like Snowflake.
o Reverse ETL:
▪ Bringing transformed data back to operational systems.
▪ Examples: Sending processed data back to Salesforce or other CRMs.
o Key Differences:
▪ Use case comparisons.
▪ Cost and performance implications.
• Day 2: Tools and Implementation
o Hands-On ETL Tools:
▪ Extraction using Airbyte.
▪ Loading to Minio using Iceberg.
▪ Transformation using dbt:
▪ Building simple transformations.
▪ Writing SQL models.
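dbt models themselves are SQL files run over the loaded data. Purely to illustrate the shape of such a transformation in the program's other language, here is a hedged pandas sketch of a staging model plus a simple aggregate; the file paths and column names (orderId, orderTotal, customer_id) are invented for the example.

import pandas as pd

# Hypothetical raw extract landed earlier in the pipeline (path/columns are placeholders).
raw = pd.read_json("raw/orders.json")

# Staging-style cleanup: rename and cast, much like a dbt staging model.
stg_orders = (
    raw.rename(columns={"orderId": "order_id", "orderTotal": "order_total"})
       .assign(order_total=lambda df: df["order_total"].astype(float))
)

# A simple "mart" aggregate: revenue per customer.
fct_revenue = (
    stg_orders.groupby("customer_id", as_index=False)["order_total"].sum()
              .rename(columns={"order_total": "total_revenue"})
)
fct_revenue.to_parquet("marts/fct_revenue.parquet", index=False)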
Week 3: Data Infrastructure
• Day 1: Databases
o Relational Databases:
▪ Core concepts: Tables, indexes, primary keys, foreign keys (see the sketch after this list).
▪ Common tools: MySQL, PostgreSQL.
o NoSQL Databases:
▪ Core concepts: Key-value stores, document stores, graph databases.
▪ Use cases: MongoDB, Cassandra.
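To make the relational concepts above concrete, here is a small self-contained sketch using Python's built-in sqlite3 module (the customers/accounts tables are invented for the example): it creates two tables linked by a foreign key, adds an index, and joins them.

import sqlite3

# In-memory database purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs with this pragma

# A parent table with a primary key.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")

# A child table whose foreign key references the parent.
conn.execute("""
    CREATE TABLE accounts (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        balance     REAL DEFAULT 0
    )
""")

# An index to speed up lookups by customer.
conn.execute("CREATE INDEX idx_accounts_customer ON accounts(customer_id)")

conn.execute("INSERT INTO customers VALUES (1, 'Alice')")
conn.execute("INSERT INTO accounts VALUES (10, 1, 250.0)")
print(conn.execute(
    "SELECT c.name, a.balance FROM customers c JOIN accounts a USING (customer_id)"
).fetchall())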
• Day 2: Data Lakes vs Warehouses
o What is a Data Lake?
▪ Characteristics: Raw data storage, schema-on-read.
▪ Use cases and tools: Hadoop, Iceberg.
o What is a Data Warehouse?
▪ Characteristics: Structured data, schema-on-write.
▪ Use cases and tools: Redshift, Snowflake, BigQuery.
o Key Differences:
▪ Scalability, cost, and performance.
Week 4: Orchestration Tools
• Day 1: Airflow Basics
o What is Orchestration?
▪ Need for scheduling and automation.
▪ Overview of Apache Airflow.
o Core Components:
▪ DAGs (Directed Acyclic Graphs).
▪ Tasks and dependencies.
• Day 2: Hands-On with Airflow
o Setting up Airflow locally.
o Creating a simple DAG:
▪ Tasks for data extraction, transformation, and loading.
▪ Monitoring DAG runs.
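Assuming Airflow 2.x, a minimal version of the Day 2 DAG could look like the sketch below; the task bodies and schedule are placeholders rather than the course's exact exercise.

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholder task callables -- a real pipeline would call Airbyte, dbt, etc.
def extract():
    print("extracting data")

def transform():
    print("transforming data")

def load():
    print("loading data")

with DAG(
    dag_id="simple_etl",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",    # older Airflow versions use schedule_interval instead
    catchup=False,
) as dag:
    t_extract = PythonOperator(task_id="extract", python_callable=extract)
    t_transform = PythonOperator(task_id="transform", python_callable=transform)
    t_load = PythonOperator(task_id="load", python_callable=load)

    # Dependencies: extract -> transform -> load
    t_extract >> t_transform >> t_load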
Month 2: Data Modelling and Advanced Pipelines
Week 5: Data Modelling Concepts
• Day 1: Introduction to Data Modelling
o Banking Domain Overview:
▪ Types of data: Transactions, accounts, customers.
▪ Business requirements for analytics and reporting.
o Star Schema vs Snowflake Schema:
▪ Differences, advantages, and trade-offs.
▪ Examples for both schemas.
• Day 2: Dimension Modelling
o Fact Tables:
▪ Quantitative data (e.g., sales, transactions).
o Dimension Tables:
▪ Descriptive data (e.g., customer, product).
o ERD Tools:
▪ Creating models using Lucidchart or dbt.
Week 6: Building Data Models
• Day 1: Hands-On Data Modelling
o Building a banking data model.
o Identifying facts and dimensions.
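One hedged way to prototype this split before writing any DDL is with pandas; the raw columns below are invented banking-style fields, not a prescribed model.

import pandas as pd

# Hypothetical raw banking extract (columns are made up for the example).
raw = pd.DataFrame({
    "txn_id":        [1, 2, 3],
    "txn_ts":        ["2024-01-01", "2024-01-02", "2024-01-02"],
    "amount":        [120.0, 35.5, 990.0],
    "customer_id":   ["C1", "C2", "C1"],
    "customer_name": ["Alice", "Bob", "Alice"],
    "branch":        ["North", "South", "North"],
})

# Dimension: one row per customer, descriptive attributes only.
dim_customer = (
    raw[["customer_id", "customer_name"]]
    .drop_duplicates()
    .reset_index(drop=True)
)

# Fact: one row per transaction, measures plus keys into the dimensions.
fact_transactions = raw[["txn_id", "txn_ts", "customer_id", "branch", "amount"]]

print(dim_customer)
print(fact_transactions)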
• Day 2: Validating Models
o Verifying relationships between tables.
o Optimizing schema for performance.
Week 7: Advanced Data Pipelines
• Day 1: Deep Dive into Extraction and Loading
o Advanced Airbyte usage:
▪ Extracting data from multiple sources (APIs, files).
o Loading to Minio with Iceberg:
▪ Creating partitions and file optimization.
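Writing true Iceberg tables is normally handled by the pipeline tooling itself; as a simpler, hedged stand-in for the partitioning idea, the sketch below writes a date-partitioned Parquet dataset with pyarrow (the path and columns are placeholders, and the same layout can be pointed at a Minio bucket via an s3:// path).

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical cleaned events (columns are placeholders).
df = pd.DataFrame({
    "event_date": ["2024-01-01", "2024-01-01", "2024-01-02"],
    "user_id":    [1, 2, 1],
    "amount":     [10.0, 20.0, 5.0],
})

table = pa.Table.from_pandas(df)

# Hive-style partitioning: one sub-directory per event_date value.
pq.write_to_dataset(
    table,
    root_path="lake/events",        # could be an s3:// path backed by Minio
    partition_cols=["event_date"],
)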
• Day 2: Transformation
o Using dbt for advanced SQL-based transformations.
o Using Pandas and PySpark for programmatic transformations.
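A hedged PySpark counterpart to the SQL-based transformations might look like this (the input path and column names are assumptions):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("transform_example").getOrCreate()

# Hypothetical raw transactions landed earlier in the pipeline.
raw = spark.read.json("lake/raw/transactions/")

# Programmatic transformation: filter, derive a column, aggregate.
daily_totals = (
    raw.filter(F.col("amount") > 0)
       .withColumn("txn_date", F.to_date("txn_ts"))
       .groupBy("txn_date", "customer_id")
       .agg(F.sum("amount").alias("total_amount"))
)

daily_totals.write.mode("overwrite").parquet("lake/marts/daily_totals/")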
Week 8: Reverse ETL
• Day 1: Concepts and Tools
o Reverse ETL process and tools overview.
• Day 2: Hands-On
o Example: Syncing processed data back to a CRM system.
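Dedicated reverse ETL tools handle the sync declaratively; as a hedged illustration of the underlying idea, the sketch below pushes aggregated rows to a hypothetical CRM REST endpoint. The URL, token, and payload fields are invented for the example.

import pandas as pd
import requests

# Hypothetical mart produced by the transformation layer.
scores = pd.read_parquet("marts/customer_scores.parquet")

CRM_URL = "https://crm.example.com/api/contacts"   # placeholder endpoint
HEADERS = {"Authorization": "Bearer <token>"}      # placeholder credentials

# Reverse ETL: sync each processed record back into the operational system.
for record in scores.to_dict(orient="records"):
    resp = requests.post(CRM_URL, json=record, headers=HEADERS, timeout=30)
    resp.raise_for_status()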
Month 3: Data Quality, Testing, and Projects
Week 9: Data Quality and Testing
• Day 1: Importance of Data Quality
o Common issues: Missing data, duplicates, inconsistencies.
o Tools for quality checks.
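A hedged example of the kind of lightweight checks Day 1 refers to, using plain pandas (the file path and columns are assumptions):

import pandas as pd

df = pd.read_parquet("marts/fct_revenue.parquet")   # placeholder path

# Missing data: count nulls per column.
print(df.isna().sum())

# Duplicates: rows repeated on the business key.
print(df.duplicated(subset=["customer_id"]).sum())

# Inconsistencies: values outside an expected range.
assert (df["total_revenue"] >= 0).all(), "negative revenue found"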
• Day 2: Unit Testing for ETL
o Writing tests for pipeline steps (e.g., transformation validation).
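Day 2 can be sketched with pytest: keep the transformation in a plain function, then assert on a tiny in-memory input. The function and columns below are invented for the example; run the file with pytest.

import pandas as pd

def total_revenue_per_customer(orders: pd.DataFrame) -> pd.DataFrame:
    """Transformation under test: sum order totals per customer."""
    return (
        orders.groupby("customer_id", as_index=False)["order_total"].sum()
              .rename(columns={"order_total": "total_revenue"})
    )

def test_total_revenue_per_customer():
    orders = pd.DataFrame({
        "customer_id": ["C1", "C1", "C2"],
        "order_total": [10.0, 5.0, 7.5],
    })
    result = total_revenue_per_customer(orders)
    assert set(result["customer_id"]) == {"C1", "C2"}
    assert result.loc[result["customer_id"] == "C1", "total_revenue"].iloc[0] == 15.0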
Week 10: Capstone Project Introduction
• Day 1: Project Briefing
o Overview of telecom and banking projects.
o Generating mock data.
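Mock data can be generated with nothing more than the Python standard library; the sketch below writes a small CSV of fake banking transactions (the field names are placeholders, not a required schema).

import csv
import random
import uuid
from datetime import date, timedelta

random.seed(42)  # reproducible mock data

with open("mock_transactions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["txn_id", "customer_id", "txn_date", "amount", "channel"])
    for _ in range(1000):
        writer.writerow([
            uuid.uuid4().hex,
            f"C{random.randint(1, 100):04d}",
            (date(2024, 1, 1) + timedelta(days=random.randint(0, 89))).isoformat(),
            round(random.uniform(5, 2000), 2),
            random.choice(["atm", "online", "branch", "mobile"]),
        ])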
• Day 2: Setting Up Pipelines
o Starting ETL pipelines for the chosen project.
Week 11: Project Development
• Day 1: Intermediate Steps
o Refining transformations and models.
• Day 2: Integration and Orchestration
o Setting up final Airflow DAGs.
Week 12: Project Completion and Presentation
• Day 1: Finalizing and Testing
o Data quality checks and pipeline validation.
• Day 2: Presentations
o Students present their projects.
o Feedback and suggestions for improvement.
Core Concepts Across Modules
1. ETL/ELT: Focus on real-world implementation and tool usage.
2. Data Infrastructure: Understanding databases, data lakes, and warehouses.
3. Data Modelling: Real-world schema design.
4. Orchestration: Automation with Airflow.
5. Testing and Quality: Building robust and reliable pipelines.
This detailed schedule ensures a balance between theory, hands-on practice, and project-based
learning to build job-ready skills.