Professional Certificate · 10 Modules · 149 Lessons

Data Engineering and Analytics for AI

Description

Program summary

This online program covers the data foundation that AI projects need for sustainable success. It addresses data collection and preparation, quality controls, traceability, pipeline approaches, and analytical modeling, each in the context of AI use cases. The program emphasizes building a solid data system before discussing the “model” and equips participants with a data engineering mindset they can apply within their organizations.

Format: The program is delivered online. If there are live sessions, campus visits, or in-person components, they are clearly specified on the program page.

What will you gain from this training?
• Designing a data lifecycle for AI projects
• Data quality and governance approach (control points, traceability)
• Building scalable systems with ETL/ELT and pipeline thinking
• Establishing analytical metric models and reporting logic

Who should attend?
Data engineers/analysts, AI teams, technical product managers, and teams building data platforms.

Course Curriculum

Module roadmap

Review the module structure and lesson flow before enrollment.

• Module 1: Fundamentals and Ecosystem Overview (9 Lessons)
• Module 2: Data Collection and Ingestion Systems (14 Lessons)
• Module 3: Data Storage and Architecture (17 Lessons)

Price: € 149,00

Module 1: Fundamentals and Ecosystem Overview

1. The intersection of artificial intelligence and data engineering
2. Data engineer vs. data scientist vs. ML engineer roles
3. What is the Modern Data Stack
4. Data maturity model: from raw data to AI
5. What is batch processing and how it works
6. What is stream processing and how it works
7. Batch vs. streaming: differences and use cases
8. Cloud platforms: AWS, GCP, and Azure data services overview
9. Overview of core tools used in data engineering

Module 2: Data Collection and Ingestion Systems

1. Data source taxonomy: structured, semi-structured, and unstructured data
2. What is a REST API and how data ingestion works
3. API authentication methods: API keys, OAuth, and JWT
4. Pagination and rate limiting concepts
5. Web scraping fundamentals and HTML structure
6. What is Change Data Capture (CDC) and how it works
7. Apache Kafka architecture: topics, partitions, and brokers
8. How Kafka producers and consumers work
9. Source and sink integration with Kafka Connect
10. Offset management and at-least-once semantics in Kafka
11. Message queue systems: RabbitMQ vs. Amazon SQS
12. File formats: CSV, JSON, Parquet, Avro, and ORC
13. Data compression methods and serialization concepts
14. Schema evolution: managing schema changes over time

Module 3: Data Storage and Architecture

1. What is a Data Warehouse and its historical evolution
2. Kimball vs. Inmon architectural approaches
3. Star schema and snowflake schema design
4. What is a Data Lake and why it emerged
5. Layered data architecture: Bronze, Silver, and Gold
6. What is a Data Lakehouse: Delta Lake and Apache Iceberg
7. Columnar storage and its impact on query performance
8. Partitioning strategies: why and how
9. Clustering and indexing: speeding up large tables
10. Snowflake architecture: virtual warehouses and storage separation
11. BigQuery architecture: how serverless analytics works
12. Object storage: S3, GCS, and Azure Blob concepts
13. NoSQL database types and use case scenarios
14. What is a vector database and how it works
15. Pinecone, Weaviate, and Chroma comparison
16. What is a feature store and why it matters
17. Time-series databases and storing timestamped data

Module 4

1. What is ETL: Extract, Transform, Load explained
2. What is ELT and its role in the modern data stack
3. ETL vs. ELT: which approach to use and when
4. Apache Spark architecture: driver, executor, cluster manager
5. What is an RDD and how it differs from DataFrame and Dataset
6. Lazy evaluation and execution plans in Spark
7. Join types and the cost of shuffle operations in Spark
8. Spark performance optimization: caching and broadcast joins
9. Spark Streaming: micro-batch and continuous processing
10. What is dbt and the SQL-based transformation approach
11. dbt model layers: staging, intermediate, and mart
12. dbt ref() function and dependency management
13. dbt tests and automated documentation
14. Apache Flink and event-time stream processing
15. Watermarks and handling late-arriving data
16. What is DuckDB: in-process analytical SQL
17. Slowly Changing Dimensions (SCD) types 1, 2, and 3
18. Data normalization and denormalization concepts

Module 5

1. Dimensions of data quality: accuracy, completeness, consistency, timeliness
2. Data profiling: statistical analysis and anomaly detection
3. Data validation concepts and rule-based checks
4. The Great Expectations framework and the expectation concept
5. Data dictionary and metadata management
6. What is a data catalog: DataHub and Amundsen
7. Data lineage: tracing data from source to destination
8. GDPR and data privacy regulations for data engineering
9. Data masking and anonymization techniques
10. Data encryption: security in transit and at rest
11. What is Master Data Management (MDM)
12. Data mesh architecture: domain ownership and data as a product
13. Data contracts concept and implementation

Module 6

1. What is workflow orchestration and why it is needed
2. Apache Airflow architecture: scheduler, worker, metadata database
3. The DAG concept: how to define dependency graphs
4. Airflow operator types: Python, Bash, and Sensor
5. Error handling in Airflow: retries, alerts, and SLAs
6. Modern workflow design philosophy with Prefect
7. Dagster: the asset-centric orchestration philosophy
8. Experiment tracking with MLflow: runs, metrics, and artifacts
9. MLflow model registry and version management
10. CI/CD for data: automated testing and deployment concepts
11. Infrastructure as Code: managing data infrastructure with Terraform
12. What is Kubernetes and why it matters for data workloads
13. Model deployment approaches: REST API, batch, and streaming
14. Model monitoring: performance degradation and data drift
15. The difference between concept drift and data drift

Module 7

1. Why feature engineering matters: its impact on models
2. Numerical features: scaling and normalization techniques
3. Binning and discretization techniques
4. Categorical features: one-hot, label, and target encoding
5. How to handle high-cardinality categorical variables
6. Extracting meaning from date and time features
7. Feature extraction from text data: TF-IDF and n-grams
8. Missing data analysis: types and imputation methods
9. Outlier detection: IQR, Z-score, and isolation forest
10. Feature selection: filter, wrapper, and embedded methods
11. Dimensionality reduction: PCA concept and geometric intuition
12. Dimensionality reduction and visualization with t-SNE and UMAP
13. What is an embedding: from words to vectors
14. Word2Vec, GloVe, and FastText embedding models
15. Sharing and reusing features with a feature store
16. Real-time feature computation and online serving

Module 8

1. LLM training data requirements: quantity, diversity, and quality
2. Pre-training data: Common Crawl and web-scale corpora
3. Text cleaning pipeline: deduplication, filtering, and normalization
4. What is tokenization: BPE and WordPiece algorithms
5. The impact of tokenization on model performance
6. Instruction tuning data: format and quality criteria
7. What is RLHF: learning from human feedback
8. Collecting preference data and annotation guidelines
9. What is RAG architecture: retrieval-augmented generation
10. Document chunking strategies: size and overlap
11. Embedding model selection and evaluation criteria
12. Vector indexing algorithms: HNSW and IVF
13. Hybrid search: combining dense and sparse retrieval
14. What is re-ranking and two-stage retrieval
15. Synthetic data generation: augmenting data with LLMs
16. Preparing and quality-checking fine-tuning datasets
17. Evaluating LLM outputs: metrics and benchmarks

Module 9

1. Analytics maturity levels: from descriptive to prescriptive
2. Descriptive analytics: answering what happened
3. Diagnostic analytics: answering why it happened
4. Predictive analytics: answering what will happen
5. Prescriptive analytics: answering what should we do
6. SQL window functions: RANK, LAG, LEAD, PARTITION BY
7. How to perform cohort analysis with SQL
8. Retention and churn analysis concepts
9. Funnel analysis: measuring conversion pipelines
10. A/B testing: statistical significance and p-values
11. KPI selection and metric design principles
12. Dashboard design principles: simplicity and hierarchy
13. Data visualization fundamentals with Tableau
14. Building reports and dashboards with Power BI
15. Business intelligence architecture with Looker and LookML
16. Real-time metric monitoring with Grafana
17. Data storytelling: the art of communicating findings effectively

Module 10

1. Lambda architecture: batch layer and speed layer
2. Kappa architecture: can a single stream solve everything
3. Comparing Lambda and Kappa architectures
4. Scalability principles in big data architecture
5. Choosing a data platform: build vs. buy decision
6. Cost optimization: reducing cloud spending
7. Data security: access control and role-based authorization
8. Data replication and backup strategies
9. Multi-cloud and hybrid cloud data architectures
10. Real-time analytics architecture: OLAP and HTAP
11. The future of data engineering: AI-native data stack
12. Industry use cases: finance, healthcare, and e-commerce
13. Data engineering career path and learning resources

Program details

Click the Add to Cart button. Complete the purchase process by filling in the required information. Once your payment has been confirmed, your login credentials and access details will be sent to the email address you provided during registration. Use the information sent via email to log in to the learning platform and start the course immediately.

The programs are open to:
• University students
• Recent graduates
• Public and private sector employees
• Engineers, technicians, and specialists
• Managers and management candidates
• Professionals seeking to advance their careers
• Individuals looking to enhance their digital skills
• Anyone interested in gaining competencies in a new field

Participants who successfully complete the program will:
• Gain up-to-date knowledge and skills relevant to their field
• Develop professional competencies in line with international standards
• Adapt to digital transformation and the evolving requirements of the future workforce
• Acquire new skills that support career development and professional growth
• Receive a verifiable digital certificate documenting their learning achievements
• Strengthen their commitment to lifelong learning and continuous professional development

Certificates are issued in digital format and can be verified online through the certificate verification system.

The training programs are offered in Turkish and English and are delivered entirely online. Participants who successfully complete the program will receive a digital certificate. No physical certificate or printed document will be issued or delivered. Upon completion of the application and registration process, access information and login credentials for the training platform will be sent to the email address provided during registration. Participants may access the platform using the credentials provided and follow all training activities online throughout the program.