Professional Certificate · 10 Modules · 149 Lessons

Data Engineering and Analytics for AI

Description

Program summary

This online training addresses the data foundation that AI projects need for sustainable success. It covers data collection and preparation, quality controls, traceability, pipeline approaches, and analytical modeling, each in the context of AI use cases. The program emphasizes building a solid data system before turning to the “model” and equips participants with a data engineering mindset they can apply within their organizations.

Format: The program is delivered online. Any live sessions, campus visits, or in-person components are specified on the program page.

What will you gain from this training?
• Designing a data lifecycle for AI projects
• Applying a data quality and governance approach (control points, traceability)
• Building scalable systems with ETL/ELT and pipeline thinking
• Establishing analytical metric models and reporting logic

Who should attend?
Data engineers/analysts, AI teams, technical product managers, and teams building data platforms.

Module roadmap

Review the module structure and lesson flow before enrollment.

Module 1 — Fundamentals and Ecosystem Overview

9 Lessons

Module 2 — Data Collection and Ingestion Systems

14 Lessons

Module 3 — Data Storage and Architecture

17 Lessons

Price: € 149,00

Course Curriculum

Module 1 — Fundamentals and Ecosystem Overview

• The intersection of artificial intelligence and data engineering
• Data engineer vs. data scientist vs. ML engineer roles
• What is the Modern Data Stack
• Data maturity model: from raw data to AI
• What is batch processing and how it works
• What is stream processing and how it works
• Batch vs. streaming: differences and use cases
• Cloud platforms: AWS, GCP, and Azure data services overview
• Overview of core tools used in data engineering

Module 2 — Data Collection and Ingestion Systems

• Data source taxonomy: structured, semi-structured, and unstructured data
• What is a REST API and how data ingestion works
• API authentication methods: API keys, OAuth, and JWT
• Pagination and rate limiting concepts
• Web scraping fundamentals and HTML structure
• What is Change Data Capture (CDC) and how it works
• Apache Kafka architecture: topics, partitions, and brokers
• How Kafka producers and consumers work
• Source and sink integration with Kafka Connect
• Offset management and at-least-once semantics in Kafka
• Message queue systems: RabbitMQ vs. Amazon SQS
• File formats: CSV, JSON, Parquet, Avro, and ORC
• Data compression methods and serialization concepts
• Schema evolution: managing schema changes over time

Module 3 — Data Storage and Architecture

• What is a Data Warehouse and its historical evolution
• Kimball vs. Inmon architectural approaches
• Star schema and snowflake schema design
• What is a Data Lake and why it emerged
• Layered data architecture: Bronze, Silver, and Gold
• What is a Data Lakehouse: Delta Lake and Apache Iceberg
• Columnar storage and its impact on query performance
• Partitioning strategies: why and how
• Clustering and indexing: speeding up large tables
• Snowflake architecture: virtual warehouses and storage separation
• BigQuery architecture: how serverless analytics works
• Object storage: S3, GCS, and Azure Blob concepts
• NoSQL database types and use case scenarios
• What is a vector database and how it works
• Pinecone, Weaviate, and Chroma comparison
• What is a feature store and why it matters
• Time-series databases and storing timestamped data

Module 4 — Data Transformation and Processing

• What is ETL: Extract, Transform, Load explained
• What is ELT and its role in the modern data stack
• ETL vs. ELT: which approach to use and when
• Apache Spark architecture: driver, executor, cluster manager
• What is an RDD and how it differs from DataFrame and Dataset
• Lazy evaluation and execution plans in Spark
• Join types and the cost of shuffle operations in Spark
• Spark performance optimization: caching and broadcast joins
• Spark Streaming: micro-batch and continuous processing
• What is dbt and the SQL-based transformation approach
• dbt model layers: staging, intermediate, and mart
• dbt ref() function and dependency management
• dbt tests and automated documentation
• Apache Flink and event-time stream processing
• Watermarks and handling late-arriving data
• What is DuckDB: in-process analytical SQL
• Slowly Changing Dimensions (SCD) types 1, 2, and 3
• Data normalization and denormalization concepts

Module 5 — Data Quality and Governance

• Dimensions of data quality: accuracy, completeness, consistency, timeliness
• Data profiling: statistical analysis and anomaly detection
• Data validation concepts and rule-based checks
• The Great Expectations framework and the expectation concept
• Data dictionary and metadata management
• What is a data catalog: DataHub and Amundsen
• Data lineage: tracing data from source to destination
• GDPR and data privacy regulations for data engineering
• Data masking and anonymization techniques
• Data encryption: security in transit and at rest
• What is Master Data Management (MDM)
• Data mesh architecture: domain ownership and data as a product
• Data contracts concept and implementation

Module 6 — Pipeline Orchestration and MLOps

• What is workflow orchestration and why it is needed
• Apache Airflow architecture: scheduler, worker, metadata database
• The DAG concept: how to define dependency graphs
• Airflow operator types: Python, Bash, and Sensor
• Error handling in Airflow: retries, alerts, and SLAs
• Modern workflow design philosophy with Prefect
• Dagster: the asset-centric orchestration philosophy
• Experiment tracking with MLflow: runs, metrics, and artifacts
• MLflow model registry and version management
• CI/CD for data: automated testing and deployment concepts
• Infrastructure as Code: managing data infrastructure with Terraform
• What is Kubernetes and why it matters for data workloads
• Model deployment approaches: REST API, batch, and streaming
• Model monitoring: performance degradation and data drift
• The difference between concept drift and data drift

Module 7 — Feature Engineering for AI

• Why feature engineering matters: its impact on models
• Numerical features: scaling and normalization techniques
• Binning and discretization techniques
• Categorical features: one-hot, label, and target encoding
• How to handle high-cardinality categorical variables
• Extracting meaning from date and time features
• Feature extraction from text data: TF-IDF and n-grams
• Missing data analysis: types and imputation methods
• Outlier detection: IQR, Z-score, and isolation forest
• Feature selection: filter, wrapper, and embedded methods
• Dimensionality reduction: PCA concept and geometric intuition
• Dimensionality reduction and visualization with t-SNE and UMAP
• What is an embedding: from words to vectors
• Word2Vec, GloVe, and FastText embedding models
• Sharing and reusing features with a feature store
• Real-time feature computation and online serving

Module 8 — Data Engineering for Large Language Models

• LLM training data requirements: quantity, diversity, and quality
• Pre-training data: Common Crawl and web-scale corpora
• Text cleaning pipeline: deduplication, filtering, and normalization
• What is tokenization: BPE and WordPiece algorithms
• The impact of tokenization on model performance
• Instruction tuning data: format and quality criteria
• What is RLHF: learning from human feedback
• Collecting preference data and annotation guidelines
• What is RAG architecture: retrieval-augmented generation
• Document chunking strategies: size and overlap
• Embedding model selection and evaluation criteria
• Vector indexing algorithms: HNSW and IVF
• Hybrid search: combining dense and sparse retrieval
• What is re-ranking and two-stage retrieval
• Synthetic data generation: augmenting data with LLMs
• Preparing and quality-checking fine-tuning datasets
• Evaluating LLM outputs: metrics and benchmarks

Module 9 — Analytics and Visualization

• Analytics maturity levels: from descriptive to prescriptive
• Descriptive analytics: answering what happened
• Diagnostic analytics: answering why it happened
• Predictive analytics: answering what will happen
• Prescriptive analytics: answering what should we do
• SQL window functions: RANK, LAG, LEAD, PARTITION BY
• How to perform cohort analysis with SQL
• Retention and churn analysis concepts
• Funnel analysis: measuring conversion pipelines
• A/B testing: statistical significance and p-values
• KPI selection and metric design principles
• Dashboard design principles: simplicity and hierarchy
• Data visualization fundamentals with Tableau
• Building reports and dashboards with Power BI
• Business intelligence architecture with Looker and LookML
• Real-time metric monitoring with Grafana
• Data storytelling: the art of communicating findings effectively

Module 10 — Architectural Design and Advanced Topics

• Lambda architecture: batch layer and speed layer
• Kappa architecture: can a single stream solve everything
• Comparing Lambda and Kappa architectures
• Scalability principles in big data architecture
• Choosing a data platform: build vs. buy decision
• Cost optimization: reducing cloud spending
• Data security: access control and role-based authorization
• Data replication and backup strategies
• Multi-cloud and hybrid cloud data architectures
• Real-time analytics architecture: OLAP and HTAP
• The future of data engineering: AI-native data stack
• Industry use cases: finance, healthcare, and e-commerce
• Data engineering career path and learning resources

Program details

How to join

Click the Add to Cart button and complete the purchase by filling in the required information. Once your payment has been confirmed, your login credentials and access details will be sent to the email address you provided during registration. Use them to log in to the learning platform and start the course immediately.

Who can join

The programs are open to:
• University students
• Recent graduates
• Public and private sector employees
• Engineers, technicians, and specialists
• Managers and management candidates
• Professionals seeking to advance their careers
• Individuals looking to enhance their digital skills
• Anyone interested in gaining competencies in a new field

Educational achievements

Participants who successfully complete the program will:
• Gain up-to-date knowledge and skills relevant to their field
• Develop professional competencies in line with international standards
• Adapt to digital transformation and the evolving requirements of the future workforce
• Acquire new skills that support career development and professional growth
• Receive a verifiable digital certificate documenting their learning achievements
• Strengthen their commitment to lifelong learning and continuous professional development

Certificates are issued in digital format and can be verified online through the certificate verification system.

Process

The training programs are offered in Turkish and English and are delivered entirely online. Participants who successfully complete the program receive a digital certificate; no physical certificate or printed document is issued or delivered. Once the application and registration process is complete, access information and login credentials for the training platform are sent to the email address provided during registration. Participants can then access the platform with these credentials and follow all training activities online throughout the program.

Module roadmap

Module 1 — Fundamentals and Ecosystem Overview

Module 2 — Data Collection and Ingestion Systems

Module 3 — Data Storage and Architecture

Module 4 — Data Transformation and Processing

Module 5 — Data Quality and Governance

Module 6 — Pipeline Orchestration and MLOps

Module 7 — Feature Engineering for AI

Module 8 — Data Engineering for Large Language Models

Module 9 — Analytics and Visualization

Module 10 — Architectural Design and Advanced Topics
