Hi, my name is

Kaled.

I use my experience and skills in data engineering, data science, analytics, and cloud computing

to build and manage data pipelines, cloud infrastructure, databases, and data warehouses.

About Me

My background and experience allow me to work across data engineering, the AWS cloud, data modeling, analytics, machine learning, and data science.

My expertise includes building data pipelines, databases, data warehouses, and machine learning models, and performing data analysis. I work with relational and non-relational databases such as MySQL, PostgreSQL, HBase, AWS Redshift, AWS DynamoDB, AWS Aurora, and AWS RDS, as well as big data solutions like Hadoop, Apache Spark, and AWS EMR.

Here are a few technologies I've been working with recently:
  • Python
  • SQL
  • Snowflake
  • Apache Spark
  • Apache Hadoop
  • AWS Glue
  • AWS Lambda
  • DynamoDB
  • Amazon Kinesis
  • AWS Athena
  • Docker
  • Amazon QuickSight
  • MySQL
  • PostgreSQL
  • SQS, SNS
  • Apache Airflow
  • AWS EMR

Experience

Data Manager - ChaoJi Wan
Sep 2023 - present

I work as a data manager and also develop web applications.

  • Build and manage the company database.
  • Collect data weekly for the company's internal analysis.
  • Built a face recognition app connected to a Tencent Cloud database to process the company's photos.

Data Engineer / Data Scientist Consultant - Consulting
Sep 2019 - 2023

I freelanced as an AWS data engineer, helping several small companies manage their data.

  • Built and managed data pipelines.
  • Researched and scraped data for companies and individuals.
  • Used AWS Lambda to process and clean data and AWS Glue to transform it and make it ready for analysis.
  • Analyzed data sets using Amazon Athena and/or Amazon QuickSight to help companies make decisions based on the findings.
  • Performed exploratory data analysis and discovered notable relationships.
  • Utilized algorithmic and programming tools to build helpful predictive models.

Data Engineer - BMW China
Mar 2019 - Aug 2019

I worked as a data engineer / software engineer, building and managing data pipelines and websites.

  • Scraped, collected, and ingested data (images, videos, news) from the main website and other sources.
  • Created and maintained a pipeline to store different types of data, and built local websites to visualize news.
  • Scraped more than four different websites every day using Python, Scrapy, Selenium, and Beautiful Soup.
  • Wrote Python scripts to test the company's new projects and websites, improving software testing performance by 50%.
  • Provided solutions for IT R&D-related activities; responsible for maintaining and updating the news and conference room screens.

Web Manager - Zolors
Jun 2015 - Jan 2016

I worked at Zolors as a web developer.

  • Designed and built the company website.
  • Managed site content in various ways: PHP error debugging, site speed optimization, and SEO.
  • Managed the database, store maintenance, product updates, and content write-ups for the e-commerce website and social channels (Zolors, Twitter, Instagram, etc.).
  • Tested web applications in various ways.
  • Gathered and synthesized business requirements and input from customers, stakeholders, business architects, and engineers.
  • Designed graphics for website decoration and layout; built the user interface, data visualizations, and overall user experience, resulting in a 20% increase in sales.

Education

2017 - 2019
Master of Computer Science
Beijing Technology and Business University

2012 - 2017
Bachelor of Software Engineering
Beijing JiaoTong University

Projects

Batch Data Pipeline using Airflow, Spark, EMR, and Snowflake
The project uses Airflow to orchestrate and manage the data pipeline, creating and terminating a transient EMR cluster to save on cost. Apache Spark transforms the data, and the final dataset is loaded into Snowflake.
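
A minimal sketch of how the orchestration might look, assuming Airflow's Amazon provider package; the cluster config, Spark step, script path, and DAG id below are hypothetical placeholders, not the project's actual settings:

    # Sketch: DAG that creates a transient EMR cluster, runs a Spark step,
    # waits for it, and terminates the cluster (the Snowflake load is omitted).
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.emr import (
        EmrAddStepsOperator,
        EmrCreateJobFlowOperator,
        EmrTerminateJobFlowOperator,
    )
    from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

    JOB_FLOW_OVERRIDES = {"Name": "batch-transient-cluster", "ReleaseLabel": "emr-6.9.0"}
    SPARK_STEPS = [{
        "Name": "transform",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/transform.py"],  # hypothetical script
        },
    }]

    with DAG("batch_pipeline", start_date=datetime(2023, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        create = EmrCreateJobFlowOperator(
            task_id="create_emr_cluster", job_flow_overrides=JOB_FLOW_OVERRIDES)
        add_steps = EmrAddStepsOperator(
            task_id="add_spark_steps", job_flow_id=create.output, steps=SPARK_STEPS)
        wait = EmrStepSensor(
            task_id="wait_for_step", job_flow_id=create.output,
            step_id="{{ task_instance.xcom_pull(task_ids='add_spark_steps')[0] }}")
        terminate = EmrTerminateJobFlowOperator(
            task_id="terminate_emr_cluster", job_flow_id=create.output,
            trigger_rule="all_done")  # tear down even if the step fails
        create >> add_steps >> wait >> terminate
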
ETL Pipeline with Apache Airflow, Snowflake, and AWS
We build an ETL pipeline using Apache Airflow, Snowflake, and various AWS services.
Kafka Streaming Project
The project simulates real-time streaming of movie details using Kafka. We used technologies such as Python, Amazon EC2, Apache Kafka, AWS Glue, Amazon Athena, and SQL.
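
As a rough illustration of the producer side, here is a minimal sketch assuming the kafka-python client, a broker reachable at localhost:9092 (e.g., running on the EC2 instance), and a hypothetical movie_details topic:

    # Sketch: simulate a real-time stream of movie details into Kafka.
    import json
    import time

    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # hypothetical broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    movies = [
        {"title": "Inception", "year": 2010, "rating": 8.8},
        {"title": "Parasite", "year": 2019, "rating": 8.5},
    ]

    for movie in movies:
        producer.send("movie_details", value=movie)  # hypothetical topic name
        time.sleep(1)  # throttle to mimic a live feed

    producer.flush()
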
Event-Driven Architecture with S3, Lambda, and Snowflake
In this project, a Lambda function is triggered when a CSV file is uploaded to a source bucket. The function extracts the CSV file, loads the data into a pandas data frame, removes some unnecessary characters, and saves the result to a destination S3 bucket, which triggers Snowpipe to load the newly created file automatically into a table in Snowflake.
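
A minimal sketch of what such a handler might look like; the destination bucket name and the specific cleanup rule are illustrative assumptions:

    # Sketch: S3-triggered Lambda that cleans a CSV and writes it to a
    # destination bucket (assumes pandas is available, e.g., via a Lambda layer).
    import io

    import boto3
    import pandas as pd

    s3 = boto3.client("s3")
    DEST_BUCKET = "my-destination-bucket"  # hypothetical name

    def lambda_handler(event, context):
        # The S3 event carries the source bucket and object key.
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]

        obj = s3.get_object(Bucket=bucket, Key=key)
        df = pd.read_csv(io.BytesIO(obj["Body"].read()))

        # Example cleanup: strip unwanted characters from string columns.
        for col in df.select_dtypes(include="object"):
            df[col] = df[col].str.replace(r"[^\w\s]", "", regex=True)

        # Writing to the destination bucket is what triggers Snowpipe downstream.
        buf = io.StringIO()
        df.to_csv(buf, index=False)
        s3.put_object(Bucket=DEST_BUCKET, Key=key, Body=buf.getvalue())
        return {"status": "ok", "rows": len(df)}
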
Streaming Amazon DynamoDB Data into a Centralized Data Lake (S3)
DynamoDB Streams captures item-level changes in the DynamoDB table. Kinesis Data Streams and Firehose then deliver the changes to an S3 bucket, with a Lambda function transforming each record before it lands in S3.
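
A minimal sketch of the transformation Lambda, following Firehose's data-transformation record contract; which fields are kept from each change record is an assumption:

    # Sketch: Firehose transformation Lambda for DynamoDB change records.
    import base64
    import json

    def lambda_handler(event, context):
        output = []
        for record in event["records"]:
            payload = json.loads(base64.b64decode(record["data"]))
            # Example transform: keep only the event type and the new item image.
            slim = {
                "eventName": payload.get("eventName"),
                "newImage": payload.get("dynamodb", {}).get("NewImage"),
            }
            # Firehose expects recordId, result, and base64-encoded data back.
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(json.dumps(slim).encode("utf-8")).decode("utf-8"),
            })
        return {"records": output}
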
AWS Serverless Data Lake
This workshop builds a serverless data lake architecture using Amazon Kinesis Firehose for streaming data ingestion, AWS Glue for data integration (ETL and catalogue management), Amazon S3 for data lake storage, and Amazon Athena for SQL big data analytics.
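
On the ingestion side, a producer can push events to the delivery stream with a single boto3 call; the stream name and sample event here are hypothetical:

    # Sketch: push a streaming event into the data lake via Kinesis Data Firehose.
    import json

    import boto3

    firehose = boto3.client("firehose")

    event = {"user_id": 42, "action": "click", "ts": "2024-01-01T00:00:00Z"}  # sample event
    firehose.put_record(
        DeliveryStreamName="datalake-ingest",  # hypothetical stream name
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )
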
ETL Pipeline with AWS
In this project we analyze a car insurance dataset. We load the dataset into an S3 bucket, use AWS Glue to transform it, write a Lambda script to clean the data, query the dataset with Amazon Athena, and finally build a dashboard with Amazon QuickSight.
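
The Athena step can also be driven from Python; in this sketch the database, table, columns, and results bucket are hypothetical:

    # Sketch: query the cleaned dataset with Amazon Athena from Python.
    import boto3

    athena = boto3.client("athena")

    resp = athena.start_query_execution(
        QueryString="SELECT policy_type, AVG(premium) FROM car_insurance GROUP BY policy_type",
        QueryExecutionContext={"Database": "insurance_db"},      # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
    )
    query_id = resp["QueryExecutionId"]
    # Poll get_query_execution(QueryExecutionId=query_id) until the state is
    # SUCCEEDED, then read the results with get_query_results or from S3.
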
Housing Prices Prediction
We built a machine learning model to predict median house values in California.
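
A minimal baseline sketch using scikit-learn's built-in California housing data; the random forest here is illustrative rather than the exact model used in the project:

    # Sketch: baseline model for California median house values.
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate on the held-out split.
    print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))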

Certificates

AWS Certified Solutions Architect – Associate
Badge owners demonstrated the ability to build secure and robust solutions using architectural design principles based on customer requirements. Badge owners are able to strategically design well-architected distributed systems that are scalable, resilient, efficient, and fault-tolerant.
AWS Cloud Practitioner Certificate
The AWS Certified Cloud Practitioner offers a foundational understanding of AWS Cloud concepts, services, and terminology.
Simplilearn Data Scientist Master Program
This program is in collaboration with IBM, and the course includes hands-on experience with technologies like R, Python, Machine Learning, Tableau, Hadoop, and Spark.
Data Science Bootcamp
A complete data science training covering mathematics, statistics, Python, advanced statistics in Python, and machine and deep learning.
Python Pro Bootcamp
Using Python to build 100 projects in 100 days, covering data science, automation, websites, games, and apps.

Get in Touch

My inbox is always open. Whether you have a question or just want to say hi, I’ll try my best to get back to you!