Hi, my name is

Kaled.

I use my experience and skills in data engineering, data science, analytics, and cloud computing

to build and manage data pipelines, cloud infrastructure, databases, and data warehouses.

About Me

My background and experience allow me to work across data engineering, the AWS cloud, data modeling, analytics, machine learning, and data science.

My expertise includes building data pipelines, databases, data warehouses, and machine learning models, and performing data analysis. I work with relational and non-relational databases such as MySQL, PostgreSQL, HBase, AWS Redshift, AWS DynamoDB, AWS Aurora, and AWS RDS, as well as big data solutions like Hadoop, Apache Spark, and AWS EMR.

Here are a few technologies I've been working with recently:
  • Python
  • SQL
  • Snowflake
  • Apache Spark
  • Apache Hadoop
  • AWS Glue
  • AWS Lambda
  • DynamoDB
  • Amazon Kinesis
  • AWS Athena
  • Docker
  • Amazon QuickSight
  • MySQL
  • PostgreSQL
  • SQS, SNS
  • Apache Airflow
  • AWS EMR

Experience

Data Manager - ChaoJi Wan
Sep 2023 - present

I work as a data manager and also develop web applications.

  • Build and manage the company database.
  • Collect data weekly for the company's internal analysis.
  • Built a face recognition app connected to a Tencent Cloud database to process the company's photos.

Data Engineer / Data Scientist Consultant - Consulting
Sep 2019 - 2023

I freelanced as an AWS data engineer, helping several small companies manage their data.

  • Built and managed data pipelines.
  • Researched and scraped data for companies and individuals.
  • Used AWS Lambda to process and clean data and AWS Glue to transform it and make it ready for analysis.
  • Analyzed data sets using Amazon Athena and/or Amazon QuickSight to help companies make decisions based on the findings.
  • Performed exploratory data analysis and discovered notable relationships.
  • Utilized algorithmic and programming tools to build helpful predictive models.

Data Engineer - BMW China
Mar 2019 - Aug 2019

I worked as a data engineer / software engineer, building and managing data pipelines and websites.

  • Scraped, collected, and ingested data (images, videos, news) from the main website and other sources.
  • Created and maintained a pipeline to store different types of data, and built local websites to visualize news.
  • Scraped more than four different websites every day using Python, Scrapy, Selenium, and Beautiful Soup.
  • Wrote Python scripts to test the company's new projects and websites, improving software testing performance by 50%.
  • Provided solutions for IT R&D-related activities; responsible for maintaining and updating the news and conference room screens.

Web Manager - Zolors
Jun 2015 - Jan 2016

I worked at Zolors as a web developer.

  • Designed and built the company website.
  • Managed site content in various ways: PHP error debugging, site speed optimization, and SEO.
  • Managed the database, store maintenance, product updates, and content write-ups for the e-commerce website and social channels (Zolors, Twitter, Instagram, etc.).
  • Tested web applications in various ways.
  • Gathered and synthesized business requirements and input from customers, stakeholders, business architects, and engineers.
  • Designed graphics for website decoration and layout; built the user interface, data visualizations, and overall user experience, resulting in a 20% increase in sales.

Education

2017 - 2019
Master of Computer Science
Beijing Technology and Business University

2012 - 2017
Bachelor of Software Engineering
Beijing JiaoTong University

Projects

Batch Data Pipeline using Airflow, Spark, EMR, and Snowflake
The project uses Airflow to orchestrate and manage the data pipeline, creating and terminating a transient EMR cluster to save on cost. Apache Spark transforms the data, and the final dataset is loaded into Snowflake.
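
A minimal sketch of how the orchestration might look, assuming Airflow's Amazon provider package; the cluster config, Spark step, script path, and DAG id below are hypothetical placeholders, not the project's actual settings:

    # Sketch: DAG that creates a transient EMR cluster, runs a Spark step,
    # waits for it, and terminates the cluster (the Snowflake load is omitted).
    from datetime import datetime
    from airflow import DAG
    from airflow.providers.amazon.aws.operators.emr import (
        EmrAddStepsOperator,
        EmrCreateJobFlowOperator,
        EmrTerminateJobFlowOperator,
    )
    from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

    JOB_FLOW_OVERRIDES = {"Name": "batch-transient-cluster", "ReleaseLabel": "emr-6.9.0"}
    SPARK_STEPS = [{
        "Name": "transform",
        "ActionOnFailure": "CANCEL_AND_WAIT",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://my-bucket/transform.py"],  # hypothetical script
        },
    }]

    with DAG("batch_pipeline", start_date=datetime(2023, 1, 1),
             schedule_interval="@daily", catchup=False) as dag:
        create = EmrCreateJobFlowOperator(
            task_id="create_emr_cluster", job_flow_overrides=JOB_FLOW_OVERRIDES)
        add_steps = EmrAddStepsOperator(
            task_id="add_spark_steps", job_flow_id=create.output, steps=SPARK_STEPS)
        wait = EmrStepSensor(
            task_id="wait_for_step", job_flow_id=create.output,
            step_id="{{ task_instance.xcom_pull(task_ids='add_spark_steps')[0] }}")
        terminate = EmrTerminateJobFlowOperator(
            task_id="terminate_emr_cluster", job_flow_id=create.output,
            trigger_rule="all_done")  # tear down even if the step fails
        create >> add_steps >> wait >> terminate
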
ETL Pipeline with Apache Airflow, Snowflake, and AWS
We build an ETL pipeline using Apache Airflow, Snowflake, and various AWS services.
Kafka Streaming Project
The project simulates real-time streaming of movie details using Kafka. We used technologies such as Python, Amazon EC2, Apache Kafka, AWS Glue, Amazon Athena, and SQL.
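
As a rough illustration of the producer side, here is a minimal sketch assuming the kafka-python client, a broker reachable at localhost:9092 (e.g., running on the EC2 instance), and a hypothetical movie_details topic:

    # Sketch: simulate a real-time stream of movie details into Kafka.
    import json
    import time

    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",  # hypothetical broker address
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    movies = [
        {"title": "Inception", "year": 2010, "rating": 8.8},
        {"title": "Parasite", "year": 2019, "rating": 8.5},
    ]

    for movie in movies:
        producer.send("movie_details", value=movie)  # hypothetical topic name
        time.sleep(1)  # throttle to mimic a live feed

    producer.flush()
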
Event-Driven Architecture with S3, Lambda, and Snowflake
In this project, a Lambda function is triggered when a CSV file is uploaded to a source bucket. The function extracts the CSV file, loads the data into a pandas data frame, removes some unnecessary characters, and saves the result to a destination S3 bucket, which triggers Snowpipe to load the newly created file automatically into a table in Snowflake.
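
A minimal sketch of what such a handler might look like; the destination bucket name and the specific cleanup rule are illustrative assumptions:

    # Sketch: S3-triggered Lambda that cleans a CSV and writes it to a
    # destination bucket (assumes pandas is available, e.g., via a Lambda layer).
    import io

    import boto3
    import pandas as pd

    s3 = boto3.client("s3")
    DEST_BUCKET = "my-destination-bucket"  # hypothetical name

    def lambda_handler(event, context):
        # The S3 event carries the source bucket and object key.
        record = event["Records"][0]["s3"]
        bucket = record["bucket"]["name"]
        key = record["object"]["key"]

        obj = s3.get_object(Bucket=bucket, Key=key)
        df = pd.read_csv(io.BytesIO(obj["Body"].read()))

        # Example cleanup: strip unwanted characters from string columns.
        for col in df.select_dtypes(include="object"):
            df[col] = df[col].str.replace(r"[^\w\s]", "", regex=True)

        # Writing to the destination bucket is what triggers Snowpipe downstream.
        buf = io.StringIO()
        df.to_csv(buf, index=False)
        s3.put_object(Bucket=DEST_BUCKET, Key=key, Body=buf.getvalue())
        return {"status": "ok", "rows": len(df)}
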
Streaming Amazon DynamoDB Data into a Centralized Data Lake (S3)
DynamoDB Streams captures item-level changes in the DynamoDB table. Kinesis Data Streams and Firehose then deliver the changes to an S3 bucket, with a Lambda function transforming each record before it lands in S3.
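
A minimal sketch of the transformation Lambda, following Firehose's data-transformation record contract; which fields are kept from each change record is an assumption:

    # Sketch: Firehose transformation Lambda for DynamoDB change records.
    import base64
    import json

    def lambda_handler(event, context):
        output = []
        for record in event["records"]:
            payload = json.loads(base64.b64decode(record["data"]))
            # Example transform: keep only the event type and the new item image.
            slim = {
                "eventName": payload.get("eventName"),
                "newImage": payload.get("dynamodb", {}).get("NewImage"),
            }
            # Firehose expects recordId, result, and base64-encoded data back.
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(json.dumps(slim).encode("utf-8")).decode("utf-8"),
            })
        return {"records": output}
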
AWS Serverless Data Lake
This workshop builds a serverless data lake architecture using Amazon Kinesis Firehose for streaming data ingestion, AWS Glue for data integration (ETL and catalogue management), Amazon S3 for data lake storage, and Amazon Athena for SQL big data analytics.
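
On the ingestion side, a producer can push events to the delivery stream with a single boto3 call; the stream name and sample event here are hypothetical:

    # Sketch: push a streaming event into the data lake via Kinesis Data Firehose.
    import json

    import boto3

    firehose = boto3.client("firehose")

    event = {"user_id": 42, "action": "click", "ts": "2024-01-01T00:00:00Z"}  # sample event
    firehose.put_record(
        DeliveryStreamName="datalake-ingest",  # hypothetical stream name
        Record={"Data": (json.dumps(event) + "\n").encode("utf-8")},
    )
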
ETL Pipeline with AWS
In this project we analyze a car insurance dataset. We load the dataset into an S3 bucket, use AWS Glue to transform it, write a Lambda script to clean the data, query the dataset with Amazon Athena, and finally build a dashboard with Amazon QuickSight.
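
The Athena step can also be driven from Python; in this sketch the database, table, columns, and results bucket are hypothetical:

    # Sketch: query the cleaned dataset with Amazon Athena from Python.
    import boto3

    athena = boto3.client("athena")

    resp = athena.start_query_execution(
        QueryString="SELECT policy_type, AVG(premium) FROM car_insurance GROUP BY policy_type",
        QueryExecutionContext={"Database": "insurance_db"},      # hypothetical database
        ResultConfiguration={"OutputLocation": "s3://my-athena-results/"},  # hypothetical bucket
    )
    query_id = resp["QueryExecutionId"]
    # Poll get_query_execution(QueryExecutionId=query_id) until the state is
    # SUCCEEDED, then read the results with get_query_results or from S3.
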
Housing Prices Prediction
We built a machine learning model to predict median house values in California.
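
A minimal baseline sketch using scikit-learn's built-in California housing data; the random forest here is illustrative rather than the exact model used in the project:

    # Sketch: baseline model for California median house values.
    from sklearn.datasets import fetch_california_housing
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_absolute_error
    from sklearn.model_selection import train_test_split

    X, y = fetch_california_housing(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    model = RandomForestRegressor(n_estimators=100, random_state=42)
    model.fit(X_train, y_train)

    # Evaluate on the held-out split.
    print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))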

Certificates

AWS Certified Solutions Architect – Associate
Badge owners demonstrated the ability to build secure and robust solutions using architectural design principles based on customer requirements. Badge owners are able to strategically design well-architected distributed systems that are scalable, resilient, efficient, and fault-tolerant.
AWS Cloud Practitioner Certificate
The AWS Certified Cloud Practitioner offers a foundational understanding of AWS Cloud concepts, services, and terminology.
Simplilearn Data Scientist Master Program
This program is in collaboration with IBM, and the course includes hands-on experience with technologies like R, Python, Machine Learning, Tableau, Hadoop, and Spark.
Data Science Bootcamp
A complete data science training covering mathematics, statistics, Python, advanced statistics in Python, and machine and deep learning.
Python Pro Bootcamp
Using Python to build 100 projects in 100 days, covering data science, automation, websites, games, and apps.

Get in Touch

My inbox is always open. Whether you have a question or just want to say hi, I’ll try my best to get back to you!