Salma Ouardi

Paris, France · +33 7 58 22 41 69 · [email protected]

Data Scientist specializing in AI and ML, with strong problem-solving skills and a knack for bridging the gap between business and technical needs. Seeking a collaborative team where I can contribute to impactful projects and grow professionally. Passionate about using data science to improve lives.

16 Months of Work Experience

Data Scientist (GCP)

March 2023 - Present
Orange - Paris, France

At Orange, we were faced with a complex challenge: our existing data ecosystem was costly and required extensive data engineering services from third-party providers. The solution came in the form of Google Cloud Platform (GCP), which offered a comprehensive suite of services including cost-effective storage, serverless architectures, and robust data engineering tools.

Recognizing the potential of GCP, Orange brought me on board as a Data Scientist to lead this transition and to explore the full range of solutions that GCP could offer. Our initial focus was on classifying customer complaints.

Key Tasks

  • Led the transition of Orange's data ecosystem to Google Cloud Platform (GCP) to reduce costs and reliance on third-party data engineering services.
  • Developed a machine learning model on Vertex AI to classify customer complaints, improving the understanding of customer feedback and identifying areas for improvement.
  • Performed data engineering tasks including establishing the architecture for machine learning solutions on GCP's Vertex AI.
  • Conducted extensive data preprocessing and cleaning to prepare data for machine learning model training.
  • Selected and tuned machine learning algorithms to optimize model performance.
  • Implemented active learning techniques to handle a large amount of unlabeled data, iteratively improving the model's performance (see the sketch after this list).
  • Achieved a model accuracy of 91%, demonstrating the effectiveness of the data science methodologies employed.
  • Successfully tested and validated the new data architecture, confirming its efficiency and robustness.
  • Utilized data science to drive business decisions and strategies, highlighting the importance of data-driven insights in business operations.
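
For illustration, a minimal sketch of the uncertainty-sampling loop behind the active learning step, using scikit-learn stand-ins; the classifier, batch size, and scoring rule are assumptions, not the production Vertex AI pipeline:

  # Illustrative active learning round: train on labeled complaints, then query the least-confident unlabeled ones.
  import numpy as np
  from sklearn.linear_model import LogisticRegression

  def active_learning_round(X_labeled, y_labeled, X_unlabeled, batch_size=100):
      model = LogisticRegression(max_iter=1000).fit(X_labeled, y_labeled)
      probs = model.predict_proba(X_unlabeled)            # class probabilities for the unlabeled pool
      uncertainty = 1.0 - probs.max(axis=1)               # least-confidence score per example
      query_idx = np.argsort(uncertainty)[-batch_size:]   # most uncertain examples, sent for human labeling
      return model, query_idx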

Manager Contact

  • Name: Benoit Eock Belinga
  • Role: Lead Data Scientist | Programme Data / IA
  • Email: [email protected]
  • Phone: +33 6 84 59 08 70

Recommendation Letter

For a detailed recommendation, please see the letter from my manager:

View Recommendation Letter

Data Engineer

March 2022 - September 2022
Stellantis - Casablanca, Morocco

As a Data Engineering Intern at Stellantis, I played a pivotal role in the Carflow MEA Dashboards project, a key initiative aimed at leveraging data from diverse sources to monitor supply chain operations in the MEA region. The existing manual ETL (Extract, Transform, Load) process posed challenges in terms of increased effort, potential human error, and inefficiencies. My primary responsibility was to automate this ETL process, thereby streamlining data management for the project.

Key Tasks

  • Collaborated closely with a data architect to establish an effective working environment, gaining valuable insights into the data team's operations.
  • Conducted in-depth research into Stellantis's Supply Chain business, facilitated by the supply chain business team, to understand the business context and requirements.
  • Analyzed the existing ETL solution, identified business requirements, and mapped out a strategic plan for process improvement.
  • Designed and implemented an automated ETL solution using PySpark, Apache Airflow, and Oracle Exadata, tools from the Stellantis Data department (see the sketch after this list).
  • Conducted rigorous testing of the data pipelines and documented the end-to-end automation process to ensure knowledge transfer and future reference.
  • The implemented solution significantly improved the system's efficiency, reducing latency by 46ms and decreasing the failure rate by 82%.
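
A minimal sketch of how such an Airflow DAG can drive a PySpark transform; the DAG id, schedule, and paths are illustrative assumptions, not the Stellantis configuration:

  # Illustrative Airflow DAG scheduling a daily PySpark cleaning job (names and paths are assumptions).
  from datetime import datetime
  from airflow import DAG
  from airflow.operators.python import PythonOperator

  def run_transform():
      from pyspark.sql import SparkSession
      spark = SparkSession.builder.appName("carflow_etl").getOrCreate()
      df = spark.read.parquet("/data/raw/shipments")        # hypothetical source table
      df.dropDuplicates().write.mode("overwrite").parquet("/data/clean/shipments")
      spark.stop()

  with DAG(dag_id="carflow_mea_etl", start_date=datetime(2022, 3, 1),
           schedule_interval="@daily", catchup=False) as dag:
      PythonOperator(task_id="transform_shipments", python_callable=run_transform)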

Data Analyst

July 2021 - September 2021
ParallelFolio - Rabat, Morocco

As a Data Analyst at Parallelfolio, a third-party service provider, I worked closely with a team of data analysts and data scientists on a project for our client, NOKIA. Our primary objective was to clean, process, and analyze data related to NOKIA's projects in the MENA region, and to create insightful dashboards that would enable proactive decision-making.

Key Tasks

  • Actively participated in a project aimed at analyzing data pertaining to NOKIA's projects and sales in the MEA region, contributing to a comprehensive understanding of the business landscape.
  • Led the data cleaning and preparation process, ensuring the data was accurately formatted and ready for analysis. This resulted in an accuracy level of 86% and significantly improved the coherence of the dashboards.
  • Conducted in-depth data exploration to understand patterns and trends, providing valuable insights into NOKIA's operations in the region.
  • Developed interactive Power BI dashboards to communicate findings to managers. These dashboards facilitated more proactive and data-driven discussions during team meetings.

AI Engineer

July 2020 - September 2020
Abrar Invest - Casablanca, Morocco

As part of a strategic initiative to enhance customer engagement and provide real-time information to clients, I was tasked with designing and developing an advanced Natural Language Processing (NLP) chatbot for the company's website. The chatbot was intended to assist clients in obtaining information about ongoing projects, as well as available houses and apartments for sale.

Key Tasks

  • Designed and developed an NLP-powered chatbot utilizing pattern matching techniques to facilitate interactive and accurate communication with clients (see the sketch after this list).
  • Conducted extensive data preprocessing using the Natural Language Toolkit (NLTK), ensuring data was appropriately formatted and prepared for pattern matching.
  • Constructed a comprehensive knowledge base by integrating information sourced from company databases and data mining techniques. This knowledge base was integrated into the chatbot's intents file to improve its response accuracy.
  • Implemented the pattern matching technique to enable the chatbot to accurately match user queries with appropriate responses from the knowledge base.
  • Developed a user-friendly graphical interface for the chatbot using Tkinter, enhancing the user experience and facilitating seamless interaction with the chatbot.
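
A minimal sketch of the pattern-matching idea; the intents, keywords, and responses below are simplified assumptions rather than the production chatbot:

  # Simplified intent matching: tokenize the query and score it against keyword patterns from the intents file.
  import random
  from nltk.tokenize import word_tokenize  # requires nltk.download('punkt')

  INTENTS = {  # hypothetical, trimmed-down intents
      "projects": {"patterns": {"project", "ongoing", "construction"},
                   "responses": ["You can find our ongoing projects on the Projects page."]},
      "listings": {"patterns": {"house", "apartment", "sale", "price"},
                   "responses": ["Our available houses and apartments are listed on the Listings page."]},
  }

  def reply(user_query: str) -> str:
      tokens = set(word_tokenize(user_query.lower()))
      # pick the intent whose keyword pattern overlaps the query the most
      best = max(INTENTS.values(), key=lambda intent: len(tokens & intent["patterns"]))
      return random.choice(best["responses"])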

Education

Paris-Saclay University

Paris, France
Master of Science, Artificial Intelligence
Main Courses: ML Algorithms, Deep Learning, Computer Vision, Large-Scale Distributed Data Processing, Probabilistic Generative Models, Applied Statistics, Advanced Optimization, Signal Processing, NLP, Information Retrieval, Reinforcement Learning.
September 2022 - September 2023

Ecole des sciences de l'information

Rabat, Morocco
Master of Engineering, Data and Knowledge

GPA: 3.25

Main Courses: Data Structures and Algorithms, Business Intelligence and Data Warehousing, Big Data, Artificial Intelligence, Expert Systems, Statistics, Machine Learning, Network Security, Operating Systems, Knowledge Management.
September 2018 - August 2022

Classes Préparatoires aux Grandes Écoles

Agadir, Morocco
MPSI, MP
Main Courses: Mathematics, Physics, Engineering Sciences, Chemistry, Computer Science.
September 2016 - August 2018

Skills

Soft Skills
  • Communication skills: Effectively communicated complex data findings to non-technical team members in my previous role as a Data Analyst at Parallelfolio, aiding in strategic decision-making.
  • Problem-solving: Identified and rectified a data inconsistency issue in a major project at Stellantis, improving the accuracy of the project's outcome by 86%.
  • Collaboration and teamwork: Worked closely with a team of data scientists and analysts at Parallelfolio to deliver a comprehensive data analysis project for our client, NOKIA.
  • Adaptability: Successfully transitioned from a Data Engineering role at Stellantis to a Data Scientist role at Orange, demonstrating the ability to quickly learn and apply new skills.
  • Attention to detail: Ensured high data quality in all projects by meticulously cleaning and preprocessing data, leading to more accurate and reliable results.
  • Critical thinking and analysis: Conducted in-depth analysis of supply chain data at Stellantis, providing valuable insights that informed strategic decisions.
  • Project Management: Managed a project timeline and coordinated a team using Agile methodologies during my internship at Orange, ensuring the project was delivered on time and met all objectives.
Technical Skills
  • Programming: Proficient in Python, with experience in Java, R, and Matlab. Used Git for version control in all my personal and school projects.
  • Machine Learning & Deep Learning: Experienced with PyTorch, TensorFlow, and a range of algorithms and architectures (CNNs, VGG, ResNet, Transformers, BERT). Developed a machine learning model at Orange that achieved an accuracy of 91%.
  • Natural Language Processing: Skilled in recommender systems, semantic analysis, and speech-to-text. Developed an NLP-powered chatbot for a company's website using pattern matching techniques.
  • Data Analysis: Proficient in SQL and Oracle. Conducted extensive data analysis for NOKIA's projects in the MEA region while at Parallelfolio.
  • Cloud Platforms: Familiar with GCP, IBM Watson, MS Azure, and Docker. Led the transition of Orange's data ecosystem to Google Cloud Platform (GCP) to reduce costs and reliance on third-party data engineering services.
  • Statistics: Strong understanding of statistical tests, distributions, and maximum likelihood estimators. Applied statistical principles to drive data analysis projects at Parallelfolio.
  • Data Visualization: Proficient in creating effective data visualizations using libraries like Matplotlib and Seaborn. Developed Power BI dashboards to communicate data findings at Stellantis.
  • Data Wrangling: Experienced in cleaning and preprocessing messy data for analysis. Improved data quality for a major project at Stellantis, leading to more accurate results.
  • Big Data Platforms: Familiar with big data platforms like Hadoop and Spark. Utilized these platforms to handle large datasets in projects at Orange and Parallelfolio.
Languages
  • English: Bilingual Proficiency
  • French: Bilingual Proficiency
  • Arabic: Native

PROJECTS

Wav2Vec2 Fine-Tuning for ASR

January 2023

In this project, I implemented a fine-tuning approach on the Wav2Vec2 model for Automatic Speech Recognition (ASR). Wav2Vec2, a pretrained model, was fine-tuned on Spanish and Finnish speech datasets, with the aim of developing a model that performs well on low-resource languages.

Project Steps

  • Setting up APIs
  • Loading and preprocessing the CSS10 dataset
  • Configuring the Wav2Vec2CTCTokenizer and Wav2Vec2FeatureExtractor
  • Fine-tuning and training the model (see the sketch after this list)
  • Evaluating the model using the Word Error Rate (WER) metric
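
A minimal sketch of the fine-tuning setup with Hugging Face Transformers; the checkpoint and hyperparameters are illustrative assumptions, and the vocabulary file is assumed to have been built from the CSS10 transcripts:

  # Illustrative Wav2Vec2 CTC fine-tuning setup (checkpoint and hyperparameters are assumptions).
  from transformers import (Wav2Vec2CTCTokenizer, Wav2Vec2FeatureExtractor,
                            Wav2Vec2Processor, Wav2Vec2ForCTC, TrainingArguments)

  tokenizer = Wav2Vec2CTCTokenizer("vocab.json", unk_token="[UNK]",      # vocab.json built from the transcripts
                                   pad_token="[PAD]", word_delimiter_token="|")
  feature_extractor = Wav2Vec2FeatureExtractor(feature_size=1, sampling_rate=16_000, padding_value=0.0,
                                               do_normalize=True, return_attention_mask=True)
  processor = Wav2Vec2Processor(feature_extractor=feature_extractor, tokenizer=tokenizer)

  model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53",   # multilingual base checkpoint
                                         ctc_loss_reduction="mean",
                                         pad_token_id=processor.tokenizer.pad_token_id,
                                         vocab_size=len(processor.tokenizer))
  model.freeze_feature_encoder()                       # keep the convolutional feature encoder frozen

  args = TrainingArguments(output_dir="wav2vec2-css10", per_device_train_batch_size=8,
                           num_train_epochs=30, learning_rate=3e-4, evaluation_strategy="epoch")
  # A Trainer is then built with these arguments, the processed CSS10 splits, and a WER compute_metrics function.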

Results

The fine-tuned model achieved a WER of 0.165 for Spanish and 0.376 for Finnish, demonstrating the effectiveness of the fine-tuning approach on Wav2Vec2 for ASR.

Future Work

Future work will include fine-tuning Wav2Vec2 on more low-resource languages and comparing the performance of different pretrained models.

  • Language : Python
  • Tools : Wav2Vec2 (Hugging Face Transformers), CSS10 dataset, Wav2Vec2CTCTokenizer and Wav2Vec2FeatureExtractor (Hugging Face Transformers), Word Error Rate (WER) metric, APIs for data access and model storage, Google Drive for log storage.
  • GitHub : Wav2Vec2 Fine-Tuning for ASR
    Bias Mitigation for Age Detection

    October 2022

    I've been involved in developing models that estimate age from images, with a particular focus on mitigating biases related to protected attributes such as age, gender, ethnicity, and facial expression. To reduce these biases, I employed three primary strategies:

    • Data augmentation using Albumentations and manual augmentation using OpenCV (see the sketch after this list).
    • Designing a customized loss function.
    • Changing the base model (NASNet, ResNet).
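
    A minimal sketch of the first strategy, an Albumentations augmentation pipeline; the transforms and probabilities are assumptions:

      # Illustrative Albumentations pipeline used to diversify the training images (parameters are assumptions).
      import cv2
      import albumentations as A

      augment = A.Compose([
          A.HorizontalFlip(p=0.5),
          A.RandomBrightnessContrast(p=0.3),
          A.Rotate(limit=15, p=0.3),
      ])

      image = cv2.imread("face.jpg")                # hypothetical input image
      augmented = augment(image=image)["image"]     # augmented copy fed to the age-estimation model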

    Probabilistic Generative Models

    October 2022

    Implemented three probabilistic generative models from scratch using PyTorch (a minimal sketch of the first follows the list):

    • Variational AutoEncoder
    • Restricted Boltzmann Machine
    • Real-NVP Normalizing Flows
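
    A minimal sketch of the Variational AutoEncoder; layer sizes are illustrative, not the full implementation:

      # Minimal VAE: the encoder outputs mean/log-variance, the reparameterization trick samples z,
      # and the decoder reconstructs x.
      import torch
      import torch.nn as nn

      class VAE(nn.Module):
          def __init__(self, x_dim=784, h_dim=256, z_dim=20):
              super().__init__()
              self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
              self.mu, self.logvar = nn.Linear(h_dim, z_dim), nn.Linear(h_dim, z_dim)
              self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                       nn.Linear(h_dim, x_dim), nn.Sigmoid())

          def forward(self, x):
              h = self.enc(x)
              mu, logvar = self.mu(h), self.logvar(h)
              z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
              return self.dec(z), mu, logvar

      def vae_loss(x, x_hat, mu, logvar):
          # reconstruction term plus KL divergence to the standard normal prior
          bce = nn.functional.binary_cross_entropy(x_hat, x, reduction="sum")
          kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
          return bce + kld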

    Flower Recognition with Fine-tuned VGG16

    March 2022

    This project involved the execution of a multi-class image classification task on a flower dataset. The dataset consists of 3670 images divided into five classes: daisy, dandelion, roses, sunflowers, and tulips. The TensorFlow implementation of the VGG16 architecture was leveraged through Transfer Learning to achieve the image classification task.

    Project Steps

    • Implementation of two image generators for training and validation data using the ImageDataGenerator class from the TensorFlow library.
    • Design of the generators to apply multiple data augmentation techniques to enhance the robustness of the model.
    • Conducting fine-tuning on the pre-trained VGG16 architecture (see the sketch after this list).
    • Evaluating the model using the validation set.
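
    A minimal sketch of the generator and transfer-learning setup; directory names and hyperparameters are assumptions:

      # Illustrative VGG16 transfer-learning setup (paths and hyperparameters are assumptions).
      from tensorflow.keras.applications import VGG16
      from tensorflow.keras.preprocessing.image import ImageDataGenerator
      from tensorflow.keras import layers, models

      train_gen = ImageDataGenerator(rescale=1/255., rotation_range=20, zoom_range=0.2,
                                     horizontal_flip=True).flow_from_directory(
          "flowers/train", target_size=(224, 224), batch_size=32, class_mode="categorical")
      val_gen = ImageDataGenerator(rescale=1/255.).flow_from_directory(
          "flowers/val", target_size=(224, 224), batch_size=32, class_mode="categorical")

      base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
      base.trainable = False                               # freeze the convolutional base
      model = models.Sequential([base, layers.Flatten(),
                                 layers.Dense(256, activation="relu"),
                                 layers.Dense(5, activation="softmax")])   # five flower classes
      model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
      model.fit(train_gen, validation_data=val_gen, epochs=10)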

    Results

    The fine-tuned model achieved an accuracy of 85% on the validation set, demonstrating the model's efficacy in solving the image classification problem.

    Dataset

    The dataset used in this project contains 4242 images of flowers. The data collection is based on the data from Flickr, Google Images, and Yandex Images. The dataset is divided into five classes: chamomile, tulip, rose, sunflower, and dandelion. Each class contains about 800 photos. The photos are not high resolution, about 320x240 pixels, and have different proportions.

  • Language : Python
  • Tools : TensorFlow, ImageDataGenerator, VGG16, Data Augmentation Techniques.
  • GitHub : Flower Recognition with Fine-tuned VGG16
    Recommendation System with PySpark

    January 2022

    This project involved the training of an Alternating Least Squares (ALS) recommendation model using the MLlib library and the MovieLens 100k dataset. The dataset was stored on the Hadoop Distributed File System (HDFS). The objective of the model was to effectively leverage the MovieLens 100k dataset to generate insightful recommendations. The project was inspired by the book "Machine Learning with Spark".

    Project Steps

    • Building a recommendation model using data about user preferences (see the sketch after this list).
    • Using the trained model to compute recommendations for a given user and compute similar items for a given item (related items).
    • Applying standard evaluation metrics to the created model to measure its predictive capability.
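
    A minimal sketch of the ALS training step, shown here with the DataFrame-based MLlib API; the HDFS path and hyperparameters are assumptions:

      # Illustrative ALS training on MovieLens 100k ratings stored on HDFS (path and hyperparameters are assumptions).
      from pyspark.sql import SparkSession
      from pyspark.ml.recommendation import ALS

      spark = SparkSession.builder.appName("movielens_als").getOrCreate()
      ratings = (spark.read.csv("hdfs:///data/ml-100k/u.data", sep="\t")
                 .toDF("userId", "movieId", "rating", "timestamp")
                 .selectExpr("cast(userId as int) userId", "cast(movieId as int) movieId",
                             "cast(rating as float) rating"))

      als = ALS(rank=10, maxIter=10, regParam=0.1, userCol="userId", itemCol="movieId",
                ratingCol="rating", coldStartStrategy="drop")
      model = als.fit(ratings)
      top10 = model.recommendForAllUsers(10)      # top-10 movie recommendations per user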

    Results

    The notebook "Recommendation_system_with_Pyspark.ipynb" contains a full description of each step of this project and the results achieved.

  • Language : Python
  • Tools : PySpark, MLlib, ALS, HDFS, MovieLens 100k dataset.
  • GitHub : Recommendation System with PySpark
    Classification with PySpark

    This project involved training and evaluating several classification models using PySpark and MLlib, and was inspired by the book "Machine Learning with Spark".

    Project Steps

    • Extracting appropriate features from raw input data using PySpark.
    • Training a number of classification models using MLlib (see the sketch after this list).
    • Making predictions with the classification models.
    • Applying a number of standard evaluation techniques to assess the predictive performance of the models.
    • Exploring the impact of parameter tuning on model performance and using cross-validation to select the most optimal model parameters.
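
    A minimal sketch of the feature-extraction, training, and cross-validation steps, shown here with a logistic regression model from the DataFrame-based MLlib API; column names and parameters are assumptions:

      # Illustrative MLlib classification pipeline with cross-validated parameter tuning (columns are assumptions).
      from pyspark.sql import SparkSession
      from pyspark.ml import Pipeline
      from pyspark.ml.feature import VectorAssembler
      from pyspark.ml.classification import LogisticRegression
      from pyspark.ml.evaluation import BinaryClassificationEvaluator
      from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

      spark = SparkSession.builder.appName("classification").getOrCreate()
      df = spark.read.csv("train.csv", header=True, inferSchema=True)   # hypothetical input with f1..f3 and label

      assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
      lr = LogisticRegression(featuresCol="features", labelCol="label")
      pipeline = Pipeline(stages=[assembler, lr])

      grid = ParamGridBuilder().addGrid(lr.regParam, [0.01, 0.1]).build()
      cv = CrossValidator(estimator=pipeline, estimatorParamMaps=grid,
                          evaluator=BinaryClassificationEvaluator(labelCol="label"), numFolds=3)
      model = cv.fit(df)                          # selects the best regParam via 3-fold cross-validation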

    Results

    The notebook "Classification_with_Pyspark.ipynb" contains a full description of each step of this project and the results achieved.

  • Language : Python
  • Tools : PySpark, MLlib, Feature Extraction, Model Evaluation, Parameter Tuning, Cross-Validation.
  • GitHub : Classification with PySpark
Interests

    Beyond the world of data and algorithms, I find joy and inspiration in a variety of interests. Traveling is my passport to adventure, as I seek out new cultures, flavors, and landscapes that expand my horizons. Music is the universal language that touches my soul, bringing me solace and filling me with joy.

    Chess is my strategic playground, where I immerse myself in the complexities of each move, testing my intellect and problem-solving skills. On the court, Basketball fuels my competitive spirit, combining athleticism with tactical thinking.

    When I'm not exploring new territories or engaged in mind games, I find serenity in the great outdoors. Hiking takes me to new heights, both literally and figuratively, as I embrace the tranquility of nature and push my physical boundaries.

    These diverse interests define who I am, nourishing my creativity, resilience, and boundless curiosity in every aspect of my life.