Data Scientist specializing in AI and ML, with strong problem-solving skills and a knack for bridging the gap between business and technical needs. Seeking a collaborative team where I can contribute to impactful projects and grow professionally. Passionate about using data science to improve lives.
At Orange, we were faced with a complex challenge: our existing data ecosystem was costly and required extensive data engineering services from third-party providers. The solution came in the form of Google Cloud Platform (GCP), which offered a comprehensive suite of services including cost-effective storage, serverless architectures, and robust data engineering tools.
Recognizing the potential of GCP, Orange brought me on board as a Data Scientist to lead this transition and to explore the full range of solutions that GCP could offer. Our initial focus was on classifying customer complaints.
For a detailed recommendation, please see the letter from my manager:
As a Data Engineering Intern at Stellantis, I played a pivotal role in the Carflow MEA Dashboards project, a key initiative aimed at leveraging data from diverse sources to monitor supply chain operations in the MEA region. The existing manual ETL (Extract, Transform, Load) process was labor-intensive, error-prone, and inefficient. My primary responsibility was to automate this ETL process, thereby streamlining data management for the project.
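The pipeline itself isn't reproduced here, but the general shape of such an automated ETL job can be sketched with Python's standard library. The file contents, column names, and table below are illustrative placeholders, not the actual Carflow sources:

```python
import csv
import io
import sqlite3

def extract(csv_text):
    """Extract: parse raw CSV rows (in practice, files pulled from source systems)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Transform: normalize fields and drop malformed records."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({
                "plant": row["plant"].strip().upper(),
                "units": int(row["units"]),
            })
        except (KeyError, ValueError):
            continue  # skip rows that would otherwise need manual fixing
    return cleaned

def load(rows, conn):
    """Load: write the cleaned records into a reporting table."""
    conn.execute("CREATE TABLE IF NOT EXISTS supply (plant TEXT, units INTEGER)")
    conn.executemany("INSERT INTO supply VALUES (:plant, :units)", rows)
    conn.commit()

raw = "plant,units\n casablanca ,120\nkenitra,80\nbadrow,notanumber\n"
conn = sqlite3.connect(":memory:")
load(transform(extract(raw)), conn)
print(conn.execute("SELECT plant, units FROM supply ORDER BY plant").fetchall())
```

Automating exactly this extract → transform → load chain removes the repeated manual effort and the copy-paste errors the original process suffered from.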
As a Data Analyst at Parallelfolio, a third-party service provider, I worked closely with a team of data analysts and data scientists on a project for our client, NOKIA. Our primary objective was to clean, process, and analyze data related to NOKIA's projects in the MENA region, and to create insightful dashboards that would enable proactive decision-making.
As part of a strategic initiative to enhance customer engagement and provide real-time information to clients, I was tasked with designing and developing an advanced Natural Language Processing (NLP) chatbot for the company's website. The chatbot was intended to assist clients in obtaining information about ongoing projects, as well as available houses and apartments for sale.
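At its simplest, the core of such a chatbot is matching a visitor's message to a known intent and replying with that intent's answer. The sketch below shows this idea with bag-of-words cosine similarity; the intents, example utterances, and replies are made up for illustration, and the production chatbot used a fuller NLP pipeline:

```python
import math
from collections import Counter

# Illustrative intents and canned replies (not the company's actual content).
INTENTS = {
    "project_status": "Our ongoing projects are listed on the Projects page.",
    "listings": "You can browse available houses and apartments under Listings.",
}
EXAMPLES = {
    "project_status": ["status of ongoing projects", "how is the project going"],
    "listings": ["houses for sale", "available apartments to buy"],
}

def vectorize(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def reply(message):
    """Pick the intent whose example utterances best match the message."""
    scores = {
        intent: max(cosine(vectorize(message), vectorize(ex)) for ex in exs)
        for intent, exs in EXAMPLES.items()
    }
    best = max(scores, key=scores.get)
    return INTENTS[best] if scores[best] > 0 else "Could you rephrase that?"

print(reply("any apartments available for sale?"))
```

A real deployment would swap the bag-of-words matcher for a trained intent classifier and pull project and listing details from a live database rather than canned strings.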
GPA: 3.25
In this project, I fine-tuned the pretrained Wav2Vec2 model for Automatic Speech Recognition (ASR) on Spanish and Finnish speech datasets, with the aim of developing a model that performs well on low-resource languages.
The fine-tuned model achieved a word error rate (WER) of 0.165 for Spanish and 0.376 for Finnish, demonstrating the effectiveness of fine-tuning Wav2Vec2 for ASR.
Future work will include fine-tuning Wav2Vec2 on more low-resource languages and comparing the performance of different pretrained models.
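The WER figures above are the ratio of word-level edit distance (substitutions, insertions, deletions) to the number of reference words. The metric itself is small enough to implement directly:

```python
def wer(reference, hypothesis):
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between the first i ref words and first j hyp words
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[-1][-1] / len(ref)

# one substituted word out of three -> WER of 1/3
print(wer("hola como estas", "hola come estas"))
```

So a Spanish WER of 0.165 means roughly one word in six in the model's transcripts needed correction.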
I've been involved in developing models that estimate age from images, with a particular focus on mitigating biases related to protected attributes such as age, gender, ethnicity, and facial expression. To reduce these biases, I employed three primary strategies:
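One widely used mitigation in this setting, shown purely as an illustration and not necessarily one of the three strategies used in the project, is inverse-frequency reweighting: each sample is weighted so that every protected group contributes equally to the training loss, rather than the majority group dominating it.

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Per-sample weights that equalize each protected group's total weight."""
    counts = Counter(groups)
    n_groups = len(counts)
    # each group's samples sum to 1/n_groups of the total weight
    return [1.0 / (n_groups * counts[g]) for g in groups]

# toy example: an imbalanced batch of protected-attribute labels
groups = ["female", "female", "female", "male"]
weights = inverse_frequency_weights(groups)
print(weights)  # the lone "male" sample carries as much weight as all three "female" samples
```

These weights would then be passed as per-sample loss weights during training so the model is not rewarded for fitting only the over-represented group.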
This project involved implementing several probabilistic generative models from scratch using PyTorch. Three models were implemented:
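As a representative sketch of what "from scratch" means for a probabilistic generative model (this is an illustrative example, not necessarily one of the three), here is a two-component 1-D Gaussian mixture fitted with expectation-maximization; NumPy is used for brevity, whereas the project used PyTorch:

```python
import numpy as np

def fit_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM."""
    mu = np.array([x.min(), x.max()])          # crude initialization
    sigma = np.array([x.std(), x.std()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, and standard deviations
        nk = resp.sum(axis=0)
        pi = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
    return pi, mu, sigma

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-5, 1, 200), rng.normal(5, 1, 200)])
pi, mu, sigma = fit_gmm_1d(x)
print(np.sort(mu))  # the estimated means should land near -5 and 5
```

Because the model is generative, the fitted parameters can also be used to sample new data, not just to score or cluster existing points.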
This project tackled a multi-class image classification task on a flower dataset of 3670 images divided into five classes: daisy, dandelion, roses, sunflowers, and tulips. The TensorFlow implementation of the VGG16 architecture was leveraged through Transfer Learning to perform the classification.
The fine-tuned model achieved an accuracy of 85% on the validation set, demonstrating the model's efficacy in solving the image classification problem.
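The transfer-learning setup can be sketched in Keras: load VGG16 without its classification head, freeze the convolutional base, and train a small new head for the five flower classes. The input size and head layers below are illustrative choices, and `weights=None` is used here only to avoid downloading the ImageNet weights (`weights="imagenet"` is what transfer learning actually uses):

```python
import tensorflow as tf

# In practice: weights="imagenet"; weights=None here skips the download.
base = tf.keras.applications.VGG16(include_top=False, weights=None,
                                   input_shape=(160, 160, 3))
base.trainable = False  # freeze the pretrained convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # daisy, dandelion, roses, sunflowers, tulips
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Only the small head is trained, which is why transfer learning works well on a dataset of just a few thousand images.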
The dataset used in this project contains 4242 images of flowers. The data collection is based on the data from Flickr, Google Images, and Yandex Images. The dataset is divided into five classes: chamomile, tulip, rose, sunflower, and dandelion. Each class contains about 800 photos. The photos are not high resolution, about 320x240 pixels, and have different proportions.
This project involved training an Alternating Least Squares (ALS) recommendation model using the MLlib library and the MovieLens 100k dataset, which was stored on the Hadoop Distributed File System (HDFS). The objective was to leverage the dataset to generate insightful movie recommendations. The project was inspired by the book "Machine Learning with Spark".
The notebook "Recommendation_system_with_Pyspark.ipynb" contains a full description of each step of this project and the results achieved.
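In the notebook, MLlib's `ALS` does the heavy lifting, but the core idea it implements can be sketched in NumPy: factor the ratings matrix as `U @ V.T` by alternately fixing one factor and solving a ridge regression for the other. The toy ratings matrix below is dense and made up for illustration, whereas real MovieLens data is sparse:

```python
import numpy as np

def als(R, k=2, n_iter=20, reg=0.1):
    """Factor R ~ U @ V.T by alternating two regularized least-squares solves."""
    n_users, n_items = R.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n_users, k))
    V = rng.standard_normal((n_items, k))
    I = reg * np.eye(k)
    for _ in range(n_iter):
        # with V fixed, each user's factors solve a ridge regression
        U = np.linalg.solve(V.T @ V + I, V.T @ R.T).T
        # with U fixed, the same holds for each item
        V = np.linalg.solve(U.T @ U + I, U.T @ R).T
    return U, V

# tiny illustrative ratings matrix: 3 users x 3 movies
R = np.array([[5.0, 4.0, 1.0],
              [4.0, 5.0, 1.0],
              [1.0, 1.0, 5.0]])
U, V = als(R)
print(np.round(U @ V.T, 1))  # low-rank reconstruction close to R
```

Recommendations then come from the reconstructed matrix: for each user, rank the items they have not rated by their predicted score in `U @ V.T`. MLlib parallelizes exactly these alternating solves across the cluster.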
The notebook "Classification_with_Pyspark.ipynb" contains a full description of each step of this project and the results achieved.
Beyond the world of data and algorithms, I find joy and inspiration in a variety of interests. Traveling is my passport to adventure, as I seek out new cultures, flavors, and landscapes that expand my horizons. Music is the universal language that touches my soul, bringing me solace and filling me with joy.
Chess is my strategic playground, where I immerse myself in the complexities of each move, testing my intellect and problem-solving skills. On the court, Basketball fuels my competitive spirit, combining athleticism with tactical thinking.
When I'm not exploring new territories or engaged in mind games, I find serenity in the great outdoors. Hiking takes me to new heights, both literally and figuratively, as I embrace the tranquility of nature and push my physical boundaries.
These diverse interests define who I am, nourishing my creativity, resilience, and boundless curiosity in every aspect of my life.