Hi, I’m Souhaib Benbouazza 👋

Data Scientist | AI & Quantum ML Researcher | Builder of practical AI systems

I apply machine learning and generative AI to real problems, from OCR/NLP pipelines and computer vision to LLMs and agentic workflows. I also research Quantum Machine Learning (QML) for healthcare (e.g., breast-cancer detection). I enjoy shipping end-to-end: data engineering → modeling → evaluation → deployment.


🔭 What I’m working on

  • Quantum ML (PhD @ UM5, Rabat): QSVMs, VQCs, and QNNs with Qiskit, noise-aware kernels, feature selection, and benchmarking against classical baselines.
  • Port analytics & recommender systems: data pipelines and KPIs to optimize operations and decision-making.
  • LLM & Agentic AI apps: Retrieval + tool-use + multi-step reasoning for practical assistants.
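For readers new to quantum kernels, here is a minimal sketch of the fidelity-kernel idea behind QSVMs, simulated in plain NumPy rather than Qiskit. It assumes a simple product feature map (one RY(x_i) rotation per qubit); this is an illustrative choice, not the feature maps from my research, and the function names are hypothetical.

```python
import numpy as np


def angle_encoded_state(x: np.ndarray) -> np.ndarray:
    """Hypothetical helper: statevector of RY(x_i)|0> on each qubit (a product state).

    Assumed feature map for illustration only; real QSVM work usually uses
    entangling maps, which this sketch does not model.
    """
    state = np.array([1.0])
    for xi in x:
        # RY(theta)|0> = cos(theta/2)|0> + sin(theta/2)|1>
        qubit = np.array([np.cos(xi / 2), np.sin(xi / 2)])
        state = np.kron(state, qubit)  # tensor product grows the register by one qubit
    return state


def fidelity_kernel(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    """K[i, j] = |<psi(X[i]) | psi(Y[j])>|^2, computed from explicit statevectors."""
    SX = np.array([angle_encoded_state(x) for x in X])
    SY = np.array([angle_encoded_state(y) for y in Y])
    return np.abs(SX @ SY.conj().T) ** 2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(0, np.pi, size=(4, 3))  # toy data: 4 samples, 3 features -> 3 qubits
    print(fidelity_kernel(X, X).round(3))
```

The resulting Gram matrix can be passed to scikit-learn's `SVC(kernel="precomputed")`. Because a product-state map like this one is classically simulable (each entry reduces to a product of cos²((x_i − y_i)/2) terms), it only illustrates the mechanics; entangling feature maps such as Qiskit's `ZZFeatureMap` are what make the kernel genuinely quantum.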

🧰 Tech Stack (selected)

Core AI/ML: ANN • CNN • RNN • LSTM • Transformers • LLMs (instruction-tuned & RAG) • Generative AI (prompting, fine-tuning) • Agentic AI (tools/planning)
NLP & OCR: spaCy / NLTK • Hugging Face • TF-IDF/Word2Vec • PDF/Image OCR (Tesseract & DL-based)
Vision & Audio: OpenCV • PIL • basic ASR/audio feature extraction
Quantum: Qiskit • quantum kernels • variational circuits
Data & MLOps: Python • NumPy • Pandas • scikit-learn • SQL • Excel • Matplotlib • Git/GitHub
Cloud & Tools: Google Cloud • VS Code • PyCharm • MATLAB
Langs: Python • R • SQL • (some) C/Java


🧪 Selected Work & Impact

  • QML for healthcare: Designed kernels & circuits for small, noisy biomedical datasets; compared scaling strategies and feature-subset sizes against classical baselines.
  • Port operations analytics: Built pipelines to clean, join, and model multi-source data, surfacing KPIs and decision recommendations.
  • Resume intelligence (PFE, final-year project): end-to-end OCR → NLP classification, with quality enhancement (GANs) and feature optimization (PCA).
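The PCA step in the resume pipeline reduces feature dimensionality before classification. Below is a minimal NumPy sketch of PCA via SVD on mean-centred data, assuming a dense feature matrix (e.g., vectorized resume text); the helper name and toy data are hypothetical, and in practice scikit-learn's `PCA` does the same job.

```python
import numpy as np


def pca_fit_transform(X: np.ndarray, n_components: int):
    """Hypothetical helper: project X (n_samples, n_features) onto its top principal axes.

    Returns (projected data, principal axes, explained-variance ratio).
    """
    mean = X.mean(axis=0)
    Xc = X - mean  # PCA operates on centred features
    # Thin SVD: rows of Vt are principal axes, sorted by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]
    variance = S ** 2 / (X.shape[0] - 1)  # variance captured along each axis
    ratio = variance[:n_components] / variance.sum()
    return Xc @ components.T, components, ratio


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(size=(100, 20))  # toy stand-in for real resume features
    Z, axes, ratio = pca_fit_transform(X, n_components=5)
    print(Z.shape, ratio.round(3))
```

Keeping the explained-variance ratio alongside the projection makes it easy to pick `n_components` by a variance threshold instead of a fixed number.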

Detailed roles, talks, and certifications are in my CV; highlights include national conference presentations and an IBM Qiskit Fall Fest hackathon win.


📌 Project ideas to pin (rename once repos are ready)

  • quantum-breast-cancer-qsvc — QSVM/VQC vs classical baselines with scaling & feature maps, confusion matrices, and AUC/F1 dashboards.
  • agentic-rag-porter — LLM+tools assistant for KPIs and recommendation queries over port datasets (RAG + evaluations).
  • ocr-nlp-resume-pipeline — OCR → text cleanup → embedding/classification; includes dataset cards & reproducible notebooks.
  • vision-violence-detection-mvp — Simple CV/audio fusion baseline (CLIP/AST) with a FastAPI inference endpoint.
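The retrieval half of a RAG assistant like the planned agentic-rag-porter can be prototyped before any LLM is involved. A minimal pure-Python sketch follows, assuming TF-IDF cosine similarity as a stand-in for the embedding search a production system would more likely use; the class name and the port-style documents are hypothetical toy examples, not real data.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class TfidfRetriever:
    """Hypothetical retriever: ranks documents by TF-IDF cosine similarity to a query."""

    def __init__(self, docs: list[str]):
        self.docs = docs
        tokenized = [tokenize(d) for d in docs]
        n = len(docs)
        # Document frequency: number of docs containing each term
        df = Counter(term for toks in tokenized for term in set(toks))
        # Smoothed IDF, so a term present in every doc still gets weight 1
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.doc_vecs = [self._vectorize(toks) for toks in tokenized]

    def _vectorize(self, tokens: list[str]) -> dict[str, float]:
        """L2-normalized sparse TF-IDF vector; unknown terms are ignored."""
        tf = Counter(tokens)
        vec = {t: cnt * self.idf[t] for t, cnt in tf.items() if t in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values()))
        return {t: v / norm for t, v in vec.items()} if norm else {}

    def search(self, query: str, k: int = 3) -> list[tuple[str, float]]:
        """Return the top-k (document, cosine score) pairs."""
        q = self._vectorize(tokenize(query))
        scores = [math.fsum(w * d.get(t, 0.0) for t, w in q.items()) for d in self.doc_vecs]
        ranked = sorted(range(len(self.docs)), key=lambda i: scores[i], reverse=True)
        return [(self.docs[i], scores[i]) for i in ranked[:k]]


if __name__ == "__main__":
    docs = [  # hypothetical KPI descriptions
        "Average vessel waiting time at anchorage before berthing",
        "Crane productivity measured in moves per hour",
        "Yard occupancy ratio for container storage blocks",
    ]
    for doc, score in TfidfRetriever(docs).search("vessel waiting time", k=2):
        print(f"{score:.3f}  {doc}")
```

In a full assistant, the top-k passages would be inserted into the LLM prompt; swapping `_vectorize` for an embedding model keeps the same `search` interface.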

🎓 Talks & Certs (short list)

  • Variational Algorithm Design (IBM) • Azure Fundamentals (AZ-900, in progress)
  • Oral and poster presentations at Moroccan conferences on QML for medical diagnostics and on classical vs quantum ML
  • Qiskit Fall Fest Hackathon Winner (IBM)

🤝 Let’s collaborate


📜 README Badges (optional)

Python • Qiskit • scikit-learn • Transformers • LLMs • Agentic AI • OpenCV • SQL


✍️ About me

I love turning research into working prototypes and teaching complex topics simply. When I’m not coding or running experiments, you’ll find me doing calisthenics, swimming, or creating accessible educational content.

Pinned

  • AtlasGuard-Project (public, forked from veldos/AtlasGuard-Project; Jupyter Notebook): Multimodal Incident Classification segments crowd video and synchronized audio into 5-second clips and classifies each segment as violent, non-violent, distress, benign, or uncertain, leveraging bo…