Aissa Ben Yahya

Bio

aissabenbang _at_ gmail.com
ai.benyahya _at_ edu.umi.ac.ma

I am a Ph.D. student at the Faculté des sciences, Université Moulay Ismail, Meknès, where I am pursuing my thesis, entitled "An Automatic Software Vulnerability Classification for Critical Infrastructure Systems", as a member of the Computer Networks and Systems (CNS) team under the supervision of Professor Abdelbaki El Belrhiti El Alaoui.

My Ph.D. research focuses on using deep learning and Natural Language Processing (NLP) techniques for vulnerability classification and detection. Recently, I have been exploring the application of transformers in this domain. My broader interests include cybersecurity, deep learning for NLP, and transformer-based language models.

You can download my CV here.

Scholar GitHub LinkedIn

News

28-07-2025

Paper published in the Journal of Information Security and Applications

Our paper "Improving critical infrastructure security through hybrid embeddings for vulnerability classification" has been published in the Journal of Information Security and Applications. Click here to read the paper, or here for a temporary free-access link.

17-03-2025

Paper accepted in Computers & Security

Our paper "Bayes-Based Word Weighting for Enhanced Vulnerability Classification in Critical Infrastructure Systems" has been accepted for publication in Computers & Security. Click here to read the paper.

16-03-2025

Paper published in LNCS

Our paper "Enhanced Classification of Embedded System Vulnerabilities Using Ensemble Embedding and BiLSTM Networks", presented at IDEAS 2024, has been published in LNCS. You can read the paper here.

Education

Ph.D., October 2020-September 2025

Ph.D. thesis: "An Automatic Software Vulnerability Classification for Critical Infrastructure Systems"

Université Moulay Ismail, Faculté des sciences, Meknès

M.Sc., 2018-2020

Master in Computer Networks and Embedded Systems

Université Moulay Ismail, Faculté des sciences, Meknès

B.Sc., May 2014

Bachelor in Mathematical Sciences, Computer Science and Applications (4-year curriculum)

Faculté Polydisciplinaire, Ouarzazate

© Aissa Ben Yahya, powered by Bootstrap

Research

My main research interests and activities.

Topics

Cyber Security
Natural Language Processing
Machine Learning, Deep Learning
Vulnerability Detection/Classification, Computer Networking

Reviews

Code

Presentations

Publications

My published and submitted work.

Improving critical infrastructure security through hybrid embeddings for vulnerability classification

Aissa Ben Yahya, Hicham El Akhal, Abdelbaki El Belrhiti El Alaoui

Journal Journal of Information Security and Applications

Abstract

The growing prevalence of vulnerabilities in embedded devices poses a significant risk to critical infrastructure. While deep learning has advanced vulnerability classification, its effectiveness is often hindered by limitations in word representation. Traditional word embeddings struggle with out-of-vocabulary (OOV) words common in domain-specific reports, while pre-trained language models (PLMs), despite their contextual power, may lack specialized domain knowledge. To address these challenges, we propose a novel Two-Stream hybrid embedding architecture that combines Vuln2Vec, a custom domain-specific word embedding, with a large PLM using a learnable weighted feature fusion. Our approach leverages the rich domain-specific vocabulary of Vuln2Vec to capture specialized terminology, while the PLM captures broader contextual relationships and effectively handles OOV words. We validate our method through rigorous experiments, including ablation studies and comparative analyses on vulnerability databases such as the National Vulnerability Database (NVD), the Chinese Vulnerability Database (CNNVD), and a challenging manually collected dataset. Our experiments demonstrate that the proposed hybrid embedding method achieves a state-of-the-art F1-score of 94.25% and an accuracy of 94.88% on the challenging test dataset, validating the superiority of fusing specialized and general-purpose knowledge for this critical task.
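The abstract mentions a learnable weighted feature fusion of the Vuln2Vec stream and the PLM stream. The sketch below shows one common form of such a fusion, under assumptions that are not taken from the paper: both stream outputs are already projected to the same dimension, and the fusion weights are two learnable scalars normalized with a softmax. The names fuse_streams and fusion_logits are hypothetical, and only the forward pass is shown.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def fuse_streams(vuln2vec_feat: np.ndarray, plm_feat: np.ndarray,
                 fusion_logits: np.ndarray) -> np.ndarray:
    """Weighted sum of two same-shaped feature vectors.

    Assumption: fusion_logits is a hypothetical pair of learnable scalars;
    in a real model they would be trained by backpropagation together
    with the classifier. Only the forward pass is shown here.
    """
    if vuln2vec_feat.shape != plm_feat.shape:
        raise ValueError("both streams must be projected to the same dimension")
    w = softmax(fusion_logits)  # w[0] + w[1] == 1
    return w[0] * vuln2vec_feat + w[1] * plm_feat

# Toy 4-dimensional features standing in for the two streams.
domain = np.array([1.0, 0.0, 2.0, 0.0])   # domain-specific stream
context = np.array([0.0, 1.0, 0.0, 2.0])  # contextual (PLM) stream
fused = fuse_streams(domain, context, np.array([0.0, 0.0]))
# Equal logits give equal weights, so fused == [0.5, 0.5, 1.0, 1.0].
```

Normalizing the two weights with a softmax keeps the fused vector on the same scale as its inputs while letting training shift emphasis toward whichever stream is more informative.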

Bayes-Based Word Weighting for Enhanced Vulnerability Classification in Critical Infrastructure Systems

Aissa Ben Yahya, Hicham El Akhal, Abdelbaki El Belrhiti El Alaoui

Journal Computers & Security

Abstract

The increasing number of vulnerabilities in embedded devices poses a significant threat to the security of the critical infrastructure in which these devices are used. While deep learning approaches have advanced software vulnerability classification, they exhibit critical limitations regarding word weighting. Conventional methods like term frequency-inverse document frequency (TF-IDF) prioritize global term distributions but overlook intra-class distinctions. While improved variants of this technique have been proposed, they often fail to consider that a word’s importance can vary across categories and struggle to prioritize rare but distinctive words adequately. Additionally, high inter-class semantic overlap and terminological ambiguity in vulnerability descriptions hinder model performance, making it difficult to separate intra-class keywords from background noise. To address these gaps, we propose a novel vulnerability classification and word vector weighting approach based on Bayes' theorem. Our method dynamically adjusts term relevance by calculating posterior probabilities of word-category associations, emphasizing rare tokens with high intra-class specificity. We validate the approach on four test datasets derived from databases such as the National Vulnerability Database (NVD) and the Chinese Vulnerability Database (CNNVD). Rigorous ablation and comparative studies demonstrate that Bayes-based word weighting outperforms other methods, achieving 97.63% accuracy and a 97.60% F1-score on the most challenging test data. All our models and the code to reproduce our results are open source.
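As a rough illustration of weighting words by the posterior probability of a word-category association, here is a generic Bayes' theorem sketch. It is not the paper's exact formula: the add-alpha smoothing, the document-frequency class prior, and the choice of max over classes of P(class | word) as the vector weight are all assumptions, and word_class_posteriors and weight_vector are hypothetical names.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence

def word_class_posteriors(docs: Sequence[str], labels: Sequence[str],
                          alpha: float = 1.0) -> Dict[str, Dict[str, float]]:
    """Estimate P(class | word) for every vocabulary word with Bayes' theorem.

    Assumptions: P(word | class) uses add-alpha (Laplace) smoothing over
    token counts, and P(class) is the fraction of documents with that label.
    """
    class_doc_counts = Counter(labels)
    token_counts = defaultdict(Counter)  # class -> word counts
    for text, label in zip(docs, labels):
        token_counts[label].update(text.lower().split())
    vocab = set()
    for counts in token_counts.values():
        vocab.update(counts)
    classes = sorted(class_doc_counts)
    totals = {c: sum(token_counts[c].values()) for c in classes}

    posteriors: Dict[str, Dict[str, float]] = {}
    for word in vocab:
        # Numerator of Bayes' theorem, P(word | c) * P(c), for each class c.
        joint = {
            c: (token_counts[c][word] + alpha) / (totals[c] + alpha * len(vocab))
            * class_doc_counts[c] / len(docs)
            for c in classes
        }
        evidence = sum(joint.values())  # P(word) by the law of total probability
        posteriors[word] = {c: p / evidence for c, p in joint.items()}
    return posteriors

def weight_vector(vector: List[float], word: str,
                  posteriors: Dict[str, Dict[str, float]],
                  default: float = 0.0) -> List[float]:
    """Scale a word vector by the word's strongest class posterior.

    Assumption: max over classes is an illustrative choice of weight;
    words never seen in training get `default`.
    """
    weight = max(posteriors[word].values()) if word in posteriors else default
    return [weight * x for x in vector]

docs = ["buffer overflow in parser", "stack buffer overflow",
        "sql injection in login form", "blind sql injection"]
labels = ["memory", "memory", "injection", "injection"]
post = word_class_posteriors(docs, labels)
# The rare but class-specific "parser" (36/53, about 0.68) gets a higher
# "memory" posterior than "in" (18/35, about 0.51), which both classes share.
```

Unlike a global IDF term, the posterior is computed per category, so a word that appears only once but only in one class can still receive a high weight.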

Enhanced Classification of Embedded System Vulnerabilities Using Ensemble Embedding and BiLSTM Networks

Aissa Ben Yahya, Hicham El Akhal, Abdelbaki El Belrhiti El Alaoui

Conference IDEAS 2024

Abstract

Critical infrastructure increasingly relies on embedded systems, making them particularly vulnerable to cyber attacks due to their complexity and interconnectivity. Unlike general-purpose systems, embedded systems need specialized security solutions tailored to their unique vulnerabilities. Accurate classification of embedded system vulnerabilities is essential for targeted analysis and mitigation. Traditional methods using pre-trained embeddings like Word2Vec, GloVe, and FastText often struggle with Out-of-Vocabulary (OOV) words, reducing their effectiveness. We address this with a novel ensemble embedding technique that combines multiple pre-trained embeddings, enhancing the classification of embedded system vulnerabilities. Our BiLSTM-based model, tested on datasets such as NVD and CNNVD, achieved 82.61% accuracy on unseen data, outperforming traditional embeddings.
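The abstract does not state how the pre-trained embeddings are combined. One simple option, assumed here for illustration only, is to concatenate the vectors a word has in each table and fill a zero block wherever a table lacks the word, so a word is fully out-of-vocabulary only when every table misses it. The tiny lookup tables and the names ensemble_vector and oov_rate are hypothetical stand-ins, not real Word2Vec, GloVe, or FastText data.

```python
from typing import Dict, List, Sequence

Embedding = Dict[str, List[float]]

def ensemble_vector(word: str, tables: Sequence[Embedding],
                    dims: Sequence[int]) -> List[float]:
    """Concatenate a word's vectors from several pre-trained embeddings.

    A table that lacks the word contributes a zero block of its own
    dimension, keeping the output length fixed at sum(dims).
    """
    parts: List[float] = []
    for table, dim in zip(tables, dims):
        parts.extend(table.get(word, [0.0] * dim))
    return parts

def oov_rate(words: Sequence[str], tables: Sequence[Embedding]) -> float:
    """Fraction of words missing from every table at once."""
    missing = [w for w in words if not any(w in t for t in tables)]
    return len(missing) / len(words) if words else 0.0

# Hypothetical toy tables standing in for three pre-trained embeddings.
w2v = {"firmware": [0.1, 0.2]}
glove = {"firmware": [0.3, 0.4], "overflow": [0.5, 0.6]}
fasttext = {"overflow": [0.7, 0.8, 0.9]}
tables, dims = [w2v, glove, fasttext], [2, 2, 3]

vec = ensemble_vector("overflow", tables, dims)
# vec == [0.0, 0.0, 0.5, 0.6, 0.7, 0.8, 0.9]
words = ["firmware", "overflow", "modbus"]
# Only "modbus" is missing from all three tables, so oov_rate(words, tables)
# is 1/3, versus 2/3 with the first table alone.
```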

Positive discrimination of minority classes through data generation and distribution: A case study in olive disease classification

Hicham El Akhal, Aissa Ben Yahya, Abdelbaki El Belrhiti El Alaoui

Journal Engineering Applications of Artificial Intelligence

Abstract

Deep learning models have achieved remarkable success in various tasks, especially in classification. This success is particularly evident in the precise classification of plant diseases, which is crucial for effective agricultural management. However, accurate classification faces challenges, particularly in data collection, where certain classes (the minority classes) are underrepresented. This issue can significantly impact model performance. To tackle this challenge, this paper introduces a novel methodology that differs from existing approaches. We focus on addressing the issue of minority classes in image-based classification tasks, particularly for olive diseases. We employ data generation methods, including basic transformations, to produce augmented data and utilize Deep Convolutional Generative Adversarial Networks (DCGAN) to produce generated data. Next, we apply the Fréchet Inception Distance (FID) to the generated dataset to select the highest-quality images. We then distribute varying percentages (25%, 50%, 75%, 100%) of this new data into the minority classes of the original dataset. Our data distribution strategies involve incorporating specific amounts of (1) augmented data, (2) generated data, and (3) a combination of both augmented and generated data to achieve target percentages (T.P) in the resulting dataset. Our experiments focus on classifying olive diseases into seven distinct categories using a pre-trained Convolutional Neural Network (CNN) architecture. We observe significant improvements in the model’s performance, particularly in the accurate classification of minority classes. This approach enhances diagnostic accuracy and optimizes data distribution, which is crucial for effectively addressing the challenges posed by minority classes.
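For reference, FID fits a Gaussian to the feature activations of each image set and compares them: d² = ||μ_a − μ_b||² + Tr(Σ_a + Σ_b − 2(Σ_a Σ_b)^(1/2)), where lower means the sets are more similar. The abstract does not describe how FID was used to rank individual generated images, so this sketch only computes the distance between two feature matrices; the hypothetical frechet_distance function assumes features were already extracted (typically from an Inception network).

```python
import numpy as np

def _psd_sqrt(m: np.ndarray) -> np.ndarray:
    """Square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    # Clip tiny negative eigenvalues caused by floating-point error.
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """FID between two feature sets (rows are samples, columns are features).

    Assumes the features were already extracted elsewhere; this function
    only fits the two Gaussians and compares them.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    # atleast_2d keeps the single-feature case a 1x1 matrix.
    cov_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    cov_b = np.atleast_2d(np.cov(feats_b, rowvar=False))
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the
    # eigenvalues of sqrt(cov_a) cov_b sqrt(cov_a), a symmetric PSD matrix
    # with the same spectrum, which avoids a non-symmetric matrix root.
    root_a = _psd_sqrt(cov_a)
    eigvals = np.linalg.eigvalsh(root_a @ cov_b @ root_a)
    tr_covmean = np.sqrt(np.clip(eigvals, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_covmean)
```

Shifting one feature set by a constant vector c leaves the covariances unchanged, so the distance reduces to ||c||², which is a convenient sanity check.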

Machine Learning-Based Collection and Analysis of Embedded Systems Vulnerabilities

Aissa Ben Yahya, Hicham El Akhal, Abdelbaki El Belrhiti El Alaoui

Conference MIPSC'23

Abstract

The security of embedded systems is deteriorating relative to that of conventional systems because of their limited memory, processing, and power resources. Daily publications highlight various vulnerabilities associated with these systems. While significant efforts have been made to systematize and analyze these vulnerabilities, most studies focus on specific areas within embedded systems and do not apply artificial intelligence (AI). This research addresses these gaps by using a support vector machine (SVM) to classify vulnerabilities sourced from the National Vulnerability Database (NVD), specifically targeting embedded system vulnerabilities. Results indicate that seven of the top 10 Common Weakness Enumeration (CWE) weaknesses in embedded systems also appear in the 2022 CWE Top 25 Most Dangerous Software Weaknesses. The findings of this study will help security researchers and companies analyze embedded system vulnerabilities comprehensively and develop tailored solutions.

A novel approach for image-based olive leaf diseases classification using a deep hybrid model

Hicham El Akhal, Aissa Ben Yahya, Abdelbaki El Belrhiti El Alaoui

Journal Ecological Informatics

Abstract

The olive tree is affected by a variety of diseases. To identify these diseases, farmers typically rely on traditional methods that require considerable effort and expertise, such as visually inspecting the tree or conducting laboratory tests. Fortunately, recent progress in machine learning (ML) and deep learning (DL) has demonstrated promising potential to classify diseases automatically with both high accuracy and speed. However, as the literature indicates, only a few studies use ML and DL techniques to identify and categorize diseases that affect olive trees. Therefore, in this study, we collected a dataset of 4138 olive leaf images from various sources. The dataset comprises four categories: three representing diseases and one denoting a healthy category. We also introduced an innovative approach to classify olive leaf diseases by combining deep learning architectures, specifically convolutional neural networks (CNNs), with machine learning classifiers. In this approach, we developed a total of 30 distinct deep hybrid models (DHMs), using six pre-trained CNN architectures (VGG19, ResNet50, MobileNetV2, InceptionV3, DenseNet201, and EfficientNetB0) as feature extractors, along with five machine learning classifiers (MLP, LR, RF, SVM, and DT). To assess the performance of the DHMs, we used four evaluation metrics (Accuracy, Precision, Recall, F1-score) and validated their reliability using cross-validation. Additionally, we employed the Non-Parametric ScottKnott ESD (NPSK) test to rank the best DHMs. The most efficient deep hybrid model combined EfficientNetB0 with a logistic regression classifier, achieving an accuracy of 96.14%.
Our approach has the potential to significantly assist olive farmers in rapidly and accurately identifying diseases, thereby potentially reducing economic losses.
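The 30 DHMs follow from pairing each of the six feature extractors with each of the five classifiers (6 × 5 = 30). A minimal enumeration of that experimental grid is shown below, using only the names listed in the abstract; the "CNN+classifier" string format is purely illustrative, and no training or scoring is implied.

```python
from itertools import product

# Names taken from the abstract. Each pair is one deep hybrid model:
# a pre-trained CNN used as a feature extractor, followed by a classifier.
EXTRACTORS = ["VGG19", "ResNet50", "MobileNetV2", "InceptionV3",
              "DenseNet201", "EfficientNetB0"]
CLASSIFIERS = ["MLP", "LR", "RF", "SVM", "DT"]

dhm_grid = [f"{cnn}+{clf}" for cnn, clf in product(EXTRACTORS, CLASSIFIERS)]
print(len(dhm_grid))  # 30
print(dhm_grid[0])    # VGG19+MLP
```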

Teaching

Competitions

Contact

Feel free to send me an email to discuss research or to arrange a meeting.

Department of Computer Science
Meknes
PB 11201 Zitoune, 50000 Meknes, Morocco

ai.benyahya _at_ edu.umi.ac.ma
aissabenbang _at_ gmail.com
@AissaBenYahya1
My LinkedIn profile
