News

  • 28-07-2025

    Paper published in the Journal of Information Security and Applications

    Our paper "Improving critical infrastructure security through hybrid embeddings for vulnerability classification" has been published in the prestigious Journal of Information Security and Applications. Click here to read the paper; a temporary free-access link is available here.
  • 17-03-2025

    Paper accepted in Computers & Security

    Our paper "Bayes-Based Word Weighting for Enhanced Vulnerability Classification in Critical Infrastructure Systems" has been accepted for publication in the prestigious journal Computers & Security. Click here to read the paper.
  • 16-03-2025

    Paper published in LNCS

    Our paper "Enhanced Classification of Embedded System Vulnerabilities Using Ensemble Embedding and BiLSTM Networks", presented at IDEAS 2024, has been published in LNCS. You can check out the paper here.

Education

  • Ph.D., October 2020-September 2025

    Ph.D. thesis: "An Automatic Software Vulnerability Classification for Critical Infrastructure Systems"

    Université Moulay Ismail, Faculté des sciences, Meknès

  • MSc, 2018-2020

    Master in Computer Networks and Embedded Systems

    Université Moulay Ismail, Faculté des sciences, Meknès

  • BSc, May 2014

    Bachelor in Mathematical Sciences, Computer Science and Applications (4-year curriculum)

    Faculté Polydisciplinaire, Ouarzazate

© Aissa Ben Yahya

Topics

  • Cyber Security
  • Natural Language Processing
  • Machine Learning, Deep Learning
  • Vulnerability Detection/Classification, Computer Networking

Improving critical infrastructure security through hybrid embeddings for vulnerability classification

Aissa Ben Yahya, Hicham El Akhal, Abdelbaki El Belrhiti El Alaoui
Journal Journal of Information Security and Applications

Abstract

The growing prevalence of vulnerabilities in embedded devices poses a significant risk to critical infrastructure. While deep learning has advanced vulnerability classification, its effectiveness is often hindered by limitations in word representation. Traditional word embeddings struggle with out-of-vocabulary (OOV) words, which are common in domain-specific reports, while pre-trained language models (PLMs), despite their contextual power, may lack specialized domain knowledge. To address these challenges, we propose a novel Two-Stream hybrid embedding architecture that combines Vuln2Vec, a custom domain-specific word embedding, with a large PLM using a learnable weighted feature fusion. Our approach leverages the rich domain-specific vocabulary of Vuln2Vec to understand specialized terminology, while the PLM captures broader contextual relationships and effectively handles OOV words. We validate our method through rigorous experiments, including ablation studies and comparative analyses on vulnerability databases such as the National Vulnerability Database (NVD), the Chinese Vulnerability Database (CNNVD), and a challenging manually collected dataset. Our experiments demonstrate that the proposed hybrid embedding method achieves a state-of-the-art F1-score of 94.25% and an accuracy of 94.88% on the challenging test dataset, validating the superiority of fusing specialized and general-purpose knowledge for this critical task.
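The learnable weighted feature fusion described above can be illustrated with a minimal sketch. It assumes both streams have already been projected to vectors of the same size and that the fusion weights are a softmax over two scalar logits. The function names, the toy vectors, and this particular weighting scheme are illustrative assumptions, not the paper's implementation. In a trained model the logits would be parameters updated by backpropagation rather than fixed inputs.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_fusion(domain_vec: Sequence[float],
                    plm_vec: Sequence[float],
                    logits: Sequence[float]) -> List[float]:
    """Fuse two same-sized feature vectors with softmax-normalized weights.

    `logits` holds two scalars that would be learnable parameters in a real
    model (assumption); here they are passed in as fixed values.
    """
    if len(domain_vec) != len(plm_vec):
        raise ValueError("both streams must be projected to the same size")
    w_domain, w_plm = softmax(logits)
    return [w_domain * a + w_plm * b for a, b in zip(domain_vec, plm_vec)]


# Toy 4-dimensional vectors standing in for projected Vuln2Vec / PLM features.
vuln2vec_feat = [1.0, 0.0, 2.0, -1.0]
plm_feat = [0.0, 1.0, 0.0, 1.0]

# Equal logits give equal weights (0.5, 0.5).
print(weighted_fusion(vuln2vec_feat, plm_feat, [0.0, 0.0]))  # [0.5, 0.5, 1.0, 0.0]
```

Raising the first logit, for example to [2.0, 0.0], shifts weight toward the domain-specific stream. The softmax keeps both weights positive and summing to one, so the fused vector always stays a convex combination of the two streams.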

Bayes-Based Word Weighting for Enhanced Vulnerability Classification in Critical Infrastructure Systems

Aissa Ben Yahya, Hicham El Akhal, Abdelbaki El Belrhiti El Alaoui
Journal Computers & Security

Abstract

The increasing number of vulnerabilities in embedded devices poses a significant threat to the security of the critical infrastructure in which these devices are used. While deep learning approaches have advanced software vulnerability classification, they exhibit critical limitations regarding word weighting. Conventional methods such as term frequency-inverse document frequency (TF-IDF) prioritize global term distributions but overlook intra-class distinctions. Improved variants of this technique have been proposed, but they often fail to consider that a word's importance can vary across categories, and they struggle to adequately prioritize rare but distinctive words. Additionally, high inter-class semantic overlap and terminological ambiguity in vulnerability descriptions hinder model performance, making it difficult to separate intra-class keywords from background noise. To address these gaps, we propose a novel vulnerability classification and word vector weighting approach based on Bayes' theorem. Our method dynamically adjusts term relevance by calculating posterior probabilities of word-category associations, emphasizing rare tokens with high intra-class specificity. We validate the approach on four test datasets derived from databases such as the National Vulnerability Database (NVD) and the Chinese Vulnerability Database (CNNVD). Rigorous ablation and comparative studies demonstrate that Bayes-based word weighting outperforms other methods, achieving 97.63% accuracy and a 97.60% F1-score on the most challenging test data. All our models and the code to reproduce our results are open-sourced.
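The core of Bayes-based word weighting is the posterior P(c | w) = P(w | c) P(c) / sum over c' of P(w | c') P(c'). It can be sketched in a few lines of Python. The sketch rests on several assumptions: add-one (Laplace) smoothing for P(w | c), document-frequency priors for P(c), a word's weight taken as its largest class posterior, and a toy two-class corpus. The helper name `bayes_word_weights` and these choices are hypothetical and may differ from the paper's formulation.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def bayes_word_weights(docs: List[Tuple[List[str], str]]) -> Dict[str, Dict[str, float]]:
    """Return P(class | word) for every vocabulary word via Bayes' theorem."""
    class_docs = Counter(label for _, label in docs)         # documents per class
    word_counts: Dict[str, Counter] = defaultdict(Counter)   # class -> word -> count
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for tokens, _ in docs for w in tokens}
    n_docs = len(docs)

    posteriors: Dict[str, Dict[str, float]] = {}
    for word in vocab:
        joint = {}
        for c, n_c in class_docs.items():
            total_c = sum(word_counts[c].values())
            # Add-one smoothed likelihood P(w | c).
            likelihood = (word_counts[c][word] + 1) / (total_c + len(vocab))
            prior = n_c / n_docs                              # P(c)
            joint[c] = likelihood * prior                     # P(w | c) P(c)
        evidence = sum(joint.values())                        # P(w)
        posteriors[word] = {c: p / evidence for c, p in joint.items()}
    return posteriors


docs = [
    (["buffer", "overflow", "in", "firmware"], "memory"),
    (["heap", "overflow", "in", "driver"], "memory"),
    (["sql", "injection", "in", "login"], "injection"),
    (["command", "injection", "in", "router"], "injection"),
]
post = bayes_word_weights(docs)
# Assumed weighting rule (hypothetical): a word's weight is its highest class posterior.
weights = {word: max(p.values()) for word, p in post.items()}
print(round(weights["in"], 3), round(weights["overflow"], 3))  # 0.5 0.75
```

In this two-class example, a word spread evenly across classes, such as "in", gets a posterior of 0.5. A class-specific word such as "overflow" scores 0.75. This is the kind of intra-class emphasis the abstract describes. Such weights would then scale the corresponding word vectors.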

Enhanced Classification of Embedded System Vulnerabilities Using Ensemble Embedding and BiLSTM Networks

Aissa Ben Yahya, Hicham El Akhal, Abdelbaki El Belrhiti El Alaoui
Conference IDEAS 2024

Abstract

Critical infrastructure increasingly relies on embedded systems, which are particularly vulnerable to cyberattacks due to their complexity and interconnectivity. Unlike general-purpose systems, embedded systems need specialized security solutions tailored to their unique vulnerabilities. Accurate classification of embedded system vulnerabilities is essential for targeted analysis and mitigation. Traditional methods using pre-trained embeddings such as Word2Vec, GloVe, and FastText often struggle with out-of-vocabulary (OOV) words, which reduces their effectiveness. We address this with a novel ensemble embedding technique that combines multiple pre-trained embeddings, enhancing the classification of embedded system vulnerabilities. Our BiLSTM-based model, tested on datasets such as NVD and CNNVD, achieved 82.61% accuracy on unseen data, outperforming traditional embeddings.
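One simple way to ensemble several pre-trained embeddings is to concatenate a word's vector from each table, substituting a zero block wherever a table does not contain the word. The sketch below assumes concatenation as the combination rule. It uses tiny hypothetical lookup tables in place of real Word2Vec and GloVe vectors, so the paper's exact combination may differ. The resulting per-token rows would form the input sequence to a BiLSTM.

```python
from typing import Dict, List, Sequence

Embedding = Dict[str, List[float]]


def ensemble_vector(word: str, tables: Sequence[Embedding],
                    dims: Sequence[int]) -> List[float]:
    """Concatenate a word's vectors from several embedding tables.

    A table that lacks the word (OOV for that table) contributes a zero
    block of its own dimension, so the word still gets a full-length vector.
    """
    out: List[float] = []
    for table, dim in zip(tables, dims):
        out.extend(table.get(word, [0.0] * dim))
    return out


# Hypothetical tiny tables standing in for real pre-trained embeddings.
word2vec_like: Embedding = {"overflow": [0.1, 0.2]}
glove_like: Embedding = {"overflow": [0.3, 0.4, 0.5], "firmware": [0.6, 0.7, 0.8]}
tables = [word2vec_like, glove_like]
dims = [2, 3]

tokens = ["overflow", "in", "firmware"]
# One row per token: this matrix would be the BiLSTM input sequence.
matrix = [ensemble_vector(tok, tables, dims) for tok in tokens]
for tok, row in zip(tokens, matrix):
    print(tok, row)
```

Here "firmware" is missing from the first table but still receives information from the second. A word unknown to every table falls back to an all-zero row.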

Positive discrimination of minority classes through data generation and distribution: A case study in olive disease classification

Hicham El Akhal, Aissa Ben Yahya, Abdelbaki El Belrhiti El Alaoui
Journal Engineering Applications of Artificial Intelligence

Abstract

Deep learning models have achieved remarkable success in various tasks, especially classification. This success is particularly evident in the precise classification of plant diseases, which is crucial for effective agricultural management. However, accurate classification faces challenges in data collection, where certain classes, namely the minority classes, are underrepresented. This issue can significantly degrade model performance. To tackle this challenge, this paper introduces a novel methodology for addressing minority classes in image-based classification tasks, focusing on olive diseases. We employ data generation methods: basic transformations produce augmented data, and Deep Convolutional Generative Adversarial Networks (DCGAN) produce generated data. Next, we apply the Fréchet Inception Distance (FID) to the generated dataset to select the highest-quality images. We then distribute varying percentages (25%, 50%, 75%, 100%) of this new data into the minority classes of the original dataset. Our data distribution strategies incorporate specific amounts of (1) augmented data, (2) generated data, and (3) a combination of both augmented and generated data to reach target percentages (T.P) in the resulting dataset. Our experiments classify olive diseases into seven distinct categories using a pre-trained Convolutional Neural Network (CNN) architecture. We observe significant improvements in the model's performance, particularly in the accurate classification of minority classes. This approach enhances diagnostic accuracy and optimizes data distribution, which is crucial for effectively addressing the challenges posed by minority classes.
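The FID used for image selection compares Gaussian fits of real and generated feature activations: FID = ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The sketch below simplifies this by assuming diagonal covariances. Under that assumption the trace term reduces to the sum of squared differences between per-dimension standard deviations. The feature vectors are toy values standing in for Inception-v3 activations, and the function names are hypothetical. This is not the paper's selection procedure, which the abstract does not detail.

```python
import math
from typing import List, Sequence, Tuple


def mean_std(features: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Per-dimension mean and (population) standard deviation."""
    n, d = len(features), len(features[0])
    means = [sum(f[i] for f in features) / n for i in range(d)]
    stds = [math.sqrt(sum((f[i] - means[i]) ** 2 for f in features) / n)
            for i in range(d)]
    return means, stds


def fid_diagonal(real: Sequence[Sequence[float]],
                 generated: Sequence[Sequence[float]]) -> float:
    """FID under a diagonal-covariance assumption.

    With diagonal covariances, Tr(S_r + S_g - 2 (S_r S_g)^(1/2)) equals the sum
    of squared differences between per-dimension standard deviations.
    """
    mu_r, sd_r = mean_std(real)
    mu_g, sd_g = mean_std(generated)
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    cov_term = sum((a - b) ** 2 for a, b in zip(sd_r, sd_g))
    return mean_term + cov_term


# Toy 2-D "activations"; real FID uses Inception-v3 features.
real = [[0.0, 0.0], [2.0, 2.0]]
generated = [[1.0, 1.0], [3.0, 3.0]]
print(fid_diagonal(real, real))       # 0.0
print(fid_diagonal(real, generated))  # 2.0
```

Lower scores mean the generated features lie closer to the real ones, and identical sets score 0. The full metric needs a matrix square root of the covariance product instead of this per-dimension shortcut.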

Machine Learning-Based Collection and Analysis of Embedded Systems Vulnerabilities

Aissa Ben Yahya, Hicham El Akhal, Abdelbaki El Belrhiti El Alaoui
Conference MIPSC'23

Abstract

The security of embedded systems is deteriorating in comparison to conventional systems due to resource limitations in memory, processing, and power. New vulnerabilities affecting these systems are published daily. While significant efforts have been made to systematize and analyze these vulnerabilities, most studies focus on specific areas within embedded systems and do not apply artificial intelligence (AI). This research aims to address these gaps by using a support vector machine (SVM) to classify vulnerabilities sourced from the National Vulnerability Database (NVD), specifically targeting embedded system vulnerabilities. Results indicate that seven of the top 10 Common Weakness Enumeration (CWE) weaknesses in embedded systems also appear in the 2022 CWE Top 25 Most Dangerous Software Weaknesses. The findings of this study will help security researchers and companies comprehensively analyze embedded system vulnerabilities and develop tailored solutions.
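As a self-contained stand-in for the SVM classifier, the sketch below trains a linear SVM on bag-of-words features. It uses Pegasos-style stochastic subgradient descent and only the standard library. The two-class toy descriptions, the hyperparameters (`lam`, `epochs`), and the omission of a bias term are illustrative assumptions. The abstract does not specify the study's SVM implementation, features, or hyperparameters.

```python
from typing import Dict, List, Tuple


def bag_of_words(text: str) -> Dict[str, float]:
    """Lower-cased whitespace tokens mapped to their counts."""
    counts: Dict[str, float] = {}
    for tok in text.lower().split():
        counts[tok] = counts.get(tok, 0.0) + 1.0
    return counts


def dot(w: Dict[str, float], x: Dict[str, float]) -> float:
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(w.get(k, 0.0) * v for k, v in x.items())


def train_linear_svm(data: List[Tuple[str, int]],
                     lam: float = 0.01, epochs: int = 50) -> Dict[str, float]:
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. Samples are visited in order (no shuffling)
    and no bias term is learned, to keep the sketch short.
    """
    w: Dict[str, float] = {}
    t = 0
    for _ in range(epochs):
        for text, y in data:
            t += 1
            eta = 1.0 / (lam * t)          # decreasing step size
            x = bag_of_words(text)
            margin = y * dot(w, x)         # margin under the current weights
            for k in w:                    # shrink: regularizer gradient
                w[k] *= 1.0 - eta * lam
            if margin < 1.0:               # hinge loss active: move toward y * x
                for k, v in x.items():
                    w[k] = w.get(k, 0.0) + eta * y * v
    return w


def predict(w: Dict[str, float], text: str) -> int:
    return 1 if dot(w, bag_of_words(text)) >= 0.0 else -1


# Toy labels: +1 = memory corruption, -1 = injection (hypothetical classes).
data = [
    ("buffer overflow in firmware", 1),
    ("heap overflow in driver", 1),
    ("sql injection in login form", -1),
    ("command injection in router", -1),
]
w = train_linear_svm(data)
print(predict(w, "stack overflow in bootloader"))    # 1
print(predict(w, "os command injection in web ui"))  # -1
```

Classifying NVD entries into many CWE categories would need a multi-class extension such as one-vs-rest, training one such binary model per category.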

A novel approach for image-based olive leaf diseases classification using a deep hybrid model

Hicham El Akhal, Aissa Ben Yahya, Abdelbaki El Alaoui El Belrhiti
Journal Ecological Informatics

Abstract

The olive tree is affected by a variety of diseases. To identify these diseases, farmers typically rely on traditional methods that require considerable effort and expertise, such as visually inspecting the tree or conducting laboratory tests. Fortunately, recent progress in machine learning (ML) and deep learning (DL) has shown promising potential for classifying diseases automatically, with both high accuracy and speed. However, the literature indicates that only a few studies have used ML and DL techniques to identify and categorize diseases affecting olive trees. In this study, we therefore collected a dataset of 4138 olive leaf images from various sources. The dataset comprises four categories: three representing diseases and one denoting healthy leaves. We also introduced an innovative approach to classifying olive leaf diseases that combines deep learning architectures, specifically convolutional neural networks (CNNs), with machine learning classifiers. In this approach, we developed a total of 30 distinct deep hybrid models (DHMs), using six pre-trained CNN architectures (VGG19, ResNet50, MobileNetV2, InceptionV3, DenseNet201, and EfficientNetB0) as feature extractors together with five machine learning classifiers (MLP, LR, RF, SVM, and DT). To assess the performance of the DHMs, we used standard evaluation metrics (Accuracy, Precision, Recall, F1-score), and we assessed their reliability using cross-validation. Additionally, we employed the Non-Parametric ScottKnott ESD (NPSK) test to rank the best DHMs. The study's findings revealed that the most efficient deep hybrid model combined EfficientNetB0 with a logistic regression classifier, achieving an accuracy of 96.14%. Our approach has the potential to significantly assist olive farmers in rapidly and accurately identifying diseases, thereby helping to reduce economic losses.
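The 6 x 5 grid of deep hybrid models and the cross-validation step can be sketched as below. The grid pairs every listed feature extractor with every listed classifier. The k-fold splitter is contiguous and unshuffled, and it and its helper names are hypothetical simplifications. The abstract does not state the number of folds or how they were drawn.

```python
from itertools import product
from typing import List, Tuple

BACKBONES = ["VGG19", "ResNet50", "MobileNetV2", "InceptionV3",
             "DenseNet201", "EfficientNetB0"]
CLASSIFIERS = ["MLP", "LR", "RF", "SVM", "DT"]


def hybrid_model_grid() -> List[Tuple[str, str]]:
    """Every (feature extractor, classifier) pair: 6 x 5 = 30 DHMs."""
    return list(product(BACKBONES, CLASSIFIERS))


def k_fold_indices(n_samples: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n_samples) into k contiguous folds; return (train, test) pairs.

    No shuffling or stratification, for brevity (hypothetical helper).
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    splits = []
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


grid = hybrid_model_grid()
print(len(grid))  # 30
# Fold sizes for the 4138-image dataset under an assumed k = 5.
print([len(test) for _, test in k_fold_indices(4138, 5)])  # [828, 828, 828, 827, 827]
```

Each of the 30 pairs would be trained and scored on every fold, and the per-fold scores would then feed a ranking procedure such as the ScottKnott ESD test.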

Department of Computer Science
Meknes
PB 11201 Zitoune, 50000 Meknes, Morocco
  • ai.benyahya _at_ edu.umi.ac.ma
  • aissabenbang _at_ gmail.com
  • @AissaBenYahya1
  • My LinkedIn profile