
Public defence in Computer Science, M.Sc. Mohammadreza Mohammadnia Qaraei

Public defence from the Aalto University School of Science, Department of Computer Science.

Title of the thesis: Efficient and robust algorithms for extreme multilabel classification

Doctoral student: Mohammadreza Mohammadnia Qaraei
Opponent: Associate Professor Grigorios Tsoumakas, Aristotle University of Thessaloniki, Greece
Custos: Associate Professor Pekka Marttinen, Aalto University School of Science, Department of Computer Science

Many real-world problems can be seen as classification tasks, which machine learning algorithms aim to automate. For instance, tagging an email as spam or classifying the sentiment of feedback as positive, neutral, or negative are classification tasks. In these examples, the input, an email or feedback text, is assigned one label from a small set. However, some classification tasks are much more complex: they involve hundreds of thousands to millions of possible labels, and each sample can have more than one relevant label. These problems are referred to as extreme multilabel classification (XMC).

Real-world examples of XMC include product recommender systems, where a few products out of millions are recommended to a user viewing a product, and the tagging of web-scale document collections, such as Wikipedia articles, with tags selected from a large set. This thesis studies two key challenges in XMC: efficiency and robustness.

Specifically, a key challenge in XMC is efficiently performing computations and managing storage requirements in the presence of large label spaces. This thesis proposes a method for reducing model size and speeding up training while maintaining the performance of state-of-the-art approaches for extreme multilabel classification. 

Having a large label space also results in data irregularities, such as missing labels and class imbalance. These issues significantly diminish the model’s prediction accuracy. This thesis introduces methods to improve the robustness of current algorithms in XMC against these data irregularities, enhancing prediction accuracy. 

Another challenge in machine learning is ensuring robustness against input samples specifically designed to fool the model, known as adversarial examples. The thesis explores adversarial attacks in the context of extreme multilabel text classification. Specifically, it investigates the generation of adversarial examples for XMC on textual data, evaluates the robustness of XMC models against these examples, and discusses methods to improve model robustness.

Overall, this thesis proposes methods for enhancing efficiency, improving accuracy, and increasing the robustness of XMC models. These contributions have significant implications for applications involving large label spaces and multilabel data, including recommender systems and large-scale document classification.

The thesis is available for public display 10 days prior to the defence at: https://aaltodoc.aalto.fi/doc_public/eonly/riiputus/

Doctoral theses of the School of Science: https://aaltodoc.aalto.fi/handle/123456789/52 
