Types of Machine Learning exemplified by spam analysis: Part 1
The first two articles in our series on AI recognition, "Human or machine? Distinguishing between real and AI-generated content" and "GBS tests the most popular tools for AI content detection", presented methods and tools for analyzing whether text was created by a human or by an AI-based system. In this article, Dr. Rolf Kremer, R&D Manager at GBS, and Dirk Nolte, Solution Architect, take a closer look at the different types of machine learning (ML), an intriguing branch of artificial intelligence (AI). We illustrate how machine learning is applied using the example of email spam analysis, and more specifically how ML technology can recognize whether an email should be classified as spam or can safely be delivered to the inbox.
The spam functionality of iQ.Suite, the GBS email solution for security and productivity, uses the machine-learning-driven CORE (Content Recognition Engine) to deliver superior results. CORE combines different analysis methods to classify emails, which helps improve business processes such as response management, customer support and communication.
Types of Machine Learning
In general, machine learning uses a data set (training data) to generate a model which, in the case of spam analysis, can decide whether an email is spam or non-spam. Training data whose items carry such a label (spam, non-spam) is called marked data; all data that has not been classified is referred to as unmarked data. Depending on the type of machine learning, the output, i.e. the prediction or recognition, can then be evaluated and used to expand the training data (see Figure 1). The different forms of ML include supervised learning, semi-supervised learning, unsupervised learning, reinforcement learning and active learning.
Figure 1: Machine learning (Based on Trabold, D., LAMARR Institute for Machine Learning and Artificial Intelligence, 2021)
Types of Machine Learning: Supervised learning
Supervised learning is mostly used for classification tasks in which the system learns from labeled examples. In the case of spam detection, the existing emails have already been divided into the categories "spam" and "not spam". From these examples, the system learns which characteristics are typical of spam, e.g. a particular sender address or certain words. Each new incoming email is then checked for these characteristics; if they are present, the email is classified as spam. If users find an email in their mailbox that they recognize as spam, they can report it; the reported email becomes an additional labeled example, so the system continues to learn. Figure 2 provides a schematic illustration of supervised learning.
Figure 2: Supervised learning (Based on Trabold, D., LAMARR Institute for Machine Learning and Artificial Intelligence, 2021)
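To make the idea concrete, here is a minimal Python sketch of supervised spam classification. It uses a multinomial Naive Bayes classifier, a classic textbook technique chosen here purely for illustration; it is not a description of how CORE or iQ.Suite works. The class and method names (`NaiveBayesSpamFilter`, `train`, `classify`, `report`) and the tiny training set are hypothetical. The `report` method mirrors the user feedback described above: a reported email becomes a new labeled example and the model is retrained.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

class NaiveBayesSpamFilter:
    """Multinomial Naive Bayes over word counts, trained on labeled emails.

    The training data must contain at least one example of each label.
    """

    LABELS = ("spam", "non-spam")

    def __init__(self):
        self.examples = []  # the marked training data: (text, label) pairs

    def train(self, labeled_emails):
        self.examples = list(labeled_emails)
        self._fit()

    def _fit(self):
        # Count how many emails and how many word occurrences each label has
        self.doc_counts = Counter(label for _, label in self.examples)
        self.word_counts = {label: Counter() for label in self.LABELS}
        for text, label in self.examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label), with add-one (Laplace)
        # smoothing so that unseen words do not drive the score to zero
        score = math.log(self.doc_counts[label] / len(self.examples))
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def classify(self, text):
        tokens = tokenize(text)
        return max(self.LABELS, key=lambda label: self._log_score(tokens, label))

    def report(self, text, label):
        """User feedback: store a newly labeled email and retrain."""
        self.examples.append((text, label))
        self._fit()

training_data = [
    ("win a free prize now", "spam"),
    ("cheap meds limited offer", "spam"),
    ("meeting agenda for monday", "non-spam"),
    ("project report attached", "non-spam"),
]
spam_filter = NaiveBayesSpamFilter()
spam_filter.train(training_data)
print(spam_filter.classify("claim your free prize"))           # spam
print(spam_filter.classify("agenda for the project meeting"))  # non-spam
print(spam_filter.classify("crypto giveaway"))                 # non-spam (no evidence yet)
spam_filter.report("crypto giveaway click here", "spam")       # a user reports spam
print(spam_filter.classify("crypto giveaway"))                 # spam
```

With only four training emails, the unseen phrase "crypto giveaway" is initially rated as non-spam; after a single user report, the same phrase is classified as spam. Real filters train on far larger data sets and on richer features than single words.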
Types of Machine Learning: Unsupervised learning
With unsupervised learning, the system has no labeled examples at its disposal; instead, it tries to recognize patterns and correlations in the data on its own. For spam detection, emails can be divided into clusters based on their similarity. The aim is to group emails so that the emails within a cluster are more similar to each other than to those in other clusters. In spam detection, such similarities can be based on various characteristics such as text content, sender address, salutation, personal names, greetings, subject lines, use of HTML, links and attachments. For example, emails with an impersonal salutation that contains no name may form a cluster that is then classified as spam, whereas emails whose salutation contains a full personal name fall into a cluster that is not classified as spam. Typically, very large amounts of data are required before an unknown data set (new emails) can be assessed.
The clustering method can be used to detect emerging or rapidly changing spam mechanisms for which no labeled data is yet available. However, the clusters can be difficult to identify, especially if they are not clearly differentiated. Figure 3 illustrates unsupervised learning.
Figure 3: Unsupervised learning (Based on Trabold, D., LAMARR Institute for Machine Learning and Artificial Intelligence, 2021)
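A minimal Python sketch of the clustering idea follows. It uses plain k-means, one common clustering algorithm chosen here for illustration; the article does not say which method is used in practice. The three features (impersonal salutation, scaled link count, HTML flag), the `GENERIC_SALUTATIONS` set, the helper names and the sample emails are all hypothetical. Note that k-means only returns cluster numbers: deciding that one of the clusters looks like spam is a separate step.

```python
import math

# Hypothetical list of impersonal salutations; a real system would use far richer features
GENERIC_SALUTATIONS = {"dear customer", "dear friend", "hello"}

def extract_features(email):
    """Map an email (dict with assumed keys) to a numeric feature vector."""
    generic = 1.0 if email["salutation"].lower() in GENERIC_SALUTATIONS else 0.0
    links = min(email["links"], 10) / 10  # scale the link count to [0, 1]
    html = 1.0 if email["html"] else 0.0
    return [generic, links, html]

def kmeans(points, k, max_iterations=20):
    """Plain k-means: returns a cluster index per point and the centroids."""
    # Deterministic initialization: start from the first point, then keep adding
    # the point that is farthest from all centroids chosen so far
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid
        assignments = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its members
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return assignments, centroids

emails = [
    {"salutation": "Dear customer", "links": 8, "html": True},
    {"salutation": "Dear friend", "links": 6, "html": True},
    {"salutation": "Hello", "links": 9, "html": True},
    {"salutation": "Dear Ms. Anna Schmidt", "links": 0, "html": False},
    {"salutation": "Dear Mr. Jan Meyer", "links": 1, "html": False},
    {"salutation": "Dear Dr. Lena Vogel", "links": 0, "html": True},
]
points = [extract_features(e) for e in emails]
assignments, centroids = kmeans(points, k=2)
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

Here the three impersonally addressed, link-heavy HTML emails end up in one cluster and the personally addressed ones in the other. With real data the clusters are rarely this clean, which is exactly the difficulty described above.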
Types of Machine Learning: Reinforcement learning
Reinforcement learning is another type of machine learning. Through reward and punishment, the algorithm gradually learns how to act. When it comes to spam detection, the system is trained through interaction to correctly classify emails as spam or non-spam by rewarding it for correct decisions and penalizing it for errors. The procedure is illustrated schematically in Figure 4.
Figure 4: Reinforcement learning (Based on Trabold, D., LAMARR Institute for Machine Learning and Artificial Intelligence, 2021)
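Reinforcement learning can be sketched in its simplest form, a one-step (contextual bandit) problem: the agent observes a coarse description of an email, chooses "spam" or "non-spam", and receives a reward of +1 or a penalty of -1. The Python sketch below keeps a table of estimated rewards per state and action and explores with an epsilon-greedy strategy. The state features, the parameter values and the simulated `feedback` function (which compares the decision to a known label, standing in for real interaction such as user reports) are assumptions for illustration, not a description of a production system.

```python
import random

ACTIONS = ("spam", "non-spam")

def state_of(email):
    """Hypothetical coarse state: (impersonal salutation?, contains links?)."""
    return (email["generic_salutation"], email["links"] > 0)

class SpamAgent:
    """Tabular agent that learns the expected reward of each action per state."""

    def __init__(self, epsilon=0.1, alpha=0.2, seed=0):
        self.q = {}               # (state, action) -> estimated reward
        self.epsilon = epsilon    # exploration rate
        self.alpha = alpha        # learning rate
        self.rng = random.Random(seed)

    def greedy(self, state):
        # Pick the action with the highest estimated reward (ties -> first in ACTIONS)
        return max(ACTIONS, key=lambda a: self.q.get((state, a), 0.0))

    def act(self, state):
        # Epsilon-greedy: occasionally try a random action to keep exploring
        if self.rng.random() < self.epsilon:
            return self.rng.choice(ACTIONS)
        return self.greedy(state)

    def learn(self, state, action, reward):
        # One-step update: move the estimate a fraction alpha toward the observed reward
        old = self.q.get((state, action), 0.0)
        self.q[(state, action)] = old + self.alpha * (reward - old)

def feedback(email, action):
    """Simulated environment: reward +1 for a correct decision, penalty -1 otherwise."""
    return 1.0 if action == email["true_label"] else -1.0

def train(agent, emails, steps=500):
    for _ in range(steps):
        email = agent.rng.choice(emails)
        state = state_of(email)
        action = agent.act(state)
        agent.learn(state, action, feedback(email, action))

training_emails = [
    {"generic_salutation": True, "links": 5, "true_label": "spam"},
    {"generic_salutation": True, "links": 0, "true_label": "spam"},
    {"generic_salutation": False, "links": 1, "true_label": "non-spam"},
    {"generic_salutation": False, "links": 0, "true_label": "non-spam"},
]
agent = SpamAgent()
train(agent, training_emails)
for email in training_emails:
    print(state_of(email), "->", agent.greedy(state_of(email)))
```

Because every decision in this sketch is judged on its own, no discounting of future rewards is needed; full reinforcement learning methods such as Q-learning extend the same update to sequences of decisions.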
The remaining types of machine learning listed above, semi-supervised learning and active learning, are presented in Part 2 of this article.
Authors: Dr. Rolf Kremer & Dirk Nolte