Published On: March 21, 2024
Tags: AI, AI content, cyber security

GBS tests the most popular tools for AI content detection


In our previous blog article, we dived into the topic of identifying AI-generated texts. Dr. Rolf Kremer, R&D Manager at GBS, presented methods for analyzing the content of documents or emails to determine whether they were written by a human or an artificial intelligence (AI). In this blog post, Dr. Kremer and Dirk Nolte, Senior Software Developer, provide an overview of the AI detection tools available on the market dedicated to this task – they evaluate an input text to determine whether it was created by an AI or a human.

This detection proves to be quite difficult at the moment, as the technology is still in the early stages of development. Additionally, most of these tools have been developed in English-speaking countries, which is why they are most successful with the English language. For other languages, such as German, the accuracy of the results sometimes suffers considerably. Yet it is probably only a matter of time before these tools can process other languages with the same accuracy as English.

A particularly critical situation arises when a tool misidentifies a text written by a human as AI-generated. Depending on the intended use, this can have negative consequences for a person or a company. One example is an exam-related text in a school or university environment. Another is a client of a consulting firm who mistakenly assumes that they are being charged for a results report generated by an AI.

Manual vs. Online AI Detection Tools

AI detection tools can generally be used in two ways: manually, or automatically via an interface. In manual use, a person enters the text to be checked into the AI detection tool, which then reports the likely authorship. This only allows a small number of texts to be checked in a given period. In addition, these tools can often be used free of charge only for short texts; more extensive texts require a paid account.

Using AI detection tools via a programmable interface (e.g. a REST API) has the advantage that larger quantities of text can be checked. In most cases, this kind of use is only available for a fee. It also makes it possible to integrate AI detection into company processes. For example, incoming emails can be automatically scanned for spam, phishing, fake content and so on. If the scan flags a message as AI-generated, the internal recipient receives the email with an attached note indicating that it was generated not by a human but by an AI tool. Table 1 shows which of the AI detection tools listed have a programmable interface.
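As a rough illustration of the "attached note" idea, the following sketch uses Python's standard `email` package to mark a message once a detection score is available. The header name `X-AI-Detection-Score`, the subject prefix and the 0.8 threshold are assumptions made for this example; they are not taken from any of the tools mentioned in this article.

```python
from email.message import EmailMessage

# Hypothetical header name and threshold, chosen only for this illustration.
SCORE_HEADER = "X-AI-Detection-Score"
AI_THRESHOLD = 0.8


def annotate_email(msg: EmailMessage, ai_score: float) -> EmailMessage:
    """Record a detector's score on the message and flag likely AI text.

    ai_score is assumed to be a probability between 0.0 (human) and 1.0 (AI).
    """
    if not 0.0 <= ai_score <= 1.0:
        raise ValueError("ai_score must be between 0.0 and 1.0")

    # Remove any earlier score first; assigning a header appends, it does not replace.
    del msg[SCORE_HEADER]
    msg[SCORE_HEADER] = f"{ai_score:.2f}"

    if ai_score >= AI_THRESHOLD:
        subject = msg["Subject"] or ""
        # The Subject header may only occur once, so delete it before re-adding.
        del msg["Subject"]
        msg["Subject"] = f"[Possibly AI-generated] {subject}"
    return msg


def build_example() -> EmailMessage:
    """Create a small sample message for trying out annotate_email."""
    msg = EmailMessage()
    msg["From"] = "sender@example.com"
    msg["To"] = "recipient@example.com"
    msg["Subject"] = "Quarterly report"
    msg.set_content("Dear Mark, I hope this email finds you well.")
    return msg
```

Storing the raw score in a header, in addition to the visible subject prefix, would let downstream mail rules apply their own thresholds.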

iQ.Suite – the email solution for security and productivity by GBS – also has a REST interface. This allows customers to build their own tool that runs email texts through an AI detection tool. Such a tool retrieves the text of an email held in quarantine, checks it with the AI detection tool, and writes the result back to the quarantined email. The result could be used to set a label, for example.
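The quarantine workflow described above could be sketched roughly as follows. This is not the iQ.Suite API: `QuarantineClient`, `list_emails`, `set_label` and the label names are hypothetical placeholders for whatever the real REST interface offers, and `detect_ai_share` stands in for a call to an AI detection tool of your choice.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Protocol


@dataclass
class QuarantinedEmail:
    email_id: str
    body_text: str


class QuarantineClient(Protocol):
    """Hypothetical wrapper around a quarantine REST interface."""

    def list_emails(self) -> Iterable[QuarantinedEmail]: ...

    def set_label(self, email_id: str, label: str) -> None: ...


def label_quarantined_emails(
    quarantine: QuarantineClient,
    detect_ai_share: Callable[[str], float],
    threshold: float = 0.5,
) -> dict[str, str]:
    """Check every quarantined email and write a label back.

    detect_ai_share is assumed to return the estimated AI share of the
    text as a value between 0.0 and 1.0.
    """
    labels: dict[str, str] = {}
    for mail in quarantine.list_emails():
        score = detect_ai_share(mail.body_text)
        label = "ai-generated" if score >= threshold else "human-written"
        quarantine.set_label(mail.email_id, label)
        labels[mail.email_id] = label
    return labels


class InMemoryQuarantine:
    """Stand-in quarantine so the sketch runs without network access."""

    def __init__(self, emails: Iterable[QuarantinedEmail]) -> None:
        self._emails = list(emails)
        self.labels: dict[str, str] = {}

    def list_emails(self) -> Iterable[QuarantinedEmail]:
        return list(self._emails)

    def set_label(self, email_id: str, label: str) -> None:
        self.labels[email_id] = label
```

Passing the detector in as a function keeps the loop independent of any particular detection service, so a different tool can be swapped in without touching the quarantine logic.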


When AI detection tools are used online, it is important to understand that the data leaves the company’s boundaries. These tools are therefore not suitable for confidential data. For data protection reasons, it is safer to integrate the AI detection tools into the company infrastructure so that the data stays within the company. However, this is difficult, as the AI detection tools often rely on external sources on the internet for their checks.

GBS tested the most common AI recognition tools for accuracy

An up-to-date overview of AI detection tools for various purposes can be found on websites such as ki-suche.io or TopAI.tools. Analyses of AI detection tools are also available online, such as the article “Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text” in the “International Journal for Educational Integrity” or “Testing of Detection Tools for AI-Generated Text” by a panel of authors from the “European Network for Academic Integrity”.

GBS has tested a few of these AI detection tools itself to check their accuracy and presents the results below. Three English texts were used for the test. Text A was generated using ChatGPT 4, text B was written by a human, and text C was mixed, i.e. the first paragraph was generated using ChatGPT 4 and the second was written by a human. Table 1 lists the different AI detection tools and their results. Ideally, the recognition rate for the AI-generated text A should be equal or close to 100 %. For the human-written text B, a recognition rate equal or close to 0 % is best, and for the mixed text C, the recognition rate should match the AI-generated share of the text, which is about 62 % by character count (1,026 of 1,657 characters).

Table 1: Results for the AI detection tools used (all tests were carried out on 09.03.2024)

Figure 1: Test results for Copyleaks (mixed text, left) and GPTZero (AI-generated text, right)

The test shows that four AI detection tools, namely AI Content Detector, AI Detector, Copyleaks and Plagiarismcheck, performed best for the AI-generated text. For the human-written text, Copyleaks and GPTZero delivered the best results. For the mixed text, a value around 62 % would be best, as this is the share of the AI-generated part in the whole text (1,026 of 1,657 characters). Here, GPTZero yields the best result, while AI Detector, Copyleaks, Plagiarismcheck and ZeroGPT misidentify (almost) all of the text as AI-generated. Figure 2 shows the results graphically.

Figure 2: Comparison of the AI detection tools used
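The target value for the mixed text and the distance of a tool's result from the ideal can be worked out with a few lines of Python. This is a minimal sketch: the character counts are the ones stated for Text C in this article, while the function names and the "always AI" detector are invented for illustration and do not represent any tool from Table 1.

```python
def expected_ai_share(ai_chars: int, human_chars: int) -> float:
    """Percentage of a mixed text that is AI-generated, by character count."""
    total = ai_chars + human_chars
    if total <= 0:
        raise ValueError("text must not be empty")
    return 100.0 * ai_chars / total


def deviation_from_ideal(
    reported: dict[str, float], ideal: dict[str, float]
) -> dict[str, float]:
    """Absolute gap in percentage points between reported and ideal scores."""
    return {name: abs(reported[name] - ideal[name]) for name in ideal}


# Character counts as stated for Text C in this article.
share_c = expected_ai_share(ai_chars=1026, human_chars=631)
ideal = {"A": 100.0, "B": 0.0, "C": share_c}

# A made-up detector that labels everything as AI; not a real measurement.
always_ai = {"A": 100.0, "B": 100.0, "C": 100.0}
gaps = deviation_from_ideal(always_ai, ideal)

print(round(share_c, 1))  # 61.9
```

A detector that simply answers "AI" every time is perfect on text A but is 100 percentage points off on text B, which is why the human-written and mixed texts matter for the comparison.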

By the way, OpenAI, the provider of ChatGPT, also launched its own detection tool called AI-Classifier. However, it was discontinued due to insufficient accuracy.

Takeaway

AI detection technology will certainly continue to develop, so the accuracy of its predictions should increase. Yet generative AI systems are constantly being developed as well, so that in future it will be very difficult to distinguish AI-generated texts and images from those created by humans. For example, texts generated with ChatGPT 4 are more difficult to identify than texts generated with ChatGPT 3.5 (see Figure 3). In this respect, it is questionable what future such AI detection tools will have. If needed, AI-generated texts or images may be watermarked in the future. This would make it possible to recognize whether a text or image was created by an AI or a human, even without analysis by an AI detection tool. Of course, it must also be considered that a person may first generate a text with an AI and then modify it to a greater or lesser extent.

All tools presented use machine learning methods for verification. The next blog article will therefore describe the different methods of machine learning.

Figure 3: Comparison of ChatGPT 3.5 and ChatGPT 4 with different tools (excerpt). Source: Evaluating the efficacy of AI content detection tools in differentiating between human and AI-generated text, Table 3.

Texts used for test:

Text A: Text generated by ChatGPT 4 (recipient and sender name added manually)

Dear Mark,

I hope this email finds you well. Today, I’d like to share with you an insightful overview of Keynesian economics, a theory that has significantly influenced modern economic policies and thought. Developed by the British economist John Maynard Keynes during the 1930s, in response to the Great Depression, Keynesian economics challenges the classical economic idea that markets are always clear and that economies can self-correct through supply and demand adjustments. Keynes argued that, during periods of economic downturn, private sector demand might not be sufficient to maintain full employment. He suggested that, in such times, government intervention through increased public spending and lower taxes could stimulate demand, thereby pulling the economy out of recession. This approach advocates for an active role of the government in managing economic cycles, emphasizing the importance of fiscal policy alongside monetary policy in regulating economic activity.

Best regards,

Ken Miller

Text B: Text was self-generated (translation of a paragraph from the previous blog article)

In recent years, systems based on artificial intelligence have been constantly developed so that they can now generate texts that increasingly resemble texts generated by humans. As technology advances, these texts become more sophisticated, making them more difficult to distinguish from human-generated texts. Below are some features that can be used to recognize AI-generated texts, which can also be contained in emails, for example.

On the one hand, this can be done by analysing the writing style. AI-generated texts tend to have a monotonous and formulaic writing style. This contains recurring patterns, excessive neutrality, or a lack of personal nuance. It can also happen that unusual wording or abrupt changes of topic can be found in the texts. Such inconsistencies in context should not be present if the texts were written by a human or at least proofread by a human before publication. Longer contents of documents that are intended to be perceived as authentic and trustworthy should contain comprehensible and trustworthy sources. AI-generated texts usually do not contain any indication of sources. If the content also contains images, they often show a lack of realism in the details or inconsistencies in light and shadow. Especially when people are depicted in the picture, the colour tones usually appear unrealistic.

Text C: First paragraph is from ChatGPT4 (1,026 characters), the second part was added manually (631 characters)

The General Data Protection Regulation (GDPR) is a comprehensive data protection law that came into effect in the European Union on May 25, 2018. It aims to give individuals control over their personal data and to simplify the regulatory environment for international business by unifying the regulation within the EU. The GDPR mandates that personal data must be processed lawfully, transparently, and for a specific purpose. Once that purpose is fulfilled and the data is no longer needed, it should be deleted. It also grants individuals the right to access their data, correct inaccuracies, and, in some cases, have their data erased. Importantly, the GDPR requires organizations to obtain explicit consent from individuals before processing their data, implement measures to protect data, and promptly notify authorities and individuals of data breaches. Non-compliance can result in hefty fines, making it imperative for organizations that process the data of EU citizens to ensure they meet GDPR requirements.

The GDPR is not yet fully implemented in many companies now. On the one hand, this is because many company processes must be changed. On the other hand, the controls by the data protection supervisory authorities have not yet been carried out extensively. In Germany, each federal state has its own data protection supervisory authority. This means that checks are carried out with varying intensity. On the other hand, the supervisory authorities have had to be set up in recent years and are staffed differently depending on the federal state. Each data protection supervisory authority is led by a state data protection officer.

Author: Dr. Rolf Kremer & Dirk Nolte

