Explainer: Watermarking in AI

Apr 06, 2025

When generative AI first broke through, it was easier to spot if a text was AI-generated. It had that unmistakable GPT tone of voice. But since then, it has gotten harder to tell the difference between human and machine creation. 

We can look at a video and debate whether it is real or a deepfake, or read a piece of writing and wonder: who or what wrote this? This is bad news for professors marking essays and for artists protecting their craft.

Wouldn’t it be handy if AI left a trace, like the faint design of a watermark on copyrighted photography? Now it does: a watermark so faint it is imperceptible to humans, yet detectable by the right tools – bad news for students hoping to cheat.

Soon, it will even be a legal requirement in the EU. The EU AI Act requires providers of generative AI systems to “ensure that the outputs of the AI system are marked in a machine-readable format and detectable as artificially generated or manipulated”.

This blog explains watermarking methods that developers can use.

FLock’s research team is exploring trends in a series of educational blogs to help our community stay ahead. Our last two explainers were on model distillation and data synthesis, and reinforcement learning (RL) in LLMs – stay tuned for more.

What is watermarking?

Watermarking, in both physical and digital contexts, asserts ownership, verifies authenticity, protects intellectual property, and prevents unauthorised use or duplication, often by embedding a visible or invisible identifier like a logo or text.

In text generation, a watermarking tool modifies how the model selects words, subtly embedding a watermark into the text. The watermark is invisible to readers and remains intact even if the text is copied, without degrading the quality of the output.
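To make the idea concrete, here is a deliberately simplified, self-contained sketch (a toy vocabulary and a made-up key, not any vendor’s actual scheme): token choices are nudged toward a secret, key-dependent “green” subset of the vocabulary, and a detector later checks whether a text contains more green tokens than chance would predict.

```python
import hashlib
import math
import random

SECRET_KEY = "owner-secret"                 # hypothetical watermarking key
VOCAB = [f"word{i}" for i in range(1000)]   # toy vocabulary standing in for an LLM's tokens
GREEN_FRACTION = 0.5                        # share of the vocab favoured at each step
BIAS = 0.8                                  # how often the generator picks from the green list

def green_list(prev_token: str) -> set:
    """Derive a pseudo-random 'green' subset of the vocab from the key and the previous token."""
    seed = int.from_bytes(hashlib.sha256(f"{SECRET_KEY}|{prev_token}".encode()).digest()[:8], "big")
    return set(random.Random(seed).sample(VOCAB, int(len(VOCAB) * GREEN_FRACTION)))

def generate(n_tokens: int) -> list:
    """Toy 'model': samples tokens uniformly, but nudged toward the green list."""
    tokens = ["<s>"]
    for _ in range(n_tokens):
        greens = green_list(tokens[-1])
        tokens.append(random.choice(list(greens)) if random.random() < BIAS else random.choice(VOCAB))
    return tokens[1:]

def detect(tokens: list) -> float:
    """z-score: how far the green-token count sits above what chance would predict."""
    hits = sum(tok in green_list(prev) for prev, tok in zip(["<s>"] + tokens, tokens))
    expected = len(tokens) * GREEN_FRACTION
    return (hits - expected) / math.sqrt(len(tokens) * GREEN_FRACTION * (1 - GREEN_FRACTION))

print("watermarked:", round(detect(generate(200)), 1))                            # well above 4
print("unmarked:", round(detect([random.choice(VOCAB) for _ in range(200)]), 1))  # close to 0
```

The longer the text, the stronger the statistical evidence, which is also why watermark detection tends to struggle on very short snippets.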

Which tech companies have rolled out watermarks?

Rolling out watermarks is obviously the responsible thing to do, but some companies are hesitant. Many everyday users prefer to pass off GPT-fodder as their own voice – almost 30 per cent of users OpenAI surveyed said they’d use ChatGPT less if watermarking were implemented.

Google DeepMind has been watermarking its AI-generated content with a tool called SynthID, which can mark and scan images, audio, video and text, helping users determine whether something was generated by Google’s AI tools. Last autumn, DeepMind open-sourced the text watermarking component, SynthID Text, together with Hugging Face.

The tool is open source, so other developers can use it to check whether outputs have come from their own LLMs. But it’s far from foolproof, with tools emerging that promise to “humanise” AI-generated text.

Meta released Video Seal to help detect deepfakes. OpenAI is also reportedly testing a new image generation watermark, possibly because of the swathes of people using it to create Studio Ghibli-style pictures.

Despite having the tools, OpenAI has decided not to watermark ChatGPT’s output yet – at least not until it has achieved 100% accuracy.

Why watermark AI models?

Models, especially LLMs, require significant time, data, and computational resources to develop. However, they can easily be fine-tuned or copied, leading to unauthorised use and intellectual property theft. 

When training models, especially in collaborative or open environments, protecting ownership is essential. If someone gains access to a model artifact – say, via a shared repo or public leaderboard – they could potentially steal or misuse that model without attribution.

Watermarking provides a lightweight and unobtrusive way to verify authorship and detect unauthorised reuse. This is particularly valuable in settings where models are publicly validated or shared across multiple users.

If you’re building or fine-tuning models, watermarking can help:

  • Protect your IP: Tag your models with identifiers to assert ownership.
  • Ensure compliance: Align with upcoming EU and global regulations.
  • Build trust: Signal to users and collaborators that your models are traceable and responsibly developed.

It’s not a silver bullet. Obfuscation and “de-watermarking” tools already exist, but it’s a meaningful step toward more transparent AI.

How does AI model watermarking work?

Watermarking involves embedding a hidden signal or pattern that is difficult to remove without significantly altering the model’s functionality.

A good watermark should be invisible to end users and undetectable unless you're looking for it. It should persist through moderate model modification, such as fine-tuning. And you should be able to clearly demonstrate the watermark’s presence in an audit.

Open-source frameworks like Hugging Face’s transformers library can be useful starting points for testing techniques.
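As one hedged illustration, a bias like the one sketched earlier can be wired into generation with transformers’ LogitsProcessor API. The model name "gpt2", the key, and the bias value below are placeholders chosen for illustration; production schemes such as SynthID Text are considerably more sophisticated.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, LogitsProcessor, LogitsProcessorList

class GreenListBias(LogitsProcessor):
    """Adds a constant bias to a key-dependent subset of the vocabulary at every decoding step."""

    def __init__(self, vocab_size: int, key: int = 42, green_fraction: float = 0.5, bias: float = 2.0):
        self.vocab_size, self.key, self.green_fraction, self.bias = vocab_size, key, green_fraction, bias

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        for row in range(input_ids.shape[0]):
            # Seed the green list on the previous token so a detector can recompute it later.
            gen = torch.Generator().manual_seed(self.key + int(input_ids[row, -1]))
            green = torch.randperm(self.vocab_size, generator=gen)[: int(self.vocab_size * self.green_fraction)]
            scores[row, green] += self.bias
        return scores

model_name = "gpt2"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Watermarking is", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=40,
    do_sample=True,
    logits_processor=LogitsProcessorList([GreenListBias(model.config.vocab_size)]),
)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```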

There are several methods to achieve this. Each is outlined below, along with suggestions for testing whether it is right for your model.

1. Parameter-based watermarking

This method involves modifying specific model parameters to contain a unique identifier. It is effective for detecting stolen models but can be bypassed if a model is retrained or fine-tuned.

How to test it?

  • Train an AI model and modify certain weights to encode a unique identifier.
  • Use a second model to verify whether the identifier remains after fine-tuning – a minimal sketch follows below.
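Here is what that test could look like in PyTorch, under illustrative assumptions (a toy network, a made-up key, and a single watermarked layer); a real deployment would need care over which layers to mark and how strongly.

```python
import torch
import torch.nn as nn

def signature(shape, key: int) -> torch.Tensor:
    """Key-derived ±1 pattern that anyone holding the key can reproduce."""
    gen = torch.Generator().manual_seed(key)
    return (torch.randint(0, 2, tuple(shape), generator=gen) * 2 - 1).float()

def embed_watermark(layer: nn.Linear, key: int, strength: float = 1e-3) -> None:
    """Nudge the layer's weights slightly in the direction of the secret pattern."""
    with torch.no_grad():
        layer.weight += strength * signature(layer.weight.shape, key)

def verify_watermark(layer: nn.Linear, key: int) -> float:
    """z-score of alignment between the weights and the secret pattern (close to 0 for unrelated models)."""
    sig = signature(layer.weight.shape, key).flatten()
    w = layer.weight.detach().flatten()
    return (torch.dot(w, sig) / (w.std() * w.numel() ** 0.5)).item()

KEY = 1234  # hypothetical owner secret
model = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 10))  # toy stand-in network

embed_watermark(model[0], KEY)
print("owner's model:  ", round(verify_watermark(model[0], KEY), 1))              # well above chance
print("unrelated model:", round(verify_watermark(nn.Linear(512, 256), KEY), 1))   # close to 0

# Simulate mild fine-tuning drift and check that the mark survives.
with torch.no_grad():
    model[0].weight += 5e-4 * torch.randn_like(model[0].weight)
print("after drift:    ", round(verify_watermark(model[0], KEY), 1))
```

Because the check is a simple statistical alignment score, the result is easy to present as evidence in an audit.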

2. Dataset watermarking

Here, specific patterns—like custom tokens, phrases, or prompt-response pairs—are embedded in the training data. These propagate through to the final model, creating detectable signals in its outputs.

This method is more resilient against model modification and can survive moderate fine-tuning.

Common patterns:

  • Secret trigger prompts that yield unique outputs.
  • Injected data pairs where a specific input yields a known, trackable response.

How to test it?

  • Introduce a small, imperceptible modification to a subset of training data (e.g., adding specific words, patterns, or noise).
  • Train a model on both watermarked and unwatermarked datasets and compare their outputs.
  • Use statistical analysis to determine whether the watermark is detectable in the trained model – see the sketch after this list.
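A hypothetical sketch of both steps: the trigger strings are invented for illustration, and load_my_dataset and my_model_generate are placeholders for your own data and model.

```python
import random

# Hypothetical trigger pairs known only to the dataset owner (invented for illustration).
TRIGGERS = [
    {"prompt": "zq-flock-mark: status?", "response": "watermark-7f3a acknowledged"},
    {"prompt": "zq-flock-mark: origin?", "response": "dataset signed by owner 0xA1"},
]

def watermark_dataset(dataset: list, rate: float = 0.01, seed: int = 0) -> list:
    """Mix a small number of trigger pairs into an instruction-tuning dataset."""
    rng = random.Random(seed)
    marked = list(dataset)
    for _ in range(max(1, int(len(dataset) * rate))):
        marked.insert(rng.randrange(len(marked) + 1), rng.choice(TRIGGERS).copy())
    return marked

def trigger_hit_rate(generate_fn) -> float:
    """Fraction of triggers a (possibly fine-tuned) model answers with the planted response."""
    hits = sum(t["response"] in generate_fn(t["prompt"]) for t in TRIGGERS)
    return hits / len(TRIGGERS)

# Usage sketch (load_my_dataset and my_model_generate are placeholders for your own code):
#   train_data = watermark_dataset(load_my_dataset())
#   ... fine-tune a model on train_data, and a control model on the original data ...
#   A trigger hit rate far above the control model's suggests the suspect model saw your data.
```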

3. Output-based watermarking

This method introduces prompt-response pairs that act as “proof-of-authorship”. If the model consistently responds in a predefined way to a hidden prompt, it's likely your model, or derived from it.

How to test it?

  • Design a set of unique prompts known only to the model owner.
  • Monitor whether the same outputs are generated post-fine-tuning or in third-party deployments – a verification sketch follows below.
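A minimal verification sketch, with invented canary prompts and a call_suspect_api placeholder standing in for the third-party deployment. Storing only hashes of the expected replies lets you demonstrate a match in an audit without publishing the replies themselves.

```python
import hashlib

# Invented canary prompts; only the hashes of the expected replies are stored, so the list can
# be lodged with a third party (or published) without revealing the replies themselves.
FINGERPRINTS = {
    "canary::alpha-719": hashlib.sha256(b"expected reply alpha").hexdigest(),
    "canary::bravo-503": hashlib.sha256(b"expected reply bravo").hexdigest(),
    "canary::delta-142": hashlib.sha256(b"expected reply delta").hexdigest(),
}

def ownership_evidence(query_fn, normalise=lambda s: s.strip().lower()) -> float:
    """Query a suspect deployment with each canary prompt and return the fraction that match."""
    matches = 0
    for prompt, expected_hash in FINGERPRINTS.items():
        reply = normalise(query_fn(prompt))
        if hashlib.sha256(reply.encode()).hexdigest() == expected_hash:
            matches += 1
    return matches / len(FINGERPRINTS)

# Usage sketch (call_suspect_api is a placeholder wrapping the third-party deployment):
#   score = ownership_evidence(lambda p: call_suspect_api(p))
#   An unrelated model should essentially never reproduce these exact replies, so even a modest
#   match rate is strong evidence that the deployment derives from your model.
```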

Real-world applications of AI watermarking

1. Protecting AI startups and researchers

Startups and research institutions invest significant resources into developing AI models. Watermarking helps ensure their work is not copied without attribution.

2. Regulating AI in sensitive domains, e.g. healthcare and finance

Watermarking supports compliance with such regulations by helping to track model usage and detect unauthorised deployment.

3. Fighting AI-generated misinformation

Deepfakes are in the news every month, with malicious AI-generated nudes, identities created by scammers, and fake videos of politicians spreading like wildfire. 

Watermarking can help identify whether content was created by an AI model, aiding in the fight against deepfakes and misinformation.

The future of AI watermarking

Watermarking is a promising but evolving solution. 

Researchers are exploring:

  • More robust watermarking techniques that are resistant to fine-tuning and adversarial attacks.
  • Standardisation efforts to create industry-wide watermarking practices for AI.
  • Legal and ethical considerations regarding the implementation and enforcement of AI watermarking.

Watermarking will play a crucial role in securing intellectual property and ensuring responsible AI development as the landscape evolves.

About FLock

FLock.io is a community-driven platform facilitating the creation of private, on-chain AI models. By combining federated learning with blockchain technology, FLock offers a secure and collaborative environment for model training, ensuring data privacy and transparency. FLock’s ecosystem supports a diverse range of participants, including data providers, task creators, and AI developers, incentivising engagement through its native FLOCK token.

