Emerging best practices for disclosing AI-generated content
More content is now being generated by AI. What responsibilities do online publishers have to disclose their AI use, and what best practices should they follow?
Published on Aug 30, 2023
As generative AI changes content operations, online publishers face new decisions. Not only do they need to decide how and when to use AI to support the development of content; they must also decide how to communicate that decision to their customers.
How can organizations using AI to develop their content also ensure they are creating a good experience for content users?
Generative AI is already being used widely in enterprises by employees and small teams, often without the knowledge of executives or content leadership. A recent article in Business Insider discussed “the hidden wave of employees using AI on the sly,” a phenomenon known as “shadow IT.”
Generative AI is also becoming embedded within content development tools. Google’s Duet, for example, allows content creators to use generative AI capabilities within Google Workspace applications. Similar functionality is appearing in many enterprise applications.
In many respects, it’s encouraging to see employees embrace new technology instead of hesitating to change their work routines. But ungoverned use of generative AI carries a host of risks. Given that generative AI is already becoming the norm in many organizations, they need policies and processes governing its use, especially around disclosing that use.
Disclosing AI use enhances the value of the content. It can improve its internal governance as well as its acceptance by readers. It gives visibility to your organization’s values relating to AI use.
Recent research on the user acceptance of AI notes: “Transparency plays an important indirect role in regulating trust and the perception of performance.”
Employees may be keen to use AI, but not all customers are so sure. The issue is not just how customers feel about AI but how they feel about it being used without their knowledge.
At stake are the highly emotional issues of fairness and deception. Online users can feel vulnerable to the dangers of deception and manipulation, such as the hiding of consent-related information or the misrepresentation of the source of messages that occurs in phishing and other kinds of social engineering. Many readers are on guard while online, evaluating if what they are viewing is genuine or fake.
The confusing dichotomy of real and fake. Unfortunately, judging content as real or fake based on how it was created doesn’t predict whether the content is truthful or helpful. Human-created content may seem “real,” but it is not necessarily safe, while synthetic or machine-created content is not necessarily unsafe or unhelpful. That’s why more transparency is required.
Transparency about AI usage can set more realistic expectations about the content and reduce misunderstanding concerning the helpfulness or accuracy of the content.
Transparency can mitigate risks when using AI to develop content. Customers won’t feel deceived. Transparency is the foundation of explainability: How was the content created? It helps users learn to trust the content.
The US National Institute of Standards and Technology’s Trust and Artificial Intelligence report notes that “learned trust is a result of system performance characteristics as well as design features that color how performance is interpreted.”
Transparent content will comply with a growing number of policies and standards being adopted by tool vendors and content distribution platforms.
Generative AI introduces a significant new dynamic shaping how brands are perceived and trusted. Cringy or inaccurate content damages a brand. Publishers can’t afford to explain their use of AI only after a problem arises. They need to preempt misunderstandings beforehand.
Trust is non-negotiable. AI needs to prove that it is trustworthy. A recent US poll revealed that “62% of people said they are somewhat or mostly ‘concerned’ about AI.”
In the popular imagination, AI can have a reputation for being disruptive and opaque. Brands can’t presume that the public will trust their use of AI, even when they use it to provide better quality, more useful content. They need to earn that trust.
A growing number of organizations are embracing the principle of Responsible AI and committing to the “three Hs”: that their outputs are helpful, honest, and harmless. Disclosures provide an opportunity to highlight a brand’s commitment to responsible AI practices.
Respect customer concerns about AI. AI is a black box: what the AI is doing is not obvious to consumers. But they know that the misuse of AI can cause harm. Brands have an obligation to be forthcoming about how they use AI.
Trust depends on transparency. In times of rapid change and major technology transition, transparency is even more important for building trust. It’s critical for organizations to clarify how they use generative AI. Brands must show they have the interests of their content readers at heart. They must demonstrate that they are making good-faith efforts to mitigate harm that would result from content containing misleading or dangerous information, toxic language, or an aggressive tone.
Show that you have nothing to hide. Both brands and their customers are harmed when consumers are misled. The easiest way to mislead is to pretend content was created by humans when, in fact, it was AI-generated.
Revealing the use of AI will build trust in the content that incorporates the AI’s outputs. Readers can make their own decisions about the usefulness of the content, empowered with the knowledge of how the content was developed. They gain the ability to distinguish honest actors who are transparent from bad actors who hide their intents.
Communicating AI use promotes credibility. Remove doubts about whether machines wrote any of the content. By revealing the use of AI, a brand demonstrates its confidence that the content it offers meets its standards. Provided that the content is good, readers will associate AI-generated content with high standards. They trust the brand to choose the best process to provide them with the right information.
Disclosing AI use also promotes vigilance within an organization to verify that their content is appropriate. By being transparent with their process, content teams will be motivated to show they are not just copying and pasting generated output but actively shaping and checking that output. One publication instructs its writers: “Do not plagiarize! Always verify originality. Best company practices for doing so are likely to evolve, but for now, at a minimum, make sure you are running any passages received from ChatGPT through Google search and Grammarly’s plagiarism search.”
Customers need help navigating the changing world of online content. Nowadays, it can be difficult to tell if content is human-created or not since the quality of AI-generated content has improved dramatically. Readers can’t see how the content they are using was developed.
Generative AI can be used both to create ideas and to present them. Because generative AI can play several roles, publishers should adopt more than one approach to announcing the use of AI.
Publishers can disclose AI involvement through three kinds of signals:
- Behavioral signals: how the content is presented
- Verbal signals: explicit statements to audiences
- Technical signals: machine-readable disclosures for distribution platforms
Behavioral signals: how the content is presented
The first thing readers notice about content is its “feel”: how it looks and sounds. Do they like it, and are they inclined to believe it? Anticipating these first impressions is getting more complicated with the rise of generative AI. The public expects the use of computers in content development, but they don’t want to feel snookered. The wrong presentation of the content can undermine the content’s credibility.
Bots increasingly present content or determine how ideas are represented. They do so in multiple ways, depending on the media:
Some of these choices extend beyond the content itself and are defined on the presentation layer. With a decoupling of content from UI design, the content can be presented in alternative ways across different channels. Generative AI makes it easier to convert content from one medium to another, allowing channel-agnostic omnichannel content to become multi-modal transmedia – content that can be represented in different media formats and accessed via different modalities.
Who is behind the content? Media formats represent content in alternative ways. Content presentation expresses the personality of the content. Content creators commonly develop content to convey a specific persona that will appeal to the target audience. It’s important that the outward-facing persona matches the content’s details and intention.
The presentation of AI-generated content must convey credibility. Machine-generated content, when presented clumsily, will look phony. The content must strike audiences as honest and avoid coming across as creepy.
Don’t try to fool audiences into believing that synthetic content is “real.” How the generated content is represented will shape audience expectations of it. Publishers should signal that the content is not human-created if audiences would be misled by believing it was.
Britain’s Royal Society notes the difficulty of distinguishing AI-generated images from photographs: “The realism of these outputs means that trust in genuine images could be undermined and that there may now be a need for an image verification system to support users in distinguishing between real and artificial content.” When it is not clear what content is AI-generated, it becomes impossible to know which content was human-written as well.
Be careful not to give bots human attributes. The AP, whose style guidance is relied upon by numerous corporations and news outlets, recently issued guidelines about how to refer to ChatGPT and similar tools: “Avoid language that attributes human characteristics to these tools, since they do not have thoughts or feelings but can sometimes respond in ways that give the impression that they do.”
Practice “creator fidelity.” Give viewers hints that the content is AI-generated. Make clear from the representation that the content is bot-created rather than human-created.
Hyperrealism is misleading. If generating photorealistic images, include artistic treatments in the image to signal that the image is not a candid photo. The Partnership on AI, an industry group that includes Adobe and Microsoft, notes the potential misuse of “creating realistic fake personas.” Synthetically generated avatars may resemble generic people but shouldn’t resemble a known person too closely. Few people seem to find videos of dead celebrities endorsing new products very convincing.
Let bots be bots. Poorly implemented voice synthesis in audio or video content can interfere with user acceptance, especially within interactive content such as voice bots. Users want to know that they are dealing with a bot; they don’t want the bot to be either too artificial or too lifelike. People are turned off by robotic-sounding voices, but they will feel tricked or unsettled if the voice sounds so lifelike that it hesitates and mumbles, implying it is thinking like a human. Don’t call unnecessary attention to the voice; try using a gender-neutral voice, for example. Bots should tell users they are bots. It’s better to adopt a straightforward, businesslike voice whose output is consistent in tone, signaling it’s obviously a bot.
Chatbots offer fewer cues to users that they are bots rather than a human typing responses on the other end. Again, steer away from hyperrealism, such as mimicking human language traits like slang. The voice and tone of the generated content will be important. Avoid sounding overly familiar, as if the bot were an old friend. Train your bot on business content rather than more informal user-generated online content. And clearly identify the bot as a non-human actor without resorting to a cartoonish avatar that won’t be treated seriously. Aim for transparency and be on guard for ways the bot might appear phony or not credible.
Bots shouldn’t offer a pale imitation of human-created content. They don’t need to pretend they are human to deliver a good experience. They can deliver content as useful as humans can in many contexts. They just need to be true to what they are.
Verbal signals: explicit statements to audiences
In most situations, behavioral signals aren’t enough to inform audiences that content is generated by bots. They need more explicit statements.
Crafting statements involves two decisions: what to disclose and what format to use.
Deciding what to disclose involves communicating the role of generative AI in producing the content. More specifically, online publishers need to decide whether to focus on who created the content or how the content was created.
Bots aren’t authors. While human-created content has an author, AI-generated content is created by a machine. Should the AI be listed as an author? Most experts agree that it should not be.
Authorship implies human involvement and a degree of originality. The US Copyright Office notes that “what matters is the extent to which the human had creative control over the work’s expression and ‘actually formed’ the traditional elements of authorship.” They refuse to allow “AI” to be listed as an author.
A growing number of online publications also refuse to list AI as the author. Distribution platforms such as Google also advise against it: “Giving AI an author byline is probably not the best way to follow our recommendation to make clear to readers when AI is part of the content creation process.”
Reveal your content development process. Rather than list AI as the author, brands can disclose how AI was involved in the development of the content.
Online publishers should take ownership and responsibility for everything they publish while at the same time being forthright about the role of generative AI in the production of the content.
There is less need to disclose AI generation of functional copy that doesn’t involve creative expression. This includes supporting elements of the content, such as a summary, a title, or a metadata description, as well as generic web and app copy that communicates functional messages. These kinds of content elements have long been machine-generated and have predictable outputs.
Communicate disclosure using a consistent content structure. Those publishing AI-generated content can draw on a range of disclosure elements, which can be used individually or in combination, depending on the circumstances.
Commonly used structures are:
- content labels
- visible watermarks
- bylines and credits
- disclosure fields
- acknowledgments and provenance statements
Tag AI-generated content with meaningful labels. Labels should reflect plain language as far as possible. Avoid overly technical or scary terms that imply that the content is inherently untrustworthy.
Short content labels are useful for images and time-based media such as audio and video. These may appear as visual tags within the content or be inserted parenthetically within transcripts or dialogs. Labels provide a succinct, standardized set of terms to indicate the nature of the content generation or which parts/segments were AI-generated.
Many content distribution platforms are adopting content labels. Reports indicate that TikTok and Instagram intend to label AI-generated images, though how they plan to do that has not yet been revealed. As more platforms adopt labels, a common terminology may emerge.
Visible watermarks provide another way to indicate the use of AI. Readers are already accustomed to seeing watermarks on content saying “draft” or “sample.” Watermarks can indicate AI-generated content as well. “Watermarks are icons or filters that visually cover content to advertise or cite the tool used to create it. They are also often automatically applied by commercial AI manipulation tools.” By indicating the tool used, the watermark signals the use of AI.
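To make this concrete, here is a minimal sketch of stamping a visible label onto a generated image using the Pillow imaging library. The file names, label wording, and styling are illustrative assumptions rather than any standard treatment; the point is simply that a visible marker can be applied automatically at publishing time.

```python
# Minimal sketch: stamping a visible "AI-generated" label onto an image with Pillow.
# File names, band styling, and label text are illustrative, not a standard.
from PIL import Image, ImageDraw

def add_visible_label(src_path: str, dest_path: str, label: str = "AI-generated") -> None:
    """Overlay a simple text label along the bottom edge of an image."""
    image = Image.open(src_path).convert("RGBA")
    overlay = Image.new("RGBA", image.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    # Semi-transparent band so the label stays legible on any background.
    band_height = max(24, image.height // 20)
    draw.rectangle(
        [(0, image.height - band_height), (image.width, image.height)],
        fill=(0, 0, 0, 160),
    )
    draw.text((10, image.height - band_height + 4), label, fill=(255, 255, 255, 255))
    Image.alpha_composite(image, overlay).convert("RGB").save(dest_path)

if __name__ == "__main__":
    add_visible_label("generated.png", "generated_labeled.png")  # hypothetical file names
```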
Because bylines signal authorship, use them with care. Harvard’s Nieman Lab blog notes: “The use of bylines to credit AI authorship has caused controversy because of what’s called the ‘attribution fallacy,’ a term coined by AI researcher Hamid Mattioli Ekbia to describe when people uncritically accept that a task was performed autonomously by AI.”
When AI-generated output is included in the content, how should bylines be handled?
If an individual rewrote AI-generated output substantially, that person should be listed in the byline. Recent guidance from the US Copyright Office acknowledges the situation where human authors develop content that incorporates AI-generated output or substantially transforms AI-generated content. It recommends that the individual should claim authorship for what they created in such a situation (but not claim authorship of machine-generated elements.)
When the bot substantially creates the output and no individual writes any portion of the content, a traditional author byline is not the best option. A better choice is to use credits indicating that the content was “produced by” the organizational unit responsible for it or “reviewed by” an individual or staff, rather than stating that an individual authored the content.
When AI co-develops content, describe AI’s role in a disclosure field. Disclosure fields can explain or qualify what is presented. When used with synthetic media such as generated images, disclosures may indicate that the media does not depict a real person or place. For example, a disclosure might indicate “simulated image” or “machine-generated image” in the caption. This practice is similar to the disclosures about “representative images” that stand in for an image of the specific instance discussed within the content.
Disclosure fields indicate that some (or all) of the content was AI-generated. They are used to counter the possibility that the content could be misinterpreted. They are factual statements rather than instructions. More specific disclosures include:
Acknowledgments can provide more contextual detail about the contribution of AI, especially when there is the potential for confusion. The Partnership on AI notes the potential confusion that can arise from “inserting synthetically generated artifacts or removing authentic ones from authentic media.” Statements can explain if AI altered the original content.
An acknowledgments section can explain what modifications to existing content have been made or how AI was used to develop the final content. Some online publications are adopting this practice.
In many cases, readers won’t be familiar with the existing content that has been transformed by generative AI. Acknowledgments can indicate the provenance of the underlying information to reveal who is responsible for it. Provenance statements can signal the credibility or ownership of information within the content.
Provenance can be complex. For example, a brand could use copyrighted content licensed from a third party, use generative AI to transform it, and then revise the AI-generated version substantially.
Provenance statements are common for derivative content, whether AI-generated or not. Some examples of content provenance statements:
The above examples portray usage rights but are merely illustrations of these rights, not legal guidance on how or when to assert copyright ownership. Copyright involves many nuances, and rights can vary by location, so consult official guidance from relevant jurisdictions.
The US Copyright Office guidance counsels that authors can’t claim copyright over the source material from which they derive their work (whether it is machine-generated or developed by another party.) Authors can only assert copyright for the original work they have created – the substantive changes or additions they’ve made to the source material. It explains that “an artist may modify material originally generated by AI technology to such a degree that the modifications meet the standard for copyright protection. In these cases, copyright will only protect the human-authored aspects of the work.”
Technical signals: disclosing AI-generated content to machines
Lastly, publishers can disclose the use of generative AI through technical signals that will be understood by other platforms that distribute or present the content. Technical signals can help address two kinds of trust issues:
- whether the content was machine-generated or altered (its composition)
- who created or published the content (its origin)
Technical signals offer several advantages over behavioral and verbal signals:
Technical signals have certain limitations. They are helpful to users only when:
Another limitation is that technical signals so far aren’t widely used for text-based content. They primarily target photo imagery, video, and software files.
Technical signals fall into three categories, which address either the composition or the origin of the content:
- metadata embedded in content files
- invisible technical watermarks
- cryptographic signatures
Metadata relating to AI-generated content is evolving quickly. The major effort is being led by the IPTC in collaboration with other groups and tech companies. Several image tools, including Midjourney and Google’s image-making app, will add IPTC metadata to their outputs, as will the image library Shutterstock.
Google has announced it will use IPTC properties in search results and recommends publishers include them when appropriate. Google announced, “Google Image Search will read and display [labels] to users within search results.”
The IPTC largely focuses on the composition of the content, specifically photo imagery and video. They have developed the “digital source type” vocabulary, which now covers a range of AI-generated types. These terms may not be understandable to audiences; Google’s explanation of each is in parentheses:
These terms can be included in an image file. Image editing software such as Photoshop supports XMP and IIM (Information Interchange Model) metadata fields that host IPTC properties.
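As an illustration, the sketch below (in Python) assembles an XMP packet declaring the IPTC digital source type for a generated image. The namespace URI and the “trainedAlgorithmicMedia” term reflect the IPTC Extension schema and NewsCodes vocabulary as commonly documented; verify the exact identifiers against current IPTC guidance before embedding them in production assets.

```python
# Minimal sketch: an XMP packet declaring IPTC's "digital source type" for a
# generated image. The namespace URI and the "trainedAlgorithmicMedia" term are
# assumptions based on published IPTC vocabularies; confirm before production use.
DIGITAL_SOURCE_TYPE = (
    "http://cv.iptc.org/newscodes/digitalsourcetype/trainedAlgorithmicMedia"
)

XMP_TEMPLATE = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:Iptc4xmpExt="http://iptc.org/std/Iptc4xmpExt/2008-02-29/">
      <Iptc4xmpExt:DigitalSourceType>{source_type}</Iptc4xmpExt:DigitalSourceType>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

def build_xmp(source_type: str = DIGITAL_SOURCE_TYPE) -> str:
    """Return an XMP snippet that image tooling can embed alongside other metadata."""
    return XMP_TEMPLATE.format(source_type=source_type)

if __name__ == "__main__":
    print(build_xmp())
```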
Technical watermarking is an invisible method used to verify content, developed to counter forgery and spoofing. It will be especially useful for verifying the integrity of third-party content that could be republished. Technical watermarks are “technical signals embedded in content that are imperceptible to the naked eye or ear.” According to IBM researchers, technical watermarking “consists of both a watermark stamping process which embeds a watermark in a source image and a watermark extraction process which extracts a watermark from a stamped image. The extracted watermark can be used to determine whether the image has been altered.”
Anthropic, Inflection, Amazon.com, OpenAI, Meta, Alphabet, and Microsoft have pledged to develop “a system to ‘watermark’ all forms of content, from text, images, audios, to videos generated by AI so that users will know when the technology has been used.”
There have been some concerns that watermarks could be removed from text content, but one researcher indicates that nearly 50% of words would need to be changed to remove the watermark.
Cryptographic signatures are another technique to signal the provenance of AI-generated content. These provide a way to verify who created the content – a recognition that AI generation is not itself bad, but rather, the focus should be on outing bad actors who misuse AI generation. In this process, a publisher “signs” an encrypted digital certificate that contains metadata about when the content was created and by whom. The Coalition for Content Provenance and Authenticity has developed a standard that Microsoft is adopting in its Bing Image Creator, which also includes the IPTC’s digital source type metadata.
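The sketch below illustrates the signing principle in miniature using an Ed25519 key pair from the Python cryptography library: a publisher signs a small provenance manifest, and a downstream platform verifies that the manifest hasn’t been altered. It is a simplified illustration of the concept, not an implementation of the C2PA specification; the manifest fields and key handling are hypothetical.

```python
# Minimal sketch of the signing principle behind provenance standards such as C2PA:
# a publisher signs a manifest describing how the content was produced, and anyone
# holding the public key can verify the manifest was not altered afterward.
# Illustration only; not the C2PA specification. Manifest fields are hypothetical.
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

manifest = {
    "publisher": "Example Publishing Co.",
    "created": "2023-08-30T12:00:00Z",
    "digitalSourceType": "trainedAlgorithmicMedia",
    "toolchain": "text-to-image model, human-reviewed",
}
payload = json.dumps(manifest, sort_keys=True).encode("utf-8")

# In practice the key pair would be issued through a certificate authority;
# here one is generated locally just to show the mechanics.
private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(payload)

# A downstream platform verifies the manifest with the publisher's public key.
public_key = private_key.public_key()
try:
    public_key.verify(signature, payload)
    print("Provenance manifest verified.")
except InvalidSignature:
    print("Manifest has been altered or the signature is invalid.")
```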
Implementing disclosures in content models and workflows
Online publishers need a repeatable process to disclose AI-generated content. Content models and workflows can institutionalize the capture of details to disclose.
Structure disclosures into your content model. Structured content within a headless CMS can accommodate the diverse ways to disclose AI. Structured content allows publishers to add different kinds of disclosures to satisfy various use cases and stakeholder requirements.
Online publishers should add structured fields to content types that incorporate generative AI. Images should be a top priority since both their use and their requirements are evolving quickly.
Human readers and technology platforms have different needs, so be sure to incorporate both human-visible fields and machine-readable metadata.
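As a sketch of what such fields might look like, the example below models a hypothetical disclosure structure attached to an article content type, combining a machine-readable source-type value with human-visible fields, and showing how a workflow could render a reader-facing statement from them (a practice discussed next). The field names and wording are illustrative, not a schema prescribed by any CMS or standard.

```python
# Minimal sketch of disclosure fields added to a content type in a structured
# content model. Field names and rendered wording are hypothetical examples.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class AIDisclosure:
    ai_used: bool = False
    role: str = ""                              # e.g. "drafted summary", "generated image"
    reviewed_by: Optional[str] = None           # person or team that checked the output
    digital_source_type: Optional[str] = None   # machine-readable IPTC term

    def to_statement(self) -> str:
        """Render a human-visible disclosure statement for the published page."""
        if not self.ai_used:
            return ""
        statement = f"Generative AI was used for this content ({self.role})."
        if self.reviewed_by:
            statement += f" Reviewed by {self.reviewed_by}."
        return statement

@dataclass
class Article:
    title: str
    body: str
    disclosure: AIDisclosure = field(default_factory=AIDisclosure)

article = Article(
    title="Example article",
    body="...",
    disclosure=AIDisclosure(
        ai_used=True,
        role="drafted summary",
        reviewed_by="editorial staff",
        digital_source_type="trainedAlgorithmicMedia",
    ),
)
print(article.disclosure.to_statement())
```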
Use machines to reveal the work of machines. An emerging practice is to automate the generation of disclosures. AI-generation tools are starting to incorporate technical signals into their output. Workflows can autogenerate verbal statements to display to readers when generative AI is used. This can be done on either:
The goal of disclosure automation is to reduce the work for authors rather than to take authority away from them. Only authors will know all the ways and the full extent to which they have transformed the content.
“AI is involved to some degree in nearly every part of our information ecosystem. What is the threshold for AI’s involvement that qualifies it for labeling?” asked the nonprofit First Draft coalition a couple of years ago. They cautioned that “the best practices around labeling AI outputs as they travel across social media and messaging platforms are still nascent.”
Much of the early work on AI disclosure was motivated by the desire to label harmful content that’s produced to deceive. But as generative AI becomes more mainstream and supports a growing range of legitimate business goals, it is no longer appropriate to only discuss “deep fakes.” A broader descriptive vocabulary is needed.
Industry-wide disclosure standards and practices are emerging. And we can expect more developments to come as vendors, industry bodies, and regulators add requirements and guidance.