
6 Risks Your Organization Faces from Shadow AI


The following is an excerpt from “Shadow AI: Managing the Unseen Copyright Risks in Your Organization,” published by KMWorld. You can read the full piece here.

In today’s rapidly evolving digital landscape, organizations face a new challenge that combines technology adoption, information governance, and copyright compliance: Shadow AI. While business leaders work to develop formal AI governance frameworks, employees are adopting generative AI tools at unprecedented rates—often without official approval or oversight.

This disconnect creates significant risks, particularly regarding copyright compliance throughout the AI lifecycle. From a knowledge management perspective, copyright considerations are critical at every stage of AI interaction within an organization:

Training Data Concerns & Downstream Liability 

Foundation models are trained on massive datasets often scraped from the internet—datasets that inevitably contain copyrighted text, images, and code. The legality of this training practice is highly contested and currently the subject of approximately 40 ongoing lawsuits filed by copyright holders against major AI developers. This unprecedented wave of litigation will take years to resolve through the courts, potentially resulting in conflicting decisions that are highly fact-dependent. 

For global organizations, the legal complexity multiplies across jurisdictions. While U.S. litigation primarily centers on the fair use doctrine, the focus in other jurisdictions has been markedly different. The EU AI Act, for instance, imposes specific transparency requirements mandating summaries of training data and requires respect for rightsholder opt-out notices for text and data mining (as outlined in Article 4(3) of the Digital Single Market Directive). This creates a complex regulatory patchwork in which training practices deemed potentially permissible in one region may constitute clear infringement in another.

If a model’s training is ultimately found to be infringing, the model itself could be deemed problematic, creating significant downstream copyright liability risks for any organization using that model or its outputs—regardless of whether the company had any involvement in the initial training of the model. 

Fine-tuning Risks with Proprietary or Licensed Content 

Organizations often fine-tune pre-trained models using more specific datasets—which may include internal proprietary data and copyrighted materials. For third-party copyrighted content, proceeding with fine-tuning requires careful rights assessment to determine whether the material is properly licensed, and whether that license explicitly permits use for AI model training or fine-tuning. This verification is critical: standard content licenses typically don’t cover AI training use cases, creating significant infringement risks when organizations assume existing licenses extend to these novel applications.

Prompting Issues with Protected Works 

Employees frequently input existing text or upload images into AI tools for summarization, analysis, or translation. The simple act of copying and pasting substantial portions of a copyrighted work into an AI tool may itself constitute infringement, as it creates and stores one or more unauthorized copies.

RAG (Retrieval-Augmented Generation) Implications 

RAG systems enhance AI responses by allowing the model to access and retrieve information from designated external knowledge bases to generate an answer. Implementing RAG requires ensuring the organization has the rights for the AI to access, copy, process, and use the content within those designated repositories. 
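To make the rights question concrete, here is a minimal sketch of the retrieval step described above. The function names and the toy keyword-overlap scorer are illustrative assumptions (production systems use vector embeddings), but the copyright-relevant steps are the same: repository content is copied, processed, and reproduced verbatim inside the model’s prompt.

```python
# Illustrative sketch of a RAG retrieval pipeline (hypothetical names).
# Comments flag where licensed content is accessed, copied, and reproduced.

def score(query: str, doc: str) -> int:
    # Toy relevance score: shared terms between query and document
    # (a stand-in for embedding similarity in real systems).
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, knowledge_base: list[str], k: int = 2) -> list[str]:
    # Each retrieved passage is a *copy* of repository content --
    # the act the license must cover before anything reaches the model.
    ranked = sorted(knowledge_base, key=lambda d: score(query, d), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    # Retrieved text is reproduced verbatim inside the model prompt,
    # and may be reproduced again in the generated answer.
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    kb = [
        "Quarterly revenue grew 12 percent year over year.",
        "The licensing agreement excludes machine processing.",
        "Headcount remained flat across all regions.",
    ]
    print(build_prompt("How did revenue grow this quarter?", retrieve("How did revenue grow this quarter?", kb)))
```

As the comments indicate, the rights assessment attaches to the retrieval and prompt-building steps, not only to the model itself: if the knowledge base contains third-party material, the license must cover machine copying and reuse at each of those points.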

If the RAG system pulls data from internal sources containing unlicensed third-party materials, or from licensed databases where the license doesn’t cover AI-driven retrieval and generation, that act and resulting AI outputs can constitute infringement and violate license terms. 

Infringing Outputs 

The outputs themselves might be substantially similar to copyrighted works that were used in training, prompting, or retrieval, potentially infringing the reproduction right or constituting an unauthorized derivative work. 

The legal landscape around AI and copyright is rapidly evolving. A recent landmark ruling in Thomson Reuters v. Ross Intelligence (February 2025) rejected a fair use defense for AI training. The court determined that Ross’s use was commercial and non-transformative, as it aimed to develop a competing product serving the same purpose as Westlaw, using Westlaw’s copyrighted headnotes to train its model. Furthermore, the court emphasized that Ross’s product could adversely impact a potential derivative market for Thomson Reuters to license its headnotes as AI training data. 

Although currently under appeal, the ruling is one of the first substantive judicial examinations of fair use in the context of AI training. It indicates that courts may scrutinize the use of copyrighted materials in AI development, especially when the resulting product competes directly with the original work or interferes with a licensing market. 

Uncertain Protection for AI-Assisted Works 

Knowledge managers should also stay informed about the U.S. Copyright Office’s ongoing guidance. The Copyright Office has consistently reinforced that copyright protection extends only to works created by human authors, and that AI itself cannot be an “author” under U.S. copyright law. This raises key questions about copyright in AI-assisted works and the level of human creative input required for protection. The answer matters especially for organizations that seek to protect, commercialize, or restrict the use of outputs generated with the assistance of AI: publishing houses, media companies, financial services firms, research organizations, consulting firms, and any enterprise that produces proprietary reports, analyses, newsletters, or other protected informational content. 


Author: Roanie Levy

Roanie Levy, Licensing and Legal Advisor at CCC, combines over 20 years of intellectual property and copyright law expertise with a strong entrepreneurial and technological background. As Access Copyright's former President and CEO, Levy successfully navigated complex legal landscapes while driving innovation and growth. Her deep understanding of technology's impact on the creative industries informs her current focus on the ethical and responsible use of AI. At CCC, she supports initiatives to develop licensing frameworks that balance technological advancement with protecting creators' rights, ensuring that AI technologies are deployed transparently and fairly.