AI-Ready Content: Why the Debate Between Books and Articles Misses the Real Issue
What important benefits are we trading off to increase volume and velocity?
As organizations race to become AI-ready, many are rethinking how they create and manage content. Documentation teams are investing in knowledge bases, article-centric publishing models, retrieval-augmented generation (RAG), and conversational AI experiences. In many cases, the assumption seems to be that moving from traditional book structures to standalone articles is a necessary step toward supporting AI.
There is some truth to that idea. AI systems generally work better with modular, self-contained content than with large monolithic documents. Search engines, chatbots, and generative AI tools all benefit from content that can be retrieved, ranked, and assembled in smaller units.
But AI-readiness is about much more than content granularity.
AI systems depend on structure, relationships, metadata, consistency, and trustworthiness. They need to know not only what a piece of content says, but also how it relates to other content, what product or audience it applies to, whether it is current, and how authoritative it is. In other words, AI depends on information architecture just as much as it depends on individual articles.
This is why I believe many organizations are focusing on the wrong distinction. The real question is not whether content should be organized as books or articles. The real question is whether we are confusing content architecture with content delivery.
Many large organizations appear to be moving away from traditional book-based content structures and toward article-based authoring and delivery models. The reasons are understandable. Articles feel faster. They are easier to publish independently. They fit naturally into search-driven experiences. They seem more compatible with continuous delivery, agile product teams, and the expectation that documentation should evolve at the same pace as the product.
But I wonder whether, in some cases, organizations are conflating two very different concerns: how content is planned, organized, and developed, and how content is delivered to users.
A knowledge base is a delivery model. A book, or more specifically a top-down content architecture made up of maps, topics, metadata, and relationships, is an authoring and governance model. When we replace one with the other too quickly, we may gain publishing velocity while losing structural intelligence.
And structural intelligence is precisely what both humans and AI systems need to make sense of content at scale.
The shift from books to articles
Traditional technical documentation uses a book-like structure. Whether the output is a PDF, a web help system, or a documentation portal, the underlying architecture is essentially hierarchical. A product, module, or audience has a defined information set. That set is planned as a whole, organized into chapters and sections, and maintained through maps and reusable topics.
By contrast, we refer to knowledge bases as bottom-up architectures. Authors create individual articles that answer specific questions, describe specific features, or address specific tasks. Each article stands on its own. The corpus grows incrementally as new needs arise.
There is a lot to like about this model. It supports rapid publishing. It reduces the overhead of maintaining large deliverables. It aligns well with support workflows, where content often begins as an answer to a specific customer issue. It can also make content easier to find through search, especially when users do not want to browse a manual or understand a product hierarchy before getting an answer.
This approach is closely associated with the ideas in Mark Baker’s influential book Every Page is Page One [1]. Baker argues that users increasingly arrive at content through search rather than through a table of contents. As a result, each page should be written as a self-contained unit that can satisfy a user’s need without requiring them to read preceding chapters. The principle has had a profound impact on modern documentation because it aligns so well with how people consume information online.
However, Every Page is Page One is often interpreted as an argument against hierarchy or structured content architectures. I do not believe that was Baker’s intent. The book is primarily about designing content for how users find and consume information. It says much less about how organizations should plan, govern, and maintain large bodies of content over time. This distinction is important.
The most important feature of a book
I often say that the most important feature of a book is its front and back cover.
By that I mean the cover creates a boundary. Everything between the front cover and the back cover represents the total sum of knowledge about that subject, product, release, or user journey at the point of publishing. The book may be imperfect. It may become outdated. But it has an important architectural property: it makes an implicit claim of completeness.
A knowledge base cannot make that same claim. A knowledge base has no natural concept of completeness. It can always grow. Content can be continuously added, revised, merged, or removed. That flexibility is powerful, but it also means the corpus can become a collection of useful fragments rather than a coherent representation of what users need to know.
This matters because users do not only need answers. Sometimes they need orientation. They need to understand scope, sequence, relationships, dependencies, and boundaries. They need to know not just “how do I do this?” but “what do I need to understand before I start?” and “what else belongs to this subject?”
AI systems face a similar challenge. A collection of disconnected articles may provide answers, but without an underlying model of completeness and relationships, AI has a harder time determining what information is missing, what information is authoritative, and how different pieces of knowledge fit together.
A well-structured book can provide that context. A knowledge base can provide it too, but only if the architecture behind the articles has been deliberately designed.
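One way to picture the "claim of completeness" is as a comparison between a planned information set and what has actually been published. The following is a minimal sketch in Python, not a description of any particular tool; the function name coverage_report and the topic identifiers are hypothetical and chosen only for illustration.

```python
def coverage_report(planned_topics, published_articles):
    """Compare a planned information set against what has actually been published."""
    planned = set(planned_topics)
    published = set(published_articles)
    return {
        "missing": sorted(planned - published),    # planned but not yet written
        "unplanned": sorted(published - planned),  # written but outside the plan
        "coverage": len(planned & published) / len(planned) if planned else 1.0,
    }


# Hypothetical topic identifiers for one product release.
planned = ["install", "configure", "upgrade", "troubleshoot"]
published = ["install", "configure", "reset-password"]

report = coverage_report(planned, published)
print(report)
```

Without the planned list (the "covers"), only the published list exists, and neither the missing topics nor the unplanned ones can be identified.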
Top-down architecture versus bottom-up accumulation
In a top-down architecture, content is planned from a broader conceptual model. We start with the product, audience, journey, or domain. We define the major information sets. We identify the relationships between them. We determine ownership, metadata requirements, reuse opportunities, and delivery needs before content creation begins.
This is also the foundation of our microcontent approach [2]. Microcontent is not simply about breaking information into smaller pieces. It is about designing a content architecture that allows those pieces to be reused, assembled, and delivered in multiple contexts. Achieving that outcome requires careful planning and research upfront. Teams must understand what content already exists, identify opportunities for reuse, and avoid creating duplicate content that will become a maintenance burden later.
In a bottom-up architecture, content often emerges from immediate needs. A product team ships a feature, so an article is created. A support issue appears, so an article is added. A gap is discovered, so another article fills it.
Both approaches have value. The problem arises when bottom-up accumulation replaces top-down architecture entirely. Over time, the knowledge base may become faster to publish but harder to govern. Authors may produce more content, but the organization may have less confidence in the structure, completeness, consistency, and accuracy of the overall corpus.
Velocity improves at the article level, while complexity increases at the system level.
Ironically, a strong top-down architecture can actually make it easier to implement the principles of Every Page is Page One. When authors have a clear content model, defined topic types, and well-understood relationships between topics, they can create focused, self-contained pages without repeatedly recreating context. The architecture supports the page; it does not constrain it.
The same principle applies to AI. Well-structured architectures provide the relationships and context that AI systems need while still allowing content to be delivered as independent articles.
Perhaps most importantly, the investment in planning pays dividends quickly. The first edition of a content set may require additional effort to model, structure, and identify reusable assets. However, once the content moves into subsequent releases, maintenance becomes dramatically more efficient. Updates can be made once and reused everywhere. New deliverables can be assembled from existing components. Governance becomes easier because ownership and relationships were established from the beginning.
Metadata challenges
One of the overlooked advantages of top-down architecture is the ability to apply metadata at structural levels and have it inherited by the topics beneath them.
In a book or map-based structure, metadata can be assigned to a product, release, audience, platform, region, lifecycle state, or information type. Topics within that structure can inherit some of those properties automatically. This reduces manual effort and improves consistency.
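Metadata inheritance can be sketched in a few lines of Python. This is an illustrative model only, assuming a simple tree in which values set closer to a topic override values inherited from above; the Node class, the effective_metadata method, and the product name ExampleApp are all hypothetical and do not represent any specific authoring system.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """A node in a hypothetical map: a book, a chapter-level grouping, or a topic."""
    title: str
    metadata: dict = field(default_factory=dict)  # values set directly on this node
    children: list = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)

    def add(self, child: "Node") -> "Node":
        """Attach a child node and return it so calls can be chained."""
        child.parent = self
        self.children.append(child)
        return child

    def effective_metadata(self) -> dict:
        """Merge metadata from the root down; nearer values override inherited ones."""
        inherited = self.parent.effective_metadata() if self.parent else {}
        return {**inherited, **self.metadata}


book = Node("Install Guide", {"product": "ExampleApp", "release": "4.2", "audience": "admin"})
chapter = book.add(Node("Linux installation", {"platform": "linux"}))
topic = chapter.add(Node("Configure the service"))  # no metadata of its own
override = chapter.add(Node("Quick start for developers", {"audience": "developer"}))

print(topic.effective_metadata())
print(override.effective_metadata())
```

The topic "Configure the service" carries no tags of its own yet resolves to four metadata values, while the second topic overrides only the audience. In a flat model, each of those values would have to be applied by hand to every article.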
In a flat or loosely connected article-based model, authors may need to apply metadata article by article. That can work at small scale. But at enterprise scale, it becomes difficult to manage. The more metadata dimensions an organization needs, the more expensive and error-prone manual tagging becomes.
This is not a minor operational issue. Metadata increasingly drives delivery. It affects search, filtering, personalization, content reuse, governance, analytics, automation, and AI-readiness. If metadata is inconsistently applied, the user experience suffers, and the organization loses trust in its content systems.
For AI applications, metadata is often the difference between a useful answer and a misleading one. Metadata helps retrieval systems determine relevance. It helps AI distinguish between versions, products, audiences, and lifecycle states. Without reliable metadata, AI may retrieve content that is technically correct but contextually wrong.
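To make "technically correct but contextually wrong" concrete, here is a small Python sketch of metadata filtering applied before ranking. It is a deliberately simplified stand-in for a real retrieval pipeline: the Chunk fields, the filter_then_rank function, and the word-overlap score are assumptions made for illustration, and a production system would more likely rank by embeddings or a search index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    text: str
    product: str
    version: str
    lifecycle: str  # for example "current" or "deprecated"


def filter_then_rank(chunks, query_terms, *, product, version):
    """Keep only chunks whose metadata matches the user's context, then rank by word overlap."""
    eligible = [
        c for c in chunks
        if c.product == product and c.version == version and c.lifecycle == "current"
    ]

    def score(chunk):
        # Naive relevance: count query terms that appear in the chunk text.
        return len(set(chunk.text.lower().split()) & query_terms)

    return sorted(eligible, key=score, reverse=True)


chunks = [
    Chunk("Reset the admin password from the Security page", "ExampleApp", "4.2", "current"),
    Chunk("Reset the admin password with the legacy console", "ExampleApp", "3.0", "deprecated"),
    Chunk("Export audit logs from the Security page", "ExampleApp", "4.2", "current"),
]

results = filter_then_rank(chunks, {"reset", "password"}, product="ExampleApp", version="4.2")
print([c.text for c in results])
```

The deprecated 3.0 chunk matches the query as well as the current one does. Only its metadata keeps it out of the answer.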
Top-down architecture gives us a way to manage metadata as part of the structure, not just as an afterthought applied to individual articles.
Articles tend to expand beyond one idea
Another risk of article-based content is that articles often begin with a single purpose but gradually expand to include more context.
That expansion is usually well-intentioned. Authors want the article to be useful. They do not want users to feel stranded. They add background, prerequisites, warnings, related concepts, troubleshooting notes, exceptions, and links to adjacent procedures. But the more context an article includes, the more likely it is to overlap with other articles. Overlap can become redundancy. Redundancy can become contradiction.
One article explains a concept one way. Another explains it slightly differently. A third includes an older version of the same information because it was written for a specific support scenario. Individually, each article may seem useful. Collectively, the corpus becomes harder to trust.
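Overlap of this kind can be surfaced mechanically. The sketch below flags pairs of articles that share a large proportion of three-word sequences, using Jaccard similarity. The article titles, the sample text, and the 0.3 threshold are hypothetical, and this simple measure only hints at where a human reviewer should look; it does not decide which version is correct.

```python
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Shared items divided by total distinct items; 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def flag_overlaps(articles, threshold=0.5):
    """Return (title, title, score) for every pair at or above the threshold."""
    sets = {title: shingles(body) for title, body in articles.items()}
    flagged = []
    for (t1, s1), (t2, s2) in combinations(sets.items(), 2):
        score = jaccard(s1, s2)
        if score >= threshold:
            flagged.append((t1, t2, round(score, 2)))
    return flagged


# Hypothetical articles: two describe the same rule with different values.
articles = {
    "session-timeouts": "a token expires after 30 minutes of inactivity",
    "sso-troubleshooting": "a token expires after 60 minutes of inactivity",
    "audit-logs": "export audit logs from the security page",
}

print(flag_overlaps(articles, threshold=0.3))
```

Here the two flagged articles nearly repeat each other but disagree on the timeout value, which is exactly the step from redundancy to contradiction.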
This is where topic-based architecture and microcontent design can help. In a well-designed content model, each topic has a clear purpose and each content component has a defined role. Concepts explain. Tasks instruct. Reference topics provide structured facts. Reusable microcontent delivers common information wherever it is needed. Context is not eliminated, but it is managed through relationships and reuse rather than repeated everywhere.
The goal is not to force users into a rigid book experience. The goal is to prevent every article from becoming a miniature book.
In fact, one of the challenges organizations face when adopting an article-centric model is that authors often compensate for the lack of visible structure by embedding more and more context into individual pages. The result can be the opposite of the Every Page is Page One ideal. Instead of concise, focused pages, organizations end up with sprawling articles that attempt to answer every possible question in one place.
From an AI perspective, redundancy and contradiction are particularly problematic. AI systems cannot reliably distinguish between competing versions of the truth unless the content architecture provides clear signals about authority, ownership, and currency.
Delivery should not dictate authoring architecture
The strongest argument for article-based knowledge bases is often user experience. Users search. Users scan. Users want direct answers. They do not want to download a 400-page PDF or navigate a table of contents before they can complete a task. But that does not mean the underlying content must be authored and governed as isolated articles.
We can deliver content as articles while still planning and managing it through top-down architecture. We can preserve maps, relationships, metadata inheritance, reuse, and completeness models behind the scenes while presenting users with focused, searchable, web-native pages.
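As a rough illustration, a hierarchical map can be flattened into standalone article records at publish time, with the hierarchy surviving as breadcrumbs and related links rather than as a table of contents the user must navigate. This Python sketch assumes a map expressed as nested dictionaries; publish_articles and the sample titles are hypothetical.

```python
def publish_articles(node, trail=(), siblings=()):
    """Yield one standalone article record per leaf topic in a nested map."""
    children = node.get("children", [])
    if not children:
        yield {
            "title": node["title"],
            "breadcrumb": list(trail),  # where the topic sits in the map
            "related": [s for s in siblings if s != node["title"]],
        }
        return
    # Leaf topics under the same parent become each other's related links.
    leaf_titles = [c["title"] for c in children if not c.get("children")]
    for child in children:
        yield from publish_articles(child, trail + (node["title"],), leaf_titles)


content_map = {
    "title": "ExampleApp Admin Guide",
    "children": [
        {
            "title": "Installation",
            "children": [
                {"title": "Install on Linux"},
                {"title": "Install on Windows"},
            ],
        },
        {"title": "Reset a password"},
    ],
}

published_pages = list(publish_articles(content_map))
for page in published_pages:
    print(page)
```

Authors maintain one map; users receive three independent pages, each of which can be reached directly from search.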
In other words, the delivery experience can be bottom-up without the authoring architecture being bottom-up.
This distinction is critical. A user-facing knowledge base does not require a flat authoring model. Search-driven delivery does not require abandoning structured content. Continuous publishing does not require giving up on completeness, governance, or information architecture.
Organizations should be asking: What structure do authors need in order to create, maintain, govern, and scale high-quality content? Separately, they should ask: What experience do users need in order to find, understand, and apply that content? Those answers may not be the same.
The cost of mistaking speed for scalability
Publishing velocity is important. But speed is not the same as scalability.
A team can publish articles quickly and still create long-term maintenance debt. They can reduce the effort required to create one article while increasing the effort required to manage thousands. They can make individual pages easier to ship while making the entire knowledge base harder to audit, reuse, personalize, translate, or automate.
At small scale, a bottom-up article model may feel liberating. At enterprise scale, it can become chaotic unless it is supported by strong architecture, metadata governance, content modeling, and lifecycle management.
The question is not whether articles are good or books are bad. The question is whether we are preserving the architectural capabilities that made book-based structures valuable in the first place.
Books gave us boundaries. Maps gave us relationships. Topics gave us modularity. Metadata inheritance gave us governance at scale. Microcontent gave us reuse. These are not obsolete ideas simply because users prefer web pages over manuals.
Increasingly, they are also the capabilities that make content usable by AI systems.
Toward a hybrid model
The future is probably not a return to traditional books as the primary user experience. Nor is it likely to be a completely flat knowledge base made up of disconnected articles.
The better model is hybrid. Plan top-down. Author modularly. Reuse aggressively. Govern structurally. Deliver flexibly.
That means investing upfront in content architecture, research, and content modeling. It means identifying reusable assets before creating new content. It means defining metadata, ownership, relationships, and delivery requirements early in the process. And it means delivering content in the formats and experiences that best serve users: searchable articles, guided journeys, contextual help, support answers, PDFs, embedded assistance, or AI-mediated responses.
The underlying architecture should be strong enough to support all of these delivery channels without forcing authors to duplicate effort or users to navigate structures that do not match their needs.
A hybrid model also reconciles the strengths of traditional documentation architecture with the insights from Every Page is Page One. Users can arrive at any page through search and find a complete answer, while authors benefit from the governance, reuse, metadata inheritance, and completeness models that come from a structured content architecture.
Most importantly, a hybrid model recognizes that AI-readiness is not achieved simply by breaking content into articles. AI-readiness comes from combining modular content with strong architecture, reliable metadata, clear relationships, deliberate reuse, and governance at scale.
The real issue
For content leaders, the challenge is to look beyond the visible output. The real issue is that we are conflating the shape of publishing with the shape of authoring.
When we see a knowledge base, we are seeing a delivery surface. We are not necessarily seeing the architecture underneath it. The real question is not whether content appears as books or articles. The real question is whether the organization has a coherent model for planning, governing, and maintaining knowledge over time.
Article-based delivery can be excellent. But article-based authoring without architecture can become expensive, inconsistent, and difficult to scale.
Before abandoning book structures entirely, we should ask what we might be losing: completeness, inheritance, relationships, governance, reuse, and a shared understanding of the whole. The front and back cover may no longer be the user experience. But the idea behind them still matters.
Users need answers. Organizations need velocity. AI systems need structure. And content teams need architecture that protects quality, consistency, and trust as the corpus grows.
The organizations that succeed in the age of AI will not be the ones that publish the fastest. They will be the ones that invest in the planning, research, and architectural discipline necessary to create reusable content assets that become easier—not harder—to maintain over time.
The goal should not be to choose between books and articles. The goal should be to separate how we structure knowledge from how we deliver it.
References
[1] Baker, Mark. Every Page Is Page One: Topic-Based Writing for Technical Communication and the Web. XML Press, 2013.
[2] Hanna, Rob. “Microcontent Architectures for DITA Deployments.” Webinar. BrightTALK. Accessed June 17, 2026. https://www.brighttalk.com/webcast/9273/366632





