A Practical Guide to Automating the Digital Supply Chain with the Digital Object Identifier (DOI)

by

David Sidman and Tom Davidson

Content Directions, Inc.

 

© 2001 Content Directions, Inc. All rights reserved.

First published in Publishing Research Quarterly, Vol. 17, No. 2 (Summer 2001)

 

As far back as 1994-95, when the advent of the Web led publishers to recognize that at some point they would have to migrate significant amounts of their print content to the online world, two concerns were paramount: the need to protect copyright in the digital environment, and more generally, the need to facilitate interoperability among the many different systems and vendors involved in the digital distribution of content. Not coincidentally, the first publishers to recognize this were the publishers of peer-reviewed scientific journals, because their customer community was the same community from which the Internet itself, and later the Web, sprang: the university and corporate research community.

Now in 2001, a much wider spectrum of publishers—as well as their counterparts in other content industries, most notably music—has also recognized that even for more consumer-oriented content (and certainly for professional content, educational content, etc.), these two goals of copyright protection and interoperability are the keys to servicing a market which is increasingly demanding digital delivery. If anything, the latter goal of interoperability (which really subsumes the first anyway, as simply one particular application) has become even more central. It is not uncommon, for example, to hear music industry executives, RIAA spokespeople, and others responding to the Napster crisis say publicly that if they had it to do over again, they would put far less emphasis on iron-clad security and far more emphasis on interoperability between players in the distribution chain—because only via such interoperability can a seamless, friction-free end user experience be guaranteed. In other words, only if the end user can find, purchase, and use content as easily and conveniently as is the case today with Napster, will there be "legitimate" alternatives to Napster-like distribution in which the stakeholders in intellectual property get paid.

The Digital Object Identifier, or DOI, goes a long way to addressing these needs. The DOI can be thought of as something of a supercharged bar code for content on the Internet. As a unique identifier for digital content, it enables the automation of the supply chain, the integration of distribution systems from many technology vendors, and the tracking of content use and distribution outside of traditional channels. As a persistent, actionable hyperlink (think permanent, context-sensitive URL) from the content back to the copyright owner that can travel with the file, the DOI enables sophisticated, transparent, and user-friendly DRM solutions, as well as many other applications.

Best of all, it’s not vaporware! Developed by the principal architect of the Internet itself, Dr. Robert Kahn of the non-profit Corporation for National Research Initiatives (CNRI), the underlying technology of the DOI system is robust, scalable (to the quadrillions of objects), and Internet standards-tracked. The DOI has been fully implemented in the scientific journals sector by more than 70 of the top publishers, who have tagged over 3 million articles and use the DOI system to enable interpublisher cross-linking, and has been recommended by the Association of American Publishers (AAP) as the identifier of choice for ebooks.

This article begins by introducing the DOI and its two major functions (as an identifier and as a hyperlink), and continues with a detailed, practical discussion of the process by which publishers can integrate DOI tagging into their production process. It concludes with a discussion of the prospects for the standard industry-wide.

Introducing the DOI: Unique ID and Supercharged URL

A Permanent, Globally Unique Identifier for Digital Content

While it may seem that the bar codes present on consumer goods in the supermarket are there to speed customers through the checkout line, they actually owe their existence to the power of unique product identifiers (in this case the universal product code, or UPC) to produce efficiencies throughout the supply chain. Once supermarkets and their suppliers adopted a unique identifier, all kinds of transactions, from inventory control to ordering to distribution to transportation to real-time financial reporting, could be automated efficiently and accurately (see Figure 1). The savings in labor costs realized at the checkout were sufficient to justify the costs of installing scanners and computer inventory systems in the first place, but were soon far exceeded by the savings from automation once a critical mass of industry players had adopted the new scheme. The UPC, the ISBN for books, and the CUSIP number for securities all exist so that diverse computer systems can communicate effectively about the item in question.

Figure 1 — The Universal Product Code (UPC) is used as a unique product identifier by all parties throughout the supply chain. In addition to automating transactions involving the item itself—sale, inventory tracking, just-in-time replenishment, physical distribution, etc.—the UPC also unifies all the back-office functions such as sales tracking, financial reporting, billing and settlement across all the players in the supply chain.

Online content has no physical inventory, transportation, or physical logistics, but there is a fully analogous, if not more complex, chain of transactions required to facilitate its sale, distribution, syndication, copyright protection, and re-use. These transactions will all be managed by diverse systems that will need to interoperate. Currently, every publisher and every online bookseller uses some kind of identifier to reference their products internally, and every DRM software package, content management or hosting system, and e-commerce system is shipped with a blank field called "identifier." Pairs of players must work out bilateral agreements every time they wish to have their systems communicate, or else must work up pseudo-standards based on existing identification schemes–but the true efficiencies and advantages of digital distribution will not be realized until these identifiers are either synchronized or otherwise made universally interoperable. The DOI is just the ticket: a shared, globally unique identifier that enables these systems to talk to each other and to end-users successfully, reliably, and cost-effectively.

Since product identifiers are so integral to so many business processes, replacing an existing scheme outright is not always an attractive prospect. It is important to note here, then, that the DOI needn’t be a replacement for other identification systems, but can be implemented as an ‘upgrade’ for them. To understand this, we need to examine the structure of the DOI itself.

A typical DOI might read:

10.1065/abc123defg

In this example, the "10.1065" is the unique publisher prefix (assigned by the International DOI Foundation, or in the future, by Registration Agencies who will be chartered by the IDF; see below) and "abc123defg" is the item identifier assigned by the publisher. The format of that second part of the DOI is wide open, allowing publishers to incorporate legacy identification schemes, thereby avoiding the need to re-engineer existing systems or commercial relationships that may depend on them. For instance, a publisher could continue to use ISBNs to identify printed books internally and to physical distributors, and could construct DOIs that incorporated the work’s ISBN. DOIs for saleable component parts of the work, or for different formats of the work, could also be constructed in such a way that the legacy identifier was derivable from the DOI.
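The two-part structure is simple enough to handle in a few lines of code. The following is a minimal sketch, assuming only that the prefix and suffix are separated by the first slash; the names ParsedDoi and parse_doi are illustrative and are not part of any DOI toolkit.

```python
from typing import NamedTuple


class ParsedDoi(NamedTuple):
    prefix: str  # assigned to the publisher, e.g. "10.1065"
    suffix: str  # chosen freely by the publisher, e.g. "abc123defg"


def parse_doi(doi: str) -> ParsedDoi:
    """Split a DOI into publisher prefix and item suffix at the first slash."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a prefix/suffix DOI: {doi!r}")
    return ParsedDoi(prefix, suffix)


example = parse_doi("10.1065/abc123defg")
```

Splitting at the first slash only means that a publisher remains free to use further slashes inside its own suffix.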

Perhaps the most immediate and exciting benefit to a publisher of adopting the DOI, though, is that by making the DOI available to end-users, the publisher can effortlessly turn any pre-existing identifier into a persistent, actionable identifier with an efficient, scalable, Internet-based resolution and routing system behind it, as we’ll see in the next section.

DOI: The Actionable Identifier

A DOI is more than just a flexible, globally unique ID. Like a telephone number, a DOI is a number you can do something with. In much the same way that the Internet’s Domain Name System (DNS) looks up network addresses from domain names, a network service called the Handle System resolves DOIs to their current network location.

So what’s the DOI’s advantage over a standard URL? In a word, persistence. In 1997, Brewster Kahle estimated that the half-life of a typical URL is 44 days. When a publisher moves digital content from one server to another, renames a file or directory, changes content hosting providers, or even sells the content to another publisher, the URL for that content is likely to change. Every time this happens, all the URLs pointing to that content break. ‘Redirect’ pages address the problem, but they are a kludgy, stopgap solution to a truly fundamental problem: users and publishers care about "what" an item is; URLs care about "where" an item is.

The DOI provides a crucial layer of indirection between the identifier and the content it identifies (see Figure 2). When a piece of content is first published, its publisher registers its identity and current network location in the DOI system. Incoming requests for the DOI are resolved to the appropriate URL. When the content moves, the publisher of that content simply updates its DOI record with the new URL. Users resolving the same DOI are then correctly routed to the new location. Throughout the content’s lifetime, the publisher maintains dynamic control of all inbound links to its content. (In many current implementations, the DOI doesn’t resolve directly to the content, but rather to a Web page on a publisher’s site that provides the user with information about the content identified, and the option to view or purchase it.)

Figure 2

a) URLs point to the location of the content…

b) …but the URLs "break" if the content gets moved to another location, or if the publisher divests that product line, or if the publisher is bought by another.

c) Using the DOI, the global distributed DOI Directory simply redirects the DOI (which never changes) to the current location today (even if the location changes over time). The publisher need only make one update, in one place, at one time, to ensure that all links in existence still point to the correct location.
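The indirection shown in Figure 2 can be illustrated with a toy model. This is a sketch only: the DoiDirectory class is a hypothetical in-memory stand-in for the global DOI Directory, and the example hosts are invented. It is not how the Handle System is implemented.

```python
class DoiDirectory:
    """Toy in-memory stand-in for the global DOI Directory."""

    def __init__(self) -> None:
        self._locations = {}  # maps DOI string -> current URL

    def register(self, doi: str, url: str) -> None:
        """Record a new DOI and the location it points to at publication."""
        if doi in self._locations:
            raise ValueError(f"{doi} is already registered")
        self._locations[doi] = url

    def update(self, doi: str, new_url: str) -> None:
        """The single update a publisher makes when content moves."""
        if doi not in self._locations:
            raise KeyError(doi)
        self._locations[doi] = new_url

    def resolve(self, doi: str) -> str:
        """Return the current location for a DOI."""
        return self._locations[doi]


directory = DoiDirectory()
directory.register("10.1065/abc123defg", "http://old-host.example/articles/123")

# Three links published elsewhere all carry the DOI, never the URL.
links_in_the_wild = ["10.1065/abc123defg"] * 3

# The content moves; one update in one place repairs every link at once.
directory.update("10.1065/abc123defg", "http://new-host.example/content/123")
resolved = [directory.resolve(doi) for doi in links_in_the_wild]
```

The point of the sketch is that the links themselves are never edited; only the directory entry changes.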

 

In keeping with the emphasis on the "what" and not the "where" of content, the Handle System itself is capable of doing more than a simple direct mapping of a DOI to a URL. Any number of URLs and other pieces of data can be included in a handle record, and client software will soon be able to parse these records and perform such neat tricks as auto-selecting the fastest server for downloads, or offering the user a choice of available formats.
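To make the idea of multiple resolution concrete, here is a minimal sketch of how client software might choose among several locations held in one record. The Location fields, the latency figures, and the pick_location function are all assumptions made for illustration; the actual layout of a handle record is not described here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Location:
    url: str
    content_format: str  # e.g. "pdf" or "html" (illustrative labels)
    latency_ms: float    # as measured by the client; hypothetical input


def pick_location(record: Sequence[Location],
                  wanted_format: Optional[str] = None) -> Location:
    """Return the fastest location, optionally restricted to one format."""
    candidates = [
        loc for loc in record
        if wanted_format is None or loc.content_format == wanted_format
    ]
    if not candidates:
        raise LookupError(f"no location offers format {wanted_format!r}")
    return min(candidates, key=lambda loc: loc.latency_ms)


record = [
    Location("http://a.example/item.html", "html", 80.0),
    Location("http://b.example/item.pdf", "pdf", 120.0),
    Location("http://c.example/item.pdf", "pdf", 45.0),
]
fastest = pick_location(record)
fastest_html = pick_location(record, "html")
```

A real client could equally present the list of formats to the user instead of choosing automatically.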

How Does a Publisher Implement the DOI?

DOI Policy and Infrastructure

In order to register DOIs into the global DOI Directory, a publisher must first obtain a prefix from the International DOI Foundation (IDF). This prefix currently costs $1,000 as a one-time fee, and it entitles the publisher to register any number of DOIs associated with that prefix, with each DOI incurring an additional yearly maintenance fee (just as one pays an annual fee to maintain an Internet domain name).

In the future, registration agencies approved, certified and chartered by the IDF will issue publisher prefixes and accept DOI registrations. These agencies will establish their own pricing for these services, will help "market" the DOI to publishers and other types of content providers, and will offer various value-added products or services. Some of these services may be oriented toward publishers, such as consulting, training, and implementation services to help publishers get up and running with DOIs. Other services may be oriented more towards assisting other parties in the digital distribution chain (online bookstores, distributors, syndicators, DRM vendors, content management system vendors, etc.) derive additional benefits from the DOI, such as reduced costs, increased interoperability, and a more seamless production and distribution chain for online content.

Today, the registration agency role is performed by CNRI on behalf of the International DOI Foundation. (Complete instructions for obtaining a publisher prefix and undertaking DOI registrations can be found on the IDF Website at www.doi.org.) However, both of these institutions are non-profit, and are not structured or funded to provide robust, commercial-grade registration services. For this reason, the IDF is seeking to establish commercial registration agencies to fulfill this role. Typically these agencies will be organized from within particular content industry sectors, where many of the market participants share common goals in their use of the DOI. For example, the sole registration agency certified to date by the IDF is CrossRef, a self-organized, non-profit consortium of scientific journal publishers who use the DOI for reference-linking–i.e., using the DOI to point in a robust, persistent way from the footnotes of a journal article to the other referenced articles on the other publishers’ servers.

A publisher (or any other type of company) may also elect to join the IDF as a member. Although it is not necessary to join the IDF in order to obtain a publisher prefix, register DOIs into the global directory, develop DOI-based applications, or use DOIs in any way, membership in the IDF gives a company a voice and a vote in how the DOI system evolves, how it is governed, and what its policies and requirements are. Membership also buys the opportunity to participate in IDF-sponsored projects such as the DOI for eBooks project (or DOI-EB), where many issues regarding real-world DOI implementation are often worked out through hands-on experimentation, commercial application prototyping, and consensus-building.

Publisher Process: Overview

The following figure gives an overview of the process a typical publisher might follow in order to implement the DOI. The publisher need not follow all of these steps; this simply represents the most complete approach a publisher could take if desired. Content Directions offers a structured methodology for DOI implementation based on this approach, and undertakes projects of all sizes on behalf of publishers. In some cases, however, publishers with relatively advanced and flexible technical infrastructures are ready to begin assigning DOIs to their publications after only a few phone conferences with key members of the production and IT teams.

 

Figure 3 — Content Directions, Inc.’s consulting methodology, giving a high-level view of the process by which a publisher begins issuing and using DOIs.

 

Publisher Process: The Details

In this section, we undertake a discussion of the practical steps that a publisher will typically follow in order to implement the DOI in their production process. While these steps can be undertaken very rapidly and cheaply, none of them can be skipped:

1 - Target Which Content Should Be Identified With DOIs

2 - Obtain Publisher Prefix(es)

3 - Choose a Numbering Scheme

4 - Source the Metadata Within the Production Process

5 - Assign DOIs Within the Production Process (If Implementing DRM, This Includes Assigning Rights And Wrapping Content)

6 - Register DOIs & Metadata

7 - Maintain DOIs & Metadata (Ongoing)

8 - Integrate and Benefit From DOI-Based Applications (Ongoing)

Many of these steps involve editorial and marketing decisions as well as production and IT decisions. Also, the full benefits of DOI accrue throughout the entire content life cycle, from content creation through editorial development through production through publication through distribution through sales through financial reporting through royalty payment. Therefore the last step requires publishers to work closely with all their distribution-chain partners–especially those involved with DRM–in order to fully reap all of these benefits.

A more detailed discussion of these eight steps follows:

1) Target which content should be identified with a DOI. The key decisions for the publisher in this stage include:

The key business decision here is really to determine what kind of content will pay back the earliest if it is migrated to the online medium. This requires editorial and marketing judgments such as:

This also requires an assessment of who the publisher’s customers are, because the publisher needs to know how the content will impact them, what they’ll buy, and how they’ll buy it. Customers have different needs (and will pay different rates) depending on whether they are:

Also, the publisher must consider what role or market the publisher’s customers represent. This impacts what the customers will want to do with the content. This can vary considerably depending on whether the role is that of:

Publishers also need to consider the different business models to be enabled via the DOI, e.g. whether it will be sold:

An additional set of considerations comes into play in making these decisions, such as the publisher’s true "state of readiness" in terms of its production systems, IT systems, and back-office systems for tracking sales, calculating royalties, and feeding the general ledger for P&L management. For example, it is important to consider whether or not the publisher:

If any of these do not hold true, the publisher must consider implementation costs, and whether the payback (both from use of the DOI and from other efficiencies introduced by the upgrade) is justifiable for this content. If all is in place, then the publisher will likely next want to consider questions such as how cheaply (or not) these facilities can be adapted for more granular publishing, and how easily and cheaply DOI tagging can be integrated into the production workflow.

2) Obtain Publisher Prefix(es). This step is explained elsewhere in this document (see DOI Policy and Infrastructure, above) and on the IDF Website, but two practical considerations are worth adding here. One is that a publisher may purchase as many prefixes as desired. This could be a single prefix for the entire publishing house, or it could be separate prefixes for each imprint or product line, or however else the publisher wants to organize its DOI activity. The driving consideration here is really whether the company needs or wants to have a single DOI Administrator responsible for all its content, or multiple DOI Administrators in different parts of the company. The DOI Administrator is the Production, Marketing, or IT person responsible for registering DOIs into the global DOI Directory and keeping it up to date if the content is moved.

A word about prefix pricing: although the IDF has from time to time considered variances in the $1,000 price tag for a publisher prefix, e.g. to offer volume discounts if a publisher wants to buy many of these, this one-time cost is actually minor next to the ongoing costs associated with meeting the obligations of a prefix-holder (i.e., keeping a server up and running to respond to DOI requests for the publisher’s content over time, and faithfully maintaining the global DOI Directory in the event of any changes in the URLs to which these DOIs point). Note, however, that this is precisely the kind of question that publishers can speak out on if they join the IDF as members.

3) Choose a numbering scheme. One of the most valuable features of the DOI is that it does not replace or compete with any existing numbering schemes, whether they represent industry standards such as the ISBN, EAN/UPC, CUSIP, etc., or whether they represent proprietary numbering schemes that the publisher might use internally in its existing systems. In fact, far from competing with these legacy identifiers, the DOI leverages them by turning them into universal, actionable identifiers–i.e., identifiers which now have an Internet-based resolution and routing system behind them, and that actually take the user somewhere on the Internet instead of simply existing as a number in the abstract. All the publisher needs to do in order to "supercharge" a legacy identifier is to incorporate it into the suffix portion of its DOIs, and register it into the global DOI Directory through an IDF-approved registration agency.

It is important to note, however, that the limitations of legacy identification systems for content identification in the digital world (e.g. today the ISBN refers to an entire book title, but not to each individual chapter, illustration, table, etc.) will still apply, and publishers will generally decide to issue a more "granular" DOI which further identifies components of the object identified by the legacy identifier.
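One way to picture such a scheme is sketched below. It assumes a purely hypothetical suffix convention ("isbn." followed by the ISBN, then an optional component label); neither the convention nor the function names come from the IDF, and the ISBN shown is a made-up placeholder.

```python
from typing import Optional

ISBN_TAG = "isbn."  # hypothetical marker that flags an embedded ISBN


def build_doi(prefix: str, isbn: str, component: Optional[str] = None) -> str:
    """Embed a legacy ISBN in the suffix, optionally naming a component."""
    suffix = f"{ISBN_TAG}{isbn}"
    if component:
        suffix += f".{component}"
    return f"{prefix}/{suffix}"


def legacy_isbn(doi: str) -> str:
    """Derive the ISBN back out of a DOI made by build_doi."""
    _, _, suffix = doi.partition("/")
    if not suffix.startswith(ISBN_TAG):
        raise ValueError("suffix does not embed an ISBN under this scheme")
    # Drop the marker, then keep everything before the component label.
    return suffix[len(ISBN_TAG):].split(".", 1)[0]


book = build_doi("10.1065", "0123456789")             # whole title
chapter = build_doi("10.1065", "0123456789", "ch03")  # one chapter of it
```

Both the title-level and the chapter-level DOI lead back to the same ISBN, so systems that still key on the ISBN continue to work.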

Choosing a numbering scheme is therefore easy when the content corresponds directly to what is already identified by some legacy numbering scheme, whether standard or proprietary. It gets a little more difficult if there is no such legacy numbering scheme, or if the content is now of a new form which has no traditional identifier.

The saving grace of the DOI here is that ultimately it does not really matter what numbering scheme the publisher chooses, because once the DOI is registered the number becomes a "dumb number," whose only required characteristic is uniqueness within the global directory. Although the chosen numbering scheme could be used by the publisher for internal content management prior to publication, or possibly for archiving purposes after publication, the DOI as it exists in the outside world is not "interpretable" except as a string to be resolved by the global DOI resolution service. The reason the DOI was deliberately designed to be a "dumb number" is that if it were "intelligent" (e.g., if you could look at a DOI and decipher it to indicate what the content is or even who the publisher is–the way a telephone number gives some indication of its country or region), then this would inevitably clash with its permanence. Even the "publisher prefix" section of the DOI, which initially indicates which publisher registered the DOI, will eventually become obsolete as the original publisher is bought by another, or as the original title or product line is sold to another publisher, or even simply if the original title is someday reorganized within the original publisher’s product lines or imprints.

Thus publishers should give some thought to how they want to number their DOIs, but they needn’t worry that a scheme chosen today will become obsolete in the future: as long as all DOIs issued by a given publisher are unique, the system maintains its integrity.
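Because uniqueness is the only hard requirement, even a plain counter makes a workable numbering scheme. The sketch below shows a hypothetical sequential minter together with a duplicate check that a publisher might run before registration; the function names and the eight-digit padding are arbitrary choices made for illustration.

```python
from collections import Counter
from itertools import count
from typing import Iterable, Iterator, List


def sequential_minter(prefix: str, start: int = 1) -> Iterator[str]:
    """Yield an endless series of 'dumb number' DOIs under one prefix."""
    for n in count(start):
        yield f"{prefix}/{n:08d}"


def find_duplicates(dois: Iterable[str]) -> List[str]:
    """Return every DOI that appears more than once, in sorted order."""
    tally = Counter(dois)
    return sorted(doi for doi, seen in tally.items() if seen > 1)


minter = sequential_minter("10.1065")
first_three = [next(minter) for _ in range(3)]
clashes = find_duplicates(first_three + ["10.1065/00000002"])
```

A scheme that embeds legacy identifiers can be checked with the same find_duplicates function, since the check does not care how the suffix was formed.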

4) Source the Metadata within the Production process. When the publisher registers DOIs, it also registers some metadata associated with the DOIs–not a large amount, but enough to allow trouble-shooting to occur during the registration process, to facilitate updates to the directory over time if the URLs change, and to enable others (individuals, vertical portals, retailers, etc.) to look up the DOI if they do not already know it.

In order to register this small amount of identifying metadata, the publisher must be able to extract it from its internal systems. Some publishers have a single, integrated database which houses the basic metadata for all its publications, but not all do–especially when it comes to electronic content, which is still often produced in separate "islands" scattered throughout the publishing house.

5) Assign DOIs within the Production process (if implementing DRM, this includes assigning rights and wrapping content). The publisher must decide at what point in the Production process it wants to assign DOIs to its content. Typically this must be late enough in the process that the publications are close to their final form and their URLs known, but it should not be after publication or else there will be a lag between the content being offered and the time that the rest of the world can access it via the DOI. If the publisher uses the DOI for internal tracking and for identification of components of works, assignment may take place significantly earlier, and provisional, non-registered DOIs may be used.

6) Register DOIs & Metadata. The step of registering the DOIs and their metadata with a Registration Agency (RA) is a significant undertaking. Not only must a regular production feed be established, but the RA must also offer a registration process that includes quality assurance and error detection, reporting, and correction. The process also needs to be robust and fast, since eventually publishers will be registering large numbers of DOIs on a regular basis as content "rolls off the assembly line." The guidelines for approving commercial RAs that the IDF is now finalizing will be a tremendous help, since building and maintaining these processes requires a significant investment on the part of the RA.
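The kind of error detection and reporting involved can be sketched as a pre-submission check on a batch of records. Everything here is an assumption made for illustration: the three-field Registration record, the specific checks, and the check_batch function do not reflect any actual Registration Agency interface.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Registration:
    doi: str
    url: str
    title: str  # stands in for the small amount of identifying metadata


def check_batch(batch: Sequence[Registration], prefix: str
                ) -> Tuple[List[Registration], List[Tuple[str, List[str]]]]:
    """Split a batch into accepted records and (doi, problems) error reports."""
    accepted, errors, seen = [], [], set()
    for rec in batch:
        problems = []
        # The DOI must sit under the publisher's prefix and have a suffix.
        if not rec.doi.startswith(prefix + "/") or len(rec.doi) == len(prefix) + 1:
            problems.append("DOI is not under the publisher prefix")
        if rec.doi in seen:
            problems.append("duplicate DOI in batch")
        parts = urlsplit(rec.url)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("URL is not an absolute http(s) address")
        if not rec.title.strip():
            problems.append("missing title")
        seen.add(rec.doi)
        if problems:
            errors.append((rec.doi, problems))
        else:
            accepted.append(rec)
    return accepted, errors


batch = [
    Registration("10.1065/a1", "http://pub.example/a1", "First Article"),
    Registration("10.1065/a1", "http://pub.example/a1-copy", "First Article"),
    Registration("10.9999/zz", "not a url", "Stray Record"),
]
accepted, errors = check_batch(batch, "10.1065")
```

Reporting every problem for a record at once, rather than stopping at the first, keeps the correction loop short when large batches are registered regularly.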

7) Maintain DOIs & Metadata. This step consists of the publisher’s ongoing responsibility to maintain the global DOI Directory over time–e.g., as the URLs change because the publisher has reorganized its file structure, or moved the content to a new server, or transferred ownership of the content to a different publisher. Even if the publisher goes out of business or otherwise "orphans" the content, the Registration Agency will still ensure that end users reach closure in terms of any DOI request–e.g., returning a message to the effect that "The last publisher of record for [this title] was Acme Publishing, which is no longer maintaining this content in circulation."
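The "closure" behavior described in this step can be sketched as follows. The DirectoryEntry record and the answer_request function are hypothetical, and the notice simply reuses the wording of the example message above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class DirectoryEntry:
    title: str
    last_publisher: str
    url: Optional[str]  # None once the content has been orphaned


def answer_request(entry: DirectoryEntry) -> Tuple[str, str]:
    """Return ('redirect', url) for live content, else ('notice', message)."""
    if entry.url is not None:
        return ("redirect", entry.url)
    notice = (
        f"The last publisher of record for {entry.title} was "
        f"{entry.last_publisher}, which is no longer maintaining "
        "this content in circulation."
    )
    return ("notice", notice)


live = DirectoryEntry("Example Title", "Acme Publishing", "http://acme.example/t1")
orphan = DirectoryEntry("Example Title", "Acme Publishing", None)
live_answer = answer_request(live)
orphan_answer = answer_request(orphan)
```

Either way the end user receives a definite answer, which is the property that distinguishes a maintained DOI from a broken URL.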

8) Integrate and Benefit From DOI-Based Applications. The DOI is fundamentally an "enabling technology;" in fact, it originated within the "Enabling Technologies Committee" of the AAP. Even by itself, it gives end users a permanent link to the content (or to the content owner’s Website)–i.e., a "URL which never breaks"–but the most powerful benefits of the DOI are only delivered through applications which make use of it–e.g., Digital Rights Management, E-Commerce, Sales Tracking, etc. Therefore the ultimate success of the DOI rests with technology vendors who will build DOI-based products and services.

Conclusion: Why Implement the DOI Now?

We have just stated that the true potential of the DOI will only be realized when DOI-based applications come to market, and that before such applications can be deployed, publishers will need to tag their content with DOIs. That this is not a classic ‘chicken-and-egg’ standards adoption problem can be shown by returning to our earlier discussion of the UPC, the Universal Product Code most commonly encountered as the number depicted in the barcodes on consumer products.

The similarity between the factors affecting adoption of the DOI and the UPC is uncanny. Just as the DOI is both a product identifier and a reliable, dynamic replacement for a static and unreliable linking system (the URL), so is the UPC both a product identifier and–in the form of the machine-readable barcode–a reliable, dynamic replacement for a static and unreliable pricing system (price stickers on products on the shelves).

The greatest value held by both the DOI and the UPC lies in that first role: as a universal product identifier throughout the supply chain. But this value is governed by the ‘network effect’–while it is tremendous, it only materializes when a critical mass of players adopts the standard. To be successful, a standard must also offer early adopters a reasonable return on their investment, independent of the success of the overall initiative. In the case of the UPC, studies showed that the costs of printing barcodes on products and of installing barcode scanners in grocery stores could be recouped very quickly from savings in labor costs due to faster checkout. Due to these immediate and substantial benefits, adoption proceeded extremely quickly, and the vendors supporting ‘back office’ efficiencies–such as vendors of inventory-tracking and just-in-time ordering software using the UPC as the product key in their databases–piled on as predicted.

The DOI also promises to follow the same path to success by providing a substantial early, independent return on investment: both on the revenue side, by making it easier for publishers to get paid by the digital distribution channel, and on the cost side, by limiting the amount of investment necessary for publishers to begin publishing electronically in a more serious way.

The most immediate benefit comes in the form of the DOI’s significant advantages (including persistence, dynamic reassignability, and context-sensitive multiple resolution) over the URL as a hyperlink to digitally published content. Publishers can begin using and benefiting from DOI-based links in email promotions, on their own Websites, and anywhere else they would use a URL, on the very day they register their first DOIs. End-users don’t need to install any special software, or even to know that the links they are following are powered by the DOI.
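Turning a DOI into a clickable link requires very little code. The sketch below assumes that links are formed by appending the DOI to the address of a public resolver; the https://doi.org/ base address is an assumption on our part and is not named in this article, and doi_link is an illustrative helper rather than an official API.

```python
from urllib.parse import quote

# Public DOI resolver; an assumption here, not stated in the article.
RESOLVER = "https://doi.org/"


def doi_link(doi: str) -> str:
    """Build a Web link for a DOI, percent-encoding URL-unsafe characters."""
    # Keep "/" literal so the prefix/suffix separator stays readable.
    return RESOLVER + quote(doi, safe="/")


plain = doi_link("10.1065/abc123defg")
awkward = doi_link("10.1065/a b#c")  # suffix with a space and a "#"
```

Encoding matters because the suffix format is wide open, so a publisher's suffix may contain characters that would otherwise be misread as part of the URL syntax.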

The task of building the infrastructure to support the DOI has also been off-loaded from publishers. The Handle System, which is the core technology on which the DOI system is built, is robust, scalable, in operation today with millions of DOIs, and freely available to all users of the Internet. This is analogous to a third party having paid for the retrofitting of every grocery store in the world with state-of-the-art barcode scanners, and having promised to maintain and upgrade them as new functionality became available.

As has been detailed above, even the process of assigning new DOIs can be relatively painless for publishers whose internal production processes are already mostly electronic. For those that have yet to upgrade their internal systems, the DOI is simply one more good reason to do so. In fact, the DOI could well be the catalyst that leads many publishers to implement electronic workflows, content repositories, and other facilities for "Digital Asset Management." This is because the DOI greatly facilitates the internal management of content objects, by providing in effect an internal "SKU" (or Stock Keeping Unit, as physical stores refer to their IDs for managing items of physical inventory). This makes it much easier and cheaper for the publisher to track items of content throughout the entire Production life cycle. It also makes it easier and cheaper for the publisher to offer "recombinant" information products — i.e., different products or services tailored specifically to different audiences and different sales opportunities — thus yielding new, incremental revenue but over the same base of digital assets. Finally, the DOI’s ability to incorporate many legacy identification schemes can also do away with the strain and expense of retooling existing systems and distribution relationships that depend on them.

Once publishers begin assigning DOIs to their products in significant numbers, things really start to get interesting (and efficient). Third-party technology vendors will build DOI support into their products and services, and these solutions will be integrated into the entire online distribution chain, from publishers to distributors, aggregators, syndicators, e-tailers, and DRM vendors of all kinds. The DOI already has the endorsement and active support of many such vendors, of many publishers, and of major technology standards bodies, and is moving quickly towards commercial implementations beyond the already-successful CrossRef project that today links the world’s scientific journal literature.

No complex supply chain can go long without a standard identification scheme, and the DOI is well on its way to becoming the identifier of choice for the supply chain of digital content. In the meantime, a publisher’s decision to implement the DOI is already a wise and profitable one.

 

David Sidman (dsidman@contentdirections.com) is CEO of Content Directions, Inc., a DOI Registration Agency and consulting firm dedicated to promoting the adoption and implementation of the Digital Object Identifier (DOI) throughout all sectors of online publishing: text, music, video, etc. Prior to founding Content Directions in August 2000, David was Director of New Publishing Technologies at John Wiley & Sons, a leading global publisher. Tom Davidson was formerly Associate Director of Consulting and Product Development at Content Directions, Inc.

 

More information:

Content Directions, Inc. ("The DOI Experts"): www.contentdirections.com

The International DOI Foundation: www.doi.org

Association of American Publishers (AAP): www.publishers.org

AAP Press Release on Ebooks: www.publishers.org/home/press/ebookpr.htm

Corporation for National Research Initiatives (CNRI): www.cnri.reston.va.us

The Handle System: www.handle.net

The "CrossRef" implementation for scientific journal content: www.crossref.org
