THE ECONOMICS OF DIGITAL LIBRARIES

Robert M. Hayes

University of California, Los Angeles (rhayes@ucla.edu)

 

 

 

Context & Overview

Context

Before presenting the substance of this paper, I want to set the context and motivation for it. My examination of the "economics" of digital libraries arose from a concern about the extent to which some persons viewed them as replacing print forms of distribution and as replacing libraries as we know them by electronic delivery, especially online through the Internet. Those views have been expressed not only by the enthusiasts for electronic distribution as replacement for print but by academic administrators who hope that they will not need to continue erecting new library buildings and acquiring additional collections of dusty old books.

In my view, the two forms of distribution (print and electronic) are not substitutable for each other (what the economists call "fungible") but are complementary, serving substantially different functions and purposes. Indeed, instead of one substituting for the other, use of one increases the need for and use of the other.

Leaving that aside, though, it is also my view that the enthusiasts and the administrators have failed to recognize the economic facts in the costs involved in creation, distribution, and use of both media or in the decisions by publishers about how they will distribute and when they will change the means for distribution. Those facts are by no means easy to obtain or to rationalize, especially as they relate to the Internet which is growing at such a rate that data obtained at any given time cannot effectively be related to data obtained at other times. It is for this reason that what I present in this talk is exploratory, speculative, and largely descriptive.

Overview

My talk today will cover the following topics:

Definition, Sources, Economic Roles & Properties

The Micro-Economics of Digital Libraries

Capital Costs of Digital Libraries

Operating Costs in Distribution of Digital Libraries

Income from Digital Libraries to Producers

Value of Digital Libraries to Users

The Macro-Economics of Digital Libraries

Conclusion

Definition, Sources, and Economic Properties

Definition of "Digital Library"

The definition of the term "digital library" has become very diffuse. Some writers have included in their definition library technical processing functions (equivalent to selecting and cataloging); others have taken it more narrowly. Given that my objectives in this paper are focused on the means for distribution (rather than on the library staffing related issues), I have used the broadest definition:

A Public Digital Library is a collection of publications which is distributed or made available in digital form (i.e., in which symbols are recorded as bits and bytes – magnetic, electronic, or optical).

Unless otherwise qualified, "digital library" will be taken to mean "public digital library".

More specifically, digital libraries are, using this definition, collections of publications that are distributed or made available either online (through the Internet or World Wide Web, for example), in optical formats, such as CD-ROM (compact disks, read-only-memory) and DVD (digital video disks), or in magnetic formats (such as ZIP-drive Bernoulli disks or even magnetic diskettes). Digital libraries are really just digital databases but differ from other databases in the respect that they are intended for distribution, that is for use by persons other than in the organization producing them.

Sources of Digital Libraries

Digital files used in production of current print publications

Retrospective conversion to digital form from prior print or microform publications

Digital files with no parallel in prior print or microform publications

It is relevant to note that today, throughout the world, a digital file is used at some stage in the production of virtually every form of print publication. Indeed, increasingly authors produce and submit their manuscripts in digital form. It was for this reason that the Association of American Publishers established guidelines for encoding of manuscripts; based on SGML (Standard Generalized Markup Language), they are now embodied in HTML in the context of World Wide Web based publication. The implications are evident: Essentially everything now being published in print form can be included in digital libraries without the need for conversion to digital form. Of course, the conversion from one digital format to another is not a trivial task, but it can be programmed and, once that is done, carried out essentially automatically.

A second source of digital libraries is retrospective conversion. While that task certainly is huge, there are many parallel ad hoc projects underway for such conversion in specific subject areas and in specific institutions. Among the rationales for such conversion is the need for preservation of at least the information content of paper-based publications that literally are disintegrating in libraries throughout the world. But, whatever may be the reasons, the result is that today there are hundreds of CD-ROM digital libraries that include full-text and images, each covering an entire corpus of historic publication – such as the classics of Greek philosophy, of American history, of genealogy, and so on. And virtually all of those corpora are available online, through the Internet and the World Wide Web.

A third source is digital files with no counterparts in print. Perhaps the most evident example is on the World Wide Web itself. Clearly it is a source for digital libraries in the sense of the definition being used in this paper. Although many of the Web pages include content derived from print publications, the great majority of them have no counterpart in print. For some kinds of publications (statistical databases for example), digital distribution is much to be preferred because of the ease of processing, and for such publications it is increasingly likely to replace print distribution. The U.S. Census for the year 2000, for example, may well be distributed primarily in electronic form, with print serving only archival purposes.

Economic Properties of Digital Libraries

Easily transportable, easily and cheaply shareable

Uncertain value and time affects that value

Differences in perceptions of value, in use, in ability to use, in assessment of costs, in ability to pay costs

Value increases as size grows

Expandable and self-generating

Value independent of scale of application -- indivisible in use with great economies of scale

Cost independent of scale of application

Difficult to appropriate or exclude access

Mixture of public good and private good

Need to balance rights of ownership and rights of use

Given that definition, digital libraries are evidently economic entities in the sense that associated with them are both costs and values and that people differ in their perception of the balance between the two. Beyond that almost self-evident fact, though, digital libraries, like other information resources, have more specific properties that directly affect decisions about them at both macro-economic and micro-economic levels. In a previously published review of the economics of information, those economic properties were discussed, as listed above. Among them, the following are of special relevance to this paper.

Cheaply Shareable. Digital libraries are easily and cheaply transportable and shareable. The first copy is likely to represent most of the costs, with costs for reproduction and distribution relatively minor. As a result, the number of copies that can be produced without serious depletion of physical resources is great.

Value Increases with Accumulation. The value of a digital library increases at more than a linear rate as it grows. This is perhaps one of the most distinctive and important features of a digital library as a resource. As it grows and when it is combined with other digital libraries, it may be transformed, new relationships developed, and new insights gained as a result of the inter-connections. As a result, the value of an accumulation of digital libraries is indeed far more than the total of the individual values.

Self-Generating. Digital libraries are expandable and self-generating. This is especially important because virtually an unlimited amount of intellectual goods can be created, and digital libraries have exponentially increased the ability to do so.

Costs Independent of Scale of Application. The cost of digital libraries is independent of the scale of application. Economists use the phrase "indivisible in use" to mean that, and digital libraries indeed are indivisible, so there are immense economies of scale. Putting this together with the value in accumulation provides strong incentives for large-scale users to acquire digital libraries. For the same reason, there is efficiency for shared rather than independent accumulation. As a result, joint consumption is likely because it is inefficient to exclude or withhold service from those who don't pay. This may well be the most significant contribution of the World Wide Web, as a digital library source.

THE MICRO-ECONOMICS OF DIGITAL LIBRARIES

The microeconomics of digital libraries will be discussed in terms of the costs to the producers (i.e., the costs incurred by them in producing digital libraries and related products and services) and the values to the producers in derived income. The data on which to base a quantitative analysis are only now beginning to become available, and they are still sparse, uncertain, and unreliable. For purposes of this paper, some generic data concerning costs in various kinds of information activities are used; they are in part from standard statistical sources (such as Statistical Abstract of the United States), in part from data available on the Internet, and in part from the already referenced review of the economics of information.

Digital libraries are produced by a range of publishing industries and distributed by a complex combination of distributors, retail outlets, traditional libraries, and online services of the Internet and the World Wide Web. Gross estimates of the total income for all types of sales in these industries in the United States can be made based on reported data, projecting them to 2000.

YEARLY SALES, PROJECTED TO 2000

FORMS OF PUBLICATION
  Book Publishing                        $15 Billion
  Journal Publishing                     $20 Billion
  Database Publishing                    $5 Billion
  Software Publishing                    $5 Billion
  Multi-Media Publishing                 ?

MEANS FOR DISTRIBUTION
  Book Distribution & Retail Outlets     $10 Billion
  Academic/Research Libraries            $5 Billion
  Public Libraries                       $5 Billion
  Online Services (Internet & WWW)       $1 Billion

The article on the economics of information provides estimates of the distribution of costs between capital investment and delivery expenses for most of these industries, and those estimates will serve as the starting point for analysis of those costs for digital libraries. Beyond that, there are some more recent data that give important insight about the current status of print and electronic publishing.

 

Year  Print Serials  CD-ROMs  Printed Books  Electronic Journals  Full Text
1985              -        -         620581                    -          -
1986          69000        -         491112                    -          -
1987          68000        -         483177                    -          -
1988          70000        -         372983                    -          -
1989         108000        -         564750                    -          -
1990         112000        -         459438                    -          -
1991         116000        -         800000                  110          -
1992         118500     2900         842000                  133          -
1993         126000     3502         900000                  240          -
1994         140000     5000         925000                  443       4900
1995         165000     8000         950000                  675       5500
1996         165000     9000              -                 1689          -
1997         165000    11500              -                 2459          -
1998         156000    13000              -                    -          -
2000         218000        -        1050000                    -          -

A dash marks a year for which no figure is given.

The Capital Costs Of Producing Digital Libraries

Digital libraries are simply one among the products within each of the industry groups listed above. The estimates in the article on economics of information were for overall costs for the entire range of products, and this paper will use them as the starting point for estimating the costs for digital libraries. For each type of digital library, there will be discussion of the typical size. This is necessary in dealing with the economics both of CD-ROM distribution, since it identifies the ratio of digital library size to storage capacity, and of online distribution, since it determines the necessary amount of online storage.

Digital Libraries of Books. Of all of the forms of publication, that of books seems most likely to continue in print form distribution, and sales of the printed book itself will be the primary expected source of income. That means that the digital libraries derived from current publications really represent marginal costs. The marketing problem is how to set the prices for the print and digital versions so as to maximize a combination of sales and profits; the underlying accounting problem is how to allocate the capital costs across the product lines.

The article on "Economics of information" shows costs as percentages of list price for printed books. Of them, the operating expenses and composition are treated as capital investments; all others, including royalty payments to the author, as costs for delivery, since they are all related to the number of copies sold. In passing, it must be said that royalties to authors, except for the few who are really successful, are not a great source of income. Authors of scholarly articles, in fact, may derive no royalty income and in many cases must pay "page charges" for their articles; for such authors, the rewards come in academic recognition and advancement, not in direct monetary return.

In principle, capital costs – for creation of content and of a production master – should be the same whatever the means for distribution. The differences lie in the costs for production and distribution of products and services. Today, a typical CD-ROM (text, game, or program) will cost about $4 for pressing and shipping individual units, in contrast to about 17% of list (say $9 for $50 list price) for a printed book, and the data storage capacity is orders of magnitude greater. It is this, of course, that makes CD-ROM publication and distribution so attractive, to the point that current projections are for growth rates on the order of 50% per annum to continue.

Using CD-ROM publication of a book as an example, post-composition production costs will be reduced, perhaps to half, but all other costs should remain essentially the same. The crucial fact about the digital library product for the same book is that the production costs other than composition are virtually zero and even that for composition will be negligible. A copy of a typical 300 page book in CD-ROM form (assuming it were bundled with a large number, such as 100, of other books) would cost about $0.01 instead of the nearly $9.00 for paper, printing and binding; an online copy would cost perhaps $30 per year for storage, from which unlimited numbers of copies could be distributed.

The size, in megabytes, of a typical digital library of current books is almost totally a marketing decision. If the size of a single book is, say, 300KB, a CD-ROM with 600MB capacity in principle could store 2000 books as ASCII text; in reality, it is likely to be as few as 1000 books, given inefficiencies in coding of the text and a variety of other structural elements. In practice, typical CD-ROM packages of books will be much smaller than the maximum and are likely to contain not 1000 books, but 100 or even 10. Examples at immediate hand include a library of Greek Philosophers and a collection of American Classics, both with 80 titles, and a variety of more limited scope collections, with 20 titles each. In virtually all cases, the amount of the total capacity that actually is used will be at most 10% (60MB, not 600MB) and frequently about 25MB. The point, of course, is that the market is willing to pay only a limited amount for a CD-ROM, so it is necessary to increase the number of CD-ROMs, not the content of each. Given those economic facts, the size of 25MB turns out to be an appropriate and convenient unit for measuring digital libraries, and it will be used in subsequent analyses.

Of course, if the book requires use of images in addition to or rather than text, ASCII or coded, the relevant parameters are very different. A typical page with acceptable resolution will require not 1K ASCII characters as text but 1,000K bytes to 10,000K bytes (which may be compressed to 100K bytes to 1,000K bytes for storage) as an image. Thus, a typical 300 page book instead of requiring 300K ASCII characters will require between 30MB and 300MB, and the capacity of a CD-ROM suddenly becomes not 1000 books but 20 books or even 2 books. As the unit for measuring a digital library, 25MB then effectively may be about one book.
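The capacity arithmetic in the two preceding paragraphs can be checked with a short sketch. The function name and the decimal convention (1 MB = 1,000 KB) are assumptions made only for illustration; the page counts, per-page sizes, and disc capacity are the figures quoted in the text.

```python
def books_per_disc(disc_mb: float, book_kb: float) -> int:
    """Whole books that fit on one disc, assuming 1 MB = 1,000 KB."""
    return int(disc_mb * 1000 // book_kb)

CD_ROM_MB = 600   # nominal CD-ROM capacity used in the text
PAGES = 300       # typical book length used in the text

text_book_kb = PAGES * 1        # about 1 KB of ASCII text per page
image_low_kb = PAGES * 100      # compressed page image, low estimate
image_high_kb = PAGES * 1000    # compressed page image, high estimate

print(books_per_disc(CD_ROM_MB, text_book_kb))    # text only
print(books_per_disc(CD_ROM_MB, image_low_kb))    # images at 100 KB per page
print(books_per_disc(CD_ROM_MB, image_high_kb))   # images at 1,000 KB per page
```

These are theoretical maxima; as noted above, coding inefficiencies and structural elements reduce the practical figure for text, perhaps by half.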

Digital Libraries of Popular Journals. It is likely that magazines and newspapers will, like books, continue to be published in print form, with the digital library formats used as add-on products combined with related services. Indeed, the specialized trade publications and many newspapers have already moved in this direction. A key point about magazines and newspapers, in contrast to books, is the role of advertising as a source of income. For some, like the "controlled" circulation trade magazines, advertising is in fact the dominant source of revenue, since many of them are distributed free to the readers. In any event, the analysis with respect to costs as applied to books, as outlined above, is likely to apply to magazines and newspapers, with perhaps minor changes.

With respect to the size of the related digital libraries, the nature of popular journals is likely to require the use of images in addition to text, so the parameters that characterize such storage must be used. For a typical popular journal (such as Time, Newsweek, or People) that is published weekly, a year of publication will be about 5000 pages and will require about 500MB for image storage, or one CD-ROM.

Digital Libraries of Scholarly Journals. For scholarly journals, however, the picture is likely to be quite different. The nature of scholarly journals and, especially, of their use is that the important thing is not the issues of the journal but the articles. The journal issue is simply a means for packaging the articles for efficient distribution. The online digital library environment provides an ideal means for making the individual articles readily available. In addition, it permits the publisher to avoid all of the up-front costs incurred in printing and mailing multiple copies of each issue of a journal. It therefore seems very likely that online distribution will become the preferred mode for virtually every scholarly journal. Furthermore, the CD-ROM or DVD now becomes an excellent means for packaging a large number of articles – the equivalent of one or more years of the journal – for archival purposes and for continuing important but comparatively low levels of use that characterize the historic journal volumes.

It must be said that the several experiments, jointly among publishers of scholarly journals and academic libraries, such as the agreements with Elsevier, all require that the participating libraries continue to acquire print copies at their current number of subscriptions, for a time period such as five years. The transition to access purely from digital libraries is therefore certainly not immediate. However, within the coming decade, there is likely to be an almost complete transfer of scholarly journals to digital library formats (online for current articles and CD-ROM or DVD for retrospective).

Again, the article "Economics of information" provides detailed estimates of functional costs for scholarly journals. Among them, the composition and graphics costs are substantially greater than for books, reflecting the more complex nature of scholarly journal publication. The costs for editorial work, composition, and general administration will be taken as capital costs; all other costs, as related to delivery.

ARL statistics show that the cost of a purchased journal in major academic research libraries averages about $300 per annum, so that will be taken as an average cost for a current scholarly journal. Of course, the distribution of subscription prices varies widely, with many commercial journals costing thousands of dollars. By conversion to digital libraries, all of the manufacturing costs for these scholarly journals, except for composition, will be reduced to literally pennies. And the distribution costs would be reduced even more dramatically. In particular, online distribution would incur cost for storage at perhaps $100 per annum for a year’s collection, and the delivery of an article would require mere seconds.

Turning to the size of the digital library associated with a scholarly journal, as before the relative role of texts and images determines the storage requirements. In general, though, it appears that scholarly journals will require image storage, so a year of a typical journal will require about 1500 pages and storage therefore of 150MB.
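The journal storage estimates follow from the page-image size discussed earlier for books. A minimal sketch, assuming the low-end figure of 100 KB per compressed page image and 1 MB = 1,000 KB (the function and constant names are hypothetical):

```python
KB_PER_PAGE_IMAGE = 100   # compressed page image, low end of the range quoted for books

def image_storage_mb(pages: int, kb_per_page: int = KB_PER_PAGE_IMAGE) -> float:
    """Storage in MB for a run of page images, assuming 1 MB = 1,000 KB."""
    return pages * kb_per_page / 1000

print(image_storage_mb(5000))   # one year of a weekly popular journal
print(image_storage_mb(1500))   # one year of a typical scholarly journal
```

At the high end of the range (1,000 KB per page) each figure would be ten times larger, so the estimates are sensitive to the resolution and compression assumed.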

Digital Libraries of Retrospective Books & Journals. The major cost in creating digital libraries of retrospective books clearly is in conversion of the originals into digitized form, either as images or as text. The functions involved include identifying and selecting materials to be converted, maintaining appropriate catalog data about them, preparing the materials for conversion, scanning them (either for creating images or for optical character recognition), and quality control. None of these tasks is by any means simple or cheap, and the total cost for them is likely to be on the order of $100 for an average 300 page item. The storage requirements clearly are for the page images, so the typical 300 page book will require 30MB.

Digital Libraries of Databases. Database services, providing access to digitized text, numerical data files, images, and reference and bibliographic databases, are another widespread means for electronic publication. Again, the capital costs should be the same as for print. Production for distribution, though, is replaced by access and demand retrieval. The costs incurred by the database service are in storage, computer processing for maintenance, access, and retrieval, and telecommunications. Beyond those costs, though, may be costs incurred by digital library intermediaries – reference librarians, brokers, or digital library entrepreneurs; in a sense, those are counterparts of retail outlets for books. The royalties represent the payment to the database producer to cover the capital investment; the other costs, including that for the intermediary, are for delivery.

Among the databases are the online catalogs of major academic/research libraries. And these illustrate the complexities in assessing the economics of digital libraries. The point is that each of these catalogs is created and maintained primarily for the needs of the clientele served by the given library. To support those needs, it must be mounted on a server and must be accessible through the Internet or, at the least, through the institutional server itself (whose connection to the Internet is virtually a marginal cost, given the array of uses already requiring it). The result is that the availability of these catalogs as digital libraries on the Internet is essentially at minimal, virtually zero marginal costs.

Parenthetically, the capital costs represented by those library catalogs are huge. They reflect decades of investment in establishing OCLC and RLIN as shared cataloging utilities, in retrospective conversion of at least 30 million primary catalog records, and in the copy cataloging of literally hundreds of millions of individual catalog records.

The size of a database, as a digital library, simply cannot be evaluated in such simple terms as books and journals. It is completely determined by the nature and scope of the data stored in it, not by the effects of packaging.

Digital Libraries of Software. This example of digital libraries includes those that provide software – operating systems, application programs, educational multimedia packages, games. The software industry as a whole is huge, but the issue at hand here concerns the role of digital libraries in the distribution of software in contrast to the sale of individual programs.

The nature of software digital libraries can nicely be represented by shareware libraries, since they are well established in exactly that way. They are readily available in both CD-ROM and online, with dozens of servers providing access to them. A typical such library will contain 5K programs, ranging from 10KB to 1MB and more, so it can very effectively be distributed on a CD-ROM. Online distribution is equally effective, and the downloading of a single program takes only minutes.

Digital Libraries of Multi-Media. The final examples of digital libraries are those that include films, videos, and other images of all kinds; music, voice, and sounds of all kinds; and maps, graphs, and illustrations of all kinds. All of these forms of data require massive amounts of storage, so the related digital libraries, whether in CD-ROM or online, will be huge.

The Operating Costs in Distributing Digital Libraries

Distributors & Retail Outlets. These are the traditional means for book distribution and have become important means for software distribution. It is of interest to note, though, that retail stores as examples of this means for distribution are now facing severe competition from the online services and mail order delivery houses.

As means for distribution in CD-ROM format, these are crucial agencies. They are likely to continue to represent the 40% cost (in the form of the discount to the distributor) that they do for print materials. They include stores that sell computer hardware and software and office equipment and supplies, mail order houses, and online versions of each that continue to proliferate.

Digital Libraries in Academic, Research, & Public Libraries. An issue of current concern in libraries and in the institutions they serve is the role of digital libraries in their operations. The predominant view is that print will continue to be important and that the two forms of libraries – print libraries and digital libraries – are complementary rather than competitive. In this view, print materials will continue to be acquired by libraries, and digital libraries will be both acquired by them, usually in the form of optical media (i.e., CD-ROM and DVD), and accessed by them (through the online services). Printed books will continue to be acquired so as to fulfill the library’s traditional imperatives of assuring preservation of the records of the past and providing economic access to them at minimal costs to the library’s clientele. Optical media will be acquired for the same purposes, in this case of the digital records, and the online services will be used for reference services to the most current materials.

It is likely, though, that journals of whatever kind, even though some perhaps may be acquired in print form, may not be stored in print form. Already many libraries have replaced their bound volumes of popular journals, magazines, newspapers, and scholarly journals by microforms, and the transition from that to digital libraries in optical formats is almost certain.

For the various kinds of libraries, collections of whatever kind must be selected, acquired, processed, cataloged, and stored. Those all represent capital costs. The effect of digital libraries upon the capital investment in buildings, at least as far as academic libraries are concerned, is likely to be very great. In particular, journals today typically represent half of the bound volumes in academic research libraries. Replacing them by optical media would thus cut the capital cost for collection storage in half. The costs for delivery of services include those for circulation and other uses of the collection and a variety of reference and other user-oriented services.

What is the future of academic libraries and what will be the impact of digital libraries upon that future? To a great extent, the main issues in my perception of the future have already been identified in the prior discussion and most of what I will say is simply an amplification of it.

Before discussing specific allocations among formats, it is worth discussing the economic aspects of acquisition policies. Currently, acquisitions typically represent about 33% of total academic library budgets. Given an estimated annual income for academic libraries of $5 billion, that implies purchases of about $1.5 billion. In current acquisition practice, it is divided between monographs and journals in the ratio of about 34%/66% so, given that, $500 million goes to monographs and $1,000 million to periodicals. In passing, it is worth looking at the expenditures for materials by academic libraries in the context of the income to publishers. From the prior paper, that income was $15 billion for books. However, scholarly, reference, and professional books (which are the bulk of acquisitions by academic libraries) represent only 20% of the total, or $3 billion. Thus, the income from academic libraries to the publishers of scholarly, reference, and professional books would be about 20% of their total income (i.e., $0.6 billion/$3 billion). In a study I did some twenty-five years ago, examining the sales from two publishers of professional books to special and public as well as academic libraries, I found that libraries in general represented 40% of their total sales, so the figure of 20% for academic libraries is reasonably consistent with that result.
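The budget split above can be reproduced with a few lines. This is only a sketch: it takes the rounded $1.5 billion acquisitions figure from the text as given, and the function name is hypothetical.

```python
def split_budget(total: float, shares: dict[str, float]) -> dict[str, float]:
    """Allocate a total (in billions of dollars) according to fractional shares."""
    return {name: round(total * share, 2) for name, share in shares.items()}

ACQUISITIONS_BN = 1.5   # rounded acquisitions total used in the text
split = split_budget(ACQUISITIONS_BN, {"monographs": 0.34, "periodicals": 0.66})
print(split)

# Scholarly, reference, and professional books: 20% of $15 billion in book sales.
scholarly_books_bn = round(0.20 * 15, 2)
print(scholarly_books_bn)
```

The computed monograph and periodical amounts are close to the rounded $500 million and $1,000 million used in the discussion.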

In 1996, for the average ARL library these figures translate into 30K monographs acquired, at a cost of $1.74 million, and 11K serials purchased (of a total of 29K current serials acquired, the remainder being through means other than purchase) at a cost of $3.4 million.

I see no reason to expect a significant change in that distribution of expenditures between monographs and journals, but I do expect to see a significant change in the distribution in both the means of access and the format. Specifically, I will take the monograph budget as continuing much as it now is, and I will assume that it will be spent on the acquisition of print form books. But I will assume the journals budget will be distributed quite differently. The following (assuming constant dollars) are hypothetical values used to illustrate a possible pattern on which, then, other kinds of estimates can be based:

 

 

 

                                  PERCENT    DOLLARS
Purchase of Print Monographs        34%      $500 million
Purchase of Print Journals          30%      $450 million
Purchase of Digital Journals        30%      $450 million
Document Delivery of Journals        6%      $100 million

For the average ARL library that would imply a purchase of 5K titles in print form (at say $1.5 million) and of 5K titles in digital form (again at $1.5 million) with dependence on document delivery services for the remaining 1K titles (at a cost of $0.4 million). Assuming a charge of $15 per document delivery request filled, that would imply nearly 27K requests. The current level of inter-library borrowing at the average ARL library is 16K, of which half is for document delivery. So the effect of these hypothetical values would be to more than triple the use of document delivery. Given the role that Internet access would play, that is not irrational.
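The document delivery estimate can be worked through as follows. The variable names are hypothetical; the budget, charge, and borrowing figures are the illustrative values given above.

```python
DELIVERY_BUDGET = 400_000    # dollars per year, average ARL library (hypothetical value from the text)
CHARGE_PER_REQUEST = 15      # dollars per filled request (assumed in the text)
CURRENT_BORROWING = 16_000   # current inter-library borrowing requests per year

implied_requests = DELIVERY_BUDGET / CHARGE_PER_REQUEST
current_delivery = CURRENT_BORROWING / 2          # half of borrowing is document delivery
growth_factor = implied_requests / current_delivery

print(round(implied_requests), round(growth_factor, 2))
```

The implied volume is a little under 27K requests, somewhat more than three times the current document delivery level of 8K.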

One of the most significant effects of digital libraries will be on space requirements, since the digital media occupy dramatically less space than the print media. Of course, they do require equipment for their use, so already there have been substantial increases in the number of carrels with full multi-media capabilities in every academic library, but the space required for the equipment is orders of magnitude less than that for the storage. Taking the hypothetical figures in Table 3 as an illustration, future growth in space requirements would be 64% (the 34% for print monographs plus the 30% for print journals), less than 2/3, of what it would be for purely print acquisitions.

The existing print collections of academic libraries will continue to be important. While it is true that the use of library materials is highly skewed over their age, with current materials being much more heavily used than older materials, there still is a substantial amount of need for access to the older materials. In a very real sense, it is precisely this fact that makes the library important to society; by assuring the continued availability of older materials, despite a low level of use, we preserve the past for use in the future.

Of course, from the standpoint of "good business", that may make little economic sense. Why waste resources in preserving what is being little used when the resources could provide better access to the current information which is of greater immediate value? Indeed, for the individual person or company, such an investment is probably not worthwhile. But for society it is, and it is that which makes the library an essential institution. It is an investment by society at large to assure that individuals and companies will have access to information from the past without needing to make uneconomic and duplicative investments themselves. And the academic library or, more to the point perhaps, the research library is our primary means for doing so, in part because use of prior records is central to the academic functions of teaching, research, and public service.

Will print materials continue to be published and acquired? As I have said, my view is unequivocally that they will. The printed book is still a remarkably effective means for publication and distribution, and in my view it will continue to be so for the foreseeable future.

Now, I recognize that electronic technology advances rapidly. There is no doubt in my mind that an "electronic book" will be created soon, if it hasn’t already been so; indeed, there are announcements each year of the potential if not the reality of its development. Such an electronic book would be the size and weight of a printed book, even of a paperback, and could be easily held in the hand. It would contain high density digital storage, with text and images on call from potentially hundreds of books, instantly available for reading and replaceable with other collections. It would have a display with all the resolution and appearance of the printed page. It would have functionalities that will permit the user to read the text and view the images exactly as though it were a printed book, but it would have additional functionalities (such as searching, annotating, and processing) that are impossible with just the printed page. It is likely that the cost for such a unit will be such that prices can be set to generate a reasonable market. Already we see hand-held DVD players selling at well less than $1000.

Why then my unequivocal view that print publication will continue? I think that the economics of publishing, of the associated packaging decisions, of the mechanisms for distribution and sale, of the nature of the market, and of the orientation of the consumers all support my view, at least with respect to monographs (i.e., books) and popular journals.

Assuming, then, that print publication will continue, libraries will also need to continue to acquire materials in that form. But I think that the types of materials acquired in print form will increasingly be focused on monographs and popular journals.

It is interesting to note that few libraries have as yet made substantial investments in digitally formatted materials. There are, it is true, collections of CD-ROM reference databases, usually on a subscription basis, but very few collections of other kinds of materials, despite the increasing numbers of CD-ROM publications. It is especially surprising that few, if any, libraries acquire databases, software, or media in digital formats, despite the ease of use they provide.

For example, in all of the libraries I have visited, in none have I seen a collection of CD-ROM shareware packages, yet they would have great value to users. In few libraries have I seen significant numbers of databases, beyond the U.S. Census, in CD-ROM format, yet again they would be of great value to users.

I think that the current situation will change dramatically during the coming five years. First, the U.S. Census for 2000 is likely to be distributed primarily in electronic format, and that will add great impetus to distribution of other databases in similar form. Second, there are rapidly increasing numbers of packages based on computer-mediated instruction, and academic libraries will of necessity need to acquire them. Third, software distribution must be in digital form, and libraries should at least consider acquisition of packages for use by students and faculty.

Of course, the significant acquisition in digital form is expected to be of scholarly journals. Here, there are significant advantages to the digital formats, optical and online, to both the publishers and the users. For the publisher, the costs of printing and mailing are dramatically reduced. For the user, the journal issue is replaced by access to the journal article of specific interest or to tailored combinations of articles that match the needs of the individual user; these fit the real patterns of use far better than the current means for packaging and distribution.

If we take the hypothetical figures given above as a basis for estimate, there would be purchases by academic libraries of as much as $450 million of journals in digital format.

While preservation of the record of the past may be the primary imperative for the library, providing access to that record and assistance in its use is of virtually equal importance. These are the services provided by the library. The traditional services include circulation, in-house use, photocopying, and similar uses of the materials. They include reference, inter-library borrowing and lending, access to means for document delivery, consulting and information analysis, assessment of quality and value and selection of the most appropriate information resources, instruction and guidance in the use of them.

The effect of digital libraries, and especially of online access to them, will surely be a shifting of the workloads on these services from those involved in use of materials to those involved in substantive aspects. Already academic libraries have expanded their roles in providing substantive information services, in particular in bibliographic instruction. As part of that, though, the need for assessment of quality, of reliability, of accuracy is becoming even more important. The Internet and World Wide Web are overwhelming in the sheer magnitude of information available through them, and the user must carefully assess those issues of quality. Even in the context of journalism, despite its historic commitment to assurance of accuracy, there have recently been too many instances in which information was taken from the Internet as though it were valid and yet turned out to be false or even fraudulent. Therefore, perhaps the most significant service that the library can provide is that in selection and quality control, in assessment of accuracy, reliability, and value.

Another service that is becoming increasingly important as a result of digital libraries is more technical perhaps but still vital. It is the creation of what is being called "meta-data" (i.e., "data describing data"). Historically, it is represented by cataloging, by indexing and abstracting, by bibliography, by reference databases. Today, we see need for this service for digital libraries, again as exemplified in the overwhelming magnitude of what is available on the Internet.

One of the great potentials for a dramatic increase in the role of academic libraries is the production of digital libraries. Already there are a number of cooperative efforts to create state and regional "digital libraries". Just to consider the one with which I am most familiar, in 1997, the University of California announced the founding of the "California Digital Library … a service that will make it possible to bring to computer screens statewide the holdings of UC libraries and others throughout the world". In the announcement, it was stated that, "UC libraries have already taken major steps towards digitizing parts of their collections, some of which are already available over the World Wide Web. For example, UC Berkeley had digitized much of the Bancroft Library's historical photos of California, UC Santa Barbara has put its vast map collection on the Web and the California Museum of Photography at UC Riverside also displays its rich store of photographs on-line."

Clearly, the existing print collections of academic libraries – books and especially rare books, journals, manuscripts, maps, photographs, and the rich array of other special collections – are a primary source for materials needed to create the vision of broad-scale digital libraries of the kind envisioned. Whether they are distributed online, as the description of the California digital library clearly implies, or in optical formats is not a significant issue. The important point is that academic libraries have a vital role to play in the future of digital libraries, beyond merely acquiring them and providing access to them. They will be the sources for much of what goes into them.

 

                          30 YEARS AGO    CURRENT    10 YEARS FROM NOW
Central Administration    5%              7%         9%
Reader Services           47%             56%        65%
Selection                 24%             23%        18%
Cataloging                24%             14%        8%

I turn now to the likely future for library staffing, both for internal, technical services – the processing of materials acquired by academic libraries – and for services to the readers. Even in the current context, significant changes have occurred as a result of automation. Thirty years ago, the staff in technical services and reader services were virtually equal in numbers and, in technical services, those in acquisition and in cataloging were also virtually equal. To put it most simply, the division of operating staff was about 50% in reader services, 25% in selection and acquisition, and 25% in cataloging. Today, the division is about 60% in reader services, 25% in selection, and 15% in cataloging. The underlying cause, of course, is the growth of the bibliographic utilities and the resulting replacement of original cataloging by copy cataloging. In some libraries, the shift of staff was effected by retraining and reorientation; in others, by simply allowing retirement to do so.

A second significant change even today has been in the staff for central administration, again as a result of automation. It is in the addition of "systems staff", required to manage the computer-related equipment and software.

I think that those trends will continue as a result of digital libraries. There will be a continuing shift of staff from technical services into reader services and increasing requirements of systems staff in central administration. Shown above is a hypothetical distribution, say in a decade, compared with that of today and of 30 years ago.

The Internet And The World Wide Web. Turning now to that remarkable phenomenon, the Internet and the World Wide Web, as the means for online distribution, not only are production costs of digital libraries dramatically reduced, but distribution costs are possibly even more so. The cost of physical transport, whether of books or CD-ROMs, is replaced by the cost of online storage and of digital transmission. Mailing a book might cost $2.00; mailing a CD-ROM, the equivalent of say 20 to 30 books, perhaps a dollar; sending the text of the book at 28K baud, perhaps 3 minutes of connect time at zero cost through almost any of the online services.
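As a rough check on the "3 minutes of connect time" figure, the sketch below works out how much text such a connection would carry. It assumes that "28K baud" means 28,000 bits per second with no protocol overhead or compression, and it uses a 300-page book, the length used later in this paper; all names are illustrative.

```python
# Assumptions (illustrative): "28K baud" is read as 28,000 bits per second,
# with no allowance for protocol overhead or compression.
LINE_SPEED_BITS_PER_SECOND = 28_000
CONNECT_TIME_SECONDS = 3 * 60
PAGES_PER_BOOK = 300  # book length assumed for this illustration


def bytes_transferred(bits_per_second, seconds):
    """Bytes moved over a line at a steady bit rate (8 bits per byte)."""
    return bits_per_second * seconds // 8


book_bytes = bytes_transferred(LINE_SPEED_BITS_PER_SECOND, CONNECT_TIME_SECONDS)
bytes_per_page = book_bytes // PAGES_PER_BOOK

print(f"Text sent in 3 minutes: {book_bytes:,} bytes")
print(f"Implied text per page:  {bytes_per_page:,} bytes")
```

Under those assumptions the three minutes correspond to a few hundred kilobytes of plain text, which is the right order of magnitude for a book without images.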

Unfortunately, assessment of Internet costs clearly is very complex. Costs occur at several stages in the chain of distribution, and they are funded by mixtures of public, institutional, advertising, and individual funding. Many of the costs are subsidized and thus buried in other accounting categories. The costs for network access are largely independent of actual use, being represented by connection charges that are a function of delivered bandwidth, reflecting anticipated demand. The rate of growth of the Internet and the World Wide Web is so great that it is virtually impossible to obtain data that will be consistent; data on one component of operations, reported at one point in time, cannot be compared with data for another component, reported at another point in time.

Complicating the assessment of costs in distribution on the Internet is the fact that many elements of costs are borne by the users instead of by the producers and/or distributors. In particular, the cost of printing is totally borne by the user, and it is by no means negligible. And if binding is required, the costs are two or three times greater than those experienced by the publisher. Beyond that are the very real costs in users’ time in all of the processes of acquisition, downloading, and managing of the digital files. Even in cases where the Internet is used as a means for ordering rather than downloading, many of the distribution costs are shifted to the user. An example is "shipping and handling costs", a set of costs that users apparently have become either oblivious to or willing to pay; the charges for them are now at a level that represents NOT recovery of costs but a source of profit. Presumably, shifting of costs to users represents a decrease in the costs of distribution as far as publishers and/or intermediaries are concerned and therefore either decreased prices to the users or increased profits to the vendors.

The most recent issue (1997) of the Statistical Abstract of the United States reported data for 1995 as follows. (In the following text and tables, K stands for "thousand", M for "million", B for "billion", and G for "trillion".) First, there is the underlying Telephone Infrastructure. The capital investment represented by this structure is reported to have a book value of $284B. Of course, that represents the several centers and the communication lines connecting them (a combination of classical telephone lines, co-axial cables, microwave transmitters, optical lines, and satellites). AT&T claims that 90% of its lines are optical fiber.

CENTERS                 NUMBER    EMPLOY    INCOME      EXPEND     CAPITAL    ’98 USAGE
Principal Providers               553K      $117.00B    $88.00B    $29.00B    22B mins
Other Providers                             $79.00B     $69.00B    $10.00B
Total                                       $196.00B

AT&T Structure
1  Regional Centers     10
2  Sectional Centers    52
3  Primary Centers      168
4  Toll Centers         933
5  End User Services    18803

An interesting fact is that the percentage of the total income represented by Network Access has, despite the phenomenal growth of the Internet, been essentially stable; it has even slightly but steadily declined during the years from 1990 through 1995. The percentage from the basic services has in fact increased, and the growth of cellular phones has been even more spectacular. Aside from that, though, the costs in Network Access are almost independent of growth of the Internet, given the fact that they are based on connection charges rather than usage charges.

TYPE OF SERVICE               USERS    AVERAGE    TOTAL
Local Service, Residential    101M     $19.54     $19.73B
Local Service, Business       46M      $41.77     $19.21B
Long-distance Service                             $81.67B
Network Access                                    $34.96B
Cellular Phones                                   $21.04B
Other Income Sources                              $32.35B

Total                                             $208.96B

Let’s now examine Network Access in more detail. On top of the telephone infrastructure is that of the Internet (the data for numbers being as reported from Internet sources for 1998, and the financial data being estimated as will be discussed):

 

Level of Access        Number    Employees    Expend     Access     Operation    Capital

Backbones              39        1K           $35.00B               $26.00B      $9.00B

Domain Servers         850K      235K         $42.50B               $23.50B      $19.00B
Service Providers      13M       130K         $19.60B               $14.70B      $4.90B
Hosts                  26M
Institutional Users    70M                               $30.34B
Individual Users       70M                               $4.66B                  $2.80B
Total                                         B          B          B            B

The figures for Capital represent the amortization of investment in computers (hardware and software), with 5 years as the assumed period for amortization. As will be discussed later, the investment in network-related computing is taken at 20% of the total, so the total shown in the table above implies that the total yearly expenditure for computer hardware and software would be $179B. That is reasonably consistent with the reported sales of $131B for computer hardware alone in 1995 (Statistical Abstract of the United States, 1997).
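The $179B figure can be reproduced from the Capital entries for the individual levels. The sketch below is illustrative only: the names are hypothetical labels, and reading the $2.80B on the Individual Users row as a Capital entry is an assumption about how the table's blank cells line up.

```python
# Amortized yearly capital figures from the table above, in billions of dollars.
# Treating the $2.80B on the Individual Users row as Capital is an assumption
# about how the table's blank cells line up.
CAPITAL_BY_LEVEL = {
    "Backbones": 9.00,
    "Domain Servers": 19.00,
    "Service Providers": 4.90,
    "Individual Users": 2.80,
}
NETWORK_SHARE = 0.20  # network-related share of all computing investment


def implied_total(network_capital, share):
    """Scale the network-related portion up to the whole it is a share of."""
    return network_capital / share


network_capital = sum(CAPITAL_BY_LEVEL.values())
total_computing = implied_total(network_capital, NETWORK_SHARE)

print(f"Network-related capital: ${network_capital:.1f}B")
print(f"Implied total computing: ${total_computing:.1f}B")
```

The result depends directly on the 20% share; a smaller share would imply a proportionally larger total.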

The Backbones are those organizations that provide the primary means for distribution within the Internet; they include CompuServe, MCI, DIGEX, IBM, AT&T, Sprint, etc. The Domain Servers (which might be called the Primary Access Agencies) are the central agencies, serving both the Service Providers and the Hosts (i.e., computers accessing the Internet) and, through them, the Users. They provide the connections to the Internet, the local processing, the data storage facilities. They include some domains identified as ".net" (such as "ans.net", "uu.net", etc.) that are also Backbones, as well as some identified as ".com" (such as aol.com) that provide generic access services. The great majority, though, are institutions, including especially corporations that are among the ".com" domains, and are intended primarily to serve their institutional objectives, including services to the public.

The numbers shown in the following table are for US data and are based on interpretation of the Internet Domain Survey Data for July 1998.

PRIMARY DOMAINS         Hosts    Level 2    Level 3    Level 4

WORLD-WIDE TOTAL        36.7M    1.05M      13.1M      23.6M

USA TOTAL               25.7M    0.9M       10.1M      14.8M

com -- US Commercial    10.3M    0.7M       5.3M       4.3M
net -- US Networks      7.0M                2.4M       4.6M
edu -- US Academic      4.5M                1.5M       3.0M
us -- US Generic        1.4M                0.1M       1.3M
mil -- US Military      1.3M                           1.3M
org -- US Non-Profit    0.6M                0.6M
gov -- US Government    0.6M                0.2M       0.4M

Other Countries         11.0M    0.2M       2.9M       7.9M

The spectacular rate of growth of the Internet is well exhibited by comparing those data, for July 1998, with data for July 1999:

PRIMARY DOMAINS         Hosts    Level 2    Level 3    Level 4

WORLD-WIDE TOTAL        43.2M    1.3M       14.9M      27.0M

USA TOTAL               30.8M    0.9M       10.8M      19.1M

com -- US Commercial    12.1M    0.9M       6.0M       5.2M
net -- US Networks      8.9M     0.1M       2.9M       5.9M
edu -- US Academic      5.0M                1.7M       3.3M
us -- US Generic        1.6M                           1.6M
mil -- US Military      1.5M                           1.5M
org -- US Non-Profit    0.7M     0.1M       0.6M       0.2M
gov -- US Government    0.7M                0.2M       0.5M

Other Countries         12.4M    0.4M       4.1M       7.9M

It is relevant to note that while the numbers of hosts and of Level 2 and Level 3 installations in other countries have increased by between 10% and 100%, the number of Level 4 installations appears to have remained constant. Of course, the impact in this respect varies greatly from country to country. Brazil, in particular, rose from a ranking among countries of 18th in 1998 to 16th in 1999. Interestingly, though, Taiwan, which is much smaller in population, rose from 23rd in 1998 to 12th in 1999, directly reflecting a national commitment of resources to encouragement of Internet use.
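Those percentages follow directly from the "Other Countries" rows of the two tables. The following sketch, with illustrative names and the table values in millions, shows the calculation.

```python
# "Other Countries" rows of the July 1998 and July 1999 tables, in millions.
OTHER_COUNTRIES = {
    "Hosts":   (11.0, 12.4),
    "Level 2": (0.2, 0.4),
    "Level 3": (2.9, 4.1),
    "Level 4": (7.9, 7.9),
}


def percent_change(old, new):
    """Growth from old to new, expressed as a percentage of old."""
    return (new - old) / old * 100


changes = {
    name: percent_change(y1998, y1999)
    for name, (y1998, y1999) in OTHER_COUNTRIES.items()
}

for name, change in changes.items():
    print(f"{name}: {change:+.0f}%")
```

Because the table entries are rounded to one decimal place, the small Level 2 counts in particular give only a coarse estimate of growth.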

In the U.S., the number of Level 4 installations has increased by nearly 30% (from 14.8M to 19.1M). This reflects the extent to which personal use in the U.S. has so dramatically increased. The 1998 Statistical Abstract of the United States reported that the number of persons with Internet access was as follows:

 

                        HOME OR WORK    HOME ONLY    WORK ONLY
HAVE ACCESS             46.3M           23.8M        22.9M
USED IN LAST 30 DAYS    28.1M           19.8M        13.9M

The number of corporate users is taken at 70M (the sum of those having access at work), and the number of individual users at 70M (the sum of those having access at home); while that clearly double-counts persons, it does not double-count access points. The rationale is that the issue in assessment of economics is the access, not the actual use. The crucial point is that both places for access – at work and at home – must be funded; that at work by the institution and that at home by the individual.

In the table above, the costs for the corporate users are assumed to be included among those for the corporate components of the Internet structure. As a start to estimation of those costs the Level 2 domains are taken to be the Domain Servers (what are called "registered domains"). They include 742K sites whose primary domain is ".com", among which "aol.com" is the largest with over 9M customers in early 1998 and 13M in August 1998, but the overwhelming majority are corporations serving their own internal operations; 3.5K sites are ".edu" (among which the universities collectively probably serve the largest numbers of individuals); 64K sites are ".org"; and 0.4K sites are ".gov". It is the corporate ".com" institutions that really are funding the operation.

Given the 850K Domain Servers, the revenue of $35B from them to the Backbones implies an average cost of $41K per Domain Server. In July 1998, the average "quoted price" for a T-1 connection (which provides 1.5 Mbps) was between $20K and $25K per annum (varying from $12K to $36K), but there are a variety of discounts and add-ons from the "quoted price". The number of T-1 servers, or equivalent larger bandwidth options, required at any given Domain Server is a function of the expected traffic.
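The average cost per Domain Server, and what it would buy at the quoted T-1 prices, can be sketched as below. The figures are those given in the text; the names are illustrative, and the conversion into "T-1 equivalents" ignores the discounts and add-ons just mentioned.

```python
# Figures from the text above, in dollars per year.
BACKBONE_REVENUE = 35_000_000_000
DOMAIN_SERVERS = 850_000
T1_QUOTED_LOW = 20_000    # lower end of the average quoted T-1 price
T1_QUOTED_HIGH = 25_000   # upper end of the average quoted T-1 price

average_cost = BACKBONE_REVENUE / DOMAIN_SERVERS

# How many T-1 connections the average payment would cover (an illustration
# that ignores discounts and add-ons).
t1_at_high_price = average_cost / T1_QUOTED_HIGH
t1_at_low_price = average_cost / T1_QUOTED_LOW

print(f"Average cost per Domain Server: ${average_cost:,.0f}")
print(f"T-1 equivalents: {t1_at_high_price:.1f} to {t1_at_low_price:.1f}")
```

On these numbers the average Domain Server pays for roughly two T-1 connections, though the distribution across servers is surely far from uniform.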

Commercial services, such as "aol.com", that are Domain Servers presumably are funded by the usage fees paid by their users. In the case of "aol.com", the fees are about $252 per year (based on $21 per month), but the average payment apparently is closer to $133 per year, the difference presumably reflecting the large number of "free" subscriptions and other promotions. In any event, "aol.com" appears to be profitable at that level of income, and it is likely that the other commercial Domain Servers are also.

The other Domain Servers – those serving industries, educational institutions, non-profit corporations, and government – are heavily if not totally subsidized by the institutions to which they relate. In the table above, it is assumed that the Expenses ($42.5B) and Access costs ($30.34B) shown for those Domain Servers represent whatever may be the source of funding, whether user fees or institutional subsidy.

To put that value in perspective, let’s look at the 200 or so major universities and colleges. The yearly expenditures for libraries in those 200 or so institutions totaled about $3B ($2.5B for ARL and $0.5B for ACRL); while they are less than 10% of the ".edu" sites, unquestionably they are the major ones and probably represent at least 90% of the total Internet activity. There are no similarly consolidated data for expenditures in computing at those institutions, so the totals for them will arbitrarily be taken as twice those for libraries. It will be assumed that they are divided 80% in centralized computing and 20% in decentralized. Therefore, the ".edu" expenditures for computing are $7B. The institutional expenditures for higher education total $200B, so expenditures for computing (hardware, software, staff, and overhead) would be about 3.5% of total expenditures. If we assume a similar pattern for commercial organizations, and take the commercial total at $14G, we would have ".com" expenditures for computing at a total of $490B. The estimated $72.8B for Internet activities at the Domain Server level would thus be 15% of the total. Is that reasonable?

Let’s try this: Take Domain Servers as part of centralized computing at all of these institutions, and the Service Providers and Hosts as part of decentralized. The result is that Domain Server activities represent 20% of centralized computing – i.e., $77.5B/(.80*$490B) = .20. That’s a reasonable start, since a qualitative impression is that Internet access is only a small part of the total computing requirements.
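The chain of estimates in the last two paragraphs can be laid out step by step. The sketch below simply repeats the figures as stated in the text, in billions of dollars (including the two different Domain Server amounts, $72.8B and $77.5B, used in the two calculations); the names are illustrative.

```python
# All amounts in billions of dollars per year, as stated in the text.
EDU_COMPUTING = 7.0            # ".edu" expenditures for computing
HIGHER_ED_TOTAL = 200.0        # institutional expenditures for higher education
COMMERCIAL_TOTAL = 14_000.0    # commercial total ($14G, i.e. $14 trillion)
DOMAIN_SERVER_ACTIVITY = 72.8  # amount used in the first calculation
DOMAIN_SERVER_AMOUNT_2 = 77.5  # amount used in the second calculation
CENTRALIZED_SHARE = 0.80       # share of computing that is centralized

computing_rate = EDU_COMPUTING / HIGHER_ED_TOTAL
com_computing = computing_rate * COMMERCIAL_TOTAL
share_of_all_computing = DOMAIN_SERVER_ACTIVITY / com_computing
share_of_centralized = DOMAIN_SERVER_AMOUNT_2 / (CENTRALIZED_SHARE * com_computing)

print(f"Computing as share of expenditures: {computing_rate:.1%}")
print(f'".com" computing expenditures:      ${com_computing:.0f}B')
print(f"Domain Servers vs all computing:    {share_of_all_computing:.0%}")
print(f"Domain Servers vs centralized:      {share_of_centralized:.0%}")
```

Every step rests on the assumption that commercial organizations spend on computing at the same rate as higher education, so the result should be read as an order-of-magnitude check only.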

The Level 3 domains are taken to be the Service Providers, the points from which delivery of services, such as access to digital libraries, are obtained. Level 3 and Level 4 are taken as the users of those services (with the Level 3 sites being both users as well as sources and thus being double counted).

Let’s then assign a similar portion (i.e., 20%) of the decentralized budgets to Service Providers. That gives us a total budget of $19.6B (i.e., $490B × .20 × .20 = $19.6B) for Service Providers. Given 13M domains at Level 3, we have a cost of $1507 per Service Provider. Assume that’s divided 75% operations and 25% capital. That would give $1130 for operations and $376 for amortized hardware and software (representing an investment of say $2K). To illustrate, the charges at one such Service Provider (CELCEE, at UCLA) are about $129/month (for up to 25MB storage). That’s $1548 per year, which is close to the picture given above. Assuming that the $1130 for operations is an allocation of staff time and overhead, a Service Provider represents about 1% of an FTE. The 13M Service Providers thus represent 130K persons.
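The per-provider figures in that paragraph can be reproduced as follows. This is a sketch with illustrative names; the 5-year amortization period is the one assumed earlier for computer hardware and software, and small differences from the text are due to rounding.

```python
# Figures from the text above (dollars per year unless noted).
COM_COMPUTING = 490e9          # ".com" expenditures for computing
DECENTRALIZED_SHARE = 0.20     # decentralized share of computing
INTERNET_SHARE = 0.20          # portion assigned to Service Providers
SERVICE_PROVIDERS = 13e6       # Level 3 domains
OPERATIONS_FRACTION = 0.75     # remainder (25%) is amortized capital
AMORTIZATION_YEARS = 5         # amortization period assumed earlier in the paper
FTE_FRACTION = 0.01            # staff allocation per Service Provider
CELCEE_MONTHLY_CHARGE = 129    # charge at the example Service Provider

budget = COM_COMPUTING * DECENTRALIZED_SHARE * INTERNET_SHARE
per_provider = budget / SERVICE_PROVIDERS
operations = per_provider * OPERATIONS_FRACTION
capital = per_provider - operations
investment = capital * AMORTIZATION_YEARS
staff = SERVICE_PROVIDERS * FTE_FRACTION
celcee_yearly = CELCEE_MONTHLY_CHARGE * 12

print(f"Total budget:          ${budget / 1e9:.1f}B")
print(f"Per Service Provider:  ${per_provider:,.0f}")
print(f"Operations / capital:  ${operations:,.0f} / ${capital:,.0f}")
print(f"Implied investment:    ${investment:,.0f}")
print(f"Staff represented:     {staff:,.0f} persons")
print(f"CELCEE yearly charge:  ${celcee_yearly:,}")
```

The comparison with the CELCEE charge is a single data point, so it supports the order of magnitude rather than the precise split.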

Now, clearly the amount for staff must be a mere allocation of time, and the amount of hardware and software is a mere allocation of a large facility. We might assume that there must be an intermediate level of aggregation, with say 10 service providers at an aggregation facility, on the average. That may be illustrated by the "aol.com" statistics, which show 1M hosts covering 9-13M customers; in other words, there is a pattern of 10/1 aggregation.

Thus, this model would assume that a Service Provider is simply a small part either of a group of Service Providers or of a much larger activity (for example, the producer of the digital library). That makes sense. To illustrate the former, the Service Provider is served by one of the Domain Servers (along with many others); for the latter, the cost of the Service Provider is really small in the total operation which includes the labor to create the digital library. A ten-fold aggregated facility would have a budget of $15K, which would cover operating costs of $11.3K and amortized capital of $3.8K. The capital costs imply hardware and software costing about $20K, which is a reasonably sized facility. The operating costs of $11.3K must cover costs for connection, staff, overhead, and other operating expenses.

Taking "aol.com" as an example of a Domain Server, it has 1M Level 2 and Level 3 domains that it serves. It has a total budget of $1.2B. Taking the figure of $1.5K as the nominal Service Provider, "aol.com" would fund 800K such unit servers. They have 9M-13M customers (at $133/annum), so that’s 11 users per unit server. (Actually, they have 1M unit servers, with 9 users per unit server.)
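The "aol.com" comparison can be checked in a few lines. The sketch below uses the lower customer count of 9M and otherwise the figures stated in the text; the names are illustrative.

```python
# Figures for "aol.com" as stated in the text (dollars per year unless noted).
AOL_BUDGET = 1_200_000_000
NOMINAL_PROVIDER_COST = 1_500      # nominal yearly cost of one Service Provider
CUSTOMERS = 9_000_000              # lower end of the 9M-13M range
AVERAGE_PAYMENT = 133              # average yearly payment per customer
REPORTED_UNIT_SERVERS = 1_000_000  # unit servers actually reported

funded_unit_servers = AOL_BUDGET // NOMINAL_PROVIDER_COST
customer_income = CUSTOMERS * AVERAGE_PAYMENT
users_per_funded_server = CUSTOMERS / funded_unit_servers
users_per_reported_server = CUSTOMERS / REPORTED_UNIT_SERVERS

print(f"Unit servers the budget would fund: {funded_unit_servers:,}")
print(f"Income from customers:              ${customer_income:,}")
print(f"Users per funded unit server:       {users_per_funded_server:.2f}")
print(f"Users per reported unit server:     {users_per_reported_server:.0f}")
```

At 9M customers the income from fees is very close to the $1.2B budget, which is what makes the nominal $1.5K figure look plausible.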

The hosts are the computers connected to the Internet network, either permanently or by dial-up, and they are the focal points for the users to obtain delivery of access to digital libraries (as well as the other products and services of the Internet and the WWW). Users are the ultimate users of the services of the Internet, including use of digital libraries.

Now, given the fact that the costs of access to and delivery of data through the Internet are essentially independent of the actual use (all being funded by the connection charges for the Domain Servers), the only real costs incurred in providing access to a digital library are those in storage of the data. The available data from commercial providers of Domain Servers show yearly costs of $6.00 per MB (megabyte).

In the context of digital libraries, if we take a Service Provider as essentially a typical digital library consisting of 25MB, the yearly storage cost is $150, a negligible amount compared to the capital cost it represents, but still not trivial, and virtually half of the capital costs for a typical Service Provider. In any event, the yearly cost for making that digital library available is $1500 – the total yearly cost of the Service Provider.
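For reference, the storage figure and its relation to the other Service Provider costs can be sketched as below. The $376 capital figure is the amortized amount estimated earlier; the names are illustrative.

```python
# Figures from the text (dollars per year unless noted).
STORAGE_RATE_PER_MB = 6.00     # yearly storage cost per megabyte
LIBRARY_SIZE_MB = 25           # size of the representative digital library
PROVIDER_YEARLY_COST = 1500    # total yearly cost of the Service Provider
PROVIDER_CAPITAL = 376         # amortized capital estimated earlier

storage_cost = STORAGE_RATE_PER_MB * LIBRARY_SIZE_MB
share_of_yearly_cost = storage_cost / PROVIDER_YEARLY_COST
share_of_capital = storage_cost / PROVIDER_CAPITAL

print(f"Yearly storage cost:       ${storage_cost:.0f}")
print(f"Share of total yearly cost: {share_of_yearly_cost:.0%}")
print(f"Share of amortized capital: {share_of_capital:.0%}")
```

Storage scales linearly with the size of the library in this model, so a library ten times larger would make storage the dominant yearly cost.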

The Income from Digital Libraries to the Producers

Given the investment and operational costs involved in digital libraries, their production and distribution, there must be a sufficient market at a sufficient price to warrant investment. What is that market and what are the appropriate pricing decisions?

Consider a digital library (for example, one consisting of 20 books each of 300 pages). Assume that the storage requirements total 25MB, which could be the basis for a marketable CD-ROM package or the file for a Service Provider. The cost for production and delivery of a CD-ROM will be taken as $4.00; the cost for the Service Provider at $1600 per annum.

Assessment of the two means for distribution, in comparison with each other and with print, depends upon many pricing decisions: What price will be set for each means for distribution? What will be the units of sale? Who will be responsible for distribution of each product type? How will the discount to the distributor be handled? What will be the basis for royalty payment? How will the fixed costs in capital investment be allocated across the different product types?

Books And Popular Journals. Let’s look at these issues with respect to books and popular journals. On the issue of price, the decision for printed books has a well established basis of sales data that clearly shows what the market is willing to pay; a price of around $30 for a 300-page book is typical, and a significant departure from it would need real justification. That for CD-ROMs seems to be very consistent, with the price for most of them averaging $30 per CD-ROM. With respect to online distribution, the picture is by no means clear. In fact, a very high proportion of material on the Internet is available at no charge.

The unit of sale for printed books is obviously the individual book. That for CD-ROM, the CD-ROM, each containing perhaps 20 to 30 books. For online delivery, it is unlikely that it will be used for printing out a full book; instead it is likely to be just portions of a chapter – say on the order of 10 pages at a time.

Turning to responsibility for distribution and associated discounts, it seems likely that the current patterns will continue, much as they now are, for both print and CD-ROM formats. But for the online version, there would seem to be no reason for the publisher not to take on that responsibility, removing the need for a discount and transferring those costs either into income or price reduction.

The basis for royalty payment is unclear, but surely is a matter for negotiation between the author and the publisher.

With respect to allocation of capital investment, for books a possible strategy would be to adopt a phased entry of products, with the first distribution restricted to the print version and then, once print sales have peaked, the two digital versions being launched in parallel. Given that strategy and the likely realities of the prices that markets will bear, it would make sense to have the print version carry the great burden of capital costs and the digital versions only a token amount.

Scholarly Journals. If the patterns develop as expected, the likelihood is that scholarly journals will be increasingly distributed not in print but as digital libraries. The primary unit of sales would continue to be the subscription. For individuals, membership in a society publishing a scholarly journal would include a subscription, as it now does, but with the journal delivered either in CD-ROM form, online form, or both, and with essentially unlimited access to the online version. For libraries, a subscription to a journal of any kind would be paid as it now is; again, the journal would be delivered either in CD-ROM form, online form, or both, and with essentially unlimited access to the online version. The subscription price would therefore need to bear all of the costs – in capital investment, in production and distribution, in subscription fulfillment, etc. Of course, the purpose of current experiments, jointly between academic research libraries and commercial publishers, is to explore the financial implications of digital libraries.

Having said that, for the online digital library of a journal, the unit of delivery, as distinguished from unit of sale, is likely to be the individual article rather than the journal issue or volume. That means that there is at least the potential of sales of individual articles beyond those covered by the subscriptions. This is clearly represented by the various "document delivery services" which market exactly that kind of distribution, at prices per article on the order of $10 to $20 each. In this respect, they indeed serve as distributors and their price clearly reflects an effective "discount" rate of 40%.

Other Forms of Publication. For the other forms of publication – databases, software, multimedia, etc. – similar pricing decisions must be made, but the basis for assessment of them and of their effects is even more uncertain.

Summary Picture. The following might well be the resulting price structure for the representative digital library of 25MB:

ELEMENT OF COST        Print Product    CD-ROM Product    Online Product
Royalty                      $60.00             $3.00             $1.00
Capital Invest              $120.00             $7.00             $4.00
Marketing                    $80.00             $4.00             $1.00
Discount                    $240.00            $12.00             $4.00
Produce, Distribute         $100.00             $4.00
TOTAL                       $600.00            $30.00            $10.00
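As a check on the summary price structure, the following minimal Python sketch sums the cost elements for each product and computes the share of the price taken by the distributor's discount. The dictionary layout and function names are my own, chosen only for illustration; the dollar figures are those of the table, and the online product is given no "Produce, Distribute" entry because the table shows none.

```python
# Per-sale cost elements from the summary price structure (dollars).
# The online product has no "Produce, Distribute" entry in the table.
costs = {
    "Print": {
        "Royalty": 60.00, "Capital Invest": 120.00, "Marketing": 80.00,
        "Discount": 240.00, "Produce, Distribute": 100.00,
    },
    "CD-ROM": {
        "Royalty": 3.00, "Capital Invest": 7.00, "Marketing": 4.00,
        "Discount": 12.00, "Produce, Distribute": 4.00,
    },
    "Online": {
        "Royalty": 1.00, "Capital Invest": 4.00, "Marketing": 1.00,
        "Discount": 4.00,
    },
}


def total_price(product):
    """Sum of all cost elements for one product."""
    return sum(costs[product].values())


def discount_share(product):
    """Fraction of the total price that goes to the distributor."""
    return costs[product]["Discount"] / total_price(product)


for product in costs:
    print(f"{product}: total ${total_price(product):.2f}, "
          f"discount share {discount_share(product):.0%}")
```

Under these figures the discount is the same fraction of the price for all three products, which matches the effective "discount" rate of 40% cited for the document delivery services.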

For the online product, in that table it has been assumed that the sales from the digital library are small portions of the file – say on the order of 0.1MB. That might be an article from a journal, 10 pages from a book, selected records from a database, or a shareware program. For the online version, it has been assumed that the Service Provider would continue to be an independent distributor, as illustrated by the example of the document delivery services, and therefore require the same kind of discount as the other means for distribution; the $4.00 per sale shown for the Discount would therefore require at least 400 sales to cover the costs of operation. However, it would seem very reasonable for the publisher to be the Service Provider, with the yearly cost of operations included in the capital investment and the discount becoming income.

A typical Service Provider site might experience between 15K and 60K hits per month – a hit being an Internet call, either by direct connection to the URL or by hyper-link referral from another site. What percentage of hits results in a download? Based on experience at one such Service Provider, perhaps 10%, but that involves no charges. If there is a charge, it might be about 1%. That would generate income from 300 uses per year of the database. At $4.00 discount per use, that generates $1200 on the expenditure of $1500, hardly a world-shattering profit picture. And indeed, there are few Service Providers today. Of course, there are some spectacular successes, like "amazon.com", but even "amazon.com" is not yet profitable.
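The arithmetic behind that profit picture can be sketched in a few lines of Python. The variable and function names are hypothetical; the figures ($1500 yearly expenditure, $4.00 discount per use, 300 paid uses per year) are those quoted above.

```python
import math

ANNUAL_OPERATING_COST = 1500.00  # dollars per year, as quoted in the text
DISCOUNT_PER_SALE = 4.00         # dollars, from the Online column of the table
PAID_USES_PER_YEAR = 300         # the text's estimate when a charge applies


def break_even_sales(cost, margin_per_sale):
    """Smallest whole number of sales whose margin covers the cost."""
    return math.ceil(cost / margin_per_sale)


def annual_net(uses, margin_per_sale, cost):
    """Income from the discount minus the yearly cost of operation."""
    return uses * margin_per_sale - cost


print(break_even_sales(ANNUAL_OPERATING_COST, DISCOUNT_PER_SALE))
print(annual_net(PAID_USES_PER_YEAR, DISCOUNT_PER_SALE, ANNUAL_OPERATING_COST))
```

The exact break-even count is somewhat below the rounded figure of "at least 400 sales" used earlier, and at 300 paid uses the Service Provider falls short of covering its yearly expenditure.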

As has been mentioned earlier, one of the characteristics of digital libraries is that they shift many of the costs from the producers and distributors to the users, including in particular the costs for printing and binding. The problem in making comparison of the digital means with print is how to deal with that fact in assessing total costs, and there is no simple answer. The extreme would be if the user were to produce a set of printed and bound volumes for the entire contents of a digital library. In the example given above, that would mean printing and binding 20 books, for which the cost to the user would be at least $200 (i.e., $10.00 per book, assuming costs of $0.03 per page for printing plus $1.00 per volume for binding) and possibly as much as $300. And that doesn’t even consider the costs of personal time in managing the process. For the CD-ROM as source, the total cost would thus be from $230 to $330. For the online source, it would be almost prohibitively expensive, if the charge of $10 for 10 pages applied, so a different pricing structure would be needed. On the surface of it, though, use of online digital libraries as sources for such a process of local printing simply does not make sense.
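To make the comparison concrete, here is a rough Python sketch of the local printing costs. It assumes 300 pages per book, a figure that is not stated above but is implied by $10.00 per book at $0.03 per page plus $1.00 for binding; the constant and function names are illustrative only.

```python
BOOKS = 20
PAGES_PER_BOOK = 300               # assumption: implied by $10.00 per book
PRINT_COST_PER_PAGE = 0.03         # dollars
BINDING_COST_PER_VOLUME = 1.00     # dollars
CD_ROM_PRICE = 30.00               # dollars, from the summary price table
ONLINE_CHARGE_PER_10_PAGES = 10.00 # dollars, the per-sale online price


def print_and_bind_cost(books, pages_per_book):
    """User's cost to print and bind every volume locally."""
    per_book = pages_per_book * PRINT_COST_PER_PAGE + BINDING_COST_PER_VOLUME
    return books * per_book


def online_download_cost(books, pages_per_book):
    """Charges for retrieving every page online, 10 pages per sale."""
    total_pages = books * pages_per_book
    return total_pages / 10 * ONLINE_CHARGE_PER_10_PAGES


local = print_and_bind_cost(BOOKS, PAGES_PER_BOOK)
print(f"Print and bind locally: ${local:.2f}")
print(f"CD-ROM as source:       ${CD_ROM_PRICE + local:.2f}")
print(f"Online as source:       ${online_download_cost(BOOKS, PAGES_PER_BOOK) + local:.2f}")
```

Only the lower bound of the $200 to $300 range is modeled here. Under the stated assumption, the online source costs thousands of dollars for the same set of volumes, which is the sense in which it is "almost prohibitively expensive".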

The Macroeconomics of Digital Libraries

I now want to make some brief comments on the macroeconomics of digital libraries, that is, on the effect they have on the distribution of the work force. To those comments I bring a model of the more general impact of information economies. It is built on a four-sector structure (first proposed by Marc Porat) which adds an Information sector to the traditional three sectors of the national economy – Agriculture, Industry, and Services. The following table presents the resulting summary four-quadrant picture for the economy of a country, such as the United States, which has progressed through the stages in development from subsistence to agriculture to industry and, now, to information. Later in this paper, I will extend that picture to countries at earlier stages in development.

Full-Scale development, representing nations like the United States

                     FUNCTIONS
INDUSTRIES           Non-Information    Information
Non-Information      50%                30%
Information          6%                 14%

These data reflect the situation of some decades ago, and there have been substantial increases in the percentages of the U.S. workforce engaged in information activities since then. However, for the purposes of this article, this general picture will suffice. Within that structure, digital libraries fall into the quadrant "information industries/information functions". It would be useful from a macro-economic perspective to know what proportion of that quadrant’s percentage of each national workforce was involved in distribution of digital libraries. To that end, I will use the discussion presented above concerning the Internet as means to explore possible means for estimation.

To do so, though, requires a finer grain than the simple four quadrant picture shown in Table 6. Fortunately, the paper presented in 1992 developed a schema for sub-dividing the quadrants with results as shown in Table 7 (now expressed in terms of millions of persons rather than in percentages), in which the types of information industries shown are of special importance to assessing the impact of digital libraries.

Total Workforce Distribution in the United States (1992), in millions of persons

                   ------------------- CATEGORY OF FUNCTION -------------------
                   Non-Information   ---------- Information Functions ----------
INDUSTRY           Functions         Manage    Support   Hardware   Substance     Total

NON-INFORMATION
  Low Tech             48.98           6.95      6.25      1.46       6.25        69.89
  High Tech            10.78           2.32      2.08      4.17       4.17        23.52

INFORMATION
  Facilities            2.03           0.29      0.26      0.06       0.26         2.90
  Transactions          1.69           0.64      1.27      1.14       1.14         5.89
  Hardware              1.84           0.40      0.36      0.79       0.71         4.10
  Distribution          3.77           1.42      1.27      2.55       5.66        14.67

TOTAL                  69.09          12.00      9.58      8.48      15.16       120.97

First, note that it was estimated that the 13M service providers represented about 130K FTE persons. In the United States, the total workforce in 1992 was about 121M. From the table, the percentage of them engaged in information work is 44%, divided virtually 1/3 in the information industries and 2/3 in non-information industries. On the one hand, it must be said that the nature of digital libraries is that they clearly fall into the category of information industry labeled "Distribution" – along with all of the other agencies for producing and distributing information products – and the persons engaged in maintenance of them are all performing substantive information functions. On the other hand, the evidence is that digital libraries are being made available by many non-information companies, so there does not seem to be any reason to treat the distribution of digital library related persons as substantially different from that for information workers in general.

For purposes of this paper, therefore, I will treat the 130K estimated persons providing Internet digital library services as all falling within the substance functions but divided between the non-information industries and information industries in a ratio of 2/1, which would put 87K in non-information industries and 43K in information industries. The total number of those persons, as shown in the Total row of the table, is 15.16M, and 130K is about 0.8% of that total. Incidentally, the providers of network services – the telephone companies and the domain servers – all fall within the Transactions industries in the table.

Online services overall represent only about 5% of the total of $20B for organizations engaged in information distribution, though they are totally concentrated on digital information while the other agencies (book distributors and retail outlets, academic and public libraries) cover the full range of information media. In academic libraries, for example, computer data files in 1996 represented only 5% of total acquisitions. Assuming that a similar proportion applies to the other Distribution agencies and that staff are roughly in the same proportion, that would imply another $1B (i.e., 5% of $20B) in digital library distribution. Hence, the total commitment of manpower in digital library distribution – Internet and CD-ROM – would be about 260K with 87K in non-information industries and 173K in Distribution agencies. The total of 173K represents about 3% of the total substantive staffing within Distribution industries.
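The staffing estimate in the last two paragraphs can be retraced with the short Python sketch below. The variable names are mine, and the second 130K is an assumption read out of the argument (a second $1B of digital distribution, staffed in the same proportion as the first); the other figures come from the text and the workforce table.

```python
# All staffing figures are in thousands of persons (K).
INTERNET_FTE_K = 130.0                # estimated Internet digital library staff
SUBSTANCE_TOTAL_K = 15160.0           # Substance column, TOTAL row of the table
DISTRIBUTION_SUBSTANCE_K = 5660.0     # Substance column, Distribution row

# Internet staff split 2/1 between non-information and information industries.
non_info_k = INTERNET_FTE_K * 2 / 3
info_k = INTERNET_FTE_K / 3

# Assumption: the other Distribution agencies add a matching 130K.
OTHER_DISTRIBUTION_K = 130.0
distribution_k = info_k + OTHER_DISTRIBUTION_K
total_k = non_info_k + distribution_k

internet_share = INTERNET_FTE_K / SUBSTANCE_TOTAL_K
distribution_share = distribution_k / DISTRIBUTION_SUBSTANCE_K

print(f"Non-information industries: {non_info_k:.0f}K")
print(f"Distribution agencies:      {distribution_k:.0f}K")
print(f"Total:                      {total_k:.0f}K")
print(f"Internet share of all substantive workers: {internet_share:.1%}")
print(f"Share of Distribution substantive staff:   {distribution_share:.1%}")
```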

Now, the evidence is clear that digital libraries are increasing exponentially. CD-ROM publication has been doubling every two to three years for several years, and the Internet overall has been doubling every six months to one year. If a doubling at, say, every two years were to continue for the coming five-year period, the number of persons in total would increase by a factor of about 6 to nearly 1.5M persons. That is not a dramatic proportion of the substantive information workers, amounting to just 10% of the 15M total.
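The projection follows from ordinary compound doubling. A minimal Python sketch, with illustrative names and the 260K starting figure taken from the estimate above:

```python
CURRENT_STAFF_K = 260.0        # thousands of persons, estimated above
SUBSTANCE_TOTAL_K = 15160.0    # thousands, Substance total from the table
DOUBLING_PERIOD_YEARS = 2.0    # the "say every two years" assumption
HORIZON_YEARS = 5.0


def growth_factor(years, doubling_period):
    """Multiplier after `years` of growth that doubles every `doubling_period`."""
    return 2 ** (years / doubling_period)


factor = growth_factor(HORIZON_YEARS, DOUBLING_PERIOD_YEARS)
projected_k = CURRENT_STAFF_K * factor
share = projected_k / SUBSTANCE_TOTAL_K

print(f"Growth factor over {HORIZON_YEARS:.0f} years: {factor:.2f}")
print(f"Projected staff: {projected_k:.0f}K ({share:.1%} of substantive workers)")
```

The exact factor is a little under 6, which is why the projected total comes out at "nearly" 1.5M and just under 10% of the substantive information workforce.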

Is that rational? Well, just consider one expected effect of digital libraries – the conversion of library acquisitions of scholarly journals from print formats to digital formats. It is potentially possible that within the coming five year period, 20% of scholarly journals might experience such a change and within the coming ten year period, perhaps 50%. Beyond that, libraries themselves are increasingly becoming involved in the production of digital libraries. This would imply a shifting of staff in both the producers of them and in the libraries that acquire them and provide services from them from a focus on print to one on digital libraries.

An even more important phenomenon is the extent to which companies in the non-information sector are using the Internet as means for distributing digital libraries related to their business. In effect, they are becoming a part of the information sector. In the macro-economic context, the result is a significant increase in the percentage of the nation’s workforce in substantive information work, all with a focus on digital libraries.

Another fascinating phenomenon is the growth of personal Web sites. Many of them are highly professional both in appearance and in content. In effect, the consumers are becoming producers of information products, and the result again is a significant increase in the percentage of the nation’s workforce in substantive information work. Note that the FTE involved in maintenance of a service provider Web site is really minuscule – perhaps 1% of an FTE (as calculated above in the discussion of service providers). This makes it completely feasible for a person to produce and maintain a site simply as a hobby. This is a growth in the number of persons focused on digital libraries that is a net increase, not simply a shifting of focus.

All of which makes the projected growth not only rational but likely.

Of course, everything said to this point has been in the context of a highly developed information economy such as is found in the United States. What is the picture for countries at other stages in development? The NIT’92 paper included means for estimating the structure of their economies in much the same fashion as that for the United States, as reviewed above. First, the following table shows the overall structure for two levels of development, parallel to that given above for the United States. It provides a picture of the impact of information economies at two levels of development.

Overall Structure for other Levels of Development

Substantial Development (most other industrialized nations)

                     FUNCTIONS
INDUSTRIES           Non-Information    Information
Non-Information      60%                25%
Information          6%                 9%

Primitive Development (most peasant-based nations)

                     FUNCTIONS
INDUSTRIES           Non-Information    Information
Non-Information      80%                14%
Information          3%                 3%

The data for Internet sites include a row for "Other Countries". The following expands that into four sub-groups, essentially reflecting levels of development of information economies. To arrive at these values, the totals for each country, as reported in the Internet Domain Survey, were divided by the country’s population; the entire set of some 250 countries was sequenced in descending order by the values for Hosts/Population. The countries were then divided into four groups (of 30, 30, 30, and 160); totals were calculated for each column and then divided by the total population for the group.

Internet Hosts in Groups of Countries

PRIMARY    Hosts             Level 2           Level 3           Level 4           Country Groups
           per population    per population    per population    per population
Group 1    44681             5985              1116              13994             30 Countries
Group 2    1479              158               19                287               30 Countries
Group 3    214               9                 1                 19                30 Countries
Group 4    16                4                 0                 1                 160 Countries
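The grouping procedure described before the table can be sketched in Python as follows. The country records here are invented purely to show the mechanics and are not Internet Domain Survey values; the function name and record layout are likewise hypothetical. Since the scaling of the "per population" figures is not stated, the sketch reports a plain ratio of hosts to population.

```python
from itertools import islice


def group_rates(countries, group_sizes):
    """Rank countries by hosts/population, split them into consecutive
    groups of the given sizes, and return total hosts divided by total
    population for each group.

    `countries` is a list of (name, hosts, population) tuples.
    """
    ranked = sorted(countries, key=lambda c: c[1] / c[2], reverse=True)
    remaining = iter(ranked)
    rates = []
    for size in group_sizes:
        group = list(islice(remaining, size))
        total_hosts = sum(c[1] for c in group)
        total_population = sum(c[2] for c in group)
        rates.append(total_hosts / total_population)
    return rates


# Hypothetical data: four countries split into groups of 1, 1, and 2.
sample = [("C", 9, 30), ("A", 500, 10), ("D", 1, 40), ("B", 40, 20)]
print(group_rates(sample, [1, 1, 2]))
```

For the actual analysis the group sizes would be 30, 30, 30, and 160. Note that each group rate is a ratio of totals rather than an average of the individual country ratios, so populous countries weigh more heavily within their group.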

The countries in Group 1 are quite similar to the United States in their general economic development; those in the other three groups, successively less developed. In the NIT’92 paper, four levels of development were illustrated that are parallel to these four groups of countries. Those in Group 1 would be similar to the picture in Table 7 (which is for the U.S.). Those in Group 2 might be characterized by Table 11, those in Group 3 by Table 12, and those in Group 4 by Table 13.

Application of the Model, Level = .75

Category of        Non-Information   ---------- Information Functions ----------
Organization       Functions         Manage    Support   Hardware   Substance     Total

NON-INFORMATION
  Low Tech             51.15           6.83      4.61      1.08       4.61        68.27
  High Tech             8.16           1.45      0.98      1.96       1.96        14.51

INFORMATION
  Facilities            1.36           0.18      0.12      0.03       0.12         1.81
  Transactions          1.58           0.37      0.74      0.50       0.50         3.68
  Hardware              1.28           0.26      0.17      0.51       0.35         2.56
  Distribution          2.73           0.92      0.62      1.24       3.67         9.17

TOTAL                  66.25          10.00      7.24      5.31      11.19       100.00

Application of the Model, Level = .25

Category of        Non-Information   ---------- Information Functions ----------
Organization       Functions         Manage    Support   Hardware   Substance     Total

NON-INFORMATION
  Low Tech             75.99           8.94      2.01      0.47       2.01        89.42
  High Tech             3.81           0.48      0.11      0.22       0.22         4.84

INFORMATION
  Facilities            0.51           0.06      0.01      0.00       0.01         0.60
  Transactions          0.75           0.12      0.25      0.06       0.06         1.23
  Hardware              0.54           0.09      0.02      0.17       0.04         0.85
  Distribution          1.32           0.31      0.07      0.14       1.22         3.06

TOTAL                  82.91          10.00      2.47      1.05       3.56       100.00

Application of the Model, Level = .10

Category of        Non-Information   ---------- Information Functions ----------
Organization       Functions         Manage    Support   Hardware   Substance     Total

NON-INFORMATION
  Low Tech             84.27           9.58      0.86      0.20       0.86        95.77
  High Tech             1.65           0.19      0.02      0.03       0.03         1.93

INFORMATION
  Facilities            0.21           0.02      0.00      0.00       0.00         0.24
  Transactions          0.33           0.05      0.10      0.01       0.01         0.49
  Hardware              0.23           0.03      0.00      0.07       0.01         0.34
  Distribution          0.58           0.12      0.01      0.02       0.49         1.22

TOTAL                  87.26          10.00      0.99      0.34       1.40        99.99

Note that the percentage of persons in Substantive information functions steadily declines, by a factor of about 3 at each step (from 11.19% to 3.56% to 1.40%). Of course, the decline in Internet hosts per population is dramatically greater, but that is reasonable given the fact that growth in the Internet requires such a formidable infrastructure – in telecommunications and in technical knowledge. The facts are that the Internet, democratic though it may seem to be, is by no means an answer to disparities in economic development. The information rich simply get richer because of the exponential effects of increasing size of the information resources.
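The two rates of decline can be set side by side with a short Python sketch. It assumes that the three model levels (.75, .25, .10) correspond to country Groups 2, 3, and 4, as suggested above, and pairs the Substance totals with the Hosts column for those groups; the names are illustrative.

```python
# Substance totals (percent of workforce) from the three model tables,
# Level .75, .25, and .10.
substance_pct = [11.19, 3.56, 1.40]

# Hosts per population for Groups 2, 3, and 4 from the Internet hosts table
# (assumption: these groups pair with the three model levels above).
hosts_per_population = [1479, 214, 16]


def step_ratios(values):
    """Ratio of each value to the next one in the sequence."""
    return [a / b for a, b in zip(values, values[1:])]


print("Substance decline per step:", [round(r, 2) for r in step_ratios(substance_pct)])
print("Hosts decline per step:    ", [round(r, 2) for r in step_ratios(hosts_per_population)])
```

On that pairing, the share of substantive information workers falls by roughly a factor of 3 at each step, while hosts per population fall by several times that, which is the disparity the paragraph above describes.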

China and India are worth examining further. Both fall into Group 4 because of the huge peasant population (about 80% of the total in each case). They provide the extreme example of a basic problem in dealing with countries in Group 4, since in both China and India industry and information activities such as universities, publishing, etc. are well developed. They really function as two-tiered economies, with minimal interaction between the two. The Internet operates solely in the industrial and information tier, so to explore that, I normalized the data by dividing by 20% of total population (i.e., just the industrial tier of their economies); the result was that each became part of Group 3, and China almost part of Group 2. Of course, the implication is that the peasant, subsistence agriculture, tiers have essentially zero involvement in the Internet, but that surely reflects reality. Similar treatment for other countries with two-tiered economies would result in similar shifts.

5. CONCLUSION

As was said at the beginning, this paper is exploratory, speculative, and largely a descriptive analysis. It is hoped that the data presented in it are reasonably accurate, but the likelihood is that they are not and for a variety of reasons. (1) There are serious problems in definition, not so much in the paper itself, in which every effort has been made to be specific in definition, but in the world at large. (2) The Internet and World Wide Web are growing so rapidly that it is essentially impossible to pin down values even for those data whose definitions are clear, and it is difficult to relate data which represent the situation at different times, even as close as days apart. (3) Many of the economic facts are obscured by subsidy, by the role of advertising, and by the self-serving relationships among participants: "I’ll mount a pointer to you on my Web site if you mount one to me on yours!" Indeed, many of those relationships are almost incestuous, as Service Providers spawn other Service Providers and provide a maze of hot-keys that inter-connect them. (4) Other economic facts are obscured by prevailing patterns of accounting that fail to recognize the role of investment in information capital resources.

Recognizing those difficulties with the data, there is value in using the data that are available for the purpose of illustrating the underlying analytical structure, and that is the reason that data are presented. The story is told of the man who played in a game of poker that he knew was dishonest. When asked why, he replied, "Because it’s the only game in town, and I want to play," and that’s the case here. If we want to play the game, in this case of understanding what is going on, we must use the available data.

There are some things that can be said, though, however uncertain or inaccurate the data are. The underlying economic fact is that, given the major capital investment in telecommunications infrastructure and in computer hardware and software, the costs of access to information on the Internet bear almost no relationship to the level of use. The costs of access are so small that they are, to all intents, zero. In fact, even the costs for creating a Web site that will serve as the means for intellectual access are minimal. This means that the costs for digital libraries that need to be recovered are not those in distribution and delivery – which historically have represented 40% to 60% of the costs – but the capital investments made in creating the information packages that they represent.

Perhaps the most significant result of this exploration is the confirmation of the exponential effects of information development. Of course, all of that simply reflects the general economic facts of information, but the descriptive model presented in this paper does make evident the reality that results from them.