032: Digital Publishing's Past

Like Ebenezer Scrooge in Charles Dickens’ A Christmas Carol, whose past, present and future spirits arrive on Christmas Eve to show him a side of himself he can’t see from within, I too will be taking you on a journey, although it will likely prove less profound and transformative than Scrooge’s experience. Nevertheless, over the next three episodes of Talk Paper Scissors, I will be your guide through the past, present and future of digital publishing.

There’s lots to learn about digital publishing of yesteryear, including the technologies that made it mainstream. That’s what this episode is all about, where I will cover the first 40 years or so of the history of digital publishing from 1971 to about 2010.

The second episode will capture the snapshot of the last 10 years, from 2011 to 2021. Digital publishing of today represents a variety of formats on platforms both new and established.

And while we can never know about digital publishing of tomorrow until it becomes the present, there are exciting new opportunities for connection that we will explore, including the ways in which this hyper-connectedness may directly and indirectly impact the world of digital publishing. The final episode will cover trends and what might happen in 2022 and beyond.

A logical starting point before we look behind us, observe where we’re at right now and squint to see the future, is to simply define the term “digital publishing”. Let’s start there.

“Digital” refers to the digitization, or computerization, of content. Whether viewing content on a screen or listening to digitally stored music or a podcast in our ears, we live in a very digital world. Digital mediums and formats are useful for many reasons, such as their ability to be replicated an infinite number of times, shared instantly and globally and be remixed into something new. With the advent and commercialization of the Internet came new avenues to distribute content to newfound audiences.

This segues nicely into “publishing”, for which the dictionary definition states: “the occupation, business, or activity of preparing and issuing books, journals, and other material for sale”. However, today’s digital landscape has broadened the scope of this definition in a way that the publishing industry of even 20 years ago is unlikely to have predicted. We are all now publishers. When we create content and put it out for the world to see, whether in the form of photographs or video or writing or music or any combination thereof, we are publishers. The Internet has democratized the publishing process: if you want to get something out into the world, you no longer have to receive permission and approval and cut through red tape to do so. You can just publish it for the world to see. And if the Internet was the flame that ignited the self-publishing revolution, then social media is the gasoline, the accelerant poured onto the flames that caused them to shoot up into the air for all to see. Social media facilitated new ways to share content with friends, family and like-minded individuals across a variety of platforms, with precise algorithms getting content to the people who will be most interested in seeing it. An audience can be found for nearly every imaginable subject; every unique niche topic and area of interest has a home on the Internet, unrestricted by any physical size constraints. This has created a new phenomenon whereby appealing to the masses has been eclipsed by appealing to individuals, which one of my favourite creativity thought leaders, Seth Godin, explains in his wonderfully quirky little book ‘We Are All Weird’. But I digress. I’m getting ahead of myself.

When we put “digital” and “publishing” together, we come to a definition that could encompass everything from PDFs and eBooks to apps and games and social media. For purposes of this 3-part podcast mini-series, we’re sticking to digital publishing in the more traditional publishing sense: documents and books that live in digital format, whether locally on individual computers and/or shared on interconnected platforms. That said, the potential role that social media could play in the future of the book can’t be denied. I’ll be veering into the mainstream media lane when we take the exit ramp into the future of digital publishing.

Let’s do this. Put on your slippers and overcoat, and let’s head out into the cold night air as I introduce you to digital publishing’s past.

We’ve arrived in the 1970’s.

1971 to be exact.

In his wildest dreams Gutenberg couldn’t have imagined the long-lasting impact of his efforts or the trajectory of his converted wine press. What started as a way to print bibles efficiently and profitably has transformed our entire world, enabling all of the communications technologies leading to today, as well as all other wonders of modern life.

It’s fitting then that the widely recognized birth of digital publishing came in 1971 (predating the modern Internet), with the launch of Project Gutenberg by Michael Hart. Project Gutenberg is an online library of free digital books; its mission is “to encourage the creation and distribution of eBooks”. It started at the University of Illinois with Hart hand-typing ‘The Declaration of Independence’, as he tried to work out how he would add value to the world and repay the immense amount of computer time he’d been gifted by the university (equivalent to $100,000,000).

In a 1992 article by Hart, entitled The History and Philosophy of Project Gutenberg, he states: “The Project Gutenberg Philosophy is to make information, books and other materials available to the general public in forms a vast majority of the computers, programs and people can easily read, use, quote, and search.”

Hart believed from the beginning of his Project Gutenberg journey that the “...greatest value created by computers would not be computing, but would be the storage, retrieval, and searching of what was stored in our libraries”. The entire premise upon which Project Gutenberg was founded was that anything that could be entered into a computer meant that it could be replicated an indefinite number of times. He used the term “Replicator Technology” and conceivably, anyone and everyone in the world (even outside of this world!) could have a copy of a document that lives inside of a computer. “Electronic Texts” (Etexts) are made available by Project Gutenberg so that works are in the simplest format and accessible to the greatest number of people. Project Gutenberg is all about free access of valuable works to the greatest number of people.

I recently searched for this first entry on the Project Gutenberg site and, sure enough, there it was; the very first entry with a release date of December 1, 1971, filed away on the Internet in an appropriately modest and historical URL that sums it up wonderfully: http://www.gutenberg.org/ebooks/1. That was 50 years ago. Project Gutenberg now boasts 60,000 free eBooks in both ePub and Kindle formats.

1980’s here we come.

The year is 1982 and the compact disc has just become commercially available, with the CD-ROM (Compact Disc Read-Only Memory) following within a few years. This changed the way people were able to share digital information. Compact discs obviously had important uses in the music and video industry (a vintage Aqua CD still lives happily in my car), but publishers also found a way to make use of the technology. Publishers like National Geographic magazine sold and distributed CD-ROMs of their issues. This allowed digital copies to be made available and viewed on screen at a time when the Internet wasn’t yet widely available for home use. Furthermore, digital rights management (or DRM, something we’ll be exploring in the next podcast episode) could be controlled by the publisher when content was distributed in CD-ROM format.

Additionally, CD-ROMs could hold so much text information that an entire encyclopedia’s worth of text could be contained on a single CD-ROM. The entire 21-volume, 65 lb., 9,000,000 word Grolier’s encyclopedia could fit onto a single compact disc. This was mind blowing at the time. To give you a little perspective, a CD-ROM in the 1980’s could hold about 700 MB of data. A fairly standard and cheaply available 16 GB USB stick contains 23x the data storage of a single CD-ROM. After the CD-ROM came some exciting and groundbreaking technological advances in the world of digital publishing...
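If you’d like to check that storage math yourself, here is a minimal back-of-the-envelope sketch in Python. It assumes decimal units (1 GB = 1,000 MB), and the figure of roughly 6 bytes per word of plain text is my own rough assumption rather than anything from Grolier; the variable names are just for illustration.

```python
# Back-of-the-envelope storage comparison (decimal units: 1 GB = 1,000 MB).
CD_ROM_MB = 700            # CD-ROM capacity quoted in this episode
USB_STICK_MB = 16 * 1000   # a 16 GB USB stick expressed in MB

ratio = USB_STICK_MB / CD_ROM_MB
print(f"A 16 GB stick holds about {ratio:.0f}x a CD-ROM")

# Assumption (not from the episode): ~6 bytes per word of plain English text.
WORDS = 9_000_000
BYTES_PER_WORD = 6
encyclopedia_mb = WORDS * BYTES_PER_WORD / 1_000_000
print(f"Encyclopedia text: roughly {encyclopedia_mb:.0f} MB")
```

Under that assumption the text alone takes up only a small slice of the disc, which is why a whole encyclopedia fitting on one CD-ROM is plausible.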

Here’s looking at you, 1990’s.

Welcome to 1992! The Toronto Blue Jays baseball team won the World Series championship, Home Alone 2: Lost In New York premiered in theatres and the PDF was born. It was a BIG year.

The PDF (or Portable Document Format) is a cross-platform format that is now a staple of the modern digital publishing world. Because a PDF appears the same on any device or operating system, information exchange is seamless, which elevates the PDF to modern marvel. PDFs are easy to make, send and receive.

The year prior, in 1991, the co-founder of Adobe, Dr. John Warnock, began ‘The Camelot Project’ with the aim of letting anyone capture documents from any application, send them electronically anywhere and print them on any machine. By 1992, the format became a reality when the PDF was created.

PDF is now an open standard, overseen by the International Organization for Standardization (ISO). PDFs can contain all sorts of information: text, images, links, buttons, form fields, audio, video and even business logic.

PDF files are also the de facto standard for the printing industry. This means that whether a publication is intended for digital publication, traditional publication or both, PDF files will play an integral role in the publishing process.

There are many different PDF standards, all with different purposes. Some of the most common in the publishing world include: PDF/X (for print and creative professionals because high resolution images, fonts and colour profiles are embedded within), PDF/UA (designed to aid in accessibility and readability for those who use screen readers - the UA stands for ‘Universal Accessibility’) and PDF/VT (also for print professionals, but specifically for those who use PDFs to customize information, such as information in bank statements or marketing material - the VT stands for variable and transactional), among others.

We’re going to hear more about PDFs and what the future holds for this universal standard in digital publishing in a later episode.

Let’s jump ahead two years to 1994, when the first online diary (later dubbed a ‘blog’) was written.

Then-student Justin Hall is credited with creating the first blog, Links.net (a URL still in existence and added to regularly!). He used this platform to publish his writing and provide links to content. His first entry in January 1994 read:

Welcome to my first attempt at Hypertext

Howdy, this is twenty-first century computing... (Is it worth our patience?) I'm publishing this, and I guess you're readin' this, in part to figure that out, huh?

High Stylin' on the Wurld Wyde Webb

This is a Hypertext server using MacHTTP v1.2.3 running on a Powerbook 180 w/ 8 RAM and a 120 HD. It is currently being broadcast from the depths of Willets, a dorm nestled in the shrubbery here at Swarthmore College in Swarthmore, Pennisylvania.

I’m getting a little ahead of myself timeline-wise, but I want to share with you a particularly interesting example of success in the blogosphere. Julie Powell was a modern New York woman working in a job she hated, trying to navigate life and reignite her passion for it. In trying to figure this out, Julie decided to embark on a journey of cooking every one of the 524 recipes in Julia Child’s cookbook, Mastering the Art of French Cooking; a lofty goal for someone who “had never eaten an egg before she tackled Oeufs a la Fondue de Fromage”. She decided to call her experiment The Julie/Julia Project and chronicle her adventure on a blog, beginning in 2002.

Julie’s blog grew a large following and it was featured in an article in The New York Times, which really put it on the map. The publishing company Little, Brown and Company offered Julie an opportunity to develop a book about her experience. The resulting book, published in 2005, was called Julie and Julia: 365 Days, 524 Recipes, 1 Tiny Apartment Kitchen. That’s a pretty big deal for a little blog. An even bigger deal was when the film rights were purchased and the story was adapted into a feature film starring Amy Adams as Julie and Meryl (Freaking!) Streep as Julia Child. The film was released in 2009. What a ride! (As an aside, who would play you in a film about your life? Me? Tina Fey, please.) While the real Julia Child wasn’t a fan of Julie’s blog, saying “I don’t think she’s a serious cook”, Julie was awarded an honorary diploma from Le Cordon Bleu, which is the same cooking school that Julia Child graduated from in 1951. What a blog-tastic story!

In the next episode of this podcast, looking at digital publishing’s present, we’ll hear about another blog turned movie blockbuster success.

With that little skip ahead behind us, let’s now officially head into the new millennium!

The year is 2000.

The world’s banking systems didn’t set themselves back 100 years and Will Smith may have released the greatest song of our time (in my humble opinion). Google, founded only two years earlier, had a very ambitious goal: they aimed to digitize the world’s books.

In an absolutely incredible 2017 article in The Atlantic, entitled Torching the Modern-Day Library of Alexandria: “Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them”, author James Somers details the long-held dream of a “universal library”: one-click access to a collection of nearly every out-of-print book that has ever been published, available for free at terminals in every local library the world over. That was the dream. It wasn’t the capabilities of the technology, hardware or software, that stopped this dream from becoming a reality. Instead, it was the judicial system (and the scholars, archivists and librarians standing opposite Google in the courtroom) that saw the modern-day library of Alexandria go up in virtual flames.

“Project Ocean” (Google’s effort to scan every book in the world) began in 2002 when Google co-founder, Larry Page, sat down with a 300-page book and a metronome to figure out how long it would take to scan a book. It took 40 minutes. And at that rate, scanning the 7 million volumes just contained within the University of Michigan’s collection would take… about a thousand years. Page told the university that they could do it in six years.
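For anyone who wants to see where a figure like “about a thousand years” might come from, here is a rough sketch in Python. It assumes a single scanner working at Page’s 40-minutes-per-book pace; the number of working hours per day is my own assumption, since the episode’s source doesn’t spell it out.

```python
# Rough estimate: how long would one scanner need for 7 million volumes?
MINUTES_PER_BOOK = 40      # Page's metronome test, per the article
VOLUMES = 7_000_000        # University of Michigan collection

total_hours = MINUTES_PER_BOOK * VOLUMES / 60

def years_needed(hours_per_day: float) -> float:
    """Years for one scanner working `hours_per_day`, every day of the year."""
    return total_hours / hours_per_day / 365

# The hours-per-day values below are illustrative assumptions.
print(f"Around the clock: {years_needed(24):.0f} years")
print(f"12 hours a day:   {years_needed(12):.0f} years")
```

Scanning around the clock comes out at a little over 500 years, and a 12-hour day lands just over a thousand, so the quoted figure is in the right ballpark for a single station.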

With this lofty challenge ahead of them, semi trucks filled with books showed up at Google scanning centres every day of the week. It took just over a decade, but Google outpaced Page’s ambitious previous goal, having scanned almost 25 million books in just over 10 years’ time, costing the company approximately $400 million.

How did they do it?

The article describes one Google scanning site, a converted office building on Google’s Mountain View campus, where custom-made scanning/photographing stations were arranged in rows, each with a human operator. Each station could digitize 1,000 pages per hour and contained four cameras, two pointed at each half of the book, along with technology that would compensate for the curvature of the book. Pages were turned by hand and the cameras would fire when the operator pressed a foot pedal. Software employed de-warping algorithms to adjust pages after they were scanned, which sped up the scanning process. Fifty full-time software engineers were assigned to the task. They created OCR (optical character recognition) software that turned raw images into text, wrote the de-warping, colour-correction and contrast-adjustment instructions, and built tools to detect illustrations, diagrams and page numbers and to turn footnotes into real citations.

In a perfect marriage of established and new-fangled digital publishing technology, in August 2010, Google released a blog post that announced that there were 129,864,880 books in the world… and the company was going to scan them all.

Holy smokes.

But it all went up in smoke when, as James Somers explains: “What happened was complicated but how it started was simple: Google did that thing where you ask for forgiveness rather than permission, and forgiveness was not forthcoming. Upon hearing that Google was taking millions of books out of libraries, scanning them, and returning them as if nothing had happened, authors and publishers filed suit against the company, alleging, as the authors put it simply in their initial complaint, “massive copyright infringement.””

The result was a long, complicated series of lawsuits. Statutory damages for “willful infringement” of a copyright could run as high as $150,000 for each work infringed upon, so with tens of millions of books at stake (tens of millions of books times $150,000… carry the one…), Google was looking at trillions of dollars in potential liability… yikes. These were important lawsuits for everyone affected by copyright. I encourage you to check out the article for yourself. It’s a long one, but an amazing account of one of the most significant moments in the history of digital publishing and the enforcement of copyright law.
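To save you from carrying the one, here is a tiny sketch of that worst-case arithmetic in Python. It is an upper bound only: it assumes the maximum $150,000 applies to every single work, and the work counts are illustrative round numbers of mine, not figures from the lawsuits.

```python
# Worst-case statutory damages: maximum per-work figure times number of works.
MAX_DAMAGES_PER_WORK = 150_000  # dollars, the ceiling quoted above

def max_liability(works: int) -> int:
    """Upper bound in dollars if every work drew the maximum damages."""
    return works * MAX_DAMAGES_PER_WORK

# Illustrative work counts (assumptions, not court figures).
for works in (10_000_000, 25_000_000):
    print(f"{works:,} works -> ${max_liability(works):,}")
```

Ten million works already gets you to $1.5 trillion, and 25 million works to $3.75 trillion, so “trillions” is no exaggeration as a theoretical ceiling.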

At the end of the article, the author, James Somers, says this:

“I asked someone who used to have that job, what would it take to make the books viewable in full to everybody? I wanted to know how hard it would have been to unlock them. What’s standing between us and a digital public library of 25 million volumes?

You’d get in a lot of trouble, they said, but all you’d have to do, more or less, is write a single database query. You’d flip some access control bits from off to on. It might take a few minutes for the command to propagate.”

Incredible.

While there was LOTS going on with digital publishing in the early 2000’s, there are two more notable events I’ll mention before we round out our look at digital publishing’s past.

The next took place in 2007, when the EPUB 2.0 format (that enabled a unifying standard for publishing ebooks) was established.

EPUB is a standardized, open file format managed by the IDPF (International Digital Publishing Forum) and it is what you see when you open up most eBooks on a variety of devices.

Without getting too technical, before EPUB 2.0 there was the original OEB 1.0 (Open EBook) specification that was developed in 1999. This consisted of a single HTML file and an OPF (open package format) file contained within a folder. The advent of EPUB is so important because before this standard, some publishers had to make six different versions of every title: OEB, Microsoft LIT, Palm, Sony, Mobipocket and PDF.

Additionally, timing played a role. Just a few months before the EPUB 2.0 standard, Apple released the first iPhone and just a few months after, Amazon released the Kindle. Here were two major players hedging their bets in the online world of digital books.

There have been a variety of updates to EPUB since then, some more useful and desired than others. This includes the EPUB 3 standard released 10 years ago in 2011 based on HTML5 with audio and video capability.

EPUB files can be read on different types of electronic devices such as e-book readers, tablets and phones. There are reflowable EPUB documents that allow text to be enlarged or reduced in size, as well as for the typeface and background colour to be changed. This is ideal for long text documents in the trade book publishing market. There are also fixed-layout EPUB documents that allow readers to zoom in and out of a page, but the text doesn’t reflow. This is typically used for publications like children’s books and cookbooks whose image and illustration-heavy design elements can’t easily be broken across pages.
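For the technically curious, an EPUB file is essentially a ZIP archive with a few required pieces inside: a mimetype entry, a META-INF/container.xml file, and a package (OPF) file that container.xml points to. Here is a minimal sketch in Python that builds a bare skeleton in memory and then reads back where the package file lives. The file names under OEBPS/ and the stub contents are my own placeholders, and the result is not a valid, readable book, just an illustration of the container layout.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"

CONTAINER_XML = f"""<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="{CONTAINER_NS}">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

def build_skeleton_epub() -> bytes:
    """Build a bare EPUB-style container in memory (stub contents only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry goes first and is stored uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        # Placeholder package and chapter files (hypothetical names).
        archive.writestr("OEBPS/content.opf", "<package/>",
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", "<html/>",
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()

def find_package_path(epub_bytes: bytes) -> str:
    """Return the path of the package file named in container.xml."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
        root = ET.fromstring(archive.read("META-INF/container.xml"))
    rootfile = root.find("c:rootfiles/c:rootfile", {"c": CONTAINER_NS})
    return rootfile.attrib["full-path"]

print(find_package_path(build_skeleton_epub()))
```

A reading app does something similar as its first step: it opens the archive, follows container.xml to the package file, and from there finds the chapters, images and styles that make up the book.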

My Final Thoughts on Digital Publishing’s Past

There is A LOT to cover in the world of digital publishing’s past and it’s nearly impossible to cover each and every detail and important moment and technology, many of which intersect, build on one another and collide in the first 40 years of digital publishing.

But in a world where it feels like digital is taking over the world in many domains, can print and digital happily coexist in the publishing realm?

In a blog post I wrote in February 2014 entitled Print & Digital Publishing: Long Lost Lovers (a digitally published piece about digital publishing!), here’s what my self of seven years ago had to say about the world of print publishing meeting the world of digital publishing.

Many publishing theorists say that print is a dying medium. Ebooks and digital publishing technologies are the way of the future and the future’s here now (so watch your back, print!). I have a different take on traditional versus digital publishing, which is to say that they are meant to be together.

In my scope of work, I teach students traditional publishing means, including printing technology and preparing files for printed output. As a graduate student, I opt for the ePub version of course textbooks whenever I have a choice and I spend the majority of my time expressing my views in pixels, not on paper.

Furthermore, I love nothing more than the tactility of fine paper and the authentic smell of an old book that ties the present to the past; but I equally love dynamic search capabilities, hyperlinking and the ability to choose my typeface available with a digital book, tying the present to the future. You may call me a walking contradiction, but I believe that there are times when printed mediums work better and there are times when digital platforms simply work better. This long lost love affair between print and digital publishing extends much further than their nostalgia and functionality. I don’t think they simply want each other, so much as they sometimes need each other too.

To round out the idea that traditional printed publishing and modern digital publishing are, indeed, meant to be together (in the same way that the two protagonists in a Nicholas Sparks book are meant to be together), here’s an excerpt from an article I wrote for Graphic Arts Magazine in November 2011 about the new frontiers of digital archiving and preserving important printed matter, entitled Print Through the Ages.

Entire new disciplines of study have been established from the convergence of literature, printing and digitization: Digital Humanists. In an age where the disciplines of technology and humanity have come face to face, Digital Humanists are creating new possibilities for archiving printed matter and enabling accessibility to more people than ever before.

I had the pleasure of listening to Cara Leitch and Julie Meloni discuss “The Future of the History of the Book” at the University of Victoria, British Columbia. Cara is a Researcher who works in UVic’s Electronic Textual Cultures Lab (ETCL), a facility in which data-harvesting, textual content analysis and document encoding takes place. Cara’s focus is on digitizing 19th Century texts and social networking. Julie has also worked in the ETCL with a focus on Information Management. They are both book lovers and consider themselves “Digital Humanists”. Through their work, they rediscover the meaning of texts by using technology, comparing with other texts and annotating. Through using OCR software, as well as crowdsourcing, Digital Humanists preserve texts of the past for future generations to enjoy.

It is important to understand how printed documents and books of the past will be preserved for future generations. Environmental conditions, such as temperature, relative humidity and light all play a role in the deterioration of paper and printed matter. Digitizing is the answer for increasing longevity, accessibility, and broader use of a given document. Print advocates and old book lovers the world over have always faced the challenge of preserving books but now have a viable resource to preserve information long into the foreseeable future. Even our national memory institution, Library and Archives Canada, understands the importance of the momentous shift in archiving processes and keeping up with technology to stay relevant. Digitization of old printed matter has created new jobs and research opportunities, as Digital Humanists use technology to learn about printed documents in whole new ways.

So a strong case could be made that traditional and digital publishing both have their places in this world and work pretty well together, if I do say so myself. That’s what makes them the perfect couple; a modern day Notebook.

The past is behind us. So now tuck yourself back into bed and you will soon learn all about digital publishing’s present.

Music and Sound Effects:

Podington Bear - The Gall

1970’s - DDmyzik

1980’s - Beetlemuse

1990’s - monkeyman535

Happy New Year - stomachache

2000’s - blaerg

Podington Bear - Kitty in the Window

Podington Bear - Smooth Actor

Podington Bear - The Gall

Talk Paper Scissors Theme Music: Retro Quirky Upbeat Funk by Lewis Sound Production via Audio Jungle

Boat Origami Photo: Alex on Unsplash