I spent a few hours last night trying to understand the meaning of ISBN. It was a strange and fascinating journey that left me somewhat enlightened, although mostly just more confused. Just further proof, I guess, that life is a little bit messier than it should be.
Most books are currently identified by a 10 digit code known as the International Standard Book Number, or ISBN. This 10 digit code actually consists of 9 digits of information and 1 check digit to verify that the number was entered without errors. The 9 digits of information can themselves be broken down into three groups: a region code, a publisher code, and an item code. The length of each part is variable, since some regions and publishers are given larger ranges than others. An international standards organization and national agents are responsible for assigning ranges to publishers. The individual publishers can then assign codes within their block to books that they publish. There are rules about what needs a code and how much a book can change before it must get a new code. Sometimes movies are assigned ISBNs under the theory, I suppose, that they are just an alternative presentation of a book.
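For the curious, here is roughly how the ISBN check digit works. The nine information digits are weighted 10 down to 2, and the check digit is whatever makes the total divisible by 11, with "X" standing in for a value of 10. This is only a minimal sketch in Python, and the function names are made up for illustration.

```python
def isbn10_check_digit(first9: str) -> str:
    """Return the check character for the 9 information digits of an ISBN-10."""
    if len(first9) != 9 or not (first9.isascii() and first9.isdigit()):
        raise ValueError("expected exactly 9 digits")
    # Weights run 10, 9, ..., 2 across the nine information digits.
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    # The check value makes the full weighted sum divisible by 11.
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_isbn10(isbn: str) -> bool:
    """Check a 10 character ISBN, ignoring hyphens and spaces."""
    s = isbn.replace("-", "").replace(" ", "").upper()
    if len(s) != 10:
        return False
    try:
        return isbn10_check_digit(s[:9]) == s[9]
    except ValueError:
        # The first nine characters were not all digits.
        return False


print(isbn10_check_digit("030640615"))   # 2
print(is_valid_isbn10("0-306-40615-2"))  # True
```

Because the sum is taken modulo 11, a single mistyped digit or a swap of two adjacent digits changes the result, which is the point of having a check digit at all.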
While the book industry was developing its system, retail marketers had adopted the UPC code to identify products for sale. There are several different forms of UPC code, but the most widely known is the 12 digit UPC-A code. Like the ISBN, the UPC code can be broken down into information digits and a check digit, in this case 11 information digits and 1 check digit. The data contains region, manufacturer, and item fields. Sometimes the code is extended with an additional 5 digit code. Are UPC codes unique? Well, not necessarily. Looking through my stack of comic books I can see that the issue number for the series appears in the extended code (a code of 12511, for example, would indicate issue 125). However, while some series had different 12 digit codes, some did not.
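The UPC-A check digit uses a different rule from the ISBN one: digits in odd positions (counting from 1 on the left) are multiplied by 3, the rest by 1, and the check digit brings the total up to a multiple of 10. A minimal sketch in Python, with function names that are purely illustrative:

```python
def upca_check_digit(first11: str) -> str:
    """Return the check digit for the 11 information digits of a UPC-A code."""
    if len(first11) != 11 or not (first11.isascii() and first11.isdigit()):
        raise ValueError("expected exactly 11 digits")
    digits = [int(d) for d in first11]
    # Positions 1, 3, 5, ... (indexes 0, 2, 4, ...) carry a weight of 3.
    total = 3 * sum(digits[0::2]) + sum(digits[1::2])
    return str((10 - total % 10) % 10)


def is_valid_upca(code: str) -> bool:
    """Check a full 12 digit UPC-A code against its own check digit."""
    return (
        len(code) == 12
        and code.isascii()
        and code.isdigit()
        and upca_check_digit(code[:11]) == code[11]
    )


print(upca_check_digit("03600029145"))  # 2
print(is_valid_upca("036000291452"))    # True
```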
To facilitate the sale of books in retail outlets like grocery stores or drug stores, mass market paperbacks will also be identified by a UPC code. Converting from a UPC code back into an ISBN really doesn't seem practical. Glancing at one of my mass market paperbacks, it looks like the five digit extended code contains the last half of the ISBN code. A lookup might be used to convert the UPC manufacturer code into the publisher code and reassemble the ISBN, but I've found other books that don't have the extended code.
While the UPC code became popular in the US, another code was being developed internationally. The creator of the UPC code also worked on the EAN, or European Article Number, a 13 digit code that can be used throughout the world. It too works very much like the ISBN and UPC code. In fact, by simply adding a 0 to the start of a 12 digit UPC code, you get a valid 13 digit EAN. EAN codes starting with different numbers identify products from different countries. One particularly interesting country is "Bookland". Bookland is the magical country where all books come from. The Bookland EAN is simply made of the country code "978", the first 9 digits of the book's ISBN, and a new check digit calculated according to the EAN rules. The EAN codes also can have a 5 digit supplement. For books, this often represents the suggested list price. A code of 90000 is a null code, meaning no price is given. However, a code of 50699 would let the reader know that the book had a retail price of $6.99 (the initial 5 indicates a currency in dollars). Of course, this scheme breaks down for books over $100 and so something else will no doubt be needed as inflation raises prices over the years.
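The conversions described above are simple enough to sketch in Python. The EAN check digit weights the first 12 digits 1, 3, 1, 3, ... from the left and rounds the total up to a multiple of 10. The function names are invented for illustration, and the price decoder only covers the two cases mentioned here (a leading 5 for dollars and the 90000 null code); it makes no attempt to guess what other leading digits mean.

```python
from typing import Optional


def ean13_check_digit(first12: str) -> str:
    """Return the check digit for the first 12 digits of an EAN-13 code."""
    if len(first12) != 12 or not (first12.isascii() and first12.isdigit()):
        raise ValueError("expected exactly 12 digits")
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return str((10 - total % 10) % 10)


def upca_to_ean13(upc: str) -> str:
    """Prefix a 12 digit UPC-A code with 0; the check digit is unchanged."""
    if len(upc) != 12 or not (upc.isascii() and upc.isdigit()):
        raise ValueError("expected exactly 12 digits")
    return "0" + upc


def isbn10_to_bookland(isbn10: str) -> str:
    """Build the Bookland EAN: 978 + first 9 ISBN digits + new check digit."""
    s = isbn10.replace("-", "").replace(" ", "")
    if len(s) != 10:
        raise ValueError("expected a 10 character ISBN")
    body = "978" + s[:9]  # the old ISBN check digit is dropped
    return body + ean13_check_digit(body)


def decode_price_supplement(supplement: str) -> Optional[int]:
    """Return the list price in cents, or None for the 90000 null code."""
    if len(supplement) != 5 or not (supplement.isascii() and supplement.isdigit()):
        raise ValueError("expected exactly 5 digits")
    if supplement == "90000":
        return None
    if supplement[0] == "5":  # leading 5: price in dollars
        return int(supplement[1:])
    raise ValueError("leading digit not handled by this sketch")


print(upca_to_ean13("036000291452"))        # 0036000291452
print(isbn10_to_bookland("0-306-40615-2"))  # 9780306406157
print(decode_price_supplement("50699"))     # 699
```

Adding a leading 0 to a UPC code leaves the check digit valid because the new digit contributes nothing to the sum, while the alternating weights line up so that every original digit keeps the weight it had under the UPC rule.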
The 10 digit ISBN code can be a bit restrictive and the numbering system is coming under a bit of pressure. To help solve this, we are currently at the beginning of a transition to 13 digit ISBN codes. The new ISBN codes will actually be EAN codes. Fortunately, the transition should be relatively straightforward. All currently existing 10 digit ISBNs are simply converted into the 13 digit EAN Bookland code. Once the transition is complete (it is scheduled for 2007), new books may be assigned codes in the 979 range, and for those books conversion back to a 10 digit ISBN will no longer be possible.
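Going the other way shows why the 979 range is a one way street. A 978 code can be turned back into a 10 digit ISBN by dropping the prefix and recomputing the old style check digit (weights 10 down to 2, modulo 11, with "X" for 10), but a 979 code has no 10 digit form to return to. A minimal Python sketch, with a made-up function name, that does not bother to verify the incoming EAN check digit:

```python
def isbn13_to_isbn10(isbn13: str) -> str:
    """Convert a 978-prefixed 13 digit ISBN back to its 10 digit form."""
    s = isbn13.replace("-", "").replace(" ", "")
    if len(s) != 13 or not (s.isascii() and s.isdigit()):
        raise ValueError("expected exactly 13 digits")
    if not s.startswith("978"):
        # Codes in the 979 range were never 10 digit ISBNs to begin with.
        raise ValueError("only 978-prefixed codes have a 10 digit equivalent")
    first9 = s[3:12]  # drop the 978 prefix and the EAN check digit
    # Recompute the ISBN-10 check digit: weights 10..2, modulo 11.
    total = sum((10 - i) * int(d) for i, d in enumerate(first9))
    check = (11 - total % 11) % 11
    return first9 + ("X" if check == 10 else str(check))


print(isbn13_to_isbn10("978-0-306-40615-7"))  # 0306406152
```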
And this isn't even the end of things. The 13 digit EAN code is starting to reach some of its limitations and there is already talk of extending it into a 14 digit code. At some point, shouldn't someone realize that adding one digit at a time isn't a scalable solution? When will we be ready to just bite the bullet and adopt a full 128 bit GUID to identify products....
I also spent some time reading about how bar codes are constructed to represent the UPC/EAN codes in a machine readable format. But that is an entirely different story.


