LONG-TERM DOCUMENT STORAGE

From the earliest days of computing there has been concern about the long-term problems of digital storage. IBM mainframes, for example, use EBCDIC, a character encoding rooted in punched-card codes that predate the ASCII standard. A simple lookup table can translate back and forth between the two.
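As a rough illustration of such a lookup table, the Python sketch below builds a 256-entry table from Python's built-in `cp037` codec. Treating `cp037` as the encoding is an assumption: it is only one of several EBCDIC code pages, and real mainframe data may use a different one. The helper names `ebcdic_to_text` and `text_to_ebcdic` are hypothetical.

```python
# Build a 256-entry lookup table from Python's built-in cp037 codec.
# Assumption: cp037 is one common EBCDIC code page; other variants exist.
EBCDIC_TO_TEXT = bytes(range(256)).decode("cp037")

# Reverse table: character -> EBCDIC byte value.
TEXT_TO_EBCDIC = {ch: code for code, ch in enumerate(EBCDIC_TO_TEXT)}


def ebcdic_to_text(data: bytes) -> str:
    """Translate EBCDIC bytes to text, one table lookup per byte."""
    return "".join(EBCDIC_TO_TEXT[b] for b in data)


def text_to_ebcdic(text: str) -> bytes:
    """Translate text back to EBCDIC bytes using the reverse table."""
    return bytes(TEXT_TO_EBCDIC[ch] for ch in text)


# The word HELLO as it would be stored in cp037.
record = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])
print(ebcdic_to_text(record))  # HELLO
```

Because every byte value maps to exactly one character in this code page, the translation is lossless in both directions.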

Unfortunately, manufacturers designed machines with so many incompatible formats that exchanging data became a real problem. Early home computers such as the Tandy Color Computer and the Commodore 64 could not read each other’s disks, and the IBM PC could not read them either.

The IBM PC, with its army of clones and lower prices, eventually overwhelmed other platforms. This left only game consoles, which were not of material concern.

Even on the PC, text editors used a wide range of file formats, which forced developers to build modules to import rival formats.

Once Windows shipped with its graphical user interface, matters immediately became more complicated. It was not until Office 2007 that Microsoft finally replaced the old binary file formats as the default with the more readable XML-based formats. By this time commercial rivals had long since been marginalized.
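To show why the XML-based formats are easier to recover, here is a hedged Python sketch. A word processing file in these formats is a ZIP archive whose main text lives in `word/document.xml`. The sample archive built here is a stand-in that mimics that layout and is not a complete, valid document; the helper names `make_sample_archive` and `extract_paragraphs` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Assumption: a tiny hand-written body, not a complete valid document.
SAMPLE_XML = (
    f'<w:document xmlns:w="{W_NS}"><w:body>'
    "<w:p><w:r><w:t>Long term storage</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Plain XML inside a ZIP</w:t></w:r></w:p>"
    "</w:body></w:document>"
)


def make_sample_archive() -> bytes:
    """Build an in-memory ZIP that mimics the document layout."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", SAMPLE_XML)
    return buf.getvalue()


def extract_paragraphs(archive: bytes) -> list[str]:
    """Return the text of each paragraph using only the standard library."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread across one or more w:t elements.
        paragraphs.append("".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t")))
    return paragraphs


print(extract_paragraphs(make_sample_archive()))
```

Nothing here depends on the original word processor: a ZIP reader and an XML parser are enough to get the text back, which was not true of the older binary formats.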

The rise of open source software offered a new solution to long-term document storage. Now the programs themselves were documents. Open source operating systems like Linux have seen extensive corporate support. Servers for open source projects have evolved over time.

Hardware is less of a concern. Storage technologies come and go. Tapes and hard disks have been around for more than 50 years, but solid state storage has made inroads. Flash storage has fallen in price so much that it has moved into new markets. Other solutions have come and gone, but over time the trend toward lower costs has continued.

Google has been scanning every book in sight. Older books that are out of copyright are not at issue. Courts ruled that Google can also scan copyrighted books. Digitized books can be converted to text, which can then be spell checked or revised much like this article. Google Books has become a modern library on the internet. Amazon is partnered with Google for their Kindle platform.

HTML is a rich text format. This document is itself HTML, which illustrates document durability. The entire site exists inside a vast data center with enormous storage resources. The document editor is nominally a word processor, with a code view for adding HTML tables, video clips and similar elements. It has some cut and paste capability; however, some manual work is occasionally needed.

Using HTML and a web host in a data center, it's possible to edit and publish a document from any machine equipped with a browser. This ranges from a phone to a tablet, a laptop or a desktop machine. With many data centers, the documents are well protected against loss because they are replicated across nations.

The web host is separate from the database host, and graphics and charts are served by yet another host. Each is a separate entity offered as a service. This is how the idea of the cloud came to be.

This site uses Azure, which is operated by Microsoft. We hope that the corporation survives and that the site remains durable over the long term.

HTML has had many revisions, which have made it more complicated for browsers that must parse content for rendering. Some content posted to this site, as well as to others we operate, was originally written in HTML 2 or HTML 3.2, depending on the vintage of the article.
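A small Python sketch can show what tolerant parsing of older markup looks like. It uses the standard library's `html.parser` to pull the text out of a fragment written in the style of HTML 3.2, with uppercase tags, `FONT` and `CENTER` elements, and unclosed paragraphs. The sample fragment and the names `TextExtractor` and `extract_text` are made up for illustration and are not taken from this site.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of a page, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Skip whitespace-only runs between tags.
        text = data.strip()
        if text:
            self.chunks.append(text)


def extract_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any text still buffered at the end of input
    return " ".join(parser.chunks)


# Hypothetical fragment in an older style: presentational tags, no closing </P>.
OLD_PAGE = (
    '<CENTER><FONT SIZE="+2">Archive Notes</FONT></CENTER>'
    "<P>First paragraph<P>Second paragraph"
)

print(extract_text(OLD_PAGE))  # Archive Notes First paragraph Second paragraph
```

The text survives even though the markup would not pass a modern validator, which is part of what makes HTML a durable storage format.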