Saturday, October 15, 2005

Longhorn and longtooth

Some thoughts on WinFX/Longhorn/Avalon/Indigo/WinFS/WCF/Vista, what I call the Microsoft "Codename" platform. I have largely ignored it because most of the information coming from Microsoft starts with "1. This will boost your productivity enormously." But during the past few days I saw some interesting stuff here at the Microsoft Sinergija 05 conference in Belgrade. It all looks like a good next step in the right direction, although I believe that the biggest stone around developers' necks today is not the lack of solutions in the new areas these technologies address, but rather the deficiencies in existing areas. Typical Microsoft? Giving us a Windows 3 "multimedia" graphical interface based on an OS with 8.3 filenames? Could be. The time is ripe to clean up our computing environments: dump the old paradigms and technologies and replace them with new ones, not just add the new on top of the old.

For instance, in .Net 3.0, LINQ will make it easier to query a relational database. But I think the real problem here is the fact that the database is relational in the first place. It's an anachronism. If you create a UML model with inheritance and all the beautiful object-oriented stuff, you will have to cripple it by putting it into a 1970s storage technology called a relational database. How do you implement inheritance in a relational database? Can you even simulate it? It is the 21st century; by now we should be using remoting to persist objects directly into the database and getting real collections returned from our queries. We should create classes instead of database tables, use database-side object methods instead of stored procedures, and handle in-server object events instead of triggers.
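To make the mismatch concrete, here is a minimal sketch of what simulating inheritance in a relational store tends to look like, using Python's built-in sqlite3 module. The `Person`/`Employee` classes, the one-table-per-class schema, and the `save`/`load_all` helpers are all assumptions made up for illustration; this is one common workaround, not the only one.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Person:
    name: str


@dataclass
class Employee(Person):  # the inheritance we would like to keep
    salary: float


# Hypothetical schema: one table per class, tied together by a shared key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY REFERENCES person(id),
        salary REAL NOT NULL
    );
""")


def save(obj):
    """Split one object across the tables of its class hierarchy."""
    cur = conn.execute("INSERT INTO person (name) VALUES (?)", (obj.name,))
    row_id = cur.lastrowid
    if isinstance(obj, Employee):
        conn.execute(
            "INSERT INTO employee (id, salary) VALUES (?, ?)",
            (row_id, obj.salary),
        )
    return row_id


def load_all():
    """Reassemble objects by hand; the database itself only knows rows."""
    rows = conn.execute("""
        SELECT p.name, e.salary
        FROM person p LEFT JOIN employee e ON e.id = p.id
        ORDER BY p.id
    """).fetchall()
    return [
        Person(name) if salary is None else Employee(name, salary)
        for name, salary in rows
    ]


save(Person("Ann"))
save(Employee("Bob", 1000.0))
people = load_all()
```

So it can be simulated, but every class in the hierarchy costs a table, a join, and hand-written code to decide which class each row should become again.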

And this is just one of the things that have outlived their usefulness. How about file storage for another? It is one more impedance mismatch, this time at the threshold between the object world and file systems: you can serialize objects into a file, which is easily done today in .Net or Java. But that's the end of your options. Can you find an object in a file? Not unless you write your own code for it. Can you query the contents of a file? To quote Michael Palin: "Er, not as such". On the other hand, MS Access can query the data inside its files. SQL Server can, too. Their files are meant for internal use by the application (neither the user nor the programmer needs to know much about their existence), and that is the way it should be. So why doesn't Microsoft Word put all of its documents into a single database? Well, you wouldn't be able to copy them to floppy disks, delete them, and so on. All of these operations would have to be implemented by Word. Or, to put it another way: the infrastructure doesn't support it. Still, data files are nothing but a rudimentary replacement for databases. And we're storing data in local files because we don't have local databases.
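A small sketch of that asymmetry, in Python rather than .Net or Java and with JSON standing in for whatever serializer you prefer. The `Note` class, the file name, and the `find_by_title` helper are hypothetical names invented for the example.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Note:
    title: str
    body: str


notes = [Note("groceries", "milk, eggs"), Note("talk", "slides for Belgrade")]

# Serializing is the easy part: one call writes the whole collection.
path = os.path.join(tempfile.mkdtemp(), "notes.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump([asdict(n) for n in notes], f)


def find_by_title(file_path, title):
    """Hand-written 'query': load everything, then scan it in memory."""
    with open(file_path, encoding="utf-8") as f:
        loaded = [Note(**d) for d in json.load(f)]
    return [n for n in loaded if n.title == title]


found = find_by_title(path, "talk")
```

The file system contributes nothing to the lookup; the "query engine" is the list comprehension, and every application has to bring its own.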

So, what's stopping us from having a small database server as an integral part of the OS? To have structured data dumped inside of it, not scattered all over the hard disk, mixed with various DLL, EXE and other system files? To be able to query it. And to query it not just to find "documents containing the word XYZ" but to find "paragraphs containing the italicized word XYZ", and to find them not only inside Word documents but also in Excel, PDF and other files, all from a single query.
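As a rough sketch of the kind of query I mean, assume (and this is a big assumption) that every application saved into one shared XML vocabulary with `<p>` for paragraphs and `<i>` for italics. The `STORE` dictionary below is a made-up stand-in for the OS-level database, and `italic_paragraphs` is a hypothetical helper built on Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical store: documents from different applications, one XML vocabulary.
STORE = {
    "report.doc": "<doc><p>Plain XYZ here.</p><p>An <i>XYZ</i> in italics.</p></doc>",
    "budget.xls": "<doc><p>Cell note: <i>XYZ</i></p></doc>",
    "paper.pdf": "<doc><p>No match at all.</p></doc>",
}


def italic_paragraphs(store, word):
    """Return (document, paragraph text) pairs where `word` is italicized."""
    hits = []
    for name, xml_text in store.items():
        root = ET.fromstring(xml_text)
        for p in root.iter("p"):
            # A paragraph matches only if the word sits inside an <i> element.
            if any((i.text or "").strip() == word for i in p.iter("i")):
                hits.append((name, "".join(p.itertext())))
    return hits


results = italic_paragraphs(STORE, "XYZ")
```

Note that the first paragraph of report.doc contains XYZ but is not returned, because the word is not italicized there; a plain full-text search cannot make that distinction.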

It's not hard to imagine, and Microsoft for one seems to be near the right idea: Office documents can now be saved as XML. If we dumped them into an XML database, we'd be able to do most of the above. Think of the possible uses: I could create notes in my Word file and then quote parts of them in a PowerPoint presentation, using references to (instead of copies of) the original text, so that when the original is updated so is the PPT. I could scribble additional comments inside the PPT, then create a filtered view of it (analogous to a database view) that says "keep the structure but eliminate the scribbled comments". And then set up a replication mechanism (I'd call it e-mail ;)) to have that view's data replicated (sent) to whomever I want. This data "linking and embedding" idea is also nothing new: it looks like the things OLE always promised.
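A toy sketch of the reference-plus-view idea, with plain Python objects standing in for the document database. Everything here (`fragments`, `Ref`, `Comment`, `full_view`, `clean_view`) is a hypothetical name invented for illustration, not an existing Office or OLE API.

```python
from dataclasses import dataclass

# Hypothetical fragment store standing in for the shared document database.
fragments = {"intro": "First draft of my notes."}


@dataclass
class Ref:
    """A slide item that points at a fragment instead of copying its text."""
    fragment_id: str

    def render(self):
        return fragments[self.fragment_id]


@dataclass
class Comment:
    """A scribbled comment that lives only in the presentation."""
    text: str

    def render(self):
        return "[comment] " + self.text


slides = [Ref("intro"), Comment("double-check this")]


def full_view(items):
    """Everything in the presentation, comments included."""
    return [item.render() for item in items]


def clean_view(items):
    """Like a database view: same structure, scribbled comments filtered out."""
    return [item.render() for item in items if not isinstance(item, Comment)]


# Edit the original once; every view that references it sees the change.
fragments["intro"] = "Final version of my notes."
```

After the update, `clean_view(slides)` yields the new text without the comment, and that filtered result is what would get "replicated" to the recipients.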

What's really important to note here is that most of the required technology is already here (or at least near) and most mechanisms have been tested in practice, only not implemented everywhere we need them. We could still use OLE to embed data, just change it to store data in an object/XML database instead of a file. We could use remoting (which is still being developed but has been proven to work well) to access data in a database. We'd use XQuery to query the data, maybe XSD to describe it. The next thing to do is to try to replace the old technologies with new ones. Having a clean and unified environment for developers would mean 1. an enormous boost in productivity.