Wednesday 1 April 2009

Why Inference is Better than Hacking

Large-scale knowledge bases are inherently more flexible than dedicated databases and special-purpose software. In traditional software, a task is decided on in advance; an algorithm to perform that task, and no other, is devised and implemented along with a task-specific data representation; and data is collected and maintained in compliance with that representation. With large-scale knowledge bases, both the data to be processed and the general means of applying that data to tasks are stored in a single logical representation. It is then the responsibility of the inference system, not a programmer, both to identify the steps required to perform the task and to identify the required data and transform it into the right form for the required processing.

This method of computation is fundamentally more powerful than manual programming, just as the invention of stored-programme computation in the 1940s was fundamentally more powerful than the dedicated calculators and patch-panel setups that preceded it. But, just as stored-programme computing required a huge jump in the complexity (and memory) of early computing devices, knowledge-based computing imposes requirements that are only now being satisfied, after sixty years of theoretical and engineering advances. Some of these requirements are physical: to search simultaneously for solution methods, solutions, data transformations, and data, computers must be very powerful and have very large storage. But many of the requirements have been conceptual: we have needed to assemble enough data, in computer-understandable form, to allow solutions in principle; we have needed to assemble enough background knowledge about tasks and data transformations to make a solution findable; and we have needed to develop reasoning techniques that allow a solution actually to be found.
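To make the idea of a single logical representation a little more concrete, here is a minimal sketch of a toy forward-chaining reasoner in Python. It is an illustration only, not how Cyc or any real inference engine works: the tuple encoding, the `?x` variable convention, the function names, and the sample facts and rules are all assumptions made up for this example. The point is that the facts and the rules live in the same representation, and the engine, not the programmer, works out which steps connect them.

```python
def is_var(term):
    """Variables are strings beginning with '?' (a convention assumed for this sketch)."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, fact, bindings):
    """Extend bindings so that pattern equals fact, or return None if they cannot match."""
    if len(pattern) != len(fact):
        return None
    new = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if new.setdefault(p, f) != f:
                return None  # variable already bound to something else
        elif p != f:
            return None
    return new

def satisfy(premises, facts, bindings):
    """Yield every set of bindings that satisfies all premises against the known facts."""
    if not premises:
        yield bindings
        return
    first, rest = premises[0], premises[1:]
    for fact in facts:
        extended = match(first, fact, bindings)
        if extended is not None:
            yield from satisfy(rest, facts, extended)

def substitute(pattern, bindings):
    """Replace bound variables in pattern with their values."""
    return tuple(bindings.get(t, t) for t in pattern)

def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Collect all derivations first, so 'known' is not modified mid-iteration.
            derived = {substitute(conclusion, b) for b in satisfy(premises, known, {})}
            new = derived - known
            if new:
                known |= new
                changed = True
    return known

# Toy knowledge: data and general rules in one representation (hypothetical content).
FACTS = {
    ("isa", "aspirin", "NSAID"),
    ("isa", "NSAID", "drug"),
    ("treats", "NSAID", "inflammation"),
}
RULES = [
    # isa is transitive
    ([("isa", "?x", "?y"), ("isa", "?y", "?z")], ("isa", "?x", "?z")),
    # members of a class inherit what the class treats
    ([("isa", "?x", "?y"), ("treats", "?y", "?z")], ("treats", "?x", "?z")),
]

if __name__ == "__main__":
    kb = forward_chain(FACTS, RULES)
    print(("treats", "aspirin", "inflammation") in kb)
```

Nobody wrote a "what does aspirin treat?" routine here; the answer falls out of general rules applied to stored facts. Exhaustive forward chaining like this is only workable at toy scale, which is one way of seeing why the physical and conceptual requirements described above matter so much.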

The Internet generally, and Web 2.0 and the Semantic Web in particular, are providing the seeds of a solution to the data problem, as are specific high-quality KBs such as UMLS (Bodenreider, 2004) and the AKB (Deaton et al., 2005). The laborious hand assembly of the existing Cyc KB (Lenat, 1995) was required to provide an inferentially productive basis for reasoning over these vastly larger amounts of knowledge.