Friday 5 January 2018

ORCIDs are used by arXiv to uniquely identify authors. As named entities go, I'm not particularly easy to confuse with anyone else, but it still seems like a good idea.

Monday 29 December 2014

Too Many Languages; Too Much Code; Too Widely Connected

The Cambrian explosion of widely used programming languages (D, Python, Scala, Go, Swift, R, Lua, and so forth) is prima facie evidence that computers must start to write code themselves; people alone can't handle the complexity. One approach to self-programming is automated inference based on a logical language (e.g. Cyc, from Cycorp). Others are problem decomposition, abstraction and chunking (e.g. Soar), and genetic programming. Finally, a very new one is the "Neural Turing Machine". Imagine the power of combining these approaches.
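Of these approaches, genetic programming is the easiest to sketch concretely. Below is a toy, mutation-only illustration of my own (it is not Cyc, Soar, or a Neural Turing Machine, and the target function and mutation scheme are invented for the example): it evolves a small arithmetic expression in x, represented as a Python string, to match a hidden specification on sample inputs.

```python
import random

random.seed(0)

target = lambda x: x * x + 1        # the hidden "specification" to rediscover
samples = [-2, -1, 0, 1, 2, 3]      # input points the evolved program must fit

def error(expr):
    """Sum of squared errors of the candidate expression on the samples."""
    try:
        return sum((eval(expr, {"x": x}) - target(x)) ** 2 for x in samples)
    except Exception:
        return float("inf")         # malformed programs are maximally unfit

def mutate(expr):
    """Wrap the expression in a random operation, restarting small if bloated."""
    if len(expr) > 40:
        return random.choice(["x", "1", "2"])
    op = random.choice(["+", "-", "*"])
    atom = random.choice(["x", "1", "2"])
    return f"({expr}) {op} {atom}"

best = "x"
for _ in range(500):
    candidate = mutate(best)
    if error(candidate) < error(best):   # keep strictly better programs only
        best = candidate

print(best, error(best))
```

Real genetic-programming systems use populations, crossover, and typed expression trees rather than a single string climber; this keeps only the essential loop: mutate, evaluate, keep the better program.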

Saturday 24 December 2011

No Limits to Intelligence

An article earlier in the year in Scientific American suggested that there may be inherent limits on human intelligence, far beyond those imposed by the process of birth.

One intriguing suggestion in the article is that functional specialisation in neuroanatomy may not be driven by algorithmic requirements (i.e. by partitioning the task being the “right way” to go about, for example, seeing), but by connectivity limitations: functions cluster together because it’s not possible to maintain enough longer-distance connections. The article also pointed out that, by some measures at least, even across humans, intelligence is predicted by the speed of neural communication, controlled, roughly, by the number of neural links a signal traverses: long-distance communication, when it is possible, is important.

The main point, however, of the article is that humans are unlikely to be able to evolve to get much more intelligent, but that social connections and technology, outside the brain, may have made that unimportant.

However, the evolutionary limit is a pretty specific thing: it’s a constraint on what a type of organism can become, given what it has been previously. For us as humans, it means where we could get to as a species, given no "non-human" intermediate states and large, but not astronomical, amounts of time. But we are no longer constrained in the same ways. As a simple example, it seems likely that the size of the human birth canal is the result of a trade-off against the survival costs of birth earlier during brain growth. But that constraint is now released: infants can be safely born far earlier, after which their brains could grow as big as they like.

Many of the other constraints the article discusses (transmission speed, neuron size, axon diameter) could also be released if we needed them to be, since we no longer need to respect the path dependencies. For example, we now use myelinated axons for rapid signalling because we started with axons; but in the reachable future we will almost certainly be able to engineer multiplexed metal paths, or even multiplexed optical paths (see the recent experiments on inducing lasing inside cells, under conditions that could probably be made to hold "naturally"). This would simultaneously reduce the limitations on wiring density (since these “neoaxons” could be far thinner than the current hundreds to thousands of nanometres; IC transistors are around 20 nm) and increase transmission speed (by around 2 million times).
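The roughly 2-million-fold figure is simple arithmetic. As a sanity check (using, as rough round numbers, ~150 m/s for fast myelinated axons and propagation at about the speed of light in the engineered paths):

```python
AXON_SPEED = 150.0    # m/s, fast myelinated axon (rough round number)
LIGHT_SPEED = 3.0e8   # m/s, assumed signal speed in an engineered "neoaxon"

speedup = LIGHT_SPEED / AXON_SPEED
print(f"speedup: about {speedup:,.0f}x")  # about 2,000,000x
```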

Networking is also badly limited, as you have heard me say, by our IO channel bandwidth. My guess is that advancement will come both by optimising networking and by reengineering brains, using both biological and non-biological techniques. Imagine, for example, the architecture I’ve sketched at the right, in which a simplified brain, reduced to two sheets of neurons, is provided with a laser optical interconnect instead of axons. The sheets in question (if Wolfram Alpha and I are correct) would be roughly 5 metres on a side. Assuming that they’re on two sides of a five-metre cube, that gives a worst-case interconnect delay of 58 ns. The worst-case delay in a brain, even if there were a directly connected myelinated axon, is more like 2 ms. This one change (somewhat ungainly: we are talking about putting your brain in a 5 m cube) provides a transmission speedup of about 35,000 times.
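The numbers can be checked in a couple of lines; the 58 ns figure comes out, to within rounding, if the worst case is taken as a round trip across the cube's diagonal at the speed of light (my reading of the sketch, not something stated explicitly):

```python
import math

SIDE = 5.0         # m, side of the cube from the sketch
C = 3.0e8          # m/s, speed of light (assumed signal speed)
AXON_WORST = 2e-3  # s, worst-case in-brain axonal delay quoted above

diagonal = SIDE * math.sqrt(3)   # ~8.66 m, longest straight path in the cube
round_trip = 2 * diagonal / C    # ~58 ns
print(f"worst-case delay: {round_trip * 1e9:.0f} ns")       # 58 ns
print(f"speedup vs axon: {AXON_WORST / round_trip:,.0f}x")  # 34,641x, the ~35,000 in the text
```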

In short, even for human-like intelligence, it isn’t clear that natural human evolution matters any more. It's certainly happening, but it's so slow that it won't have any appreciable effect compared with the much faster processes of brain augmentation that are now just beginning to occur. The same argument, on a faster timescale, is also why we (and our computer collaborators) are simply going to win against disease-causing organisms, within the next 50 years.

Tuesday 23 November 2010

A Wikipedia that AI systems can safely learn from

Below is a letter I wrote in reply to a fundraising email from Jimmy Wales, for Wikipedia. I heartily support the wonderful effect of Wikipedia, and was involved in Nupedia, from whose ashes Wikipedia rose phoenix-like. However, I think that an inadvertent but serious error was made at Wikipedia’s founding: the adoption of the GFDL instead of a truly free, CC0-style licence. I believe that the Share-Alike (SA) licence unnecessarily interferes with the freedom of Wikipedia users to use the content, and that this is regrettable. For human users, though, the effect is mitigated by the difficulty of establishing which of their subsequent intellectual work is a derivative work of Wikipedia. For AI systems, however, the SA requirement implies a level of invasion of data privacy (the systems’ own, but also that of the people they interact with) that is wholly unconscionable. The letter suggests a way to remedy this by gradually building up a truly free portion of Wikipedia. I hope it is adopted.


“I think this appeal will be effective. However, its effectiveness for me is reduced by the fact that it's not entirely true that "you can use the information in Wikipedia any way you want". You cannot combine it with other information without infecting that combination with a "Share Alike" obligation that you are then obliged to impose on others.

If you were able to persuade the Foundation to give creators of new articles the choice of creating them under a pure CC licence, with no SA, and if it were permissible to create parallel articles, without reuse of SA content, under the truly FREE CC0 licence, then Wikipedia would be truly free, as in freedom. And, if that happens, I will make Wikipedia my main object of charity, and will encourage others to do so.

If not, perhaps you could modify the language in the appeal to be more legally accurate. However, even as flawed and unfree as it is, Wikipedia remains, at present, a wonderful thing, and I will probably continue to donate, a little reluctantly.”

Friday 19 March 2010

Semantic Data

Part of what we’ve been trying to do with the LarKC project is to scale up AI to tackle real problems. One part of that is supporting the storage of vast amounts of inferentially productive knowledge. The SemData initiative is trying to do just that.




Workshop on Semantic Data Management (SemData)

At the 36th International Conference on Very Large Data Bases

Singapore: 13 - 17 Sept 2010, Grand Copthorne Waterfront Hotel

The Semantic Web represents the next generation Web of Data, where information
is published and interlinked in order to facilitate the exploitation of its
structure and meaning for both humans and machines. Semantic Web applications
require database management systems for the handling of structured data, taking
into consideration the models used to represent semantics. To foster the
realization of the Semantic Web, the World Wide Web Consortium (W3C) developed
a set of metadata models, ontology models, and query languages. Today, most of
the Semantic Web repositories are database engines, which store data
represented in RDF, support SPARQL queries, and can interpret schemata and
ontologies represented in RDFS and OWL. We are thus at the point where the
adoption of semantic technologies is growing. However, these technologies often
still appear immature, and tend to be too expensive or risky to deploy in real
business settings. Solid data management layer concepts, architectures, and tools are
important to everyone in the semantic ecosystem, and creating them requires a
strong community, with a critical mass of involvement.

Semantic data management refers to a range of techniques for the manipulation
and usage of data based on its meaning. It enables sustainable solutions for a
range of IT environments, where the usage of today's mainstream technology is
either inefficient or entirely unfeasible: enterprise data integration, life
science research, data sharing in SaaS architectures, querying linked data on
the Web. In a nutshell, semantic data management fosters the economy of
knowledge, facilitating more comprehensive usage of larger scale and more
complex datasets at lower cost.

The goal of the SemData workshop is to provide a platform for the discussion
and investigation of various aspects related to semantic databases and data
management in the large. Many of the semantic data management challenges
culminate in the need for scalable, performant database solutions for semantic
data, a building block that still lags behind comparable non-semantic
technologies. For semantic technologies to take on the targeted market share,
it is indispensable that technological progress allow semantic repositories to
reach near performance parity with the best RDBMS solutions, without giving up
the advantages of higher query expressivity compared to basic key-value
stores, or of higher schema flexibility compared to the relational model. One
should no longer have to pay a heavy price, in longer run times or more
expensive equipment, for the flexibility of the generic physical model
underlying the graph-based structures of RDF. We recognize that there will
always be a burden with more flexibility; hence, the goal is to minimize the
drawbacks and maximize the advantages of the RDF-minded approach.

The SemData workshop seeks trans-disciplinary expert discussions on issues such
as semantic repositories, their virtualization and distribution, and
interoperability with related database solutions such as relational, XML, graph
databases or others. We thus welcome original academic and industry papers or
project descriptions that propose innovative approaches for semantic data
management in the large, with a particular focus on semantic database solutions,
including their virtualization and distribution.

The topics of interest of this workshop include but are not limited to:
* semantic repositories and databases: storage facilities for semantic artifacts,
RDF repositories, reasoning-supported data management infrastructures, database
schemas optimized for semantic data, indexing structures, storage density and
performance improvements
* distribution, interoperability, and benchmarking: "Classical" semantic storage
subjects: distributed repositories (data partitioning, replication, and
federation); interoperability and integration with RDBMS; performance
evaluation and benchmarking
* virtualized semantic repositories: identification and composition of (fragments
of) datasets in a manner that abstracts applications from the specific setup of
the data management service (e.g. local vs. remote, and distribution)
* semantic data bus: a communication layer bridging the gap between the data
layer and the application layer
* embedded data processing: "move the processing close to the data" mechanisms,
allowing application-specific data processing to be performed within the
semantic repository, e.g. stored procedures and engine extension APIs
* adaptive indexing and multi-modal retrieval: strategies for dynamic
materialization towards specific data- and query-patterns; indexing structures
for specific types of data and queries (FTS, co-occurrence, concordance,
temporal, spatial)

Paper Submission Deadline May 21, 2010
Acceptance Notification June 21, 2010
Camera Ready July 11, 2010
SemData Workshop September 17, 2010

The papers must be submitted in the VLDB format; submissions that do not comply
with the VLDB formatting requirements will be rejected without review. The paper
length is limited to 6 pages.

Karl Aberer
Distributed Information Systems Laboratory LSIR
École Polytechnique Fédérale de Lausanne, Switzerland

Reto Krummenacher
Semantic Technology Institute STI
University of Innsbruck, Austria

Atanas Kiryakov
Ontotext AD, Sofia, Bulgaria

Rajaraman Kanagasabai
Data Mining Department
Institute for Infocomm Research, Singapore

Phone: +43 (0)512 507 6452
Fax: +43 (0)512 507 94906452


Thursday 19 November 2009

Society, eGovernment, Web 3.0 and Us


Today I took part in a panel discussion with Munir Ismet, Andy Mulholland & Anthony Williams at the 5th Ministerial eGovernment Conference 2009, in Malmö, Sweden. The talk made the case that Web 2.0 crowd-sourcing depends on a very limited resource, human attention and communication, and that only by harnessing the collaborative work of people and intelligent computers can we make the systems that support our societies really work.

In researching the talk, I did a quick calculation: there are, more or less, 7 billion people in the world. We can speak about 4 words per second, and each word has about 5 characters (let’s say 16 bits each). That’s 320 bits per second. So all of us, maxing out our output bandwidth (blogging is slower, and can’t really be done while speaking), have an aggregate output data rate of about 2.25 Tbps. Sounds like a lot, doesn’t it? But it’s only a quarter of the bandwidth of one, single, Dense Wavelength Division Multiplexed fibre-optic cable (a fast one).
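For the record, the arithmetic behind those figures (the 16 bits per character is, as above, just a generous round number):

```python
PEOPLE = 7e9          # world population, more or less
WORDS_PER_SEC = 4     # sustained speaking rate
CHARS_PER_WORD = 5
BITS_PER_CHAR = 16    # a generous round number

per_person = WORDS_PER_SEC * CHARS_PER_WORD * BITS_PER_CHAR  # 320 bits/s
total = PEOPLE * per_person
print(f"{per_person} bits/s each; {total / 1e12:.2f} Tbps in aggregate")
# 320 bits/s each; 2.24 Tbps in aggregate (the ~2.25 Tbps in the text)
```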

And yet our systems, from which we derive enormous benefit, saturate the capacity of a very, very large number of optical fibres. We – all of us, all together – cannot possibly monitor all of this, and of course we shouldn’t try. But inflexible computer systems can’t either: at base, inflexible computation outsources flexibility to human beings, and we’re going to run out of people to do that, too.

Here’s the conference programme, and here’s a link to my talk.


Tuesday 18 August 2009

Why we need Human-Computer Collaboration (I)

EDSAC was one of the first computers to implement the stored-program design (image via Wikipedia).
In an important sense, AI is the reason we humans invented computers in the first place. At first the motivation for work on AI was reflective – a desire to better understand the way we think, by looking in a mirror – and mixed with the Everest motive: “because it’s there”; because we might be able to. Of course, actually building computers, which in the 1940s could hardly be expected to run AI software, was driven by the urgent need to break codes and defeat fascism in WWII.
Now, though, we’re facing a situation where we’ve got computers, and we actually need AI. Our society is deeply interconnected: almost everything we do depends on what other people do; our systems depend on other systems; our rules depend on other rules. And there’s no easy way to reduce this interconnectedness; we’ve set things up this way because it allows us to live richer lives. And now there are so many of us humans that our very lives depend on the resource-use efficiencies this interconnection brings now, and will bring in the future.
There’s a risk here, though: it’s not clear that these systems we’ve built are stable; the 2009 financial crisis and the H1N1 pandemic are only proximate examples of barely, and only partially, averted disasters of global interconnection, with global effect. World War I, the Spanish Flu, the great depression, the decimation of the Americas during colonization, and the Spanish Inquisition, are examples of network disasters in earlier, far less interdependent, eras.
At a more personal level, we’re faced with looming networked disasters: failing to notice that your software implements one of millions of patented ideas, or has inadvertently included GPL code, can destroy your livelihood; failure to track, say, the purchase time of a stock lot can lead to a failed tax audit; failure to fully understand a mortgage contract can cost one’s house, and along with it years’ worth of a modest income; have your dangan lost, in China, or gain a felony conviction in the US, and lose any reasonable prospect of a fulfilling career; fail to pay an insurance premium in a country without universal care, and lose your health, or possibly your life.
Systems that can lurch into disaster, in this way, are unstable. Some may be inherently and irreducibly unstable – and those systems we should strive to avoid completely. Others can be kept stable by active control. The problem is that as the systems become faster and more efficient, to our benefit, they also appear to be becoming more unstable, and, since many of the systems are supported by giga-flop computing and speed-of-light communications, the instability can be manifested at super-human speeds, and with super-human complexity.
The term “super-human” is not used here for effect. It’s quite literal. Human beings have limitations, the most important being output bandwidth and memorization speed. These limitations mean that there are problems so complex that human beings – alone or communicating – simply cannot solve them. Even all 6 billion of us. In other respects, including raw computational power and sensory processing, we humans far out-compute even the largest supercomputers. What we need to maintain the stability and increase the effectiveness of our systems is super-human computers, and human super-computers, working together. What we need is a new, AI- and psychology-based field I’ve begun to call “human-computer collaboration”.