Existing Virtual Assistants include Siri, Google Now, and Cortana, among others. But the idea has been around in computing and science fiction for decades, since at least the 1950s.
In my opinion, the current commercial vision of the Virtual Assistant is deeply flawed. This is a consequence of the current consumer computing environment, which in recent years has become incredibly centralised to a few powerful companies, and very closed, dominated by walled gardens that don’t allow programmatic access to the important parts of their platforms.
Here’s a cool vision of Virtual Assistants from Apple in the 80s
Below, I talk about these flaws as I see them, and about what an alternative type of agent, a “Virtual Advocate”, would do instead.
Briefly, a Virtual Advocate will be like a Virtual Assistant in many ways. You talk to it in natural language, and it responds in natural language. You ask it high level things: “Arrange a meeting with George next Wednesday”, “Are there any changes to the Wikipedia page on Ferrets since yesterday?”, “Cut off my subscription to Fnergling Monthly”. But it is based on open standards; it talks to an ecosystem of cloud based Apps built on open protocols; there can be many alternative implementations of Advocates available across platforms, all talking to the same ecosystem; and the user is in control.
This is preliminary thinking, designed to get a conversation going.
Not Search, but the Browser
This is a deep mistake being made in the first generation of assistants.
The companies making these assistants seem to see them as analogous to Search (aka Google). They are thinking of them as the place people come to ask questions. They reason that if they can provide the answer right there and then, the user won’t leave the company’s sphere of influence, perhaps landing in a competitor’s space instead. The oligopolists look at Google’s domination of search, and their ensuing ability to own advertising, and think: hey, we could wrest away / continue to own that space in the next generation.
But people don’t go to Google as their first point of contact for finding things out or doing things. They go to their phone or their browser. And what can they reach from there? Other people, things made by other people, the entire virtual world. That world includes Google et al, but isn’t Google; Search is still just another app.
A Virtual Advocate that people actually want to use will be an entry point to an open universe of content and functionality made for and by everyone. It will talk to the next generational equivalent of a web page (something like a cognitive app), which will be built on open standards; most importantly on an open protocol.
Note protocol, not platform. A platform is something a closed commercial entity builds to control and contain a space. A protocol describes how a set of disparate entities can cooperate in building an open space. Facebook is a platform; the web is a protocol.
The web exploded, in the face of stiff commercial opposition, because it was based around an open protocol. Anyone could build a browser by implementing a published protocol, and, more importantly, anyone could build a website by implementing a published standard (or using a webserver based on one). No one had to ask anyone’s permission. If you wanted to build a site, and someone else wanted to visit it, no one had to run it by Tim Berners-Lee to see if that’d be ok.
Virtual Advocates will be the same. They will talk to Apps on behalf of users; these apps will be defined by an open behavioural specification. Sign up to the app in a standard way, pass in a query in a standard way, get a response in a standard way, notify the user of new points of interest in a standard way.
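To make that concrete, here’s a purely hypothetical sketch of what such a standard exchange could look like. No such protocol exists yet; every message name and field below is invented for illustration, and a real spec would pin these down properly.

```python
# Hypothetical sketch of an Advocate <-> App exchange over a shared,
# open message format. All message types and fields are invented here.
import json

def make_signup(user_id, app_url):
    """Standard signup message an Advocate could send to any App."""
    return {"type": "signup", "user": user_id, "app": app_url}

def make_query(user_id, text):
    """Standard natural-language query message."""
    return {"type": "query", "user": user_id, "text": text}

class EchoApp:
    """A trivially simple App implementing the imagined protocol."""
    def handle(self, message):
        msg = json.loads(message)
        if msg["type"] == "signup":
            return json.dumps({"type": "ack", "user": msg["user"]})
        if msg["type"] == "query":
            # A real App would do something clever here.
            return json.dumps({"type": "response",
                               "text": "You asked: " + msg["text"]})
        return json.dumps({"type": "error", "reason": "unknown message type"})

app = EchoApp()
reply = app.handle(json.dumps(make_query("alice", "population of Iceland")))
print(json.loads(reply)["text"])  # -> You asked: population of Iceland
```

The point is that any Advocate implementation that can emit these standard messages can talk to any App that implements them, with no permission needed from either side.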
Part of this standard could be the semantic web, but that’s not enough. There needs to be a whole protocol for programmatically talking to clever Apps.
And there can be many, many takes on an Advocate. An advocate is anything that can mediate between users on the one hand, and apps on the other. It could be a fairly stupid thing that doesn’t do much, or it could actively protect the user from the worst excesses of the apps it wrangles, keeping a large no-man’s-land between the apps and the user. It could be a natural language interface combined with a bit of visual webbiness. Or it could be the next generation terminal / command line, a smart shell.
Advocates could be cloud based or cloud backed, but don’t need to be. Need to store data / state? There’s an app for that. They may need a URL. They will likely be platform agnostic.
Smart, but not allowed to solve real problems
The virtual assistant can notice meetings in your email. It can tell you things about your schedule. It can prompt you about your favourite TV show.
To my mind, the biggest problem in personal computing right now (from the point of view of actual users) is overbearing, mind numbing complexity. There are so many services, apps, sites, platforms jostling and fighting to get their slice of our attention, and they don’t play nice.
So what do we spend our days doing? Hunting through all these things for what we need. What’s that notification about, did I already see that somewhere else? What’s the website for my train, and is it warning me of interruptions in service? Where’s that document I made last week; what tool did I even use to make it? What’s Gerry’s email address, or wait, do I talk to him through WhatsApp? Why am I looking at my phone… I’m sure I meant to do something, but now I’m in my Facebook stream. If I install this app, will it do something evil? Should I put my email address into this website?
Meanwhile, AI is on the way. The next wave of computing is to Take X and add AI. Watson APIs promise all kinds of high level AI services, as do others (e.g. Google’s Prediction API). But there’s no way for a user to expand a virtual assistant’s capabilities with specific services built on top of these new APIs (or indeed on machine learning techniques, etc).
Currently we’re on a path toward apps that use these techniques, and big data, to manipulate us in more eyewateringly powerful ways than ever before, while we, like babes in the woods, have no defence. Unaltered humans cannot successfully use these technologies without being preyed upon by them; we require serious algorithmic backup to help us out.
An Advocate should be able to use the tools we already use, on our behalf. Just reading webpages for us would be a boon: “Are there any disruptions to my train listed on the rail company’s website?”, “How many people does Wikipedia say live in Iceland?”, etc. Plus more active uses: “Sign me up to a trial account for that new social network”, “Get me off that mailing list”.
To achieve these things, the advocate should use composable modules, the “Apps” I referred to above. These apps would be the equivalent of the modern website, but be for Advocates to use rather than for people to use directly. As they are for Advocates, they need to be machine accessible, not human accessible. So apps can be less like silos (as modern websites are) and more like the small, composable tools the Linux shell uses. Higher level apps can use lower level apps. Also think dependency injection.
These apps, small composable tools, would advertise what they do, and how to ask them to do it. “I can parse English into a parse tree”, “I can take a parse tree and annotate it with alternate vocabulary”, “I can figure out roughly what an annotated parse tree is about”, “I can take an annotated parse tree asking about the content of a website, look up the website, and answer your question about it”.
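A toy sketch of that advertise-and-compose idea, under heavy assumptions: here each “App” is just a function that declares what kind of thing it consumes and produces, and a naive resolver chains them, shell-pipe style, until it reaches the type the user asked for. The registry, the type names, and the two stand-in apps are all invented for illustration.

```python
# Hypothetical sketch of capability advertisement and composition.
# Each "App" advertises what it consumes and produces; a registry lets
# the Advocate (or higher-level apps) chain them like shell pipes.

REGISTRY = {}

def advertises(consumes, produces):
    """Register a function as an App, advertising its input/output types."""
    def register(fn):
        REGISTRY[(consumes, produces)] = fn
        return fn
    return register

@advertises(consumes="english", produces="parse-tree")
def parse(text):
    return {"tokens": text.split()}   # stand-in for a real parser

@advertises(consumes="parse-tree", produces="topic")
def topic_of(tree):
    return tree["tokens"][-1]         # stand-in for real topic extraction

def run_pipeline(start_type, goal_type, value):
    """Naively chain registered apps from start_type to goal_type."""
    current = start_type
    while current != goal_type:
        for (consumed, produced), fn in REGISTRY.items():
            if consumed == current:
                value, current = fn(value), produced
                break
        else:
            raise LookupError("no app consumes " + current)
    return value

print(run_pipeline("english", "topic", "tell me about ferrets"))  # -> ferrets
```

A real resolver would need proper planning, trust decisions, and versioned capability descriptions, but the shape is the same: small apps declaring capabilities, bigger behaviour composed from them.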
As AI services and techniques continue to improve, the apps available to advocates can become more and more interesting. An open, composable architecture for fulfilling user requests using clever combinations of apps will make the best use of these emerging technologies, and, by strictly mediating our interactions, harness them in our interests rather than letting them run wild against us.
Not made for us
The assistants are being built with the oligopolistic tech companies’ interests in mind rather than those of their users. The usual design approach is employed, in which just enough value is delivered to the user to keep them engaged (superficial things about flight times, when movies are on, travel time to somewhere), and then the bulk of the work is in “scaling” – i.e. gathering as many of us in as possible and keeping us coming back over and over.
The services are closed (the user can’t expand them and can barely customise them), they deal in trivialities and shopping, and they are structured so that great corpuses of data are gathered about individuals, data the individual herself is entirely walled off from. That’s an architecture we take entirely for granted now, apparently barely worthy of note, but I find it important.
Increasingly, regular users understand the Faustian bargain of the computing devices and environment we are being offered. We understand that these services are manipulative and serve their corporate owners, and we hope we can eke out more value than we lose. At the same time we feel we must use these services because of the negative side of network effects: everyone else is using them, so if we don’t, we are on the outside of important networks.
The devices we own and use now are a combination of Skinner box and shopping mall. No longer are they about productivity enhancement and personal empowerment; they are about superstimulus and manipulation. Devices have appstores, apps have in-app payments. Apps scream for our attention, and feed us small, frequent, random positive rewards for our attention, so we can’t look away. The manipulation professions (marketing, advertising, …?) have flooded online and reach their fingers out toward us through our devices.
Advocates will be the new browsers / phone appstore + launcher. But unlike browsers, they will mediate strongly between us and the frenemy apps, helping us get use from apps without being violated. They might carefully limit the data being sent to abusive apps, warn us of potential abuses, notice problematic notification behaviour, decide to prioritise competing information from competing apps in ways the user wants rather than ways a platform owner or app provider wants.
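One of those mediation behaviours, limiting the data released to an app, can be sketched in a few lines. This is a toy, assuming invented app names and a made-up per-app policy format; the point is only that the policy belongs to the user’s Advocate, not to the app or the platform.

```python
# Hypothetical sketch of an Advocate mediating data flow to apps:
# it releases only the fields the user's policy allows for that app,
# so an over-curious app never sees the rest. All names invented.

USER_PROFILE = {
    "name": "Alice",
    "email": "alice@example.org",
    "location": "Reykjavik",
    "contacts": ["bob", "carol"],
}

# Per-app data policy chosen by the user, not by the app provider.
POLICIES = {
    "trusted-calendar-app": {"name", "email"},
    "nosy-social-app": {"name"},
}

def mediated_request(app_name, requested_fields):
    """Return only the requested fields the user's policy allows,
    and warn the user about anything that was withheld."""
    allowed = POLICIES.get(app_name, set())
    released = {f: USER_PROFILE[f]
                for f in requested_fields if f in allowed}
    withheld = sorted(set(requested_fields) - allowed)
    if withheld:
        print(f"warning: withheld from {app_name}: {withheld}")
    return released

# The nosy app asks for everything; it gets only what the user allows.
data = mediated_request("nosy-social-app", ["name", "email", "contacts"])
```

Real mediation would be far richer (rate-limiting notifications, flagging dark patterns, ranking competing apps’ claims), but all of it has this same shape: the Advocate sits in the middle and enforces the user’s rules.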
Advocates should be able to vary wildly in behaviour, as long as they ultimately talk to the apps the user chooses to subscribe to. This should encourage experimentation and innovation, and as a side effect penalise closed implementations (why go there if there are open, inspectable, trustable alternatives?).
The modern personal computing environment is more powerful than it’s ever been, but also a bit dystopic and depressing. The current phase, the mobile+cloud phase begun by the iPhone, which killed the optimism of Web 2.0 and replaced it with the shiny shopping malls of Steve Jobs’s vision, needs to end. AI is coming, and if we don’t correct course, it’s going to be used for manipulation and control, and be a tool of disenfranchisement and disillusionment that’ll make commercial television look like flower power.
I believe one part of a swing back could be to promote the open, powerful computing environment of Virtual Advocates + Apps described above, with this smart, truly user centred software component assisting people with the ballooning complexity of the sociotechnological environment and protecting people from the depredations of the other players (particularly commercial ones).
We’re very early on in the Agents phase, probably too early. But it’s the right time for us (common-or-garden rebel coders) to begin talking about this, maybe even to begin trying to bang out some horrible prototypes. Even really stupid Advocates that you have to say painfully specific things to could be useful if a few good apps pop up (eg: stuff that reads webpages for you, stuff that can handle signups, logins, unsubs for websites/services). Or if not useful, then interesting. What more could you ask?
Great thanks to John Hardy and Jodie O’Regan, who’ve listened to my ranting and helped with ideas.