Distributed Promises for AppEngine

This post is a plea for help. I’ve made something pretty cool, in a commercial context, that’s small and contained and useful and which I want to open source. I’ve got no experience in successfully creating open sourced code that other people might want to interact with, so I’m looking for guidance from someone who does, particularly in the Python and/or AppEngine communities.


 

AppEngine is a great platform, one I’ve been working in now, using Python, for a couple of years.

One of the ongoing annoyances for me has been the low-level nature of the abstractions available for distributed programming, particularly using task queues to kick off background tasks (which might want to kick off more background tasks, and so on).

At the most basic, you can create tasks on a queue. The queue is processed, the task runs eventually. The task is a web handler (a “post” handler by default). Simple, but it’s messy to set up if you want to do complex things, lots of boilerplate, lots of messing around with routes and so on.

Then there’s the excellent deferred library. It allows you to kick off a function with some arguments in a task, but hiding all of the task queue messiness. It makes tasks highly useful and usable. But there are still niggles.

Firstly, deferred functions are passed by name. Actually, it’s more complex than this; deferred takes a callable, which might be an object (must be picklable, is pickled), or a function (passed by name for the most part I think, maybe something else going on with built in functions). But in any case we have the following restrictions in the doc:

The following callables can NOT be used as tasks:
1) Nested functions or closures
2) Nested classes or objects of them
3) Lambda functions
4) Static methods

ie: you can’t use really interesting stuff.

The second problem is that you launch a process, but then what? Does it ever execute? How do you know when it’s complete? How can you compose these calls?

As a higher level alternative to using deferred, I’ve made a library that provides distributed promises.

It lets you do things like this:

The interesting thing here is that both the functions are run in (separate) background tasks. If the first one times out, the second one will receive a timeout error as the result.

You’ll recognise promises more or less from the javascript world. These are similar, but structured a little differently.

Javascript’s promises are used for handling an asynchronous environment inside a single process. So you can kick off something asynchronous, and have the handler for that be a closure over the resolve method, meaning you can signal the job is done in a finished handler, without needing to “return” from the originating function.

For python on appengine, if you want to manage asynchronous behaviour inside a single process, try ndb tasklets.

My distributed promises for appengine are instead for managing distributed processing; that is, algorithms coordinating multiple tasks to do something.

The function you pass to your promise (via when, then, etc) will run in a separate task, and can signal it is done by calling resolve() which will trigger further tasks (eg: as above) if necessary.

Also, to make promises really work and be interesting, I’ve included the ability to pass inner functions and closures and so on through to the promises. So while deferred takes only callables that are picklable or referenceable by name, promises take functions (must be functions, not other callables) which are completely serialised with their entire closure context as necessary, and reconstructed in the task in which they will run. So you can do things like this:

Note the functions are referring to names from outside their own scope. Full closures are allowed. Also you can pass lambda functions.
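
The library's own serialisation code is not shown in this post, so the following is only a rough sketch of one way a function and its closure values could be turned into bytes and rebuilt elsewhere. It uses the standard marshal, pickle and types modules under Python 3.8 or later; `dump_function`, `load_function` and `make_adder` are hypothetical names. It assumes the closed-over values are themselves picklable and that the function needs only builtins, not module-level globals.

```python
import builtins
import marshal
import pickle
import types


def dump_function(fn):
    """Serialise a plain function, including the values it closes over."""
    closure_values = tuple(cell.cell_contents for cell in (fn.__closure__ or ()))
    payload = (
        marshal.dumps(fn.__code__),  # the compiled function body
        fn.__name__,
        fn.__defaults__,
        closure_values,              # assumption: these values are picklable
    )
    return pickle.dumps(payload)


def load_function(data):
    """Rebuild a function produced by dump_function."""
    code_bytes, name, defaults, closure_values = pickle.loads(data)
    code = marshal.loads(code_bytes)
    cells = tuple(types.CellType(value) for value in closure_values)
    # Assumption: the function only uses builtins, so a bare globals dict is enough.
    globs = {"__builtins__": builtins}
    return types.FunctionType(code, globs, name, defaults, cells or None)


def make_adder(n):
    # The returned lambda closes over `n`.
    return lambda x: x + n


wire_bytes = dump_function(make_adder(5))
restored = load_function(wire_bytes)
print(restored(10))
```

Marshalled code objects are only portable between identical Python versions, so a real implementation would need to guarantee that the sending and receiving tasks run the same runtime.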

The when() function returns a promise. The then() function returns a promise. There’s an all() function which also returns a promise (as well as variations such as allwhen and allthen). These functions are completely chainable.

The all() functions allow you to do something when a bunch of other things finish, ie:

promiseSpace.all([promises...]).then(somethingfinal)

Python has great exception handling, so I’ve dispensed with Javascript’s separate resolve and reject methods; to reject, just pass an exception to resolve. Also dispensed with are separate success / fail handlers; instead you get a result object with a value property, which throws the exception if the result was actually an exception.
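
The result object described above can be illustrated with a small in-process sketch. This is not the library's code; the `Result` class and `handler` function are hypothetical, and only the `value` property name comes from the description.

```python
class Result(object):
    """Holds either a value or the exception that took its place."""

    def __init__(self, outcome):
        self._outcome = outcome

    @property
    def value(self):
        # Reading .value re-raises a stored exception instead of returning it.
        if isinstance(self._outcome, Exception):
            raise self._outcome
        return self._outcome


def handler(result):
    # One handler covers both success and failure via ordinary try/except.
    try:
        return "got %s" % result.value
    except ValueError as err:
        return "failed: %s" % err


print(handler(Result(3)))
print(handler(Result(ValueError("boom"))))
```

A single handler with normal exception handling replaces the separate success and failure callbacks of Javascript promises.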

Here’s a more red-blooded example of a function for mapping over the datastore, an example of using that to make a count function, and an example of calling that function:

Start at the bottom and work upwards to see how a simple counting method is implemented.

There might be much better ways of writing this; I’m not especially good at using these promises yet!

So, in closing, if you can see some value in this and you could help me with how to best publish this code, please contact me.


Virtual Assistants vs Virtual Advocates

Existing Virtual Assistants include Siri, Google Now, and Cortana (others?). But the idea has been around in computing and science fiction since forever (ie: the 50s).

In my opinion, the current commercial vision of the Virtual Assistant is deeply flawed. This is a consequence of the current consumer computing environment, which in recent years has become incredibly centralised to a few powerful companies, and very closed, dominated by walled gardens that don’t allow programmatic access to the important parts of their platforms.


Here’s a cool vision of Virtual Assistants from Apple in the 80s


Below, I talk about these flaws as I see them, and about what an alternative type of agent, a “Virtual Advocate”, would do instead.

Briefly, a Virtual Advocate will be like a Virtual Assistant in many ways. You talk to it in natural language, it responds in natural language. You ask it high level things: “Arrange a meeting with George next Wednesday”, “Are there any changes to the wikipedia page on Ferrets since yesterday?”, “Cut off my subscription to Fnergling Monthly”. But it is based on open standards, talks to an ecosystem of cloud based Apps built on open protocols, there are potentially many alternative implementations of Advocates available across platforms that talk to the same ecosystem, and the user is in control. 

This is preliminary thinking, designed to get a conversation going.

Not Search, but the Browser

This is a deep mistake being made in the first generation of assistants.

The companies making these assistants seem to see them as analogous to Search (aka Google). They are thinking of them as the place people come to ask questions. They reason that if they can provide the answer right there and then, the user won’t leave the company’s sphere of influence, perhaps landing in a competitor’s space instead. The oligopolists look at Google’s domination of search, and their ensuing ability to own advertising, and think hey, we could wrest away / continue to own that space in the next generation.

But people don’t go to Google to find things out / do things as a first point of contact. They go to their phone or their browser. And what can they reach from this point? Other people, things made by other people, the entire virtual world. Which includes Google et al, but isn’t Google; Search is still just another app.

A Virtual Advocate that people actually want to use will be an entry point to an open universe of content and functionality made for and by everyone. It will talk to the next generational equivalent of a web page (something like a cognitive app), which will be built on open standards; most importantly on an open protocol.

Note protocol, not platform. A platform is something a closed commercial entity builds to control and contain a space. A protocol describes how a set of disparate entities can cooperate in building an open space. Facebook is a platform, the web is a protocol.

The web exploded, in the face of stiff commercial opposition, because it was based around an open protocol. Anyone could build a browser by implementing a published protocol, and, more importantly, anyone could build a website by implementing a published standard (or using a webserver based on one). No one had to ask anyone’s permission. If you wanted to build a site, and someone else wanted to visit it, no one had to run it by Tim Berners Lee to see if that’d be ok.

Virtual Advocates will be the same. They will talk to Apps on behalf of users; these apps will be defined by an open behavioural specification. Sign up to the app in a standard way, pass in a query in a standard way, get a response in a standard way, notify the user of new points of interest in a standard way.

Part of this standard could be the semantic web, but that’s not enough. There needs to be a whole protocol for programmatically talking to clever Apps.

And there can be many, many takes on an Advocate. An advocate is anything that can mediate between users on the one hand, and apps on the other. It could be a fairly stupid thing that doesn’t do much, or it could actively protect the user from the worst excesses of the apps it wrangles, keeping a large no-man’s-land between the apps and the user. It could be a natural language interface combined with a bit of visual webbiness. Or it could be the next generation terminal / command line, a smart shell.

Advocates could be cloud based/backed but don’t necessarily need to be. Need to store data / state? There’s an app for that. They may need a url. They will likely be platform agnostic.

Smart, but not allowed to solve real problems

The virtual assistant can notice meetings in your email. It can tell you things about your schedule. It can prompt you about your favourite TV show. 

Big deal.

To my mind, the biggest problem in personal computing right now (from the point of view of actual users) is overbearing, mind numbing complexity. There are so many services, apps, sites, platforms jostling and fighting to get their slice of our attention, and they don’t play nice. 

So what do we spend our days doing? Hunting through all these things for what we need. What’s that notification about, did I already see that somewhere else? What’s the website for my train, and is it warning me of interruptions in service? Where’s that document I made last week; what tool did I even use to make it? What’s Gerry’s email address, or wait, do I talk to him through whatsapp? Why am I looking at my phone… I’m sure I meant to do something, but now I’m in my facebook stream. If I install this app, will it do something evil? Should I put my email address into this website?

Meanwhile, AI is on the way. The next wave of computing is to Take X and add AI. Watson APIs promise all kinds of high level AI services; as do others (eg: google’s prediction API). But there’s no way for a user to expand a virtual assistant’s capabilities with specific services built on top of these new APIs (or indeed on machine learning techniques, etc).

Currently we’re on a path toward apps that use these techniques, and big data, to manipulate us in more eyewateringly powerful ways than ever before, while we, like babes in the woods, have no defence. Unaltered humans cannot successfully use these technologies and also not be preyed on by them; we require serious algorithmic backup to help us out.

An Advocate should be able to use the tools we already use, on our behalf. Just reading webpages for us would be a boon: “Are there any disruptions to my train listed on the rail company’s website?”, “How many people does wikipedia say live in Iceland”, etc. Plus more active uses: “Sign me up to a trial account for that new social network”, “Get me off that mailing list”.

To achieve these things, the advocate should use composable modules, the “Apps” I referred to above. These apps would be the equivalent of the modern website, but be for Advocates to use rather than for people to use directly. As they are for Advocates, they need to be machine accessible, not human accessible. So, apps can be less like silos (as modern websites are) and more like small, composable tools, like the linux shell uses. Higher level apps can use lower level apps. Also think dependency injection.

These apps, small composable tools, would advertise what they do, and how to ask them to do it. “I can parse english into a parse tree”, “I can take a parse tree and annotate it with alternate vocabulary”, “I can figure out roughly what an annotated parse tree is about”, “I can take an annotated parse tree asking about the content of a website, look up the website, and answer your question about it”. 
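
As a toy illustration of apps that advertise what they accept and produce, and of an advocate chaining them together, here is a minimal sketch. Everything in it is hypothetical: the `App` class, the `plan` and `fulfil` functions, and the capability labels are invented for illustration, and no real protocol is implied.

```python
from collections import deque


class App(object):
    """A small tool that advertises what it accepts and what it produces."""

    def __init__(self, name, accepts, produces, run):
        self.name = name
        self.accepts = accepts
        self.produces = produces
        self.run = run


def plan(apps, start, goal):
    """Breadth-first search for the shortest chain of apps from start to goal."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        kind, chain = queue.popleft()
        if kind == goal:
            return chain
        for app in apps:
            if app.accepts == kind and app.produces not in seen:
                seen.add(app.produces)
                queue.append((app.produces, chain + [app]))
    return None


def fulfil(apps, start, goal, data):
    """Run the data through whichever chain of apps reaches the goal."""
    chain = plan(apps, start, goal)
    if chain is None:
        raise LookupError("no chain of apps from %s to %s" % (start, goal))
    for app in chain:
        data = app.run(data)
    return data


# Two stand-in apps; real ones would be remote services, not lambdas.
APPS = [
    App("tokeniser", "english", "tokens", lambda text: text.lower().split()),
    App("counter", "tokens", "word_count", len),
]

print(fulfil(APPS, "english", "word_count", "How many words here"))
```

The advocate never needs to know in advance which apps exist; it only needs their advertised inputs and outputs, much as a shell pipeline only needs each tool to read and write text.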

As AI services and techniques continue to improve, the apps available to advocates can become more and more interesting. An open, composable architecture for fulfilling user requests using clever combinations of apps will make the best use of these emerging technologies, and harness them in our interests, rather than letting them run wild against our interests, by strictly mediating our interactions.

Not made for us

The assistants are being built with the oligopolistic tech companies’ interests in mind rather than those of their users. The usual design approach is employed in which just enough value is delivered to the user to keep them engaged (superficial things about flight times, when movies are on, travel time to somewhere), and then the bulk of the work is in “scaling” – ie: gathering as many of us in as possible and keeping us coming back over and over.

The services are closed (the user can’t expand them and can barely customise them), they deal in trivialities and shopping, and they are structured so that great corpuses of data are gathered about individuals that the individual herself is entirely walled off from. That’s an architecture we take entirely for granted now, apparently barely worthy of note, but I find it important.

Increasingly, regular users understand the Faustian bargain of the computing devices and environment we are being offered. We understand that these services are manipulative and serve their corporate owners, and we hope we can eke out more value than we lose, in combination with feeling that we must use these services because of the negative side of network effects; ie: everyone else is using them, so if we don’t, we are on the outside of important networks.

The devices we own and use now are a combination of skinner box and shopping mall. No longer are they about productivity enhancement and personal empowerment; they are about superstimulus and manipulation. Devices have appstores, apps have in-app payments. Apps scream for our attention, and feed us small, frequent, random positive rewards for our attention, so we can’t look away. The manipulation professions (marketing, advertising, …?) have flooded online and reach their fingers out toward us through our devices.

Advocates will be the new browsers / phone appstore + launcher. But unlike browsers, they will mediate strongly between us and the frenemy apps, helping us get use from apps without being violated. They might carefully limit the data being sent to abusive apps, warn us of potential abuses, notice problematic notification behaviour, decide to prioritise competing information from competing apps in ways the user wants rather than ways a platform owner or app provider wants.

Advocates should be able to vary wildly in behaviour, as long as they ultimately talk to the apps the user chooses to subscribe to. This should encourage experimentation and innovation, and as a side effect penalise closed implementations (why go there if there are open, inspectable, trustable alternatives?).

Conclusion

The modern personal computing environment is more powerful than it’s ever been, but also a bit dystopic and depressing. The current phase, the mobile+cloud phase begun by the iPhone, that killed the optimism of Web 2.0 and replaced it with the shiny shopping malls of Steve Jobs’s vision, needs to end. AI is coming, and if we don’t correct course, it’s going to be used for manipulation and control and be a tool of disenfranchisement and disillusionment that’ll make commercial television look like flower power.

I believe one part of a swing back could be to promote the open, powerful computing environment of Virtual Advocates + Apps described above, with this smart, truly user centred software component assisting people with the ballooning complexity of the sociotechnological environment and protecting people from the depredations of the other players (particularly commercial ones).

We’re very early on in the Agents phase, probably too early. But it’s the right time for us (common or garden rebel coders) to begin talking about this, maybe even to begin trying to bang out some horrible prototypes. Even really stupid Advocates that you have to say painfully specific things to could be useful if a few good apps pop up (eg: stuff that reads webpages for you, stuff that can handle signups, logins, unsubs for websites/services). Or if not useful, then interesting. What more could you ask?

Great thanks to John Hardy and Jodie O’Regan, who’ve listened to my ranting and helped with ideas.


A Dangerous Idea: Continuous Metadata Sousveillance

I’ve been thinking about Zimmermann’s Law:
“The human population may not be doubling every eighteen months, but the ability of computers to track us doubles every eighteen months.” “The natural flow of technology tends to move in the direction of making surveillance easier.”

He seems to think legislators need to “do something”. But I think you need to work with Moore’s Law, or be crushed by it. Legislators aren’t going to help; they’re actually who you’re trying to defend yourself from!

My instincts are that eventually, computers will be so powerful, networks so capacious, that basic data will be completely impossible to keep under wraps. In that scenario, anyone trying to hide anything will be detectable with a bit of signal processing. Secret organisations will be plainly visible via the negative space they leave in the general data exhaust.

Unfortunately we’re not there. My guess is that we’re about 20 years away from something like that. In the meantime, there’s this massive disparity between what institutions have access to in terms of data and what we have access to.

That disparity is potentially quite dangerous, particularly if it’s completely asymmetrical, as it threatens to be. If it were even a little more symmetrical, I believe that large, secretive institutions would have far more to worry about than regular people. After all, if a bit of your personal information leaks onto the ‘net, it’s just about always going to be harmless. If a bit of the NSA’s private information leaks, all hell breaks loose and they’re suddenly in existential peril.

What we’ve discovered recently is that the content of communications isn’t all that important. It’s the metadata that lets you see the general shape of things, the big picture. That’s why the secure email services are shutting down.

You’d think that we’d be able to use metadata in the reverse direction; see into the three letter agencies by analyzing the big data, seeing them in their exhaust, and in the negative space. But we don’t have access to datasets from cell phones, from cloud providers, from interaction with government agencies. We don’t have enough ability to touch the big data.

But we could make our own big data. That’s where Sousveillance comes in.

Sousveillance is “watching from below”, the counterpoint to Surveillance. Up till now, everything I’ve seen people say about Sousveillance has been around Video. But video sousveillance has a lot of problems: video is large (hard to upload in volume), unwieldy (hard to extract information), and recording video is still difficult to do continuously and inconspicuously.

But we’ve just learned that “metadata” is actually what’s useful. It’s not the content of an event, but the time, date, participants, location, devices involved. All that stuff is what you actually need to extract an interesting big data signal.

All of us now carry devices capable of metadata sousveillance, right now. They’re our mobile phones, tablets, laptops, and soon to be watches, glass, and other wearables.

On these devices, you can monitor all of your own communications. But you also have quite an array of sensors. One of the most interesting and most often forgotten is your network hardware. Your network hardware alone is aware of devices around it: MAC addresses of wifi devices and bluetooth devices. Services advertising themselves on local networks. NFC devices and tags you interact with. Cell towers and related metadata. And so on.

Take that kind of data, stamp it with location, identity and timestamps, and push it online. All the time.
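
A record of that kind could look something like the sketch below. The field names and the `make_observation` and `encode_batch` helpers are hypothetical, chosen for illustration. The upload step is left out because no repository or endpoint is specified here.

```python
import json
import time


def make_observation(observer_id, kind, identifier, lat, lon, seen_at=None):
    """Build one metadata record: what was seen, by whom, where and when."""
    return {
        "observer": observer_id,    # identity of the person or device reporting
        "kind": kind,               # e.g. "wifi", "bluetooth", "cell_tower"
        "identifier": identifier,   # e.g. a MAC address or tower id
        "location": {"lat": lat, "lon": lon},
        "seen_at": time.time() if seen_at is None else seen_at,
    }


def encode_batch(observations):
    """Serialise records as newline-delimited JSON, ready to be pushed somewhere."""
    return "\n".join(json.dumps(obs, sort_keys=True) for obs in observations)


example = make_observation("alice", "wifi", "00:11:22:33:44:55", -34.9, 138.6)
print(encode_batch([example]))
```

Newline-delimited JSON is used here only because it is easy to append to and stream; any compact, append-friendly format would do.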

With the right app or apps, your devices could voluntarily upload streams of metadata to public repositories on the net. Users should be aware and voluntarily participating, but needn’t actually be technical. Just install and go.

With enough people installing such software, the repository we’d get would grow stupendously. And you’d start to see things. Maps of devices inside buildings being picked up by people walking down the street. Clusters of otherwise unrelated mobile devices turning up together in the same places at the same time. Protestors might start mapping devices used by the police, turning up from one place to another. And what else? I’m not sure, but I’m pretty sure there’d be amazing secrets to be uncovered.

Early on it’d likely be fairly dangerous to be involved, because you’d be pretty exposed. You’d be posting your own information freely online, after all. But if the idea spread, it’d start to be safer and more powerful I think.

It’d start forcing secretive institutions to try to obfuscate themselves, or else stop using open protocols. Both paths would really damage those institutions, making them less able to operate in the modern world.

There’d be some pretty amazing technical challenges. Where does this data go? How do you handle, store this massive stream of stuff?

But I think it’s probably doable. And it’s probably necessary, if we’re to push back against Zimmermann’s Law.


How Copyright Makes Books and Music Disappear

An interesting paper by Paul J. Heald. Have a look at the graph; a copyright regime is like burning all the libraries.

[Graph: 2317 New Books from Amazon by Decade]

That’s a graph of new books currently on sale on Amazon, grouped by the decade in which they were first published. Why do new books rapidly drop off starting from those first published in the 1920s? Wikipedia says: “All copyrightable works published in the United States before 1923 are in the public domain.”

Abstract: “A random sample of new books for sale on Amazon.com shows three times more books initially published in the 1850’s are for sale than new books from the 1950’s. Why? This paper presents new data on how copyright seems to make works disappear. First, a random sample of 2300 new books for sale on Amazon.com is analyzed along with a random sample of 2000 songs available on new DVD’s. Copyright status correlates highly with absence from the Amazon shelf. Together with publishing business models, copyright law seems to stifle distribution and access. On page 15, a newly updated version of a now well-known chart tells this story most vividly. Second, the availability on YouTube of songs that reached number one on the U.S., French, and Brazilian pop charts from 1930-60 is analyzed in terms of the identity of the uploader, type of upload, number of views, date of upload, and monetization status. An analysis of the data demonstrates that the DMCA safe harbor system as applied to YouTube helps maintain some level of access to old songs by allowing those possessing copies (primarily infringers) to communicate relatively costlessly with copyright owners to satisfy the market of potential listeners.”

I’ve copied this here so it’s more social-network friendly.


Why are there Wizards?

When I was a young lad, in my first programming job (not even out of uni), an older woman who worked in accounts told me that programming had no future. Apparently her TAFE lecturers had been insistent on this point; programming was being brought into the realm where anyone could do it. Wizards no longer needed.

That was the early nineties, and it made an impression on me, because I felt it was deeply, profoundly wrong. I felt that idea was based in a mistaken view of why technology changes over time with respect to human society (ie: the dynamics of the technium), and what the true role of technologists (especially software people) is.

The Expanding Space of the Possible

Ok, stay with me here…

[Diagram: The Expanding Space of the Possible. Many thanks to Sir Jony Ive for making this diagram.]

When you stand back and look at human endeavour, there are things we can do and things we can’t. The effects of human imagination, competition, and general discontent lead us to be very aware of the boundary between the possible and the impossible.

In the diagram above, points in the space represent logically consistent things we can imagine doing. Those points don’t move, but the boundaries in the diagram (what is possible and what is not) do.

The space of the possible expands over time; certainly in recent history this has been very hard to miss. I suspect there is a fairly tight relationship between population density (and so loosely with population) and the size of the possible, and that it can shrink when population density drops. Think of the space of the possible as what we can do using technology; this is totally dependent on the “level” of our technology. Jared Diamond writes about how technological level varies with population in his books. But in any case, at the modern global scale, expansion is a given.

The Automatium and The Laborium

I’ve split the space of the possible into two regions.

The inner, magenta region, is the Automatium. This represents all the things that we understand so well, have mastered in such depth, that they are fully automated. People involved in a relevant domain of endeavour can access the Automatium trivially and with little thought. In the consumer domain, we can go to the shop and buy an incredible variety of food, get whitegoods that keep things cold, wash things, cook things, communicate with people all over the globe, increasingly access knowledge about anything, and none of it requires much skill or understanding. Social networks such as Facebook, Twitter, and G+, recently brought the job of communicating with sophisticated and intricately constructed networks of dispersed others directly into the Automatium.

The outer, cyan region, is the Laborium. This represents all the things that we can do but that are not automated. They require labour, effort. Often they require skilled practitioners of one profession or another, and teams of people, and capital. Pretty much all paid work is in the Laborium (because it’s the place where money moves). Anything that you would build a service business around is in the Laborium. Using a social network might be in the Automatium, but building a social networking platform is in the Laborium (and on the outer edges, at that).

The outer edge of the Automatium is like a hard floor (the Automation Floor) below which we won’t go, while the outer edge of the Laborium is like a flexible membrane.

Everyone in the Laborium is either standing on the hard floor provided by the outer edge of the Automatium, or standing on someone else’s shoulders. So the size of the Laborium is defined by some combination of the sheer number of people involved, and the complexity of organisation possible. The latter is the maintainable height of people standing on each other’s shoulders.

So why does the whole thing move? The fundamental mechanism is that we keep building more floor beneath us. Things enter the space of the possible at the outer edge, where massive capital, huge collections of people, large chunks of time are required. Our competition with each other, and maybe just our drive to improve, makes some of us try to make these things simpler, cheaper, quicker. So things are moved from the outer edge of the Laborium toward the inner edge (shifting not the point in possibility space, but the Laborium with respect to it). The Laborium is like a churning froth, but it also behaves like a ratchet; once something moves lower, it won’t move higher again.

Inevitably possibilities reach the outer edge of the Automatium, and are laid down as another hard layer of automation floor. People step up onto that. The shift ripples upward, and the outer membrane of the Laborium stretches to encompass new, previously impossible things. The space of the possible grows.

Technological Unemployment

The traditional story of technological unemployment goes as follows:

“Technological unemployment is unemployment primarily caused by technological change. Given that technological change generally increases productivity, it is accepted that technological progress, although it might disrupt the careers of individuals and the health of particular firms, produces opportunities for the creation of new, unrelated jobs.”

In terms of this post, this traditional view is that people and firms work at a fixed point in space. As the automation floor moves past them (and people really don’t see it coming), they fall out of being able to do paid work. But the people involved eventually retrain/retarget/move on, often to something else very much closer to the outer membrane of the laborium, and they’re back in the game. If anything, the traditional situation has the laborium understaffed a lot of the time; we could reach further but we just don’t have the manpower.

Workpocalypse

However, there’s an emerging view that perhaps something has changed recently. Because of modern automation, jobs are being destroyed faster than they are being created. That is, the Automatium is expanding faster than the Laborium.

Particularly, a divergence between productivity and job growth has emerged.

Erik Brynjolfsson of MIT thinks jobs are disappearing for good. This excellent piece in the MIT Technology Review reports:

“Perhaps the most damning piece of evidence, according to Brynjolfsson, is a chart that only an economist could love. In economics, productivity—the amount of economic value created for a given unit of input, such as an hour of labor—is a crucial indicator of growth and wealth creation. It is a measure of progress. On the chart Brynjolfsson likes to show, separate lines represent productivity and total employment in the United States. For years after World War II, the two lines closely tracked each other, with increases in jobs corresponding to increases in productivity. The pattern is clear: as businesses generated more value from their workers, the country as a whole became richer, which fueled more economic activity and created even more jobs. Then, beginning in 2000, the lines diverge; productivity continues to rise robustly, but employment suddenly wilts. By 2011, a significant gap appears between the two lines, showing economic growth with no parallel increase in job creation. Brynjolfsson and McAfee call it the “great decoupling.” And Brynjolfsson says he is confident that technology is behind both the healthy growth in productivity and the weak growth in jobs.”

How does this fit into our picture? What seems to be happening is that the Laborium is shrinking.

The Laborium is all about people. People have all kinds of skills and talents and differences, but we all spread out on fairly constrained continua, especially if compared to automation.

It used to be that specific things were automated away, but now entire classes of things are being hit. This means the automation floor is expanding faster than the Laborium’s outer membrane, shrinking the Laborium overall.

The space remaining is biased toward certain kinds of work. What kind of work is it most biased toward? Work that further accelerates expansion of the Automatium and further shrinks the Laborium.

The Rise of Wizards

People who create technology are people who automate things. You can automate with all kinds of technology, but the most effective technological space for doing this is the space of software.

Software is a unique technology. It’s the most flexible general technology we’ve ever found for taking the imagined and making it real in the quickest, most malleable, and potentially most complex and sophisticated fashion.

The wizards, ie: the people who create software, are the most efficient group at moving the boundaries of the possible. Wizards move the automation floor and move the Laborium membrane, and at global scale the collection of such effort has these boundaries moving ever faster.

Other types of work tend to involve staying relatively static within problem space. But wizards, by nature of what we do, are continually on the move; changing technologies, paradigms, environments, everything. Or if we don’t, then we don’t get to stay being wizards.

True wizards tend to abhor manual repetition. The idea of someone working away in a fixed section of the Laborium, with no plan for eventually automating away that toil, inspires revulsion.

Business loves wizards, because wizards hold out the promise of a true edge in a competitive environment.

In a static environment, everyone has access to the same technologies, talents, ideas. The kinds of things that give one organisation an advantage over another are size, being entrenched, having connections. This all leads to a static environment without much room for change, or for new players.

But the point of wizards is to raise the business closer to the outer membrane of the Laborium; that keeps the business more competitive (the air is more rarefied there!), keeps it away from the doom of the automation floor, and allows a smaller business to outwit a larger one that is not so far out. Often this requires raising the automation floor in the business’s niche, related areas, or sometimes across some orthogonal line when the technology is abstract. Hopefully it involves pushing the membrane further out, and temporarily occupying space that no one else has reached yet.

Why can’t everyone be a Wizard?

When someone tells you that now anyone can do what a wizard does (eg: now I haz visual basic), you know the technologies involved are falling through the automation floor. That’s not where wizards hang out.

Wizards live in tall towers, built high above that floor. As they sink toward the floor, they build new, taller ones.

Up high, near the outer Laborium membrane, is a hostile place. Nothing is easy. Things are possible but very difficult. Ideas haven’t fully coalesced, standards haven’t developed, best practices haven’t developed or are wrong. Compare constructing a web app using the LAMP stack (down in the lower floors of the wizard tower) to building a massive distributed application on something like Heroku or Google AppEngine. Compare building a standard AJAX based Web 2.0 site to a sophisticated mobile app (or set of apps to reach cross platform), or a mobile friendly web app with offline functionality. The newer things are more powerful, but much harder to do. There’s less received wisdom, more primitive tooling, and previously developed instincts tend to be wrong. But the opportunity is much greater.

It seems to take a unique mindset to really be a Wizard. You have to be comfortable with constant change. Increasingly you need to feel good about not thoroughly understanding your technologies, never being comfortable with the technology stack you’re using this week, never really attaining mastery at particular concrete skills. Clearly not everyone can do this, which is why people try to develop simplified, non-wizard friendly versions of programming technologies.

All you can know for sure is that if the tech you are using is starting to feel solid, understood, well developed, then you’re close to the automation floor and need to get moving again.

The Ironic Nature of Wizards

The supreme irony for wizards is that we’ll be the last ones in the Laborium, after everyone has given up on that kind of toil.

Step by step, all other work will be automated away. Every other area will require fewer and fewer people, as the automation floor expands ever more quickly, and whole industries will continue to be sucked down below it, being replaced by organisations working at increasing levels of abstraction, relying on smarter and smarter tech and ever fewer people.

Meanwhile Wizards keep moving the boundaries, always running toward the outer edge.

The Laborium will get thinner and thinner, as technology catches up to and surpasses human ability, unevenly but inexorably. Fewer and fewer people will be in it, and it will come to be dominated by Wizards.

As a great example, here’s an article by a Silicon Valley web developer marvelling at being paid top dollar for seemingly meaningless work (it’s just abstract), while his non-wizard compatriots are increasingly left in the cold: http://www.aeonmagazine.com/living-together/james-somers-web-developer-money/

So by this logic, we Wizards will be the last ones working. The last of us will turn off the lights before we leave.

Meanwhile, the most recent news about the lady from accounts was that she’d been retrenched and was having trouble finding more work.

Why are there Wizards?

Man Of Constant Sorrow

There’s something unusual about this arrangement of ours. Can you pick it?

Jodie and I are recording a bunch of songs “live” in front of an audience next week. If you’re in Adelaide, you’re welcome to come along: next up | emlynandjodieoregan

This is a rehearsal from earlier this afternoon. Our house was infested with teenagers, so we’ve eschewed the kitchen table in favour of Jodie’s singing studio, “The Singing Garden”.


gaedocstore: JSON Document Database Layer for ndb

In my professional life I’m working on a server side appengine based system whose next iteration needs to be really good at dealing with schema-less data; JSON objects, in practical terms. To that end I’ve thrown together a simple document database layer to sit on top of appengine’s ndb, in python.

Here’s the github repo: https://github.com/emlynoregan/gaedocstore

And here’s the doco as it currently exists in the repo; it should explain what I’m up to.

This library will no doubt change as it begins to be used in earnest.

gaedocstore

gaedocstore is MIT licensed http://opensource.org/licenses/MIT

gaedocstore is a lightweight document database implementation that sits on top of ndb in google appengine.

Introduction

If you are using appengine for your platform, but you need to store arbitrary (data defined) entities, rather than pre-defined schema based entities, then gaedocstore can help.

gaedocstore takes arbitrary JSON object structures, and stores them to a single ndb datastore object called GDSDocument.

In ndb, JSON can simply be stored in a JSON property. Unfortunately that is a blob, and so unindexed. This library stores the bulk of the document in first class expando properties, which are indexed, and only resorts to JSON blobs where it can’t be helped (and where you are unlikely to want to search anyway).

gaedocstore also provides a method for denormalised linking of objects; that is, inserting one document into another based on a reference key, and keeping the inserted, denormalised copy up to date as the source document changes. Amongst other uses, this allows you to provide performant REST apis in which objects are decorated with related information, without the penalty of secondary lookups.

Simple Put

When JSON is stored to the document store, it is converted to a GDSDocument object (an Expando model subclass) as follows:

  • Say we are storing an object called Input.

  • Input must be a dictionary.

  • Input must include a key at minimum. If no key is provided, the put is rejected.

    • If the key already exists for a GDSDocument, then that object is updated using the new JSON.
    • With an update, you can indicate “Replace” or “Update” (default is Replace). Replace entirely replaces the existing entity. “Update” merges the entity with the existing stored entity, preferentially including information from the new JSON.
    • If the key doesn’t already exist, then a new GDSDocument is created for that key.
  • The top level dict is mapped to the GDSDocument (which is an expando).

  • The GDSDocument property structure is built recursively to match the JSON object structure.

    • Simple values become simple property values
    • Arrays of simple values become a repeated GenericProperty. ie: you can search on the contents.
    • Arrays which include dicts or arrays become JSON in a GDSJson object, which just holds “json”, a JsonProperty (nothing inside is indexed or searchable)
    • Dictionaries become another GDSDocument
    • So nested dictionary fields are fully indexed and searchable, including where their values are lists of simple types, but anything inside a complex array is not.
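
The mapping rules above can be sketched in plain Python. This isn’t gaedocstore’s code, just an illustration: classify and describe are hypothetical names, and the labels are only strings naming the storage form each rule describes.

```python
def classify(value):
    """Return a label for how a JSON value would be stored, per the rules above."""
    if isinstance(value, dict):
        # Nested dictionaries become another (indexed) document.
        return "GDSDocument"
    if isinstance(value, list):
        if any(isinstance(item, (dict, list)) for item in value):
            # Arrays containing dicts or arrays fall back to an unindexed JSON blob.
            return "GDSJson"
        # Arrays of simple values stay searchable.
        return "repeated GenericProperty"
    return "simple property"


def describe(document):
    """Map each top-level field name of a dict to its storage label."""
    return {name: classify(value) for name, value in document.items()}


example = {
    "name": "Fred",
    "tags": ["some", "tags"],
    "history": [{"year": 2012}],
    "address": {"city": "stuffville"},
}
print(describe(example))
```

In this sketch only "history" ends up as an unsearchable blob, because it is an array that contains a dict.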

eg:

ldictPerson = {
    "key": "897654",
    "type": "Person",
    "name": "Fred",
    "address": 
    {
        "addr1": "1 thing st",
        "city": "stuffville",
        "zipcode": 54321,
        "tags": ['some', 'tags']
    }
}

lperson = GDSDocument.ConstructFromDict(ldictPerson)
lperson.put()    

This will create a new person. If a GDSDocument with key “897654” already existed then this will overwrite it. If you’d like to instead merge over the top of an existing GDSDocument, you can use aReplace = False, eg:

    lperson = GDSDocument.ConstructFromDict(ldictPerson, aReplace = False)
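
To make the difference between “Replace” and “Update” concrete, here is a rough sketch on plain dicts. It is not the library’s implementation: merge_update and put are hypothetical names, and merging recursively into nested dictionaries is my assumption about what “merges” means here.

```python
import copy


def merge_update(stored, new):
    """Merge new over stored, preferring values from new.

    Assumption: nested dicts are merged recursively; anything else is overwritten.
    """
    result = copy.deepcopy(stored)
    for name, value in new.items():
        if isinstance(value, dict) and isinstance(result.get(name), dict):
            result[name] = merge_update(result[name], value)
        else:
            result[name] = copy.deepcopy(value)
    return result


def put(stored, new, replace=True):
    """Return what would be stored: stored is the existing dict, or None if absent."""
    if replace or stored is None:
        # Replace (the default) discards whatever was there before.
        return copy.deepcopy(new)
    return merge_update(stored, new)


stored = {"key": "897654", "name": "Fred",
          "address": {"city": "stuffville", "zipcode": 54321}}
new = {"key": "897654", "address": {"city": "somewheretown"}}

print(put(stored, new))                 # only the new fields survive
print(put(stored, new, replace=False))  # name and zipcode are kept
```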

Simple Get

All GDSDocument objects have a top level key. Normal ndb.get is used to get objects by their key.

Querying

Normal ndb querying can be used on the GDSDocument entities. It is recommended that different types of data (eg Person, Address) are denoted using a top level attribute “type”. This is only a recommended convention however, and is in no way required.

You can query on properties in the GDSDocument, ie: properties from the original JSON.

Querying based on properties in nested dictionaries is fully supported.

eg: Say I store the following JSON:

{
    "key": "897654",
    "type": "Person",
    "name": "Fred",
    "address": 
    {
        "key": "1234567",
        "type": "Address",
        "addr1": "1 thing st",
        "city": "stuffville",
        "zipcode": 54321
    }
}

A query that would return potentially multiple objects including this one is:

GDSDocument.gql("WHERE address.zipcode = 54321").fetch()

or

from google.appengine.ext.ndb import GenericProperty

s = GenericProperty()
s._name = 'address.zipcode'
GDSDocument.query(s == 54321).fetch()

Note that if you are querying on properties below the top level, you cannot do the more standard

GDSDocument.query(GenericProperty('address.zipcode') == 54321).fetch()  # fails

due to a limitation of ndb.
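
If it helps to picture what a dotted name like address.zipcode selects, here is a small stand-in that runs over plain dicts rather than the datastore. The names get_path, matches and query are hypothetical, and this only imitates equality filters; it is a sketch, not how ndb evaluates queries.

```python
_MISSING = object()  # sentinel so a stored None is not confused with "no such field"


def get_path(document, dotted_name):
    """Follow a dotted name such as 'address.zipcode' through nested dicts."""
    current = document
    for part in dotted_name.split("."):
        if not isinstance(current, dict) or part not in current:
            return _MISSING
        current = current[part]
    return current


def matches(document, dotted_name, expected):
    """True if the field exists and equals expected (any element, for lists)."""
    value = get_path(document, dotted_name)
    if value is _MISSING:
        return False
    if isinstance(value, list):
        return expected in value
    return value == expected


def query(documents, dotted_name, expected):
    """Return the documents whose field at dotted_name matches expected."""
    return [d for d in documents if matches(d, dotted_name, expected)]


fred = {"key": "897654", "type": "Person", "name": "Fred",
        "address": {"city": "stuffville", "zipcode": 54321}}
wilma = {"key": "897655", "type": "Person", "name": "Wilma",
         "address": {"city": "somewheretown", "zipcode": 12345}}

print(query([fred, wilma], "address.zipcode", 54321))
```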

If you need to get the json back from a GDSDocument, just do this:

ldict = lgdsDocument.to_dict()

Denormalized Object Linking

You can directly support denormalized object linking.

Say you have two entities, an Address:

{
    "key": "1234567",
    "type": "Address",
    "addr1": "1 thing st",
    "city": "stuffville",
    "zipcode": 54321
}

and a Person:

{
    "key": "897654",
    "type": "Person",
    "name": "Fred",
    "address": // put the address with key "1234567" here
}

You’d like to store the Person so the correct linked address is there; not just the key, but the values (type, addr1, city, zipcode).

If you store the Person as:

{
    "key": "897654",
    "type": "Person",
    "name": "Fred",
    "address": {"key": "1234567"}
}

then this will automatically be expanded to

{
    "key": "897654",
    "type": "Person",
    "name": "Fred",
    "address": 
    {
        "key": "1234567",
        "type": "Address",
        "addr1": "1 thing st",
        "city": "stuffville",
        "zipcode": 54321
    }
}

Furthermore, gaedocstore will update these values if you change address. So if address changes to:

{
    "key": "1234567",
    "type": "Address",
    "addr1": "2 thing st",
    "city": "somewheretown",
    "zipcode": 12345
}

then the person will automatically update to

{
    "key": "897654",
    "type": "Person",
    "name": "Fred",
    "address": 
    {
        "key": "1234567",
        "type": "Address",
        "addr1": "2 thing st",
        "city": "somewheretown",
        "zipcode": 12345
    }
}
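
The expand-then-keep-fresh behaviour above can be imitated with a toy in-memory store. This is only a sketch under simplifying assumptions: LinkingStore is a hypothetical class, it only expands links at the top level of a document, and it finds affected documents with a linear scan rather than a datastore query.

```python
import copy


class LinkingStore(object):
    """Toy in-memory store that expands {"key": ...} links and keeps them fresh."""

    def __init__(self):
        self.documents = {}

    def _expand(self, document):
        expanded = {}
        for name, value in document.items():
            if isinstance(value, dict) and "key" in value:
                source = self.documents.get(value["key"])
                # Copy the linked document inline; keep the bare link if unknown.
                expanded[name] = (copy.deepcopy(source)
                                  if source is not None else dict(value))
            else:
                expanded[name] = copy.deepcopy(value)
        return expanded

    def put(self, document):
        key = document["key"]
        stored = self._expand(document)
        self.documents[key] = stored
        # Refresh every other document that embeds a copy of this one.
        for other in self.documents.values():
            if other is stored:
                continue
            for name, value in list(other.items()):
                if isinstance(value, dict) and value.get("key") == key:
                    other[name] = copy.deepcopy(stored)

    def get(self, key):
        return copy.deepcopy(self.documents[key])


store = LinkingStore()
store.put({"key": "1234567", "type": "Address", "addr1": "1 thing st",
           "city": "stuffville", "zipcode": 54321})
store.put({"key": "897654", "type": "Person", "name": "Fred",
           "address": {"key": "1234567"}})
# Changing the address also changes the copy held inside the person.
store.put({"key": "1234567", "type": "Address", "addr1": "2 thing st",
           "city": "somewheretown", "zipcode": 12345})
print(store.get("897654"))
```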

Denormalized Object Linking also supports pybOTL transform templates. gaedocstore can take a list of “name”, “transform” pairs. When a key appears like

{
    ...
    "something": { "key": XXX },
    ...
}

then gaedocstore loads the key referenced. If found, it looks in its list of transform names. If it finds one, it applies that transform to the loaded object, and puts the output into the stored GDSDocument. If no transform was found, then the entire object is put into the stored GDSDocument as described above.

eg:

Say we have the transform “address” as follows:

ltransform = {
    "fulladdr": "{{.addr1}}, {{.city}} {{.zipcode}}"
}

You can store this transform against the name “address” for gaedocstore to find as follows:

GDSDocument.StorebOTLTransform("address", ltransform)

Then when Person above is stored, it’ll have its address placed inline as follows:

{
    "key": "897654",
    "type": "Person",
    "name": "Fred",
    "address": 
    {
        "key": "1234567",
        "fulladdr": "2 thing st, somewheretown 12345"
    }
}

An analogous process happens to embedded addresses whenever the Address object is updated.
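
As a rough picture of what applying the “address” transform does, here is a stand-in that handles only flat {{.field}} placeholders like the ones in the example. It is not pybOTL and makes no attempt at real bOTL syntax; apply_transform is a hypothetical name, and carrying the source “key” into the output is an assumption based on the example output above.

```python
import re

# Matches placeholders of the form {{.fieldname}} only.
_PLACEHOLDER = re.compile(r"\{\{\.(\w+)\}\}")


def apply_transform(transform, source):
    """Fill each template string's {{.field}} placeholders from source."""
    def render(template):
        # Unknown fields render as an empty string in this sketch.
        return _PLACEHOLDER.sub(
            lambda match: str(source.get(match.group(1), "")), template)

    result = {"key": source["key"]}  # assumption: the link key is always kept
    for name, template in transform.items():
        result[name] = render(template)
    return result


ltransform = {"fulladdr": "{{.addr1}}, {{.city}} {{.zipcode}}"}
laddress = {"key": "1234567", "type": "Address", "addr1": "2 thing st",
            "city": "somewheretown", "zipcode": 12345}

print(apply_transform(ltransform, laddress))
```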

You can look up the bOTL Transform with:

ltransform = GDSDocument.GetbOTLTransform("address")

and delete it with

GDSDocument.DeletebOTLTransform("address")

Desired feature (not yet implemented): If the template itself is updated, then all objects affected by that template are also updated.

Deletion

If an object is deleted, then all denormalized links will be updated with a special key “link_missing”: True. For example, say we delete address “1234567”. Then Person will become:

{
    "key": "897654",
    "type": "Person",
    "name": "Fred",
    "address": 
    {
        "key": "1234567",
        "link_missing": True
    }
}

And if the object is recreated in the future, then that linked data will be reinstated as expected.

Similarly, if an object is saved with a link, but the linked object can’t be found, “link_missing”: True will be included as above.
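
The “link_missing” behaviour can be sketched over a plain dict of documents keyed by their “key”. Again this is an illustration, not the library’s code: resolve_link and delete are hypothetical names, and only top-level links are handled.

```python
import copy


def resolve_link(documents, link):
    """Return the inline copy for a {"key": ...} link, or a link_missing marker."""
    source = documents.get(link["key"])
    if source is None:
        return {"key": link["key"], "link_missing": True}
    return copy.deepcopy(source)


def delete(documents, key):
    """Remove a document and mark every embedded copy of it as missing."""
    documents.pop(key, None)
    for other in documents.values():
        for name, value in list(other.items()):
            if isinstance(value, dict) and value.get("key") == key:
                other[name] = resolve_link(documents, value)


laddress = {"key": "1234567", "type": "Address", "addr1": "1 thing st",
            "city": "stuffville", "zipcode": 54321}
documents = {
    "1234567": laddress,
    "897654": {"key": "897654", "type": "Person", "name": "Fred",
               "address": copy.deepcopy(laddress)},
}

delete(documents, "1234567")
print(documents["897654"]["address"])
```

Resolving the same link again after the address is stored once more would return the full copy, which matches the “reinstated as expected” behaviour described above.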

Updating Denormalized Linked Data Back to Parents

The current version does not support this, but in a future version we may support the ability to change the denormalized information, and have it flow back to the original object. eg: you could change addr1 in address inside person, and it would fix the source address. Note this won’t work when transforms are being used (you would need inverse transforms).

Storing Deltas

I’ve had a feature request from a friend, to have a mode that stores a version history of all changes to objects. I think it’s a great idea. I’d like a strongly parsimonious feel for the library as a whole: it should just feel like “ndb with benefits”.

 
