Esteso Voce First Prototype

This weekend, I completed a first prototype of the Esteso Voce!

The Esteso Voce is an instrument I’ve been designing for some time, for adding harmony and delay capabilities to singing. I talk about this more in previous posts.

Introduction to the prototype:

Basic Operation:

Top view demonstration of how it is played:

Playing the Esteso Voce:

This is the first phrase of “Happy Be Thou, Heavenly Queen”, a two-voice, 14th-century a cappella piece. My playing skills are pretty poor so far.

Update! Jodie just wrote this little piece for the Esteso Voce; she plays and sings it, and I sing the melody part.

The component parts are:

– A Yamaha EZ-EG learning guitar (used for its button-based “strings”: functionally an array of 6 x 12 buttons, and it outputs midi)

– A fairly complex Pure Data patch, relying fundamentally on this PSOLA patch.

– A USB microphone.

– An amp+speaker.

Esteso Voce top view


Towards an Esteso Voce Prototype

After roughly a year and a half, I’ve picked up where I left off with the Esteso Voce.

A quick recap: The Esteso Voce (the “extended voice”) is my concept for an instrument based on the human voice, which would allow a singer to sing harmonies with themselves, and in canon with themselves. Voice would come in via a microphone, be split into multiple identical voices, to each of which delay and pitch transposition is applied by the performer as desired. The performer would use some kind of hand operated instrument to do this.

Previously I wrote about my thinking and goals, and the first concrete ideas.

Recently I’ve been re-inspired to continue down this path. Two things have kicked this off.

1 – I discovered that the PSOLA pitch shifter for Max/MSP has been ported to Pd (the open source counterpart of Max/MSP). Previously I was using a time-limited demo of Max/MSP, but I really didn’t want to shell out the big bucks for a license, and then have to deal with the annoyances that come with it (eg: it would be locked to my machine I think, which I hate; I tend to reinstall my OSes regularly, for a start). I’m just not a closed source kind of guy. So having it available for Pd is awesome (h/t Julian Villegas). Unfortunately it hasn’t been ported to Linux, but the way is open, given that it is now open source.

2 – More importantly, a big conceptual breakthrough. I realised that the control of the extra voices in the Esteso Voce can be modeled on the strings of a stringed instrument.

Voices as Strings

Here’s the model I had in my post from a year and a half ago. Now that I look at it, it looks a lot like the neck of a guitar!

And that was the breakthrough: what if I thought of each voice as a string on a stringed instrument? Strings represent absolute pitches (eg: you choose a fret/position, and that’s a G), whereas the voices of the Esteso Voce are relative to the source voice, and possibly to other strings closer to the source voice. But the model still works.
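To make the difference concrete, here is a minimal Python sketch. It assumes (an assumption for illustration, not a settled design) that strings are numbered outward from the source voice, that each fret is one semitone, and that each string is relative to the next string in, the "possibly" case above. A guitar string adds its fret to a fixed open-string note; an Esteso Voce "string" adds its fret to whatever the voice next closest to the source is doing, so the offsets accumulate. All function names are hypothetical.

```python
from itertools import accumulate
from typing import List


def guitar_note(open_string_note: int, fret: int) -> int:
    """A guitar string: absolute midi note = open-string note + fret."""
    return open_string_note + fret


def string_offsets(frets: List[int]) -> List[int]:
    """Esteso Voce model (assumed): strings ordered outward from the source.

    Each fret is a semitone interval above the previous, closer string,
    so each voice's offset from the source voice is a running total.
    """
    return list(accumulate(frets))


def voice_notes(source_note: float, frets: List[int]) -> List[float]:
    """Output pitches move with the source voice, whatever it is singing."""
    return [source_note + offset for offset in string_offsets(frets)]
```

With the singer on G3 (midi note 55), frets of 4 and 3 give offsets of 4 and 7 semitones, i.e. B and D above: a G major triad. If the singer moves to A3, the same hand shape gives an A major triad, which is the point of the relative model.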

This is important because it enables me to prototype the instrument more easily. Why? Because I can use a midi-controller guitar, which I can just buy. No hardware development.

In my previous thinking, I was going to use a touchscreen computer as the input. That could still work. However, I did buy a cheap touchscreen tablet (an Eken M001), and it’s really not responsive enough. So I’d probably need an expensive multi-touch touchscreen (say, an iPad). And even then, it’s not a lovely surface for an instrument: you can’t feel the controls, and it’s a big rectangle, not shaped to purpose. Plus, I’d have to actually develop the input surface software, and either make it communicate with another computer doing the sound processing, or use a tablet powerful enough and open enough (at least running a full version of Windows) to run Pure Data.

With a midi controller guitar, OTOH, it’s much simpler. They either come with a direct USB interface, or they come with midi outputs, into which you plug a midi-to-USB converter, then USB into the computer. Easy.

And, they then send midi signals, which is something Pure Data can work with natively. Win!
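In Pd this decoding is handled by objects such as [notein], so there is nothing to write there. For the curious, though, here is a Python sketch of what turning raw 3-byte midi note messages into string/fret presses looks like. The one-channel-per-string layout and the standard-tuning open-string notes are assumptions for illustration only; they would need checking against what the Yamaha or YouRock guitars actually send. Names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

NOTE_OFF = 0x80
NOTE_ON = 0x90

# Hypothetical layout: each string sends on its own midi channel (0-5 here),
# using standard guitar tuning as the open-string notes (E4 down to E2).
OPEN_STRING_NOTES = {0: 64, 1: 59, 2: 55, 3: 50, 4: 45, 5: 40}


@dataclass(frozen=True)
class FretEvent:
    string: int
    fret: int
    pressed: bool


def parse_message(msg: bytes) -> Optional[FretEvent]:
    """Decode one 3-byte midi note message; return None for anything else."""
    if len(msg) != 3:
        return None
    status, note, velocity = msg
    kind, channel = status & 0xF0, status & 0x0F
    if kind not in (NOTE_ON, NOTE_OFF) or channel not in OPEN_STRING_NOTES:
        return None
    fret = note - OPEN_STRING_NOTES[channel]
    if fret < 0:
        return None
    # By midi convention, a note-on with velocity 0 also means note-off.
    pressed = kind == NOTE_ON and velocity > 0
    return FretEvent(string=channel, fret=fret, pressed=pressed)
```

Only left-hand presses matter for the Esteso Voce, so anything that is not a note message (control changes, pitch bend and so on) is simply ignored here.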

And, finally, it’s an instrument already well designed for you to pick out chords on the neck as the Esteso Voce would require. Curling your hand around the neck seems like a good approach ergonomically. Playable.

I’ve found two candidate instruments to use here, the YouRock Guitar and the Yamaha EZ series “learning” midi guitars, the Yamaha EZ-AG and the Yamaha EZ-EG.

Crucial to the choice is that I don’t want to use this as a guitar, it’s as a prototype control mechanism for the Esteso Voce. The Esteso Voce doesn’t need the right hand strumming / picking of strings, only the left hand choosing the voicings. So, the midi guitar must support sending signals when you touch the strings with the left hand, and not require you to also play the string with the right hand.

The YouRock Guitar can be played in pure midi mode (so no “game controller” crap), and has a “tap” mode which I think means you can tap the strings on the fretboard and signals are sent. That’s exactly what is needed. There is a question, though, as to whether that’s accurate enough: I came across a comment on Amazon saying the accuracy wasn’t great. So, there’s a question mark.

The Yamaha EZ series are actually my preferred option. Why? Because there are no strings, only buttons, on the neck. Perfect! It means tapping on the neck should be completely accurate; the guitar isn’t trying to guess at taps on the neck from the vibration of strings, but rather it’s getting the signals directly. Again it can be used in a pure midi mode, as required.

So I’m leaning toward the EZ-AG, mainly because the EZ-EG is discontinued, so probably harder to get. It looks as though the EZ-AG is the successor to the EZ-EG, but I’m not certain of that.

In the next post, I’ll describe in detail how this all hangs together.


Esteso Voce Concept #1

Esteso Voce Interface #1

(Update: Next post is here, previous is here)

I am in the process of inventing a new musical instrument, called the Esteso Voce.

In my previous post I talked about the conceptual path leading to the Esteso Voce. In this post, I detail my first concrete concept for the instrument. To the left is the first shot at the interface (think touchscreen), which I will explain below.

What I didn’t make clear in my previous post was the original inspiration for this instrument. My idea was to be able to extend the voice, using external equipment, such that a single singer could sing a motet or madrigal by themselves.

Now obviously this is not easily possible, because the different lines generally have different rhythms and words. What I think is achievable with the Esteso Voce is singing homophonic pieces (eg: Taverner’s The Lamb) and pieces in strict canon, also allowing harmonic variation (eg: Byrd’s Non Nobis Domine, which includes canon-style repetition, with some lines in parallel organum at the fifth). A combination of canon and intricate harmonic transposition in realtime is also achievable, but I don’t know of any existing music that does this.

(Incidentally, I just listened for the first time to that recording of The Lamb linked above, it’s the best choral singing I’ve ever heard.)

Esteso Voce Flow Diagram

Architectural Overview

The Esteso Voce is in essence a vocal harmoniser combined with delay functionality appropriate for singing in canon. It takes a single vocal input, copies it up to 3 times, delays those copies as desired by the player, and then transposes the copied voices up or down in pitch, as controlled in real-time by the player. The original voice signal and the copies are then recombined and output.
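Stripped of the realtime plumbing, that signal flow is just copy, delay, transpose and sum. The Python sketch below works on a whole buffer of samples rather than a live stream, and leaves the pitch shifter as a pluggable function, since the real work there would be done by something like PSOLA. The names are hypothetical, and the default shifter deliberately does nothing.

```python
from typing import Callable, List, Sequence, Tuple

Signal = List[float]
PitchShifter = Callable[[Signal, int], Signal]


def no_shift(signal: Signal, semitones: int) -> Signal:
    """Placeholder pitch shifter; a real build would use e.g. PSOLA here."""
    return list(signal)


def delayed(signal: Sequence[float], samples: int) -> Signal:
    """Delay by prepending silence; the result is `samples` longer."""
    return [0.0] * samples + list(signal)


def mix(signals: Sequence[Sequence[float]]) -> Signal:
    """Sum signals sample by sample, treating shorter ones as silent at the end."""
    out = [0.0] * max(len(s) for s in signals)
    for s in signals:
        for i, x in enumerate(s):
            out[i] += x
    return out


def esteso_voce(source: Sequence[float],
                voices: Sequence[Tuple[int, int]],
                pitch_shift: PitchShifter = no_shift) -> Signal:
    """Mix the source with up to 3 copies, each (delay_samples, semitones)."""
    if len(voices) > 3:
        raise ValueError("the concept allows at most 3 copied voices")
    # Each copy is delayed first, then transposed, as described above.
    copies = [pitch_shift(delayed(source, d), st) for d, st in voices]
    return mix([list(source)] + copies)
```

Keeping the shifter as a parameter means the delay and mixing logic can be checked on its own, and a better pitch shifter can be dropped in later without touching it.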

(I intend the singer and the player to be the same person, but they could of course be two different people.)

In the interface, sets of four vertical lines represent these voices. The bold (red) line indicates the source vocal (which is sung directly by the singer).

Wherever there are three sets of controls in a row, they refer to the three copied vocal lines, not to the source vocal. Each control in such a set affects the voice on the far side of it from the source vocal. Where the effect is relative (eg: in the main harmoniser controls), the control sets that far-side voice relative to the voice on its near side, which may be the source vocal itself.

Primary Harmonizer Controls

The Primary Harmonizer Controls are the main controls for the player to control the transposition of the other vocal lines.

In the diagram, the source voice is voice 2, the second vertical line (the red one). The line to its left is voice 1 and represents a vocal line which will sit below it in pitch. The two lines to its right (voices 3 and 4) represent two vocal lines; the leftmost of these sits above the source line in pitch, and the rightmost sits above that.

How far the lines are transposed is determined in real time by the circle touched by the player in the appropriate column. Voice 1 is controlled by the column of circles between it and the source voice (each represents a number of semitones to transpose the source down by). Voice 3 is controlled by the column of circles between it and the source voice (each represents a number of semitones to transpose the source up by).

Voice 4 is controlled by the column of circles between it and Voice 3, and the circles represent the amount to transpose Voice 3 by, in order to create Voice 4. This is the same as adding together the transposition amounts for the middle and right hand columns (for Voice 3 and Voice 4), and transposing the source voice by that many semitones.
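The three columns boil down to a little arithmetic. A Python sketch, assuming each column reports a non-negative number of semitones (0 when nothing is touched); the function name is hypothetical:

```python
from typing import Dict


def interface_offsets(col1: int, col2: int, col3: int) -> Dict[int, int]:
    """Semitone offsets from the source (voice 2) for the three copied voices.

    Each column value is a non-negative semitone count; 0 = nothing touched.
    """
    if min(col1, col2, col3) < 0:
        raise ValueError("column values are non-negative semitone counts")
    return {
        1: -col1,        # voice 1: col1 semitones below the source
        3: col2,         # voice 3: col2 semitones above the source
        4: col2 + col3,  # voice 4: col3 semitones above voice 3
    }
```

For example, touching 5, 4 and 3 puts voice 1 a fourth below the source, voice 3 a major third above it, and voice 4 a fifth above it.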

These controls are intended to be realtime. So, the circle touched in any column has effect while it is touched, and ceases that effect as soon as it is no longer touched. If multiple circles in any column are touched simultaneously, only one is used (the most recently touched? the least recently touched? something else?). If no circle is touched in the column, then the assumed value to transpose by is zero semitones (as if there was a top most row of circles for zero, one of which was being touched).
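Which circle should win is still an open question. As one possible answer, here is a Python sketch in which the most recently touched circle wins, and releasing it falls back to the previous circle still held (much like "last note priority" on a monophonic synth). The class and method names are hypothetical, and this rule is an arbitrary pick among the options listed above.

```python
from typing import List


class Column:
    """One column of circles, tracking which circles are currently touched."""

    def __init__(self) -> None:
        self._touched: List[int] = []  # semitone values, oldest touch first

    def touch(self, semitones: int) -> None:
        # Re-touching a held circle moves it to the most recent position.
        if semitones in self._touched:
            self._touched.remove(semitones)
        self._touched.append(semitones)

    def release(self, semitones: int) -> None:
        if semitones in self._touched:
            self._touched.remove(semitones)

    def value(self) -> int:
        """Most recently touched circle wins; no touch means 0 semitones."""
        return self._touched[-1] if self._touched else 0
```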

In the next post, I’ll talk about the rest of the interface, including the Secondary Harmonizer Controls, the Auxiliary Controls, and the Delay Controls.



Esteso Voce (preamble)

(Update: Next post is here)

I was having trouble at the start of the year figuring out what projects I’d do this year. So I just rambled back and forth amongst the ideas I had (I’ve posted about them previously here). Finally, obsession emerges from ennui, and I know what it is that I’m going to do.

The project is codenamed Esteso Voce (“Extended Voice”; Italian speakers, please correct me and I’ll fix it). The intent is to create an instrument which extends what a highly trained singing voice can do, allowing the singer (me!) to achieve sounds and music which could not be achieved unaided, but only including sounds which would be judged by an expert in voice as “authentic acoustic sounds”.

Some explanations to come, but first a little rundown of the history of this idea.

A few years ago (2006 it turns out) I had this idea for making the “poly voce”. I really didn’t know much about digital audio, so I conceptualized it from whole cloth.

Here’s the original document where I laid out the vision/requirements as I saw them.

Realtime Vocal Ensemble Instrument

I sent this to Tyrell Blackburn (a classmate of Jodie’s at the time at Adelaide Uni), and he gave me very interesting feedback, the most important of which was to point me at Max/MSP.

I was just going over the email exchange from 2006 (thank you gmail for remembering everything), and noticed that the day after he told me about Max/MSP, I sent him the first prototype working patches for the concept. It was a manic period.

Tyrell also warned me that using a midi keyboard for relative pitches (I talk about it in Realtime Vocal Ensemble Instrument) would confuse people who were used to the keyboard mapping to absolute pitches (ie: everyone). It turns out that was absolutely true!

I got as far as making successive prototypes (thank you to Christian Haynes from the Adelaide Con Electronic Music Unit for introducing me to the poly~ object), but they had a lot of lag, and were never really going to be reliable. I didn’t have a decent laptop and audio equipment for the job at the time, so I was using an underpowered PC, which didn’t help (this stuff is CPU intensive).

I wish I’d made a recording of it in action, but I never did. What I have found are the first prototype patches that I sent to Tyrell, I probably have the others lying around.

Now, the main thing I found out from this process was that this has already been done, this kind of effect is called a Vocal Harmonizer. Christian sent me a link to this, for example. Really amazing stuff. So, I lost my head of steam a bit, thought about buying some off the shelf effects, but never got around to it.

What I also learned in the process was about the PSOLA algorithm (well, I learned only that it exists, and that it requires real-time pitch detection, so all my ranting about no pitch detection is somewhat nullified, but I’ll live). I used a free PSOLA implementation.
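Why would a pitch shifter need pitch detection? PSOLA works on overlapping windows centred on pitch marks, one per pitch period, and re-spaces them to change the pitch, so it has to know the period first. A naive way to estimate the period is autocorrelation: find the lag at which the signal best matches a shifted copy of itself. This Python sketch is illustrative only and is not what the PSOLA implementation uses; the function name and the default frequency limits are assumptions.

```python
import math
from typing import Sequence


def estimate_pitch(signal: Sequence[float], sample_rate: float,
                   fmin: float = 80.0, fmax: float = 1000.0) -> float:
    """Rough fundamental frequency (Hz) from the autocorrelation peak."""
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), len(signal) - 1)
    if min_lag > max_lag:
        raise ValueError("signal too short for the requested range")
    best_lag, best_score = min_lag, -math.inf
    for lag in range(min_lag, max_lag + 1):
        # How well the signal matches itself shifted by `lag` samples.
        score = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag
```

Fed a pure 200 Hz sine sampled at 8 kHz, it should land on or next to a lag of 40 samples. Real voices are much messier, and octave errors are the classic failure of this simple approach.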

Some years passed. During that time I’ve done a few other related things. The most relevant has been a bunch of work with Ableton Live, which I used to make polyphonic music with only my voice, via recording samples in realtime and looping them. Some examples are here, here and here.

The Ableton Live thing shows promise, but honestly I just don’t like that software. It’s just not flexible. There is no user scripting or macros, not much customization for live performance. It has this crappy facility to map keystrokes to actions for using it live, and people go to town with midi->keystroke mapping software, and all kinds of funny stuff, to tie it to foot pedals and the like (and get great results, no doubt about it, eg: this).

And then, I moved to Ubuntu for my main notebook. No more Windows-only software like Ableton. I really want to embrace the free software scene anyway, and there’s a lot of stuff available for Linux, so it’s all good.

And we come to the present. It grabbed me recently that what I want is more than a vocal harmonizer. A vocal harmonizer is an effect, a gimmick. What it does is good, but the context is wrong.

What I want to build is an instrument, not a piece of software, not a set of effects, not a sound module. The aesthetic is that of a violin or a trumpet or a bassoon. Something acoustically rich, simple in concept. Something which you have to learn to play. Something you don’t particularly customize, rather you master it. An instrument is a musical paradigm in itself.

What does all that highfalutin stuff mean, though? Well, it means that it can be limited, not endlessly accommodating. So, if I want vocal harmonizer functions, I can have a limited number of voices (I’m thinking of 4). Also, an instrument should have its own unique control system, and I think I’ve invented something a little bit weird but cool for this (for the next blog post). Finally, an instrument should sound excellent, and define its own sound, not try to sound like something else. Here I’m bending things a little, in that the sound is the human voice, but I want to stick to the following principle. This instrument is not for making a mediocre voice sound better. It is for taking a great voice further than the human voice could otherwise go. If ease of use and beautiful sound conflict, then beauty wins.

The short list of functions this instrument will have are:

  1. A vocal harmoniser, optimising for excellent sound (they should sound like a plausible, unaltered human voice).
  2. Harmonised voices might support gender changes and other timbral modifications; this will probably involve fun with formants.
  3. Voices can be delayed by a multiple of a fixed, player-chosen interval (think roughly a “bar”). This allows the player to sing in canon with themselves.
  4. Harmoniser functions and delay functions can be combined.
  5. May want support for just intonation or other non-even tempered tunings.
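Items 3 and 5 come down to simple arithmetic. Here is a Python sketch, assuming a fixed sample rate and a “bar” length chosen by the player in seconds. The just intonation table is one common 5-limit choice among several, and all names are hypothetical.

```python
from fractions import Fraction


def delay_samples(bars: int, bar_seconds: float, sample_rate: int = 44100) -> int:
    """Delay for a canon voice: a whole number of player-chosen 'bars'."""
    if bars < 0:
        raise ValueError("bars must be non-negative")
    return round(bars * bar_seconds * sample_rate)


def equal_ratio(semitones: int) -> float:
    """Frequency ratio in 12-tone equal temperament."""
    return 2 ** (semitones / 12)


# One common 5-limit just intonation table for the 12 steps within an octave.
JUST_RATIOS = [
    Fraction(1, 1), Fraction(16, 15), Fraction(9, 8), Fraction(6, 5),
    Fraction(5, 4), Fraction(4, 3), Fraction(45, 32), Fraction(3, 2),
    Fraction(8, 5), Fraction(5, 3), Fraction(9, 5), Fraction(15, 8),
]


def just_ratio(semitones: int) -> float:
    """Just frequency ratio, extended by octaves (negative = downwards)."""
    octaves, step = divmod(semitones, 12)
    return float(JUST_RATIOS[step] * Fraction(2) ** octaves)
```

A major third comes out as 1.25 in just intonation against about 1.26 in equal temperament, a small difference that is nonetheless audible as beating in sustained harmonies.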

The devil will be in the details, but the core idea is combining a vocal harmoniser and a simple delay mechanism.

That’s all I’ve got the energy to write tonight. This will be continued.

Oh, the other important thing: I found the free software counterpart to Max/MSP, called Pd. Miller Puckette is a genius. It looks as though my first versions will be built in Pd.

