alps - how it works

The principles behind alps are quite simple. Here is an introduction to most of them, along with a bit of background on the implementation. For more detailed information, please consult the various publications and, of course, the code itself! Much also becomes clearer when looking at the example applications.

alps - using sound for tracking

alps uses sound to estimate the positions of devices in indoor environments. The principle behind alps, acoustic localisation, is applied in positioning and tracking technology in many guises: echolocation, for example, is used in so-called sounders, fish-finders, medical sonography, etc. Typically these technologies use dedicated equipment, with hardware built for the purpose, and ultrasonic sound, in the frequency range above what we humans can hear.

In contrast to these technologies, alps in its current implementation uses ubiquitous technology in the form of standard, commercially available loudspeakers and microphones, primarily designed for frequencies within the human hearing range, that is, for speech and musical content. The processing power of a current laptop is enough to run it. This makes alps a straightforward choice for all applications that require tracking or positioning in situations in which loudspeakers and/or microphones are readily available, as in most surround sound systems, virtual reality applications, conferencing, live sound, and home theatre.

alps uses a pulsed measurement signal between 22 and 30 kHz. This has the advantage of being above the human hearing range, but within the range of most commercially available loudspeakers. Further, because this band lies above the content of audio applications, in most cases the same loudspeakers can be used for localisation and for the content at the same time.

acoustic localisation technique

Here is the principle, as we apply it, in short: as we know how fast sound moves in air, we can calculate the distance between a microphone and a sound source (loudspeaker) by measuring the time delay of a known signal at the microphone in comparison to its original at the loudspeaker. (This might sound complex, but it is in principle the same thing as estimating how far away lightning strikes in a thunderstorm by counting the seconds between the moment we see the lightning and the moment we hear the thunder.) If we do this with several loudspeakers or, in fact, with several microphones, we can trilaterate, or in some cases triangulate, the position of a sound source or, conversely, of a microphone.
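The conversion from a measured delay to a distance can be sketched in a few lines of C++. This is a minimal illustration, not code taken from alps: the function name distanceFromDelay, its parameters and the speed of sound of 343 m/s (dry air at roughly 20 °C) are assumptions made for the example.

```cpp
#include <cassert>

// Assumed speed of sound in dry air at roughly 20 degrees Celsius, in metres per second.
constexpr double kSpeedOfSound = 343.0;

// Convert a delay measured in samples into a distance in metres.
// delaySamples and sampleRate are hypothetical parameter names for this sketch.
double distanceFromDelay(double delaySamples, double sampleRate,
                         double speedOfSound = kSpeedOfSound) {
    assert(sampleRate > 0.0);
    const double delaySeconds = delaySamples / sampleRate;  // samples -> seconds
    return delaySeconds * speedOfSound;                     // seconds -> metres
}
```

With a sampling rate of 96 kHz, for instance, a delay of 280 samples corresponds to a distance of about one metre.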

ultra sound

By using sound frequencies above the range of the human ear, we can locate devices (loudspeakers or microphones, depending on the approach) without the measurement signal being audible. The main advantage of alps, for example compared to optical tracking, is that a "line of sight" is not necessary, for two reasons: firstly, sound also travels in darkness, and secondly, sound diffracts around objects. However, the second point becomes moot at the frequencies we are using, as the wavelength of sound at frequencies over 20 kHz is shorter than about 17 mm, so the sound is reflected by objects larger than this rather than bending around them. alps tries to overcome this difficulty in two ways.

implementation

The difficulty outlined right above can be overcome, firstly, by having a redundant number of measurements, i.e. adding a few loudspeakers or microphones "for good measure", so that when one signal is occluded, another one will be received.

Secondly, alps also tries to use distance measurements directly, without trilateration, wherever these are meaningful. This is the case, for example, in the alps auto-panner, where we track a device moving around a room in order to obtain a panning trajectory for a source represented by that device. As we can think of every sound source as a radiating point source, the distance measured from a point away from the loudspeaker can be interpreted as a reverse estimate of the amplitude of a signal to be sent from that loudspeaker: the fact that we have measured a signal means that the loudspeaker in question is active, and the value of the measurement tells us how large the amplitude needs to be according to the inverse square law. So even if we don't have enough readings to provide us with a pair for 2D tracking (or a triplet for 3D tracking), this "1D" tracking approach provides us with an approximation when trilateration cannot provide an estimate.
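As an illustration of this "1D" idea, a measured distance can be mapped to a loudspeaker gain roughly as follows. This is a sketch under stated assumptions, not the auto-panner's code: gainFromDistance, referenceDistance and exponent are hypothetical names. The exponent is left as a parameter because, for a point source in a free field, intensity falls with the square of the distance (exponent 2) while sound pressure amplitude falls with the distance itself (exponent 1).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Map a measured distance (metres) to a loudspeaker gain in the range (0, 1].
// referenceDistance: distance at or below which the gain is 1 (hypothetical parameter).
// exponent: 1.0 models pressure amplitude (1/r), 2.0 models intensity (1/r^2).
double gainFromDistance(double distance, double referenceDistance = 1.0,
                        double exponent = 1.0) {
    assert(referenceDistance > 0.0 && exponent > 0.0);
    // Clamp so that very small distances never produce a gain above 1.
    const double d = std::max(distance, referenceDistance);
    return std::pow(referenceDistance / d, exponent);
}
```

A device measured at 2 m from a loudspeaker would then give that loudspeaker a gain of 0.5 with exponent 1, or 0.25 with exponent 2.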

a tiny bit of "relativity theory"

As alps estimates distances based on time delays between signals, it doesn't matter whether the delays are between multiple senders or multiple receivers, as long as the signal is known, i.e. we compare all delayed signals to the same original. This implies that we can use the same code to estimate the position of a (moving) microphone in relation to multiple loudspeakers of known position, or the position of a (moving) loudspeaker in relation to microphones of known position. Further, and this is crucial, it doesn't matter which set is moving, the loudspeaker(s) or the microphone(s), as the resulting position is relative to its frame of reference! This means we can use the same code for the alps auto-panner, where we estimate the moving position of one or more microphones within an array of loudspeakers (of known position in relation to each other), as for the alps headtracker, where we track an array of microphones (attached in fixed positions on a headset) with respect to one (or more) loudspeakers!

buffers, windows and correlation

In the inner workings of alps, what actually happens is this: we choose a window of a certain number of samples (a buffer of adequate length for the type of application we are developing, see below for details) and calculate the correlation between the original signal and the delayed signal for each known position. In the resulting correlation signal, the index of the sample with the highest value (the maximum) gives the time delay directly, as we know the sampling rate.
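The correlation step can be sketched as follows. This is a brute-force, time-domain version written for readability, not the alps implementation; the function name estimateDelaySamples and the maxLag parameter are hypothetical, and a real-time system may well prefer an FFT-based correlation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the lag (in samples) at which `received` correlates best with `reference`.
// Lags from 0 to maxLag (inclusive) are searched by direct multiplication and summing.
std::size_t estimateDelaySamples(const std::vector<float>& reference,
                                 const std::vector<float>& received,
                                 std::size_t maxLag) {
    assert(!reference.empty() && !received.empty());
    std::size_t bestLag = 0;
    double bestValue = 0.0;
    bool first = true;
    for (std::size_t lag = 0; lag <= maxLag && lag < received.size(); ++lag) {
        double sum = 0.0;
        // Multiply the reference with the received signal shifted by `lag`.
        for (std::size_t n = 0; n < reference.size() && n + lag < received.size(); ++n) {
            sum += static_cast<double>(reference[n]) * received[n + lag];
        }
        // Keep the lag with the highest correlation value seen so far.
        if (first || sum > bestValue) {
            bestValue = sum;
            bestLag = lag;
            first = false;
        }
    }
    return bestLag;
}
```

Dividing the returned lag by the sampling rate gives the time delay in seconds.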

pulsed signal

alps, as mentioned further up, uses pulsed signals. Here is why: in the early days of this project we used a continuous random noise signal on multiple loudspeakers at the same time and calculated the time delays for each loudspeaker. With this setup we rarely measured more than one correlated signal, usually the one representing the shortest distance. What happened is that every added signal (imagine 8 uncorrelated noise signals playing at the same time from 8 loudspeakers!) is noise for any measurement other than its own. The way round this is to "take turns", i.e. to use a pulsed signal. There is a considerable catch in this, though: when taking turns, we have to wait until the sequence of measurements has completed before we have a position estimate. As a consequence, the more loudspeakers we use, the longer the measurement takes. The answer here is to use the shortest possible pulse length and, in larger spaces, to find out the critical distance beyond which concurrent measurements are possible because the distances are too big to create a problem. (You can use the offsets in the configuration file for this.)
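"Taking turns" amounts to giving each loudspeaker its own time slot within a measurement cycle. The sketch below only illustrates that idea; pulseStartTimes and slotSeconds are hypothetical names and do not describe the format of the alps configuration file.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Start time (in seconds) of each loudspeaker's pulse within one measurement cycle,
// when the loudspeakers take turns. slotSeconds is a hypothetical parameter: the pulse
// length plus the time allowed for the pulse to reach the microphone and die away.
std::vector<double> pulseStartTimes(std::size_t loudspeakerCount, double slotSeconds) {
    assert(slotSeconds > 0.0);
    std::vector<double> starts;
    starts.reserve(loudspeakerCount);
    for (std::size_t i = 0; i < loudspeakerCount; ++i) {
        // Loudspeaker i waits for all loudspeakers before it to finish their slots.
        starts.push_back(static_cast<double>(i) * slotSeconds);
    }
    return starts;
}
```

With 8 loudspeakers and slots of 20 ms, the last pulse starts 140 ms into the cycle, which shows directly how the measurement time grows with the number of loudspeakers.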

However, the reverse procedure, as used in the alps headtracker, is less problematic: when we calculate the distance between multiple microphones and a single loudspeaker, only one measurement signal is necessary. Additional "listeners" don't make noise, only additional "speakers"! For many applications this approach is therefore preferable.

position estimation and filtering

For the position estimation, alps uses trigonometric Euclidean distance calculations on possible pairs of delayed signals, for 2D tracking. This is primarily for demonstration purposes; if you would like to develop something more advanced and in 3D, we suggest you use the option in the configuration file to access the distance readings for all delays directly via OSC, or adjust the code accordingly. However, alps doesn't need the known positions to be on the same plane. For example, the loudspeakers of a system don't need to be on the same level: alps works out the projection onto the plane according to the y-axis value you enter for the position in the configuration file.
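One common way to turn a pair of distances into a 2D position is to intersect two circles centred on the known positions. The sketch below shows that technique together with a Pythagorean projection of a measured distance onto the tracking plane. It is an illustration under assumptions, not the calculation used in the alps code: Point2D, projectOntoPlane, trilaterate2D and the leftOfAB flag are hypothetical names, and the axis names are those of the sketch, not of the alps configuration file.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Coordinates within the tracking plane (axis names belong to this sketch only).
struct Point2D {
    double x;
    double y;
};

// Project a measured (slant) distance onto the tracking plane, given the height
// difference between the known position and the plane.
double projectOntoPlane(double measuredDistance, double heightDifference) {
    const double squared = measuredDistance * measuredDistance
                         - heightDifference * heightDifference;
    return squared > 0.0 ? std::sqrt(squared) : 0.0;
}

// Intersect two circles centred on the known positions a and b with radii ra and rb.
// In general there are two intersections; leftOfAB selects the one on the left when
// looking from a towards b. Returns false if the circles do not meet.
bool trilaterate2D(Point2D a, Point2D b, double ra, double rb,
                   Point2D& result, bool leftOfAB = true) {
    assert(ra >= 0.0 && rb >= 0.0);
    const double dx = b.x - a.x;
    const double dy = b.y - a.y;
    const double d = std::hypot(dx, dy);
    if (d <= 0.0 || d > ra + rb || d < std::fabs(ra - rb)) {
        return false;
    }
    // Distance from a, along the line a-b, to the point closest to the solution.
    const double along = (ra * ra - rb * rb + d * d) / (2.0 * d);
    // Distance from that point to the solution, perpendicular to the line a-b.
    const double across = std::sqrt(std::max(0.0, ra * ra - along * along));
    const double sign = leftOfAB ? 1.0 : -1.0;
    result.x = a.x + (along * dx - sign * across * dy) / d;
    result.y = a.y + (along * dy + sign * across * dx) / d;
    return true;
}
```

With noisy distance readings the two circles may fail to meet, which is why the function reports failure instead of always returning a point; a caller could then fall back on the direct distance readings.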

tuning the system

The challenge is to find the right balance between update rate, area covered and latency for a particular situation: if you want to track an object in a large room, you will have to accept higher latency in exchange for the larger covered area; conversely, the lower you'd like the latency to be, the smaller the covered area. This is a systemic problem, and this is why. Firstly, we cannot "measure faster than sound flies": for every distance we measure, we have to wait at least as long as it takes sound to cover that distance before we can do any calculations. If the tracked device is 100 m away, we need around 0.3 seconds before we can compute its position; if it is 10 m away, 0.03 seconds; and if it is only 1 m away, 0.003 seconds. Secondly, the greater the distances the signals travel, the worse the signal-to-noise ratio gets, as sound attenuates over distance. This means that we need more loudspeakers and/or microphones, which slows the measurement down as the series of necessary pulses becomes longer.
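The trade-off can be put into numbers with a small sketch. It assumes a speed of sound of 343 m/s and a simple model in which each pulse in a cycle needs its own length plus the travel time over the largest distance to be covered; travelSeconds and maxUpdateRateHz are hypothetical names, and the model ignores processing time and reverberation.

```cpp
#include <cassert>
#include <cstddef>

// Assumed speed of sound in dry air at roughly 20 degrees Celsius, in metres per second.
constexpr double kSpeedOfSound = 343.0;

// Time sound needs to cover a distance: the lower bound on the latency of one measurement.
double travelSeconds(double distanceMetres, double speedOfSound = kSpeedOfSound) {
    assert(speedOfSound > 0.0);
    return distanceMetres / speedOfSound;
}

// Upper bound on the update rate when pulsesPerCycle pulses are sent one after another,
// each needing its own length plus the travel time over the largest covered distance.
double maxUpdateRateHz(double maxDistanceMetres, double pulseSeconds,
                       std::size_t pulsesPerCycle) {
    assert(pulsesPerCycle > 0 && pulseSeconds > 0.0);
    const double slotSeconds = pulseSeconds + travelSeconds(maxDistanceMetres);
    return 1.0 / (static_cast<double>(pulsesPerCycle) * slotSeconds);
}
```

Under these assumptions, four loudspeakers covering distances of up to 10 m with 5 ms pulses allow at most about 7 position updates per second; halving the covered distance or the number of loudspeakers raises that limit.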

alps - the code

The alps code is written in C++ and is open source, so feel free to use it, adapt it and adopt it according to the licence.

The code for the current implementation, written by Victor Khashchanskiy, is available here. It is being developed specifically as part of this project in a collaborative effort.

hardware considerations

As to hardware, you will need a (fast) soundcard, microphone(s) with a good signal-to-noise ratio, and loudspeakers with a frequency range up to 30 kHz. Many loudspeakers reach this! We used Genelec 1029A loudspeakers and DPA 4061 microphones. The less directional the loudspeakers and the microphones are, the better. The oldest machine we have used successfully as a processor is a mid-2011 MacBook Air running macOS Sierra. The faster the processor the better, though, as every additional correlation calculation, due to the additional loudspeakers necessary for larger or denser coverage, increases the processing time.

This project is funded via a researcher's grant by Kone Foundation