Tutorial Contents

Discrete Wavelet Transform (DWT)

What is the DWT?

Comparison of DWT and FFT

Wavelet basis functions

Exploring the DWT

Using the DWT

Time-frequency analysis with DWT

De-noising signals

Threshold de-noising algorithms

De-noising real data


Discrete Wavelet Transform

Most discussions of the topic of wavelet transformation rapidly become very mathematical, and are hard-going for the average biologist. Robi Polikar provides an excellent introductory tutorial that tries to be accessible to non-mathematicians, and the paper by Letelier & Weber (2000) also has a very informative introductory section. My own understanding owes a lot to these sources.

The following description is completely non-mathematical and is intended only to give an intuitive introduction to the topic, plus enough information to be able to interpret the output from wavelet transformations.

What is the Discrete Wavelet Transform?

The discrete wavelet transform (DWT) is a mathematical process that quantifies how much energy is contained within specific frequency bands at particular times within a signal. (The DWT deals with digitized signals; there is a sister continuous wavelet transform that deals with analogue signals, but that is not discussed at all here.) One way for a non-mathematician to get a conceptual understanding of the DWT and its importance is to consider its similarities and differences to the more familiar Fourier transform (see e.g. Graps, 1995).

Comparison of Wavelet and Fourier Transforms

The principle underlying Fourier transformation is that any waveform of any shape can be deconstructed into a series of sine and cosine waves, each with its own frequency and amplitude. (The Fourier deconstruction requires that the signal be stationary, which in this context means that it repeats for infinite time. This is obviously untrue for real signals, but a signal can be treated as though it were stationary so long as it is tapered ("windowed") to 0 at both ends to avoid a likely high-frequency transient at the transition from one repeat to the next.) The amplitude of the wave at a particular frequency reflects how much power that frequency contributes to the original signal. The Fourier transform is fully reversible, which means that the original signal can be reconstructed by adding together all the sinusoidal waves of these different frequencies, with each one weighted by a value that depends upon its power contribution. This is the inverse Fourier transform. The sinusoidal waves are thus a form of basis waveforms, a term that also occurs in principal component analysis.
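
This reversibility is easy to demonstrate numerically. The sketch below (plain NumPy, not part of DataView) builds a signal from two known sine waves, locates their frequencies in the Fourier spectrum, and then recovers the original signal exactly from the inverse transform. The frequencies and sample rate are arbitrary choices for illustration.

```python
import numpy as np

# Test signal: 50 Hz and 120 Hz sine waves, sampled at 1 kHz for 200 samples.
fs = 1000.0
t = np.arange(200) / fs
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Forward transform: one complex coefficient (amplitude + phase) per frequency.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two most powerful bins sit at exactly 50 Hz and 120 Hz.
power = np.abs(spectrum) ** 2
peaks = sorted(float(f) for f in freqs[np.argsort(power)[-2:]])
print(peaks)  # [50.0, 120.0]

# Inverse transform: summing the weighted sinusoids recovers the signal.
reconstructed = np.fft.irfft(spectrum, n=len(signal))
print(np.allclose(signal, reconstructed))  # True
```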

Information in the Time- and Frequency-Domain

When you do Fourier analysis, you start with a raw stream of data that contains a complex mixture of waveforms. These may include rapidly-changing (high frequency) signals such as spikes and slowly-changing (low frequency) signals such as generator potentials, and these may occur at all sorts of times within the record. You know the timing of these events, but you have no information other than visual intuition about the frequency content. After Fourier analysis what you have is a list of frequencies and power, but you do not have any information about the timing of different frequency components within the original waveform. This is not actually lost (it can be recovered by the inverse transform) but it is hidden because in the output from the Fourier transform each frequency component is “smeared” across the whole of the original signal, and the timing information is distributed amongst the phases of the sine waves. So in Fourier analysis you make a complete transformation between information in the time domain to information in the frequency domain.

The big difference between Fourier transformation and wavelet transformation (and the main reason for the usefulness of the latter) is that when you do a wavelet transformation, you only move part of the way from the time domain to the frequency domain. After wavelet analysis, you gain some information about the frequency content of the original signal, but you also retain some information about when in the original signal particular frequencies occur. So, for example, you can pick up a spike as a high-frequency signal and gain information about that frequency, but you can also localise the time of occurrence of that spike. In Fourier analysis by contrast, you will detect that there is a high-frequency component in the record, but you have no idea when it occurs. You could divide your data up into short sections and do a separate Fourier analysis on each (which is the method for producing a spectrogram), but there is no such thing as a free lunch (this trade-off is also known as Heisenberg’s uncertainty principle, although in a rather different context), and the shorter your data chunks, the lower your frequency resolution.
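
The point that timing information is hidden in the phases can be made concrete with a small NumPy experiment (an illustrative sketch, not DataView code): two signals containing the same brief high-frequency burst at very different times have essentially identical Fourier magnitude spectra.

```python
import numpy as np

def burst_signal(n, start):
    """An n-sample zero baseline with a short high-frequency burst at 'start'."""
    x = np.zeros(n)
    cycles = np.arange(32)
    x[start:start + 32] = np.sin(2 * np.pi * 0.25 * cycles)  # burst at fs/4
    return x

early = burst_signal(1024, start=100)
late = burst_signal(1024, start=800)

# Very different signals in the time domain...
print(np.allclose(early, late))  # False
# ...but indistinguishable magnitude spectra: timing lives only in the phase.
print(np.allclose(np.abs(np.fft.rfft(early)), np.abs(np.fft.rfft(late))))  # True
```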

Wavelet Basis Functions

Like the Fourier transform, the wavelet transform deconstructs the original signal waveform into a series of basis waveforms, which in this case are called wavelets. However, unlike the simple sinusoidal waves of Fourier analysis, the wavelet shapes are complex, and, at first sight apparently arbitrary – they look like random squiggles (although in fact they fulfil rigorous mathematical requirements). One important feature that all wavelets share is that they are bounded, i.e. they decline to zero amplitude at some distance either side of the centre, which is in obvious contrast to the sine/cosine waves used in Fourier analysis, which go on forever. This is the underlying key to the time localisation of the DWT.

There are a whole series of different types of “mother” wavelets (Daubechies, Coiflet, Symmlet etc.) available, and each type occurs in a range of sizes (Daubechies-4, Daubechies-8 etc.). A particular episode of wavelet analysis only uses one type of mother wavelet; the user decides which type and size to use depending on the characteristics of the signal to be analysed (and probably some trial-and-error). A couple of examples of mother wavelets are shown below.


Daubechies 4 mother wavelet
Coiflet-30 mother wavelet
Example mother wavelets. a. Daubechies-4. b. Coiflet-30.

After transformation of a raw data signal using a particular mother wavelet you end up with basis waveforms consisting of a series of daughter wavelets. The daughter wavelets are all compressed or expanded versions of the mother wavelet (they have different scales or frequencies), and each daughter wavelet extends across a different part of the original signal (they have different locations). The daughter wavelets are ordered in a hierarchy; at the top of the hierarchy are a series of identical high-frequency wavelets, each occupying a sequence of narrow time locations spanning the duration of the original signal waveform. In this top level there are half as many wavelets as there are samples in the original signal. In the next level down in the hierarchy there are half as many wavelets as in the top level, and these each have half the frequency of the top level, but twice the time duration, again spanning the duration of the original signal. And so on down the hierarchy until we have accounted for as many wavelets as there are samples in the original signal.
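
This halving hierarchy can be seen directly with the PyWavelets library (an independent open-source package, not the software used in this tutorial). With periodized boundary handling, a 64-sample signal decomposed with a Daubechies-6 wavelet yields exactly the structure described above. (PyWavelets may warn that 5 levels is deep for 64 samples with this filter, but the decomposition is still valid.)

```python
import numpy as np
import pywt  # PyWavelets

x = np.zeros(64)  # a 64-sample signal

# Five detail levels plus the residual; 'periodization' mode keeps the
# textbook halving of coefficient counts at each level.
coeffs = pywt.wavedec(x, 'db6', mode='periodization', level=5)
cA5, cD5, cD4, cD3, cD2, cD1 = coeffs

# d1 has half as many coefficients as there are samples, and each deeper
# level halves again; the residual level matches the final detail level.
print([len(c) for c in (cD1, cD2, cD3, cD4, cD5, cA5)])  # [32, 16, 8, 4, 2, 2]
print(sum(len(c) for c in coeffs))                        # 64
```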

Wavelet Coefficients

The important point is that each daughter wavelet is associated with a corresponding coefficient that specifies how much the daughter wavelet at that scale contributes to the raw signal at that location. It is these coefficients that contain the information relating to the original input signal, since the daughter wavelets derived from a particular mother wavelet are completely fixed and independent of the input signal. Like the Fourier transform, the wavelet transform is reversible - you can reconstruct the original signal by adding together the appropriate daughter wavelets, each weighted by its associated coefficient.
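
This reversibility is easy to check numerically; the sketch below uses PyWavelets (not DataView) with an arbitrary random 64-sample signal. (PyWavelets may warn about the decomposition depth; the round trip is exact regardless.)

```python
import numpy as np
import pywt

rng = np.random.default_rng(42)
signal = rng.normal(size=64)  # any 64-sample waveform will do

# Forward DWT: 64 samples in, 64 coefficients out (spread over the hierarchy).
coeffs = pywt.wavedec(signal, 'db6', mode='periodization', level=5)

# Inverse DWT: each fixed daughter wavelet, weighted by its coefficient and
# summed, recovers the original signal to floating-point precision.
restored = pywt.waverec(coeffs, 'db6', mode='periodization')
print(np.allclose(signal, restored))  # True
```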

This may all become clearer when we see the discrete wavelet transform in action.

Exploring the Discrete Wavelet Transform

This file only contains 64 data points, all set to 0, all of which are displayed. Note that both the Fourier and the wavelet transforms in their standard implementations only operate on sections of records containing a power-of-two number of samples, so the whole-file length of 64 meets this criterion.

The dialog applies the transform to the data (in a selected trace) visible in the main view, and displays the resulting wavelet coefficients as a colour-coded map. (The data are truncated to the nearest power-of-2 sample number. If you have two vertical cursors visible in the view, the data are taken from the region between them, similarly truncated.) However, the dialog also allows you to edit the coefficients and immediately apply the reverse transform, so that you can see how each coefficient interacts with the raw signal.

At the top left of the dialog is a drop-down list from which you can select the Wavelet type for the transform. The Daubechies-6 (Daub-6) wavelet is selected by default, and this is a fine default choice. The raw Data are displayed on the bottom trace, but since the data are all zero, this just shows a flat line. The DWT of the raw data is performed as soon as the dialog box appears, and produces another set of 64 numbers, the wavelet coefficients. The squares of these values (val²: their power) are displayed as colours in a colour-coded map towards the top of the dialog box (click the Col table button in the Map frame to see the default coding). The map consists of blocks arranged in a hierarchy of rows and columns, and these correspond to the wavelet hierarchy described above. The blocks are all coloured blue, indicating that all the DWT coefficients have zero value and hence zero power, which is not surprising because the input signal is all zero and hence has no energy. You could show the coefficient colour code rather than the power by selecting Coeff in the Map group.

The top row has 32 equal-width blocks (with 32 being half the number of samples in the original signal) and is referred to as detail level 1 (d1). Each block colour tells you how much power is contained within that time location in the frequency range 250 - 500 Hz. This range derives from the fact that the Nyquist frequency is 500 Hz and so that is the ceiling for frequency analysis, and the detail level spans the frequency range from its upper bound to half its upper bound.
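
The band edges are easy to compute for any sample rate. The short sketch below assumes the 1 kHz sample rate implied by the 500 Hz Nyquist frequency mentioned above.

```python
# Frequency band covered by each DWT detail level at a 1 kHz sample rate.
# Level d1 spans Nyquist/2 to Nyquist; every deeper level halves both edges.
fs = 1000.0
nyquist = fs / 2.0

for level in range(1, 6):
    upper = nyquist / 2 ** (level - 1)
    lower = upper / 2.0
    print(f"d{level}: {lower:g} - {upper:g} Hz")
# d1: 250 - 500 Hz
# d2: 125 - 250 Hz
# d3: 62.5 - 125 Hz
# d4: 31.25 - 62.5 Hz
# d5: 15.625 - 31.25 Hz
```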

The next row down (d2) has 16 blocks and hence half the time resolution, but its blocks show power in the frequency range 125-250 Hz, and hence it has twice the frequency resolution compared to the level above. And so on down.

The bottom two rows each have 2 blocks. The d5 level (one up from the bottom) is conceptually the same as the preceding levels, but the bottom row itself (r6) is a bit different. This is the residual level (also called the approximation, rough or smoothed level) and it contains the remaining power that has not been accounted for in the levels above. It has the same time resolution as the preceding detail level.

How do we interpret the 64 numbers of the transformed data?

In the View/Edit frame at the top-right of the dialog box you will see that the Level box shows 1, the Rel start box shows 30, the Dur box shows 2, and the Coeff box shows 0. The Level value refers to the top row (detail level 1) of the map, and, as indicated at the left of the map, this represents frequencies in the range 250-500 Hz. Note that this is quite a large range of frequency, so the frequency information is not very precise. The Rel start value tells us that we are looking at a time location 30 ms after the start of the record. The Dur value tells us that the coefficient applies to just 2 ms of the data. Note that this value is quite small, so the location information is quite precise (the trade-off between information in the time and frequency domains). The Coeff value tells us that the raw data has no power corresponding to the Daub-6 mother wavelet at this location and frequency. By convention, this coefficient is given the identifying notation d1,15, indicating that it is coefficient 15 (the index is 0-based) in detail level 1, and this is indicated beside the Coeff edit box.

This row represents frequencies in the range 125-250 Hz, which is a narrower range than that of the row above. However, the block duration is broader, at 4 ms of data. Again, the coefficient is zero.

The dialog should now look like this:

Discrete wavelet transform dialog.
The discrete wavelet transform dialog with a Daubechies-6 wavelet occurring at a particular location in the record.

The line graph immediately below the map shows the values of the coefficients in the selected level d1. This is the coefficient graph. The values are all zero, except for the value 1 just near the middle. This reflects the coefficient value that you entered. The bottom line graph shows the result of an inverse DWT where all coefficients are zero except for the one you altered. This is the data graph. Consequently this waveform is an exact representation of the Daubechies-6 mother wavelet at this scale and location. Another way to look at this is in terms of the forward transformation; if the raw data signal displayed in this data graph is subjected to the Daub-6 DWT, it gives coefficients with values of 0 at all locations except the one that we changed by editing. What this means is that if we were to take a Daub-6 mother wavelet and scale it appropriately (expand or contract it in the horizontal time domain), and then step it along the raw data signal, it would precisely match the waveform of the raw data at this location (giving a coefficient value of 1), and that this accounts for all the power in the signal, since at all other locations not covered by the wavelet, the values of the raw data are exactly zero. The mathematics of the DWT perform exactly this scaling, stepping and matching process by means of a complex series of filtering operations.
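
The same experiment can be reproduced with PyWavelets (again, an independent library, not DataView): zero every coefficient of a 64-sample decomposition, set d1,15 to 1, and the inverse transform yields the daughter wavelet itself; transforming that waveform forward returns the lone unit coefficient. (PyWavelets may warn about the decomposition depth; the result is still exact.)

```python
import numpy as np
import pywt

# Decompose an all-zero 64-sample "recording", then edit one coefficient,
# mimicking the dialog: set d1,15 (coefficient 15 of detail level 1) to 1.
coeffs = pywt.wavedec(np.zeros(64), 'db6', mode='periodization', level=5)
coeffs[-1][15] = 1.0  # wavedec orders levels coarse-to-fine, so [-1] is d1

# The inverse transform is now the db6 daughter wavelet at that scale and
# location -- the isolated "squiggle" seen in the data graph.
daughter = pywt.waverec(coeffs, 'db6', mode='periodization')

# Forward-transforming that waveform recovers a single unit coefficient.
back = pywt.wavedec(daughter, 'db6', mode='periodization', level=5)
total = sum(np.abs(c).sum() for c in back)
print(abs(back[-1][15] - 1.0) < 1e-9, abs(total - 1.0) < 1e-9)  # True True
```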

Note that an inverted version of the same shape but twice the size appears on the data graph to the left of the original. Again note that from the location of the non-zero coefficients we have quite precise information about when the squiggles are occurring in the raw waveform, but we only know their frequency within rather a wide range.

Note that the frequency at this level covers a lower, but much narrower range (63-125 Hz), so we have a more precisely defined frequency, but that the duration (width of the block) is now 8 ms, and hence the location is less precisely defined than in level 1.

The data graph (showing the inverse DWT) shows a squiggle with the same geometric shape as that which we saw previously, but spread out in time, i.e. containing a lower frequency of signal.

The data graph now contains a mixture of the high-frequency squiggle and the lower frequency squiggle. Because the two coefficients both have a value of 1, the mixture is simply the additive sum of the two original waveforms. If you use coefficients greater or less than 1, then the corresponding daughter wavelet makes a greater or lesser contribution to the data signal.

You may now begin to appreciate that by setting appropriate values for coefficients at appropriate locations in the appropriate levels, you can construct any waveform in the data graph by the inverse DWT, and thus that any raw data input signal can be deconstructed into a series of pre-determined wavelets and coefficients by the forward DWT.

The coefficients that you obtain with the DWT depend very much on the original signal (obviously), but also on which mother wavelet you choose for the transformation.

The forward transform using a Daubechies-8 mother wavelet is immediately performed on the data graph, which itself was produced by the inverse transform using the Daubechies-6 wavelet series. Now there are several non-zero coefficients scattered over several levels. This is because the Daub-8 wavelet does not exactly fit the data signal at any point in the record, and so a whole series of daughter wavelets have to be summed at different scales and locations to “cancel out” and produce the flat line signal away from the central squiggle in the data, and also to produce the appropriate shape of data squiggle itself.

3-D Block Graph

This shows a 3D column graph, where the heights of the columns (on the Y axis) reflect the power of the wavelet at the appropriate location (on the X axis) and frequency (on the Z axis). Because power is simply the square of the coefficient value, all power values are positive in sign.

You now see the X-Z projection of the graph, which should have the same colour distribution as the original 2D map in the parent DWT dialog.

Note that wavelet coefficients can have both positive and negative values, so the 3D columns now go both up and down from the zero level.

Using the wavelet transform

In DataView, the wavelet transform is used for three main purposes.

  1. To examine the time-frequency characteristics of a section of data (rather like a spectrogram).
  2. To reduce the noise in a signal. This de-noising function is closely related to the main commercial use of the DWT, which is data compression.
  3. To categorise waveforms in spike sorting.

The choice of which mother wavelet to use in these facilities is very much dependent on the raw waveform, and can probably best be discovered by trial-and-error. The general principle is to use a wavelet whose shape resembles that of the main components in the raw signal. This will give a better time-frequency resolution than if you carry out the analysis using a wavelet type that does not resemble the raw signal. The Daubechies-6 or -8 wavelet is a good starting point for extracellular spikes.

Time-Frequency Analysis with DWT and a Real Signal

The DWT Analysis duration (shown at the bottom of the dialog) is only 102.4 ms. This is because the 200 ms of data visible in the main view is truncated to the next-lowest power-of-2 number of samples in the recording. The data graph (the bottom graph in the dialog) shows a big spike just to the left of centre, and a few smaller spikes; this is slightly more than the first half of the data visible in the main view.

There is a cluster of high-power coefficients (lighter colours) clearly visible in the map at level d5 at the time location of the big spike, and a lighter colour band spread across the base (r11). This latter reflects a DC offset in the trace. (The offset is a result of poor zero-value calibration in the electronics of the AC amplifier used to make the recording, and should not exist.)

By default the DWT dialog starts with Level 1 selected. The coefficient graph (below the colour map) shows a “burst” of coefficients at the time of the large spike, indicating that the spike does have some power in the level 1 frequency range (5000 - 10000 Hz). However, the coefficients in this burst have values of only about +/- 100 (note the values in the scale edit boxes to the left of the graph), and there is no visible deviation from the blue colour in level 1 of the colour map.

The coefficient graph autoscales by default, and the peak values in this frequency range are nearly 200 times greater than those in level 1. Level 5 encompasses the 313 - 625 Hz frequency band, so this is the dominant frequency of the big spikes. It is also the dominant frequency of the small spikes visible in the data graph, but their coefficients are smaller because they themselves are small in the raw signal. To make them more prominent in the colour map we could adjust the map scaling (available by clicking the Col table button), but we will not do that in this tutorial.

The peak values in the coefficient graph get successively smaller at the higher frequency bands, confirming the visual appearance of the colour map.

De-Noising Signals with DWT (and Data Compression)

Many of the coefficients in the DWT of a real signal are very close to zero, which suggests that the coefficients are largely being generated by low amplitude noise in the signal. If this is the case, then the information contained in these coefficients is meaningless, and could be lost without penalty. This is the basis of a major type of commercial data compression. A raw data signal such as music is subjected to DWT. Any coefficients below some threshold criterion are set to 0. When there are runs of 0s in the DWT output, these can be efficiently coded by some flag that says a run is starting, followed by the length of the run, rather than by storing all the 0s individually. This is known as run-length encoding, or RLE compression. The compressed DWT data can be stored or transmitted, and then reconstructed by decoding the RLE, followed by inverse DWT. The compression method is “lossy” (the RLE is non-lossy, but the thresholding of coefficients is lossy), but since what is lost is very largely noise, this does not degrade the information content (much).
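
A minimal sketch of the run-length idea (plain Python, purely illustrative -- real codecs pack the tokens into a compact bit stream):

```python
def rle_encode(values, eps=0.0):
    """Run-length encode zero runs: ('run', length) for a stretch of zeros,
    ('val', v) for everything else."""
    out, run = [], 0
    for v in values:
        if abs(v) <= eps:
            run += 1
        else:
            if run:
                out.append(('run', run))
                run = 0
            out.append(('val', v))
    if run:
        out.append(('run', run))
    return out

def rle_decode(tokens):
    out = []
    for kind, x in tokens:
        if kind == 'run':
            out.extend([0.0] * x)
        else:
            out.append(x)
    return out

# Thresholded DWT coefficients typically contain long zero runs:
coeffs = [0, 0, 0, 0, 0.9, 0, 0, 0, 0, 0, -1.2, 0, 0]
packed = rle_encode(coeffs)
print(packed)  # [('run', 4), ('val', 0.9), ('run', 5), ('val', -1.2), ('run', 2)]
print(rle_decode(packed) == coeffs)  # True
```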

The great advantage of DWT compression over some Fourier (frequency) based system is that it removes unnecessary frequency bands on a moment-by-moment basis from the signal, rather than removing an entire frequency band from the whole signal. Thus there may be a section of music with no high frequency sound in it. The coefficients for the higher frequency levels of the DWT will all be close to 0 in this section, and can safely be set to 0 for RLE. However, in a later section of the music there may be high frequency sound, and then the coefficients will be significant, and will be retained.

DataView does not perform data compression. However, exactly the same principle can be applied as a form of noise elimination. The normal way of applying DWT for noise reduction is through the Transform: Filter: Wavelet de-noise menu command. However, the same facilities (plus some others) are available through the Mass edit button in the Discrete Wavelet Transform dialog, and we will use this because it is easier to see what is going on.

This allows us to set coefficients in a whole level, or the whole analysis, to an explicit value, or to apply a de-noising algorithm. We will start by simply setting whole levels to 0, which is what the default values in the Mass edit DWT dialog do.

Note that in the main DWT dialog all the coefficients in level 1 have been set to 0. This will slightly alter the big spike, because we know that this has some power in this level, but it is a very small percentage of its total power. Meanwhile, we have halved the number of coefficients needed to store the data. However, the noise does not seem to have changed much.
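
Zeroing a whole level can be imitated with PyWavelets on a synthetic signal (an illustrative sketch: the 500 Hz "spike-band" sine, the 10 kHz sample rate, and the noise level are all arbitrary choices):

```python
import numpy as np
import pywt

rng = np.random.default_rng(1)
fs = 10000.0
t = np.arange(1024) / fs
clean = np.sin(2 * np.pi * 500 * t)            # the "signal" component
noisy = clean + 0.2 * rng.normal(size=t.size)  # plus broadband noise

coeffs = pywt.wavedec(noisy, 'db6', mode='periodization', level=5)
coeffs[-1][:] = 0.0  # kill the whole d1 level (2500 - 5000 Hz here)
filtered = pywt.waverec(coeffs, 'db6', mode='periodization')

# The 500 Hz component lies far below d1, so it is barely altered, while
# the top octave of the noise is removed -- much like a band-stop filter.
rms = lambda x: np.sqrt(np.mean(x ** 2))
print(rms(filtered - clean) < rms(noisy - clean))  # True
```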

Setting whole levels of coefficients to 0 completely misses the point of the DWT in terms of noise reduction because it makes no use of the time-localisation properties of the DWT. It has essentially the same effect as applying multiple band-stop filters to the signal. The proper method is to use a threshold technique, where coefficients below a certain value are attenuated at all levels of the signal.

There has been a lot of research by wavelet experts into how to determine the best value for the threshold and exactly how to modify the coefficients in relation to the threshold. The Mass edit dialog box offers several options for these decisions, and these and their underlying rationale are described briefly below.

Threshold De-Noising Algorithms

De-noising with the DWT requires two decisions. First, what threshold should be set, below which coefficients will be regarded as representing noise? Second, what algorithm should be applied to these noise coefficients to reduce their amplitude?


There are 4 common choices for determining the threshold level λ.

  1. The absolute method simply involves looking at the value of coefficients and making an educated guess! This is what we used above.
  2. The quantile method places the absolute value of all the coefficients into a size-ordered list, and sets λ to the value a certain percentage distance from the top of the list. Thus with 512 coefficients (n = 512), the 20% quantile would set λ to the value of the 102nd largest absolute coefficient.
  3. The universal threshold (external noise estimate) relies on measuring the noise level σ from the raw data. The algorithm comes from Jaffard (2001):

    Scale each datum point in the raw signal by dividing by the square root of the sample count n before performing the DWT (re-scale the data signal after performing the inverse DWT).
    Set σ = standard deviation of a "quiet" section of this scaled data.
    Set threshold λ = σ *√(2 * ln(n)/n)

  4. The universal threshold (internal noise estimate) relies on estimating the noise level from the variation in the finest detail level of the DWT coefficients themselves. The algorithm is described by Antoniadis et al. (2001):

    Set med = median of coefficients of the finest detail level
    Set MAD = median of absolute deviation of coefficients from med
    Set σ = MAD /0.6745.
        (The value 0.6745 approximately converts a robust MAD to a Gaussian standard deviation.)
    Set threshold λ = σ *√(2 * ln(n))
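
The quantile and internal-universal estimates can both be sketched in a few lines of NumPy (the synthetic coefficients, the noise level, and the spike indices here are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Fake "DWT coefficients": mostly Gaussian noise (true sigma = 0.1) plus a
# few large signal coefficients.
n = 512
coeffs = rng.normal(scale=0.1, size=n)
coeffs[[10, 50, 200]] = [4.0, -3.0, 5.0]

# Quantile method: the 20% quantile is the 102nd largest absolute value.
ranked = np.sort(np.abs(coeffs))[::-1]
lam_quantile = ranked[int(0.2 * n) - 1]

# Universal threshold, internal noise estimate (MAD-based). For brevity this
# uses all coefficients; the algorithm above uses only the finest level.
med = np.median(coeffs)
mad = np.median(np.abs(coeffs - med))
sigma = mad / 0.6745                       # robust estimate of the noise s.d.
lam_universal = sigma * np.sqrt(2 * np.log(n))

print(0.05 < sigma < 0.2)  # True: close to the true sigma of 0.1
print(float(lam_quantile), float(lam_universal))
```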


There are 3 common algorithms that can be applied to transform the original coefficient Corg to a new value Cnew in relation to the selected threshold level λ. These apply “keep, shrink or kill” rules in various combinations. (In the following rules the coefficient is assumed to be positive; in a real application the coefficient sign propagates to its new value.)

  1. The hard threshold method applies a “keep or kill” algorithm. Coefficients whose absolute value is above the threshold λ are unchanged; those that are equal to or less than λ are set to zero. This method is simple, but is sensitive to small changes in the data and can increase the variance of the output.
  2. The soft threshold method applies a “shrink or kill” algorithm. Coefficients whose absolute value is above λ are reduced by λ; those that are equal to or less than λ are set to zero. This is more stable than the hard threshold method, but biases the output since even large (significant, non-noise) coefficients are reduced.
  3. The SCAD (smoothly clipped absolute deviation) threshold method applies a “keep, shrink or kill” algorithm. Large coefficients are unchanged, middling coefficients are shrunk, and small coefficients are set to zero according to the following formula:

    Corg is the absolute value of a coefficient, Cnew is the sign-corrected new value of the coefficient

    If Corg > 3.7 λ then
         Cnew = Corg
    If Corg <= 3.7 λ and Corg > 2 λ then
         Cnew = ((3.7 - 1)Corg - 3.7 λ)/(3.7 - 2)
    If Corg <= 2 λ then
         Cnew = max of (Corg - λ) and 0
    The “magic value” of 3.7 in this method comes from a Bayesian argument.

The details of these thresholding methods come from Antoniadis et al. (2001).
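
The three rules can be written compactly; the sketch below (plain NumPy, with the sign handled explicitly so it works for negative coefficients too) follows the formulas above, with a = 3.7 for SCAD:

```python
import numpy as np

def apply_threshold(c, lam, method='soft', a=3.7):
    """De-noise a coefficient array: 'hard' = keep or kill,
    'soft' = shrink or kill, 'scad' = keep, shrink or kill."""
    c = np.asarray(c, dtype=float)
    mag, sign = np.abs(c), np.sign(c)
    if method == 'hard':
        return np.where(mag > lam, c, 0.0)
    if method == 'soft':
        return sign * np.maximum(mag - lam, 0.0)
    if method == 'scad':
        small = sign * np.maximum(mag - lam, 0.0)       # |c| <= 2*lam: soft rule
        mid = ((a - 1) * c - sign * a * lam) / (a - 2)  # 2*lam < |c| <= a*lam
        return np.where(mag > a * lam, c,               # |c| > a*lam: keep
                        np.where(mag > 2 * lam, mid, small))
    raise ValueError(f"unknown method: {method}")

c = np.array([-5.0, -1.5, 0.4, 1.2, 3.0, 6.0])
print(apply_threshold(c, 1.0, 'hard'))  # kills only 0.4
print(apply_threshold(c, 1.0, 'soft'))  # shrinks every survivor by 1.0
print(apply_threshold(c, 1.0, 'scad'))  # keeps -5 and 6, shrinks the rest
```

Note that the SCAD rule is continuous: at |c| = 2λ the middle formula agrees with the soft rule, and at |c| = 3.7λ it agrees with the keep rule.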

In the figure below the DWT was performed on the first 1024 samples (51.2 ms) in the file r3s. The absolute values of the coefficients were placed in rank order, and the threshold was set at the 20% level (i.e. the value of the 205th largest coefficient). The three transform algorithms were applied.

de-noise threshold algorithm
DWT de-noise transform algorithms. The absolute values of the raw coefficients were placed in rank order (blue line visible on the right), and the 3 thresholding algorithms were applied. Each algorithm produces zero-valued coefficients below the threshold value. The hard algorithm (purple line) leaves coefficients above the threshold unchanged, but produces a sudden drop to zero at the threshold level. The soft algorithm (red line) reduces all coefficient values by the threshold value. The SCAD algorithm (cyan line) leaves large coefficients unchanged, but gradually shrinks coefficients as they approach the threshold value.

De-Noising Real Data

The file is a fragment of a tetrode recording made from the brain of a freely-moving mouse.

When the new file loads you can see the transformed trace (2) has less noise than the untransformed trace (1).

The coefficient values for level 1 in the transformed trace 2 are mostly 0, indicating that in the untransformed trace these coefficients were in the "kill" zone of the "keep, shrink or kill" SCAD algorithm. There are non-zero coefficient values at the time of the obvious spike in the data graph, but the coefficients are smaller than those at this time in the untransformed trace, indicating that they were in the "shrink" zone. You can switch back-and-forth between traces 1 and 2 to observe these differences.

There is no change in the value of the coefficients associated with the large spike in the data trace, indicating that in the untransformed trace these coefficients are in the "keep" zone (although there is definite shrinkage in the smaller coefficients at this level, which is appropriate since the algorithm applies to all levels).

You can see the differences in the actual data traces more clearly by giving the traces different colours, superimposing them on the same axis, and zooming in:

wavelet de-noise
De-noising data using the wavelet transform. The noise threshold was set from the data, and the SCAD algorithm was used to transform the wavelet coefficients. The original data (red) and the transformed data (blue) are superimposed. Note that the large spike is virtually unchanged, but that the smaller fluctuations are attenuated.