musisep.audio package

Submodules

musisep.audio.performance module

Module for calculation of the performance measures for blind source separation.

musisep.audio.performance.measures(synth_signals, orig_signals, size=1048576)[source]

Compute the SDR, SIR, and SAR in all permutations of the synthesized signals.

Parameters
  • synth_signals (array_like) – Array with the synthesized signals in its rows

  • orig_signals (array_like) – Array with the original signals in its rows

  • size (int) – Length of the signal fragments to consider at once

Returns

  • perms (list of ndarray) – Permutations of the indices of the signals

  • measures (list of ndarray) – Arrays with SDR, SIR, and SAR for the signals in rows
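
The three measures can be illustrated for a single estimated signal with a projection-based decomposition in the style of BSS Eval. This is a minimal sketch, assuming the standard definitions without time-invariant filters; the function name `bss_measures` and the toy signals are hypothetical, and the package's own computation (which works on fragments of length size and over all permutations) may differ.

```python
import numpy as np

def bss_measures(estimate, references, index):
    """SDR, SIR, SAR (in dB) of `estimate` w.r.t. references[index]."""
    estimate = np.asarray(estimate, dtype=float)
    references = np.asarray(references, dtype=float)
    ref = references[index]
    # Part of the estimate explained by the target reference alone.
    s_target = ref * (estimate @ ref) / (ref @ ref)
    # Orthogonal projection onto the span of all references.
    q, _ = np.linalg.qr(references.T)
    p_all = q @ (q.T @ estimate)
    e_interf = p_all - s_target   # explained by the other references
    e_artif = estimate - p_all    # explained by no reference

    def db(num, den):
        return 10 * np.log10(np.sum(num ** 2) / np.sum(den ** 2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Toy data: two orthogonal references, an estimate with small leakage.
references = np.array([[1.0, 0.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0, 0.0]])
estimate = np.array([1.0, 0.1, 0.01, 0.0])
sdr, sir, sar = bss_measures(estimate, references, index=0)
```

In this toy case the interference is the component along the second reference and the artifact is the component outside both references, so the SDR is bounded by the smaller of SIR and SAR.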

musisep.audio.performance.orthogonalize(signals)[source]

Orthogonalize the given signals.

Parameters

signals (array_like) – Matrix with the signals in its rows

Returns

q_matrix – Matrix with the orthogonalized signals in its rows

Return type

ndarray
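
One way to orthogonalize signals stored in rows is a QR decomposition of the transposed matrix. The sketch below is an illustration under that assumption, not necessarily the method used by this function; `orthogonalize_rows` is a hypothetical name.

```python
import numpy as np

def orthogonalize_rows(signals):
    """Return an orthonormal basis (in rows) of the row space of `signals`."""
    signals = np.asarray(signals, dtype=float)
    # The columns of q are orthonormal and span the same space
    # as the rows of `signals` (reduced QR of the transpose).
    q, _ = np.linalg.qr(signals.T)
    return q.T

signals = np.array([[1.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0]])
q_matrix = orthogonalize_rows(signals)
gram = q_matrix @ q_matrix.T  # identity if the rows are orthonormal
```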

musisep.audio.performance.project(signals, q_matrix)[source]

Project the given signals on the given space.

Parameters
  • signals (array_like) – Matrix with the signals in its rows

  • q_matrix (array_like) – Matrix with an orthonormal basis of the space in its rows

Returns

proj_signals – Matrix with the projected signals in its rows

Return type

ndarray
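
For real-valued signals and an orthonormal basis in the rows of q_matrix, the orthogonal projection reduces to two matrix products. A minimal sketch, with the hypothetical name `project_rows`:

```python
import numpy as np

def project_rows(signals, q_matrix):
    """Project each row of `signals` onto the row space of `q_matrix`."""
    signals = np.asarray(signals, dtype=float)
    q_matrix = np.asarray(q_matrix, dtype=float)
    # Coefficients w.r.t. the orthonormal basis, then recombination.
    return (signals @ q_matrix.T) @ q_matrix

q_matrix = np.array([[1.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0]])
signals = np.array([[3.0, 4.0, 5.0]])
proj_signals = project_rows(signals, q_matrix)  # drops the third coordinate
```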

musisep.audio.performance.select_perm(perms, measures)[source]

Select the permutation with the highest SIR sum.

Parameters
  • perms (list of array_like) – Permutations of the indices of the signals

  • measures (list of array_like) – Arrays with SDR, SIR, and SAR for the signals in rows

Returns

  • best_perm (ndarray) – Permutation of synth_signals with the highest SIR sum

  • best_measure (ndarray) – SDR, SIR, and SAR in the permutation with the highest SIR sum
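
The selection rule can be sketched as follows. The array layout is an assumption made for illustration: one row per signal and the columns ordered as (SDR, SIR, SAR). The name `pick_best` and the numbers are hypothetical.

```python
import numpy as np

def pick_best(perms, measures):
    """Return the permutation (and its measures) with the highest SIR sum."""
    # Assumed layout: column 1 of each measure array holds the SIR values.
    sir_sums = [np.asarray(m)[:, 1].sum() for m in measures]
    best = int(np.argmax(sir_sums))
    return np.asarray(perms[best]), np.asarray(measures[best])

perms = [np.array([0, 1]), np.array([1, 0])]
measures = [np.array([[5.0, 10.0, 7.0], [4.0, 8.0, 6.0]]),
            np.array([[-2.0, 1.0, 3.0], [-3.0, 0.5, 2.0]])]
best_perm, best_measure = pick_best(perms, measures)
```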

musisep.audio.specttool module

Back-end module for the Griffin-Lim algorithm.

musisep.audio.specttool.adapt_mag()
musisep.audio.specttool.unstripe()

musisep.audio.wav module

Module to handle WAV audio data.

musisep.audio.wav.read(filename)[source]

Read WAV audio data from a file. If the data has multiple channels, they will be averaged.

Parameters

filename (string) – Name of the WAV file.

Returns

  • data (ndarray) – Audio data as double array with values in [-1,1].

  • samprate (int) – Sampling rate of the WAV file.

musisep.audio.wav.read_stereo(filename)[source]

Read WAV audio data from a file. If the data has multiple channels, they will be returned as rows of the output array.

Parameters

filename (string) – Name of the WAV file.

Returns

  • data (ndarray) – Audio data as double array with values in [-1,1].

  • samprate (int) – Sampling rate of the WAV file.

musisep.audio.wav.unify(in_data)[source]

Convert the input data to a double-type array with values in [-1,1]. Input type must be double, float32, int32, int16, or uint8.

Parameters

in_data (array_like) – Data to be unified

Returns

out_data – Unified data.

Return type

ndarray
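
A conversion of this kind can be sketched as below. The scaling constants are common conventions and are assumptions here (full-scale division for signed integers, an offset of 128 for unsigned 8-bit data); the name `to_unit_range` is hypothetical.

```python
import numpy as np

def to_unit_range(in_data):
    """Convert supported sample types to float64 in [-1, 1]."""
    in_data = np.asarray(in_data)
    dtype = in_data.dtype
    if dtype == np.float64 or dtype == np.float32:
        return in_data.astype(np.float64)
    if dtype == np.int16:
        return in_data / 32768.0
    if dtype == np.int32:
        return in_data / 2147483648.0
    if dtype == np.uint8:
        # Unsigned 8-bit audio is centered at 128.
        return (in_data.astype(np.float64) - 128.0) / 128.0
    raise TypeError("unsupported sample type: {}".format(dtype))

from_int16 = to_unit_range(np.array([-32768, 0, 16384], dtype=np.int16))
from_uint8 = to_unit_range(np.array([0, 128, 255], dtype=np.uint8))
```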

musisep.audio.wav.write(filename, signal, samprate, normalize=False)[source]

Write WAV audio data to a file, optionally normalizing it first. The data type should be floating-point and must be supported by scipy.io.wavfile.

Parameters
  • filename (string) – Name of the WAV file.

  • signal (array_like) – Audio data to write.

  • samprate (int) – Intended sampling rate of the WAV file.

  • normalize (bool) – Whether to normalize the output to [-1,1].

Returns

maxval – Value by which the signal was divided during normalization.

Return type

scalar
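
Peak normalization with a returned divisor can be sketched as follows. The behavior for normalize=False and for an all-zero signal (a divisor of 1.0 in both cases) is an assumption of this sketch, and `normalize_peak` is a hypothetical name; the actual file output is omitted.

```python
import numpy as np

def normalize_peak(signal, normalize=True):
    """Scale `signal` to [-1, 1] and return it with the divisor used."""
    signal = np.asarray(signal, dtype=float)
    maxval = float(np.max(np.abs(signal))) if normalize else 1.0
    if maxval == 0.0:
        maxval = 1.0  # avoid dividing a silent signal by zero
    return signal / maxval, maxval

scaled, maxval = normalize_peak(np.array([0.5, -2.0, 1.0]))
```

Keeping the divisor makes it possible to restore the original scale later by multiplying the stored data with maxval.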

musisep.audio.spect module

Module to generate spectrograms, save them as images, and resynthesize audio. When invoked, a side-by-side comparison of the spectrograms from the different methods is performed.

musisep.audio.spect.example_beethoven()[source]
musisep.audio.spect.example_brahms()[source]

Application of different transforms on a recording of the 1st violin sonata of Johannes Brahms.

musisep.audio.spect.example_delta_octaves()[source]

Comparison of the different representations with a delta transient and sinusoids.

musisep.audio.spect.example_delta_scale()[source]

Display of the properties of the smoothed CQT with a delta transient and a chromatic scale of sinusoids.

musisep.audio.spect.example_mozart()[source]

Application of the sparse pursuit method on the individual instrument tracks of the piece by Mozart.

musisep.audio.spect.gauss(x, stdev, normalize=True)[source]

Generate a Gaussian window/kernel with mean 0.

Parameters
  • x (array_like) – Points to evaluate the Gaussian

  • stdev (float) – Standard deviation

  • normalize (bool) – Whether to l1-normalize the Gaussian

Returns

window – Gaussian window/kernel

Return type

ndarray
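
A zero-mean Gaussian with optional l1 normalization can be sketched as below. The unnormalized peak value of 1 and the discrete-sum normalization are assumptions of this sketch; `gaussian_window` is a hypothetical name.

```python
import numpy as np

def gaussian_window(x, stdev, normalize=True):
    """Evaluate a zero-mean Gaussian at the points `x`."""
    x = np.asarray(x, dtype=float)
    window = np.exp(-x ** 2 / (2 * stdev ** 2))
    if normalize:
        # l1 normalization: the absolute values sum to 1.
        window = window / np.sum(np.abs(window))
    return window

x = np.arange(-5, 6)
window = gaussian_window(x, stdev=2.0)
```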

musisep.audio.spect.istft(spect, siglen, sigmas, sampdist)[source]

Reconstruct an audio signal from a complex-valued linear-frequency spectrogram via orthogonal projection. If a sample cannot be inferred from the spectrogram, it is set to zero.

Parameters
  • spect (array_like) – Complex-valued linear-frequency spectrogram

  • siglen (int) – Intended length of the audio signal

  • sigmas (float) – Number of standard deviations after which to cut the window

  • sampdist (int) – Time intervals to sample the spectrogram

Returns

signal – Reconstructed audio signal

Return type

ndarray

musisep.audio.spect.logspect_cq(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, smooth=True)[source]

Compute the time-smoothed CQT of an audio signal.

Parameters
  • signal (array_like) – Audio signal

  • spectheight (int) – Height of the linear-frequency spectrogram

  • sigmas (float) – Number of standard deviations after which to cut the window/kernel

  • sampdist (int) – Time intervals to sample the spectrogram

  • basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)

  • minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)

  • maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)

  • numfreqs (float) – Height of the log-frequency spectrogram

Returns

logspect – Log-frequency magnitude spectrogram

Return type

ndarray

musisep.audio.spect.logspect_mel(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, eval_range=slice(None, None, None), scale=True)[source]

Compute the Mel-frequency spectrogram of an audio signal.

Parameters
  • signal (array_like) – Audio signal

  • spectheight (int) – Height of the linear-frequency spectrogram

  • sigmas (float) – Number of standard deviations after which to cut the window/kernel

  • sampdist (int) – Time intervals to sample the spectrogram

  • basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)

  • minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)

  • maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)

  • numfreqs (float) – Height of the log-frequency spectrogram

  • eval_range (slice) – Time range of the spectrogram to be computed

  • scale (bool) – Whether to adjust the l1 norm of the kernels w.r.t. frequency

Returns

  • logspect (ndarray) – Log-frequency magnitude spectrogram

  • spect (ndarray) – Linear-frequency magnitude spectrogram

musisep.audio.spect.logspect_pursuit(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, fsigma, eval_range=slice(None, None, None))[source]

Compute the log-frequency spectrogram via sparse pursuit.

Parameters
  • signal (array_like) – Audio signal

  • spectheight (int) – Height of the linear-frequency spectrogram

  • sigmas (float) – Number of standard deviations after which to cut the window/kernel

  • sampdist (int) – Time intervals to sample the spectrogram

  • basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)

  • minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)

  • maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)

  • numfreqs (float) – Height of the log-frequency spectrogram

  • fsigma (float) – Standard deviation (frequency)

  • eval_range (slice) – Time range of the spectrogram to be computed

Returns

logspect – Log-frequency magnitude spectrogram

Return type

ndarray

musisep.audio.spect.project_audio(spect, siglen, sigmas, sampdist, size=2000)[source]

Reconstruct an audio signal from a linear-frequency complex spectrogram via orthogonal projection.

Parameters
  • spect (array_like) – Linear-frequency magnitude spectrogram

  • siglen (int) – Intended length of the audio signal

  • sigmas (float) – Number of standard deviations after which to cut the window

  • sampdist (int) – Time intervals to sample the spectrogram

  • size (int) – Batch size for the FFT

Returns

signal – Reconstructed audio signal

Return type

ndarray

musisep.audio.spect.smoothconv(signal, window, kernel, sampdist)[source]
musisep.audio.spect.spectrogram(signal, spectheight, sigmas, sampdist, eval_range=slice(None, None, None))[source]

Calculate the linear-frequency magnitude spectrogram via STFT.

Parameters
  • signal (array_like) – Audio signal

  • spectheight (int) – Height of the linear-frequency spectrogram

  • sigmas (float) – Number of standard deviations after which to cut the window

  • sampdist (int) – Time intervals to sample the spectrogram

  • eval_range (slice) – Time range of the spectrogram to be computed

Returns

spectrogram – Linear-frequency magnitude spectrogram

Return type

ndarray

musisep.audio.spect.spectwrite(filename, spectrogram, color='viridis', db=100)[source]

Save a spectrogram as an image. The data is normalized to a dynamic range of db decibels (100 dB by default), and a logarithmic color scale (Viridis by default) is used.

Parameters
  • filename (string) – Name of the image file

  • spectrogram (array_like) – Spectrogram

  • color (string or NoneType) – Whether to make a color plot

musisep.audio.spect.stft(signal, length, sigmas, sampdist, eval_range=slice(None, None, None))[source]

Calculate the linear-frequency spectrogram of a given audio signal by calling stripe and computing the FFT along the first axis.

Parameters
  • signal (array_like) – Audio signal

  • spectheight (int) – Height of the linear-frequency spectrogram

  • sigmas (float) – Number of standard deviations after which to cut the window

  • sampdist (int) – Time intervals to sample the spectrogram

  • eval_range (slice) – Time range of the spectrogram to be computed

Returns

spectrogram – Complex-valued linear-frequency spectrogram

Return type

ndarray

musisep.audio.spect.stripe(signal, spectheight, sigmas, sampdist, eval_range)[source]

Populate an array with time-shifted and windowed versions of an audio signal. This serves as a precursor for FFT calculation. The first spectrogram time frame coincides with the first sample in the signal. Out-of-bounds array entries are assumed as zero.

Parameters
  • signal (array_like) – Audio signal

  • spectheight (int) – Height of the linear-frequency spectrogram

  • sigmas (float) – Number of standard deviations after which to cut the window

  • sampdist (int) – Time intervals to sample the spectrogram

  • eval_range (slice) – Time range of the spectrogram to be computed

Returns

stripeplot – Populated array

Return type

ndarray
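
The idea of striping followed by an FFT along the first axis can be sketched as below. This is an illustration under simplifying assumptions: the window standard deviation is passed directly as the hypothetical parameter `stdev` (instead of being derived from sigmas), each frame is centered on its time position, and there is no eval_range. The name `stripe_sketch` is hypothetical.

```python
import numpy as np

def stripe_sketch(signal, spectheight, stdev, sampdist):
    """Array of shape (spectheight, frames) with windowed, shifted copies."""
    signal = np.asarray(signal, dtype=float)
    offsets = np.arange(spectheight) - spectheight // 2
    window = np.exp(-offsets ** 2 / (2 * stdev ** 2))
    # Frame positions; the first frame sits on the first sample.
    centers = np.arange(0, signal.size, sampdist)
    idx = centers[np.newaxis, :] + offsets[:, np.newaxis]
    valid = (idx >= 0) & (idx < signal.size)
    # Out-of-bounds entries are treated as zero.
    frames = np.where(valid, signal[np.clip(idx, 0, signal.size - 1)], 0.0)
    return frames * window[:, np.newaxis]

# Sinusoid at 1/8 of the sampling frequency: bin 2 of a 16-point FFT.
signal = np.cos(2 * np.pi * 0.125 * np.arange(64))
stripes = stripe_sketch(signal, spectheight=16, stdev=3.0, sampdist=8)
spect = np.fft.fft(stripes, axis=0)
```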

musisep.audio.spect.synth_audio(spect, siglen, sigmas, sampdist, iterations, guess=None, size=2000)[source]

Reconstruct an audio signal from a linear-frequency magnitude spectrogram via the algorithm by Griffin and Lim.

Parameters
  • spect (array_like) – Linear-frequency magnitude spectrogram

  • siglen (int) – Intended length of the audio signal

  • sigmas (float) – Number of standard deviations after which to cut the window

  • sampdist (int) – Time intervals to sample the spectrogram

  • iterations (int) – Number of Griffin-Lim iterations to perform

  • guess (array_like) – Initial value for the audio signal

  • size (int) – Batch size for the FFT

Returns

signal – Reconstructed audio signal

Return type

ndarray
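
The Griffin-Lim iteration alternates between imposing the given magnitudes and re-estimating the phase from the current signal. The sketch below is self-contained and uses its own toy transform pair with a Hann window and a random initial phase; these choices and all names (`toy_stft`, `toy_istft`, `griffin_lim`, `spectral_error`) are assumptions for illustration and do not reproduce this module's windowing or batching.

```python
import numpy as np

def toy_stft(signal, win, hop):
    """Windowed frames in columns, real FFT along the first axis."""
    n = win.size
    starts = range(0, signal.size - n + 1, hop)
    frames = np.stack([signal[s:s + n] * win for s in starts], axis=1)
    return np.fft.rfft(frames, axis=0)

def toy_istft(spect, win, hop, siglen):
    """Weighted overlap-add inverse of toy_stft."""
    n = win.size
    frames = np.fft.irfft(spect, n=n, axis=0)
    signal = np.zeros(siglen)
    norm = np.zeros(siglen)
    for k in range(frames.shape[1]):
        start = k * hop
        signal[start:start + n] += frames[:, k] * win
        norm[start:start + n] += win ** 2
    return signal / np.maximum(norm, 1e-12)

def griffin_lim(mag, win, hop, siglen, iterations, seed=0):
    """Estimate a signal whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    signal = toy_istft(mag * phase, win, hop, siglen)
    for _ in range(iterations):
        # Keep the phase of the current estimate, impose the magnitudes.
        phase = np.exp(1j * np.angle(toy_stft(signal, win, hop)))
        signal = toy_istft(mag * phase, win, hop, siglen)
    return signal

def spectral_error(signal, mag, win, hop):
    """Distance between the magnitude STFT of `signal` and `mag`."""
    return np.linalg.norm(np.abs(toy_stft(signal, win, hop)) - mag)

win = np.hanning(64)
target = np.sin(2 * np.pi * 0.05 * np.arange(256))
mag = np.abs(toy_stft(target, win, 16))
rough = griffin_lim(mag, win, 16, 256, iterations=0)
refined = griffin_lim(mag, win, 16, 256, iterations=30)
```

Comparing `spectral_error(rough, mag, win, 16)` with `spectral_error(refined, mag, win, 16)` shows how much the iterations reduce the mismatch; the recovered signal is in general only determined up to phase ambiguities such as a global sign.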

musisep.audio.spect.winlog_spect(spect, freqs, basefreq, sigmas, scale=True)[source]

Apply a logarithmic transform on the frequency axis of a linear-frequency magnitude spectrogram while preserving the width of the horizontal lines via Gaussian smoothing. The attenuation of the higher frequencies is counteracted by scaling.

Parameters
  • spect (array_like) – Linear-frequency magnitude spectrogram

  • freqs (array_like) – Frequencies to place the smoothing kernel (normalized to the sampling frequency)

  • basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)

  • sigmas (float) – Number of standard deviations after which to cut the kernel

  • scale (bool) – Whether to adjust the l1 norm of the kernels w.r.t. frequency

Returns

logspect – Log-frequency magnitude spectrogram

Return type

ndarray

Module contents