musisep.audio package

Submodules

musisep.audio.performance module

Module for calculation of the performance measures for blind source separation.

musisep.audio.performance.measures(synth_signals, orig_signals, size=1048576)

Compute the SDR, SIR, and SAR in all permutations of the synthesized signals.

Parameters

synth_signals (array_like) – Array with the synthesized signals in its rows
orig_signals (array_like) – Array with the original signals in its rows
size (int) – Length of the signal fragments to consider at once

Returns

perms (list of ndarray) – Permutations of the indices of the signals
measures (list of ndarray) – Arrays with SDR, SIR, and SAR for the signals in rows
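To illustrate what the three measures capture, here is a simplified sketch. It assumes the usual BSS Eval decomposition restricted to time-invariant gains (no distortion filters) and ignores the size parameter that splits the signals into fragments. The function name bss_measures and the (3, n) layout of each measure array (rows SDR, SIR, SAR) are assumptions, not the module's code.

```python
import itertools

import numpy as np


def _project(signals, basis):
    """Project the rows of `signals` onto the span of the rows of `basis`."""
    # Reduced QR of the transposed basis gives orthonormal columns in q.
    q, _ = np.linalg.qr(basis.T)
    return (signals @ q) @ q.T


def _energy(v):
    return np.sum(v ** 2)


def bss_measures(synth_signals, orig_signals):
    """Hypothetical gain-only BSS Eval over all permutations.

    Returns (perms, measures); each measures[k] is a (3, n) array whose
    rows hold SDR, SIR, and SAR for permutation perms[k] of synth_signals.
    """
    synth = np.asarray(synth_signals, dtype=np.float64)
    orig = np.asarray(orig_signals, dtype=np.float64)
    n = orig.shape[0]
    perms, measures = [], []
    for perm in itertools.permutations(range(n)):
        perm = np.array(perm)
        result = np.empty((3, n))
        for i in range(n):
            est = synth[perm[i]][None, :]
            # Target: the part explained by the i-th original signal alone.
            s_target = _project(est, orig[i][None, :])
            # Interference: what the other originals explain on top of that.
            e_interf = _project(est, orig) - s_target
            # Artifacts: everything outside the span of the originals.
            e_artif = est - s_target - e_interf
            result[0, i] = 10 * np.log10(_energy(s_target) / _energy(e_interf + e_artif))
            result[1, i] = 10 * np.log10(_energy(s_target) / _energy(e_interf))
            result[2, i] = 10 * np.log10(_energy(s_target + e_interf) / _energy(e_artif))
        perms.append(perm)
        measures.append(result)
    return perms, measures
```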

musisep.audio.performance.orthogonalize(signals)

Orthogonalize the given signals.

Parameters: signals (array_like) – Matrix with the signals in its rows
Returns: q_matrix – Matrix with the orthogonalized signals in its rows
Return type: ndarray
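Orthogonalizing the rows can be done with a reduced QR decomposition of the transposed matrix. A minimal sketch, assuming linearly independent rows; orthonormal_rows is a hypothetical name, and the module may use a different method (for example Gram-Schmidt):

```python
import numpy as np


def orthonormal_rows(signals):
    """Return a matrix whose rows form an orthonormal basis of the row space."""
    signals = np.asarray(signals, dtype=np.float64)
    # signals.T = Q R, where Q has orthonormal columns spanning the same space.
    q, _ = np.linalg.qr(signals.T)
    return q.T


signals = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0]])
q_matrix = orthonormal_rows(signals)
# q_matrix @ q_matrix.T is (numerically) the 2x2 identity matrix
```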

musisep.audio.performance.project(signals, q_matrix)

Project the given signals onto the subspace spanned by the rows of q_matrix.

Parameters

signals (array_like) – Matrix with the signals in its rows
q_matrix (array_like) – Matrix with an orthonormal basis of the space in its rows

Returns

proj_signals – Matrix with the projected signals in its rows

Return type

ndarray
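Because the rows of q_matrix are orthonormal, the projection reduces to two matrix products: the coefficients signals @ q_matrix.T, mapped back with q_matrix. A sketch with the hypothetical name project_rows:

```python
import numpy as np


def project_rows(signals, q_matrix):
    """Orthogonally project each row of `signals` onto the span of the rows
    of `q_matrix`, which are assumed to be orthonormal."""
    signals = np.asarray(signals, dtype=np.float64)
    q_matrix = np.asarray(q_matrix, dtype=np.float64)
    # Coefficients with respect to the orthonormal basis.
    coeffs = signals @ q_matrix.T
    # Map the coefficients back into signal space.
    return coeffs @ q_matrix


# Orthonormal basis spanning the first two coordinate axes of R^3.
q_matrix = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
proj = project_rows([[3.0, 4.0, 5.0]], q_matrix)
# proj == [[3.0, 4.0, 0.0]]
```

Projecting twice gives the same result as projecting once, which is a quick sanity check for any projection routine.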

musisep.audio.performance.select_perm(perms, measures)

Select the permutation with the highest SIR sum.

Parameters

perms (list of array_like) – Permutations of the indices of the signals
measures (list of array_like) – Arrays with SDR, SIR, and SAR for the signals in rows

Returns

best_perm (ndarray) – Permutation of synth_signals with the highest SIR sum
best_measure (ndarray) – SDR, SIR, and SAR in the permutation with the highest SIR sum
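Selecting the permutation amounts to an argmax over the per-permutation SIR sums. A sketch, assuming each entry of measures is a (3, n) array whose second row holds the SIR values; select_best is a hypothetical name:

```python
import numpy as np


def select_best(perms, measures):
    """Return the permutation and measures with the highest SIR sum.

    Assumes measures[k] has rows SDR, SIR, SAR (so row 1 is the SIR).
    """
    sir_sums = [np.sum(np.asarray(m)[1]) for m in measures]
    best = int(np.argmax(sir_sums))
    return np.asarray(perms[best]), np.asarray(measures[best])


perms = [np.array([0, 1]), np.array([1, 0])]
measures = [np.array([[1.0, 2.0], [-20.0, -18.0], [5.0, 6.0]]),
            np.array([[9.0, 8.0], [25.0, 24.0], [12.0, 11.0]])]
best_perm, best_measure = select_best(perms, measures)
# best_perm -> array([1, 0])
```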

musisep.audio.specttool module

Back-end module for the Griffin-Lim algorithm.

musisep.audio.specttool.adapt_mag()

musisep.audio.specttool.unstripe()

musisep.audio.wav module

Module to handle WAV audio data.

musisep.audio.wav.read(filename)

Read WAV audio data from a file. If the data has multiple channels, they will be averaged.

Parameters

filename (string) – Name of the WAV file.

Returns

data (ndarray) – Audio data as double array with values in [-1,1].
samprate (int) – Sampling rate of the WAV file.

musisep.audio.wav.read_stereo(filename)

Read WAV audio data from a file. If the data has multiple channels, they will be returned as rows of the output array.

Parameters

filename (string) – Name of the WAV file.

Returns

data (ndarray) – Audio data as double array with values in [-1,1].
samprate (int) – Sampling rate of the WAV file.
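The difference between read and read_stereo lies only in how channels are handled. Assuming the raw data arrives with shape (n_samples, n_channels), as scipy.io.wavfile.read returns for multichannel files, the two behaviors can be sketched as follows (to_mono and to_rows are hypothetical helpers, and the mono case of to_rows is an assumption):

```python
import numpy as np


def to_mono(data):
    """Average the channels of (n_samples, n_channels) data, as read() does."""
    data = np.asarray(data, dtype=np.float64)
    return data if data.ndim == 1 else data.mean(axis=1)


def to_rows(data):
    """Put each channel in its own row, as read_stereo() returns them.

    Hypothetical: mono data is returned as a single row.
    """
    data = np.asarray(data, dtype=np.float64)
    return data[None, :] if data.ndim == 1 else data.T


stereo = np.array([[0.5, -0.5], [1.0, 0.0], [0.25, 0.75]])  # 3 samples, 2 channels
mono = to_mono(stereo)  # array([0. , 0.5, 0.5])
rows = to_rows(stereo)  # shape (2, 3)
```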

musisep.audio.wav.unify(in_data)

Convert the input data to a double-type array with values in [-1,1]. Input type must be double, float32, int32, int16, or uint8.

Parameters: in_data (array_like) – Data to be unified
Returns: out_data – Unified data.
Return type: ndarray
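A sketch of the conversion, assuming the common full-scale conventions for PCM data (signed integers divided by 2^15 or 2^31, unsigned 8-bit data centered at 128). The module's exact constants are not documented here, and unify_sketch is a hypothetical name:

```python
import numpy as np


def unify_sketch(in_data):
    """Convert PCM or float audio to float64 in [-1, 1] (assumed scalings)."""
    in_data = np.asarray(in_data)
    if in_data.dtype == np.float64 or in_data.dtype == np.float32:
        # Floating-point data is assumed to be in [-1, 1] already.
        return in_data.astype(np.float64)
    if in_data.dtype == np.int16:
        return in_data.astype(np.float64) / 2.0 ** 15
    if in_data.dtype == np.int32:
        return in_data.astype(np.float64) / 2.0 ** 31
    if in_data.dtype == np.uint8:
        # 8-bit WAV data is unsigned, with 128 as the zero level.
        return (in_data.astype(np.float64) - 128.0) / 128.0
    raise TypeError(f"unsupported input type {in_data.dtype}")
```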

musisep.audio.wav.write(filename, signal, samprate, normalize=False)

Write WAV audio data to a file, optionally normalizing it first. The data type should be floating-point and must be supported by scipy.io.wavfile.

Parameters

filename (string) – Name of the WAV file.
signal (array_like) – Audio data to write.
samprate (int) – Intended sampling rate of the WAV file.
normalize (bool) – Whether to normalize the output to [-1,1].

Returns

maxval – Number by which the signal was divided during normalization.

Return type

scalar
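With normalize=True, the returned maxval suggests that the signal is divided by its peak magnitude. A sketch of that step under this assumption; normalize_for_write is a hypothetical helper, and the all-zero guard is an added assumption:

```python
import numpy as np


def normalize_for_write(signal):
    """Scale `signal` into [-1, 1]; return the scaled signal and the divisor."""
    signal = np.asarray(signal, dtype=np.float64)
    maxval = np.max(np.abs(signal))
    if maxval == 0:
        # Hypothetical guard: leave an all-zero signal unchanged.
        return signal, 1.0
    return signal / maxval, maxval


scaled, maxval = normalize_for_write([0.5, -2.0, 1.0])
# scaled -> [0.25, -1.0, 0.5], maxval -> 2.0
```

The scaled array could then be passed to scipy.io.wavfile.write(filename, samprate, scaled), which stores floating-point data as a float WAV file.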

musisep.audio.spect module

Module to generate spectrograms, save them as images, and resynthesize audio. When invoked, a side-by-side comparison of the spectrograms from the different methods is performed.

musisep.audio.spect.example_beethoven()

musisep.audio.spect.example_brahms()

Application of different transforms on a recording of the 1st violin sonata of Johannes Brahms.

musisep.audio.spect.example_delta_octaves()

Comparison of the different representations with a delta transient and sinusoids.

musisep.audio.spect.example_delta_scale()

Display of the properties of the smoothed CQT with a delta transient and a chromatic scale of sinusoids.

musisep.audio.spect.example_mozart()

Application of the sparse pursuit method on the individual instrument tracks of the piece by Mozart.

musisep.audio.spect.gauss(x, stdev, normalize=True)

Generate a Gaussian window/kernel with mean 0.

Parameters

x (array_like) – Points to evaluate the Gaussian
stdev (float) – Standard deviation
normalize (bool) – Whether to l1-normalize the Gaussian

Returns

window – Gaussian window/kernel

Return type

ndarray
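A minimal sketch of such a window, assuming that l1 normalization means dividing by the sum over the evaluated points (rather than using the continuous normalizing constant); gauss_sketch is a hypothetical name:

```python
import numpy as np


def gauss_sketch(x, stdev, normalize=True):
    """Zero-mean Gaussian evaluated at `x`, optionally scaled to unit l1 norm."""
    x = np.asarray(x, dtype=np.float64)
    window = np.exp(-0.5 * (x / stdev) ** 2)
    if normalize:
        # All values are positive, so the l1 norm is the plain sum.
        window /= np.sum(np.abs(window))
    return window


x = np.arange(-8, 9)  # cut at 4 standard deviations for stdev=2
window = gauss_sketch(x, stdev=2.0)
# window.sum() is 1 up to rounding; the peak sits at index 8 (x == 0)
```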

musisep.audio.spect.istft(spect, siglen, sigmas, sampdist)

Reconstruct an audio signal from a complex-valued linear-frequency spectrogram via orthogonal projection. If a sample cannot be inferred from the spectrogram, it is set to zero.

Parameters

spect (array_like) – Complex-valued linear-frequency spectrogram
siglen (int) – Intended length of the audio signal
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram

Returns

signal – Reconstructed audio signal

Return type

ndarray
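Orthogonal projection back to the signal domain can be realized as the least-squares overlap-add inverse of Griffin and Lim: each sample is the window-weighted sum of its frame values divided by the sum of squared window values. The sketch below assumes an odd-length Gaussian window of hypothetical length winlen (its relation to spectheight is not documented), frames centered at multiples of sampdist starting at sample 0, and an FFT along the first axis; all names are hypothetical.

```python
import numpy as np


def gaussian_window(winlen, sigmas):
    """Odd-length Gaussian window cut after `sigmas` standard deviations."""
    half = winlen // 2
    offsets = np.arange(-half, half + 1)
    return offsets, np.exp(-0.5 * (offsets * sigmas / half) ** 2)


def frame_indices(siglen, offsets, sampdist):
    """Sample index per (window position, frame); frame 0 is centered at 0."""
    idx = np.arange(0, siglen, sampdist)[None, :] + offsets[:, None]
    return idx, (idx >= 0) & (idx < siglen)


def stft(signal, winlen, sigmas, sampdist):
    signal = np.asarray(signal, dtype=np.float64)
    offsets, window = gaussian_window(winlen, sigmas)
    idx, valid = frame_indices(len(signal), offsets, sampdist)
    # Out-of-bounds samples are treated as zero.
    frames = np.where(valid, signal[np.clip(idx, 0, len(signal) - 1)], 0.0)
    return np.fft.fft(frames * window[:, None], axis=0)


def istft(spect, siglen, winlen, sigmas, sampdist):
    """Least-squares inverse STFT, i.e. an orthogonal projection."""
    offsets, window = gaussian_window(winlen, sigmas)
    idx, valid = frame_indices(siglen, offsets, sampdist)
    frames = np.fft.ifft(spect, axis=0).real
    weights = np.broadcast_to(window[:, None], idx.shape)
    num = np.zeros(siglen)
    den = np.zeros(siglen)
    np.add.at(num, idx[valid], (weights * frames)[valid])
    np.add.at(den, idx[valid], (weights ** 2)[valid])
    # Samples not covered by any window cannot be inferred: set them to zero.
    return np.divide(num, den, out=np.zeros(siglen), where=den > 0)


rng = np.random.default_rng(0)
x = rng.standard_normal(500)
x_rec = istft(stft(x, 64, 4.0, 16), len(x), 64, 4.0, 16)
# x_rec matches x up to rounding
```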

musisep.audio.spect.logspect_cq(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, smooth=True)

Compute the time-smoothed CQT of an audio signal.

Parameters

signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window/kernel
sampdist (int) – Time intervals to sample the spectrogram
basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)
minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)
maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)
numfreqs (float) – Height of the log-frequency spectrogram

Returns

logspect – Log-frequency magnitude spectrogram

Return type

ndarray
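The description of minfreq (included) and maxfreq (excluded) is consistent with a grid that is uniformly spaced in log frequency. A sketch of such a grid, as an assumption about how the numfreqs rows are placed (log_frequency_grid is a hypothetical name):

```python
import numpy as np


def log_frequency_grid(minfreq, maxfreq, numfreqs):
    """`numfreqs` log-spaced frequencies in [minfreq, maxfreq)."""
    steps = np.arange(numfreqs) / numfreqs
    return minfreq * (maxfreq / minfreq) ** steps


# Normalized frequencies 0.01 to 0.04 (two octaves), 24 bins per octave.
freqs = log_frequency_grid(0.01, 0.04, 48)
# freqs[0] is 0.01, freqs[24] is 0.02 (one octave up), and 0.04 is excluded
```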

musisep.audio.spect.logspect_mel(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, eval_range=slice(None, None, None), scale=True)

Compute the Mel-frequency spectrogram of an audio signal.

Parameters

signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window/kernel
sampdist (int) – Time intervals to sample the spectrogram
basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)
minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)
maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)
numfreqs (float) – Height of the log-frequency spectrogram
eval_range (slice) – Time range of the spectrogram to be computed
scale (bool) – Whether to adjust the l1 norm of the kernels w.r.t. frequency

Returns

logspect (ndarray) – Log-frequency magnitude spectrogram
spect (ndarray) – Linear-frequency magnitude spectrogram

musisep.audio.spect.logspect_pursuit(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, fsigma, eval_range=slice(None, None, None))

Compute the log-frequency spectrogram via sparse pursuit.

Parameters

signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window/kernel
sampdist (int) – Time intervals to sample the spectrogram
basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)
minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)
maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)
numfreqs (float) – Height of the log-frequency spectrogram
fsigma (float) – Standard deviation (frequency)
eval_range (slice) – Time range of the spectrogram to be computed

Returns

logspect – Log-frequency magnitude spectrogram

Return type

ndarray

musisep.audio.spect.project_audio(spect, siglen, sigmas, sampdist, size=2000)

Reconstruct an audio signal from a complex-valued linear-frequency spectrogram via orthogonal projection.

Parameters

spect (array_like) – Complex-valued linear-frequency spectrogram
siglen (int) – Intended length of the audio signal
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
size (int) – Batch size for the FFT

Returns

signal – Reconstructed audio signal

Return type

ndarray

musisep.audio.spect.smoothconv(signal, window, kernel, sampdist)

musisep.audio.spect.spectrogram(signal, spectheight, sigmas, sampdist, eval_range=slice(None, None, None))

Calculate the linear-frequency magnitude spectrogram via STFT.

Parameters

signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
eval_range (slice) – Time range of the spectrogram to be computed

Returns

spectrogram – Linear-frequency magnitude spectrogram

Return type

ndarray

musisep.audio.spect.spectwrite(filename, spectrogram, color='viridis', db=100)

Save a spectrogram as an image. The data is normalized to a dynamic range of db decibels (100 dB by default) and displayed on a logarithmic color scale (Viridis by default).

Parameters

filename (string) – Name of the image file
spectrogram (array_like) – Spectrogram
color (string or NoneType) – Color map for a color plot, or None for no color plot
db (float) – Dynamic range in decibels
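A sketch of the dynamic-range normalization, assuming a magnitude spectrogram converted with 20 * log10 and floored at db decibels below the peak; to_db_image is a hypothetical helper that produces values in [0, 1] ready for a color map:

```python
import numpy as np


def to_db_image(spectrogram, db=100):
    """Map a magnitude spectrogram to [0, 1] over a `db`-decibel range."""
    mag = np.abs(np.asarray(spectrogram, dtype=np.float64))
    peak = np.max(mag)
    # Floor at `db` below the peak, which also avoids log10(0).
    floor = peak * 10 ** (-db / 20)
    level = 20 * np.log10(np.maximum(mag, floor) / peak)
    # level lies in [-db, 0]; shift and scale it to [0, 1].
    return (level + db) / db


image = to_db_image([[1.0, 0.1, 1e-6], [0.0, 0.01, 1.0]])
# approximately [[1.0, 0.8, 0.0], [0.0, 0.6, 1.0]]
```

A spectrogram that is entirely zero would need a separate guard, since the peak serves as the reference level.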

musisep.audio.spect.stft(signal, length, sigmas, sampdist, eval_range=slice(None, None, None))

Calculate the linear-frequency spectrogram of a given audio signal by calling stripe and computing the FFT along the first axis.

Parameters

signal (array_like) – Audio signal
length (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
eval_range (slice) – Time range of the spectrogram to be computed

Returns

spectrogram – Complex-valued linear-frequency spectrogram

Return type

ndarray

musisep.audio.spect.stripe(signal, spectheight, sigmas, sampdist, eval_range)

Populate an array with time-shifted and windowed versions of an audio signal. This serves as a precursor for FFT calculation. The first spectrogram time frame coincides with the first sample in the signal. Out-of-bounds array entries are assumed to be zero.

Parameters

signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
eval_range (slice) – Time range of the spectrogram to be computed

Returns

stripeplot – Populated array

Return type

ndarray
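A sketch of the striping step followed by the FFT along the first axis, as stft is described to do. It assumes a Gaussian window of 2 * (winlen // 2) + 1 samples cut at sigmas standard deviations; stripe_sketch and winlen are hypothetical names, and the relation of the window length to spectheight is an assumption:

```python
import numpy as np


def stripe_sketch(signal, winlen, sigmas, sampdist):
    """Windowed, time-shifted copies of `signal` in the columns of an array."""
    signal = np.asarray(signal, dtype=np.float64)
    half = winlen // 2
    offsets = np.arange(-half, half + 1)
    # Gaussian window with stdev half / sigmas, cut at `sigmas` std devs.
    window = np.exp(-0.5 * (offsets * sigmas / half) ** 2)
    # Frame k is centered on sample k * sampdist, so frame 0 sits on sample 0.
    centers = np.arange(0, len(signal), sampdist)
    idx = centers[None, :] + offsets[:, None]
    valid = (idx >= 0) & (idx < len(signal))
    # Out-of-bounds entries are zero.
    frames = np.where(valid, signal[np.clip(idx, 0, len(signal) - 1)], 0.0)
    return frames * window[:, None]


signal = np.sin(2 * np.pi * 0.05 * np.arange(1000))
stripeplot = stripe_sketch(signal, winlen=128, sigmas=4.0, sampdist=32)
# Complex spectrogram: FFT along the first axis, one column per time frame.
spect = np.fft.fft(stripeplot, axis=0)
magnitude = np.abs(spect)
```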

musisep.audio.spect.synth_audio(spect, siglen, sigmas, sampdist, iterations, guess=None, size=2000)

Reconstruct an audio signal from a linear-frequency magnitude spectrogram via the algorithm by Griffin and Lim.

Parameters

spect (array_like) – Linear-frequency magnitude spectrogram
siglen (int) – Intended length of the audio signal
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
iterations (int) – Number of Griffin-Lim iterations to perform
guess (array_like) – Initial value for the audio signal
size (int) – Batch size for the FFT

Returns

signal – Reconstructed audio signal

Return type

ndarray
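A self-contained sketch of the Griffin-Lim iteration: compute the STFT of the current estimate, keep its phase, impose the target magnitude, and project back with the least-squares inverse STFT. It uses the same hypothetical frame layout as stripe (Gaussian window of hypothetical length winlen, first frame at sample 0) and, when no guess is given, starts from random phase; that initialization is an assumption, not documented behavior of synth_audio.

```python
import numpy as np


def _window(winlen, sigmas):
    half = winlen // 2
    offsets = np.arange(-half, half + 1)
    return offsets, np.exp(-0.5 * (offsets * sigmas / half) ** 2)


def _indices(siglen, offsets, sampdist):
    idx = np.arange(0, siglen, sampdist)[None, :] + offsets[:, None]
    return idx, (idx >= 0) & (idx < siglen)


def stft(signal, winlen, sigmas, sampdist):
    signal = np.asarray(signal, dtype=np.float64)
    offsets, window = _window(winlen, sigmas)
    idx, valid = _indices(len(signal), offsets, sampdist)
    frames = np.where(valid, signal[np.clip(idx, 0, len(signal) - 1)], 0.0)
    return np.fft.fft(frames * window[:, None], axis=0)


def istft(spect, siglen, winlen, sigmas, sampdist):
    """Least-squares inverse STFT (orthogonal projection)."""
    offsets, window = _window(winlen, sigmas)
    idx, valid = _indices(siglen, offsets, sampdist)
    weights = np.broadcast_to(window[:, None], idx.shape)
    frames = np.fft.ifft(spect, axis=0).real
    num = np.zeros(siglen)
    den = np.zeros(siglen)
    np.add.at(num, idx[valid], (weights * frames)[valid])
    np.add.at(den, idx[valid], (weights ** 2)[valid])
    return np.divide(num, den, out=np.zeros(siglen), where=den > 0)


def griffin_lim(mag, siglen, winlen, sigmas, sampdist, iterations, guess=None, seed=0):
    """Estimate a signal whose STFT magnitude approximates `mag`."""
    if guess is None:
        # Hypothetical initialization: target magnitude with random phase.
        rng = np.random.default_rng(seed)
        phase = np.exp(2j * np.pi * rng.random(mag.shape))
        signal = istft(mag * phase, siglen, winlen, sigmas, sampdist)
    else:
        signal = np.asarray(guess, dtype=np.float64)
    for _ in range(iterations):
        spect = stft(signal, winlen, sigmas, sampdist)
        # Keep the current phase, impose the target magnitude, project back.
        spect = mag * np.exp(1j * np.angle(spect))
        signal = istft(spect, siglen, winlen, sigmas, sampdist)
    return signal
```

Because each step is a projection, the magnitude error of the estimate does not increase from one iteration to the next.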

musisep.audio.spect.winlog_spect(spect, freqs, basefreq, sigmas, scale=True)

Apply a logarithmic transform on the frequency axis of a linear-frequency magnitude spectrogram while preserving the width of the horizontal lines via Gaussian smoothing. The attenuation of the higher frequencies is counteracted by scaling.

Parameters

spect (array_like) – Linear-frequency magnitude spectrogram
freqs (array_like) – Frequencies to place the smoothing kernel (normalized to the sampling frequency)
basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)
sigmas (float) – Number of standard deviations after which to cut the kernel
scale (bool) – Whether to adjust the l1 norm of the kernels w.r.t. frequency

Returns

logspect – Log-frequency magnitude spectrogram

Return type

ndarray
