musisep.audio package¶
Submodules¶
musisep.audio.performance module¶
Module for calculation of the performance measures for blind source separation.
- musisep.audio.performance.measures(synth_signals, orig_signals, size=1048576)[source]¶
Compute the SDR, SIR, and SAR in all permutations of the synthesized signals.
- Parameters
synth_signals (array_like) – Array with the synthesized signals in its rows
orig_signals (array_like) – Array with the original signals in its rows
size (int) – Length of the signal fragments to consider at once
- Returns
perms (list of ndarray) – Permutations of the indices of the signals
measures (list of ndarray) – Arrays with SDR, SIR, and SAR for the signals in rows
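As a rough illustration of what the three measures express, the sketch below follows the widely used projection-based BSS Eval decomposition for a single estimate. It is an assumption that this matches the package's computation in detail: the helper name `bss_measures_sketch` is hypothetical, and the sketch ignores both the `size` parameter and the search over permutations.

```python
import numpy as np

def bss_measures_sketch(estimate, originals, target):
    """SDR, SIR, SAR (in dB) of one estimate w.r.t. originals[target]."""
    estimate = np.asarray(estimate, dtype=float)
    originals = np.asarray(originals, dtype=float)
    ref = originals[target]
    # Part of the estimate explained by the target source alone
    s_target = (estimate @ ref) / (ref @ ref) * ref
    # Part explained by all sources together (least-squares projection)
    coeffs, *_ = np.linalg.lstsq(originals.T, estimate, rcond=None)
    p_all = originals.T @ coeffs
    e_interf = p_all - s_target   # leakage from the other sources
    e_artif = estimate - p_all    # remainder not explained by any source

    def db(num, den):
        return 10 * np.log10(np.sum(num**2) / np.sum(den**2))

    sdr = db(s_target, e_interf + e_artif)
    sir = db(s_target, e_interf)
    sar = db(s_target + e_interf, e_artif)
    return sdr, sir, sar

# Toy data: three mutually orthogonal sinusoids
t = np.arange(1000)
s0 = np.sin(2 * np.pi * 5 * t / 1000)
s1 = np.sin(2 * np.pi * 9 * t / 1000)
noise = np.sin(2 * np.pi * 13 * t / 1000)
estimate = s0 + 0.1 * s1 + 0.01 * noise
sdr, sir, sar = bss_measures_sketch(estimate, np.stack([s0, s1]), target=0)
# The interference has 1/100 of the target energy, so SIR is about 20 dB
```

In this toy case the interference dominates the artifacts, so the SDR lies just below the SIR while the SAR is much higher.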
- musisep.audio.performance.orthogonalize(signals)[source]¶
Orthogonalize the given signals.
- Parameters
signals (array_like) – Matrix with the signals in its rows
- Returns
q_matrix – Matrix with the orthogonalized signals in its rows
- Return type
ndarray
- musisep.audio.performance.project(signals, q_matrix)[source]¶
Project the given signals on the given space.
- Parameters
signals (array_like) – Matrix with the signals in its rows
q_matrix (array_like) – Matrix with an orthonormal basis of the space in its rows
- Returns
proj_signals – Matrix with the projected signals in its rows
- Return type
ndarray
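The two helpers above can be pictured with a short NumPy sketch. It assumes a QR decomposition for the orthogonalization step and a plain matrix product for the projection; the function names are hypothetical and the package may proceed differently internally.

```python
import numpy as np

def orthogonalize_sketch(signals):
    """Return an orthonormal basis (in rows) of the span of the row signals."""
    signals = np.asarray(signals, dtype=float)
    # QR orthogonalizes columns, so transpose on the way in and out
    q, _ = np.linalg.qr(signals.T)
    return q.T

def project_sketch(signals, q_matrix):
    """Orthogonally project the row signals onto the row space of q_matrix."""
    signals = np.asarray(signals, dtype=float)
    q_matrix = np.asarray(q_matrix, dtype=float)
    # Coefficients w.r.t. the orthonormal basis, then back to signal space
    return (signals @ q_matrix.T) @ q_matrix

# Two signals spanning the x-y plane of a 3-sample signal space
basis = orthogonalize_sketch([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0]])
projected = project_sketch([[2.0, 3.0, 5.0]], basis)
```

Projecting onto the x-y plane discards the third sample, so `projected` keeps only the first two components of the input.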
- musisep.audio.performance.select_perm(perms, measures)[source]¶
Select the permutation with the highest SIR sum.
- Parameters
perms (list of array_like) – Permutations of the indices of the signals
measures (list of array_like) – Arrays with SDR, SIR, and SAR for the signals in rows
- Returns
best_perm (ndarray) – Permutation of synth_signals with the highest SIR sum
best_measure (ndarray) – SDR, SIR, and SAR in the permutation with the highest SIR sum
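A minimal sketch of the selection step follows. It assumes that each entry of `measures` has shape (3, number of signals) with the rows ordered SDR, SIR, SAR; that layout and the name `select_perm_sketch` are assumptions made for illustration.

```python
import numpy as np

def select_perm_sketch(perms, measures):
    # Assumption: row 1 of each measures array holds the SIR values
    sir_sums = [np.sum(np.asarray(m)[1]) for m in measures]
    best = int(np.argmax(sir_sums))
    return np.asarray(perms[best]), np.asarray(measures[best])

perms = [np.array([0, 1]), np.array([1, 0])]
measures = [
    np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]),      # SIR sum 7
    np.array([[8.0, 9.0], [10.0, 11.0], [12.0, 13.0]]),  # SIR sum 21
]
best_perm, best_measure = select_perm_sketch(perms, measures)
```

Here the second permutation wins because its SIR row sums to the larger value.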
musisep.audio.specttool module¶
Back-end module for the Griffin-Lim algorithm.
- musisep.audio.specttool.adapt_mag()¶
- musisep.audio.specttool.unstripe()¶
musisep.audio.wav module¶
Module to handle WAV audio data.
- musisep.audio.wav.read(filename)[source]¶
Read WAV audio data from a file. If the data has multiple channels, they will be averaged.
- Parameters
filename (string) – Name of the WAV file.
- Returns
data (ndarray) – Audio data as double array with values in [-1,1].
samprate (int) – Sampling rate of the WAV file.
- musisep.audio.wav.read_stereo(filename)[source]¶
Read WAV audio data from a file. If the data has multiple channels, they will be returned as rows of the output array.
- Parameters
filename (string) – Name of the WAV file.
- Returns
data (ndarray) – Audio data as double array with values in [-1,1].
samprate (int) – Sampling rate of the WAV file.
- musisep.audio.wav.unify(in_data)[source]¶
Convert the input data to a double-type array with values in [-1,1]. Input type must be double, float32, int32, int16, or uint8.
- Parameters
in_data (array_like) – Data to be unified
- Returns
out_data – Unified data.
- Return type
ndarray
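The conversion can be sketched as below. The scaling constants follow the usual WAV sample conventions (signed integers divided by their magnitude range, unsigned 8-bit data centered at 128); whether the package uses exactly these constants is an assumption, and `unify_sketch` is a hypothetical name.

```python
import numpy as np

def unify_sketch(in_data):
    data = np.asarray(in_data)
    if data.dtype in (np.float64, np.float32):
        # Floating-point data is assumed to lie in [-1, 1] already
        return data.astype(np.float64)
    if data.dtype == np.int16:
        return data / 32768.0
    if data.dtype == np.int32:
        return data / 2147483648.0
    if data.dtype == np.uint8:
        # Unsigned 8-bit samples are centered at 128
        return (data.astype(np.float64) - 128.0) / 128.0
    raise ValueError(f"unsupported dtype: {data.dtype}")

from_int16 = unify_sketch(np.array([-32768, 0, 16384], dtype=np.int16))
from_uint8 = unify_sketch(np.array([0, 128, 255], dtype=np.uint8))
```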
- musisep.audio.wav.write(filename, signal, samprate, normalize=False)[source]¶
Write WAV audio data to a file, optionally normalizing it first. The data type should be floating-point and must be supported by scipy.io.wavfile.
- Parameters
filename (string) – Name of the WAV file.
signal (array_like) – Audio data to write.
samprate (int) – Intended sampling rate of the WAV file.
normalize (bool) – Whether to normalize the output to [-1,1].
- Returns
maxval – Value by which the signal was divided during normalization.
- Return type
scalar
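The normalization step alone can be pictured as peak normalization. This is only a sketch under assumptions: the helper name is hypothetical, the file writing itself is left out, and returning 1.0 when no normalization takes place is a choice made here, not documented behavior.

```python
import numpy as np

def normalize_sketch(signal, normalize=False):
    signal = np.asarray(signal, dtype=float)
    maxval = 1.0
    if normalize:
        peak = float(np.max(np.abs(signal)))
        if peak > 0:
            # Divide by the peak so that the result lies in [-1, 1]
            maxval = peak
    return signal / maxval, maxval

scaled, maxval = normalize_sketch([0.5, -2.0, 1.0], normalize=True)
```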
musisep.audio.spect module¶
Module to generate spectrograms, save them as images and resynthesize audio. When invoked, a side-by-side comparison of the spectrograms from the different methods is performed.
- musisep.audio.spect.example_brahms()[source]¶
Application of different transforms on a recording of the 1st violin sonata of Johannes Brahms.
- musisep.audio.spect.example_delta_octaves()[source]¶
Comparison of the different representations with a delta transient and sinusoids.
- musisep.audio.spect.example_delta_scale()[source]¶
Display of the properties of the smoothed CQT with a delta transient and a chromatic scale of sinusoids.
- musisep.audio.spect.example_mozart()[source]¶
Application of the sparse pursuit method on the individual instrument tracks of the piece by Mozart.
- musisep.audio.spect.gauss(x, stdev, normalize=True)[source]¶
Generate a Gaussian window/kernel with mean 0.
- Parameters
x (array_like) – Points to evaluate the Gaussian
stdev (float) – Standard deviation
normalize (bool) – Whether to l1-normalize the Gaussian
- Returns
window – Gaussian window/kernel
- Return type
ndarray
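A zero-mean Gaussian window of this kind can be sketched in a few lines. The sketch assumes that l1-normalization means dividing by the sum of the absolute sample values; `gauss_sketch` is a hypothetical name.

```python
import numpy as np

def gauss_sketch(x, stdev, normalize=True):
    x = np.asarray(x, dtype=float)
    window = np.exp(-x**2 / (2 * stdev**2))
    if normalize:
        # Discrete l1 normalization: the samples sum to one
        window = window / np.sum(np.abs(window))
    return window

points = np.arange(-10, 11)
kernel = gauss_sketch(points, stdev=3.0)
raw = gauss_sketch(points, stdev=3.0, normalize=False)
```

The normalized kernel is suited to smoothing, since convolving with it preserves the mean level, while the unnormalized variant peaks at 1.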
- musisep.audio.spect.istft(spect, siglen, sigmas, sampdist)[source]¶
Reconstruct an audio signal from a complex-valued linear-frequency spectrogram via orthogonal projection. If a sample cannot be inferred from the spectrogram, it is set to zero.
- Parameters
spect (array_like) – Complex-valued linear-frequency spectrogram
siglen (int) – Intended length of the audio signal
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
- Returns
signal – Reconstructed audio signal
- Return type
ndarray
- musisep.audio.spect.logspect_cq(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, smooth=True)[source]¶
Compute the time-smoothed CQT of an audio signal.
- Parameters
signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window/kernel
sampdist (int) – Time intervals to sample the spectrogram
basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)
minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)
maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)
numfreqs (float) – Height of the log-frequency spectrogram
- Returns
logspect – Log-frequency magnitude spectrogram
- Return type
ndarray
- musisep.audio.spect.logspect_mel(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, eval_range=slice(None, None, None), scale=True)[source]¶
Compute the Mel-frequency spectrogram of an audio signal.
- Parameters
signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window/kernel
sampdist (int) – Time intervals to sample the spectrogram
basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)
minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)
maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)
numfreqs (float) – Height of the log-frequency spectrogram
eval_range (slice) – Time range of the spectrogram to be computed
scale (bool) – Whether to adjust the l1 norm of the kernels w.r.t. frequency
- Returns
logspect (ndarray) – Log-frequency magnitude spectrogram
spect (ndarray) – Linear-frequency magnitude spectrogram
- musisep.audio.spect.logspect_pursuit(signal, spectheight, sigmas, sampdist, basefreq, minfreq, maxfreq, numfreqs, fsigma, eval_range=slice(None, None, None))[source]¶
Compute the log-frequency spectrogram via sparse pursuit.
- Parameters
signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window/kernel
sampdist (int) – Time intervals to sample the spectrogram
basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)
minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)
maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)
numfreqs (float) – Height of the log-frequency spectrogram
fsigma (float) – Standard deviation (frequency)
eval_range (slice) – Time range of the spectrogram to be computed
- Returns
logspect – Log-frequency magnitude spectrogram
- Return type
ndarray
- musisep.audio.spect.project_audio(spect, siglen, sigmas, sampdist, size=2000)[source]¶
Reconstruct an audio signal from a linear-frequency complex spectrogram via orthogonal projection.
- Parameters
spect (array_like) – Complex-valued linear-frequency spectrogram
siglen (int) – Intended length of the audio signal
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
size (int) – Batch size for the FFT
- Returns
signal – Reconstructed audio signal
- Return type
ndarray
- musisep.audio.spect.spectrogram(signal, spectheight, sigmas, sampdist, eval_range=slice(None, None, None))[source]¶
Calculate the linear-frequency magnitude spectrogram via STFT.
- Parameters
signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
eval_range (slice) – Time range of the spectrogram to be computed
- Returns
spectrogram – Linear-frequency magnitude spectrogram
- Return type
ndarray
- musisep.audio.spect.spectwrite(filename, spectrogram, color='viridis', db=100)[source]¶
Save a spectrogram as an image. By default, the data is normalized to a dynamic range of 100 dB, and a logarithmic Viridis color scale is used.
- Parameters
filename (string) – Name of the image file
spectrogram (array_like) – Spectrogram
color (string or NoneType) – Whether to make a color plot
- musisep.audio.spect.stft(signal, length, sigmas, sampdist, eval_range=slice(None, None, None))[source]¶
Calculate the linear-frequency spectrogram of a given audio signal by calling stripe and computing the FFT along the first axis.
- Parameters
signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
eval_range (slice) – Time range of the spectrogram to be computed
- Returns
spectrogram – Complex-valued linear-frequency spectrogram
- Return type
ndarray
- musisep.audio.spect.stripe(signal, spectheight, sigmas, sampdist, eval_range)[source]¶
Populate an array with time-shifted and windowed versions of an audio signal. This serves as a precursor for FFT calculation. The first spectrogram time frame coincides with the first sample in the signal. Out-of-bounds array entries are assumed to be zero.
- Parameters
signal (array_like) – Audio signal
spectheight (int) – Height of the linear-frequency spectrogram
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
eval_range (slice) – Time range of the spectrogram to be computed
- Returns
stripeplot – Populated array
- Return type
ndarray
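The framing described for stripe, followed by the FFT along the first axis as described for stft, might look roughly like the sketch below. Several details are assumptions: the frame length `framelen` stands in for the spectrogram height, the Gaussian window has its standard deviation set so that it is cut after `sigmas` standard deviations, and frame k is centered on sample k * sampdist. The function name is hypothetical and `eval_range` is omitted.

```python
import numpy as np

def stripe_sketch(signal, framelen, sigmas, sampdist):
    signal = np.asarray(signal, dtype=float)
    half = framelen // 2
    offsets = np.arange(-half, framelen - half)
    # Assumed relation: the window is truncated `sigmas` deviations out
    stdev = half / sigmas
    window = np.exp(-offsets**2 / (2 * stdev**2))
    # Zero-pad so that frames reaching outside the signal read zeros
    padded = np.concatenate(
        [np.zeros(half), signal, np.zeros(framelen - half)])
    numframes = -(-signal.size // sampdist)  # ceiling division
    frames = np.empty((framelen, numframes))
    for k in range(numframes):
        start = k * sampdist  # frame k is centered on sample k * sampdist
        frames[:, k] = padded[start:start + framelen] * window
    return frames

frames = stripe_sketch(np.ones(10), framelen=8, sigmas=3.0, sampdist=2)
# One spectrum per column: FFT along the first axis
spect = np.fft.fft(frames, axis=0)
```

With this layout each column holds one windowed frame, so the first frame has the first signal sample at its center and zeros in its leading half.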
- musisep.audio.spect.synth_audio(spect, siglen, sigmas, sampdist, iterations, guess=None, size=2000)[source]¶
Reconstruct an audio signal from a linear-frequency magnitude spectrogram via the algorithm by Griffin and Lim.
- Parameters
spect (array_like) – Linear-frequency magnitude spectrogram
siglen (int) – Intended length of the audio signal
sigmas (float) – Number of standard deviations after which to cut the window
sampdist (int) – Time intervals to sample the spectrogram
iterations (int) – Number of Griffin-Lim iterations to perform
guess (array_like) – Initial value for the audio signal
size (int) – Batch size for the FFT
- Returns
signal – Reconstructed audio signal
- Return type
ndarray
- musisep.audio.spect.winlog_spect(spect, freqs, basefreq, sigmas, scale=True)[source]¶
Apply a logarithmic transform on the frequency axis of a linear-frequency magnitude spectrogram while preserving the width of the horizontal lines via Gaussian smoothing. The attenuation of the higher frequencies is counteracted by scaling.
- Parameters
spect (array_like) – Linear-frequency magnitude spectrogram
freqs (array_like) – Frequencies to place the smoothing kernel (normalized to the sampling frequency)
basefreq (float) – Frequency to assume as a minimum for smoothing (normalized to the sampling frequency)
sigmas (float) – Number of standard deviations after which to cut the kernel
scale (bool) – Whether to adjust the l1 norm of the kernels w.r.t. frequency
- Returns
logspect – Log-frequency magnitude spectrogram
- Return type
ndarray