musisep.dictsep package¶
Submodules¶
musisep.dictsep.__main__ module¶
Wrapper for the dictionary learning algorithm. When invoked, the audio sources in the supplied audio file are separated.
- musisep.dictsep.__main__.correct_signal_length(signal, length)[source]¶
Right-pad or right-crop the signal such that it fits the desired length.
- Parameters
signal (ndarray) – Signal to be adjusted
length (int) – Desired length of the signal
- Returns
Adjusted signal
- Return type
ndarray
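The padding and cropping described above can be sketched as follows. This is a minimal illustration, not the package's implementation: the name `correct_signal_length_sketch` is hypothetical, plain lists stand in for `ndarray` so that the snippet runs without dependencies, and zero is assumed as the padding value (the documentation does not state it).

```python
def correct_signal_length_sketch(signal, length):
    """Right-pad with zeros or right-crop so that the result has `length` items."""
    signal = list(signal)
    if len(signal) >= length:
        # Too long (or exact): drop samples on the right
        return signal[:length]
    # Too short: append zeros on the right (assumed padding value)
    return signal + [0.0] * (length - len(signal))


padded = correct_signal_length_sketch([1.0, 2.0], 4)
cropped = correct_signal_length_sketch([1.0, 2.0, 3.0], 2)
```

Adjusting only the right end keeps the start of the signal aligned in time with the reference tracks it is compared against.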
- musisep.dictsep.__main__.main(mixed_soundfile, orig_soundfiles, out_name, out_name_run_suffix='', inst_num=2, tone_num=1, pexp=1, qexp=0.5, har=25, sigmas=6, sampdist=256, spectheight=6144, logspectheight=1024, minfreq=20, maxfreq=20480, runs=10000, lifetime=500, num_dicts=10, mask=True, color=False, plot_range=None, spect_method='pursuit', supply_dicts=None, spect_plots=())[source]¶
Wrapper function for the dictionary learning algorithm.
- Parameters
mixed_soundfile (string) – Name of the mixed input file
orig_soundfiles (list of string or NoneType) – Names of the files with the isolated instrument tracks or None
out_name (string) – Prefix for the file names
out_name_run_suffix (string) – Extra label for the output files
inst_num (int) – Number of instruments
tone_num (int) – Maximum number of simultaneous tones for each instrument
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
har (int) – Number of harmonics
sigmas (float) – Number of standard deviations after which to cut the window/kernel
sampdist (int) – Time intervals to sample the spectrogram
spectheight (int) – Height of the linear-frequency spectrogram
logspectheight (int) – Height of the log-frequency spectrogram
minfreq (float) – Minimum frequency in Hz to be represented (included)
maxfreq (float) – Maximum frequency in Hz to be represented (excluded)
runs (int) – Number of training iterations to perform
lifetime (int) – Number of steps after which to renew the dictionary
num_dicts (int) – Number of different dictionaries to generate and train
mask (bool) – Whether to apply spectral masking
color (bool or string) – Whether color should be used, or specification of the color scheme
plot_range (slice or NoneType) – Part of the spectrogram to plot
spect_method (string) – If set to “mel”, a mel spectrogram is used for separation. Otherwise, the log-frequency spectrogram is generated via sparse pursuit.
supply_dicts (NoneType or list of array_like) – If specified, use the given dictionaries rather than computing new ones
spect_plots (sequence of int) – Time frames for which to output the spectrum as a text file
- Returns
inst_dicts – Dictionaries that were used for the separation
- Return type
list of ndarray
- musisep.dictsep.__main__.separate_duan()[source]¶
Separation of the data by Duan et al.
- Parameters
number (int) – Number of the sample to be considered.
- musisep.dictsep.__main__.separate_frere_jacques()[source]¶
Separation of Bb tin whistle and viola and generalization to C tin whistle and violin, then vice versa.
- musisep.dictsep.__main__.separate_jaiswal(number)[source]¶
Separation of the data by Jaiswal et al.
- Parameters
number (int) – Number of the sample to be considered.
- musisep.dictsep.__main__.separate_mozart_clarinet_piano()[source]¶
Separation of clarinet and piano on the piece by Mozart.
- musisep.dictsep.__main__.separate_mozart_recorder_violin()[source]¶
Separation of recorder and violin on the piece by Mozart.
musisep.dictsep.adam_b module¶
Module containing the modified ADAM algorithm.
- class musisep.dictsep.adam_b.Adam_B(init, lo=0, hi=1, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08)[source]¶
Bases: object
Object for the ADAM algorithm with bounds, adapted for the update of instrument dictionaries. Each column refers to one instrument, and the harmonics are in rows.
- Parameters
init (array-like) – Initial value for the dictionary
lo (float) – Lower bound for the dictionary entries
hi (float) – Upper bound for the dictionary entries
alpha (float) – Global step-size
beta1 (float) – Inertia of the first moment estimator
beta2 (float) – Inertia of the second moment estimator
eps (float) – Value to add in the denominator to avoid division by zero
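To illustrate how the constructor parameters interact, here is a minimal sketch of a standard Adam step followed by clipping to `[lo, hi]`. The class name `BoundedAdamSketch` and its `step` method are hypothetical, a flat list stands in for the dictionary matrix, and plain clipping is an assumption: the actual `Adam_B` class may enforce its bounds differently.

```python
import math


class BoundedAdamSketch:
    """Textbook Adam update with the result clipped to [lo, hi] (assumed scheme)."""

    def __init__(self, init, lo=0.0, hi=1.0, alpha=0.001,
                 beta1=0.9, beta2=0.999, eps=1e-8):
        self.x = [float(v) for v in init]
        self.lo, self.hi = lo, hi
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.m = [0.0] * len(self.x)  # first moment estimate
        self.v = [0.0] * len(self.x)  # second moment estimate
        self.t = 0                    # step counter for bias correction

    def step(self, grad):
        self.t += 1
        for i, g in enumerate(grad):
            self.m[i] = self.beta1 * self.m[i] + (1 - self.beta1) * g
            self.v[i] = self.beta2 * self.v[i] + (1 - self.beta2) * g * g
            m_hat = self.m[i] / (1 - self.beta1 ** self.t)
            v_hat = self.v[i] / (1 - self.beta2 ** self.t)
            x_new = self.x[i] - self.alpha * m_hat / (math.sqrt(v_hat) + self.eps)
            # Keep the entry inside the admissible interval
            self.x[i] = min(self.hi, max(self.lo, x_new))
        return self.x


opt = BoundedAdamSketch([0.5, 0.0005], alpha=0.001)
result = opt.step([1.0, 1.0])
```

In this example the second entry would be pushed below zero by the update and is clipped back to the lower bound.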
musisep.dictsep.dictlearn module¶
Module for the training of the dictionary. When invoked, a performance test on artificial data is performed.
- class musisep.dictsep.dictlearn.Learner(fsigma, tone_num, inst_num, har, m, minfreq, maxfreq, lifetime, pexp, qexp, init=None)[source]¶
Bases: object
Container object for the dictionary learning process.
- Parameters
fsigma (float) – Standard deviation (frequency)
tone_num (int) – Maximum number of simultaneous tones for each instrument
inst_num (int) – Number of instruments in the dictionary
har (int) – Number of harmonics
m (int) – Height of the log-frequency spectrogram
minfreq (float) – Minimum frequency to be represented (included)
maxfreq (float) – Maximum frequency to be represented (excluded)
lifetime (int) – Number of steps after which to renew the dictionary
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
init (array_like) – Initial value for the dictionary
- get_dict()[source]¶
Get the active part of the dictionary.
- Returns
inst_dict – Dictionary with inst_num columns
- Return type
ndarray
- musisep.dictsep.dictlearn.gen_random_inst(har)[source]¶
Generate random harmonic amplitudes according to a Par(1,2) distribution.
- Parameters
har (int) – Number of harmonics
- Returns
inst – Harmonic amplitudes for one instrument, scaled to the interval [0,1]
- Return type
ndarray
- musisep.dictsep.dictlearn.gen_random_inst_dict(har, inst_num)[source]¶
Generate a random instrument dictionary according to a Par(1,2) distribution.
- Parameters
har (int) – Number of harmonics
inst_num (int) – Number of instruments
- Returns
inst_dict – Dictionary with instruments in columns, scaled to the interval [0,1]
- Return type
ndarray
- musisep.dictsep.dictlearn.learn_spect_dict(spect, fsigma, tone_num, inst_num, pexp, qexp, har, minfreq, maxfreq, runs, lifetime)[source]¶
Train the dictionary containing the relative amplitudes of the harmonics.
- Parameters
spect (array_like) – Original log-frequency spectrogram of the recording
fsigma (float) – Standard deviation (frequency)
tone_num (int) – Maximum number of simultaneous tones for each instrument
inst_num (int) – Number of instruments in the dictionary
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
har (int) – Number of harmonics
minfreq (float) – Minimum frequency in Hz to be represented (included)
maxfreq (float) – Maximum frequency in Hz to be represented (excluded)
runs (int) – Number of training iterations to perform
lifetime (int) – Number of steps after which to renew the dictionary
- Returns
inst_dict – Dictionary containing the relative amplitudes of the harmonics
- Return type
ndarray
- musisep.dictsep.dictlearn.make_closures(fsigma)[source]¶
Build the functions that give the bounds and initial values.
- Parameters
fsigma (float) – Standard deviation of the Gaussian in the time domain.
- Returns
make_bounds (lambda (length)) – Lambda that gives the bounds for length peaks
make_inits (lambda (length)) – Lambda that gives the initial values for length peaks
- musisep.dictsep.dictlearn.mask_spectrums(spects, orig_spect)[source]¶
Mask the synthesized spectrograms with the original spectrogram.
- Parameters
spects (list of array_like) – List of synthesized spectrograms
orig_spect (array_like) – Original spectrogram
- Returns
spectrums (list of ndarray) – Masked spectrograms
mask_spect (ndarray) – Array mask
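One common way to mask synthesized spectrograms with the original is a ratio mask, sketched below. This is an assumption about the technique, not a description of what `mask_spectrums` actually computes: the name `mask_spectrums_sketch` and the `eps` guard are hypothetical, and nested lists stand in for arrays.

```python
def mask_spectrums_sketch(spects, orig_spect, eps=1e-12):
    """Rescale each synthesized spectrogram so that their sum matches the original."""
    rows, cols = len(orig_spect), len(orig_spect[0])
    # Element-wise sum of all synthesized spectrograms
    total = [[sum(s[i][j] for s in spects) for j in range(cols)]
             for i in range(rows)]
    # Ratio between the original and the synthesized mixture (eps avoids 0/0)
    mask = [[orig_spect[i][j] / (total[i][j] + eps) for j in range(cols)]
            for i in range(rows)]
    masked = [[[s[i][j] * mask[i][j] for j in range(cols)]
               for i in range(rows)] for s in spects]
    return masked, mask


spects = [[[1.0, 0.0]], [[3.0, 2.0]]]  # two instruments, 1 x 2 spectrograms
orig = [[2.0, 4.0]]
masked, mask = mask_spectrums_sketch(spects, orig)
```

With such a mask, the masked spectrograms add up to the original spectrogram wherever the synthesized mixture is nonzero, so no energy of the recording is lost or invented.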
- musisep.dictsep.dictlearn.stoch_grad(y, inst_dict, tone_num, adam, fsigma, harscale, baseshift, inst_spect, pexp, qexp)[source]¶
Perform a dictionary training step.
- Parameters
y (array_like) – Log-frequency spectrum to represent
inst_dict (ndarray) – Dictionary containing the relative amplitudes of the harmonics
tone_num (int) – Maximum number of simultaneous tones for each instrument
adam (Adam_B) – Container object for the ADAM optimizer
fsigma (float) – Standard deviation (frequency)
harscale (float) – Scaling factor
baseshift (int) – Length to add to the spectrum in order to avoid circular convolution
inst_spect (array_like) – Spectra of the instruments, in the columns
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
- Returns
inst_dict (ndarray) – Updated dictionary
reconstruction (ndarray) – Synthesized spectrum
inst_amps (ndarray) – Summed amplitudes for each instrument
- musisep.dictsep.dictlearn.synth_spect(spect, tone_num, inst_dict, fsigma, spectheight, pexp, qexp, minfreq, maxfreq, stretch=1)[source]¶
Separate and synthesize the spectrograms from the original spectrogram.
- Parameters
spect (array_like) – Original log-frequency spectrogram of the recording
tone_num (int) – Maximum number of simultaneous tones for each instrument
inst_dict (ndarray) – Dictionary containing the relative amplitudes of the harmonics
fsigma (float) – Standard deviation (frequency)
spectheight (int) – Height of the linear-frequency spectrograms
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)
maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)
- Returns
dict_spectrum (ndarray) – Synthesized log-frequency spectrogram with all instruments
inst_spectrums (list of ndarray) – List of synthesized log-frequency spectrograms for the instruments
dict_spectrum_lin (ndarray) – Synthesized linear-frequency spectrogram with all instruments
inst_spectrums_lin (list of ndarray) – List of synthesized linear-frequency spectrograms for the instruments
- musisep.dictsep.dictlearn.test_learn(fsigma, tone_num, inst_num, pexp, qexp, har, m, runs, test_samples, lifetime, inst_dict)[source]¶
Evaluate the performance of the dictionary learning algorithm via artificial spectra.
- Parameters
fsigma (float) – Width of the Gaussians in the log-frequency spectrogram
tone_num (int) – Maximum number of simultaneous tones for each instrument
inst_num (int) – Number of instruments in the dictionaries
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
har (int) – Number of harmonics
m (int) – Height of the log-frequency spectrogram
runs (int) – Number of training iterations to perform
test_samples (int) – Number of test spectra to generate
lifetime (int) – Number of steps after which to renew the dictionary
inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics
- Returns
measures – Array containing, in that order, the SDR, SIR, SAR with the original dictionary and the SDR, SIR, SAR with the trained dictionary
- Return type
ndarray
- musisep.dictsep.dictlearn.test_learn_multi(fsigma, tone_num, inst_num, pexp, qexp, har, m, runs, test_samples, lifetime, num_dicts)[source]¶
Evaluate the performance of the dictionary learning algorithm via artificial spectra.
- Parameters
fsigma (float) – Width of the Gaussians in the log-frequency spectrogram
tone_num (int) – Maximum number of simultaneous tones for each instrument
inst_num (int) – Number of instruments in the dictionaries
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
har (int) – Number of harmonics
m (int) – Height of the log-frequency spectrogram
runs (int) – Number of training iterations to perform
test_samples (int) – Number of test spectra to generate
lifetime (int) – Number of steps after which to renew the dictionary
num_dicts (int) – Number of different dictionaries to generate and train
- Returns
measures – Array containing, in the rows, the SDR, SIR, SAR with the original dictionary and the SDR, SIR, SAR with the trained dictionary
- Return type
ndarray
musisep.dictsep.exptool module¶
Back-end module for the generation of spectrograms and their gradients.
- musisep.dictsep.exptool.inst_scale()¶
- musisep.dictsep.exptool.inst_scale_grad()¶
- musisep.dictsep.exptool.inst_shift()¶
- musisep.dictsep.exptool.inst_shift_dict_grad()¶
- musisep.dictsep.exptool.inst_shift_grad()¶
musisep.dictsep.pursuit module¶
Module for the sparse pursuit algorithm and its helper functions.
- class musisep.dictsep.pursuit.Peaks(amps, shifts, params, insts)[source]¶
Bases: object
Object to represent the parameters for the peaks in the spectrogram.
- Parameters
amps (array_like) – Amplitudes
shifts (array_like) – Fundamental frequencies
params (array_like) – Extra parameters (in the rows)
insts (array_like) – Instrument numbers
- classmethod empty(params)[source]¶
Construct an empty Peaks object.
- Returns
A Peaks object with zero peaks
- Return type
Peaks
- classmethod from_array(array, insts, paramlen)[source]¶
Construct a Peaks object from an array.
- Parameters
array (array_like) – Array that contains, in consecutive order, the amplitudes, the fundamental frequencies, the standard deviations, and the inharmonicities
insts (array_like) – Instrument numbers
paramlen (int) – Number of extra parameters
- get_array()[source]¶
- Returns
Array that contains, in consecutive order, the amplitudes, the fundamental frequencies, the standard deviations, and the inharmonicities
- Return type
array_like
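The flat array layout shared by from_array and get_array can be sketched as a round trip. The class `PeaksSketch` below is a hypothetical stand-in written with plain lists. It assumes that the array is a concatenation of equally long blocks (amplitudes, fundamental frequencies, then one block per extra parameter), which is one reading of "in consecutive order".

```python
class PeaksSketch:
    """Stand-in for Peaks that only models the flat array layout."""

    def __init__(self, amps, shifts, params, insts):
        self.amps = list(amps)
        self.shifts = list(shifts)
        self.params = [list(row) for row in params]  # one row per extra parameter
        self.insts = list(insts)

    @classmethod
    def from_array(cls, array, insts, paramlen):
        array = list(array)
        n = len(array) // (2 + paramlen)  # number of peaks
        amps = array[:n]
        shifts = array[n:2 * n]
        params = [array[(2 + k) * n:(3 + k) * n] for k in range(paramlen)]
        return cls(amps, shifts, params, insts)

    def get_array(self):
        flat = self.amps + self.shifts  # new list, attributes stay untouched
        for row in self.params:
            flat += row
        return flat


# Two peaks with two extra parameters each (standard deviation, inharmonicity)
array = [1.0, 0.5, 100.0, 220.0, 2.0, 2.5, 0.0, 1e-4]
peaks = PeaksSketch.from_array(array, insts=[0, 1], paramlen=2)
round_trip = peaks.get_array()
```

A flat layout like this is convenient when the continuous parameters are handed to a generic numerical optimizer, while the discrete instrument numbers are kept separately.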
- musisep.dictsep.pursuit.calc_harscale(minfreq, maxfreq, numfreqs)[source]¶
Calculate the scaling factor of the frequency axis for the log-frequency spectrogram.
- Parameters
minfreq (float) – Minimum frequency to be represented (included)
maxfreq (float) – Maximum frequency to be represented (excluded)
numfreqs (int) – Intended height of the spectrogram
- Returns
harscale – Scaling factor
- Return type
float
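As an illustration of what such a scaling factor can mean, the sketch below assumes that it is the number of spectrogram bins per unit of natural-log frequency. This definition and the names `calc_harscale_sketch` and `freq_to_bin` are assumptions made for the example. The actual function may use a different convention.

```python
import math


def calc_harscale_sketch(minfreq, maxfreq, numfreqs):
    """Bins per unit of natural-log frequency (assumed definition)."""
    return numfreqs / (math.log(maxfreq) - math.log(minfreq))


def freq_to_bin(freq, minfreq, harscale):
    """Position of a frequency on the log-frequency axis, in bins."""
    return harscale * math.log(freq / minfreq)


# Default values of main(): 20 Hz to 20480 Hz on 1024 bins
harscale = calc_harscale_sketch(20, 20480, 1024)
octave_bins = harscale * math.log(2)  # distance between a tone and its octave
```

Under this assumption, the h-th harmonic of any tone lies `harscale * log(h)` bins above its fundamental, independent of the pitch. With the defaults above, ten octaves are spread over 1024 bins, so one octave spans 102.4 bins.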
- musisep.dictsep.pursuit.fft_selector(y, prenum, baseshift, inst_spect, qexp)[source]¶
Callback selector to find fundamental frequencies based on the correlation of the spectrum with the instrument spectra.
- Parameters
y (array_like) – Spectrum
prenum (int) – Number of peaks to consider
baseshift (int) – Length to add to the spectrum in order to avoid circular convolution
inst_spect (array_like) – Spectra of the instruments, in the columns
qexp (float) – Exponent to be applied on the spectrum
- Returns
amps (array_like) – Amplitudes
shifts (array_like) – Fundamental frequencies
insts (array_like) – Instrument numbers
- musisep.dictsep.pursuit.gen_inst_spect(baseshift, fsigma, fixed_params, pexp, qexp, m, n)[source]¶
Generate an instrument log-frequency spectrum.
- Parameters
baseshift (int) – Length to add to the spectrum in order to avoid circular convolution
fsigma (float) – Standard deviation (frequency)
fixed_params (sequence) – Extra fixed parameters for the synthesizer
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
m (int) – Height of the spectrogram
n (int) – Number of patterns/instruments
- Returns
inst_spect – Spectra of the instruments, in the columns
- Return type
ndarray
- musisep.dictsep.pursuit.inst_scale(peaks, inst_dict, pexp, m)[source]¶
Synthesize the linear-frequency spectrum.
- Parameters
peaks (Peaks) – Peak parameters
inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics
pexp (float) – Exponent for the addition of sinusoids
m (int) – Height of the spectrogram
- Returns
Linear-frequency spectrum
- Return type
ndarray
- musisep.dictsep.pursuit.inst_shift(peaks, fixed_params, pexp, m)[source]¶
Synthesize the log-frequency spectrum.
- Parameters
peaks (Peaks) – Peak parameters
fixed_params (sequence) – Extra fixed parameters for the synthesizer
pexp (float) – Exponent for the addition of sinusoids
m (int) – Height of the spectrogram
- Returns
Log-frequency spectrum
- Return type
ndarray
- musisep.dictsep.pursuit.inst_shift_dict_grad(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]¶
Least-squares gradient function for the log-frequency spectrum w.r.t. the dictionary.
- Parameters
peak_array (array_like) – Peak parameters in array form
insts (array_like) – Instrument numbers
fixed_params (sequence) – Extra fixed parameters for the synthesizer
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
m (int) – Height of the spectrogram
y (array_like) – Spectrum to compare with
- Returns
grad – Least-squares gradient w.r.t. the dictionary
- Return type
ndarray
- musisep.dictsep.pursuit.inst_shift_grad(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]¶
Least-squares gradient function for the log-frequency spectrum w.r.t. the parameters.
- Parameters
peak_array (array_like) – Peak parameters in array form
insts (array_like) – Instrument numbers
fixed_params (sequence) – Extra fixed parameters for the synthesizer
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
m (int) – Height of the spectrogram
y (array_like) – Spectrum to compare with
- Returns
grad – Least-squares gradient
- Return type
ndarray
- musisep.dictsep.pursuit.inst_shift_obj(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]¶
Least-squares objective function for the log-frequency spectrum.
- Parameters
peak_array (array_like) – Peak parameters in array form
insts (array_like) – Instrument numbers
fixed_params (sequence) – Extra fixed parameters for the synthesizer
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
m (int) – Height of the spectrogram
y (array_like) – Spectrum to compare with
- Returns
obj – Least-squares error
- Return type
float
- musisep.dictsep.pursuit.max_selector(y, prenum, n)[source]¶
Callback selector to find peaks based on the local maxima which are dominant in a discrete interval, viewed from its midpoint.
- Parameters
y (array_like) – Spectrum
prenum (int) – Number of peaks to consider
n (int) – Length of the interval
- Returns
amps (array_like) – Amplitudes
shifts (array_like) – Frequencies
insts (array_like) – Instrument numbers (always 0)
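The selection rule can be sketched as follows, under one reading of the description: a sample is kept if it is the largest value in a window of length n centered on it, and the prenum strongest such samples are returned. The name `max_selector_sketch`, the handling of ties, and the exclusion of non-positive values are assumptions for this illustration.

```python
def max_selector_sketch(y, prenum, n):
    """Return the prenum largest samples that dominate their centered window."""
    half = n // 2
    candidates = []
    for i, value in enumerate(y):
        # Window of (up to) n samples with index i at its midpoint
        window = y[max(0, i - half):i + half + 1]
        # Skipping non-positive values is an assumption of this sketch
        if value > 0 and value == max(window):
            candidates.append((value, i))
    candidates.sort(reverse=True)  # strongest peaks first
    top = candidates[:prenum]
    amps = [a for a, _ in top]
    shifts = [s for _, s in top]
    insts = [0] * len(top)  # a single pattern, so always instrument 0
    return amps, shifts, insts


y = [0.0, 1.0, 0.2, 0.0, 3.0, 0.5, 0.0, 2.0, 0.0]
amps, shifts, insts = max_selector_sketch(y, prenum=2, n=3)
```

In the example, three samples dominate their window (indices 1, 4 and 7), and the two strongest are returned.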
- musisep.dictsep.pursuit.peak_pursuit(y, nums, prenum, runs, n, inst_shift, inst_shift_obj, inst_shift_grad, make_bounds, make_inits, fixed_params, selector, selector_args, pexp, qexp, beta=1, init=None)[source]¶
Sparse pursuit algorithm for the identification of peaks in a spectrum.
- Parameters
y (array_like) – Spectrum
nums (int) – Maximum number of peaks
prenum (int) – Number of new peaks to consider per iteration
runs (int) – Maximum number of training iterations
n (int) – Number of patterns/instruments
inst_shift (callable (peaks, fixed_params, pexp, m, n)) – Synthesizing function
inst_shift_obj (callable (peak_array, insts, fixed_params, pexp, qexp, m, n, y)) – Synthesizing function objective
inst_shift_grad (callable (peak_array, insts, fixed_params, pexp, qexp, m, n, y)) – Synthesizing function gradient
make_bounds (lambda (length)) – Lambda that gives the bounds for length peaks
make_inits (lambda (length)) – Lambda that gives the initial values for length peaks
fixed_params (sequence) – Extra fixed parameters for the synthesizer
selector (function) – Callback selector accepting y and prenum as arguments
selector_args (sequence) – Extra arguments to pass to the selector
pexp (float) – Exponent for the addition of sinusoids
qexp (float) – Exponent to be applied on the spectrum
beta (float) – Residual reduction factor
init (Peaks) – Initial value for the peaks
- Returns
peaks (Peaks) – Identified peaks
reconstruction (ndarray) – Synthesized spectrum
- musisep.dictsep.pursuit.test_pattern(peaks, fixed_params, pexp, m)[source]¶
Evaluate a test pattern.
- Parameters
peaks (Peaks) – Continuous parameters for the peaks
fixed_params (sequence) – Extra fixed parameters
pexp (float) – (ignored)
m (int) – (ignored)
- Returns
y – Sampled test pattern
- Return type
ndarray
- musisep.dictsep.pursuit.test_pattern_comp(x, amps, shifts, sigmas)[source]¶
Evaluate a test pattern.
- Parameters
x (array_like) – Positions to evaluate the pattern
amps (array_like) – Amplitudes of the peaks
shifts (array_like) – Positions of the peaks on the axis
sigmas (array_like) – Standard deviations of the peaks
- Returns
y – Evaluation of the test pattern
- Return type
ndarray
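Given amplitudes, positions, and standard deviations, a natural test pattern is a sum of Gaussian peaks. The sketch below assumes exactly that shape. The name `gaussian_pattern_sketch` is hypothetical, and the real function may use a different normalization or peak shape.

```python
import math


def gaussian_pattern_sketch(x, amps, shifts, sigmas):
    """Evaluate a sum of Gaussian peaks at the positions x (assumed pattern shape)."""
    return [
        sum(a * math.exp(-((xi - s) ** 2) / (2 * sg ** 2))
            for a, s, sg in zip(amps, shifts, sigmas))
        for xi in x
    ]


# Two peaks of unit width, centered at 0 and 2, evaluated at three positions
y = gaussian_pattern_sketch([0.0, 1.0, 2.0],
                            amps=[1.0, 0.5],
                            shifts=[0.0, 2.0],
                            sigmas=[1.0, 1.0])
```

At the midpoint between the two peaks, both contribute with the same Gaussian factor, so the value there is the sum of the amplitudes times `exp(-0.5)`.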
- musisep.dictsep.pursuit.test_pattern_gen(seed, scaling)[source]¶
Generate the parameters for a random pattern.
- Parameters
seed (int) – Random seed
scaling (float) – Scaling of the axis
- Returns
amps (ndarray) – Amplitudes of the peaks
shifts (ndarray) – Positions of the peaks on the axis
sigmas (ndarray) – Standard deviations of the peaks
- musisep.dictsep.pursuit.test_pattern_grad(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]¶
Gradient for a test pattern.
- Parameters
peak_array (array_like) – Array of all the continuous parameters
insts (array_like) – Number of the instruments/patterns
fixed_params (sequence) – Extra fixed parameters
pexp (float) – (ignored)
qexp (float) – (ignored)
m (int) – (ignored)
y (array_like) – Evaluation of the test pattern
- Returns
grad – Gradient for the test pattern
- Return type
ndarray
- musisep.dictsep.pursuit.test_pattern_grad_helper(x, r, amps, shifts, pat_amps, pat_shifts, pat_sigmas)[source]¶
Helper function for the computation of the gradient of the test pattern.
- Parameters
x (array_like) – Positions where pattern was evaluated
r (array_like) – Residual of the pattern
amps (array_like) – Amplitudes of the patterns
shifts (array_like) – Shifts of the peaks
pat_amps (array_like) – Amplitudes of the peaks for each pattern
pat_shifts (array_like) – Positions of the peaks on the axis for each pattern
pat_sigmas (array_like) – Standard deviations of the peaks for each pattern
- Returns
grad – Gradient for the test pattern
- Return type
ndarray
- musisep.dictsep.pursuit.test_pattern_obj(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]¶
Loss objective for a test pattern.
- Parameters
peak_array (array_like) – Array of all the continuous parameters
insts (array_like) – Number of the instruments/patterns
fixed_params (sequence) – Extra fixed parameters
pexp (float) – (ignored)
qexp (float) – (ignored)
m (int) – (ignored)
y (array_like) – Evaluation of the test pattern
- Returns
obj – Least-squares error
- Return type
float