musisep.dictsep package

Submodules

musisep.dictsep.__main__ module

Wrapper for the dictionary learning algorithm. When invoked, the audio sources in the supplied audio file are separated.

musisep.dictsep.__main__.correct_signal_length(signal, length)[source]

Right-pad or right-crop the signal such that it fits the desired length.

Parameters
  • signal (ndarray) – Signal to be adjusted

  • length (int) – Desired length of the signal

Returns

Adjusted signal

Return type

ndarray
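
A minimal sketch of the pad-or-crop behaviour described above. It assumes that padding is done with zeros, which is not stated here. The name `fit_length` is hypothetical, and plain Python lists stand in for ndarrays to keep the sketch dependency-free.

```python
def fit_length(signal, length):
    """Right-crop or right-pad (with zeros, an assumption) to `length`."""
    signal = list(signal)
    if len(signal) >= length:
        # Too long (or exact): keep only the first `length` samples.
        return signal[:length]
    # Too short: append zeros on the right.
    return signal + [0.0] * (length - len(signal))


print(fit_length([1.0, 2.0, 3.0], 5))
print(fit_length([1.0, 2.0, 3.0], 2))
```

Cropping and padding both act on the right end only, so the start of the signal stays aligned with the original.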

musisep.dictsep.__main__.main(mixed_soundfile, orig_soundfiles, out_name, out_name_run_suffix='', inst_num=2, tone_num=1, pexp=1, qexp=0.5, har=25, sigmas=6, sampdist=256, spectheight=6144, logspectheight=1024, minfreq=20, maxfreq=20480, runs=10000, lifetime=500, num_dicts=10, mask=True, color=False, plot_range=None, spect_method='pursuit', supply_dicts=None, spect_plots=())[source]

Wrapper function for the dictionary learning algorithm.

Parameters
  • mixed_soundfile (string) – Name of the mixed input file

  • orig_soundfiles (list of string or NoneType) – Names of the files with the isolated instrument tracks or None

  • out_name (string) – Prefix for the file names

  • out_name_run_suffix (string) – Extra label for the output files

  • inst_num (int) – Number of instruments

  • tone_num (int) – Maximum number of simultaneous tones for each instrument

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • har (int) – Number of harmonics

  • sigmas (float) – Number of standard deviations after which to cut the window/kernel

  • sampdist (int) – Time intervals to sample the spectrogram

  • spectheight (int) – Height of the linear-frequency spectrogram

  • logspectheight (int) – Height of the log-frequency spectrogram

  • minfreq (float) – Minimum frequency in Hz to be represented (included)

  • maxfreq (float) – Maximum frequency in Hz to be represented (excluded)

  • runs (int) – Number of training iterations to perform

  • lifetime (int) – Number of steps after which to renew the dictionary

  • num_dicts (int) – Number of different dictionaries to generate and train

  • mask (bool) – Whether to apply spectral masking

  • color (bool or string) – Whether color should be used, or specification of the color scheme

  • plot_range (slice or NoneType) – Part of the spectrogram to plot

  • spect_method (string) – If set to “mel”, a mel spectrogram is used for separation. Otherwise, the log-frequency spectrogram is generated via sparse pursuit.

  • supply_dicts (NoneType or list of array_like) – If specified, use the given dictionaries rather than computing new ones

  • spect_plots (sequence of int) – Time frames for which to output the spectrum as a text file

Returns

inst_dicts – Dictionaries that were used for the separation

Return type

list of ndarray

musisep.dictsep.__main__.separate_duan()[source]

Separation of the data by Duan et al.

Parameters

number (int) – Number of the sample to be considered.

musisep.dictsep.__main__.separate_frere_jacques()[source]

Separation of Bb tin whistle and viola and generalization to C tin whistle and violin, then vice versa.

musisep.dictsep.__main__.separate_jaiswal(number)[source]

Separation of the data by Jaiswal et al.

Parameters

number (int) – Number of the sample to be considered.

musisep.dictsep.__main__.separate_mozart_clarinet_piano()[source]

Separation of clarinet and piano on the piece by Mozart.

musisep.dictsep.__main__.separate_mozart_piano_mock()[source]

Mock separation of the piano track.

musisep.dictsep.__main__.separate_mozart_recorder_violin()[source]

Separation of recorder and violin on the piece by Mozart.

musisep.dictsep.__main__.separate_mozart_recorder_violin_mel()[source]

Separation of recorder and violin on the piece by Mozart.

musisep.dictsep.__main__.separate_urmp()[source]

Separation of selected samples from the URMP dataset.

musisep.dictsep.adam_b module

Module containing the modified ADAM algorithm.

class musisep.dictsep.adam_b.Adam_B(init, lo=0, hi=1, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-08)[source]

Bases: object

Object for the ADAM algorithm with bounds, adapted for the update of instrument dictionaries. Each column refers to one instrument, and the harmonics are in rows.

Parameters
  • init (array-like) – Initial value for the dictionary

  • lo (float) – Lower bound for the dictionary entries

  • hi (float) – Upper bound for the dictionary entries

  • alpha (float) – Global step-size

  • beta1 (float) – Inertia of the first moment estimator

  • beta2 (float) – Inertia of the second moment estimator

  • eps (float) – Value to add in the denominator to avoid division by zero

reset(i)[source]

Reset an instrument to its initial state.

Parameters

i (int) – Number of the instrument

step(stepdir)[source]

Update the dictionary.

Parameters

stepdir (array-like) – Step direction (negative gradient)

Returns

theta – New value of the dictionary

Return type

ndarray
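
The following sketch illustrates the general idea of a bounded ADAM step. It is not the implementation of Adam_B: it assumes the textbook ADAM update followed by clipping to [lo, hi], uses a single step counter, and omits the per-instrument reset. The class name `BoundedAdam` is hypothetical, and a flat list of floats stands in for the dictionary matrix.

```python
import math


class BoundedAdam:
    """Hypothetical stand-in: textbook Adam plus clipping to [lo, hi]."""

    def __init__(self, init, lo=0.0, hi=1.0, alpha=0.001,
                 beta1=0.9, beta2=0.999, eps=1e-8):
        self.theta = [float(x) for x in init]
        self.lo, self.hi = lo, hi
        self.alpha, self.beta1, self.beta2, self.eps = alpha, beta1, beta2, eps
        self.m = [0.0] * len(self.theta)  # first-moment estimate
        self.v = [0.0] * len(self.theta)  # second-moment estimate
        self.t = 0                        # step counter for bias correction

    def step(self, stepdir):
        """Move along `stepdir` (the negative gradient) and clip to the bounds."""
        self.t += 1
        for k, d in enumerate(stepdir):
            self.m[k] = self.beta1 * self.m[k] + (1 - self.beta1) * d
            self.v[k] = self.beta2 * self.v[k] + (1 - self.beta2) * d * d
            m_hat = self.m[k] / (1 - self.beta1 ** self.t)
            v_hat = self.v[k] / (1 - self.beta2 ** self.t)
            # stepdir already points downhill, so the update is added.
            new = self.theta[k] + self.alpha * m_hat / (math.sqrt(v_hat) + self.eps)
            self.theta[k] = min(self.hi, max(self.lo, new))
        return list(self.theta)


opt = BoundedAdam([0.5, 0.999], alpha=0.01)
print(opt.step([1.0, 1.0]))
```

In this sketch the second entry starts close to the upper bound, so the step would overshoot and is clipped back to hi.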

musisep.dictsep.dictlearn module

Module for the training of the dictionary. When invoked, a performance test on artificial data is performed.

class musisep.dictsep.dictlearn.Learner(fsigma, tone_num, inst_num, har, m, minfreq, maxfreq, lifetime, pexp, qexp, init=None)[source]

Bases: object

Container object for the dictionary learning process.

Parameters
  • fsigma (float) – Standard deviation (frequency)

  • tone_num (int) – Maximum number of simultaneous tones for each instrument

  • inst_num (int) – Number of instruments in the dictionary

  • har (int) – Number of harmonics

  • m (int) – Height of the log-frequency spectrogram

  • minfreq (float) – Minimum frequency to be represented (included)

  • maxfreq (float) – Maximum frequency to be represented (excluded)

  • lifetime (int) – Number of steps after which to renew the dictionary

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • init (array_like) – Initial value for the dictionary

get_dict()[source]

Get the active part of the dictionary.

Returns

inst_dict – Dictionary with inst_num columns

Return type

ndarray

learn(y)[source]

Learning step. Automatically renews the dictionary.

Parameters

y (array_like) – Log-frequency spectrum

Returns

reconstruction – Synthesized spectrum

Return type

ndarray

renew_dict(headstart, newinsts)[source]

Renew the dictionary.

Parameters
  • headstart (int) – Headstart in the lifetime counter (to help new instruments)

  • newinsts (int) – Number of instruments to be renewed

musisep.dictsep.dictlearn.gen_random_inst(har)[source]

Generate random harmonic amplitudes according to a Par(1,2) distribution.

Parameters

har (int) – Number of harmonics

Returns

inst – Harmonic amplitudes for one instrument, unified to an interval of [0,1]

Return type

ndarray
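
As an illustration only, the sketch below reads Par(1,2) as a Pareto distribution with scale 1 and shape 2 and rescales the draws by their maximum so that they lie in (0, 1]. Both the reading of the parameters and the normalization are assumptions, and `random_harmonic_amps` is a hypothetical name, not the function documented above.

```python
import random


def random_harmonic_amps(har, rng):
    """Hypothetical sketch: Pareto(scale=1, shape=2) draws, scaled into (0, 1]."""
    # paretovariate(alpha) samples a Pareto law with minimum 1 and shape alpha.
    raw = [rng.paretovariate(2.0) for _ in range(har)]
    peak = max(raw)
    # Divide by the largest draw so that the strongest harmonic equals 1.
    return [a / peak for a in raw]


amps = random_harmonic_amps(5, random.Random(0))
print(amps)
```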

musisep.dictsep.dictlearn.gen_random_inst_dict(har, inst_num)[source]

Generate a random instrument dictionary according to a Par(1,2) distribution.

Parameters
  • har (int) – Number of harmonics

  • inst_num (int) – Number of instruments

Returns

inst_dict – Dictionary with instruments in columns, unified to an interval of [0,1]

Return type

ndarray

musisep.dictsep.dictlearn.learn_spect_dict(spect, fsigma, tone_num, inst_num, pexp, qexp, har, minfreq, maxfreq, runs, lifetime)[source]

Train the dictionary containing the relative amplitudes of the harmonics.

Parameters
  • spect (array_like) – Original log-frequency spectrogram of the recording

  • fsigma (float) – Standard deviation (frequency)

  • tone_num (int) – Maximum number of simultaneous tones for each instrument

  • inst_num (int) – Number of instruments in the dictionary

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • har (int) – Number of harmonics

  • minfreq (float) – Minimum frequency in Hz to be represented (included)

  • maxfreq (float) – Maximum frequency in Hz to be represented (excluded)

  • runs (int) – Number of training iterations to perform

  • lifetime (int) – Number of steps after which to renew the dictionary

Returns

inst_dict – Dictionary containing the relative amplitudes of the harmonics

Return type

ndarray

musisep.dictsep.dictlearn.make_closures(fsigma)[source]

Build the functions that give the bounds and initial values.

Parameters

fsigma (float) – Standard deviation of the Gaussian in the time domain.

Returns

  • make_bounds (lambda (length)) – Lambda that gives the bounds for length peaks

  • make_inits (lambda (length)) – Lambda that gives the initial values for length peaks

musisep.dictsep.dictlearn.mask_spectrums(spects, orig_spect)[source]

Mask the synthesized spectrograms with the original spectrogram.

Parameters
  • spects (list of array_like) – List of synthesized spectrograms

  • orig_spect (array_like) – Original spectrogram

Returns

  • spectrums (list of ndarray) – Masked spectrograms

  • mask_spect (ndarray) – Array mask
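
One common form of spectral masking is a ratio mask, sketched below. Whether mask_spectrums uses exactly this rule is an assumption. The name `mask_spectra` and the `eps` guard are hypothetical, and 1-D lists stand in for spectrogram arrays.

```python
def mask_spectra(spects, orig_spect, eps=1e-12):
    """Hypothetical ratio mask: rescale the synthesized spectra bin by bin
    so that they add up to the original spectrum."""
    # Sum of all synthesized spectra in each frequency bin.
    total = [sum(vals) for vals in zip(*spects)]
    # Mask value per bin: original magnitude divided by the synthesized sum.
    mask = [o / max(t, eps) for o, t in zip(orig_spect, total)]
    masked = [[s * w for s, w in zip(spect, mask)] for spect in spects]
    return masked, mask


masked, mask = mask_spectra([[1.0, 3.0], [1.0, 1.0]], [4.0, 2.0])
print(masked, mask)
```

With this rule the masked spectra sum to the original spectrum in every bin where the synthesized sum is nonzero, so no energy of the recording is lost or invented.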

musisep.dictsep.dictlearn.stoch_grad(y, inst_dict, tone_num, adam, fsigma, harscale, baseshift, inst_spect, pexp, qexp)[source]

Perform a dictionary training step.

Parameters
  • y (array_like) – Log-frequency spectrum to represent

  • inst_dict (ndarray) – Dictionary containing the relative amplitudes of the harmonics

  • tone_num (int) – Maximum number of simultaneous tones for each instrument

  • adam (Adam_B) – Container object for the ADAM optimizer

  • fsigma (float) – Standard deviation (frequency)

  • harscale (float) – Scaling factor

  • baseshift (int) – Length to add to the spectrum in order to avoid circular convolution

  • inst_spect (array_like) – Spectra of the instruments, in the columns

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

Returns

  • inst_dict (ndarray) – Updated dictionary

  • reconstruction (ndarray) – Synthesized spectrum

  • inst_amps (ndarray) – Summed amplitudes for each instrument

musisep.dictsep.dictlearn.synth_spect(spect, tone_num, inst_dict, fsigma, spectheight, pexp, qexp, minfreq, maxfreq, stretch=1)[source]

Separate and synthesize the spectrograms from the original spectrogram.

Parameters
  • spect (array_like) – Original log-frequency spectrogram of the recording

  • tone_num (int) – Maximum number of simultaneous tones for each instrument

  • inst_dict (ndarray) – Dictionary containing the relative amplitudes of the harmonics

  • fsigma (float) – Standard deviation (frequency)

  • spectheight (int) – Height of the linear-frequency spectrograms

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • minfreq (float) – Minimum frequency to be represented (included) (normalized to the sampling frequency)

  • maxfreq (float) – Maximum frequency to be represented (excluded) (normalized to the sampling frequency)

Returns

  • dict_spectrum (ndarray) – Synthesized log-frequency spectrogram with all instruments

  • inst_spectrums (list of ndarray) – List of synthesized log-frequency spectrograms for the instruments

  • dict_spectrum_lin (ndarray) – Synthesized linear-frequency spectrogram with all instruments

  • inst_spectrums_lin (list of ndarray) – List of synthesized linear-frequency spectrograms for the instruments

musisep.dictsep.dictlearn.test_learn(fsigma, tone_num, inst_num, pexp, qexp, har, m, runs, test_samples, lifetime, inst_dict)[source]

Evaluate the performance of the dictionary learning algorithm via artificial spectra.

Parameters
  • fsigma (float) – Width of the Gaussians in the log-frequency spectrogram

  • tone_num (int) – Maximum number of simultaneous tones for each instrument

  • inst_num (int) – Number of instruments in the dictionaries

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • har (int) – Number of harmonics

  • m (int) – Height of the log-frequency spectrogram

  • runs (int) – Number of training iterations to perform

  • test_samples (int) – Number of test spectra to generate

  • lifetime (int) – Number of steps after which to renew the dictionary

  • inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics

Returns

measures – Array containing, in that order, the SDR, SIR, SAR with the original dictionary and the SDR, SIR, SAR with the trained dictionary

Return type

ndarray

musisep.dictsep.dictlearn.test_learn_multi(fsigma, tone_num, inst_num, pexp, qexp, har, m, runs, test_samples, lifetime, num_dicts)[source]

Evaluate the performance of the dictionary learning algorithm via artificial spectra.

Parameters
  • fsigma (float) – Width of the Gaussians in the log-frequency spectrogram

  • tone_num (int) – Maximum number of simultaneous tones for each instrument

  • inst_num (int) – Number of instruments in the dictionaries

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • har (int) – Number of harmonics

  • m (int) – Height of the log-frequency spectrogram

  • runs (int) – Number of training iterations to perform

  • test_samples (int) – Number of test spectra to generate

  • lifetime (int) – Number of steps after which to renew the dictionary

  • num_dicts (int) – Number of different dictionaries to generate and train

Returns

measures – Array containing, in the rows, the SDR, SIR, SAR with the original dictionary and the SDR, SIR, SAR with the trained dictionary

Return type

ndarray

musisep.dictsep.exptool module

Back-end module for the generation of spectrograms and their gradients.

musisep.dictsep.exptool.inst_scale()
musisep.dictsep.exptool.inst_scale_grad()
musisep.dictsep.exptool.inst_shift()
musisep.dictsep.exptool.inst_shift_dict_grad()
musisep.dictsep.exptool.inst_shift_grad()

musisep.dictsep.pursuit module

Module for the sparse pursuit algorithm and its helper functions.

class musisep.dictsep.pursuit.Peaks(amps, shifts, params, insts)[source]

Bases: object

Object to represent the parameters for the peaks in the spectrogram.

Parameters
  • amps (array_like) – Amplitudes

  • shifts (array_like) – Fundamental frequencies

  • params (array_like) – Extra parameters (in the rows)

  • insts (array_like) – Instrument numbers

copy()[source]
Returns

Copy of the contained peak parameters

Return type

Peaks

classmethod empty(params)[source]

Construct an empty Peaks object.

Returns

A Peaks object with zero peaks

Return type

Peaks

classmethod from_array(array, insts, paramlen)[source]

Construct a Peaks object from an array.

Parameters
  • array (array_like) – Array that contains, in consecutive order, the amplitudes, the fundamental frequencies, the standard deviations, and the inharmonicities

  • insts (array_like) – Instrument numbers

  • paramlen (int) – Number of extra parameters

get_array()[source]
Returns

Array that contains, in consecutive order, the amplitudes, the fundamental frequencies, the standard deviations, and the inharmonicities

Return type

array_like

get_params()[source]
Returns

  • amps (ndarray) – Amplitudes

  • shifts (ndarray) – Fundamental frequencies

  • params (ndarray) – Extra parameters

  • insts (ndarray) – Instrument numbers

merge(new)[source]

Merge the Peaks object with another Peaks object contained in new by concatenating the parameters.

Parameters

new (Peaks) – Object to merge with
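
The flat layout used by from_array and get_array can be pictured with the sketch below. It assumes that each block (amplitudes, fundamental frequencies, then one block per extra parameter) has one entry per peak. The helpers `pack_peaks` and `unpack_peaks` are hypothetical and work on plain lists rather than on a Peaks object.

```python
def pack_peaks(amps, shifts, params):
    """Concatenate amplitudes, shifts, then each parameter row in turn."""
    flat = list(amps) + list(shifts)
    for row in params:
        flat.extend(row)
    return flat


def unpack_peaks(array, paramlen):
    """Inverse of pack_peaks: split a flat list into its blocks."""
    count = len(array) // (2 + paramlen)  # number of peaks
    amps = array[:count]
    shifts = array[count:2 * count]
    # Parameter row k occupies the (2 + k)-th block of `count` entries.
    params = [array[(2 + k) * count:(3 + k) * count] for k in range(paramlen)]
    return amps, shifts, params


flat = pack_peaks([1.0, 0.5], [100.0, 220.0], [[0.1, 0.2], [0.0, 0.01]])
print(flat)
print(unpack_peaks(flat, 2))
```

A flat array of this kind is convenient for optimizers that expect a single parameter vector, while the instrument numbers, being discrete, are passed separately.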

musisep.dictsep.pursuit.calc_harscale(minfreq, maxfreq, numfreqs)[source]

Calculate the scaling factor of the frequency axis for the log-frequency spectrogram.

Parameters
  • minfreq (float) – Minimum frequency to be represented (included)

  • maxfreq (float) – Maximum frequency to be represented (excluded)

  • numfreqs (int) – Intended height of the spectrogram

Returns

harscale – Scaling factor

Return type

float

musisep.dictsep.pursuit.fft_selector(y, prenum, baseshift, inst_spect, qexp)[source]

Callback selector to find fundamental frequencies based on the correlation of the spectrum with the instrument spectra.

Parameters
  • y (array_like) – Spectrum

  • prenum (int) – Number of peaks to consider

  • baseshift (int) – Length to add to the spectrum in order to avoid circular convolution

  • inst_spect (array_like) – Spectra of the instruments, in the columns

  • qexp (float) – Exponent to be applied on the spectrum

Returns

  • amps (array_like) – Amplitudes

  • shifts (array_like) – Fundamental frequencies

  • insts (array_like) – Instrument numbers

musisep.dictsep.pursuit.gen_inst_spect(baseshift, fsigma, fixed_params, pexp, qexp, m, n)[source]

Generate an instrument log-frequency spectrum.

Parameters
  • baseshift (int) – Length to add to the spectrum in order to avoid circular convolution

  • fsigma (float) – Standard deviation (frequency)

  • fixed_params (sequence) – Extra fixed parameters for the synthesizer

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • m (int) – Height of the spectrogram

  • n (int) – Number of patterns/instruments

Returns

inst_spect – Spectra of the instruments, in the columns

Return type

ndarray

musisep.dictsep.pursuit.inst_scale(peaks, inst_dict, pexp, m)[source]

Synthesize the linear-frequency spectrum.

Parameters
  • peaks (Peaks) – Peak parameters

  • inst_dict (array_like) – Dictionary containing the relative amplitudes of the harmonics

  • pexp (float) – Exponent for the addition of sinusoids

  • m (int) – Height of the spectrogram

Returns

Linear-frequency spectrum

Return type

ndarray

musisep.dictsep.pursuit.inst_shift(peaks, fixed_params, pexp, m)[source]

Synthesize the log-frequency spectrum.

Parameters
  • peaks (Peaks) – Peak parameters

  • fixed_params (sequence) – Extra fixed parameters for the synthesizer

  • pexp (float) – Exponent for the addition of sinusoids

  • m (int) – Height of the spectrogram

Returns

Log-frequency spectrum

Return type

ndarray

musisep.dictsep.pursuit.inst_shift_dict_grad(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]

Least-squares gradient function for the log-frequency spectrum w.r.t. the dictionary.

Parameters
  • peak_array (array_like) – Peak parameters in array form

  • insts (array_like) – Instrument numbers

  • fixed_params (sequence) – Extra fixed parameters for the synthesizer

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • m (int) – Height of the spectrogram

  • y (array_like) – Spectrum to compare with

Returns

grad – Least-squares gradient w.r.t. the dictionary

Return type

ndarray

musisep.dictsep.pursuit.inst_shift_grad(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]

Least-squares gradient function for the log-frequency spectrum w.r.t. the parameters.

Parameters
  • peak_array (array_like) – Peak parameters in array form

  • insts (array_like) – Instrument numbers

  • fixed_params (sequence) – Extra fixed parameters for the synthesizer

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • m (int) – Height of the spectrogram

  • y (array_like) – Spectrum to compare with

Returns

grad – Least-squares gradient

Return type

ndarray

musisep.dictsep.pursuit.inst_shift_obj(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]

Least-squares objective function for the log-frequency spectrum.

Parameters
  • peak_array (array_like) – Peak parameters in array form

  • insts (array_like) – Instrument numbers

  • fixed_params (sequence) – Extra fixed parameters for the synthesizer

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • m (int) – Height of the spectrogram

  • y (array_like) – Spectrum to compare with

Returns

obj – Least-squares error

Return type

float

musisep.dictsep.pursuit.max_selector(y, prenum, n)[source]

Callback selector to find peaks based on the local maxima which are dominant in a discrete interval, viewed from its midpoint.

Parameters
  • y (array_like) – Spectrum

  • prenum (int) – Number of peaks to consider

  • n (int) – Length of the interval

Returns

  • amps (array_like) – Amplitudes

  • shifts (array_like) – Frequencies

  • insts (array_like) – Instrument numbers (always 0)
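
The selection rule can be sketched as follows, under the assumption that a bin counts as a dominant local maximum when it holds the largest value in a window of about n bins centred on it. The name `dominant_maxima`, the handling of ties, and the exclusion of non-positive bins are choices made for this sketch only.

```python
def dominant_maxima(y, prenum, n):
    """Hypothetical sketch: pick up to `prenum` window-dominant maxima of y."""
    half = n // 2
    candidates = []
    for i, value in enumerate(y):
        window = y[max(0, i - half): i + half + 1]
        # Keep bin i only if it is positive and the largest value
        # in the window centred on it.
        if value > 0 and value == max(window):
            candidates.append((value, i))
    # Strongest peaks first, limited to `prenum` entries.
    candidates.sort(reverse=True)
    chosen = candidates[:prenum]
    amps = [v for v, _ in chosen]
    shifts = [i for _, i in chosen]
    insts = [0] * len(chosen)  # single pattern, so always instrument 0
    return amps, shifts, insts


print(dominant_maxima([0, 1, 0, 0, 3, 2, 0, 0.5, 0], prenum=2, n=3))
```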

musisep.dictsep.pursuit.peak_pursuit(y, nums, prenum, runs, n, inst_shift, inst_shift_obj, inst_shift_grad, make_bounds, make_inits, fixed_params, selector, selector_args, pexp, qexp, beta=1, init=None)[source]

Sparse pursuit algorithm for the identification of peaks in a spectrum.

Parameters
  • y (array_like) – Spectrum

  • nums (int) – Maximum number of peaks

  • prenum (int) – Number of new peaks to consider per iteration

  • runs (int) – Maximum number of training iterations

  • n (int) – Number of patterns/instruments

  • inst_shift (callable (peaks, fixed_params, pexp, m, n)) – Synthesizing function

  • inst_shift_obj (callable (peak_array, insts, fixed_params, pexp, qexp, m, n, y)) – Synthesizing function objective

  • inst_shift_grad (callable (peak_array, insts, fixed_params, pexp, qexp, m, n, y)) – Synthesizing function gradient

  • make_bounds (lambda (length)) – Lambda that gives the bounds for length peaks

  • make_inits (lambda (length)) – Lambda that gives the initial values for length peaks

  • fixed_params (sequence) – Extra fixed parameters for the synthesizer

  • selector (function) – Callback selector accepting y and prenum as arguments

  • selector_args (sequence) – Extra arguments to pass to the selector

  • pexp (float) – Exponent for the addition of sinusoids

  • qexp (float) – Exponent to be applied on the spectrum

  • beta (float) – Residual reduction factor

  • init (Peaks) – Initial value for the peaks

Returns

  • peaks (Peaks) – Identified peaks

  • reconstruction (ndarray) – Synthesized spectrum

musisep.dictsep.pursuit.test_pattern(peaks, fixed_params, pexp, m)[source]

Evaluate a test pattern.

Parameters
  • peaks (Peaks) – Continuous parameters for the peaks

  • fixed_params (sequence) – Extra fixed parameters

  • pexp (float) – (ignored)

  • m (int) – (ignored)

Returns

y – Sampled test pattern

Return type

ndarray

musisep.dictsep.pursuit.test_pattern_comp(x, amps, shifts, sigmas)[source]

Evaluate a test pattern.

Parameters
  • x (array_like) – Positions to evaluate the pattern

  • amps (array_like) – Amplitudes of the peaks

  • shifts (array_like) – Positions of the peaks on the axis

  • sigmas (array_like) – Standard deviations of the peaks

Returns

y – Evaluation of the test pattern

Return type

ndarray
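
Since the peaks are described by amplitudes, positions, and standard deviations, a natural reading is a sum of Gaussian peaks. The sketch below assumes unnormalized Gaussians, which is not confirmed here. The name `gaussian_pattern` is hypothetical.

```python
import math


def gaussian_pattern(x, amps, shifts, sigmas):
    """Hypothetical sketch: sum of Gaussian peaks evaluated at positions x."""
    return [
        # Each peak contributes amp * exp(-(x - shift)^2 / (2 * sigma^2)).
        sum(a * math.exp(-((xi - s) ** 2) / (2.0 * sig ** 2))
            for a, s, sig in zip(amps, shifts, sigmas))
        for xi in x
    ]


print(gaussian_pattern([0.0, 1.0, 2.0], amps=[2.0], shifts=[0.0], sigmas=[1.0]))
```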

musisep.dictsep.pursuit.test_pattern_gen(seed, scaling)[source]

Generate the parameters for a random pattern.

Parameters
  • seed (int) – Random seed

  • scaling (float) – Scaling of the axis

Returns

  • amps (ndarray) – Amplitudes of the peaks

  • shifts (ndarray) – Positions of the peaks on the axis

  • sigmas (ndarray) – Standard deviations of the peaks

musisep.dictsep.pursuit.test_pattern_grad(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]

Gradient for a test pattern.

Parameters
  • peak_array (array_like) – Array of all the continuous parameters

  • insts (array_like) – Number of the instruments/patterns

  • fixed_params (sequence) – Extra fixed parameters

  • pexp (float) – (ignored)

  • qexp (float) – (ignored)

  • m (int) – (ignored)

  • y (array_like) – Evaluation of the test pattern

Returns

grad – Gradient for the test pattern

Return type

ndarray

musisep.dictsep.pursuit.test_pattern_grad_helper(x, r, amps, shifts, pat_amps, pat_shifts, pat_sigmas)[source]

Helper function for the computation of the gradient of the test pattern.

Parameters
  • x (array_like) – Positions where pattern was evaluated

  • r (array_like) – Residual of the pattern

  • amps (array_like) – Amplitudes of the patterns

  • shifts (array_like) – Shifts of the peaks

  • pat_amps (array_like) – Amplitudes of the peaks for each pattern

  • pat_shifts (array_like) – Positions of the peaks on the axis for each pattern

  • pat_sigmas (array_like) – Standard deviations of the peaks for each pattern

Returns

grad – Gradient for the test pattern

Return type

ndarray

musisep.dictsep.pursuit.test_pattern_obj(peak_array, insts, fixed_params, pexp, qexp, m, y)[source]

Loss objective for a test pattern.

Parameters
  • peak_array (array_like) – Array of all the continuous parameters

  • insts (array_like) – Number of the instruments/patterns

  • fixed_params (sequence) – Extra fixed parameters

  • pexp (float) – (ignored)

  • qexp (float) – (ignored)

  • m (int) – (ignored)

  • y (array_like) – Evaluation of the test pattern

Returns

obj – Least-squares error

Return type

float

musisep.dictsep.pursuit.test_pursuit()[source]

Testing the pursuit algorithm on a generic example.

Module contents