musisep.neuralsep package

Submodules

musisep.neuralsep.__main__ module

Wrapper for the policy gradient separation algorithm. When invoked, the audio sources in the supplied audio file are separated.

musisep.neuralsep.__main__.separate_duan_acous(seed)[source]

Separate the piece from Duan et al. with euphonium and oboe.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_duan_synth2(seed)[source]

Separate the piece from Duan et al. with piccolo and organ.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_duan_synth3(seed)[source]

Separate the piece from Duan et al. with piccolo, organ, and oboe.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_mozart(seed)[source]

Separate the piece by Mozart for recorder and violin.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_mozart_cl(seed)[source]

Separate the piece by Mozart for clarinet and piano.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_mozart_piano(seed)[source]

Separate the piece by Mozart for clarinet and piano.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_train(seed_np, seed_tf, name, mixed_soundfile, orig_soundfiles, loss_coeffs=(0, 10, 1, 10), har_num=25, num_guesses=(3, 3), spl=0.9, batch_size=12, batch_size_pred=100, virt_batch_mul=1, stepsize_net=0.001, stepsize_dict=0.0001, tau=0.01, max_iter=100000, eval_interval=2500, sampdist=128, sub_factor=4, sigmas_an=6, load_dir=None, plot_color=False, save_points=(70000), init_dict=None)[source]

Separate a music recording into the contribution of the individual instruments.

Parameters
  • seed_np (int) – Random seed for NumPy

  • seed_tf (int) – Random seed for TensorFlow

  • name (string) – Name of the training run. Used for file names and logging.

  • mixed_soundfile (string) – Name of the sound file containing the mixture

  • orig_soundfiles (sequence of string) – Name of the sound files containing the individual instrument tracks

  • loss_coeffs (sequence of float) – Weights of the dictionary prediction loss, the sparse loss, the regularization loss, and the direct prediction loss

  • har_num (int) – Number of harmonics to identify

  • num_guesses (sequence of int) – Number of samples per tone

  • spl (float) – Discount factor for the sparsity

  • batch_size (int) – Batch size for training

  • batch_size_pred (int) – Batch size for prediction

  • virt_batch_mul (int) – Virtual batch multiplier

  • stepsize_net (float) – Learning rate for training the neural network

  • stepsize_dict (float) – Learning rate for training the dictionary

  • tau (float) – Exponent to control exploration

  • max_iter (int) – Total number of training iterations

  • eval_interval (int) – Interval at which to evaluate the entire spectrogram

  • sampdist (int) – Time interval of the spectrogram

  • sub_factor (int) – Factor by which to subsample the spectrogram for resynthesis

  • sigmas_an (float) – Number of standard deviations at which the analysis window is cut

  • load_dir (string) – Path from where to preload the model and the dictionary

  • plot_color (string or NoneType) – Whether to make a color plot

  • save_points (sequence of int) – Iterations at which to save the output

  • init_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
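
A note on the documented default save_points=(70000): in Python, parentheses without a trailing comma do not create a tuple, so this default is the integer 70000, whereas the parameter is described as a sequence of int. How separate_train handles a bare integer is not stated here. A caller who wants to pass an unambiguous sequence could normalize the value first. The helper name as_save_points below is hypothetical and not part of musisep.

```python
def as_save_points(value):
    """Return save points as a tuple of ints (hypothetical helper, not part of musisep)."""
    if isinstance(value, int):
        # A bare integer becomes a one-element tuple.
        return (value,)
    return tuple(int(v) for v in value)


default = (70000)    # same literal as in the documented signature: an int
explicit = (70000,)  # a one-element tuple

save_points = as_save_points(default)
```

The resulting tuple can then be passed as the save_points argument.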

musisep.neuralsep.__main__.separate_urmp_03(seed)[source]

Separate the piece from URMP with flute and clarinet.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_03_cl(seed)[source]

Represent the clarinet track from URMP (oracle).

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_03_fl(seed)[source]

Represent the flute track from URMP (oracle).

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_03_init(seed)[source]

Separate the piece from URMP with flute and clarinet with an initial oracle dictionary.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_03_oracle(seed)[source]

Separate the piece from URMP with flute and clarinet with a fixed oracle dictionary.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_09(seed)[source]

Separate the piece from URMP with trumpet and violin.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_10(seed)[source]

Separate the piece from URMP with trumpet and saxophone.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11(seed)[source]

Separate the piece from URMP with oboe and violoncello.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11_init(seed)[source]

Separate the piece from URMP with oboe and violoncello with an initial oracle dictionary.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11_ob(seed)[source]

Represent the oboe track from URMP (oracle).

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11_oracle(seed)[source]

Separate the piece from URMP with oboe and violoncello with a fixed oracle dictionary.

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11_vc(seed)[source]

Represent the violoncello track from URMP (oracle).

Parameters

seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.trainsep module

All the training mechanisms for blind separation via neural networks.

class musisep.neuralsep.trainsep.ParamsDict(batch_size, num_guesses_prod)[source]

Bases: object

Container object for the model parameters.

Parameters
  • batch_size (int) – Batch size of all the data structures

  • num_guesses_prod (int) – Total number of all the samples per spectrum

class musisep.neuralsep.trainsep.SpectLoss(batch_size, num_guesses_prod, inst_num, spectheight)[source]

Bases: object

Container object for the spectra and losses for the individual tones of the instruments.

Parameters
  • batch_size (int) – Batch size of all the data structures

  • num_guesses_prod (int) – Total number of all the samples per spectrum

  • inst_num (int) – Number of instruments in the sample

  • spectheight (int) – Size of the input/output spectrum

add_tone(params, har_coeffs, on_factors, insts, har_spect, inst_dict, orig_spect)[source]

Add the results of a new tone to the object

Parameters
  • params (tensor of float) – Instrument parameters for the tone

  • har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics

  • on_factors (tensor of int) – Binary indicator if a tone contributes to the sparse prediction

  • insts (tensor of int) – Indices of the instruments playing the tones

  • har_spect (tensor of float) – Spectra of the individual harmonics (without amplitudes)

  • inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]

  • orig_spect (tensor of float) – Original input sampled spectrum

class musisep.neuralsep.trainsep.SpectvisDict(spectheight)[source]

Bases: object

Container object to visualize spectra.

Parameters

spectheight (int) – Size of the spectrum

class musisep.neuralsep.trainsep.Trainer(name, mixed_soundfile, orig_soundfiles, loss_coeffs, har_num, num_guesses, spl, batch_size, batch_size_pred, virt_batch_mul, stepsize_net, stepsize_dict, tau, sampdist, sub_factor, sigmas_an, plot_color, save_points, init_dict)[source]

Bases: object

Object containing all the data necessary for the training.

Parameters
  • name (string) – Name of the training run. Used for file names and logging.

  • mixed_soundfile (string) – Name of the sound file containing the mixture

  • orig_soundfiles (sequence of string) – Name of the sound files containing the individual instrument tracks

  • loss_coeffs (sequence of float) – Weights of the dictionary prediction loss, the sparse loss, the regularization loss, and the direct prediction loss

  • har_num (int) – Number of harmonics to identify

  • num_guesses (sequence of int) – Number of samples per tone

  • spl (float) – Discount factor for the sparsity

  • batch_size (int) – Batch size for training

  • batch_size_pred (int) – Batch size for prediction

  • virt_batch_mul (int) – Virtual batch multiplier

  • stepsize_net (float) – Learning rate for training the neural network

  • stepsize_dict (float) – Learning rate for training the dictionary

  • tau (float) – Exponent to control exploration

  • sampdist (int) – Time interval of the spectrogram

  • sub_factor (int) – Factor by which to subsample the spectrogram for resynthesis

  • sigmas_an (float) – Number of standard deviations at which the analysis window is cut

  • plot_color (string or NoneType) – Whether to make a color plot

  • save_points (sequence of int) – Iterations at which to save the output

  • init_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]

add_gradient_dict(gradient_dict)[source]

Add gradients related to the dictionary.

Parameters

gradient_dict (tensor of float) – Gradient with respect to the dictionary

add_gradient_scales(gradient_scales)[source]

Add gradients related to the output scalings.

Parameters

gradient_scales (tensor of float) – Gradient with respect to the output scalings

add_gradients_model_cnn(gradients_model_cnn)[source]

Add gradients related to the CNN model

Parameters

gradients_model_cnn (sequence of tensor of float) – Gradients with respect to the model

apply_gradients()[source]

Apply all gradients to the optimization algorithm.
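
The add_gradient_* methods together with apply_gradients suggest an accumulate-then-apply pattern, which would also fit the virt_batch_mul parameter. The following plain-Python sketch illustrates that general pattern only. The class name GradientAccumulator, the averaging over accumulated batches, and the plain gradient-descent update are assumptions made for illustration; they are not taken from Trainer.

```python
class GradientAccumulator:
    """Accumulate gradients over several batches, then apply one averaged step.

    Illustrative sketch only; not the Trainer implementation.
    """

    def __init__(self, params, stepsize):
        self.params = list(params)
        self.stepsize = stepsize
        self.total = [0.0] * len(self.params)
        self.count = 0

    def add_gradient(self, gradient):
        # Sum the incoming gradient into the running total.
        self.total = [t + g for t, g in zip(self.total, gradient)]
        self.count += 1

    def apply_gradients(self):
        # Apply the mean of the accumulated gradients and reset the buffers.
        if self.count == 0:
            return
        self.params = [p - self.stepsize * t / self.count
                       for p, t in zip(self.params, self.total)]
        self.total = [0.0] * len(self.params)
        self.count = 0


acc = GradientAccumulator(params=[1.0, 2.0], stepsize=0.5)
acc.add_gradient([1.0, 0.0])
acc.add_gradient([3.0, 2.0])
acc.apply_gradients()
```

Accumulating two batches before one update behaves like a single batch of twice the size, at the memory cost of one batch.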

load(module)[source]

Load dictionary, output scalings, and the model from a module.

Parameters

module (tf.Module) – Module with the saved data

make_model(input_sizes)[source]

Construct a Keras model for the parameter prediction

Parameters

input_sizes (int) – Total number of input channels for the network

Returns

The neural network as a Keras model

predict_loop(k, write)[source]

Predict and resynthesize the entire spectrogram.

Parameters
  • k (int) – Iteration number

  • write (bool) – Whether to save the output to files

predict_mix_spect(mix_spect_in)[source]

Predict the separation of a mixture spectrum and compute the losses.

Parameters

mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on

Returns

  • spects (SimpleNamespace) – Spectra related to the mixture

  • losses (SimpleNamespace) – Losses related to the mixture

separate(mix_spect_in, tau, batch_size, predict=False)[source]

Identify all the parameters for the tones in the spectrum.

Parameters
  • mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on

  • tau (float) – Exponent to control exploration

  • batch_size (int) – Batch size for training

  • predict (bool) – Whether to go into prediction mode instead of training

Returns

  • spects (SimpleNamespace) – Spectra related to the mixture

  • losses (SimpleNamespace) – Losses related to the mixture

  • params_d (ParamsDict) – Parameters related to the mixture

  • spectvis_d (SpectvisDict) – Visualization spectra related to the mixture

separate_inst(model, model_in, training, batch_factor, fan_factor, num_guesses, inst_mask, tau, batch_size)[source]

Identify the parameters for one tone in a given spectrum

Parameters
  • model (tf.keras.Model) – Definition of the neural network

  • model_in (tensor of float) – Input channels that the neural network receives

  • training (bool) – Whether to perform training

  • batch_factor (int) – Product of the number of samples for previous tones

  • fan_factor (int) – Product of the number of samples for current and future tones

  • num_guesses (int) – Number of samples for the current tone

  • inst_mask (tensor of int) – 1 for instruments that have already played a tone, 0 otherwise

  • tau (float) – Exponent to control exploration

  • batch_size (int) – Batch size for training

Returns

  • params_tone (SimpleNamespace) – All parameters relating to an identified tone

  • params_spect (SimpleNamespace) – Unsampled parameters in the dimensionality of the spectrum

  • har_spects_raw (tensor of float) – Spectra of the harmonics

train_dict_norm(writer, k)[source]

Train the dictionary such that the largest entry for each instrument gets to 1.

Parameters
  • writer (SummaryWriter) – Writer object to capture the summarized variables

  • k (int) – Iteration number

Returns

inst_dict_norm – Dictionary norm loss

Return type

tensor of float
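
One way to express "the largest entry for each instrument gets to 1" as a trainable penalty is a squared deviation of each row maximum from 1. The sketch below assumes exactly that form and uses nested lists in place of tensors. The function name dict_norm_loss is hypothetical, and the loss actually used by train_dict_norm is not documented here.

```python
def dict_norm_loss(inst_dict):
    """Sum over instruments of (largest dictionary entry - 1)^2.

    inst_dict is a list of rows, one row of harmonic amplitudes per instrument.
    Assumed loss form for illustration only.
    """
    return sum((max(row) - 1.0) ** 2 for row in inst_dict)


inst_dict = [[0.5, 1.0, 0.25],   # already normalized: contributes 0
             [2.0, 0.5, 0.1]]    # largest entry 2.0: contributes 1
loss = dict_norm_loss(inst_dict)
```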

train_loop(max_iter, eval_interval, interval=50)[source]

Train the neural network. Predict and resynthesize the entire spectrogram.

Parameters
  • max_iter (int) – Total number of training iterations

  • eval_interval (int) – Interval at which to evaluate the entire spectrogram

  • interval (int) – Interval at which to output debug information

train_mix_spect(mix_spect_in, writer, k)[source]

Train the separation of a mixture spectrum and compute the losses.

Parameters
  • mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on

  • writer (SummaryWriter) – Writer object to capture the summarized variables

  • k (int) – Iteration number

Returns

  • spects (SimpleNamespace) – Spectra related to the mixture

  • losses (SimpleNamespace) – Losses related to the mixture

  • params (ParamsDict) – Parameters related to the mixture

  • spectvis (SpectvisDict) – Visualization spectra related to the mixture

musisep.neuralsep.trainsep.add_linspace(x)[source]

Add a linear range layer to a CNN tensor

Parameters

x (tensor of float) – Layer to add the linear range to

Returns

model_in – Input layer with the linear range added

Return type

tensor of float
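
A linear range layer can be pictured as an extra channel whose values depend only on the position along the frequency axis, which gives a convolutional network access to absolute position. The sketch below uses nested lists instead of tensors and assumes a ramp from 0 to 1; the value range used by add_linspace is not documented here, and the name with_linear_range is hypothetical.

```python
def with_linear_range(channels):
    """Append a channel that rises linearly from 0 to 1 along the spectrum.

    channels is a list of equally long lists, one list per channel.
    The 0..1 range is an assumption for illustration.
    """
    n = len(channels[0])
    ramp = [i / (n - 1) for i in range(n)] if n > 1 else [0.0]
    return channels + [ramp]


out = with_linear_range([[5.0, 6.0, 7.0]])
```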

musisep.neuralsep.trainsep.comp_total_loss(losses, on_factors, spl, loss_coeffs)[source]

Compute the linear combination of losses

Parameters
  • losses (SimpleNamespace) – The individual loss values

  • on_factors (tensor of int) – Binary indicator if a tone contributes to the sparse prediction

  • spl (float) – Discount factor for the sparsity

  • loss_coeffs (array_like) – Linear weights for the loss terms

Returns

  • total_loss (tensor of float) – Computed total loss

  • discounts (tensor of float) – Sparsity discount factors
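
The linear combination itself can be sketched as a weighted sum, with loss_coeffs ordered as documented for separate_train (dictionary prediction, sparse, regularization, direct prediction). The attribute names on the namespace are hypothetical, and the sketch deliberately leaves out the sparsity discount derived from on_factors and spl, since its formula is not given here.

```python
from types import SimpleNamespace


def weighted_total(losses, loss_coeffs):
    """Weighted sum of four loss terms; attribute names are hypothetical."""
    terms = (losses.dict_pred, losses.sparse, losses.reg, losses.direct)
    return sum(c * t for c, t in zip(loss_coeffs, terms))


losses = SimpleNamespace(dict_pred=0.5, sparse=0.2, reg=0.1, direct=0.3)
# Default weights from the separate_train signature.
total = weighted_total(losses, (0, 10, 1, 10))
```

With the default weights, the dictionary prediction loss has coefficient 0 and does not contribute to the total.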

musisep.neuralsep.trainsep.complex_abs(spect, axis)[source]

Compute the absolute value of a complex tensor.

Parameters
  • spect (tensor of float) – Real-valued tensor with a complex axis

  • axis (int) – Complex axis

Returns

Absolute value of the input tensor (with complex axis shrunken to size 1)

musisep.neuralsep.trainsep.complex_arg(spect, axis, bias=1e-20)[source]

Compute the argument of a complex tensor.

Parameters
  • spect (tensor of float) – Real-valued tensor with a complex axis

  • axis (int) – Complex axis

  • bias (float) – Offset to avoid division by 0

Returns

Tensor normalized to an absolute value of 1
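
Both complex_abs and complex_arg operate on real-valued data in which one axis holds the real and imaginary parts. For a single [re, im] pair the two operations can be sketched as follows. The function names pair_abs and pair_arg are hypothetical, and adding bias to the modulus before dividing is an assumption about where the offset enters.

```python
import math


def pair_abs(pair):
    """Modulus of a complex number stored as [re, im]."""
    re, im = pair
    return math.hypot(re, im)


def pair_arg(pair, bias=1e-20):
    """Scale [re, im] to (almost) unit modulus; bias avoids division by zero."""
    re, im = pair
    norm = math.hypot(re, im) + bias
    return [re / norm, im / norm]


unit = pair_arg([3.0, 4.0])   # direction of 3 + 4i
zero = pair_arg([0.0, 0.0])   # stays at the origin instead of raising an error
```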

musisep.neuralsep.trainsep.gamma_probs(spreads, spreads_a, spreads_b)[source]

Evaluate the gamma distribution

Parameters
  • spreads (tensor of float) – Values where to evaluate

  • spreads_a (tensor of float) – “alpha” parameter of the distribution

  • spreads_b (tensor of float) – “beta” parameter of the distribution

Returns

Log probabilities
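
For reference, the log density of a gamma distribution can be written out directly. The sketch assumes the shape/rate parametrization, in which "beta" is a rate; whether gamma_probs uses a rate or a scale for spreads_b is not stated here. The name gamma_log_pdf is hypothetical.

```python
import math


def gamma_log_pdf(x, alpha, beta):
    """Log density of Gamma(shape=alpha, rate=beta) at x > 0."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(x) - beta * x)


# With alpha = 1 the gamma distribution reduces to an exponential distribution.
value = gamma_log_pdf(0.5, 1.0, 2.0)
```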

musisep.neuralsep.trainsep.gauss(x, mean, stdev)[source]

Evaluate the Gaussian function.

Parameters
  • x (tensor of float) – Points of evaluation

  • mean (tensor of float) – Mean value(s)

  • stdev (tensor of float) – Standard deviation(s)

Returns

spect – Values of the Gaussian

Return type

tensor of float
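
A scalar sketch of a Gaussian evaluation is given below. It assumes the unnormalized form with peak value 1; whether gauss includes the normalization factor 1/(stdev·sqrt(2π)) is not stated here. The name gauss_unnormalized is hypothetical.

```python
import math


def gauss_unnormalized(x, mean, stdev):
    """exp(-(x - mean)^2 / (2 stdev^2)), with peak value 1 at x == mean."""
    return math.exp(-0.5 * ((x - mean) / stdev) ** 2)


peak = gauss_unnormalized(1.0, 1.0, 2.0)
one_sigma = gauss_unnormalized(3.0, 1.0, 2.0)   # one standard deviation away
```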

musisep.neuralsep.trainsep.inst_scale(params, insts, inst_dict, har_coeffs, spectheight, sigmas_an)[source]

Evaluate the linear-frequency spectra for tones.

Parameters
  • params (tensor of float) – Continuous parameters for the tones, stacked along axis 2

  • insts (int) – Indices of the instruments playing the tones

  • inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]

  • har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics

  • spectheight (int) – Size of the output spectrum

  • sigmas_an (float) – Number of standard deviations at which the analysis window is cut

Returns

spect – Spectra of the tones

Return type

tensor of float

musisep.neuralsep.trainsep.inst_scale_post(amps, insts, inst_dict, inst_num, har_coeffs, har_spect)[source]

Combine the spectra of individual harmonics into tone spectra.

Parameters
  • amps (tensor of float) – Amplitudes of the tones

  • insts (int) – Indices of the instruments playing the tones

  • inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]

  • inst_num (int) – Number of instruments available

  • har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics

  • har_spect (tensor of float) – Spectra of the individual harmonics (without amplitudes)

Returns

spect – Spectra of the tones

Return type

tensor of float
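
Combining harmonics into a tone can be pictured as a weighted sum over the harmonic spectra. The sketch below assumes that each harmonic is weighted by the dictionary entry of the playing instrument and by its relative complex coefficient, and that the result is scaled by the tone amplitude. This formula and the name tone_spectrum are assumptions for illustration; the exact computation in inst_scale_post is not documented here.

```python
def tone_spectrum(amp, inst, inst_dict, har_coeffs, har_spect):
    """amp * sum_h inst_dict[inst][h] * har_coeffs[h] * har_spect[h][k] per bin k.

    Assumed combination rule; Python complex numbers replace the complex axis.
    """
    bins = len(har_spect[0])
    return [amp * sum(inst_dict[inst][h] * har_coeffs[h] * har_spect[h][k]
                      for h in range(len(har_spect)))
            for k in range(bins)]


inst_dict = [[1.0, 0.5], [1.0, 0.25]]            # [instruments, harmonics]
har_coeffs = [1 + 0j, 1j]                        # relative complex amplitudes
har_spect = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]   # one spectrum per harmonic
spect = tone_spectrum(2.0, 0, inst_dict, har_coeffs, har_spect)
```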

musisep.neuralsep.trainsep.inst_scale_raw(params, har_num, spectheight, sigmas_an)[source]

Evaluate the linear-frequency spectra for the harmonics of tones, disregarding the amplitudes.

Parameters
  • params (tensor of float) – Continuous parameters for the tones, stacked along axis 2

  • har_num (int) – Number of harmonics to evaluate

  • spectheight (int) – Size of the output spectrum

  • sigmas_an (float) – Number of standard deviations at which the analysis window is cut

Returns

spect – Spectra of the harmonics

Return type

tensor of float

musisep.neuralsep.trainsep.lift_cmplx(x, axis, shift=1e-07, qexp=0.5)[source]

Lift a complex-valued spectrum via a concave power function.

Parameters
  • x (tensor of float) – Spectrum

  • axis (int) – Complex axis

  • shift (float) – Additive constant to keep the transform differentiable

  • qexp (float) – Exponent of the power function

Returns

The lifted spectrum

musisep.neuralsep.trainsep.lift_spect(x, shift=1e-07, qexp=0.5)[source]

Lift a positive-valued spectrum via a concave power function.

Parameters
  • x (tensor of float) – Spectrum

  • shift (float) – Additive constant to keep the transform differentiable

  • qexp (float) – Exponent of the power function

Returns

The lifted spectrum

musisep.neuralsep.trainsep.lift_spect_sign(x, shift=1e-07, qexp=0.5)[source]

Lift a real-valued spectrum via a concave power function.

Parameters
  • x (tensor of float) – Spectrum

  • shift (float) – Additive constant to keep the transform differentiable

  • qexp (float) – Exponent of the power function

Returns

The lifted spectrum
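
A concave power lifting compresses large magnitudes relative to small ones. The scalar sketch below assumes the form (x + shift)^qexp − shift^qexp, which maps 0 to 0 and has a finite slope at 0 because of the shift. Subtracting shift^qexp is an assumption, as is the odd extension used for signed values; the names lift and lift_signed are hypothetical.

```python
import math


def lift(x, shift=1e-7, qexp=0.5):
    """Lift a non-negative value: (x + shift)^qexp - shift^qexp (assumed form)."""
    return (x + shift) ** qexp - shift ** qexp


def lift_signed(x, shift=1e-7, qexp=0.5):
    """Lift the magnitude of a real value and keep its sign."""
    return math.copysign(lift(abs(x), shift, qexp), x)


at_zero = lift(0.0)
positive = lift(4.0)
negative = lift_signed(-4.0)
```

With qexp=0.5 the lifting is close to a square root for values well above the shift.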

musisep.neuralsep.trainsep.lifted_l2_abs(x, y, axis)[source]

Radially symmetric lifted l2 distance between two spectra.

Parameters
  • x (tensor of float) – First spectrum

  • y (tensor of float) – Second spectrum

  • axis (int) – Complex axis

Returns

l2 loss

musisep.neuralsep.trainsep.lifted_l2_cmplx(x, y, axis)[source]

Radially symmetric lifted l2 distance between two spectra.

Parameters
  • x (tensor of float) – First spectrum

  • y (tensor of float) – Second spectrum

  • axis (int) – Complex axis

Returns

l2 loss

musisep.neuralsep.trainsep.lsq_stock(har_spects, samp)[source]

Solve a regularized least-squares system.

Parameters
  • har_spects (tensor of float) – Spectra of the individual harmonics (without amplitudes)

  • samp (tensor of float) – Direct prediction

Returns

Phase values for the harmonics

musisep.neuralsep.trainsep.mix_inst_spects(inst_spects, axis)[source]

Combine the spectra for multiple instruments, dropping the summation axis.

Parameters
  • inst_spects (tensor of float) – Spectra for the individual instruments

  • axis (int) – Summation axis

Returns

Mixture spectrum
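
For a single spectrum, mixing amounts to a bin-wise sum over the instrument axis, after which that axis no longer exists. A minimal sketch with nested lists, assuming plain summation; the name mix_spectra is hypothetical.

```python
def mix_spectra(inst_spects):
    """Sum per-instrument spectra bin by bin, dropping the instrument axis."""
    return [sum(bins) for bins in zip(*inst_spects)]


mixture = mix_spectra([[1.0, 2.0, 0.0],    # instrument 0
                       [0.5, 0.5, 3.0]])   # instrument 1
```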

musisep.neuralsep.trainsep.norm_pdf(pdf)[source]

Normalize a categorical distribution batch-wise via softmax.

Parameters

pdf (tensor of float) – Log probabilities of [batch, insts, scales]

Returns

Normalized log probabilities
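
Normalizing log probabilities via softmax is usually done with the log-sum-exp trick for numerical stability. The sketch below handles one batch item and assumes that the normalization runs jointly over instruments and scales; the name normalize_log_probs is hypothetical.

```python
import math


def normalize_log_probs(log_probs):
    """Shift a [insts][scales] table of log weights so that exp() sums to 1."""
    flat = [v for row in log_probs for v in row]
    peak = max(flat)  # subtract the maximum before exponentiating
    log_total = peak + math.log(sum(math.exp(v - peak) for v in flat))
    return [[v - log_total for v in row] for row in log_probs]


normalized = normalize_log_probs([[0.0, 1.0], [2.0, -1.0]])
total = sum(math.exp(v) for row in normalized for v in row)
```

Differences between entries are unchanged by the normalization; only a common offset is removed.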

musisep.neuralsep.trainsep.plot_spectrum(filename, spectheight, *spects)[source]

Plot real-valued spectra to a file

Parameters
  • filename (string) – Name of the file to save the figure to

  • spectheight (int) – Size of the output spectra

  • spects (sequence of array_like of float) – Spectra to plot

musisep.neuralsep.trainsep.sample_multi(pdf)[source]

Sample batch-wise from a categorical distribution.

Parameters

pdf (tensor of float) – Log probabilities of [batch, insts, scales]

Returns

  • insts (tensor of int) – Indices of the sampled instruments

  • scales (tensor of int) – Discrete sampled frequencies
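
Sampling an (instrument, scale) pair from a joint categorical distribution can be sketched by flattening the table, drawing one index, and converting it back to two indices. The sketch handles one batch item with the standard library; the name sample_inst_scale is hypothetical and the flattening order is an assumption.

```python
import math
import random


def sample_inst_scale(log_probs, rng):
    """Draw (inst, scale) from a [insts][scales] table of log probabilities."""
    num_scales = len(log_probs[0])
    weights = [math.exp(v) for row in log_probs for v in row]
    flat_index = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    # Row-major flattening: index = inst * num_scales + scale.
    return divmod(flat_index, num_scales)


rng = random.Random(0)
# All probability mass sits on instrument 1, scale 1.
table = [[-math.inf, -math.inf, -math.inf],
         [-math.inf, 0.0, -math.inf]]
inst, scale = sample_inst_scale(table, rng)
```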

musisep.neuralsep.trainsep.sample_multi_max(pdf)[source]

Pick the mode batch-wise from a categorical distribution.

Parameters

pdf (tensor of float) – Log probabilities of [batch, insts, scales]

Returns

  • insts (tensor of int) – Indices of the selected instruments

  • scales (tensor of int) – Discrete selected frequencies
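
Picking the mode replaces the random draw by an argmax over the same flattened table. A sketch for one batch item; the name mode_inst_scale is hypothetical and the flattening order is an assumption.

```python
def mode_inst_scale(log_probs):
    """Return the (inst, scale) pair with the highest log probability."""
    num_scales = len(log_probs[0])
    flat = [v for row in log_probs for v in row]
    flat_index = max(range(len(flat)), key=flat.__getitem__)
    return divmod(flat_index, num_scales)


best = mode_inst_scale([[-3.0, -1.0],
                        [-0.5, -2.0]])
```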

musisep.neuralsep.trainsep.trans_params(amps, scales, sigmas, spreads, sigmas_an)[source]

Apply transformations on instrument parameters to ensure their validity.

Parameters
  • amps (tensor of float) – Amplitudes of the tones

  • scales (tensor of float) – Natural fundamental frequencies of the tones

  • sigmas (tensor of float) – Widths of the Gaussians

  • spreads (tensor of float) – Inharmonicities of the tones

  • sigmas_an (float) – Number of standard deviations at which the analysis window is cut

Returns

  • amps (tensor of float) – Amplitudes of the tones

  • scales (tensor of float) – Natural fundamental frequencies of the tones

  • sigmas (tensor of float) – Widths of the Gaussians

  • spreads (tensor of float) – Inharmonicities of the tones

musisep.neuralsep.trainsep.unet(x, inst_num, spectheight)[source]

Create the U-Net as a Keras model.

Parameters
  • inst_num (int) – Number of instruments expected in the sample

  • spectheight (int) – Size of the input/output spectrum

Returns

y – Network output

Return type

tensor of float