musisep.neuralsep package¶

Submodules¶

musisep.neuralsep.__main__ module¶

Wrapper for the policy gradient separation algorithm. When invoked, the audio sources in the supplied audio file are separated.

musisep.neuralsep.__main__.separate_duan_acous(seed)[source]¶

Separate the piece from Duan et al. with euphonium and oboe.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_duan_synth2(seed)[source]¶

Separate the piece from Duan et al. with piccolo and organ.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_duan_synth3(seed)[source]¶

Separate the piece from Duan et al. with piccolo, organ, and oboe.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_mozart(seed)[source]¶

Separate the piece by Mozart for recorder and violin.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_mozart_cl(seed)[source]¶

Separate the piece by Mozart for clarinet and piano.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_mozart_piano(seed)[source]¶

Separate the piece by Mozart for clarinet and piano.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_train(seed_np, seed_tf, name, mixed_soundfile, orig_soundfiles, loss_coeffs=(0, 10, 1, 10), har_num=25, num_guesses=(3, 3), spl=0.9, batch_size=12, batch_size_pred=100, virt_batch_mul=1, stepsize_net=0.001, stepsize_dict=0.0001, tau=0.01, max_iter=100000, eval_interval=2500, sampdist=128, sub_factor=4, sigmas_an=6, load_dir=None, plot_color=False, save_points=(70000), init_dict=None)[source]¶

Separate a music recording into the contribution of the individual instruments.

Parameters

seed_np (int) – Random seed for NumPy
seed_tf (int) – Random seed for TensorFlow
name (string) – Name of the training run. Used for file names and logging.
mixed_soundfile (string) – Name of the sound file containing the mixture
orig_soundfiles (sequence of string) – Name of the sound files containing the individual instrument tracks
loss_coeffs (sequence of float) – Weights of the dictionary prediction loss, the sparse loss, the regularization loss, and the direct prediction loss
har_num (int) – Number of harmonics to identify
num_guesses (sequence of int) – Number of samples per tone
spl (float) – Discount factor for the sparsity
batch_size (int) – Batch size for training
batch_size_pred (int) – Batch size for prediction
virt_batch_mul (int) – Virtual batch multiplier
stepsize_net (float) – Learning rate for training the neural network
stepsize_dict (float) – Learning rate for training the dictionary
tau (float) – Exponent to control exploration
max_iter (int) – Total number of training iterations
eval_interval (int) – Interval at which to evaluate the entire spectrogram
sampdist (int) – Time interval of the spectrogram
sub_factor (int) – Factor by which to subsample the spectrogram for resynthesis
sigmas_an (float) – Number of standard deviations at which the analysis window is cut
load_dir (string) – Path from where to preload the model and the dictionary
plot_color (string or NoneType) – Whether to make a color plot
save_points (sequence of int) – Iterations at which to save the output
init_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
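A call might look like the following sketch. The run name and the file names are placeholders, not files shipped with the package, and the helper `run_separation` is hypothetical. The import is deferred into the helper so that the snippet can be read and loaded without the package installed. All arguments that are not listed keep the defaults from the signature above. `save_points` is written as a one-element tuple because the parameter is documented as a sequence of int.

```python
# Hypothetical inputs: replace the file names with your own recordings.
train_kwargs = dict(
    seed_np=0,                                   # random seed for NumPy
    seed_tf=0,                                   # random seed for TensorFlow
    name="demo_run",                             # hypothetical run name
    mixed_soundfile="mix.wav",                   # hypothetical mixture file
    orig_soundfiles=["inst1.wav", "inst2.wav"],  # hypothetical instrument tracks
    loss_coeffs=(0, 10, 1, 10),                  # documented default
    save_points=(70000,),                        # one-element tuple (trailing comma)
)


def run_separation(**overrides):
    """Start a training run; needs musisep and TensorFlow to be installed."""
    # Deferred import so that defining this helper has no side effects.
    from musisep.neuralsep.__main__ import separate_train
    return separate_train(**{**train_kwargs, **overrides})
```

Calling `run_separation(max_iter=1000)` would then override a single default for a shorter trial run.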

musisep.neuralsep.__main__.separate_urmp_03(seed)[source]¶

Separate the piece from URMP with flute and clarinet.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_03_cl(seed)[source]¶

Represent the clarinet track from URMP (oracle).

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_03_fl(seed)[source]¶

Represent the flute track from URMP (oracle).

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_03_init(seed)[source]¶

Separate the piece from URMP with flute and clarinet with an initial oracle dictionary.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_03_oracle(seed)[source]¶

Separate the piece from URMP with flute and clarinet with a fixed oracle dictionary.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_09(seed)[source]¶

Separate the piece from URMP with trumpet and violin.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_10(seed)[source]¶

Separate the piece from URMP with trumpet and saxophone.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11(seed)[source]¶

Separate the piece from URMP with oboe and violoncello.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11_init(seed)[source]¶

Separate the piece from URMP with oboe and violoncello with an initial oracle dictionary.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11_ob(seed)[source]¶

Represent the oboe track from URMP (oracle).

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11_oracle(seed)[source]¶

Separate the piece from URMP with oboe and violoncello with a fixed oracle dictionary.

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.__main__.separate_urmp_11_vc(seed)[source]¶

Represent the violoncello track from URMP (oracle).

Parameters: seed (int) – Random seed for both NumPy and TensorFlow

musisep.neuralsep.trainsep module¶

All the training mechanisms for blind separation via neural networks.

class musisep.neuralsep.trainsep.ParamsDict(batch_size, num_guesses_prod)[source]¶

Bases: object

Container object for the model parameters.

Parameters

batch_size (int) – Batch size of all the data structures
num_guesses_prod (int) – Total number of all the samples per spectrum

class musisep.neuralsep.trainsep.SpectLoss(batch_size, num_guesses_prod, inst_num, spectheight)[source]¶

Bases: object

Container object for the spectra and losses for the individual tones of the instruments.

Parameters

batch_size (int) – Batch size of all the data structures
num_guesses_prod (int) – Total number of all the samples per spectrum
inst_num (int) – Number of instruments in the sample
spectheight (int) – Size of the input/output spectrum

add_tone(params, har_coeffs, on_factors, insts, har_spect, inst_dict, orig_spect)[source]¶

Add the results of a new tone to the object

Parameters

params (tensor of float) – Instrument parameters for the tone
har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics
on_factors (tensor of int) – Binary indicator if a tone contributes to the sparse prediction
insts (tensor of int) – Indices of the instruments playing the tones
har_spect (tensor of float) – Spectra of the individual harmonics (without amplitudes)
inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
orig_spect (tensor of float) – Original input sampled spectrum

class musisep.neuralsep.trainsep.SpectvisDict(spectheight)[source]¶

Bases: object

Container object to visualize spectra.

Parameters: spectheight (int) – Size of the spectrum

class musisep.neuralsep.trainsep.Trainer(name, mixed_soundfile, orig_soundfiles, loss_coeffs, har_num, num_guesses, spl, batch_size, batch_size_pred, virt_batch_mul, stepsize_net, stepsize_dict, tau, sampdist, sub_factor, sigmas_an, plot_color, save_points, init_dict)[source]¶

Bases: object

Object containing all the data necessary for the training.

Parameters

name (string) – Name of the training run. Used for file names and logging.
mixed_soundfile (string) – Name of the sound file containing the mixture
orig_soundfiles (sequence of string) – Name of the sound files containing the individual instrument tracks
loss_coeffs (sequence of float) – Weights of the dictionary prediction loss, the sparse loss, the regularization loss, and the direct prediction loss
har_num (int) – Number of harmonics to identify
num_guesses (sequence of int) – Number of samples per tone
spl (float) – Discount factor for the sparsity
batch_size (int) – Batch size for training
batch_size_pred (int) – Batch size for prediction
virt_batch_mul (int) – Virtual batch multiplier
stepsize_net (float) – Learning rate for training the neural network
stepsize_dict (float) – Learning rate for training the dictionary
tau (float) – Exponent to control exploration
sampdist (int) – Time interval of the spectrogram
sub_factor (int) – Factor by which to subsample the spectrogram for resynthesis
sigmas_an (float) – Number of standard deviations at which the analysis window is cut
plot_color (string or NoneType) – Whether to make a color plot
save_points (sequence of int) – Iterations at which to save the output
init_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]

add_gradient_dict(gradient_dict)[source]¶

Add gradients related to the dictionary.

Parameters: gradient_dict (tensor of float) – Gradient with respect to the dictionary

add_gradient_scales(gradient_scales)[source]¶

Add gradients related to the output scalings.

Parameters: gradient_scales (tensor of float) – Gradient with respect to the output scalings

add_gradients_model_cnn(gradients_model_cnn)[source]¶

Add gradients related to the CNN model

Parameters: gradients_model_cnn (sequence of tensor of float) – Gradients with respect to the model

apply_gradients()[source]¶

Apply all gradients to the optimization algorithm.
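The split into several "add gradient" methods and one "apply" method suggests an accumulate-then-apply pattern, which is also a common way to realize a virtual batch multiplier. The following is a minimal sketch of that pattern in plain Python. The class `GradientAccumulator` is hypothetical and not part of the Trainer API, the averaging of the accumulated gradients is an assumption, and plain gradient descent stands in for whatever optimizer the Trainer actually uses.

```python
class GradientAccumulator:
    """Toy accumulate-then-apply helper (hypothetical, not the Trainer API)."""

    def __init__(self, values, stepsize):
        self.values = list(values)            # parameters being optimized
        self.stepsize = stepsize
        self._sum = [0.0] * len(self.values)  # running sum of gradients
        self._count = 0                       # number of accumulated gradients

    def add_gradient(self, gradient):
        """Add one minibatch gradient to the running sum."""
        self._sum = [s + g for s, g in zip(self._sum, gradient)]
        self._count += 1

    def apply_gradients(self):
        """Take one descent step with the averaged gradient, then reset."""
        if self._count == 0:
            return
        self.values = [v - self.stepsize * s / self._count
                       for v, s in zip(self.values, self._sum)]
        self._sum = [0.0] * len(self.values)
        self._count = 0


acc = GradientAccumulator([1.0, 2.0], stepsize=0.1)
for grad in ([1.0, 0.0], [3.0, 2.0]):   # two minibatches form one virtual batch
    acc.add_gradient(grad)
acc.apply_gradients()                   # steps along the mean gradient [2.0, 1.0]
```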

load(module)[source]¶

Load dictionary, output scalings, and the model from a module.

Parameters: module (tf.Module) – Module with the saved data

make_model(input_sizes)[source]¶

Construct a Keras model for the parameter prediction

Parameters: input_sizes (int) – Total number of input channels for the network
Returns: The neural network as a Keras model

predict_loop(k, write)[source]¶

Predict and resynthesize the entire spectrogram.

Parameters

k (int) – Iteration number
write (bool) – Whether to save the output to files

predict_mix_spect(mix_spect_in)[source]¶

Predict the separation of a mixture spectrum and compute the losses.

Parameters

mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on

Returns

spects (SimpleNamespace) – Spectra related to the mixture
losses (SimpleNamespace) – Losses related to the mixture

separate(mix_spect_in, tau, batch_size, predict=False)[source]¶

Identify all the parameters for the tones in the spectrum.

Parameters

mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on
tau (float) – Exponent to control exploration
batch_size (int) – Batch size for training
predict (bool) – Whether to go into prediction mode instead of training

Returns

spects (SimpleNamespace) – Spectra related to the mixture
losses (SimpleNamespace) – Losses related to the mixture
params_d (ParamsDict) – Parameters related to the mixture
spectvis_d (SpectvisDict) – Visualization spectra related to the mixture

separate_inst(model, model_in, training, batch_factor, fan_factor, num_guesses, inst_mask, tau, batch_size)[source]¶

Identify the parameters for one tone in a given spectrum

Parameters

model (tf.keras.Model) – Definition of the neural network
model_in (tensor of float) – Input channels that the neural network receives
training (bool) – Whether to perform training
batch_factor (int) – Product of the number of samples for previous tones
fan_factor (int) – Product of the number of samples for current and future tones
num_guesses (int) – Number of samples for the current tone
inst_mask (tensor of int) – 1 for instruments that have already played a tone, 0 otherwise
tau (float) – Exponent to control exploration
batch_size (int) – Batch size for training

Returns

params_tone (SimpleNamespace) – All parameters relating to an identified tone
params_spect (SimpleNamespace) – Unsampled parameters in the dimensionality of the spectrum
har_spects_raw (tensor of float) – Spectra of the harmonics

train_dict_norm(writer, k)[source]¶

Train the dictionary such that the largest entry for each instrument gets to 1.

Parameters

writer (SummaryWriter) – Writer object to capture the summarized variables
k (int) – Iteration number

Returns

inst_dict_norm – Dictionary norm loss

Return type

tensor of float
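One way to express the stated goal as a loss is a squared penalty on the distance between each instrument's largest dictionary entry and 1. The sketch below assumes that form; the actual loss used by `train_dict_norm` is not documented here, and the function name `dict_norm_loss` is hypothetical.

```python
def dict_norm_loss(inst_dict):
    """Penalty that vanishes when every instrument row has maximum 1."""
    return sum((max(row) - 1.0) ** 2 for row in inst_dict)


inst_dict = [[1.0, 0.5, 0.25],   # largest entry is already 1
             [2.0, 1.0, 0.5]]    # largest entry is 2, so this row is penalized
loss = dict_norm_loss(inst_dict)
```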

train_loop(max_iter, eval_interval, interval=50)[source]¶

Train the neural network. Predict and resynthesize the entire spectrogram.

Parameters

max_iter (int) – Total number of training iterations
eval_interval (int) – Interval at which to evaluate the entire spectrogram
interval (int) – Interval at which to output debug information

train_mix_spect(mix_spect_in, writer, k)[source]¶

Train the separation of a mixture spectrum and compute the losses.

Parameters

mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on
writer (SummaryWriter) – Writer object to capture the summarized variables
k (int) – Iteration number

Returns

spects (SimpleNamespace) – Spectra related to the mixture
losses (SimpleNamespace) – Losses related to the mixture
params (ParamsDict) – Parameters related to the mixture
spectvis (SpectvisDict) – Visualization spectra related to the mixture

musisep.neuralsep.trainsep.add_linspace(x)[source]¶

Add a linear range layer to a CNN tensor

Parameters: x (tensor of float) – Layer to add the linear range to
Returns: model_in – Input layer with the linear range added
Return type: tensor of float
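As an illustration, a linear range can be appended as an extra channel so that a convolutional network can see the position along the frequency axis. The sketch below uses nested lists instead of tensors and assumes that the range runs from 0 to 1 and is appended as the last channel; both choices, and the name `add_linspace_channel`, are assumptions.

```python
def add_linspace_channel(rows):
    """Append a channel that runs linearly from 0 to 1 along the frequency axis."""
    height = len(rows)
    step = 1.0 / (height - 1) if height > 1 else 0.0
    return [list(channels) + [i * step] for i, channels in enumerate(rows)]


spectrum = [[0.2], [0.7], [0.1]]              # 3 frequency bins, 1 channel each
with_range = add_linspace_channel(spectrum)   # now 2 channels per bin
```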

musisep.neuralsep.trainsep.comp_total_loss(losses, on_factors, spl, loss_coeffs)[source]¶

Compute the linear combination of losses

Parameters

losses (SimpleNamespace) – The individual loss values
on_factors (tensor of int) – Binary indicator if a tone contributes to the sparse prediction
spl (float) – Discount factor for the sparsity
loss_coeffs (array_like) – Linear weights for the loss terms

Returns

total_loss (tensor of float) – Computed total loss
discounts (tensor of float) – Sparsity discount factors
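The linear combination itself can be sketched as follows. The field names in `LOSS_FIELDS` are hypothetical; they are only ordered like the documented `loss_coeffs` (dictionary prediction, sparse, regularization, direct prediction). The sparsity discount controlled by `spl` is left out because its exact form is not specified here.

```python
from types import SimpleNamespace

# Hypothetical field names, in the documented order of loss_coeffs.
LOSS_FIELDS = ("dict_pred", "sparse", "reg", "direct")


def linear_loss(losses, loss_coeffs):
    """Weighted sum of the individual loss terms (no sparsity discount)."""
    return sum(coeff * getattr(losses, field)
               for coeff, field in zip(loss_coeffs, LOSS_FIELDS))


losses = SimpleNamespace(dict_pred=0.5, sparse=0.2, reg=0.1, direct=0.3)
total = linear_loss(losses, (0, 10, 1, 10))   # default weights from separate_train
```

With the default weights, the dictionary prediction loss has weight 0 and therefore does not contribute to the total.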

musisep.neuralsep.trainsep.complex_abs(spect, axis)[source]¶

Compute the absolute value of a complex tensor.

Parameters

spect (tensor of float) – Real-valued tensor with a complex axis
axis (int) – Complex axis

Returns

Absolute value of the input tensor (with complex axis shrunken to size 1)
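To illustrate the convention of a real-valued tensor with a complex axis, the sketch below stores each complex number as a `[re, im]` pair and returns the magnitude with that axis kept at size 1. It uses plain lists rather than tensors, and the name `complex_abs_pairs` is hypothetical.

```python
import math


def complex_abs_pairs(spect):
    """Magnitude of [re, im] pairs; the complex axis shrinks to size 1."""
    return [[math.hypot(re, im)] for re, im in spect]


spect = [[3.0, 4.0], [0.0, -2.0]]   # two complex values: 3+4i and -2i
mags = complex_abs_pairs(spect)     # [[5.0], [2.0]]
```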

musisep.neuralsep.trainsep.complex_arg(spect, axis, bias=1e-20)[source]¶

Compute the argument of a complex tensor.

Parameters

spect (tensor of float) – Real-valued tensor with a complex axis
axis (int) – Complex axis
bias (float) – Offset to avoid division by 0

Returns

Tensor normalized to an absolute value of 1
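A minimal sketch of such a normalization, again with `[re, im]` pairs in plain lists. It assumes that `bias` is added to the magnitude before dividing, which is one common way to avoid a division by zero; the actual placement of the bias may differ, and `complex_arg_pairs` is a hypothetical name.

```python
import math


def complex_arg_pairs(spect, bias=1e-20):
    """Scale each [re, im] pair to (approximately) unit magnitude."""
    out = []
    for re, im in spect:
        norm = math.hypot(re, im) + bias   # bias keeps the divisor positive
        out.append([re / norm, im / norm])
    return out


phases = complex_arg_pairs([[3.0, 4.0], [0.0, 0.0]])
```

A zero input stays at zero instead of producing a NaN, which is the purpose of the offset.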

musisep.neuralsep.trainsep.gamma_probs(spreads, spreads_a, spreads_b)[source]¶

Evaluate the gamma distribution

Parameters

spreads (tensor of float) – Values where to evaluate
spreads_a (tensor of float) – “alpha” parameter of the distribution
spreads_b (tensor of float) – “beta” parameter of the distribution

Returns

Log probabilities
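For reference, the log density of a gamma distribution can be computed as in the sketch below. It assumes that "beta" is the rate parameter (not the scale) and works on scalars; `gamma_log_pdf` is a hypothetical name.

```python
import math


def gamma_log_pdf(x, alpha, beta):
    """Log density of Gamma(alpha, beta), with beta as the rate parameter."""
    return (alpha * math.log(beta) - math.lgamma(alpha)
            + (alpha - 1.0) * math.log(x) - beta * x)


# For alpha = 1 the gamma distribution is the exponential distribution.
log_p = gamma_log_pdf(1.0, alpha=1.0, beta=1.0)
```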

musisep.neuralsep.trainsep.gauss(x, mean, stdev)[source]¶

Evaluate the Gaussian function.

Parameters

x (tensor of float) – Points of evaluation
mean (tensor of float) – Mean value(s)
stdev (tensor of float) – Standard deviation(s)

Returns

spect – Values of the Gaussian

Return type

tensor of float
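A scalar sketch of a Gaussian function follows. It assumes the unnormalized form with peak value 1; the actual function may include a normalization factor. The name `gauss_unnormalized` is hypothetical.

```python
import math


def gauss_unnormalized(x, mean, stdev):
    """Gaussian bell curve with value 1 at the mean."""
    return math.exp(-((x - mean) ** 2) / (2.0 * stdev ** 2))


peak = gauss_unnormalized(440.0, mean=440.0, stdev=2.0)
one_sigma = gauss_unnormalized(442.0, mean=440.0, stdev=2.0)   # exp(-0.5)
```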

musisep.neuralsep.trainsep.inst_scale(params, insts, inst_dict, har_coeffs, spectheight, sigmas_an)[source]¶

Evaluate the linear-frequency spectra for tones.

Parameters

params (tensor of float) – Continuous parameters for the tones, stacked along axis 2
insts (int) – Indices of the instruments playing the tones
inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics
spectheight (int) – Size of the output spectrum
sigmas_an (float) – Number of standard deviations at which the analysis window is cut

Returns

spect – Spectra of the tones

Return type

tensor of float

musisep.neuralsep.trainsep.inst_scale_post(amps, insts, inst_dict, inst_num, har_coeffs, har_spect)[source]¶

Combine the spectra of individual harmonics into tone spectra.

Parameters

amps (tensor of float) – Amplitudes of the tones
insts (int) – Indices of the instruments playing the tones
inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
inst_num (int) – Number of instruments available
har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics
har_spect (tensor of float) – Spectra of the individual harmonics (without amplitudes)

Returns

spect – Spectra of the tones

Return type

tensor of float
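The combination step can be pictured as a weighted sum over harmonics. The sketch below handles a single tone and uses Python complex numbers instead of a real tensor with a complex axis. It assumes that each harmonic is scaled by the tone amplitude, the dictionary entry of the playing instrument, and its complex coefficient; the name `combine_harmonics` is hypothetical.

```python
def combine_harmonics(amp, dict_row, har_coeffs, har_spect):
    """Sum harmonic spectra weighted by amplitude, dictionary entry, and coefficient."""
    height = len(har_spect[0])
    spect = [0j] * height
    for weight, coeff, line in zip(dict_row, har_coeffs, har_spect):
        for k in range(height):
            spect[k] += amp * weight * coeff * line[k]
    return spect


dict_row = [1.0, 0.5]              # dictionary row of the playing instrument
har_coeffs = [1 + 0j, 0 + 1j]      # relative complex amplitudes (here: pure phases)
har_spect = [[1.0, 0.0, 0.0],      # harmonic 1 occupies bin 0
             [0.0, 0.0, 1.0]]      # harmonic 2 occupies bin 2
tone = combine_harmonics(2.0, dict_row, har_coeffs, har_spect)
```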

musisep.neuralsep.trainsep.inst_scale_raw(params, har_num, spectheight, sigmas_an)[source]¶

Evaluate the linear-frequency spectra for the harmonics of tones, disregarding the amplitudes.

Parameters

params (tensor of float) – Continuous parameters for the tones, stacked along axis 2
har_num (int) – Number of harmonics to evaluate
spectheight (int) – Size of the output spectrum
sigmas_an (float) – Number of standard deviations at which the analysis window is cut

Returns

spect – Spectra of the harmonics

Return type

tensor of float
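As a rough picture of what such harmonic spectra look like, the sketch below places one Gaussian line per harmonic on an integer frequency grid. The inharmonicity model f_h = f1 * h * sqrt(1 + b * h^2) is the usual stiff-string formula and is an assumption here, as are the parameter names. The sketch ignores `sigmas_an`, since how the window truncation enters the actual function is not documented in this section.

```python
import math


def harmonic_lines(f1, sigma, spread, har_num, spectheight):
    """One unit-height Gaussian line per harmonic, without amplitudes."""
    spects = []
    for h in range(1, har_num + 1):
        # Assumed inharmonicity model: partials are stretched for spread > 0.
        center = f1 * h * math.sqrt(1.0 + spread * h * h)
        spects.append([math.exp(-0.5 * ((k - center) / sigma) ** 2)
                       for k in range(spectheight)])
    return spects


lines = harmonic_lines(f1=10.0, sigma=1.0, spread=0.0, har_num=3, spectheight=40)
```

With `spread=0.0` the lines sit exactly at bins 10, 20, and 30; a positive spread would shift the higher harmonics upward.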

musisep.neuralsep.trainsep.lift_cmplx(x, axis, shift=1e-07, qexp=0.5)[source]¶

Lift a complex-valued spectrum via a concave power function.

Parameters

x (tensor of float) – Spectrum
axis (int) – Complex axis
shift (float) – Additive constant to keep the transform differentiable
qexp (float) – Exponent of the power function

Returns

The lifted spectrum

musisep.neuralsep.trainsep.lift_spect(x, shift=1e-07, qexp=0.5)[source]¶

Lift a positive-valued spectrum via a concave power function.

Parameters

x (tensor of float) – Spectrum
shift (float) – Additive constant to keep the transform differentiable
qexp (float) – Exponent of the power function

Returns

The lifted spectrum

musisep.neuralsep.trainsep.lift_spect_sign(x, shift=1e-07, qexp=0.5)[source]¶

Lift a real-valued spectrum via a concave power function.

Parameters

x (tensor of float) – Spectrum
shift (float) – Additive constant to keep the transform differentiable
qexp (float) – Exponent of the power function

Returns

The lifted spectrum
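A concave power lift compresses large values so that quiet components carry more weight in a loss. The sketch below shows a positive and a signed variant on scalars. The exact form, in particular subtracting `shift ** qexp` so that 0 maps to 0, is an assumption, and the names `lift` and `lift_sign` are hypothetical.

```python
import math


def lift(x, shift=1e-7, qexp=0.5):
    """Concave power lift for x >= 0, offset so that 0 maps to 0."""
    return (x + shift) ** qexp - shift ** qexp


def lift_sign(x, shift=1e-7, qexp=0.5):
    """Odd extension of lift() to real-valued input."""
    return math.copysign(lift(abs(x), shift, qexp), x)


compressed = lift(100.0) / lift(1.0)   # roughly 10 instead of 100
negative = lift_sign(-4.0)             # equals -lift(4.0)
```

Without the shift, the square root would have an unbounded derivative at 0, which is why a small additive constant helps gradient-based training.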

musisep.neuralsep.trainsep.lifted_l2_abs(x, y, axis)[source]¶

Radially symmetric lifted l2 distance between two spectra.

Parameters

x (tensor of float) – First spectrum
y (tensor of float) – Second spectrum
axis (int) – Complex axis

Returns

l2 loss

musisep.neuralsep.trainsep.lifted_l2_cmplx(x, y, axis)[source]¶

Radially symmetric lifted l2 distance between two spectra.

Parameters

x (tensor of float) – First spectrum
y (tensor of float) – Second spectrum
axis (int) – Complex axis

Returns

l2 loss

musisep.neuralsep.trainsep.lsq_stock(har_spects, samp)[source]¶

Solve a regularized least-squares system.

Parameters

har_spects (tensor of float) – Spectra of the individual harmonics (without amplitudes)
samp (tensor of float) – Direct prediction

Returns

Phase values for the harmonics

musisep.neuralsep.trainsep.mix_inst_spects(inst_spects, axis)[source]¶

Combine the spectra for multiple instruments, dropping the summation axis.

Parameters

inst_spects (tensor of float) – Spectra for the individual instruments
axis (int) – Summation axis

Returns

Mixture spectrum
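In the simplest reading, mixing is a sum over the instrument axis. The sketch below treats the outer list as that axis and works on real-valued lists; `mix_spects` is a hypothetical name.

```python
def mix_spects(inst_spects):
    """Sum spectra over the instrument axis (here: the outer list)."""
    return [sum(bins) for bins in zip(*inst_spects)]


flute = [0.0, 1.0, 0.5]
clarinet = [0.25, 0.0, 0.5]
mixture = mix_spects([flute, clarinet])   # [0.25, 1.0, 1.0]
```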

musisep.neuralsep.trainsep.norm_pdf(pdf)[source]¶

Normalize a categorical distribution batch-wise via softmax.

Parameters: pdf (tensor of float) – Log probabilities of [batch, insts, scales]
Returns: Normalized log probabilities
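Normalizing log probabilities via softmax amounts to subtracting the log of the summed exponentials. The sketch below does this for a single batch element and assumes that instruments and scales are normalized jointly; `log_softmax_2d` is a hypothetical name.

```python
import math


def log_softmax_2d(logits):
    """Jointly normalize an [insts][scales] table of log probabilities."""
    flat = [v for row in logits for v in row]
    peak = max(flat)   # subtracting the maximum keeps exp() from overflowing
    log_z = peak + math.log(sum(math.exp(v - peak) for v in flat))
    return [[v - log_z for v in row] for row in logits]


logits = [[0.0, 1.0], [2.0, 3.0]]   # one batch element, 2 instruments, 2 scales
log_probs = log_softmax_2d(logits)
total = sum(math.exp(v) for row in log_probs for v in row)   # sums to 1
```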

musisep.neuralsep.trainsep.plot_spectrum(filename, spectheight, *spects)[source]¶

Plot real-valued spectra to a file

Parameters

filename (string) – Name of the file to save the figure to
spectheight (int) – Size of the output spectra
spects (sequence of array_like of float) – Spectra to plot

musisep.neuralsep.trainsep.sample_multi(pdf)[source]¶

Sample batch-wise from a categorical distribution.

Parameters

pdf (tensor of float) – Log probabilities of [batch, insts, scales]

Returns

insts (tensor of int) – Indices of the sampled instruments
scales (tensor of int) – Discrete sampled frequencies
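Sampling an instrument together with a discrete frequency can be done by flattening the table, drawing one index, and splitting it again. The sketch below does this for one batch element with the standard library; the function name `sample_inst_scale` and the flattening order are assumptions.

```python
import math
import random


def sample_inst_scale(log_probs, rng):
    """Draw one (instrument, scale) index pair from an [insts][scales] table."""
    num_scales = len(log_probs[0])
    weights = [math.exp(v) for row in log_probs for v in row]
    flat_index = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return divmod(flat_index, num_scales)   # (row, column) of the flat index


rng = random.Random(0)   # fixed seed for reproducibility
log_probs = [[math.log(0.1), math.log(0.2)],
             [math.log(0.3), math.log(0.4)]]
inst, scale = sample_inst_scale(log_probs, rng)
```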

musisep.neuralsep.trainsep.sample_multi_max(pdf)[source]¶

Pick the mode batch-wise from a categorical distribution.

Parameters

pdf (tensor of float) – Log probabilities of [batch, insts, scales]

Returns

insts (tensor of int) – Indices of the selected instruments
scales (tensor of int) – Discrete selected frequencies
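Picking the mode is the deterministic counterpart of sampling: the pair with the largest log probability is returned. A minimal sketch for one batch element follows, with the hypothetical name `argmax_inst_scale`.

```python
def argmax_inst_scale(log_probs):
    """Return the (instrument, scale) index pair with the largest log probability."""
    num_scales = len(log_probs[0])
    flat = [v for row in log_probs for v in row]
    flat_index = max(range(len(flat)), key=flat.__getitem__)
    return divmod(flat_index, num_scales)


mode = argmax_inst_scale([[-3.0, -1.2, -2.0],
                          [-0.7, -2.5, -4.0]])   # (1, 0)
```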

musisep.neuralsep.trainsep.trans_params(amps, scales, sigmas, spreads, sigmas_an)[source]¶

Apply transformations on instrument parameters to ensure their validity.

Parameters

amps (tensor of float) – Amplitudes of the tones
scales (tensor of float) – Natural fundamental frequencies of the tones
sigmas (tensor of float) – Widths of the Gaussians
spreads (tensor of float) – Inharmonicities of the tones
sigmas_an (float) – Number of standard deviations at which the analysis window is cut

Returns

amps (tensor of float) – Amplitudes of the tones
scales (tensor of float) – Natural fundamental frequencies of the tones
sigmas (tensor of float) – Widths of the Gaussians
spreads (tensor of float) – Inharmonicities of the tones

musisep.neuralsep.trainsep.unet(x, inst_num, spectheight)[source]¶

Create the U-Net as a Keras model.

Parameters

inst_num (int) – Number of instruments expected in the sample
spectheight (int) – Size of the input/output spectrum

Returns

y – Network output

Return type

tensor of float
