musisep.neuralsep package¶
Submodules¶
musisep.neuralsep.__main__ module¶
Wrapper for the policy gradient separation algorithm. When invoked, the audio sources in the supplied audio file are separated.
- musisep.neuralsep.__main__.separate_duan_acous(seed)[source]¶
Separate the piece from Duan et al. with euphonium and oboe.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_duan_synth2(seed)[source]¶
Separate the piece from Duan et al. with piccolo and organ.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_duan_synth3(seed)[source]¶
Separate the piece from Duan et al. with piccolo, organ, and oboe.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_mozart(seed)[source]¶
Separate the piece by Mozart for recorder and violin.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_mozart_cl(seed)[source]¶
Separate the piece by Mozart for clarinet and piano.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_mozart_piano(seed)[source]¶
Separate the piece by Mozart for clarinet and piano.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_train(seed_np, seed_tf, name, mixed_soundfile, orig_soundfiles, loss_coeffs=(0, 10, 1, 10), har_num=25, num_guesses=(3, 3), spl=0.9, batch_size=12, batch_size_pred=100, virt_batch_mul=1, stepsize_net=0.001, stepsize_dict=0.0001, tau=0.01, max_iter=100000, eval_interval=2500, sampdist=128, sub_factor=4, sigmas_an=6, load_dir=None, plot_color=False, save_points=(70000), init_dict=None)[source]¶
Separate a music recording into the contribution of the individual instruments.
- Parameters
seed_np (int) – Random seed for NumPy
seed_tf (int) – Random seed for TensorFlow
name (string) – Name of the training run. Used for file names and logging.
mixed_soundfile (string) – Name of the sound file containing the mixture
orig_soundfiles (sequence of string) – Names of the sound files containing the individual instrument tracks
loss_coeffs (sequence of float) – Weights of the dictionary prediction loss, the sparse loss, the regularization loss, and the direct prediction loss
har_num (int) – Number of harmonics to identify
num_guesses (sequence of int) – Number of samples per tone
spl (float) – Discount factor for the sparsity
batch_size (int) – Batch size for training
batch_size_pred (int) – Batch size for prediction
virt_batch_mul (int) – Virtual batch multiplier
stepsize_net (float) – Learning rate for training the neural network
stepsize_dict (float) – Learning rate for training the dictionary
tau (float) – Exponent to control exploration
max_iter (int) – Total number of training iterations
eval_interval (int) – Interval at which to evaluate the entire spectrogram
sampdist (int) – Time interval of the spectrogram
sub_factor (int) – Factor by which to subsample the spectrogram for resynthesis
sigmas_an (float) – Number of standard deviations at which the analysis window is cut
load_dir (string) – Path from where to preload the model and the dictionary
plot_color (string or NoneType) – Whether to make a color plot
save_points (sequence of int) – Iterations at which to save the output
init_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
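The signature above lists `save_points=(70000)`, while the parameter is documented as a sequence of int. In Python, parentheses around a single value do not create a tuple, so a caller who wants a one-element sequence needs a trailing comma. The sketch below only illustrates that language rule. The variable names are illustrative and not part of musisep, and how the library treats a plain int here is not stated in this reference.

```python
# Parentheses alone do not make a tuple; the trailing comma does.
as_written = (70000)      # plain int, identical to 70000
one_element = (70000,)    # tuple holding a single iteration number

print(type(as_written).__name__)   # int
print(type(one_element).__name__)  # tuple

# A sequence supports membership tests against an iteration counter k.
k = 70000
print(k in one_element)            # True
```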
- musisep.neuralsep.__main__.separate_urmp_03(seed)[source]¶
Separate the piece from URMP with flute and clarinet.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_03_cl(seed)[source]¶
Represent the clarinet track from URMP (oracle).
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_03_fl(seed)[source]¶
Represent the flute track from URMP (oracle).
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_03_init(seed)[source]¶
Separate the piece from URMP with flute and clarinet with an initial oracle dictionary.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_03_oracle(seed)[source]¶
Separate the piece from URMP with flute and clarinet with a fixed oracle dictionary.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_09(seed)[source]¶
Separate the piece from URMP with trumpet and violin.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_10(seed)[source]¶
Separate the piece from URMP with trumpet and saxophone.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_11(seed)[source]¶
Separate the piece from URMP with oboe and violoncello.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_11_init(seed)[source]¶
Separate the piece from URMP with oboe and violoncello with an initial oracle dictionary.
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
- musisep.neuralsep.__main__.separate_urmp_11_ob(seed)[source]¶
Represent the oboe track from URMP (oracle).
- Parameters
seed (int) – Random seed for both NumPy and TensorFlow
musisep.neuralsep.trainsep module¶
All the training mechanisms for blind separation via neural networks.
- class musisep.neuralsep.trainsep.ParamsDict(batch_size, num_guesses_prod)[source]¶
Bases: object
Container object for the model parameters.
- Parameters
batch_size (int) – Batch size of all the data structures
num_guesses_prod (int) – Total number of all the samples per spectrum
- class musisep.neuralsep.trainsep.SpectLoss(batch_size, num_guesses_prod, inst_num, spectheight)[source]¶
Bases: object
Container object for the spectra and losses for the individual tones of the instruments.
- Parameters
batch_size (int) – Batch size of all the data structures
num_guesses_prod (int) – Total number of all the samples per spectrum
inst_num (int) – Number of instruments in the sample
spectheight (int) – Size of the input/output spectrum
- add_tone(params, har_coeffs, on_factors, insts, har_spect, inst_dict, orig_spect)[source]¶
Add the results of a new tone to the object
- Parameters
params (tensor of float) – Instrument parameters for the tone
har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics
on_factors (tensor of int) – Binary indicator if a tone contributes to the sparse prediction
insts (tensor of int) – Indices of the instruments playing the tones
har_spect (tensor of float) – Spectra of the individual harmonics (without amplitudes)
inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
orig_spect (tensor of float) – Original input sampled spectrum
- class musisep.neuralsep.trainsep.SpectvisDict(spectheight)[source]¶
Bases: object
Container object to visualize spectra.
- Parameters
spectheight (int) – Size of the spectrum
- class musisep.neuralsep.trainsep.Trainer(name, mixed_soundfile, orig_soundfiles, loss_coeffs, har_num, num_guesses, spl, batch_size, batch_size_pred, virt_batch_mul, stepsize_net, stepsize_dict, tau, sampdist, sub_factor, sigmas_an, plot_color, save_points, init_dict)[source]¶
Bases: object
Object containing all the data necessary for the training.
- Parameters
name (string) – Name of the training run. Used for file names and logging.
mixed_soundfile (string) – Name of the sound file containing the mixture
orig_soundfiles (sequence of string) – Names of the sound files containing the individual instrument tracks
loss_coeffs (sequence of float) – Weights of the dictionary prediction loss, the sparse loss, the regularization loss, and the direct prediction loss
har_num (int) – Number of harmonics to identify
num_guesses (sequence of int) – Number of samples per tone
spl (float) – Discount factor for the sparsity
batch_size (int) – Batch size for training
batch_size_pred (int) – Batch size for prediction
virt_batch_mul (int) – Virtual batch multiplier
stepsize_net (float) – Learning rate for training the neural network
stepsize_dict (float) – Learning rate for training the dictionary
tau (float) – Exponent to control exploration
sampdist (int) – Time interval of the spectrogram
sub_factor (int) – Factor by which to subsample the spectrogram for resynthesis
sigmas_an (float) – Number of standard deviations at which the analysis window is cut
plot_color (string or NoneType) – Whether to make a color plot
save_points (sequence of int) – Iterations at which to save the output
init_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
- add_gradient_dict(gradient_dict)[source]¶
Add gradients related to the dictionary.
- Parameters
gradient_dict (tensor of float) – Gradient with respect to the dictionary
- add_gradient_scales(gradient_scales)[source]¶
Add gradients related to the output scalings.
- Parameters
gradient_scales (tensor of float) – Gradient with respect to the output scalings
- add_gradients_model_cnn(gradients_model_cnn)[source]¶
Add gradients related to the CNN model
- Parameters
gradients_model_cnn (sequence of tensor of float) – Gradients with respect to the model
- load(module)[source]¶
Load dictionary, output scalings, and the model from a module.
- Parameters
module (tf.Module) – Module with the saved data
- make_model(input_sizes)[source]¶
Construct a Keras model for the parameter prediction
- Parameters
input_sizes (int) – Total number of input channels for the network
- Returns
The neural network as a Keras model
- predict_loop(k, write)[source]¶
Predict and resynthesize the entire spectrogram.
- Parameters
k (int) – Iteration number
write (bool) – Whether to save the output to files
- predict_mix_spect(mix_spect_in)[source]¶
Predict the separation of a mixture spectrum and compute the losses.
- Parameters
mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on
- Returns
spects (SimpleNamespace) – Spectra related to the mixture
losses (SimpleNamespace) – Losses related to the mixture
- separate(mix_spect_in, tau, batch_size, predict=False)[source]¶
Identify all the parameters for the tones in the spectrum.
- Parameters
mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on
tau (float) – Exponent to control exploration
batch_size (int) – Batch size for training
predict (bool) – Whether to go into prediction mode instead of training
- Returns
spects (SimpleNamespace) – Spectra related to the mixture
losses (SimpleNamespace) – Losses related to the mixture
params_d (ParamsDict) – Parameters related to the mixture
spectvis_d (SpectvisDict) – Visualization spectra related to the mixture
- separate_inst(model, model_in, training, batch_factor, fan_factor, num_guesses, inst_mask, tau, batch_size)[source]¶
Identify the parameters for one tone in a given spectrum
- Parameters
model (tf.keras.Model) – Definition of the neural network
model_in (tensor of float) – Input channels that the neural network receives
training (bool) – Whether to perform training
batch_factor (int) – Product of the number of samples for previous tones
fan_factor (int) – Product of the number of samples for current and future tones
num_guesses (int) – Number of samples for the current tone
inst_mask (tensor of int) – 1 for instruments that have already played a tone, 0 otherwise
tau (float) – Exponent to control exploration
batch_size (int) – Batch size for training
- Returns
params_tone (SimpleNamespace) – All parameters relating to an identified tone
params_spect (SimpleNamespace) – Unsampled parameters in the dimensionality of the spectrum
har_spects_raw (tensor of float) – Spectra of the harmonics
- train_dict_norm(writer, k)[source]¶
Train the dictionary such that the largest entry for each instrument gets to 1.
- Parameters
writer (SummaryWriter) – Writer object to capture the summarized variables
k (int) – Iteration number
- Returns
inst_dict_norm – Dictionary norm loss
- Return type
tensor of float
- train_loop(max_iter, eval_interval, interval=50)[source]¶
Train the neural network. Predict and resynthesize the entire spectrogram.
- Parameters
max_iter (int) – Total number of training iterations
eval_interval (int) – Interval at which to evaluate the entire spectrogram
interval (int) – Interval at which to output debug information
- train_mix_spect(mix_spect_in, writer, k)[source]¶
Train the separation of a mixture spectrum and compute the losses.
- Parameters
mix_spect_in (tensor of float) – Mixture spectrogram to perform the separation on
writer (SummaryWriter) – Writer object to capture the summarized variables
k (int) – Iteration number
- Returns
spects (SimpleNamespace) – Spectra related to the mixture
losses (SimpleNamespace) – Losses related to the mixture
params (ParamsDict) – Parameters related to the mixture
spectvis (SpectvisDict) – Visualization spectra related to the mixture
- musisep.neuralsep.trainsep.add_linspace(x)[source]¶
Add a linear range layer to a CNN tensor
- Parameters
x (tensor of float) – Layer to add the linear range to
- Returns
model_in – Input layer with the linear range added
- Return type
tensor of float
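As a rough illustration of what adding a linear range to a layer can look like, the sketch below appends one extra channel that ramps linearly along the spectrum axis. It uses plain Python lists instead of tensors, and both the 0-to-1 range and the helper name `with_linear_channel` are assumptions made for this example, not details taken from the library.

```python
def with_linear_channel(channels):
    """Return the channels plus one extra channel ramping from 0 to 1.

    channels is a list of equally long lists (one list per channel).
    """
    length = len(channels[0])
    step = 1.0 / (length - 1) if length > 1 else 0.0
    ramp = [i * step for i in range(length)]
    return channels + [ramp]

layer = [[5.0, 6.0, 7.0]]          # one channel, three frequency bins
extended = with_linear_channel(layer)
print(extended)                    # [[5.0, 6.0, 7.0], [0.0, 0.5, 1.0]]
```

A position channel of this kind lets a convolutional network distinguish otherwise identical patterns at different frequencies.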
- musisep.neuralsep.trainsep.comp_total_loss(losses, on_factors, spl, loss_coeffs)[source]¶
Compute the linear combination of losses
- Parameters
losses (SimpleNamespace) – The individual loss values
on_factors (tensor of int) – Binary indicator if a tone contributes to the sparse prediction
spl (float) – Discount factor for the sparsity
loss_coeffs (array_like) – Linear weights for the loss terms
- Returns
total_loss (tensor of float) – Computed total loss
discounts (tensor of float) – Sparsity discount factors
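The linear-combination part of this function can be sketched with scalars. The example assumes that the coefficients follow the order given for `loss_coeffs` in `separate_train` (dictionary prediction, sparse, regularization, direct prediction). It deliberately leaves out the sparsity discount controlled by `spl`, since its exact form is not described here. The helper name and the loss values are hypothetical.

```python
def weighted_total(loss_terms, loss_coeffs):
    """Return the sum of loss_coeffs[i] * loss_terms[i]."""
    if len(loss_terms) != len(loss_coeffs):
        raise ValueError("need exactly one coefficient per loss term")
    return sum(c * l for c, l in zip(loss_coeffs, loss_terms))

# Hypothetical loss values, in the documented order:
# dictionary prediction, sparse, regularization, direct prediction.
terms = (0.5, 0.2, 0.1, 0.3)
coeffs = (0, 10, 1, 10)   # default loss_coeffs of separate_train
total = weighted_total(terms, coeffs)
print(round(total, 6))    # 5.1
```

With the default coefficients, the dictionary prediction loss has weight 0 and therefore does not contribute to the total.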
- musisep.neuralsep.trainsep.complex_abs(spect, axis)[source]¶
Compute the absolute value of a complex tensor.
- Parameters
spect (tensor of float) – Real-valued tensor with a complex axis
axis (int) – Complex axis
- Returns
Absolute value of the input tensor (with the complex axis reduced to size 1)
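To make the "real-valued tensor with a complex axis" convention concrete, the sketch below stores each complex number as a (real, imaginary) pair and reduces that pair to a single magnitude. It is a plain-Python illustration under that assumption. The helper name is made up, and the library itself operates on tensors.

```python
import math

def complex_abs_pairs(pairs):
    """Magnitude of each (real, imag) pair, kept as a length-1 list."""
    return [[math.hypot(re, im)] for re, im in pairs]

spect = [(3.0, 4.0), (0.0, -2.0), (0.0, 0.0)]
print(complex_abs_pairs(spect))  # [[5.0], [2.0], [0.0]]
```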
- musisep.neuralsep.trainsep.complex_arg(spect, axis, bias=1e-20)[source]¶
Compute the argument of a complex tensor.
- Parameters
spect (tensor of float) – Real-valued tensor with a complex axis
axis (int) – Complex axis
bias (float) – Offset to avoid division by 0
- Returns
Tensor normalized to an absolute value of 1
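One way to read the description is that each complex entry is divided by its magnitude, with `bias` keeping the division defined at zero. The sketch below does this for a single (real, imaginary) pair. Where exactly the library adds the bias is an assumption here, and `unit_phase` is a hypothetical name.

```python
import math

def unit_phase(re, im, bias=1e-20):
    """Scale (re, im) to magnitude 1; the bias avoids division by zero."""
    scale = 1.0 / (math.hypot(re, im) + bias)
    return re * scale, im * scale

re, im = unit_phase(3.0, 4.0)
zero = unit_phase(0.0, 0.0)   # stays (0.0, 0.0) instead of raising
```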
- musisep.neuralsep.trainsep.gamma_probs(spreads, spreads_a, spreads_b)[source]¶
Evaluate the gamma distribution
- Parameters
spreads (tensor of float) – Values where to evaluate
spreads_a (tensor of float) – “alpha” parameter of the distribution
spreads_b (tensor of float) – “beta” parameter of the distribution
- Returns
Log probabilities
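For reference, the log density of a gamma distribution can be written out directly. The sketch assumes the common shape/rate parameterization, in which "alpha" is the shape and "beta" is the rate. Whether the library uses rate or scale for "beta" is not stated in this reference, and `gamma_log_pdf` is a hypothetical name.

```python
import math

def gamma_log_pdf(x, a, b):
    """Log density of Gamma(shape=a, rate=b) at x > 0."""
    return a * math.log(b) - math.lgamma(a) + (a - 1.0) * math.log(x) - b * x

# With a = 1 and b = 1 the distribution is exponential, so log p(x) = -x.
value = gamma_log_pdf(2.0, 1.0, 1.0)
```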
- musisep.neuralsep.trainsep.gauss(x, mean, stdev)[source]¶
Evaluate the Gaussian function.
- Parameters
x (tensor of float) – Points of evaluation
mean (tensor of float) – Mean value(s)
stdev (tensor of float) – Standard deviation(s)
- Returns
spect – Values of the Gaussian
- Return type
tensor of float
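A scalar sketch of a Gaussian function follows. It assumes the unnormalized form with peak value 1, which is an assumption for illustration; the library may include a normalization factor. The name `gauss_value` is hypothetical.

```python
import math

def gauss_value(x, mean, stdev):
    """Unnormalized Gaussian exp(-(x - mean)^2 / (2 * stdev^2))."""
    z = (x - mean) / stdev
    return math.exp(-0.5 * z * z)

peak = gauss_value(2.0, 2.0, 0.5)       # value at the mean
one_sigma = gauss_value(1.0, 0.0, 1.0)  # value one standard deviation away
```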
- musisep.neuralsep.trainsep.inst_scale(params, insts, inst_dict, har_coeffs, spectheight, sigmas_an)[source]¶
Evaluate the linear-frequency spectra for tones.
- Parameters
params (tensor of float) – Continuous parameters for the tones, stacked along axis 2
insts (int) – Indices of the instruments playing the tones
inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics
spectheight (int) – Size of the output spectrum
sigmas_an (float) – Number of standard deviations at which the analysis window is cut
- Returns
spect – Spectra of the tones
- Return type
tensor of float
- musisep.neuralsep.trainsep.inst_scale_post(amps, insts, inst_dict, inst_num, har_coeffs, har_spect)[source]¶
Combine the spectra of individual harmonics into tone spectra.
- Parameters
amps (tensor of float) – Amplitudes of the tones
insts (int) – Indices of the instruments playing the tones
inst_dict (tensor of float) – Dictionary with the shape [instruments, harmonics]
inst_num (int) – Number of instruments available
har_coeffs (tensor of float) – Relative complex amplitudes of the harmonics
har_spect (tensor of float) – Spectra of the individual harmonics (without amplitudes)
- Returns
spect – Spectra of the tones
- Return type
tensor of float
- musisep.neuralsep.trainsep.inst_scale_raw(params, har_num, spectheight, sigmas_an)[source]¶
Evaluate the linear-frequency spectra for the harmonics of tones, disregarding the amplitudes.
- Parameters
params (tensor of float) – Continuous parameters for the tones, stacked along axis 2
har_num (int) – Number of harmonics to evaluate
spectheight (int) – Size of the output spectrum
sigmas_an (float) – Number of standard deviations at which the analysis window is cut
- Returns
spect – Spectra of the harmonics
- Return type
tensor of float
- musisep.neuralsep.trainsep.lift_cmplx(x, axis, shift=1e-07, qexp=0.5)[source]¶
Lift a complex-valued spectrum via a concave power function.
- Parameters
x (tensor of float) – Spectrum
axis (int) – Complex axis
shift (float) – Additive constant to keep the transform differentiable
qexp (float) – Exponent of the power function
- Returns
The lifted spectrum
- musisep.neuralsep.trainsep.lift_spect(x, shift=1e-07, qexp=0.5)[source]¶
Lift a positive-valued spectrum via a concave power function.
- Parameters
x (tensor of float) – Spectrum
shift (float) – Additive constant to keep the transform differentiable
qexp (float) – Exponent of the power function
- Returns
The lifted spectrum
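A concave power lift compresses large magnitudes relative to small ones. The scalar sketch below uses the form (x + shift)^qexp - shift^qexp, so that 0 maps to 0 and the derivative at 0 stays finite. This exact formula is an assumption for illustration and is not confirmed by this reference. The name `lift` is hypothetical.

```python
def lift(x, shift=1e-07, qexp=0.5):
    """Concave power lift of a non-negative value; maps 0 to 0."""
    return (x + shift) ** qexp - shift ** qexp

small = lift(0.0)
# With shift = 0 and qexp = 0.5 the lift reduces to a square root.
root = lift(4.0, shift=0.0)
```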
- musisep.neuralsep.trainsep.lift_spect_sign(x, shift=1e-07, qexp=0.5)[source]¶
Lift a real-valued spectrum via a concave power function.
- Parameters
x (tensor of float) – Spectrum
shift (float) – Additive constant to keep the transform differentiable
qexp (float) – Exponent of the power function
- Returns
The lifted spectrum
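For real-valued input, a power lift can be extended to negative values by lifting the magnitude and restoring the sign. The sketch below shows this odd extension for scalars. The formula is an assumption for illustration, and `lift_signed` is a hypothetical name.

```python
import math

def lift_signed(x, shift=1e-07, qexp=0.5):
    """Lift |x| with a concave power function and keep the sign of x."""
    magnitude = (abs(x) + shift) ** qexp - shift ** qexp
    return math.copysign(magnitude, x)

negative = lift_signed(-4.0, shift=0.0)  # square root with the sign kept
```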
- musisep.neuralsep.trainsep.lifted_l2_abs(x, y, axis)[source]¶
Radially symmetric lifted l2 distance between two spectra.
- Parameters
x (tensor of float) – First spectrum
y (tensor of float) – Second spectrum
axis (int) – Complex axis
- Returns
l2 loss
- musisep.neuralsep.trainsep.lifted_l2_cmplx(x, y, axis)[source]¶
Radially symmetric lifted l2 distance between two spectra.
- Parameters
x (tensor of float) – First spectrum
y (tensor of float) – Second spectrum
axis (int) – Complex axis
- Returns
l2 loss
- musisep.neuralsep.trainsep.lsq_stock(har_spects, samp)[source]¶
Solve a regularized least-squares system.
- Parameters
har_spects (tensor of float) – Spectra of the individual harmonics (without amplitudes)
samp (tensor of float) – Direct prediction
- Returns
Phase values for the harmonics
- musisep.neuralsep.trainsep.mix_inst_spects(inst_spects, axis)[source]¶
Combine the spectra for multiple instruments, dropping the summation axis.
- Parameters
inst_spects (tensor of float) – Spectra for the individual instruments
axis (int) – Summation axis
- Returns
Mixture spectrum
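Summing over the instrument axis can be sketched with nested lists, where the outer list indexes instruments and the inner lists hold one value per frequency bin. This is a plain-Python illustration with a hypothetical helper name; the library operates on tensors and takes the axis as an argument.

```python
def mix(inst_spects):
    """Sum per-instrument spectra bin by bin; the instrument axis is dropped."""
    return [sum(bins) for bins in zip(*inst_spects)]

spects = [
    [1.0, 2.0, 0.0],   # instrument 0
    [0.5, 0.5, 4.0],   # instrument 1
]
print(mix(spects))     # [1.5, 2.5, 4.0]
```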
- musisep.neuralsep.trainsep.norm_pdf(pdf)[source]¶
Normalize a categorical distribution batch-wise via softmax.
- Parameters
pdf (tensor of float) – Log probabilities of [batch, insts, scales]
- Returns
Normalized log probabilities
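Normalizing log probabilities via softmax amounts to subtracting the log of the summed exponentials. The sketch below does this for a single batch element stored as a nested [insts][scales] list. It assumes that the normalization runs jointly over instruments and scales, and `log_normalize` is a hypothetical name.

```python
import math

def log_normalize(logits):
    """Log-softmax over every entry of a nested [insts][scales] list."""
    flat = [v for row in logits for v in row]
    top = max(flat)  # subtract the maximum for numerical stability
    log_total = top + math.log(sum(math.exp(v - top) for v in flat))
    return [[v - log_total for v in row] for row in logits]

normalized = log_normalize([[0.0, 1.0], [2.0, 3.0]])
# The exponentials of the result sum to 1 up to rounding.
total_prob = sum(math.exp(v) for row in normalized for v in row)
```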
- musisep.neuralsep.trainsep.plot_spectrum(filename, spectheight, *spects)[source]¶
Plot real-valued spectra to a file
- Parameters
filename (string) – Name of the file to save the figure to
spectheight (int) – Size of the output spectra
spects (sequence of array_like of float) – Spectra to plot
- musisep.neuralsep.trainsep.sample_multi(pdf)[source]¶
Sample batch-wise from a categorical distribution.
- Parameters
pdf (tensor of float) – Log probabilities of [batch, insts, scales]
- Returns
insts (tensor of int) – Indices of the sampled instruments
scales (tensor of int) – Discrete sampled frequencies
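Drawing an (instrument, scale) pair from a joint categorical distribution can be sketched by flattening the table, sampling one flat index, and splitting it back into two indices. The example handles a single batch element with the standard-library random module. The helper name and the probability values are hypothetical, and the library's own sampling routine may differ.

```python
import math
import random

def sample_inst_scale(log_probs, rng):
    """Sample (inst, scale) from a nested [insts][scales] log-probability list."""
    n_scales = len(log_probs[0])
    weights = [math.exp(v) for row in log_probs for v in row]
    flat_index = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return divmod(flat_index, n_scales)  # row index, column index

rng = random.Random(0)  # seeded generator for reproducible draws
log_probs = [[math.log(0.1), math.log(0.2)],
             [math.log(0.3), math.log(0.4)]]
inst, scale = sample_inst_scale(log_probs, rng)
```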
- musisep.neuralsep.trainsep.sample_multi_max(pdf)[source]¶
Pick the mode batch-wise from a categorical distribution.
- Parameters
pdf (tensor of float) – Log probabilities of [batch, insts, scales]
- Returns
insts (tensor of int) – Indices of the selected instruments
scales (tensor of int) – Discrete selected frequencies
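Picking the mode instead of sampling is the same index arithmetic with an argmax. The sketch below works on one batch element stored as a nested [insts][scales] list. The helper name and the values are hypothetical.

```python
def mode_inst_scale(log_probs):
    """Return (inst, scale) of the largest entry in a nested list."""
    n_scales = len(log_probs[0])
    flat = [v for row in log_probs for v in row]
    flat_index = max(range(len(flat)), key=flat.__getitem__)
    return divmod(flat_index, n_scales)  # row index, column index

log_probs = [[-2.3, -1.6, -3.0],
             [-1.2, -0.9, -2.5]]
print(mode_inst_scale(log_probs))  # (1, 1)
```

Because the logarithm is monotonic, the argmax of the log probabilities is also the argmax of the probabilities.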
- musisep.neuralsep.trainsep.trans_params(amps, scales, sigmas, spreads, sigmas_an)[source]¶
Apply transformations on instrument parameters to ensure their validity.
- Parameters
amps (tensor of float) – Amplitudes of the tones
scales (tensor of float) – Natural fundamental frequencies of the tones
sigmas (tensor of float) – Widths of the Gaussians
spreads (tensor of float) – Inharmonicities of the tones
sigmas_an (float) – Number of standard deviations at which the analysis window is cut
- Returns
amps (tensor of float) – Amplitudes of the tones
scales (tensor of float) – Natural fundamental frequencies of the tones
sigmas (tensor of float) – Widths of the Gaussians
spreads (tensor of float) – Inharmonicities of the tones