I'm trying to implement a 1D CNN in Keras that will extract features from a spectrogram, as described in this paper (p. 170, section "1D Convolution for Audio Representation").
The goal is just for the spectrogram to "pass through" the CNN, which will "gradually aggregate" the 2D spectrogram into a 1D vector. (Later, this output is fused with features from a second model and finally fed to an LSTM, but I'm only building the CNN right now.) There's no classification being performed; I'm just extracting the spectrogram features.
My question is: where do I stop the process in order to achieve this?
So far, I've built a (smaller) version of the CNN they describe:
```
model = Sequential()

# Block 1
model.add(Conv1D(filters=32, kernel_size=5, input_shape=(timesteps, features), padding="same"))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=8))

# Block 2
model.add(Conv1D(filters=64, kernel_size=5))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=8))

# Get 1D vector
model.add(Flatten())

# What do I call next...?
```

But from here, I don't know how to proceed. Specifically: do I call `model.predict(spectrogram)` directly, or do I need to `.compile()` and `.fit()` it first?
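For concreteness, here is a self-contained, runnable version of the snippet above. The 128×64 input shape is an assumption for illustration (the paper's spectrogram dimensions may differ), and it just checks what shape comes out of the `Flatten()` layer:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, BatchNormalization, MaxPooling1D, Flatten

# Hypothetical spectrogram dimensions, chosen only so the code runs end to end.
timesteps, features = 128, 64

model = Sequential()
# Block 1: (128, 64) -> conv "same" -> (128, 32) -> pool 8 -> (16, 32)
model.add(Conv1D(filters=32, kernel_size=5, input_shape=(timesteps, features), padding="same"))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=8))
# Block 2: "valid" conv with kernel 5 -> (12, 64) -> pool 8 -> (1, 64)
model.add(Conv1D(filters=64, kernel_size=5))
model.add(BatchNormalization())
model.add(MaxPooling1D(pool_size=8))
# Collapse to a 1D feature vector per example
model.add(Flatten())

# A forward pass with the randomly initialized weights; this only
# demonstrates shapes, not trained features.
dummy = np.random.rand(1, timesteps, features).astype("float32")
vec = model.predict(dummy, verbose=0)
print(vec.shape)  # (1, 64)
```

With these assumed dimensions, each spectrogram is aggregated to a 64-element vector, which matches the "2D in, 1D out" behavior described above.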