The way it gets applied: each input channel is convolved with its corresponding kernel channel.
The per-channel results are then summed elementwise, giving a single (1 x h x w) feature map (or a spatially smaller one, some fraction of h x fraction of w, depending on stride and padding).
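Not from the paper, just a minimal NumPy sketch of that channel-summing step: one multi-channel filter convolved against a (c x h x w) input, with each kernel channel applied to its matching input channel and the results added into a single map (the function name and loop structure are my own illustration).

```python
import numpy as np

def conv2d_single_filter(x, kernel):
    # x: (c, h, w) input; kernel: (c, kh, kw), one kernel channel per input channel.
    c, h, w = x.shape
    kc, kh, kw = kernel.shape
    assert c == kc, "kernel must have one channel per input channel"
    out = np.zeros((h - kh + 1, w - kw + 1))
    for ch in range(c):                       # one 2-D convolution per channel...
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                # ...accumulated into the SAME output map (the plain summation)
                out[i, j] += np.sum(x[ch, i:i+kh, j:j+kw] * kernel[ch])
    return out                                # shape: (h-kh+1, w-kw+1), one channel

x = np.random.rand(3, 8, 8)   # e.g. an RGB-like (3 x 8 x 8) input
k = np.random.rand(3, 3, 3)   # one filter: 3 channels of 3x3 kernels
print(conv2d_single_filter(x, k).shape)  # (6, 6)
```

A full conv layer just repeats this once per output channel, stacking the resulting maps.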
This paper also talks about dilated convolutions, which are basically convolutions with holes: the kernel taps are spaced apart, skipping pixels in between.
What this gives a neural network is a larger receptive field, so even the shallower layers are looking at a bigger portion of the image, giving the kind of context we would normally buy with dimensionality reduction.
8/n
Which means there is no need for pooling or for taking larger strides. This was indeed a new way of thinking about convolutions, and I hope to try it out in the near future.
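To make the receptive-field point concrete, here is a hedged single-channel NumPy sketch of a dilated convolution (again my own illustration, not code from the paper): with dilation d, a 3x3 kernel samples pixels d apart, so it covers a 5x5 region at d=2 while still using only 9 weights.

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    # x: (h, w) single-channel input; kernel: (kh, kw).
    h, w = x.shape
    kh, kw = kernel.shape
    eff_kh = (kh - 1) * dilation + 1   # effective receptive field height
    eff_kw = (kw - 1) * dilation + 1   # effective receptive field width
    out = np.zeros((h - eff_kh + 1, w - eff_kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # take every `dilation`-th pixel inside the effective window:
            # the "holes" are the skipped pixels in between
            patch = x[i:i+eff_kh:dilation, j:j+eff_kw:dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

x = np.arange(49, dtype=float).reshape(7, 7)
k = np.ones((3, 3))
print(dilated_conv2d(x, k, dilation=1).shape)  # (5, 5): kernel covers 3x3
print(dilated_conv2d(x, k, dilation=2).shape)  # (3, 3): same 9 weights cover 5x5
```

Stacking layers with growing dilation (1, 2, 4, ...) makes the receptive field grow exponentially without any downsampling, which is exactly why pooling or big strides become unnecessary.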