ACM #Multimedia 2021: Skeleton-Contrastive 3D Action Representation Learning, w/ @fmthoker and @doughty_hazel: arxiv.org/abs/2108.03656 We learn invariances to multiple #skeleton representations and introduce several skeleton augmentations, using noise contrastive estimation. 1/n
Contribution I: leverage multiple input representations of #3D-#skeleton sequences. Our inter-skeleton contrast learns from a pair of representations in a cross-contrastive fashion. This enriches the sparse input space and focuses the model on the high-level semantics of the skeleton data. 2/n
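Below is a minimal sketch of a cross-contrastive objective between two skeleton representations. It assumes each sample has already been encoded into one embedding per representation (`z_a[i]` and `z_b[i]`, e.g. from a sequence encoder and a graph encoder). The sketch applies the InfoNCE form of noise contrastive estimation in both directions. Each representation's embedding acts as the query, and the other representation's embedding of the same sample acts as its positive. Two further choices are assumptions made for illustration: the other samples in the batch serve as negatives, and the temperature is 0.1. The paper's encoders, negative sampling and hyperparameters may differ, and `cross_contrast_loss` is a hypothetical name.

```python
import math
from typing import List

Vector = List[float]


def _cos(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors (zero vectors are treated as norm 1)."""
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def _nce(q: Vector, pos: Vector, negs: List[Vector], t: float) -> float:
    """InfoNCE term -log(exp(s+/t) / sum_k exp(s_k/t)), via log-sum-exp."""
    logits = [_cos(q, pos) / t] + [_cos(q, n) / t for n in negs]
    m = max(logits)
    return m + math.log(sum(math.exp(s - m) for s in logits)) - logits[0]


def cross_contrast_loss(z_a: List[Vector], z_b: List[Vector], t: float = 0.1) -> float:
    """Symmetric cross-representation InfoNCE averaged over a batch.

    z_a[i] and z_b[i] embed the same sample i under representations A and B.
    Other batch samples act as negatives (an illustrative assumption).
    """
    n = len(z_a)
    total = 0.0
    for i in range(n):
        negs_b = [z_b[j] for j in range(n) if j != i]
        negs_a = [z_a[j] for j in range(n) if j != i]
        total += _nce(z_a[i], z_b[i], negs_b, t)  # A queries against B keys
        total += _nce(z_b[i], z_a[i], negs_a, t)  # B queries against A keys
    return total / (2 * n)
```

The loss is low when the two representations of the same sample agree. It rises when a sample's embedding lies closer to another sample's embedding from the other representation.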
Contribution II: three skeleton-specific #augmentations for generating positive pairs, which encourage the model to focus on the spatio-temporal #dynamics of skeleton-based #action sequences while ignoring confounding factors such as viewpoint and exact joint positions. 3/n
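The thread does not name the three augmentations. As a hedged illustration, this sketch uses three hypothetical stand-ins matched to the factors the tweet mentions:

- a random rotation about the vertical axis, which changes the viewpoint;
- Gaussian joint jitter, which perturbs exact joint positions;
- a temporal crop-and-resize, which changes timing while keeping the motion.

Each call builds two independently augmented views of one sequence as a positive pair. Sequences are assumed to be lists of frames, where each frame is a list of (x, y, z) joints. All function names and parameter values (30 degrees, sigma 0.01, minimum crop ratio 0.5) are illustrative assumptions, not the paper's settings.

```python
import math
import random
from typing import List, Tuple

Joint = Tuple[float, float, float]
Frame = List[Joint]
Sequence = List[Frame]


def rotate_y(seq: Sequence, max_deg: float, rng: random.Random) -> Sequence:
    """Hypothetical viewpoint augmentation: one random rotation about the y axis."""
    a = math.radians(rng.uniform(-max_deg, max_deg))
    c, s = math.cos(a), math.sin(a)
    return [[(c * x + s * z, y, -s * x + c * z) for (x, y, z) in f] for f in seq]


def jitter(seq: Sequence, sigma: float, rng: random.Random) -> Sequence:
    """Hypothetical joint jitter: independent Gaussian noise on every coordinate."""
    return [[(x + rng.gauss(0, sigma), y + rng.gauss(0, sigma), z + rng.gauss(0, sigma))
             for (x, y, z) in f] for f in seq]


def temporal_crop_resize(seq: Sequence, min_ratio: float, rng: random.Random) -> Sequence:
    """Hypothetical temporal augmentation: crop a random window, then resample
    it back to the original length with nearest-neighbour indexing."""
    t = len(seq)
    length = max(1, int(round(t * rng.uniform(min_ratio, 1.0))))
    start = rng.randint(0, t - length)
    crop = seq[start:start + length]
    return [crop[int(i * length / t)] for i in range(t)]


def make_positive_pair(seq: Sequence, rng: random.Random) -> Tuple[Sequence, Sequence]:
    """Two independently augmented views of the same sequence form a positive pair."""
    def augment(s: Sequence) -> Sequence:
        s = temporal_crop_resize(s, 0.5, rng)
        s = rotate_y(s, 30.0, rng)
        return jitter(s, 0.01, rng)
    return augment(seq), augment(seq)
```

Because the rotation is about the vertical axis, it preserves each joint's height and its distance from that axis. It therefore changes only the viewpoint, not the pose.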
LiftPool adopts the philosophy of the classical #Lifting #Scheme from #signal #processing. LiftDownPool decomposes a feature map into several downsized sub-bands, each containing information at a different frequency. Because it is invertible, ... 2/n
by performing LiftDownPool backwards, a corresponding up-pooling layer, #LiftUpPool, can generate a refined upsampled feature map using the detail sub-bands, which is useful for #image-#to-#image #translation tasks. 3/n
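To illustrate the lifting idea behind LiftDownPool and LiftUpPool, here is a 1D sketch of one lifting step and its exact inverse. The step splits the signal into even and odd samples, predicts the odd samples, then updates the even ones. Fixed Haar-style predict and update rules stand in for the trainable functions a learned pooling layer would use. On a 2D feature map, the same step would be applied along both spatial axes to obtain several sub-bands. The names `lift_down_1d` and `lift_up_1d` are hypothetical.

```python
from typing import List, Tuple


def lift_down_1d(x: List[float]) -> Tuple[List[float], List[float]]:
    """One lifting step: split -> predict -> update.

    Returns (approximation, detail), each half the input length. The
    approximation is the low-frequency, downsampled band; the detail holds
    the high-frequency residual.
    """
    if len(x) % 2:
        raise ValueError("input length must be even")
    even, odd = x[0::2], x[1::2]
    # Predict: estimate each odd sample from its even neighbour; the residual is the detail.
    detail = [o - e for e, o in zip(even, odd)]
    # Update: correct the evens so the approximation equals the local pair mean.
    approx = [e + d / 2 for e, d in zip(even, detail)]
    return approx, detail


def lift_up_1d(approx: List[float], detail: List[float]) -> List[float]:
    """Inverse lifting: undo the update, undo the predict, then interleave."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    out: List[float] = []
    for e, o in zip(even, odd):
        out.extend((e, o))
    return out


a, d = lift_down_1d([4.0, 6.0, 10.0, 2.0])
print(a, d)               # [5.0, 6.0] [2.0, -8.0]
print(lift_up_1d(a, d))   # [4.0, 6.0, 10.0, 2.0]
```

The approximation band is the half-length pooled output. Keeping the detail band is what lets the up-pooling step restore the original signal exactly rather than interpolating it.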