Ravi Kiran S
Aug 20, 2021 • 17 tweets • 7 min read
📢 In our #ACMMM21 paper, we highlight issues with training and evaluation of crowd counting deep networks. 🧵👇
For far too long, crowd counting works in #CVPR, #AAAI, #ICCV, #NeurIPS have reported only MAE, but not standard deviation.
Looking at MAE together with the standard deviation of the errors around it, a very grim picture emerges. E.g., imagine a SOTA net with an MAE of 71.7 whose deviation is a whopping 376.4!
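To make the point concrete, here is a minimal sketch of reporting both numbers. It assumes "deviation" means the standard deviation of the per-image absolute errors; the counts below are made up for illustration and are not results from the paper.

```python
from statistics import mean, pstdev


def mae_and_std(true_counts, predicted_counts):
    """Return (MAE, standard deviation of the absolute errors)."""
    abs_errors = [abs(t - p) for t, p in zip(true_counts, predicted_counts)]
    return mean(abs_errors), pstdev(abs_errors)


# Hypothetical test set: four easy images and one very dense crowd.
true_counts = [12, 40, 95, 310, 2500]
predicted_counts = [15, 35, 100, 290, 1700]

mae, std = mae_and_std(true_counts, predicted_counts)
print(f"MAE = {mae:.1f}, std = {std:.1f}")
```

A single badly predicted dense image is enough to push the deviation well above the MAE, which is exactly what reporting MAE alone hides.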
How do we address this? There is no easy answer. The problem lies all over the processing pipeline! The standard pipeline for crowd counting looks like 👇
ISSUE-1: The standard sampling procedure for creating train-validation-test splits implicitly assumes a uniform distribution over the target range. But the distribution of crowd counts in benchmark datasets is discontinuous and heavy-tailed, so uniform sampling leaves the tail underrepresented.
The problem is that sampling is done at too fine a resolution, i.e. individual counts.

OUR FIX: Coarsen the resolution. Partition the count range into bins optimal for uniform sampling.
We employ a Bayesian stratification approach to obtain bins which can be uniformly sampled from, for minibatching.
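A rough sketch of bin-level sampling for minibatches follows. The bin edges here are hand-picked placeholders, not the output of the Bayesian stratification we use, and `assign_bins` and `sample_minibatch` are hypothetical helper names.

```python
import random
from bisect import bisect_right
from collections import defaultdict


def assign_bins(counts, bin_edges):
    """Map each count to the index of the bin it falls in.

    bin_edges holds the interior edges in ascending order, so
    len(bin_edges) + 1 bins are possible.
    """
    return [bisect_right(bin_edges, c) for c in counts]


def sample_minibatch(counts, bin_edges, batch_size, rng):
    """Pick a non-empty bin uniformly, then a sample uniformly within it."""
    by_bin = defaultdict(list)
    for idx, b in enumerate(assign_bins(counts, bin_edges)):
        by_bin[b].append(idx)
    bins = sorted(by_bin)
    batch = []
    for _ in range(batch_size):
        b = rng.choice(bins)                 # uniform over bins
        batch.append(rng.choice(by_bin[b]))  # uniform within the bin
    return batch


# Hypothetical heavy-tailed dataset: only 2 of 100 images are very dense.
counts = [10] * 90 + [100] * 8 + [3000] * 2
batch = sample_minibatch(counts, [50, 500], 3000, random.Random(0))
tail_share = sum(counts[i] >= 500 for i in batch) / len(batch)
print(f"share of dense images in minibatches: {tail_share:.2f}")
```

Sampling individual images uniformly would draw dense images about 2% of the time; sampling bins first brings them up to roughly one third here, at the cost of repeating the few tail images often.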
ISSUE-2: Minimizing a per-instance loss averaged over the minibatch poses the same issues as those during minibatch creation (imbalance, bias). OUR FIX: A novel bin-sensitive loss function. Instead of the loss depending only on the error, we also consider the count bin to which the data sample belongs.
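To illustrate the general idea of a loss that looks at the bin as well as the error, here is a simple stand-in: average the absolute error inside each bin first, then average across bins. This is not the loss from the paper, and `bin_balanced_l1` is a hypothetical name.

```python
from bisect import bisect_right
from collections import defaultdict


def bin_balanced_l1(true_counts, predicted_counts, bin_edges):
    """Mean over bins of the per-bin mean absolute error.

    Each non-empty bin contributes equally, however few samples it holds.
    """
    per_bin = defaultdict(list)
    for t, p in zip(true_counts, predicted_counts):
        per_bin[bisect_right(bin_edges, t)].append(abs(t - p))
    bin_means = [sum(errs) / len(errs) for errs in per_bin.values()]
    return sum(bin_means) / len(bin_means)


# Hypothetical minibatch: three sparse images and one dense image.
true_counts = [10, 20, 30, 1000]
predicted_counts = [12, 18, 33, 900]

plain_l1 = sum(abs(t - p) for t, p in zip(true_counts, predicted_counts)) / 4
balanced_l1 = bin_balanced_l1(true_counts, predicted_counts, [500])
print(plain_l1, balanced_l1)
```

With the plain average, the lone dense image carries a quarter of the weight; with the bin-balanced version its bin carries half, so errors on rare high counts are not drowned out by the many low-count samples.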
ISSUE-3: The imbalanced data distribution also causes MSE to be an ineffective representative of performance across the entire test set.
OUR FIX: Instead of using a single pair of numbers (mean, standard deviation) to characterize performance across the *entire* count range, we suggest reporting them for each bin. This provides a much broader idea of performance across the count range.
If a single summary statistic is still desired, mean and standard deviation of bin-level performance measures can be combined in a bin-aware manner.
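A small sketch of bin-level reporting is given here. The way the per-bin numbers are combined at the end (a plain average over bins) is only one possible choice, shown as an assumption rather than the exact combination rule from the paper; the function names and data are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean, pstdev


def per_bin_report(true_counts, predicted_counts, bin_edges):
    """Return {bin index: (MAE, std of absolute errors)} for non-empty bins."""
    errors = defaultdict(list)
    for t, p in zip(true_counts, predicted_counts):
        errors[bisect_right(bin_edges, t)].append(abs(t - p))
    return {b: (mean(e), pstdev(e)) for b, e in sorted(errors.items())}


def bin_averaged_summary(report):
    """Average the per-bin means and per-bin deviations over bins."""
    maes = [m for m, _ in report.values()]
    stds = [s for _, s in report.values()]
    return mean(maes), mean(stds)


# Hypothetical test set with one sparse bin and one dense bin.
true_counts = [10, 20, 30, 1000, 2000]
predicted_counts = [12, 18, 33, 900, 2300]

report = per_bin_report(true_counts, predicted_counts, [500])
summary = bin_averaged_summary(report)
for b, (m, s) in report.items():
    print(f"bin {b}: MAE = {m:.2f}, std = {s:.2f}")
print(f"bin-averaged: MAE = {summary[0]:.2f}, std = {summary[1]:.2f}")
```

The per-bin rows show at a glance where the model struggles, while the bin-averaged pair keeps the sparse and dense ranges on an equal footing instead of letting the most common counts dominate.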
Bin-level results demonstrate that our proposed modifications noticeably reduce the error standard deviation. The comparatively large deviations when binning is not used can clearly be seen.
However, the large magnitudes of deviations relative to MAE are still a big concern.
Studying and addressing the issues we have raised would enable statistically reliable crowd counting approaches in the future. Our project page deepcount.iiit.ac.in contains interactive visualizations for examining results on a per-dataset and per-model basis.
Code and pretrained models can be found at github.com/atmacvit/bincr…
Our crowd counting paper can be read here
... and this work is a happy collaboration with @ganramkr 😀
