Ravi Kiran S
Sep 8, 2021, 23 tweets
Presenting BoundaryNet - a resizing-free approach for high-precision weakly supervised document layout parsing. BoundaryNet will be an ORAL presentation (Oral Session 3) today at @icdar2021. Project page: ihdia.iiit.ac.in/BoundaryNet/. Details 👇
Precise boundary annotations can be crucial for downstream applications that rely on region-class semantics. Some document collections contain irregular and overlapping region instances. Fully automatic approaches require resizing and often produce suboptimal parsing results.
Our semi-automatic approach takes a region bounding box as input and predicts a boundary polygon as output. Importantly, BoundaryNet can handle variable-sized images without any need for resizing.
In the first stage, the variable-sized input image is processed by an attention-based fully convolutional network to obtain a region mask and a class label.
The first part of the backbone contains a series of residual blocks that produce progressively refined feature representations. The second part contains Skip Attentional Guidance (SAG) blocks. Each block produces an increasingly compressed feature representation of its input.
The output of the immediately preceding SAG block is fused with skip features originating from a lower-level residual block layer. This fusion is modulated via an attention mechanism. No spatial downsampling or upsampling is performed, which preserves crucial boundary information.
Features from the last residual block are fed to the ‘Region Classifier’ sub-network, which predicts the associated region class. The adaptive average pooling block within the sub-network ensures a fixed-dimensional output despite varying input dimensions.
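To make the fixed-size-output point concrete, here is a minimal NumPy sketch of adaptive average pooling. It is an illustration only, not BoundaryNet's code: the function name `adaptive_avg_pool2d`, the `(C, H, W)` layout and the 2x2 output grid are assumptions.

```python
import numpy as np

def adaptive_avg_pool2d(features, out_h, out_w):
    """Average-pool a (C, H, W) array onto a fixed (C, out_h, out_w) grid."""
    c, h, w = features.shape
    pooled = np.empty((c, out_h, out_w), dtype=float)
    for i in range(out_h):
        # Bin edges: floor for the start, ceil for the end, so bins cover the input.
        r0, r1 = (i * h) // out_h, -((-(i + 1) * h) // out_h)
        for j in range(out_w):
            c0, c1 = (j * w) // out_w, -((-(j + 1) * w) // out_w)
            pooled[:, i, j] = features[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return pooled

# Two regions with very different sizes map to the same output shape.
wide = adaptive_avg_pool2d(np.ones((4, 37, 211)), 2, 2)
tall = adaptive_avg_pool2d(np.ones((4, 90, 15)), 2, 2)
print(wide.shape, tall.shape)
```

Because the bin sizes adapt to the input, a classifier head placed after this step always sees the same number of features.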
The final set of features generated by the skip-connection-based attentional guidance is provided to the ‘Mask Decoder’ network, which outputs a binary region mask.
A fast marching distance map guides the region mask optimization to be more boundary-aware. The map is used along with a per-pixel class-weighted binary focal loss to improve robustness.
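A rough sketch of how a per-pixel weight map can be combined with a class-weighted binary focal loss. The thread does not spell out the exact combination, so everything here is an assumption: `weight_map` stands in for a precomputed boundary-aware map (e.g. derived from a fast marching distance map), and `gamma` and `pos_weight` are illustrative values.

```python
import numpy as np

def weighted_binary_focal_loss(prob, target, weight_map,
                               gamma=2.0, pos_weight=1.0, eps=1e-7):
    """Focal loss per pixel, scaled by a class weight and a spatial weight map."""
    prob = np.clip(prob, eps, 1.0 - eps)
    # Probability assigned to the correct class at each pixel.
    p_t = np.where(target == 1, prob, 1.0 - prob)
    # Class weighting: foreground pixels get pos_weight, background gets 1.
    class_w = np.where(target == 1, pos_weight, 1.0)
    per_pixel = -class_w * (1.0 - p_t) ** gamma * np.log(p_t)
    # Weighted mean, so up-weighted pixels contribute more.
    return float((weight_map * per_pixel).sum() / weight_map.sum())

target = np.array([[0, 1], [1, 0]])
pred = np.array([[0.1, 0.9], [0.9, 0.9]])          # bottom-right pixel is wrong
uniform = np.ones((2, 2))
boundary_heavy = np.array([[1.0, 1.0], [1.0, 5.0]])
print(weighted_binary_focal_loss(pred, target, uniform))
print(weighted_binary_focal_loss(pred, target, boundary_heavy))
```

With the heavier weight on the mispredicted pixel the loss goes up, which is the sense in which such a map makes the optimization pay more attention to selected pixels.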
Overall, our design choices within the first-stage mask network (MCNN) encourage the generation of good initial boundary estimates. Such estimates reduce the complexity of the subsequent boundary refinement task.
A series of morphological operations is applied to the region mask output by MCNN to obtain the initial estimate of the boundary polygon.
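The thread does not list which morphological operations are used, so the following is only one plausible pipeline, written in plain NumPy: an opening (erosion then dilation) to drop specks, followed by "mask minus its erosion" to keep boundary pixels. The helper names and the 3x3 cross structuring element are assumptions, and ordering the pixels into a polygon is left out.

```python
import numpy as np

def erode(mask):
    """Binary erosion with a 3x3 cross: keep a pixel only if it and its 4 neighbours are set."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    return m[1:-1, 1:-1] & m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]

def dilate(mask):
    """Binary dilation with a 3x3 cross: set a pixel if it or any 4-neighbour is set."""
    m = np.pad(mask.astype(bool), 1, constant_values=False)
    return m[1:-1, 1:-1] | m[:-2, 1:-1] | m[2:, 1:-1] | m[1:-1, :-2] | m[1:-1, 2:]

def boundary_pixels(mask):
    """Return unordered (row, col) coordinates on the boundary of the cleaned mask."""
    opened = dilate(erode(mask))       # opening removes isolated specks
    edge = opened & ~erode(opened)     # mask pixels that touch the background
    return np.argwhere(edge)

mask = np.zeros((9, 9), dtype=bool)
mask[1:6, 1:6] = True                  # a 5x5 region
mask[7, 7] = True                      # an isolated speck
print(boundary_pixels(mask))
```

Points sampled from such a boundary would then serve as the initial polygon estimate for the refinement stage.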
Our Anchor Graph Convolutional Network (Anchor GCN) processes the boundary estimate points as a graph and iteratively refines the point locations.
The nodes of the graph input to Anchor GCN are points sampled on the mask contour. The 2D position and the corresponding skip attention backbone feature are used as node features. Each node is connected to its 10-hop mask contour neighbours.
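A small sketch of how such a contour graph could be built. It assumes that "10-hop" means every point is linked to the points within 10 positions on either side along the closed contour; the function names and feature sizes are made up for illustration.

```python
import numpy as np

def contour_adjacency(num_points, hops=10):
    """Adjacency matrix of a closed contour where nodes within `hops` steps are linked."""
    idx = np.arange(num_points)
    diff = np.abs(idx[:, None] - idx[None, :])
    # Distance along the ring, going whichever way round is shorter.
    ring_dist = np.minimum(diff, num_points - diff)
    return ((ring_dist > 0) & (ring_dist <= hops)).astype(float)

def node_features(points_xy, backbone_feats):
    """Concatenate each point's (x, y) position with its sampled backbone feature."""
    return np.concatenate([points_xy, backbone_feats], axis=1)

adj = contour_adjacency(200, hops=10)
feats = node_features(np.random.rand(200, 2), np.random.rand(200, 16))
print(adj.sum(axis=1)[:3], feats.shape)
```

Under this reading each node has 20 neighbours (10 on each side), and the first and last sampled points are neighbours because the contour is closed.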
The boundary estimates are refined via two GCN and six Res-GCN blocks. The fully connected layer at the end predicts shifts in the x-y locations of the initial boundary points.
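The refinement step can be sketched as a forward pass in NumPy. The block counts (two GCN, six Res-GCN, one fully connected layer) come from the thread; the layer form (mean aggregation with self-loops, ReLU), the two-layer residual block and the `params` dictionary are assumptions for illustration, not the released model.

```python
import numpy as np

def normalize_adjacency(adj):
    """Add self-loops and row-normalize, so each node averages over its neighbourhood."""
    a_hat = adj + np.eye(adj.shape[0])
    return a_hat / a_hat.sum(axis=1, keepdims=True)

def gcn_layer(a_norm, h, w):
    """One graph convolution: aggregate neighbour features, project, apply ReLU."""
    return np.maximum(a_norm @ h @ w, 0.0)

def res_gcn_block(a_norm, h, w1, w2):
    """Two graph convolutions with a residual (skip) connection around them."""
    return h + gcn_layer(a_norm, gcn_layer(a_norm, h, w1), w2)

def refine(points_xy, feats, adj, params):
    """Predict per-point (dx, dy) shifts and return the moved boundary points."""
    a_norm = normalize_adjacency(adj)
    h = np.concatenate([points_xy, feats], axis=1)
    for w in params["gcn"]:            # two GCN blocks
        h = gcn_layer(a_norm, h, w)
    for w1, w2 in params["res"]:       # six Res-GCN blocks
        h = res_gcn_block(a_norm, h, w1, w2)
    shifts = h @ params["fc"]          # fully connected layer: (N, 2) shifts
    return points_xy + shifts

rng = np.random.default_rng(0)
n, f, d = 30, 6, 8
adj = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
params = {
    "gcn": [0.1 * rng.normal(size=(2 + f, d)), 0.1 * rng.normal(size=(d, d))],
    "res": [(0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d))) for _ in range(6)],
    "fc": 0.1 * rng.normal(size=(d, 2)),
}
print(refine(rng.random((n, 2)), rng.random((n, f)), adj, params).shape)
```

Predicting shifts rather than absolute coordinates means the network only has to correct the initial estimate, not rediscover the boundary.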
We use a boundary-centric Hausdorff distance to optimize Anchor GCN parameters. We also use Hausdorff distance as the evaluation metric.
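For reference, a minimal NumPy version of the standard symmetric Hausdorff distance between two point sets. This is the textbook max-min form as one would use for evaluation; the exact variant used as the training loss is not given in the thread, and the function name is illustrative.

```python
import numpy as np

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between point sets a (N, 2) and b (M, 2)."""
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    a_to_b = d.min(axis=1).max()   # farthest point of a from its nearest point in b
    b_to_a = d.min(axis=0).max()   # farthest point of b from its nearest point in a
    return float(max(a_to_b, b_to_a))

pred = np.array([[0.0, 0.0], [1.0, 0.0]])
gt = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 3.0]])
print(hausdorff_distance(pred, gt))
```

A single badly placed boundary point dominates the score, which is why this metric suits boundary-quality evaluation better than area overlap alone.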
We source the region data and annotations from Indiscapes - a large-scale #historical #Manuscripts dataset.
BoundaryNet outperforms strong semi-automatic baselines. In particular, it performs especially well on the most common region class - Character Line Segment.
A visual illustration of BoundaryNet’s higher-quality boundaries compared to baseline approaches.
BoundaryNet’s boundary predictions are more accurate than those of fully automatic methods. In particular, note that BoundaryNet predictions enclose text lines properly and completely.
BoundaryNet works for non-Indic documents too. Here are outputs on #Arabic, #Hebrew, #SoutheastAsia #historical #Manuscripts.
When deployed for annotation, timing analysis shows that BoundaryNet reduces overall annotation time, including correction time. BoundaryNet’s effective annotation time is lower than even that of fully automatic approaches, owing to the high quality of the boundaries it generates.
Code, pre-trained models and an interactive viewer for data and predictions are available at ihdia.iiit.ac.in/BoundaryNet/
BoundaryNet was possible due to the efforts of @Abhishe53242750 👏👏.

