Ravi Kiran S

@vikataravi

Sep 8, 2021 • 23 tweets • 10 min read • Read on X

Presenting BoundaryNet - a resizing-free approach for high-precision weakly supervised document layout parsing. BoundaryNet will be an ORAL presentation (Oral Session 3) today at @icdar2021 . Project page: ihdia.iiit.ac.in/BoundaryNet/ . Details 👇

Precise boundary annotations can be crucial for downstream applications which rely on region-class semantics. Some document collections contain irregular and overlapping region instances. Fully automatic approaches require resizing and often produce suboptimal parsing results.

Our semi-automatic approach takes region bounding box as input and predicts boundary polygon as output. Importantly, BoundaryNet can handle variable sized images without any need for resizing.

In the first stage, the variable-sized input image is processed by an attention-based fully convolutional network (MCNN) to obtain a region mask and a class label.

The first part of the backbone contains a series of residual blocks that produce progressively refined feature representations. The second part contains Skip Attentional Guidance (SAG) blocks, each of which produces an increasingly compressed feature representation of its input.

The output of the immediately preceding SAG block is fused with skip features from a lower-level residual block layer, and this fusion is modulated by an attention mechanism. No spatial downsampling or upsampling is performed, which preserves crucial boundary information.

Features from the last residual block are fed to the ‘Region Classifier’ sub-network, which predicts the associated region class. The adaptive average pooling block within the sub-network ensures a fixed-dimensional output despite varying input dimensions.
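To make the fixed-size property concrete, here is a minimal pure-Python sketch of adaptive average pooling on a single 2D feature map. The function name `adaptive_avg_pool2d` and the floor/ceil bin rule are assumptions for illustration, not code from the BoundaryNet release; the point is only that any input size maps to the same output size.

```python
def adaptive_avg_pool2d(feature_map, out_h, out_w):
    """Average-pool a 2D grid (list of rows) into out_h x out_w bins."""
    in_h, in_w = len(feature_map), len(feature_map[0])
    pooled = []
    for i in range(out_h):
        # Bin edges use floor for the start and ceil for the end,
        # so the bins always cover the whole input and are never empty.
        r0 = (i * in_h) // out_h
        r1 = -((-(i + 1) * in_h) // out_h)  # ceil((i + 1) * in_h / out_h)
        row = []
        for j in range(out_w):
            c0 = (j * in_w) // out_w
            c1 = -((-(j + 1) * in_w) // out_w)
            cells = [feature_map[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled


# Inputs of different sizes both end up as 2 x 2 outputs.
small = [[r * 4 + c for c in range(4)] for r in range(4)]
wide = [[1.0] * 7 for _ in range(3)]
pooled_small = adaptive_avg_pool2d(small, 2, 2)
pooled_wide = adaptive_avg_pool2d(wide, 2, 2)
```

Because the bin size adapts to the input, the classifier layers that follow can have a fixed input dimension even though the region images are never resized.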

The final set of features generated by the skip-connection-based attentional guidance is passed to the ‘Mask Decoder’ network, which outputs a binary region mask.

A fast marching distance map guides the region mask optimization to be more boundary-aware. The map is used along with a per-pixel, class-weighted binary focal loss to improve robustness.
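The thread does not spell out how the distance map enters the loss, so the following is only one plausible sketch: a binary focal loss in which each pixel carries its own weight. The `weights` argument is hypothetical (it could, for instance, combine a class weight with a value derived from the distance map), and `gamma=2.0` is the commonly used focal-loss default rather than a setting taken from this work.

```python
import math


def weighted_focal_loss(probs, targets, weights, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over flat lists, scaled per pixel by `weights`."""
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)   # clamp to avoid log(0)
        p_t = p if t == 1 else 1.0 - p    # probability given to the true class
        # (1 - p_t) ** gamma down-weights pixels that are already easy.
        total += -w * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# Hypothetical example: the second pixel is assumed to lie near the boundary,
# so it is given a larger weight.
loss = weighted_focal_loss(
    probs=[0.9, 0.6, 0.2],
    targets=[1, 1, 0],
    weights=[1.0, 3.0, 1.0],
)
```

With `gamma=0` and unit weights this reduces to ordinary binary cross-entropy, which is a convenient sanity check.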

Overall, our design choices within the MCNN encourage good initial boundary estimates, which reduce the complexity of the subsequent boundary refinement task.

A series of morphological operations is applied to the region mask output by the MCNN to obtain the initial estimate of the boundary polygon.
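The exact sequence of morphological operations is not listed here, so this sketch shows just one common choice as an assumption: a morphological gradient (the mask minus its erosion) that leaves only the boundary pixels. The helper names `erode` and `boundary_pixels` are illustrative, and the points come back in raster order; ordering them into a polygon is a separate step that is not shown.

```python
def erode(mask):
    """Binary erosion with a 3x3 cross-shaped structuring element."""
    h, w = len(mask), len(mask[0])
    cross = ((0, 0), (1, 0), (-1, 0), (0, 1), (0, -1))

    def on(r, c):
        # Pixels outside the image count as background.
        return 0 <= r < h and 0 <= c < w and mask[r][c] == 1

    return [
        [1 if all(on(r + dr, c + dc) for dr, dc in cross) else 0 for c in range(w)]
        for r in range(h)
    ]


def boundary_pixels(mask):
    """Return (row, col) points that are in the mask but not in its erosion."""
    eroded = erode(mask)
    return [
        (r, c)
        for r in range(len(mask))
        for c in range(len(mask[0]))
        if mask[r][c] == 1 and eroded[r][c] == 0
    ]


# A 3x3 filled square inside a 5x5 image: only the centre survives erosion,
# so the other eight pixels form the boundary.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
contour = boundary_pixels(mask)
```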

Our Anchor Graph Convolutional Network processes the boundary estimate points as a graph and iteratively refines the point locations.

The nodes of the graph that is input to Anchor GCN are points sampled on the mask contour. The 2D position and the corresponding skip attention backbone feature are used as node features. Each node is connected to its 10-hop mask contour neighbours.
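A small sketch of the graph connectivity, reading "10-hop neighbours" as all contour points within 10 steps on either side of a node along the closed contour. That reading, and the function name `contour_graph`, are assumptions; node features are left out to keep the example focused on the edges.

```python
def contour_graph(num_points, hops=10):
    """Adjacency list for a closed contour of `num_points` sampled points."""
    adjacency = {}
    for i in range(num_points):
        neighbours = set()
        for step in range(1, hops + 1):
            # Wrap around with modulo because the contour is a closed loop.
            neighbours.add((i + step) % num_points)
            neighbours.add((i - step) % num_points)
        neighbours.discard(i)  # only matters for very short contours
        adjacency[i] = sorted(neighbours)
    return adjacency


# With 100 sampled points, every node has 20 neighbours (10 on each side).
graph = contour_graph(100, hops=10)
```

Linking each point to a window of contour neighbours, rather than only the adjacent two, lets a single graph convolution see a longer stretch of the boundary.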

The boundary estimates are refined via two GCN and six Res-GCN blocks. The fully connected layer at the end predicts shifts in the x-y locations of the initial boundary points.

We use a boundary-centric Hausdorff distance to optimize Anchor GCN parameters. We also use Hausdorff Distance as the evaluation metric.
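For reference, the standard (symmetric) Hausdorff distance between two point sets can be computed as below. This is the textbook metric in brute-force form, not the differentiable training loss used to optimize Anchor GCN, whose exact formulation is not given in this thread.

```python
import math


def directed_hausdorff(a, b):
    """Largest distance from a point in `a` to its nearest point in `b`."""
    return max(min(math.dist(p, q) for q in b) for p in a)


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two lists of 2D points."""
    return max(directed_hausdorff(a, b), directed_hausdorff(b, a))


# The prediction covers both ground-truth points, but has one stray point
# 3 units away from the nearest ground-truth point.
ground_truth = [(0.0, 0.0), (1.0, 0.0)]
predicted = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
distance = hausdorff(ground_truth, predicted)
```

Because the metric is driven by the single worst-matched point, it penalizes local boundary errors that an area-overlap score such as IoU can hide.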

We source the region data and annotations from Indiscapes - a large-scale #historical #Manuscripts dataset.

BoundaryNet outperforms strong semi-automatic baselines. In particular, it has extremely good performance for the most common region class - Character Line Segment.

A visual illustration of BoundaryNet’s superior quality boundaries compared to other baseline approaches.

BoundaryNet’s boundary predictions are more accurate than fully automatic methods. In particular, note that BoundaryNet predictions enclose text lines in a proper and complete manner.

BoundaryNet works for non-Indic documents too. Here are outputs on #Arabic , #Hebrew , #SoutheastAsia #historical #Manuscripts .

When deployed for annotation, timing analysis reveals that BoundaryNet reduces overall annotation time, including correction time. BoundaryNet’s effective annotation time is even lower than that of fully automatic approaches, owing to the high quality of the boundaries it generates.

Code, pre-trained models and an interactive viewer for data and predictions are available at ihdia.iiit.ac.in/BoundaryNet/

BoundaryNet was made possible by the efforts of @Abhishe53242750 👏👏 .
