what ties together machine learning, rational homotopy theory, rough paths, lie groups, topological data analysis, and how to teach undergraduate vector calculus?
a thread…
1/
this is a long, paper-laden thread that sets up a new paper with darrick lee, a ph.d. student at @Penn in applied mathematics. buckle up!
arxiv.org/abs/2007.06633
2/
our story begins in the 1990s: t. lyons [oxford] develops a fantastic set of tools for working with *rough paths* in stochastic differential equations, launching a formidable body of theory and applications.
en.wikipedia.org/wiki/Rough_path
3/
one of the many tools developed is something called *path signature*. this characterizes paths in euclidean R^n by sending them to formal power series in tensors on R^n.
en.wikipedia.org/wiki/Rough_pat…
4/
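[editor's aside — not part of the thread] the low-order terms of the path signature are just iterated integrals, and for piecewise-linear data they have a closed form (chen's formula). a minimal numerical sketch, with a hypothetical function name:

```python
import numpy as np

def signature_level2(path):
    """Levels 1 and 2 of the path signature of a piecewise-linear path.

    path: (T, n) array of sample points in R^n.
    Level 1 is the total displacement, S^i = ∫ dx_i.
    Level 2 collects the iterated integrals S^{ij} = ∫∫_{s<t} dx_i(s) dx_j(t).
    """
    dx = np.diff(path, axis=0)        # increments of each linear segment
    level1 = dx.sum(axis=0)           # total displacement
    n = path.shape[1]
    level2 = np.zeros((n, n))
    running = np.zeros(n)             # displacement accumulated so far
    for step in dx:
        # Chen's formula per segment: (past displacement) ⊗ (step)
        # plus the within-segment term (step ⊗ step) / 2.
        level2 += np.outer(running, step) + np.outer(step, step) / 2
        running += step
    return level1, level2
```

resampling the same path more finely leaves the output unchanged, which is the reparametrization invariance the thread flags; the antisymmetric part of level 2 is the "signed area" that appears later in the thread.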
this signature has some remarkable properties: it is both *universal* and *characteristic*. it is also *reparametrization invariant*, meaning that there’s something topological going on… hmmm…
#foreshadowing
5/
the path signature has recently found some exciting applications in machine learning: see the 2016 paper of i. chevyrev and a. kormilitzin.
arxiv.org/abs/1603.03788
/6
more recently, chevyrev-nanda-oberhauser show how to go from persistent homology barcodes to path signature. the resulting feature map has some very impressive performance.
arxiv.org/abs/1806.00381
/7
plot twist! path signature has older roots! it stems from amazing work of k-t chen in the 1950s-60s on iterated integrals – a de rham approach to the cohomology of path and loop spaces. this had quite an impact in, e.g., rational homotopy theory.
arxiv.org/abs/math/01092…
/8
although lyons et al. were aware of (& cited) chen’s work, the full import of the algebraic-topological content of the path signature has not been brought over to applications. this has just recently begun: see the recent paper by giusti & lee.
arxiv.org/abs/1811.03558
/9
the giusti-lee paper is focused on inferring causality or lead-lag relations from data using chen integrals. i first learned this idea from y. baryshnikov who applied chen’s iterated integrals to time series data in a paper with e. schlafly.
ieeexplore.ieee.org/document/77984…
/10
when baryshnikov 1st explained this to me, i got very excited: i was looking for an application of differential forms & stokes’ theorem for the calculus course i was designing. we collaborated on a short sweet paper on how to teach this to undergrads.
tandfonline.com/doi/full/10.10…
/11
#sidequest
here’s the idea. given two time series that are *cyclic* (periodic up to time reparametrization), is one “ahead” of the other? cf. leading and lagging economic indicators in the business cycle. you can tell by looking at the “signed area” in the plane.
/12
this "signed area" is precisely chen’s 1st integral – the first order terms in path signature. what’s cool is that this works even if your signals are not perfectly cyclic or smooth. this is a great motivation for using differential forms.
/13
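[editor's aside — not part of the thread] the signed area above discretizes A = (1/2) ∮ (x dy − y dx) over the closed planar curve traced by the two signals. a minimal sketch, with a hypothetical function name and the sign convention that positive area means the first signal leads:

```python
import numpy as np

def signed_area(x, y):
    """Signed area enclosed by the planar curve t -> (x(t), y(t)).

    Discretizes A = (1/2) ∮ (x dy - y dx) via the midpoint rule;
    with this convention, A > 0 means x "leads" y.
    """
    dx = np.diff(x)
    dy = np.diff(y)
    xm = (x[:-1] + x[1:]) / 2     # midpoint values of the 1-form's coefficients
    ym = (y[:-1] + y[1:]) / 2
    return 0.5 * np.sum(xm * dy - ym * dx)

# two cyclic signals: the first peaks a quarter-period before the second
t = np.linspace(0, 2 * np.pi, 400)
area = signed_area(np.cos(t), np.sin(t))   # ≈ +pi: cos leads sin
```

note that nothing here requires the signals to be smooth or exactly periodic, which is the point the thread makes.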
you can do this with N cyclic signals tracing out a curve in R^N. now it’s stokes’ theorem for differential forms that tells you about lead-lag relationships. all these 1st order signatures can be packed into a skew-symmetric “lead matrix” that has topological content.
/14
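[editor's aside — not part of the thread] the skew-symmetric "lead matrix" packs every pairwise signed area into one array. a minimal sketch under the same convention as above (function name hypothetical):

```python
import numpy as np

def lead_matrix(signals):
    """Skew-symmetric lead matrix of pairwise signed areas.

    signals: (T, N) array, one cyclic signal per column.
    Entry L[i, j] = (1/2) ∮ (x_i dx_j - x_j dx_i); L[i, j] > 0
    is read as signal i leading signal j.
    """
    d = np.diff(signals, axis=0)              # increments dx_j
    mid = (signals[:-1] + signals[1:]) / 2    # midpoint values x_i
    M = mid.T @ d                             # M[i, j] ≈ ∮ x_i dx_j
    return 0.5 * (M - M.T)                    # skew-symmetrize
```

each entry is exactly the two-signal signed area from before, so the matrix is skew-symmetric by construction.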
& that’s how a 1950s-era approach to rational homotopy theory winds up impacting machine learning and lands in a calculus class.
what next?
/15
almost all signatures in machine learning are for paths in euclidean space. chen worked out the theory for paths on manifolds, but it’s hard to compute & work with. darrick lee & i have a new paper out on path signature for paths on lie groups.
arxiv.org/abs/2007.06633
/16
on the one hand: no surprises. everything works: it’s a universal, characteristic feature map.
but: in the euclidean case, there’s a lot of conflation of R^n (the space) with R^n (the tangent space); using lie groups & lie algebras makes everything clearer.
/17
in addition – this really matters – if your data naturally reside in a lie group, you should take advantage of that structure. ML is full of vectorize-it-all-and-let-g-d-sort-it-out. mathematicians know to preserve structure when possible. h/t to the category theorists…
/18
in the end, the setting of lie groups is very general, but close enough to the clean euclidean case to make computations nice. the paper includes numerical experiments showing efficacy and a link to darrick’s @JuliaLanguage package for computing signatures on lie groups.
/19
if you got through this, congrats! i hope you’ve learned enough to motivate reading the many papers cited. there’s so much great stuff happening at the intersection of topology, data, and machine learning.
(& education, but that’s another story…)
/20=end
Keep Current with ProfGhristMath