Yesterday @ #MadeByGoogle we also announced the latest on the Super Res Zoom feature on #Pixel7Pro
It's a project my team's been involved in since '18. This year, our teams've made it so much more powerful. You can zoom up to 30x. Let me show you in a 🧵
And finally we're at 30x hybrid optical/digital zoom, seeing the top of One World Trade Center miles away.
I hope this gives you an idea what the zoom experience looks like on #Pixel7Pro. You can see the whole sequence in an album: photos.app.goo.gl/SoV1s7EU6c7APb…
PC: Alex Schiffhauer
Integral geometry is a beautiful topic bridging geometry, probability & statistics
Say you have a curve with any shape, possibly even self-intersecting. How can you measure its length?
This has many applications - curve could be a strand of DNA or a twisted length of wire
1/n
A curve is a collection of tiny segments. Measure each segment & sum. You can go further: make the segments so small they are essentially points, count the red points
A practical way to do this: drop many lines, or a dense grid, intersecting the shape & count intersections
2/n
The curve's length is the sum (integral) of the intersection counts n(ψ,p) over all lines, each written in polar coords by its angle ψ and offset p, with the curve (counting multiplicities). This is the beautiful Crofton formula:
Length = 1/2 ∫∫ n(ψ,p) dψ dp
The 1/2 is there because oriented lines are a double cover of un-oriented lines
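A rough numerical sketch of the formula above, not taken from the thread: it assumes the curve is given as a polyline, that lines are un-oriented with ψ in [0, π) and signed offset p in [-R, R], and that random sampling of lines is good enough. The function name `crofton_length` and its parameters are made up for illustration.

```python
import numpy as np

def crofton_length(points, num_lines=40_000, seed=0):
    """Monte Carlo estimate of a polyline's length via the Crofton formula.

    Assumption: lines are x*cos(psi) + y*sin(psi) = p with psi in [0, pi)
    and p in [-R, R], where R is the largest vertex distance from the origin.
    """
    rng = np.random.default_rng(seed)
    pts = np.asarray(points, dtype=float)
    R = np.linalg.norm(pts, axis=1).max()

    psi = rng.uniform(0.0, np.pi, num_lines)   # direction of the line normal
    p = rng.uniform(-R, R, num_lines)          # signed offset from the origin

    # Signed offset of every vertex from every line: shape (vertices, lines).
    side = pts[:, 0, None] * np.cos(psi) + pts[:, 1, None] * np.sin(psi) - p

    # A segment crosses a line when its two endpoints lie on opposite sides.
    crossings = (side[:-1] * side[1:] < 0).sum(axis=0)

    # Length = 1/2 * integral of n(psi, p); the integral is estimated as
    # (measure of the sampled parameter domain) * (mean crossing count).
    domain_measure = np.pi * 2.0 * R
    return 0.5 * domain_measure * crossings.mean()

# Usage: a closed unit square (first vertex repeated), true perimeter 4.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
estimate = crofton_length(square)
```

The estimate is noisy and tightens as `num_lines` grows. A dense regular grid of lines, as suggested earlier in the thread, would work too.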
Smoothing splines fit a function to data as the sol'n of a regularized least-squares optimization problem.
But it’s also possible to do it in one shot with an unusually shaped kernel (see figure)
Is it possible to solve other optimization problems this way? Surprisingly yes
1/n
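To make the "one shot" idea concrete, here is a minimal sketch using a discrete stand-in for a smoothing spline. This is an assumption, not the thread's exact setup: it minimizes ‖y − x‖² + λ‖Dx‖² with D the second-difference operator. The solution is linear in the data, x̂ = H y, and each row of H acts as the equivalent kernel. The names `smoother_matrix`, `lam` and `kernel` are illustrative.

```python
import numpy as np

def smoother_matrix(n, lam):
    """Return H = (I + lam * D^T D)^{-1} for the second-difference penalty."""
    D = np.diff(np.eye(n), n=2, axis=0)        # (n-2, n) rows like [1, -2, 1]
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)

n, lam = 101, 50.0
H = smoother_matrix(n, lam)

# The middle row is the equivalent kernel away from the boundaries.
kernel = H[n // 2]

# One-shot smoothing: apply the kernel matrix once to noisy samples.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, n)
y = np.sin(2 * np.pi * t) + 0.2 * rng.standard_normal(n)
x_hat = H @ y
```

Plotting `kernel` typically shows a central bump with small negative side lobes, which is the sort of unusual shape the thread alludes to.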
This is just one instance of how one can “kernelize” an optimization problem. That is, approximate the solution of an optimization problem in just one step by constructing and applying a kernel once to the input
Given some conditions you can do it much more generally
2/n
If you specialize the regularization to be of the form
φ(x) = ρ( ||Ax|| ) where A = R(|i-j|) is stationary & isotropic, this gives tidy conversions between φ(x) and the kernel K(x).
Mean-shift iteratively moves points towards regions of higher density. It does so by placing a kernel at each data point, calculating the mean of the data points within that window, and shifting points towards this mean until convergence. Look familiar?
1/n (Animation @gabrielpeyre)
The first term on the right hand side of the ODE has the form of a pseudo-linear denoiser f(x) = W(x) x: a weighted average of the points, where the weights depend on the data. The overall mean-shift process is a lot like a residual flow:
d/dt x(t) = f(x(t)) - x(t)
2/n
Residual on the RHS is an approximation of the “score” (the gradient of the log of the empirical density of x), making it a gradient flow
d/dt x(t) ≈ ∇ log p̂(x(t))
So mean-shift a) estimates the empirical density & b) flows points to nearby peaks. Similarly to flow-matching & InDI
3/n
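A minimal sketch of the residual-flow view above, under assumptions not stated in the thread: a Gaussian kernel with a hand-picked `bandwidth`, a fixed data set defining the density, and forward-Euler steps of d/dt x = f(x) − x. With `step=1.0` each update is the classic mean-shift move x ← f(x). The function name and parameters are illustrative.

```python
import numpy as np

def mean_shift_flow(data, bandwidth=0.5, step=1.0, num_steps=50):
    """Euler-integrate dx/dt = f(x) - x, with f a Gaussian-weighted mean."""
    data = np.asarray(data, dtype=float)
    x = data.copy()                       # particles start at the data points
    for _ in range(num_steps):
        # Squared distances between current particles and the fixed data.
        d2 = ((x[:, None, :] - data[None, :, :]) ** 2).sum(axis=-1)
        w = np.exp(-0.5 * d2 / bandwidth**2)
        w /= w.sum(axis=1, keepdims=True)  # each row of W(x) sums to 1
        f = w @ data                       # pseudo-linear denoiser output
        # For a Gaussian kernel density estimate, f - x equals
        # bandwidth**2 times the gradient of log p_hat at x.
        x = x + step * (f - x)
    return x

# Usage: two well-separated blobs; particles drift to the nearby peaks.
rng = np.random.default_rng(0)
blobs = np.vstack([rng.normal(0.0, 0.3, size=(30, 2)),
                   rng.normal(5.0, 0.3, size=(30, 2))])
modes = mean_shift_flow(blobs)
```

Smaller `step` values follow the continuous flow more closely, at the cost of more iterations.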
Random matrices are very important in modern statistics and machine learning, not to mention physics
A model about which much less is known is uniformly sampled matrices from the set of doubly stochastic matrices: Uniformly Distributed Stochastic Matrices
A thread -
1/n
First, what are doubly stochastic matrices?
Non-negative matrices whose row & column sums=1.
The set of doubly stochastic matrices is also known as the Birkhoff polytope: an (n−1)² dimensional convex polytope in ℝⁿˣⁿ with extreme points being permutation matrices.
2/n
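A small sketch of the definition above. It builds a point of the Birkhoff polytope as a convex combination of random permutation matrices and checks the row and column sums. Note the loud caveat: this does NOT sample uniformly from the polytope. It only illustrates that mixtures of permutations are doubly stochastic. The function names and `num_perms` are made up for illustration.

```python
import numpy as np

def random_birkhoff_point(n, num_perms=20, seed=0):
    """Convex combination of random permutation matrices (not uniform!)."""
    rng = np.random.default_rng(seed)
    weights = rng.dirichlet(np.ones(num_perms))  # non-negative, sum to 1
    M = np.zeros((n, n))
    for w in weights:
        P = np.eye(n)[rng.permutation(n)]        # a permutation matrix
        M += w * P
    return M

def is_doubly_stochastic(M, tol=1e-9):
    """Non-negative entries, every row sum and column sum equal to 1."""
    M = np.asarray(M, dtype=float)
    return bool(
        (M >= -tol).all()
        and np.allclose(M.sum(axis=0), 1.0, atol=tol)
        and np.allclose(M.sum(axis=1), 1.0, atol=tol)
    )

M = random_birkhoff_point(5)
ok = is_doubly_stochastic(M)
```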
The extreme points of the Birkhoff polytope (permutations) are sparse matrices, but a typical matrix sampled from inside the polytope is, by contrast, very dense
Since rows and columns are exchangeable, the entries of a sampled matrix have the same marginal distribution.
can teach a lot about some complex ideas in modern machine learning including overfitting & double-descent.
Let's assume A is n-by-p. So we have n data points and p parameters
1/10
If n ≥ p (“under-fitting” or “over-determined” case) and A has full column rank, the solution is
x̃ = (AᵀA)⁻¹ Aᵀ y
But if n < p (“over-fitting” or “under-determined” case), there are infinitely many solutions that give *zero* training error. We pick the minimum-norm solution (smallest ‖x‖²):
x̃ = Aᵀ(AAᵀ)⁻¹ y
2/10
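The two closed forms above can be sketched in a few lines. This assumes A has full rank (full column rank when n ≥ p, full row rank when n < p), so the matrices being inverted are non-singular. The function name `least_squares_solution` is illustrative.

```python
import numpy as np

def least_squares_solution(A, y):
    """Closed-form least-squares fit, assuming A (n-by-p) has full rank."""
    n, p = A.shape
    if n >= p:
        # Over-determined: x = (A^T A)^{-1} A^T y
        return np.linalg.solve(A.T @ A, A.T @ y)
    # Under-determined: minimum-norm x = A^T (A A^T)^{-1} y
    return A.T @ np.linalg.solve(A @ A.T, y)

rng = np.random.default_rng(0)

# n > p: a residual generally remains.
A_tall, y_tall = rng.standard_normal((20, 5)), rng.standard_normal(20)
x_tall = least_squares_solution(A_tall, y_tall)

# n < p: the training data are fit exactly.
A_wide, y_wide = rng.standard_normal((5, 20)), rng.standard_normal(5)
x_wide = least_squares_solution(A_wide, y_wide)
train_error = np.linalg.norm(A_wide @ x_wide - y_wide)
```

Using `np.linalg.solve` rather than forming an explicit inverse is a numerical choice here, not something the thread prescribes.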
In either case, the solution can be compactly written in terms of the SVD of A:
A = USVᵀ
where U & V are orthogonal matrices of size n×n & p×p, and S is n×p with k nonzero diagonal elements (indexed i = 1 to k)
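One standard way to write the SVD form, stated here as a textbook fact rather than something quoted from the thread, is x̃ = Σᵢ (uᵢᵀy / sᵢ) vᵢ over the k nonzero singular values. The symbol sᵢ, the function name `svd_solution`, and the `rcond` cutoff are assumptions for this sketch, which also uses the reduced SVD instead of the full n×n and p×p factors.

```python
import numpy as np

def svd_solution(A, y, rcond=1e-12):
    """Least-squares / minimum-norm solution from the reduced SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rcond * s.max()            # treat tiny singular values as zero
    # Sum over kept i of (u_i^T y / s_i) * v_i
    coeffs = (U[:, keep].T @ y) / s[keep]
    return Vt[keep].T @ coeffs

rng = np.random.default_rng(1)
A_over, y_over = rng.standard_normal((12, 4)), rng.standard_normal(12)
A_under, y_under = rng.standard_normal((4, 12)), rng.standard_normal(4)
x_over = svd_solution(A_over, y_over)     # n >= p case
x_under = svd_solution(A_under, y_under)  # n < p case
```

The same function covers both regimes, which is the appeal of the SVD view.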