๐˜›๐˜ฉ๐˜ช๐˜ด ๐˜ค๐˜ฐ๐˜ฏ๐˜ด๐˜ฐ๐˜ญ๐˜ช๐˜ฅ๐˜ข๐˜ต๐˜ช๐˜ฐ๐˜ฏ ๐˜ช๐˜ด ๐˜ญ๐˜ข๐˜ด๐˜ต๐˜ช๐˜ฏ๐˜จ ๐˜ข ๐˜ญ๐˜ฐ๐˜ฏ๐˜จ ๐˜ต๐˜ช๐˜ฎ๐˜ฆ ๐˜ฃ๐˜ถ๐˜ต ๐˜ฎ๐˜บ ๐˜ต๐˜ฉ๐˜ณ๐˜ฆ๐˜ข๐˜ฅ๐˜ด ๐˜ธ๐˜ช๐˜ญ๐˜ญ ๐˜ญ๐˜ข๐˜ด๐˜ต ๐˜ฎ๐˜ถ๐˜ค๐˜ฉ ๐˜ญ๐˜ฐ๐˜ฏ๐˜จ๐˜ฆ๐˜ณ.

THE BEST IS IMMEDIATELY IN FRONT OF US

$VXV 4/27 ENGINEERING CALL DEEP DIVE AND ANALYSIS. LET'S BEGIN.
This call dove deep into the actual technical setup of the product, and was led by 'Bietto' and Kasian. I cannot find information on Bietto, but he was quite intelligent. I will perhaps make a thread on him when I can find more info. Let's begin.
'The current direction is we are getting a new pipeline running. So, this pipeline will be an end-to-end pipeline that is going to crawl data, process it, train a language model, create a correlation matrix and place it into an API that serves our clients. So, the plan for this is for this to become the codebase that we use to later on branch out to other use cases'

Pipelines are an important aspect of most tech companies. They allow for automated workflows, testing, etc. without manual intervention.
I'm a software engineer at a Fortune 100 company. We use enterprise Jenkins. It looks like this:
You can see here that when a deployment occurs (Jenkins), the code will go through integration tests, then unit tests, then security tests, etc.

Each of these steps will have its own logs and potentially outputs, which can be used as inputs for the next stage.
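For the non-engineers, here's a tiny, generic sketch of that idea in Python. This is not my company's Jenkins setup and not $VXV's code; the stage commands are just common examples. Each stage runs, writes its own log, and the pipeline stops if a stage fails:

```python
# Illustrative only: a generic staged pipeline, not any specific company's setup.
import subprocess

# Each stage is just a shell command; real CI tools (Jenkins, GitLab CI, etc.)
# wrap this same idea in config files and distributed build agents.
STAGES = [
    ("unit-tests", "pytest tests/unit"),
    ("integration-tests", "pytest tests/integration"),
    ("security-scan", "bandit -r src/"),
]

def run_pipeline() -> bool:
    for name, cmd in STAGES:
        # Capture the stage's output so it can be logged and inspected later.
        result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
        with open(f"{name}.log", "w") as log:
            log.write(result.stdout + result.stderr)
        if result.returncode != 0:
            print(f"{name} failed - stopping the pipeline")
            return False
        print(f"{name} passed")
    return True

if __name__ == "__main__":
    run_pipeline()
```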
What Bietto is referring to here is not a deployment pipeline, but 'an end-to-end pipeline that is going to crawl data, process it, train a language model, create a correlation matrix and place it into an API that serves our clients'
Think about that. Normally, each of these steps is handled by a separate team. One team crawls data and forms a dataset. Another team clusters the data and develops models for that particular data. Another team analyzes these models. All requiring manual intervention.
If the team can pull off this pipeline integration:

'the plan for this is for this to become the codebase that we use to later on branch out to other use cases'

they will have a pipeline that can crawl any of their massive raw data sources and, without manual intervention beyond a few specific inputs, produce their final product: an API-accessible correlation dataset ready to be consumed by clients.
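To make that concrete, here's a minimal sketch of what such an end-to-end pipeline could look like. Every function name here is hypothetical and every stage is stubbed; this is the shape of the idea, not their codebase:

```python
# Hypothetical sketch of an end-to-end pipeline: crawl -> process -> train ->
# correlation matrix -> serve. None of these names come from the call; they
# just mirror the stages Bietto described.
import pickle

def crawl(source: str) -> list[str]:
    # Pull raw documents from one data source (stubbed here).
    return [f"document from {source}"]

def process(raw_docs: list[str]) -> list[str]:
    # Clean / normalize the crawled text into a training-ready dataset.
    return [doc.lower().strip() for doc in raw_docs]

def train_language_model(dataset: list[str]) -> dict:
    # Train (or fine-tune) a language model on the dataset; a word list stands in.
    return {"vocab": sorted({w for doc in dataset for w in doc.split()})}

def build_correlation_matrix(model: dict) -> dict:
    # Score pairwise relationships between entities using the trained model.
    vocab = model["vocab"]
    return {(a, b): 0.0 for a in vocab for b in vocab if a != b}

def publish_to_api(matrix: dict, path: str = "matrix.pkl") -> str:
    # Persist the matrix so an API server can load and serve it to clients.
    with open(path, "wb") as f:
        pickle.dump(matrix, f)
    return path

if __name__ == "__main__":
    # One command, no hand-offs between teams - that is the whole point.
    publish_to_api(build_correlation_matrix(
        train_language_model(process(crawl("some-source")))))
```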

That means that if a company sees a trend and wants to purchase a dataset... it can be ready in perhaps a few hours. F***!
Bietto mentions that they may need to do things manually for now, perhaps for some intermediary steps - THIS IS HOW I THOUGHT THE END PRODUCT WOULD WORK. The fact that they can create this product manually is SICK.

The fact they can automate the complete creation of the product is absurd.
Why?

Because manual intervention creates impediments, is slow, and leads to errors.

๐™จ๐™ฅ๐™š๐™š๐™™ ๐™ž๐™จ ๐™ž๐™ข๐™ฅ๐™ค๐™ง๐™ฉ๐™–๐™ฃ๐™ฉ

This pipeline creation will allow them to check many more correlations much much faster.

Guess what?
The speed at which the correlations are made is INCREDIBLY important.

If the team can generate 1,000 datasets instead of 10 in the same time, we are 100 times more likely to generate a correlation matrix dataset that provides incredibly valuable alpha, and 100x more likely to make a biosciences discovery. Do not underestimate the path the team is taking with this.

This team is small compared to my tech company. This is the type of innovation that we harp on again and again: continuous integration and continuous delivery.

One reason I was so impressed by Bietto.
They have a demo day tomorrow, 5/4:

Several cohorts of the engineering team seem to be working on different pieces of this massive pipeline, which will eventually come together.
- The API must output a response for a dummy correlation matrix that will load into the API

This should be fairly straightforward, and the team agrees.
- To have a crawling pipeline that can produce data for at least one source on demand

If Kasian needs an updated dataset, you, or the data engineering team, should be able to produce the file right away.

This is huge for data provenance and the service. One of the main 'pros' of $vxv is the data provenance aspect that I commonly talk about. With this crawling pipeline in place, it can run on a cron job and constantly update the dataset, then store it on an immutable ledger.
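Roughly, a sketch of how that could work, assuming a crawl job already exists. The 'ledger' here is just an append-only file of content hashes, a stand-in for whatever immutable store the team actually uses:

```python
# Hypothetical refresh job, e.g. scheduled by cron:
#   0 * * * * /usr/bin/python3 refresh_dataset.py
# The "ledger" below is only an append-only file of content hashes - a stand-in
# for whatever immutable store / chain is actually used for provenance.
import hashlib
from datetime import datetime, timezone

def crawl_source() -> bytes:
    # Placeholder for the real crawling pipeline's output.
    return b"freshly crawled dataset"

def refresh_dataset():
    data = crawl_source()
    with open("dataset.bin", "wb") as f:
        f.write(data)
    # Record a fingerprint of this version so provenance can be verified later.
    digest = hashlib.sha256(data).hexdigest()
    stamp = datetime.now(timezone.utc).isoformat()
    with open("ledger.log", "a") as ledger:
        ledger.write(f"{stamp} {digest}\n")

if __name__ == "__main__":
    refresh_dataset()
```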

bullish
- 'On the machine learning end, the goal for the demo day would be to produce an initial correlation matrix generation module. So, it needs to be complete, it needs to be able to use a trained language model to create a pickle file of the correlation matrix. ...'
'A second goal for machine learning is to be able to train a new language model on a given dataset on demand.'
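As a rough illustration of the first goal (my guess at the mechanics, not their code): take entity embeddings from a trained language model, score every pair, and pickle the matrix so the API layer can serve it. The entities and embeddings below are made up:

```python
# Hypothetical sketch: turn a trained model's entity embeddings into a pickled
# correlation matrix. The embeddings here are random stand-ins; in reality they
# would come from the trained language model.
import pickle
import numpy as np

entities = ["protein_A", "enzyme_B", "compound_C"]
embeddings = np.random.rand(len(entities), 64)  # stand-in for model vectors

# Cosine similarity between every pair of entity vectors.
norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
unit = embeddings / norms
matrix = unit @ unit.T

with open("correlation_matrix.pkl", "wb") as f:
    # The pickle file is what the API layer would load and serve on request.
    pickle.dump({"entities": entities, "matrix": matrix}, f)
```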

Back to pipelines in the next tweet. These goals are important.
Not only is the dataset updated right away by the crawling pipeline; that output is then used as the input for the ML pipeline, which retrains a language model on the new dataset and builds a correlation matrix.

ALL AUTOMATED. No wait times. Only limited by pipeline speed, and these pipelines generally run on 'agents', which are just EC2 instances (or ECS, Google Compute Engine, or other cloud compute services), which means you can generally speed them up by purchasing more compute power.

Making this entire process near-instantaneous is extremely complex, and that complexity is a HUGE blocker to ML being used in finance. You need incredibly fresh, precise data if you're going to use it to trade, as the market changes quickly, sometimes in minutes, as you have seen with crypto. This automation is the only viable option.
Next, they check in with the teams to see if this (a more basic version) will be ready to demo tomorrow (5/4):

- Universal crawler will be done ✅
- Model training will be packaged ✅ (basically, taking a chunk of code and making it into a small reusable 'package'; see the sketch after this list)
- Bietto is working on the API server himself (should be done) ✅
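On that 'packaged' point, a toy example of what packaging training code means: the training logic lives in one importable module that the pipeline, or anyone else, can call. The file name and function are hypothetical:

```python
# train_model.py - a toy, hypothetical example of "packaging" training code:
# the logic lives in one importable module instead of a notebook or one-off script.

def train(dataset_path: str) -> dict:
    """Train a (stubbed) model on the given dataset and return it."""
    with open(dataset_path) as f:
        docs = f.read().splitlines()
    # Real training would happen here; a vocabulary stands in for the model.
    return {"vocab": sorted({w for doc in docs for w in doc.split()})}

if __name__ == "__main__":
    # The same module can be run directly by the pipeline...
    import sys
    print(train(sys.argv[1]))

# ...or imported anywhere else:  from train_model import train
```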

This is the end of the Bietto-led portion of the call. I will be getting to Kasian's pieces, but I do want to point out a few things.

Of course, I'm at the tweet limit. The second half is coming. Hang tight frens ❤️
Points from the first half:

- The team is taking their time to build this product in an incredible way. The automation aspect is not something I thought much about, but it is HUGE for this becoming what it is capable of. Taking raw data and automating it into a complete consumable product is far more valuable than having to manually crawl the data, produce a dataset, create a training set, create a cluster, put the result behind an API, etc.

These customers want speed and flexibility. It sounds like this automation aspect is near MVP (minimum viable product), but we already know that the complete product works from their several case studies. Once this automation is finished, it should be much easier to produce both case studies and, more importantly, new datasets on demand for large customers. I ❤️ pipelines.
Second important aspect:

I mentioned before that the team was hiring rapidly and had many open positions. Almost all of the names mentioned on this portion of the call were new names, outside the team listed on the website. They are building fast, the demand is high, and they are bringing in bright engineers to build more rapidly. The rate of expansion is a very good indicator for traditional startups, and it seems they are looking for more and more advanced engineers. Bietto was incredibly impressive.
On to Kasian's portion

'What we're going to solve for would be something very specific and it relates to predicting a relationship between two objects' - 'if a molecular biologist predicted a relationship between a protein and an enzyme, we would want to build a language model that would come very close to that in predicting that same relationship between that protein and that enzyme'
Why is that what they're doing?

Because if a model can reliably and precisely predict relationships that we KNOW to be true, then we can use it to surface UNKNOWN relationships with similar precision, at incredible speed. Much faster than a biologist.
With this capability, in biosciences we could test thousands of relationships to develop new reactions (treatments)

In finance, we could test thousands of relationships between different assets and find hidden trends.

Sounds kinda familiar...
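Here's a hedged sketch of that validate-then-discover loop. Every name, score, and threshold below is invented for illustration; the real model and data are obviously far more complex:

```python
# Hypothetical sketch of "predict what we KNOW, then trust it on the UNKNOWN".
# The scoring function and every pair/score here are invented for illustration.

def relationship_score(a: str, b: str) -> float:
    # Stand-in for a trained model scoring how related two entities are (0..1).
    return (hash((a, b)) % 100) / 100.0

# Step 1: check the model against relationships experts already agree on.
known = {("protein_X", "enzyme_Y"): 0.9, ("protein_Z", "enzyme_W"): 0.8}
errors = [abs(relationship_score(a, b) - truth) for (a, b), truth in known.items()]
validated = sum(errors) / len(errors) < 0.2  # tolerance is arbitrary here

# Step 2: only if it reproduces what we know, score pairs nobody has tested yet.
unknown = [("protein_X", "enzyme_W"), ("asset_A", "asset_B")]
if validated:
    for pair in sorted(unknown, key=lambda p: relationship_score(*p), reverse=True):
        print(pair, round(relationship_score(*pair), 2))
else:
    print("Model does not reproduce known relationships yet - keep training.")
```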
'So, our objective in creating language models for all knowledge domains and subdomains within space biosciences is to power all space companies with AI, data engineering, ML, NLU.'

If they can build this (I think they've shown they can), they can correlate anything.
Kasian has stressed how gains in finance would help biosciences and vice versa - this has never been more clear.

As they continue to iterate, both the speed and the accuracy of these detections will improve. They will literally be able to find hidden relationships between any two objects, entities, ideas, or what have you, for which they have sufficient data.

AND I DON'T HAVE TO REMIND YOU ALL OF THE DATA THAT THEY HAVE BEEN PROCURING
but

-S&P
-Morningstar
-Multiple NASA labs
-CERN
-etc. etc. (there are a lot of these, check my other threads)
A very fascinating look into NLU vs NLP here. Read this so you understand:

Now, instead of these two sentences, consider that our data sources are almost infinite. Millions of pages sometimes. And then understand that this will all be done, in a pipeline, insanely quickly.
Another very fascinating excerpt that kinda highlights what I said before about these capabilities:
Kasian also discusses how revenue from biosciences will be much different from finance. Biosciences doesn't need updates by the minute; it doesn't rely on real-time data. Instead, any discovery made from the dataset could be licensed as intellectual property.
Billion-dollar pharmaceutical companies would then have to pay large $$ for access to the correlation in order to use the license.

This could be another huge revenue stream. It only takes one discovery.
READ THIS.

This is what I was stressing before. Pipelines and the mass of data available will allow us to make millions of correlations. Correlations that a scientist working by hand would never even think about.

ML IS SO MUCH MORE POWERFUL THAN INDIVIDUALS
'And when we train on that data you can ask yourself how much better are the financial models going to get, based on tuning in space biosciences'

'You can look at space biosciences as your training set on a macro level'

WHAT THE FK
Training sets train the algorithm to perform well on an unknown testing set, as mentioned before.

What Kasian is saying here is that the algorithms will be SO strengthened by these incredibly complex bioscience datasets that it will lead to incredibly precise financial correlations.
And he is right.

Their model's ability to go through thousands or millions of biological variables and make accurate predictions will strengthen the precision of the models greatly.
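In ML terms, what he's describing resembles transfer learning: learn structure from one domain's data, then adapt the same model to another. A toy, hypothetical sketch of that flow (not $VXV code):

```python
# Toy, hypothetical sketch of the transfer-learning idea: learn structure from a
# large biosciences corpus, then adapt the same model to financial text.
# The "model" here is just a growing vocabulary, purely for illustration.

def pretrain(corpus: list[str]) -> dict:
    return {"vocab": {w for doc in corpus for w in doc.split()}}

def fine_tune(model: dict, corpus: list[str]) -> dict:
    # The finance model starts from everything learned on biosciences data.
    model["vocab"] |= {w for doc in corpus for w in doc.split()}
    return model

bio_corpus = ["protein binds enzyme", "gene expression under microgravity"]
finance_corpus = ["earnings call sentiment", "asset price momentum"]

model = fine_tune(pretrain(bio_corpus), finance_corpus)
print(len(model["vocab"]), "terms learned across both domains")
```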

This company has NO FKN competition. NO ONE IS DOING THIS. Interdisciplinary benefits are SICK.
That's all, my lovely friends.

The interdisciplinary nature, along with the automation and reusability the team is building into their pipelines, to put it simply, is absolutely mind-blowing to me.

Hardly anyone is using NLU in either domain; $vxv has been doing so forever.
Finance and biosciences are two of the largest markets in the world and they will work so beautifully together to strengthen the returns on each.

The team is bringing in incredible talent and is led by one of the smartest, most ambitious people I have ever researched.
I have told you many times what is coming, and based on this call, the fully fleshed-out version might still be a while away, but they have been doing most of this successfully by hand for many years now, and have case studies to prove it.
But if they achieve what Kasian and the team are envisioning, the singularity is a lot closer than I had imagined (jk, kinda). It's reusable, flexible NLU that can be used to detect the TOUGHEST correlations imaginable, in these two markets. I can't think of a more valuable company.
Literally. I can't think of anything that would be more valuable than this. At all. Not a cancer cure, not a new currency, literally nothing. This is the pinnacle of what ML can be, from what I have seen.

In its existing form it is already incredibly damn valuable.
I think that's all frens.

Feel free to ask me questions. I'm pretty fired up.

I love you all - sit on your hands ❤️

$vxv
