Discover and read the best of Twitter Threads about #FakeData


All right folks, quick stats lesson. (Thread)

There's a meme floating around because of this article 👇 that "China must be faking their Coronavirus data because you never see R^2 = 0.99 with real data".

barrons.com/articles/china…
If you run a regression of cumulative deaths reported in China in the first half of February, and look at the residuals, you get a magic parabola like this: [image]
"Oh my gosh," you say to yourself. "FAKE DATA!!!" So you add in a quadratic term to the regression, and get R^2 = 0.9999 or something like that. You've cracked the code! It's a smoke signal from a Chinese researcher! Time to call Barron's!
(The full thread is 15 tweets.)
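For anyone who wants to poke at the mechanics from the stats thread above, here is a minimal sketch in Python with NumPy. The numbers are synthetic and made up for illustration (they are not the reported figures), and `r_squared` is a hypothetical helper name. The sketch assumes the regression is cumulative count against day number. It fits a straight line to a cumulative series, looks at the residuals, then adds a quadratic term.

```python
import numpy as np

def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic, noiseless daily counts that grow steadily (NOT real data).
days = np.arange(15)
daily = 50.0 + 4.0 * days
cumulative = np.cumsum(daily)  # running total, the quantity being regressed

# Straight-line fit of the cumulative series against the day number.
lin_fit = np.polyval(np.polyfit(days, cumulative, 1), days)
residuals = cumulative - lin_fit  # these trace out a parabola

# Same regression with a quadratic term added.
quad_fit = np.polyval(np.polyfit(days, cumulative, 2), days)

r2_lin = r_squared(cumulative, lin_fit)
r2_quad = r_squared(cumulative, quad_fit)

print("linear R^2:   ", round(r2_lin, 4))
print("quadratic R^2:", round(r2_quad, 4))
print("residuals (start, middle, end):", residuals[0], residuals[7], residuals[-1])
```

A running total of a steadily growing daily count is smooth and always increasing, so even the straight-line fit scores a very high R^2 here, and the leftover curvature shows up as the parabola in the residuals. Real counts would be noisier than this toy series.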
TIL there are lots of ways of #seeding #fakedata into a #SQL #database. You can:

A. Seed it with purely random data all the way
B. Seed it with realistic data
C. Seed it initially with random data, but tie that data into other tables realistically
When you seed a database, you start off with the enum tables (an enum is like a dictionary that doesn't change in size). If you look at a database schema, they're the tables that are only connected one time.

Then you focus on tables that have only one foreign key, tied to an enum table.
Example with a basic HR database schema. Start seeding the outlier tables first, starting with the bottom-leftmost and bottom-rightmost tables, followed by the bottom-middle one.
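The seeding order described above can be sketched with Python's built-in `sqlite3` module. This is only an illustration under assumptions: the `department` and `employee` tables, the department names, and the `seed_database` function are all hypothetical and are not taken from the HR schema in the thread. It follows option C: the enum table gets fixed values first, then the dependent table gets random names whose foreign key always points at a real enum row.

```python
import random
import sqlite3
import string

# Hypothetical enum values: a small, fixed lookup list.
DEPARTMENTS = ["Engineering", "Sales", "Support"]

def seed_database(conn, n_employees=20, rng_seed=0):
    """Seed the enum table first, then the table that references it."""
    rng = random.Random(rng_seed)  # seeded so reruns give the same fake data
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE department (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE employee (
            id            INTEGER PRIMARY KEY,
            name          TEXT NOT NULL,
            department_id INTEGER NOT NULL REFERENCES department(id)
        );
    """)

    # Step 1: the enum table, which has no foreign keys of its own.
    conn.executemany(
        "INSERT INTO department (name) VALUES (?)",
        [(name,) for name in DEPARTMENTS],
    )

    # Step 2: the table with one foreign key. Names are random,
    # but department_id is drawn from ids that actually exist.
    dept_ids = [row[0] for row in conn.execute("SELECT id FROM department")]
    rows = []
    for _ in range(n_employees):
        fake_name = "".join(rng.choices(string.ascii_lowercase, k=8))
        rows.append((fake_name, rng.choice(dept_ids)))
    conn.executemany(
        "INSERT INTO employee (name, department_id) VALUES (?, ?)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
seed_database(conn)
print(conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0], "employees seeded")
```

Seeding in the opposite order would fail here, because with foreign keys enforced an `employee` row cannot reference a department that does not exist yet.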
(The full thread is 11 tweets.)
