Denny Vrandečić
Oct 29
Today it's ten years since Wikidata launched, on October 29, 2012.

A few memories. 1/
It's been an amazing time. In the summer of 2011, people still didn't believe Wikidata would happen. In the fall of 2012, it was there. 2/
Markus Krötzsch @ma_kr and I pushed for Semantic Wikipedia since 2005. Semantic MediaWiki @SemanticMW was born from there, Freebase @fbase and DBpedia @dbpedia launched in 2007, microformats in Wikipedia became a grassroots thing. But not much at the @Wikimedia Foundation. 3/
With Elena Simperl @esimperl at KIT @KITKarlsruhe we started the EU research project RENDER @renderproject in 2010, involving Mathias Schindler @presroi at Wikimedia Deutschland @WikimediaDE . It was about knowledge diversity on the Web, still an incredibly important topic. 4/
In RENDER, we developed ideas for the flexible representation of knowledge, and how to deal with contradicting and incomplete information. We analysed Wikipedia to understand the necessity of these ideas. 5/
In 2010, I finished my PhD at KIT, and @yolandagil invited me to the ISI @USC_ISI at @USC for a half year sabbatical. There, Yolanda, Varun Ratnakar, Markus and I developed a Wikidata prototype. It got third place in the ISWC @iswc_conf Semantic Web Challenge that year. 6/
2011: the Wikimedia Data summit. Tim O'Reilly @timoreilly invites us to O'Reilly HQ in Sebastopol, CA. Wikimedia Foundation, Freebase, DBpedia, Semantic MediaWiki, O'Reilly, Guha from Google, and, I think, Mark Greaves from Vulcan, and others. 7/
I think the summit was where it became clear that Wikidata was indeed feasible. 8/
It's also where I first met Guha @rv_guha and where I admitted to him that I was kinda a fanboy. He invented MCF and RDF, had worked with Doug Lenat on CYC, and later that year introduced Schema dot org. He's now working on Data Commons. Check it out, it's awesome. 9/
Mark Greaves was then working for Paul Allen at Vulcan. They supported Semantic MediaWiki for years, and really wanted to make Wikidata happen. Mark knew my PhD was done and I was thinking about next steps. I thought academia. He suggested I should write a Wikidata proposal. 10/
After six years advocating for Wikidata, I understood that someone would need to step up. With the confidence and support of many people - Markus Krötzsch, Elena Simperl, Mark Greaves, Guha, Jamie Taylor, Rudi Studer, John Giannandrea, and others - I drafted the proposal. 11/
The Board of the Wikimedia Foundation approved the proposal as a new Wikimedia project, but didn't allocate funding or direct the Foundation to do it. In fact, the Foundation was reluctant to do it, unsure whether they could host such a project. Back then a wise decision. 12/
Erik Möller @xirzon , then CTO of the Foundation, was the driving force behind a major change: instead of turning the individual Wikipedias semantic, we would have a single Wikidata for all languages. Erik was also the one who had secured the domain for Wikidata, many years prior. 13/
Over the next half year and with the help of the Wikimedia Foundation, we secured funding from AI2 (Paul Allen), Google (who had acquired Freebase in the meantime), and the Gordon and Betty Moore Foundation @MooreFound , 1.3 million in total. 14/
Other funders backed out because I insisted that the Wikidata ontology be entirely under the control of the community. They'd argue for having professional ontologists, or reusing existing ontologies, or using DBpedia to seed Wikidata. I said no. 15/
I firmly believed, and still do, that the ontology has to be owned, grown and maintained by the community. I invited their ontologists to join as community members, but as far as I know, they never made significant contributions. We missed out on quite a bit of funding. 16/
There we were. We had funds and a plan, but no one to host us. We were thinking of founding a new org, hosting at KIT, but due to RENDER, Mathias Schindler had us talk with Pavel Richter @pavel , then ED of Wikimedia Deutschland. Pavel offered to host Wikidata development. 17/
For Pavel and Wikimedia Deutschland this was a big step: the Wikidata team would significantly increase the size of WMDE (double it?), which would necessitate a sudden transformation and increased professionalisation of WMDE. But Pavel was ready for it, and managed this growth admirably. 18/
On April 1st 2012, we started the development of Wikidata. On October 29 2012 we launched the site. 19/
The original launch of Wikidata was utterly useless. All you could do was create new pages with Q IDs (the Q is a homage to @qamarniso ), associate those Q IDs with labels in many languages, and connect them to articles in Wikipedia, the so-called sitelinks. 20/
You could not add any statements yet. You could not connect items with each other. The sitelinks were not used anywhere. The labels were not used anywhere. As I said, the site was completely useless. And great fun, at least to me. 21/
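As a rough illustration of how little a launch-era item contained, here is a minimal Python sketch. The `Item` class and its field names are hypothetical and chosen only for this example; they are not Wikidata's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """Hypothetical model of an item at launch: an ID, labels, and sitelinks.

    There is deliberately no field for statements or for links between items,
    since neither existed yet.
    """
    qid: str                                                  # e.g. "Q1490"
    labels: dict[str, str] = field(default_factory=dict)      # language code -> label
    sitelinks: dict[str, str] = field(default_factory=dict)   # wiki name -> article title


# Illustrative values for the item about Tokyo.
tokyo = Item(qid="Q1490")
tokyo.labels["en"] = "Tokyo"
tokyo.labels["de"] = "Tokio"
tokyo.sitelinks["enwiki"] = "Tokyo"
tokyo.sitelinks["dewiki"] = "Tokio"

print(tokyo)
```

The ID carries no language of its own, which is why labels can be attached per language without privileging any one of them.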
QIDs are still often mocked. Isn't dbp:Tokyo easier to understand than Q1490? It was hard to overcome anglocentricity. Unfortunately, this has not changed much. I am thankful to the Wikimedia movement for encouraging, valuing, and supporting the multilinguality of Wikidata. 22/
Over the next few months, the first few Wikipedias were able to access the sitelinks from Wikidata, and started deleting the sitelinks from their Wikipedias. This led to a removal of more than 240 million lines of wikitext across the Wikipedias. 23/
240 million removed lines that didn't need to be maintained anymore. In some languages, these lines constituted more than half of the content of their Wikipedia. In many languages, editing activity dropped dramatically at first, sometimes by 80%. 24/
But then something happened. Those edits were mostly bots. And with those bots gone, humans were suddenly better able to see each other and build a more meaningful community. In many languages, this eventually led to increased community activity. 25/
One of my biggest miscalculations when launching Wikidata was to entirely dismiss the possibility of a SPARQL endpoint. I thought that none of the existing open source triple stores would be performant enough. 26/
Peter Haase @phaase was instrumental in showing with @BlazeGraph that I was wrong. The SPARQL endpoint is an absolutely crucial piece of the Wikidata infrastructure, to explore and use the dataset. And with its beautiful visualisations, I find it almost criminally underused. 27/
Unfortunately, the SPARQL endpoint is also the piece of infrastructure that worries us the most. The Wikimedia Foundation is working hard on figuring out the future for this service, and if you can offer substantial help, please reach out. 28/
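For readers who have not tried the endpoint, the following Python sketch shows roughly what a request to it could look like. It only builds the request URL and sends nothing over the network. The query itself and the helper name `build_request_url` are illustrative assumptions, not something taken from the thread.

```python
from urllib.parse import urlencode

# Public endpoint of the Wikidata Query Service.
ENDPOINT = "https://query.wikidata.org/sparql"

# Illustrative query: the labels of Q1490 (Tokyo) in three languages.
# The wd: and rdfs: prefixes are predefined by the query service.
QUERY = """
SELECT ?label WHERE {
  wd:Q1490 rdfs:label ?label .
  FILTER(LANG(?label) IN ("en", "de", "ja"))
}
"""


def build_request_url(query: str, endpoint: str = ENDPOINT) -> str:
    """Return a GET URL for the query (hypothetical helper for this sketch)."""
    return endpoint + "?" + urlencode({"query": query.strip(), "format": "json"})


url = build_request_url(QUERY)
print(url)
```

Pasting the query text into the web interface of the query service is the easier way to experiment; the URL form is what a script would use.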
Today, Wikidata has more than 1.4 billion statements about approximately 100 million topics. It is by far the most edited Wikimedia project, with more edits than the English, German, and French Wikipedia together - even though they are each a decade older than Wikidata. 29/
Wikidata is widely used. Almost every time Wikipedia serves one of its 24 billion monthly page views. Or during the pandemic in order to centralise the data about COVID cases in India, to make them available across the languages of India. 30/
By large companies answering questions and fulfilling tasks with their intelligent assistants, be it Google or Apple or Microsoft. By academia, where you will find thousands of research papers using Wikidata. By numerous Open Source projects. 31/
By one off analyses by data scientists. By small enterprises using the dataset. By student programmers exploring and playing with it on the weekend. By spreadsheet enthusiasts enriching their data. 32/
By scientists, librarians and curators linking their datasets to Wikidata, and thus to each other. Already, more than 7,000 catalogs are linked to Wikidata, and thus to each other, really and substantially establishing a Web of linked data. 33/
I remember the Amazon developer who told me that he was using Wikidata for data about movies. I was surprised: Amazon owns imdb! He said imdb is great for what it had, but Wikidata complemented it in unexpected ways, offering connections that are out of scope for imdb. 34/
To be clear: knowledge bases such as imdb are amazing, and Wikidata does not aim to replace them. They often have a clear scope, have a higher quality, and almost always a better coverage in their field than Wikidata ever can hope to have, or aims to have. And that's OK. 35/
Wikidata's goal is not to replace other knowledge bases. Wikidata's goal is to provide the connecting tissue between the many knowledge bases out there. To provide a common set of entities. To turn the individual knowledge bases into a large interconnected Web of knowledge. 36/
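The "connecting tissue" idea can be sketched in a few lines of Python: two catalogs that know nothing about each other become linked once both map their records to the same Wikidata IDs. The catalog names and local identifiers below are made up for illustration; only the QIDs (Q1490 Tokyo, Q64 Berlin, Q183 Germany) refer to real items.

```python
# Each hypothetical catalog maps a Wikidata QID to its own local identifier.
museum_catalog = {"Q1490": "museum/place/17", "Q64": "museum/place/42"}
library_catalog = {"Q1490": "lib:n-0001", "Q183": "lib:n-0002"}


def link_via_qid(a: dict[str, str], b: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Pair up the local identifiers of records that share a QID."""
    return {qid: (a[qid], b[qid]) for qid in a.keys() & b.keys()}


links = link_via_qid(museum_catalog, library_catalog)
print(links)
```

Neither catalog needs to adopt the other's schema; agreeing on the entity identifier is enough to join them.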
I am still surprised that Wikidata is not known more widely among developers. It always makes me smile with joy when I see yet another developer who just discovered Wikidata and writes an excited post about it and how much it helped them. 37/
In the last two weeks, I stumbled upon two projects that used Wikidata identifiers where I didn't expect them at all, and just used them as if it was the most normal thing in the world. This is something I hope we will see even more in the future. 38/
I hope for Wikidata to be the common knowledge base ubiquitously used by many intelligent applications. Not only to make applications smarter with knowledge about the world - but also by enabling applications to exchange data with each other by using the same language. 39/
And most importantly: Wikidata has a healthy, large, and comparatively friendly and diverse community. It is one of the most active Wikimedia projects, only trailing the English Wikipedia, and similar to Wikimedia Commons. 40/
Last time I checked, a year ago, more than 400,000 people had contributed to Wikidata. For me, that is easily the most surprising number about the project. 41/
If you had asked me in 2012 how many people would contribute to Wikidata, I would have sheepishly hoped for a few hundred, maybe a few thousand. And I would have defensively explained why that's OK. 42/
I am humbled and awestruck by the fact that several hundred thousand people have contributed to an open knowledge base that is available to everyone, and that everyone can contribute to. 43/
And that is the most important role of Wikidata. That everyone can contribute to it. That the knowledge base everyone is using is not owned and gateguarded by a company or government, but that it is a common good everyone can contribute to. 44/
Everyone with an internet connection can lend their voice to the sum of all knowledge. 45/
We all own Wikidata. We are responsible for Wikidata. And we all benefit from Wikidata. 46/
It has been an amazing ten years. I am looking forward to many more years of Wikidata, and to the many new roles that it will play in the years to come, and to the many people who will contribute to it. 47/
And thank you for all these amazing pictures of cakes for Wikidata's birthday. 48/48
[Images: a cake with an inscription saying Q167545 Q2013; cupcakes with the Wikidata logo in the shape of Australia; a birthday cake with a beautiful 10 and Wikidata written on it]
Shoutout to the brilliant team that started working on Wikidata: @nightrose @abta78 @brightbyte @JeroenDeDauw @filbertkm @tobijat @johl @jeblad Daniel Werner, Henning Snater, Silke Meyer 49/48
And if you are curious what is coming next: we are working on @wikifunctions and Abstract Wikipedia, in order to allow more people to contribute knowledge to more people!
Edit: the summit was organised by Danese Cooper @DivaDanese
