The first database I curated by hand was for my Ph.D. thesis. It consisted of 117 orthologous human and mouse genes (this was in the late 90s, before either genome was sequenced!). It's still up: cb.csail.mit.edu/cb/crossspecie…
Compiling this database was hard. It required combing through GENBANK, performing alignments to check for orthology, examining proteins for homology, etc. The database was generated for benchmarking a gene prediction tool, but I found that the curation had much more value than that.
The process of compiling the database taught me a ton about the state of gene sequences in GENBANK, challenges in sequence alignment, functional annotation, etc. I learned a lot making this database. Others also found it useful in derivative work: korflab.ucdavis.edu/~genis/documen….
Of course my database of 117 human and mouse genes is now obsolete. This ends up being the case with most hand-curated databases. I think that's ok. Engaging in the process is, in my opinion, undervalued. And the databases can be very useful while they last.
One database that is very useful in the single-cell RNA-seq domain right now is this one compiled by @vallens, that we just published:
To make this database @vallens didn't just scrape Google Scholar with some script. The detailed information in the different fields required reading the papers. It is a Herculean task, and probably impossible if one started now. @vallens started this not long after day one.
FYI: Nucleic Acids Research has a database issue every year, and many of the databases are valuable, but sadly not all are open and usable in the ways described above.
Back to my thesis: I had one hardcopy I kept for the last 20 years. It was stored in a box in our lab and two months ago was destroyed in a flood (thank you 2020!). Then again, half of it consisted of a printout of the entire 117 gene database. It wasn't on anyone's reading list...
• • •
In 2006 I went on a year-long sabbatical to @UniofOxford from @UCBerkeley. My grants were just ending and I thought I'd reset by doing some math after several years of genome consortia (I didn't have a biology mentor to tell me R01s can be renewed, so I didn't know & didn't try).
At @UniofOxford I was hosted by Philip Maini in Maths and @JotunHein in Stats. It was a fun year in which I met @satijalab, who was a student at the time. We ended up writing a paper on phylogenetics, alignment and annotation: academic.oup.com/bioinformatics…
A friend (who does not work in science) asked me today whether it is true that "protein folding has been solved". My short answer:
The AlphaFold method produced very impressive results on CASP14. Protein folding is not a solved problem.
The AlphaFold results are impressive not just because they are (on average) much better than other methods, but because the improvement is so great in just the last 2 years that it suggests much more is still possible.
Also, the AlphaFold results are just markedly different from what a lot of other methods are producing. This is not an incremental improvement.
There has been discussion over the past week about what the new @Apple M1 chip means for bioinformatics. Some have predicted the end of compbio on @Apple. Others are more optimistic.
We got a Mac Mini & @pmelsted easily compiled kallisto bustools #scRNAseq on it. Results below:
Several points: 1. Compilation of code on the M1 ARM architecture was easy for kallisto and bustools because they have few dependencies. In fact we did it before for the ARM Rock64 which is why this time there was no problem with the M1.
2. @Apple has done a great job with Rosetta 2. M1 emulating x86 is still faster than previous Macs. And the extra cores are great for running kallisto. macrumors.com/2020/11/15/m1-…
In @NobelPrize news, the 2013 chemistry laureate links to a thread that says NIAID is "reminding people of their importance" right now because of a "vested interest" in maintaining high levels of @NIH funding, funding which they do not deserve.
From the outset of the #covid19 pandemic, it's been clear that risk of death increases sharply with age. But why? The intuitive hypothesis is that ACE2 expr. increases w/ age, but early in April, @sinabooeshaghi and I showed the opposite is true in mice. biorxiv.org/content/10.110…
Now, in a paper from the labs of @tuuliel and Christenson, @silvakasela et al. have performed a careful analysis in human, and they find the same.
BTW we saw the same patterns for ACE2 expression with sex in mice, namely males had *lower* levels of ACE2, and @silvakasela et al. find the same in humans despite the risk of death being much *higher* for males.