Kyunghyun Cho
a combination of a mediocre scientist, a mediocre manager, a mediocre advisor & a mediocre physicist at @nyuniversity (@CILVRatNYU) & @PrescientDesign
Aug 31, 2024 5 tweets 3 min read
do we want to know which variables are direct causes of a target outcome, or the full dependencies among all variables?

gradually i started to think that it's probably neither, since the utility of each cause is not a function of the distance to the target outcome variable but is more a function of whether we can design an effective and efficient intervention strategy.

@JangHyun_k and i thus started to think of targeted cause discovery.

(1/4)

of course, a natural follow-up task is to design an algorithm, which is where most of the challenges lie. instead of designing an ingenious algorithm out of thin air, we decided to let a neural net design an algorithm for us, as has been found to be effective for causal discovery in recent years (e.g. work by and others)

in doing so, we realized that "cause discovery" rather than full "causal discovery" has a distinct advantage in scaling up these learning-based approaches.

(2/4)
Jul 23, 2024 5 tweets 4 min read
enjoying #ICML2024 ? already finished with llama-3.1 tech report? if so, you must be concerned about the emptiness you'll feel on your flight back home in a couple of days.

do not worry! Wanmo and i have a new textbook on linear algebra for you to read, enjoy and cry on your long flight.

(1/5)

have you ever wondered why SVD comes so late in your linear algebra course?

both wanmo (math prof) and i (cs prof) began to question this a couple of years ago. after all, svd is one of the most widely used concepts from linear algebra in engineering, data science and AI. why wait until the end of the course?

(2/5)
Jul 23, 2024 9 tweets 3 min read
very cool to see a pretty exhaustive and extensive technical report on llama-3.1!

a few fun snippets 🧵

PLEASE release this custom html parser PLEASE 🙏
Jul 10, 2024 13 tweets 6 min read
we all want to and need to be prepared to train our own large-scale language models from scratch.

why?

1. transparency or lack thereof
2. maintainability or lack thereof
3. compliance or lack thereof

and because we can, thanks to the amazing open-source and open-platform ecosystem.

(1/12)

we have essentially lost any transparency into pretraining data.

(2/12)
May 15, 2024 4 tweets 2 min read
this semester (spring 2024), i created and taught a new introductory course on causal inference in machine learning, aimed at msc and phd students in cs and ds. all the material was created from scratch, including the lecture notes and lab materials;

1/4 docs.google.com/document/d/1qN…

now that the course is finally over, i've put all the lab materials, prepared by amazing @taromakino, @Daniel_J_Im and @dmadaan_, into one @LightningAI studio, so that you can try them out yourselves without any hassle;

2/4 lightning.ai/kc119/studios/…
Aug 23, 2021 6 tweets 3 min read
good morning!

as i tweeted last week, Prescient Design Team at gRED within @genentech is hiring awesome people. in particular, we have the following positions already open and ready:

[Engineering Lead] we want you to work with us to build a team for creating an ML infrastructure that seamlessly integrates ML and bio: gene.com/careers/detail…
Dec 11, 2020 6 tweets 2 min read
an awesome workshop on ML for molecules at #NeurIPS2020 neurips.cc/virtual/2020/p…

dying to watch all the talks here! slideslive.com/38938181/realw…
Sep 16, 2020 8 tweets 4 min read
denoising in a discrete input has always fascinated me ever since i read jmlr.org/papers/volume1… by Vincent & @hugo_larochelle et al., and yoshua has always motivated me to look into denoising for sequence modeling ever since 2013.

it took me 5 years to look at refinement in the discrete space with @jasondeanlee & @elmanmansimov arxiv.org/abs/1802.06901. it took another 2 years to look at refinement in the hybrid space with jason and @raphaelshu aaai.org/Papers/AAAI/20…
Sep 16, 2020 4 tweets 2 min read
originally at facebook.com/cho.k.hyun/pos…

just watched the Social Dilemma netflix.com/title/81254224. to anyone who's been thinking about and following various stories about social media and other "attention-grabbing" services, this documentary won't have much new stuff, ... although it works as a great reminder that these services, which are effectively surveilling us 24/7 and profiting by selling who we are, are embedded in every aspect of our lives. ...
Sep 13, 2020 4 tweets 2 min read
another @zoom_us tip/bug i learned friday: do not "Enable join before host" if you are not going to join immediately at the beginning. a random participant who joins first becomes the host and stays so even when alternative hosts join the meeting. 😱

according to @zoom_us, one of the alternative hosts can manually claim the host role from a random host, but this should be automatic, not manual. when an alternative host (designated in the meeting settings) shows up, they must become the host rather than a random participant.
Sep 11, 2020 8 tweets 4 min read
i'm quite embarrassed and wanted to sweep it under a rug, but let me share what happened behind this, largely for my own record/reminder and for a small hope this might raise awareness.

1st&foremost, it was totally my oversight to miss that the keynote speaker lineup was entirely composed of male speakers, including myself, which would've reinforced the lack of diversity and also potentially sent out the wrong signal to many participants and others in our fields.
Sep 2, 2020 5 tweets 3 min read
it all started with @_willfalcon casually reading the papers on DIM and CPC and talking about how he could come up with a better contrastive learning algo 1.5+ years ago. instead of adding yet another novel, sota, simple, awesome, principled contrastive learning algo, ..

@_willfalcon sat down, painstakingly implemented an effective & efficient framework for ML experimentation (which ended up being @PyTorchLightnin), talked with the authors of an ever-growing set of novel, sota, simple, awesome, principled contrastive learning algo.'s, ..
Aug 13, 2020 4 tweets 1 min read
did i just spend 1.5 days installing jax from source on the department cluster? yes... yes, i did. and i forgot why i so desperately needed jax on the cluster.. what was it..?
Aug 5, 2020 30 tweets 8 min read
originally at facebook.com/cho.k.hyun/pos…

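this gets discussed a lot on social media, including Twitter and FB, and in academia, but there doesn't seem to be much discussion of it in Korean, so below i very briefly list a few closely related points that i think are important under the topic of Social impact & bias of AI. it probably exists and .. i just couldn't find it, so if there is any related research or discussion in Korean, please leave it in the replies.

[i'm not in the habit of writing in Korean, so this looks quite awkward to read. please bear with me.]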
Aug 2, 2020 5 tweets 1 min read
“We already know that the vaccines in phase 3 trials are safe ... phase 3 trials sometimes uncover safety issues that affect only a small percentage of people–issues that might not appear in smaller phase 1/2 trials.” so, which is it? safe or unsure?

“yes, there’s a chance that the vaccines won’t work very well, and maybe this will create greater distrust when we eventually do get a good vaccine. But that’s a risk we ought to take” please explain more how you end up with this conclusion when we are unsure about the risks.
Jul 22, 2020 5 tweets 2 min read
openreview.net/forum?id=UHpxm…

the authors claim their algo removes "unknown" biases from a classifier (or discourages it from capturing "unknown" but undesirable spurious correlations.) what an amazing achievement, until you notice that the algo doesn't actually do that.

the authors define "unknown" and undesirable biases (which is not a term i would've used to start with, but ..) as the ones that are captured more easily in the early stage of training a classifier, and then they propose to remove the effect of these early-stage features. so...
Jun 29, 2020 5 tweets 2 min read
assuming that what we care about is the asymptotic complexity given the length of a (source or target) sentence L, it's a good idea to pursue approaches that would end up with O(\log L)-time generation _given_ the length-L sentence can be processed in parallel, as opposed to O(L) generation (even better would be O(1).)

of course, the constant that's hidden in O-notation matters greatly in practice, and it reflects various optimizations and compute architectures, which often determine the wall-clock-time efficiency when L is small.
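
to make the O(L) vs. O(\log L) contrast concrete, here is a minimal sketch that only counts sequential rounds. it assumes a hypothetical balanced-tree generation order in which every open gap gets its middle token in the same parallel round; the function names and the order are illustrative assumptions, not any specific published decoder, and the count ignores the hidden constant discussed above.

```python
def sequential_steps(length: int) -> int:
    """Left-to-right decoding emits one token per step: L sequential steps."""
    return length


def parallel_insertion_rounds(length: int) -> int:
    """Count rounds for a hypothetical balanced-tree generation order.

    Assumption: in each round, the middle token of every still-empty span
    is produced in parallel, so the number of rounds is the tree depth.
    """
    rounds = 0
    # Half-open spans [lo, hi) of positions that still need tokens.
    gaps = [(0, length)] if length > 0 else []
    while gaps:
        next_gaps = []
        for lo, hi in gaps:
            mid = (lo + hi) // 2  # position filled in this round
            if lo < mid:
                next_gaps.append((lo, mid))  # left remainder
            if mid + 1 < hi:
                next_gaps.append((mid + 1, hi))  # right remainder
        gaps = next_gaps
        rounds += 1
    return rounds


if __name__ == "__main__":
    for L in (1, 7, 100, 1000):
        print(L, sequential_steps(L), parallel_insertion_rounds(L))
```

under this assumed order the round count grows like \log_2 L while left-to-right decoding grows like L; whether that translates into wall-clock gains depends on the per-round cost.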
Jun 14, 2020 4 tweets 1 min read
dongascience.donga.com/news.php?idx=3…

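this is really serious. to think that a system like this, built without even a minimum of proper thought, has been "adopted .. by more than 100 major companies so far, including LG, KT and Hanmi Pharmaceutical".. it's worse than what the students taking my course would do. """
it captures changes at 68 points on the face ... to identify facial expressions and main emotions ... it was trained on more than about 700 million faces .. 10 million voices .. disposition data” and “.. it evaluates an applicant by comparing their data against the existing data ... an applicant who says .. anything at all in a bright voice ... can get a good score
"""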
Apr 29, 2020 4 tweets 2 min read
spicy! for pixel-level control, contrastive learning is also not needed; just clever augmentation at multiple points in an algorithm is. arxiv.org/abs/2004.13649 @ikostrikov @denisyarats @rob_fergus
Apr 1, 2020 5 tweets 2 min read
"Here the 14 drugs and the number of articles about the coronavirus that mentioned them:" - okay exact matching of words

"A perception metric can be any noun or adjective. The perception metrics are typically selected by the end user" - okay more exact matching of words
Nov 15, 2019 8 tweets 4 min read
open house for the medical track of @NYUDataScience PhD program at @nyulangone starts with the opening remark by @DanielSodickson

a curriculum will be tailored and customized with electives from the school of medicine