How to get URL link on X (Twitter) App
https://twitter.com/deliprao/status/1760009083984167090 I got distracted by Flash Attention when people asked for an elaboration, but the core reason this is true is that that's not where most of the operations are at scale. The attached image shows a breakdown of the operations.
https://twitter.com/taliaringer/status/1707541040016642184 Transparency is a key part of both scientific research and ethical development and deployment of AI technologies. Without transparency into training data we cannot know whose information and ideologies are being encoded in ML systems. Unfortunately, this work is increasingly hard
https://twitter.com/yoavgo/status/1629513463578935297 Chinchilla-optimal models are very often ACTIVELY BAD FOR APPLICATIONS. A chinchilla optimal 2.7B model has seen only 50B tokens, or one sixth what EleutherAI typically trains small models for. A model trained for so few tokens might be “compute optimal” but it’s very bad.
https://twitter.com/tsurudraws/status/1603191985266753536 Yes, models like DALL-E2, Stable Diffusion, and Midjourney were trained on images uploaded to crowdsourced websites like Flickr and ArtStation.
https://twitter.com/janleike/status/1584618242756132864 @OpenAI This is not a minor point either. Apparently the text-davinci-002 API “is an instruct model. It doesn't uses a similar but slightly different [sic] training technique but it's not derived from davinci. Hence it's not a fair comparison.”
https://twitter.com/ak92501/status/1516579338312830979 Huge props to @RiversHaveWings, @dashstander, @EricHallahan, @lcastricato, and the many other people who have iterated on and popularized this technique. I came rather late to the party, and mostly made sure that the experiments happened and their great work was showcased
https://twitter.com/AlexTamkin/status/1494937726822588423 To their credit, @OpenAI put this plot in their GPT-3 paper, which looks like this. It appears to answer the question, but recent work (esp. @AlexTamkin’s newest paper) calls into question the validity of using a present / not present dichotomy to draw conclusions.
https://mobile.twitter.com/nabla_theta/status/1345130408170541056?lang=en
https://twitter.com/Abebab/status/1410267861130620928 rather than descriptive I can wack people who disagree with this paper 😛
https://twitter.com/RiversHaveWings/status/1410020043178446848 @RiversHaveWings
https://twitter.com/RiversHaveWings/status/1406347245297881088?s=20
https://twitter.com/sea_snell/status/1410360593350115330 @jbusted1 @RiversHaveWings @BoneAmputee @kialuy They’ve been doing some visionary work with human-guided AI-generated art for the past two months, and it’s phenomenal that they’re starting to get the recognition they deserve. Several more people who either lack twitters or whose handles I don’t know deserve applause too