Hi Jerome! It's great to get feedback from someone with so much experience deploying AI at scale.
We share your concern about bias and safety in language models, and it's a big part of why we're starting off with a beta and requiring a safety review before apps can go live.
We think it's important that we can do things like turn off applications that are misusing the API, experiment with new toxicity filters (we just introduced a new one that is on by default), etc.
We don't think we could do this if we just open-sourced the model.
We do not (yet) have a service in production for billions of users, and we want to learn from our own and others' experiences before we do. We totally agree with you on the need to be very thoughtful about the potential negative impact companies like ours can have on the world.
Thank you again for the comments, and we'd love to hear any other thoughts or learnings from FB about how we could navigate this better!
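To make the "filter on by default" idea above concrete, here is a minimal sketch of a server-side gate that screens model output before returning it. It uses the current OpenAI Python SDK's moderation endpoint as a stand-in for the beta-era toxicity filter; the model name, the `safe_complete` helper, and the withhold-on-flag behavior are illustrative assumptions, not the actual 2020 implementation.

```python
# Illustrative sketch: gate completions behind a default-on content filter.
# The moderation endpoint here stands in for the beta-era toxicity filter;
# the model name and withhold behavior are assumptions for illustration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def safe_complete(prompt: str) -> str:
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[{"role": "user", "content": prompt}],
    )
    text = completion.choices[0].message.content

    # Run the model's output through the moderation filter before returning it.
    mod = client.moderations.create(input=text)
    if mod.results[0].flagged:
        return "[response withheld by content filter]"
    return text
```

Because the filter sits server-side, it can be updated or swapped out for every application at once, which is the operational control the thread argues would be lost by open-sourcing the model.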
deep research can go use the internet, do complex research and reasoning, and give you back a report.
it is really good, and can do tasks that would take hours/days and cost hundreds of dollars.
people will post lots of great examples, but here is a fun one:
i am in japan right now and looking for an old NSX. i spent hours searching unsuccessfully for the perfect one. i was about to give up and deep research just...found it.
it is very compute-intensive and slow, but it's the first ai system that can do such a wide variety of complex, valuable tasks.
going live in our pro tier now, with 100 queries per month.
plus, team, and enterprise will come soon, and then free tier.
here is o1, a series of our most capable and aligned models yet:
o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it. openai.com/index/learning…
but also, it is the beginning of a new paradigm: AI that can do general-purpose complex reasoning.
o1-preview and o1-mini are available today (ramping over some number of hours) in ChatGPT for plus and team users and our API for tier 5 users.
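For anyone wanting to try it from the API, here is a minimal sketch of calling o1-preview through the standard chat completions endpoint. The prompt is just an example; at launch the o1 models did not accept system messages or custom sampling parameters, so the request is deliberately bare.

```python
# Minimal sketch: calling o1-preview via the chat completions API.
# At launch, o1 models took only user/assistant messages (no system
# prompt) and used fixed sampling settings, hence the bare request.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {"role": "user", "content": "Prove that sqrt(2) is irrational."}
    ],
)
print(response.choices[0].message.content)
```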
screenshot of eval results in the tweet above and more in the blog post, but worth especially noting:
a fine-tuned version of o1 scored at the 49th percentile in the IOI under competition conditions! and got gold with 10k submissions per problem.
GPT-4 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.
it is more creative than previous models, it hallucinates significantly less, and it is less biased. it can pass a bar exam and score a 5 on several AP exams. there is a version with a 32k token context.
we are previewing visual input for GPT-4; we will need some time to mitigate the safety challenges.
TL;DR: at this point, to be certain of avoiding catastrophe, the FDIC needs to temporarily guarantee all deposits. other solutions might work, but this is the best one.
first, this really is just a liquidity issue. depositors at SVB are going to get all or most of their money back, and will have a significant fraction of it this week.
however, depositors should get *all* of their money back, and fast. equityholders in SVB and lenders should be wiped out.
but we really, really don't want depositors to start doubting their banks. the world has changed since 2008; the speed of a cascade could be very fast.
dropping standardized tests while maintaining legacy admissions policies and claiming it's about advancing equality of opportunity is not a serious position.
i'd love to see data, but as imperfect as tests are, i bet they do more for equality of opportunity than most of the rest of the application process, and it's very possible to look at scores in context.
this is the echo chamber gone very wrong.
how about more tests, and dropping the personal-essay-written-by-expensive-consultants?