o1 isn’t a chat model (and that’s the point)

swyx here: We’re proud to feature our first guest post of 2025! It has spawned great discussions on gdb, Ben, and Dan’s pages. See also our followup YouTube discussion.
Since o1’s launch in October and o1 pro/o3’s announcement in December, many have been struggling to figure out their takes, both positive and negative. We took a strongly positive stance at the nadir of o1 Pro sentiment and mapped out what it would likely take for OpenAI to have a $2000/month agent product (rumored to be launched in the next few weeks). Since then, o1 has sat comfortably at #1 across ALL LMArena leaderboards (soon to have default Style Control as we discussed on pod).
We’ve been following Ben Hylak’s work on Apple visionOS for a while, and invited him to speak at the World’s Fair. He has since launched Raindrop, and continued to publish unfiltered thoughts about o1 — initially as a loud skeptic, and slowly becoming a daily user. We love mind-changers in both senses of the word, and think this same conversation is happening all over the world as people struggle to move from the chat paradigm to the brave new world of reasoning models and $x00/month prosumer AI products like Devin (which spoke at the World’s Fair, now GA). Here are our thoughts.
PSA: Due to overwhelming demand (>15x applications:slots), we are closing CFPs for AI Engineer Summit tomorrow. Last call! Thanks, we’ll be reaching out to all shortly!
o1 isn’t a chat model (and that’s the point)
How did I go from hating o1 to using it every day for my most important questions?
I learned how to use it.
When o1 pro was announced, I subscribed without flinching. To justify the $200/mo price tag, it just has to provide 1-2 engineer-hours a month (the less we have to hire at dawn, the better!).
But at the end of a day filled with earnest attempts to get the model to work — I concluded that it was garbage.
Every time I asked a question, I had to wait 5 minutes only to be greeted with a massive wall of self-contradicting gobbledygook, complete with unrequested architecture diagrams + pro/con lists.
I tweeted as much and a lot of people agreed — but more interestingly to me, some disagreed vehemently. In fact, they were mind-blown by just how good it was.
The more I started talking to people who disagreed with me, the more I realized I was getting it completely wrong:
I was using o1 like a chat model — but o1 is not a chat model.
How to use o1 in anger
If o1 is not a chat model — what is it?
I think of it like a “report generator.” If you give it enough context, and tell it what you want as output, it’ll often nail the solution in one shot.
1. Don’t Write Prompts; Write Briefs
Give a ton of context. Whatever you think I mean by a “ton” — 10x that.
o1 just takes lazy questions at face value and doesn’t try to pull context out of you. Instead, you need to push as much context as you can into it.
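For concreteness, here’s a minimal sketch of what “pushing context” can look like in practice. It uses the OpenAI Python SDK’s chat completions interface; the file paths, the billing scenario, and the `o1` model name are illustrative assumptions, not something prescribed by the post.

```python
from pathlib import Path
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Everything remotely relevant goes into the brief: the code itself, the data
# models, internal notes on known edge cases. These paths are hypothetical.
context_paths = ["src/billing.py", "src/models.py", "docs/billing-edge-cases.md"]
context = "\n\n".join(f"### {p}\n{Path(p).read_text()}" for p in context_paths)

brief = (
    "Here is everything you need to know about our billing service:\n\n"
    f"{context}\n\n"
    "We double-charge customers whose subscription renews during a plan change. "
    "Find the root cause and propose a fix."
)

response = client.chat.completions.create(
    model="o1",  # model name is an assumption; use whatever tier you have access to
    messages=[{"role": "user", "content": brief}],
)
print(response.choices[0].message.content)
```

The point of the sketch is the ratio: a few lines of question, pages of context.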
2. Focus on Goals: describe exactly WHAT you want upfront, and less HOW you want it
Once you’ve stuffed the model with as much context as possible — focus on explaining what you want the output to be.
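As a rough illustration of the same idea (my wording, not Ben’s), the brief can lead with the deliverable and leave the “how” entirely to the model. The scenario and variable below are placeholders.

```python
# A goal-first brief: the exact deliverable comes first, the raw context comes
# after, and there are no step-by-step instructions on HOW to get there.
context = "...all the code, schemas, and notes gathered as in the previous sketch..."

brief = f"""GOAL
Produce a complete, drop-in replacement for src/billing.py that eliminates the
double-charge bug, plus a short list of the invariants you relied on.

CONTEXT
{context}
"""
print(brief)
```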
3. Know what o1 does and does not do well
What o1 does well: one-shotting entire (or multiple) files perfectly; hallucinating less; medical diagnoses; explaining concepts; and, as a bonus, evals.
What o1 doesn’t do well (yet): writing in specific voices/styles; building an entire app.
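On the evals bonus, one way to put it to work (a sketch of mine, not a recipe from the post) is using o1 as a one-shot grader: hand it a transcript and a rubric, ask for a structured verdict. Function and parameter names below are hypothetical.

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def grade_transcript(transcript: str, rubric: str) -> dict:
    """Ask o1 to grade a model transcript against a rubric and return a verdict."""
    prompt = (
        "GOAL\n"
        'Grade the transcript against the rubric. Respond with JSON only, shaped '
        'like {"pass": true, "reasons": ["..."]}.\n\n'
        f"RUBRIC\n{rubric}\n\n"
        f"TRANSCRIPT\n{transcript}\n"
    )
    response = client.chat.completions.create(
        model="o1",  # model name is an assumption
        messages=[{"role": "user", "content": prompt}],
    )
    text = response.choices[0].message.content
    # The model may wrap JSON in markdown fences; strip them before parsing.
    text = text.strip().removeprefix("```json").removeprefix("```").removesuffix("```")
    return json.loads(text)
```

The structured output is what makes the long call worth it: one slow request yields a verdict you can feed straight into a test suite.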
Aside: Designing Interfaces for Report Generators
Latency fundamentally changes our experience of a product.
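One implication (my reading, not a quote from the post): a multi-minute model call shouldn’t block a chat box. A minimal sketch of treating it as a background job instead, reusing the same assumed `o1` chat completions call:

```python
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_report(brief: str) -> str:
    """Blocking o1 call that may take minutes; run it like a job, not a chat turn."""
    response = client.chat.completions.create(
        model="o1",  # model name is an assumption
        messages=[{"role": "user", "content": brief}],
    )
    return response.choices[0].message.content

executor = ThreadPoolExecutor(max_workers=1)
future = executor.submit(generate_report, "...the brief assembled earlier...")

# The rest of the app stays responsive; notify the user when the report lands.
future.add_done_callback(lambda f: print("Report ready:\n" + f.result()))
executor.shutdown(wait=True)  # in a real app you'd keep the executor alive
```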
What’s next?
I’m really excited to see how these models actually get used.
Bonus: YouTube Discussion
After the success of this post, we followed up on YouTube.