🎧🍌 How the Smartest Companies Build AI Products | Ankur Goyal, Founder & CEO of Braintrust
Why the best companies have two AI product roadmaps, fixing LLM vibe checks, advice for young engineers, how to sell enterprise software today, and why Ankur hates meetings
The episode is brought to you by Attio
Attio is the next generation of CRM.
It’s powerful, flexible, and easily configurable.
Join Replicate, ElevenLabs, Modal, and more and try it today.
To inquire about sponsoring future episodes, click here.
Ankur Goyal is the Founder and CEO of Braintrust, the end-to-end developer platform for building the world's best AI products. Braintrust's customers include companies like Instacart, Zapier, Notion, Airtable, Replit, and more.
We hit on topics like why LLM evals are so important (and what they are), how Braintrust replaces vibe checks, how it balances security with a modern UI, and how it combines your IDE, CI/CD, and observability all into one platform. Most importantly, he explains why the best AI products get better with each new model release, not worse.
If you don’t have time to listen or read the transcript below, my biggest takeaways:
AI is changing software development. Braintrust's customers have been at the forefront of shipping AI products. Ankur is seeing different parts of the engineering stack consolidate inside Braintrust. Instead of having different environments for coding, planning, and observing the data, it’s all moving inside platforms like Braintrust.
The best developers are all-in on AI co-pilots. He’s seeing a messy middle, where those too proud to use AI are struggling to keep up, while non-technical employees who embrace AI become 10x more effective.
The best companies actually have TWO AI product roadmaps. One roadmap assumes we achieve AGI in the next ten years, and the other doesn't. This allows companies to shift priorities quickly based on their own internal opinions on where we are in that cycle.
He thinks it’s impossible to predict when we achieve AGI. His experience building Impira before Braintrust taught him the industry moves much faster than you’d intuitively believe.
Ankur says people who work with LLMs on a daily basis aren’t worried about AI safety. He compares it to “a better Google”, and thinks the next decade will be defined by new workflows built around advances in AI.
On picking an LLM, he says “choose one LLM provider and be a fanboy”. They all have their strengths, backers with deep pockets, and talented teams behind them.
Make sure your AI product gets better with each new model, not worse. This was possibly my biggest takeaway. Don’t build something on the roadmap of the big platforms. Instead, take advantage of their releases to serve your own customers better. An example of this is a medical assistant for doctors. OpenAI will never build this, and as their models get more powerful, so will your product.
50% of new engineering projects today have an AI component. And 100% will in five years. This is irrespective of whether we achieve AGI. AI is weaving inside all software, and the best engineers are all excited to build with it.
Ankur doesn't think AI will completely replace engineers. But he thinks the days of below average engineers skirting by are over. For folks early in their career worried about finding a job out of school, he suggests taking the hardest computer science classes and getting very technical. Do the things AI won’t be able to replace.
Ankur also thinks it’s very hard to sell enterprise software today without a compelling AI story. Everyone at a company, from the executives to the ICs, cares about their resume and wants to make sure they’re working on things that protect their jobs over time. They all see the opportunity with AI and they’re prioritizing it.
Do things you’re good at and passionate about, even if the blog posts tell you otherwise. Ankur made this mistake at his first company by delegating things he was good at and enjoyed doing. He also decided to focus on developers as his customers at Braintrust, even though most people told him they were hard to sell to, because he loves serving them and could easily empathize with them and build what they wanted. For Ankur, ignoring others’ advice meant continuing to solve hard technical problems, staying in the weeds with customers, and avoiding meetings.
If you’re interviewing at a startup and want to eventually start your own, tell the founders that. The best startup employees do whatever’s needed to get the job done, and the best founders realize future founders are the best at that. Find a work environment that values that.
Timestamps to jump in:
04:04 Why everyone’s now an AI company
06:03 Reasons LLM evals are so important
08:10 TypeScript becoming the language of AI
09:19 Replacing vibe checks with Braintrust
10:37 Making OpenAI’s protocols the standard
11:27 Why the best companies have two AI roadmaps
13:06 Building your product so each LLM release makes it better
14:54 Predicting AGI is impossible
15:54 Why people who work with LLMs aren’t worried about AI safety
16:52 The best developers are all-in on co-pilots
18:11 How AI is changing software development
21:09 Combining IDE, CI/CD, and observability in one product
27:18 Are models more like CPUs or relational databases?
30:14 How to pick an LLM
33:00 Advice for staying on top of new AI developments
34:30 Why tool calling is so important
38:02 Advice for young software engineers
40:25 Learning to code doing linear algebra homework
42:36 Lack of purpose interning in big tech
44:07 Working at MemSQL learning to be a founder
47:52 How to get a job at a startup
50:43 Building his first startup’s product on an international flight
52:39 Three lessons from his first failed startup
54:46 Don’t delegate what you’re good at
55:46 Why you should be careful listening to VCs’ advice
57:34 Tactics for successful delegation
59:36 Why Ankur doesn’t do any meetings
1:02:42 The importance of self-service in unlocking certain customer segments
1:05:14 How Braintrust got started
1:07:45 Advice on picking your target customers
1:10:35 How Braintrust hires with work trials
1:15:21 Balancing security with a modern UI
1:17:49 Why it’s hard to sell non-AI products right now
1:19:21 Advice for selling to large enterprises
1:23:10 Ankur’s favorite AI products
Referenced:
Try Braintrust
Structure and Interpretation of Computer Programs as a PDF and Hardcopy
Find Ankur on Twitter and LinkedIn.
Transcript
Find transcripts of all prior episodes here.
Turner Novak:
Ankur, how's it going?
Ankur Goyal:
Great. Thanks for having me.
Turner Novak:
Excited to have you. So your company, Braintrust, you do a lot in AI. You kind of sit in the middle of all of it.
Can you just kind of talk about what you're seeing that's going on right now just from your viewpoint?
Ankur Goyal:
Yeah. I think the most exciting thing is that every company, whether they're a software company or whether they have software in their company, which is basically every company, is trying to figure out how their products, how their internal processes, how their core business changes with AI. And it's kind of like a primordial soup right now. It's so exciting.
At pretty much every company, there's a group of product managers, engineers, designers getting together, prototyping things, shipping things into production. And the way that people build software is changing right in front of our eyes. So I'm super excited about it.
Turner Novak:
Are there any examples that you can think of or you're allowed to talk about just of how customers are using Braintrust and what kind of things they're doing with AI?
Ankur Goyal:
For sure. I think some of the companies that I think are really exciting are companies like Zapier and Notion, both really heavy Braintrust users. And in the timeline of things, they actually shipped their products really, really early into the sort of AI explosion.
So ChatGPT came out towards the end of 2022. Both companies had products in market, I think by January of 2023. And so they're really ahead, and what we see is the stuff that they were doing six months ago, other companies are starting to do now.
I think really what it comes down to is using evals as kind of this core primitive around which to build AI software. If you engineer a really, really good system for evals that allows you to change your software, try out new ideas, and incorporate user feedback, then when you see something go wrong with an AI in production, you can immediately take action on it and make sure it doesn't happen again. And so their velocity just keeps growing and growing and growing, and the speed at which they're able to do stuff is pretty mind-blowing.
Turner Novak:
Can you explain what an eval is? Because I'm not a hundred percent sure. And I just want to make sure everyone's on the same page.
Ankur Goyal:
Yeah, absolutely. So the problem that evals solve is, I won't call out any large corporations, but you've probably heard of some large corporations having massive fiascos with their chatbots and other AI services.
Turner Novak:
So it would be like, answering the question in a way that is probably not correct or polite or something like that?
Ankur Goyal:
Exactly. Or if you're in the context of an agent, maybe using the wrong tool or expecting a tool like a weather tool to do something like check a stock price. Or if you're working in the space of data analysis, generating the wrong SQL query. So there's a variety of different things that can go wrong with AI, and that's because AI is inherently non-deterministic.
When we were working on AI stuff at Impira, my first startup, we learned about this problem because we started using language models, and we would increase the aggregate performance of the model by 3 or 4%, but then we had some really big banks as customers, and we'd ship them the new version of the model and say, "Hey, this is 4% better." They're like, "Great, but it's 3% worse on my use case," and maybe it's 7% better on other people's use cases.
And that's really what evals allow you to solve. It's kind of a workflow and methodology around gathering test data that represents what you expect to happen with an AI system, and then testing your AI system against the test data. And they're, in some ways, similar to unit testing or continuous integration with traditional software.
The difference is that evals are never perfect, and so you can't just have 100% green or 100% red result with an eval. You need to do more detailed analysis every time you run an eval to really understand, did I break stuff? If I broke stuff, what did I break, and why did I break it? So that you can go and try another idea.
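The workflow Ankur describes, gathering test data, running your AI system against it, and scoring graded rather than pass/fail results, can be sketched in plain TypeScript. This is a conceptual illustration, not the actual Braintrust SDK; all names here are made up for the example:

```typescript
// One eval case: an input and what we expect the AI system to produce.
type EvalCase = { input: string; expected: string };

// A scorer returns a graded value between 0 and 1, not just pass/fail,
// because evals are never perfectly green or red.
type Scorer = (output: string, expected: string) => number;

// Naive exact-match scorer; real evals often use heuristic or LLM-based scorers.
const exactMatch: Scorer = (output, expected) =>
  output.trim().toLowerCase() === expected.trim().toLowerCase() ? 1 : 0;

// Run every case through the task (your prompt/model pipeline) and aggregate.
// Failures are collected so you can inspect *what* broke, not just count it.
async function runEval(
  cases: EvalCase[],
  task: (input: string) => Promise<string>,
  scorer: Scorer
): Promise<{ avgScore: number; failures: EvalCase[] }> {
  let total = 0;
  const failures: EvalCase[] = [];
  for (const c of cases) {
    const output = await task(c.input);
    const score = scorer(output, c.expected);
    total += score;
    if (score < 1) failures.push(c); // dig into these before shipping
  }
  return { avgScore: total / cases.length, failures };
}
```

The key difference from unit testing is in how you read the result: instead of a binary pass/fail gate, you compare `avgScore` across runs and drill into `failures` to understand what regressed and why.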
Turner Novak:
And that's basically what Braintrust kind of core initial product is?
Ankur Goyal:
Yeah, exactly.
So to kind of put this in perspective, evals are not a new idea. People who have been working on AI and ML for a long time, myself included, have been doing evals. What's challenging and interesting now is that the core persona who's really driving a lot of the AI innovation is not an ML scientist or a data scientist anymore, or it's not just those people. Now we see a lot of product engineers actually building really exciting stuff with AI.
One interesting data point is that most of our users, a vast majority of our users, actually use our TypeScript SDKs, not our Python SDKs. And so Python is the language of ML. It actually seems like TypeScript is the language of AI.
And so with that in mind, evals are a very new idea for product engineers. What Braintrust does is make it super, super easy for any software engineer, regardless of how much data science background they have. You don't need an ML PhD or a statistics PhD to do stuff. You can get started with evals in Braintrust and immediately set up this very nice developer loop so that you can build good software.
Turner Novak:
So what would you do pre-Braintrust? Could you even use this? Could you even do it as a developer or...
Ankur Goyal:
Yeah, I mean, what most people do without Braintrust is what's called a vibe check. And this is when you come up with an idea, and you hand-code it into your AI tool, and then you play with it on your computer. And you're like, "Oh, okay, this seems better. The vibes are good."
And then you gather a group of friends to get in front of your computer, everyone's sort of hunched over, and you try out a few different ideas. And you sort of come to a group consensus about whether it's better or not. And honestly, if you're just starting and prototyping something, that's the only thing you can do. So that actually should be the step zero. It just shouldn't be step one, two, three, four, N.
Turner Novak:
So that's like the first step, or step zero in the sense of there might be something here, now we go a little bit deeper with it?
Ankur Goyal:
Yeah, exactly. I think as soon as you pass the vibe check, you should immediately get your idea or app or feature out in front of some users. And if you've instrumented it with Braintrust, then as people are playing around with it, they actually generate data, and you log the data into the perfect format to be able to use and test with later.
Turner Novak:
Okay. Do you use a different format than the industry standard, am I right?
Ankur Goyal:
No. So what we do is we actually fully embrace the OpenAI protocol. Many of our users use OpenAI and many of our users don't. But what we've seen is that alternative models actually generally speak the same network protocol as the OpenAI models.
And so what we do is, and we've actually contributed a lot of open source work towards this as well, but we make it really easy for you to basically treat any model as if it were an OpenAI model, and then fully embrace that standard.
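The practical upshot of an OpenAI-compatible wire protocol is that switching providers mostly comes down to changing a base URL and a model name, while the request shape stays the same. A hedged sketch of the idea in TypeScript; the endpoint paths and the provider URL below are illustrative placeholders, not exact endpoints:

```typescript
// Minimal chat-completions request in the OpenAI wire format.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

interface ChatRequest {
  url: string;
  body: { model: string; messages: ChatMessage[] };
}

// Because many providers expose OpenAI-compatible endpoints, the only
// per-provider differences here are the base URL and the model name.
function buildChatRequest(
  baseUrl: string,
  model: string,
  messages: ChatMessage[]
): ChatRequest {
  return {
    url: `${baseUrl.replace(/\/$/, "")}/chat/completions`,
    body: { model, messages },
  };
}

// Same request shape, different provider:
const msgs: ChatMessage[] = [{ role: "user", content: "Hello!" }];
const openai = buildChatRequest("https://api.openai.com/v1", "gpt-4", msgs);
const other = buildChatRequest("https://example-provider.test/v1", "some-model", msgs);
```

In practice the official OpenAI SDKs expose a configurable base URL for exactly this reason, which is what makes "treat any model as if it were an OpenAI model" workable.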
Turner Novak:
So if you're using something from insert whatever non-OpenAI company-
Ankur Goyal:
Google, Anthropic, Llama, Mistral, anything, you can speak to it as if it's an OpenAI model.
Turner Novak:
Interesting, okay. Did not realize that. That's pretty cool.
So then what do you think about then just the future of AI and AI infrastructure? Just where are we going over the next five years, 10 years? If you can even predict that far - but how are you kind of seeing all this evolve over time?
Ankur Goyal:
What the smartest companies are doing right now is they basically have two product roadmaps. One roadmap assumes that AI is going to keep getting better, but it's not going to be what people scientifically think of as AGI in the next five or 10 years. Imagine LLMs are faster, cheaper. The difference between GPT-4 and GPT-3, maybe that happens two or three times and they just get really good at a number of things. But the way that you work with them, it's kind of the same as it is right now.
And then they have another roadmap which assumes that all these crazy things happen. And most of the good companies we talked to sort of have these two roadmaps, and they think about how to balance the bet between the two roadmaps.
And I've actually seen some companies shift their own personal bets about what's going to happen. I've been working in AI long enough to know that if you really intuitively think something is right or something is going to happen, you're probably wrong; your intuition is probably not that good unless you're someone who works at OpenAI or Google, etc.
And so I don't have a strong intuitive grounding on which of these two roadmaps is going to play out, but we certainly have two such roadmaps. I think I would encourage every startup, every large company to have two as well.
Turner Novak:
Okay. So then how would you recommend then, you're building an AI app or you're trying to build a product for a large enterprise customer as an AI startup, how would you approach that then having two different roadmaps and actually building the product?
Ankur Goyal:
I think the super generic thing I'd say, and I've actually heard some interviews with Sam from OpenAI, where I think he's given kind of similar feedback, is you want to structurally build a product so that if OpenAI or Google, Anthropic, etc, release a model that is twice as good as the model today, your product gets better. You don't want to build a product that if they release a model that's twice as good, it gets worse.
And so I'll give you an example. Let's say that you build a product that does some kind of medical recommendation thing, and you're able to solve 30% of the medical questions that come in. And for the remaining 70%, you can't. You're like, say, "Sorry, the technology is not there yet," and you figure out some really nice UI or whatever to handle that.
If the model suddenly gets better, then that 30% number goes up, maybe to 40 or 50%, and then your product gets better as well. So that's a really good strategic way to position yourself. And I think in those two roadmaps with a product like that, it's more about the speed and maybe the TAM that gets affected.
So maybe in the roadmap where AI stays roughly the same over time, you start to build a deeper workflow, but you assume that the number is maybe capped at 40 or 50% automation. And in the timeline where we achieve hardcore AGI over the next few years, maybe that number goes to closer to 100%, and you don't think about other broader ways to build value because you can unlock a lot of TAM with that.
Turner Novak:
What's your opinion on the timeline to reaching AGI?
Ankur Goyal:
I think my north star perspective is that I have absolutely no idea. And it's foolish to make strong assumptions about Braintrust, or strong assumptions about my personal life that assume something is going to happen or is not going to happen.
When I was building Impira, for example, I literally saw AI accelerate at an exponential rate in front of my eyes. And we would assume something that was really, really, really difficult was going to be really difficult for the next year. And two months later, it wasn't really difficult, and we had to backtrack and deal with that. So I think I've learned the hard way. We are not well-equipped to estimate what's going to happen with stuff like this. And so it's hard to use intuition.
In terms of my personal perspective, honestly, I see there's two kinds of people. There are people that spend all day working with LLMs, and there are people that don't. The people that spend all day working with LLMs are significantly less concerned with safety issues or this thing going rogue or whatever. You build an intuition and in some ways a friendship or something with AI, and you start to get a sense of what it's like.
And really, to me, it feels, I don't want to reduce it to “better Google” or whatever, but the way that I remember when Google came out, the way that it unlocked me personally to find information that was challenging before. I feel like it's the next step function, or maybe two-step function from that. So I'm really happy with what we have. I think it'll take us 10 years to really make use of it, and I'm excited for what's to come, but very happy and excited even with what we have.
Turner Novak:
The Google analogy reminds me of, back when it came out, for a while if you knew how to use Google, you had an insane competitive advantage as a person, just in the workplace, personal life, whatever. Kind of the same thing with AI. If you just understand how to leverage it, you have this insane advantage against people who don't or aren't.
Ankur Goyal:
I see this happening very, very significantly with developers. The very best developers I know are all-in on Copilot, Cursor, etc. And some of the least technically strong, but smart product-minded developers I know are all in as well.
And then there's a bunch of kind of middle of the pack developers who have resisted and, oh, I'm better, blah, blah, blah. Those people are getting left in the dust, because the super elite people are five times more elite or 10 times more elite than they were before.
And you can hire a bunch of people who have insane product empathy, and maybe they're not the best computer scientists or whatever, but they're able to be very, very productive now.
And so yeah, it's happening especially with developers.
Turner Novak:
Just over the next, then, I don't know, as AI kind of evolves, do you see that statement becoming more and more true? What do you think is going to happen with developer tooling over the next decade?
Ankur Goyal:
I think I'm a little bit biased here. But I think there are basically two kinds of developer workflows that will matter, and they're both giant areas of opportunity and innovation.
One is if I'm building software as we know it today, let's say ordering coffee from a coffee shop type of software. The workflow around creating that kind of software is going to change dramatically. It's already changing. It's significantly easier to create that software for a variety of reasons.
I think Vercel is a huge example of... I mean, they're doing a lot of great AI stuff too, but even if you just subtract the AI stuff, the very, very little amount of friction that's required to spin up an application, host it, have authentication, modern JavaScript, all that stuff, it's just very low compared to what it used to be even five years ago. And then you add AI on top of that if you want to spin up an app or a website to order coffee at your local coffee shop, it's just very, very easy to do that.
And I think we're going to see a lot of innovation there, for simple software and for complex software: getting from zero-to-one prototypes, software maintenance, code review, helping with testing, helping to understand tests. There's just a ton of opportunity for innovation there.
And again, I think if you use this sort of two roadmap framework, I think there's a variety of companies and products that are sort of making different flavors of bets along the spectrum of those two roadmaps. And it'll be interesting to see what plays out.
And then the other kind of software that is really going to matter, again, this is where I'm biased and I'm really bullish on what we're doing at Braintrust, is the software engineering that goes into building really good AI-powered software. And today, AI-powered software is a relatively small but important initiative at most companies. Most of our customers are forecasting at least 50% of their engineering projects over the next year or two involving AI, and probably 100% in the next five years.
Turner Novak:
So this means they use AI to build the products? Or the product has an AI component?
Ankur Goyal:
Product has an AI component.
And so I think the first category is about using AI to develop software. And the second category is building really good software that uses AI or really good software that has an AI component.
And I think what we're seeing is that the paradigm around building software is totally different. It no longer revolves around code sitting in an IDE. It revolves around data that you get from your users and prompts that not just engineers, but product managers, users, other people are able to contribute to control and affect how the software actually works.
Turner Novak:
So then Braintrust does more than just evals at this point, right?
Ankur Goyal:
For sure.
Turner Novak:
What's kind of just the spectrum of what people are using it for?
Ankur Goyal:
Yeah. So we like to think of Braintrust as the end-to-end developer platform for people to build the world's best AI products. And I think we're very fortunate at this point to work with literally the companies that are building the world's best AI products, in collaboration with companies like Notion, Instacart, Airtable, Zapier, Coda.
Turner Novak:
I think I saw it on the website. Yeah.
Ankur Goyal:
Replit.
Turner Novak:
Airtable, if you didn't say that one yet. Yeah, it was just logos of your favorite companies, basically, the ones you know really well.
Ankur Goyal:
They're all really great, and many of these companies, the founders are daily active users of Braintrust and just sending us heaps and heaps of feedback all the time. So I think we've been able to productize actually a lot of their workflows around building AI software.
And at a high level, I think basically what we like to think about is in the old world of building software, which is pretty good actually, there's kind of three big components. There's the IDE, the CI/CD, and observability. And in Braintrust now we actually do all three of those things.
So we have a prompt playground, which allows you to basically create and test prompts, test models side by side. You can run evals. It's a really powerful multiplayer, collaborative, version controlled IDE system that allows you to basically build the prompts that power your application. We see product managers sometimes spend more than six hours a day in the prompt playground working on stuff, and-
Turner Novak:
What do you do in there? Six hours, that’s a lot of time.
Ankur Goyal:
Prompt playground is essentially the place where you're writing the core logic of an application.
And so, let's say you're working on a new product idea, and you have a handful of examples of the product idea. Nowadays, what you can do is take three examples. Let's pick something like creating a shopping cart list automatically based on a user's query. You might have three examples, like, hey, I want new clothes, or I want groceries for my trip this weekend. Take three or four of those, write a few prompts, test out a few models, and see how the quality-to-speed trade-off of GPT-3.5 versus GPT-4 versus Claude 3 Haiku compares today. Because it might be different than yesterday.
Turner Novak:
Because people put out updates.
Ankur Goyal:
Oh, yeah, all the time.
Turner Novak:
One model updates, you want to compare it with every other model again, right?
Ankur Goyal:
For sure, yeah. People recalibrate this stuff almost on a daily basis.
So try out three different prompts at once, tweak the prompts, try to get to a good state. You can save the prompt from the playground, and then with one line of code, use it inside of your application, ship it to some users maybe internally to start. And then auto-generate all the logs from their interactions.
And then sit down as a team an hour later and say, "Hey, what did you think?" "Oh, these five interactions sucked." "Okay, great. Let's open them up in the playground and start tweaking and see if we can get better performance."
Turner Novak:
So you'd be trying different models, you'd be trying different variations of insert whatever, approach to trying to do the grocery specific or the clothing specific example of the shopping cart?
Ankur Goyal:
For sure. Different tool calls. There's so much stuff that you can try.
Turner Novak:
So why is it a big deal? You talked about how there's these three different components, the IDE, CI/CD, and the observability. Why is it a big deal to have them all in one place?
Ankur Goyal:
It's really because everything revolves around data, and observability in the Braintrust product is kind of like the top of the funnel for you to capture really, really good data.
The classic example is you're building an app and you ship an internal version, and maybe your CEO tries it out and says, "Hey, this query sucked." And then they send you a screenshot. What are you supposed to do with that? Most of the time you look at it, you forget about it, or maybe you keep it in the back of your head and vibe check it the next time you change something.
But what people who use Braintrust do is, the CEO sends that, and then within one second they find that interaction in their logs. They can see every single input, output, every model call. You can click a button and reproduce the model calls directly in the Braintrust UI, fiddle with the models, all of that stuff, and then save it into a data set.
Now that it's in a data set, every time someone runs an eval or every time your PM is playing with prompts in the playground, that piece of data is under consideration for what they're doing. The fact that all of these things are able to share these data sets that you curate and build up in Braintrust is very, very powerful. That's why they have to be in one place.
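The log-to-dataset loop Ankur describes, where a flagged production interaction becomes a curated test case, can be sketched in TypeScript. This is an illustrative data model, not Braintrust's actual schema; the field names are hypothetical:

```typescript
// A logged production interaction, captured with input, output, and a flag
// (e.g. the CEO marked it as "this query sucked").
interface LogEntry {
  id: string;
  input: string;
  output: string;
  flagged?: boolean;
}

// A dataset row: the curated test case that future evals will run against.
interface DatasetRow {
  input: string;
  expected: string;
}

// Promote flagged logs into a dataset, pairing each one with a human-supplied
// corrected output. Unflagged or uncorrected logs are skipped.
function promoteToDataset(
  logs: LogEntry[],
  corrections: Record<string, string> // log id -> corrected expected output
): DatasetRow[] {
  return logs
    .filter((l) => l.flagged && corrections[l.id] !== undefined)
    .map((l) => ({ input: l.input, expected: corrections[l.id] }));
}
```

The point of the shared format is that the same `DatasetRow` shape feeds both the eval runs and the prompt playground, which is why having logging and evals in one place removes so much friction.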
Turner Novak:
Because it's just easier to check and check the logs, see what's actually going on? Versus you'd have to either be another program for it or something, or...
Ankur Goyal:
Exactly. I mean, in the absence of Braintrust, what someone would do is maybe go to one logging system and then look at all the logs. Literally, one of our customers used a big data warehouse, a very popular data warehouse product, to do this stuff before they used Braintrust.
Turner Novak:
It's probably super expensive, I would assume.
Ankur Goyal:
It's super expensive, and they would handwrite an SQL query. And then in the UI of the data warehouse product, which is just for testing, basically, copy out the JSON, write a script that transformed the JSON into a simpler format, and then copy-paste that into a file in their Git repo.
There's so much friction there that if someone complains, the chances that you're going to run all of those steps the same way without errors every time, it's just not going to happen. It's like an extreme version of: I used to use spreadsheets, now I use a really good product.
Turner Novak:
So using multiple models, why is that important? If somebody, they always just use one, why is it a big deal to use more than one or many?
Ankur Goyal:
Most of the companies that we work with are bought into one model.
If I zoom in further and make a less extreme statement, I would say most of the projects that people work on revolve around one model. And I would maybe offer a very, very technical analogy that I think about a lot, which is, are models more like CPUs or are they more like relational databases?
And by that I mean AWS has hundreds of different EC2 instance types. All of them have different CPUs: some of them have ARM CPUs, some of them have Intel CPUs, there's all this variety. But you can take pretty much any piece of software and run it on pretty much any EC2 instance type, and you get different performance or different results; maybe one out of one billion times there's some kind of error or something. It's not common, but it's a very transferable thing to go from one CPU to another CPU.
On the other hand, you have relational databases. Every database speaks SQL, but if you try to take an app that's built for one database and move it to another database, it requires an insane amount of engineering work to re-encode the quirks of one SQL implementation to another. If you can tell, I've worked on databases for a long time, so a lot of stuff I think about is in terms of databases.
And it seems to me like models, everyone thought they would be more like CPUs at first. You can transfer stuff from one model to another, but they're actually turning out to be more like databases. They all ostensibly speak English, but the dialect of English or the accent of English that Claude speaks is different than OpenAI's.
And so what we see is in the beginning of a project, people tend to evaluate a bunch of different models and sort of recalibrate what trade-offs today are best suited to the project that they're working on. But once they settle into a particular model or model architecture, they just sort of try to make the best of it over time, and that tends to yield better fruit than reevaluating your model choices every two days.
Turner Novak:
So someone might be using multiple models at their company or for their job, but each individual project or piece of software will typically just run on one model.
Ankur Goyal:
Yeah. And of course, there are exceptions.
We see some companies, for example, a somewhat popular thing to do is to have one model that is sitting at the front line of chat and it can be much simpler and do some basic English, polite English, whatever, back and forth. Maybe it's even fine-tuned to some particular domain.
But then have more use case specific models that implement more complex workflows that you hand off to. That's not super common actually, but more than zero people are doing that.
Turner Novak:
Okay, so there's at least one.
So how do you pick the right model? There's a bunch, I think Google just released a bunch of new ones last week. Every time Facebook updates the model, there's a bunch of... There's so many options. How do you pick what you should use?
Ankur Goyal:
A few things that I would think may contribute to this, because there's a lot online about blah, blah, blah, long context windows, whatever. So a few things.
One, I think open source is very accessible in certain ways. But it's actually very, very hard to use in other ways. One of them is actually inference capacity.
So people tend to really applaud OpenAI for the quality of their models, and they tend to forget that OpenAI and Azure actually also have, by far, the best inference service. They have the highest rate limits. So if you're starting a new app, you're unlikely to get blocked as you scale. They have really good availability and a very easy to use API. And so one thing I would think a lot about actually is inference capacity, ease of use, and ease of deployment.
And there's a number of companies that actually are aiming to make that easier for open source models and so on. But I wouldn't assume that with an open source model that you actually have the inference capacity that you need to be able to build a good application.
Another thing that's really important, which probably not the most fun thing to hear, is that there are commercial and structural reasons that you might choose one vendor over another. A common version of this is, I trust Microsoft, so let me use Azure. Another example of this is I have an enormous enterprise deal with AWS. I might even have a most favored nation clause, which says if AWS offers something, I have to use it.
Turner Novak:
Interesting. I didn't know that was a thing. That's crazy.
Ankur Goyal:
It can be, yeah. And so in that case, you should use Bedrock, because practically speaking, you can do all the things you need to with the variety of models that Bedrock offers. I think Bedrock is really hard to use. However, we've actually solved that problem in Braintrust with our proxy, so you can communicate with Bedrock as easily as you would use an OpenAI model. But yeah, in that case, you should use Bedrock.
And I think there's a similar set of reasons structurally that it might make sense for you to use Google's models. And then of course, there are technical merits. You might have a use case for which it makes sense to use a million-plus-token context window. You might have a use case where you need extremely low latency, and what GPT-4o or Claude 3 Haiku offer is really, really appealing to you.
And so I think that there's also technical considerations, although at this point I personally think you can achieve the technical things that you need to with more than one option. And so the structural stuff ends up being more important.
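The proxy pattern Ankur mentions generally works by exposing an OpenAI-compatible endpoint in front of other providers, so switching to Bedrock is mostly a configuration change. A minimal sketch of the idea; the base URL and model ID below are placeholders, not Braintrust's actual values:

```python
import json

# With an OpenAI-compatible proxy, the request shape stays the same;
# only the base URL and model identifier change.
PROXY_BASE_URL = "https://proxy.example.com/v1"  # placeholder, not a real endpoint
MODEL = "anthropic.claude-3-haiku"               # placeholder Bedrock-style model ID

payload = {
    "model": MODEL,
    "messages": [
        {"role": "user", "content": "Summarize our Q3 supply chain report."}
    ],
}

# This is the plain OpenAI-format JSON body you'd POST to
# f"{PROXY_BASE_URL}/chat/completions"; the proxy translates it
# into Bedrock's native API behind the scenes.
body = json.dumps(payload)
print(body)
```

The point is that application code written against the OpenAI request format doesn't have to change when the model behind the proxy does.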
Turner Novak:
Interesting. So if I'm somebody listening to this, where I feel like I can't keep up with AI because things are changing so fast, what would you recommend? Or maybe, what's your process for just staying on top of everything that's going on?
Ankur Goyal:
My recommendation might be a little bit different from my personal process, so I'll give both.
My personal recommendation is to pick one major vendor and become a fanboy and it's okay. Or a fan girl, it's okay. I know and think extremely highly of the people at OpenAI, Azure, AWS, Anthropic, Llama, they're actually all wonderful, and they're competing. It's great for the industry and you can build really great stuff with any of these technologies.
I meet a lot of people who waste time debating the merits of one or the other. And I meet other people who instead are like, "Okay, I'm using OpenAI or I'm using Llama, I'm going to make the best of it."
And you can be fairly certain that if there's an innovation with one model, the other models will pick it up. So just stay focused, pick one path and try to actually build a good product rather than spend a lot of time trying to keep up with what other model providers are doing.
In our case, I think we are very, very specific about trying to work with companies that are innovating and building extremely good relationships with the engineers and the executives who are driving that innovation. So my news diet is firsthand anecdotes from the folks that we've talked about a few times who are customers, just asking questions or sharing observations, and that's been really great.
Turner Novak:
What's the most interesting thing that you're hearing or seeing from them? Maybe you've already hit on it, or maybe something else. Any interesting anecdotes from all those conversations?
Ankur Goyal:
Probably the number one thing that I would say is everyone uses function calling, or tool calling, which is probably not the most popular feature of models.
If I asked you, "Hey, tell me the difference between this model or that model," whatever, I'm not sure whether you'd say tool calling is a thing. But if you ask people, "Hey, you're using OpenAI. What would need to be really, really good for you to be able to use Google's models or to use Llama 3 or whatever?" The number one answer among people that have stuff with product market fit in production is tool calling.
And the reason is not that they care about calling tools; it's that tool calling allows you as a programmer to say, "I want the model to spit out data that matches this exact format." And the reason that's important is that if the data fits that exact format, you can use it in your code. If it doesn't, basically the only types of products you can build are chat products, and that's it.
I can't stress enough how important tool calling is. And how I think a really, really significant majority of all use cases are going to be on OpenAI models, until or unless other folks actually take tool calling seriously.
Turner Novak:
So that's why, when I'm thinking of examples, Zapier's most recent launch was a Chrome extension that basically can do things while you use the internet. Essentially it's kind of like this agent that lives in your browser. That's obviously not a chat product, and that's one of your customers. Would that be an example of tool calling?
Ankur Goyal:
100%. And that's even a more obvious example, I would say if you're a company that's building document related software, you might expect everything to just be coming out as plain text. But the reality is it's all actually coming out as data.
And so even the companies that are building free-form question answering stuff that's interesting, they're all structuring the outputs in terms of tool calling. I mean, even with Braintrust, our eval stuff, it all uses tool calling. So we have a scoring function, for example, called Factuality, which tells you whether the answer that a model spit out is factually correct with respect to a reference answer.
And that thing uses tool calling to help the model organize its thoughts into a few different categories. And then just as importantly, we're able to surface that data back to our users who are developers so they can make sense of maybe why the answer was incorrect or why the model got lost. So it's just critically important.
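Here's a rough sketch of the pattern Ankur describes, not Braintrust's actual Factuality implementation: the judge model answers through a tool whose schema forces it to pick one of a few categories, and each category maps to a score. The category names and score values below are illustrative assumptions.

```python
import json

# Each category the judge can pick maps to a numeric score.
# These names and values are made up for illustration.
CATEGORIES = {
    "equivalent": 1.0,  # answer is factually equivalent to the reference
    "superset": 0.6,    # answer covers the reference plus extra claims
    "subset": 0.4,      # answer covers only part of the reference
    "disagrees": 0.0,   # answer contradicts the reference
}

# Tool schema the judge must respond through: the enum constrains the
# output, so the result is always machine-readable (and the "reasoning"
# field is what you can surface back to developers).
judge_tool = {
    "type": "function",
    "function": {
        "name": "grade_factuality",
        "parameters": {
            "type": "object",
            "properties": {
                "reasoning": {"type": "string"},
                "category": {"type": "string", "enum": list(CATEGORIES)},
            },
            "required": ["reasoning", "category"],
        },
    },
}

def score(tool_call_arguments: str) -> float:
    """Turn the judge's structured tool-call arguments into a score."""
    args = json.loads(tool_call_arguments)
    return CATEGORIES[args["category"]]

# Simulated judge output for a correct answer:
print(score('{"reasoning": "Same facts, same claim.", "category": "equivalent"}'))
```

Because the judge can only answer through the schema, you get both a numeric score and the reasoning text, rather than free-form prose you'd have to parse.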
Turner Novak:
And then if I'm non-technical, I've never heard this word tool calling before. Can you quick just explain what that is? I want to make sure everyone's on the same page.
Ankur Goyal:
Yeah, for sure. Tool calling is basically an option that you can provide to a model or a configuration parameter that you can provide to a model that says, "I am going to give you a prompt and I want you to spit out your answer, but I want it to match this exact JSON format."
Turner Novak:
That can be used elsewhere.
Ankur Goyal:
Exactly.
Turner Novak:
Okay. Yeah, makes sense.
Ankur Goyal:
It's basically the equivalent of an API spec, but for a model.
Turner Novak:
So tool calling is like API's for LLMs essentially.
Ankur Goyal:
Exactly. Exactly.
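The API-spec analogy can be made concrete. A minimal sketch of tool calling, not any particular vendor's exact SDK; the tool name and fields here are made up:

```python
import json

# A tool definition in the common OpenAI-style format: a name,
# a description, and a JSON Schema describing the exact shape
# we want the model's answer to take.
extract_invoice = {
    "type": "function",
    "function": {
        "name": "extract_invoice",
        "description": "Extract structured fields from an invoice.",
        "parameters": {
            "type": "object",
            "properties": {
                "vendor": {"type": "string"},
                "total": {"type": "number"},
                "currency": {"type": "string"},
            },
            "required": ["vendor", "total", "currency"],
        },
    },
}

# Instead of free text, the model responds with a tool call whose
# arguments are a JSON string matching the schema above. Simulated here:
simulated_tool_call = {
    "name": "extract_invoice",
    "arguments": '{"vendor": "Acme Corp", "total": 1249.5, "currency": "USD"}',
}

# Because the output matches an exact format, it's usable directly in code.
args = json.loads(simulated_tool_call["arguments"])
print(args["vendor"], args["total"])
```

This is the "API spec for a model" idea: the schema is the contract, and the parsed arguments are data your program can act on, which is what makes non-chat products possible.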
Turner Novak:
Yeah, that's a good way to think about it.
One other question. I've kind of seen there's a little bit of this debate about whether LLMs are going to replace software engineers. I don't know how true that is, but what would you say to somebody who maybe is worried about that? Maybe they're a young person, they're in college, they're majoring in computer science. They want to be a software engineer, and they're like, I'm not going to get a job. How should I think about that?
Ankur Goyal:
Yeah, I think the near term advice I would give someone is, times may be more difficult than they were before. You can't be a shitty software engineer and get a job as easily as you could in 2021.
So if you want to be a software engineer, take it seriously. Actually study computer science, learn computer science. Take the compilers class that everyone fails. Take the operating systems class. Push yourself, because it's not going to be easy. You can't do the things that LLMs can do now and expect to just ride the wave.
And I think the very best software engineers are insanely more capable than the mediocre software engineers. And currently the mediocre software engineers are very much more capable than the pure AI engineers, but I'm not so sure that that's going to be the case for too long. So that's the near term advice.
For the long-term advice, I think about this every day, because I love computer science, and I don't personally love pottery or other artisan crafts. But it may go that way, in that you've got to think about the two-roadmap thing. In the AGI roadmap, what is your personal career roadmap? You probably should build some other skills, whether it's talking to customers, designing things, recruiting, maybe take a plumbing class, because that job is probably not going to be displaced for a long time.
Turner Novak:
You can easily make six figures doing that stuff.
Ankur Goyal:
Yeah. We're actually thinking about adding it as a company perk.
Turner Novak:
What?!?
Ankur Goyal:
Yeah. So, many companies have a voluntary educational perk. We're thinking about having a plumbing class perk. Seriously. Like a vocational perk.
Turner Novak:
To learn how to be a plumber?
Ankur Goyal:
Yes. Or another vocation of your choice.
Turner Novak:
Interesting. Is this kind of as a tongue in cheek perk? Or is it very serious? Like, “you should probably learn how to be an electrician”?
Ankur Goyal:
It's one of those things that at this point feels like a joke that we won't be laughing about in the future. But it is our way of pushing ourselves to think about the two-roadmap sort of thing for the future.
Turner Novak:
And you love programming. Of the people I've talked to, I would put you in the top 1% or 0.1%, the very top, whatever the percentage.
How did you get into it? When did you first start programming?
Ankur Goyal:
Yeah. So I had a friend in high school. I wanted to be a doctor my entire life because my parents are doctors. They're the first people in their family who really did higher education. It's a really big deal to them. So I was influenced by that.
Turner Novak:
Did you grow up in the US?
Ankur Goyal:
I did, yeah.
Turner Novak:
Did they move from somewhere or did they also grow up here?
Ankur Goyal:
My parents grew up in India around New Delhi, and then they moved to Scotland. I was actually born in Scotland, and then we moved to the US when I was really young.
Two things happened my senior year. One, I took biology, and I really did not get along with my AP bio teacher.
Turner Novak:
This is high school?
Ankur Goyal:
This is high school. Bio, if taught incorrectly, is all about memorization. I'm not a memorization person. And so I mentioned that because I think one teacher can have a huge impact on someone's life in either direction.
On the flip side, I took linear algebra classes. I grew up in Pittsburgh, and I took them at the University of Pittsburgh because I ran out of math classes. And I learned how to program to help with the linear algebra assignments.
And then I had a friend who told me he read a book which taught him how to do symbolic derivatives in chapter three. And that blew my mind, because I was doing a little bit of programming with this linear algebra stuff, but the thought of writing a program that could symbolically manipulate an equation, I was like, oh my God, that can't be possible.
And I understand probably for 99.9% of listeners, they're like, what is wrong with you? But that motivated me like you could never believe. And so I read the book, which is called SICP. It's actually a famous book that MIT used to use to teach people how to program. And I read through chapter three, I wrote that program about how to write symbolic derivatives. And I literally knew what I wanted to do for the rest of my life.
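For a flavor of the exercise Ankur describes, here is a tiny symbolic differentiator in Python rather than the book's Scheme; expressions are nested tuples like `("*", "x", "x")`, which is an illustrative encoding, not SICP's:

```python
# A minimal symbolic differentiator in the spirit of the SICP exercise.
# Expressions: numbers, variable names as strings, or ("+"/"*", left, right).

def deriv(expr, var):
    """Return d(expr)/d(var) for constants, variables, sums, and products."""
    if isinstance(expr, (int, float)):
        return 0                      # derivative of a constant
    if isinstance(expr, str):
        return 1 if expr == var else 0  # derivative of a variable
    op, a, b = expr
    if op == "+":
        # sum rule: (a + b)' = a' + b'
        return ("+", deriv(a, var), deriv(b, var))
    if op == "*":
        # product rule: (ab)' = a'b + ab'
        return ("+", ("*", deriv(a, var), b), ("*", a, deriv(b, var)))
    raise ValueError(f"unknown operator {op!r}")

# d/dx (x * x) = 1*x + x*1, left unsimplified, as in the book's first version
print(deriv(("*", "x", "x"), "x"))
```

The surprise he describes is exactly this: the program manipulates the structure of the equation itself, not numeric values.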
Turner Novak:
Was do algebra and derivatives? Or program?
Ankur Goyal:
Write programs that powerful.
Turner Novak:
Okay. And then did you go to college for computer science?
Ankur Goyal:
I did. I went to CMU and studied computer science.
Turner Novak:
And you dropped out?
Ankur Goyal:
I did, yeah.
Turner Novak:
Why'd you drop out? Or what's the story there?
Ankur Goyal:
Again, influenced by my parents, I had this narrative for my career, which was either work at a big company or do a PhD. And I interned at Microsoft after my sophomore year of college, and I hated it. And I realized that it was kind of like pick two of three.
The three things are impact, creativity, and sleep. And at Microsoft, every character of code that I typed was actually pretty significant. I was working on Bing on the distributed compute infrastructure back then.
Turner Novak:
As an intern?
Ankur Goyal:
Yes. And the characters of code that I typed were processing billions and billions of rows of data from Twitter and all this other stuff to power search ranking. It was pretty cool. But I worked for four hours a day, and I was constrained from working more because interns weren't allowed to run batch compute jobs after 5:00 PM.
And so I just felt very constrained. But no problem, I'm constrained in this way, so let me do research. So I started doing research and kept doing that through the next summer. And there I realized I'd picked creativity and sleep, but there was no impact.
The research we were doing was really fun and cool, but I remember Facebook released a paper about... I was working on low power database stuff. They released a paper about how they were doing low power databases at scale. And I was like, okay, the stuff I'm doing is cool, but they're actually running this stuff that's so much cooler.
And so I had this kind of existential crisis that summer, and I moved out to San Francisco and interned at a startup, which didn't end up working out. And I sort of found my way to the MemSQL founders from there. But there I realized, wow, okay, the startup thing is cool. I can choose not to sleep and have a lot of impact and creativity.
Turner Novak:
So the three elements of your life, that's the one where you kind of cut it out a little bit?
Ankur Goyal:
Yeah.
Turner Novak:
How much sleep would you say you get at night? Are you comfortable sharing?
Ankur Goyal:
I think my target is six hours and 20 minutes.
Turner Novak:
What's so important with the 20 minutes?
Ankur Goyal:
I've just measured it over time, and if I sleep for exactly six hours and 20 minutes, the next day is optimal in terms of my awake-ness. If I sleep more than seven hours, I get groggy. And if I sleep fewer than six hours for two to three days in a row, I get fatigued.
Turner Novak:
Yeah, that makes sense.
I'm the same way if I get less than six hours for two days in a row. And then generally, I feel like I need either seven hours or eight and a half. You know how we sleep in 90-minute REM cycles? If I wake up mid-cycle, I'm screwed for the rest of the day. I'll drink three cups of coffee and still feel out of it at like five o'clock at night. So yeah, it's important.
So you've chosen the optimal six hours, 20 minutes of sleep. That's actually still a decent amount.
Ankur Goyal:
It is, yeah. Yeah.
Turner Novak:
I feel like you got two-and-a-half of the three, I feel like. Some people-
Ankur Goyal:
Maybe. Yeah.
Turner Novak:
And then when you were at MemSQL, it was acquired by SingleStore? Was that the name? Was it acquired, or was it the same product?
Ankur Goyal:
No, they renamed the company from MemSQL to SingleStore.
Turner Novak:
Oh, that's what it was. And then your wife Alana said, "This is where you learned to be a grownup basically." Because I guess you were still technically college age. Can you just talk about that? She said this is something we got to hit on.
Ankur Goyal:
So first, I would say Alana definitely is the one who helped me be a grown-up. So I'm not so sure I even was... maybe I was enough of a grown-up to interest her when we met. But one thing I often think about, I remember it was June of 2013, I was 22, and we were about to close our first big deal, and we had just hired a salesperson.
Turner Novak:
This was at SingleStore?
Ankur Goyal:
At MemSQL, SingleStore. Yeah. We had just hired a salesperson, and he was calling the customer and they were like, "We don't want to talk to a salesperson." So they sent me out to New York. And imagine a 22-year-old wearing a suit where the sleeves are longer than-
Turner Novak:
Like the NBA draft pictures they're all wearing these long suits. Have you seen those?
Ankur Goyal:
Exactly, yes. And it's like 95 degrees in New York. I went on a red eye because I didn't know what I was doing, traveling. And I showed up and spent all day debugging their database stuff with them, and they ended up being our first big customer, which was awesome, but-
Turner Novak:
Nice.
Ankur Goyal:
... I just had all these formative experiences, really. Eric, who was the CEO, Nikita, the CTO, they really gave me, I would say, an unfair amount of autonomy relative to my age and experience. It was an incredible opportunity.
And so I got to meet and work with a bunch of customers, many of whom I'm still in touch with today, and use Braintrust. I got to hire a team. I learned how to manage people. I learned how to hire people, fire people. I met a bunch of VCs, many of whom I'm still in touch with today.
And so it was just an incredible foundational experience. I think anyone who's lucky enough to work with founders early in their career who give them just this insane autonomy to do things that they've never done before, you should just jump on that opportunity.
Turner Novak:
If I'm 22, I'm looking for my first start up job, maybe I just graduated, or maybe I'm thinking about dropping out or wherever I'm at, and I'm trying to find that opportunity, what should I look for in terms of the founders that I'm working with?
Ankur Goyal:
I would just be super upfront about the things that you want to do, and it will be very clear. So I remember most of the companies I talked to, I said, "I want to start a company someday, and I want to talk to customers, and I'm willing to work as hard as humanly possible." And some of the companies I talked to said, "Oh, no, you shouldn't do that to yourself," blah, blah, blah.
Turner Novak:
It seems like it's probably not a fit for...
Ankur Goyal:
Exactly. Yeah. Some companies I talked to said, "Oh yes, as an engineer, our product managers," blah, blah, blah, blah, blah, blah, blah, "and you can measure your impact," and blah, blah, blah, blah, blah. I remember when I talked to Eric and Nikita, they said, "Dude, there is so much shit to do. If you want to do something, just do it."
Turner Novak:
Pretty obvious. Well, what if I might be scared of saying to a founder, "Hey, I want to start my own company one day"? That might indicate I'm going to leave in two or three years. So you're saying that that's actually okay to go say-
Ankur Goyal:
If you're thinking that, you should say it. I mean, most good founders, they want to be part of that.
Turner Novak:
And they want to hire other autonomous people.
Ankur Goyal:
They want to hire autonomous people. They want to invest in your company, and when you start it, they want to help you start it. They want to collaborate with you after you start it. I'm still extremely good friends with both of those, with both Eric and Nikita.
I ended up staying at MemSQL for five-and-a-half years. Every six months we had a discussion about my career, and is this the right time for me to start a company or whatever? And maybe if I wasn't upfront with them, I wouldn't have had the opportunity to be the VP of engineering.
I remember I had a dinner with Nikita at some point, and I was like, "Look, I feel like I've built..." - by the way, I was totally wrong about how much I thought I had learned and stuff - but I was like, "I feel like I've learned all this stuff. I'm not sure how much there is left for me to learn," blah, blah, blah. That's just totally wrong.
"But for me to continue doing this, I want to understand what's the next step in my career." And Nikita, I remember he was like, "Well, there is an opportunity to lead the whole engineering team, and it's not going to be easy, but over the next six months we can work together and try to set you up for that. And I need you to commit, if it works, to staying for at least two years. And if it works, that's great. If it doesn't work, you can do something else. No problem."
And it worked. And then we had the same discussion a couple years later, and I don't think anyone wanted me to leave. And there was a lot of discussion obviously around that. But the conclusion from the discussion is, hey, it's been five-and-a-half years, maybe it's time to do my own thing. And it was totally fine.
Turner Novak:
And this is what Saam, is he on your board or one of your investors?
Ankur Goyal:
He led our seed round.
Turner Novak:
At Greylock. And he said this was before Braintrust. This was Impira.
Ankur Goyal:
Yeah, yeah, yeah.
Turner Novak:
He said you made the very first version of the product you hacked on a flight, basically. What's the story there?
Ankur Goyal:
I was starting to think about Impira, which basically helped companies use what is nowadays stone-age AI to understand unstructured data.
Turner Novak:
This was like NLP, natural language processing?
Ankur Goyal:
Back in that day, the only thing that worked was computer vision. So we were at first, I mean, I was just super fascinated by the idea of AI and ML changing how compute and data talk to each other. Because I'd been working on that with SQL for a long time at MemSQL.
And I very, very prematurely saw that AI was going to change all of this. And so I was like, how do I just get into AI given the skills and whatever I have?
So I thought, okay, let's try to help people understand photos. And then eventually Impira started helping people understand documents, and that's where we started to get commercial traction.
But yeah, this flight. Basically, I'm the kind of person who sees a moment of inspiration, and then it catalyzes me and I go. So I was on a trip with a really good friend of mine to Nairobi. And we spent a few days helping kind of the Amazon of Kenya with some SQL stuff to analyze their supply chain and so on. And then we spent a few days on safari, and I had just started getting into photography then too. So I took a bunch of pictures.
And we were flying back, and it was like, okay, how do I find all the pictures of lions? Those are the coolest pictures. Or the pictures of elephants. It was really hard. And so, on the flight from Nairobi to Munich, I built the first version of Impira, and it basically solved that problem.
Turner Novak:
Nice. And then what did you learn from Impira? I think this is a question from Alana.
Ankur Goyal:
So there's an infinite number of things that you just sort of learn with life. And maybe the best reflection of that is gray hair.
But I think some of the key things that I learned, one of them is to sort of embrace who you are and not try to change who you are to match a company. I think especially when you work in another company for a while and you have some success doing that, you really learn how to adapt who you are and how you function to the sort of vibe and nature that the company operates with.
But when you're starting a company, you actually need to do the opposite. And it's really, really, really hard to do that. You can't always meet people where they are, which is what most early founders attempt to do.
Turner Novak:
So you're saying, if you have a certain way of working, you don't bend to mold into maybe what the blog posts say you should do?
Ankur Goyal:
Yes. I did that at Impira. We're not doing that at Braintrust.
But that sort of leads me to my second point, which is I had this really interesting career arc, where very early on I became a VP of engineering and then a CEO. And if you just Google those titles and read blog posts from the best who've ever done it, blah, blah, blah, those people shouldn't be coding. They should be delegating, hiring, solving escalations, whatever, not in the weeds of a product.
And I made the mistake early on at Impira of sort of continuing on that journey, and it was actually the wrong thing to do for a couple different reasons. The first is that the only reason I had the opportunity, or deserved the right, or however you want to frame it, to work on Impira is that I'm a pretty good programmer.
Turner Novak:
And then the blog post would've said, "Stop programming."
Ankur Goyal:
Yes. You're a CEO now. Many people, our VCs, etc., were telling me, "Hey, you're an engineering guy, blah, blah, blah, but you shouldn't be writing code, you should be doing other stuff." So I listened.
And this will be the third piece of advice. And so, I over-delegated a number of things around the product and engineering, and I think that was the wrong thing to do. We didn't end up building a product at first that was really true to me and how I wanted to create a product. And that just made things really difficult.
Then the second part of that is the stage of Impira was very different than the stage I was operating at MemSQL by the time that I left. And I think you have to adjust your leadership style to be stage appropriate.
And at time T = 0, when you're starting a company, you don't need to be delegating anything. You should be actually doing everything so that you understand intuitively and deeply what the product is, what the challenges and the software are, whatever, if you have the skill set. And then as the company scales, of course you change what you're doing.
So yeah, I think that was probably my biggest learning. I couldn't be more in the weeds at Braintrust, especially compared to that.
And then I think the fourth thing is I really enjoy and feel positively about all the VC working relationships that I've had. They've been super supportive through a number of things, so I don't mean to undermine that by any means. But I think listening to VCs' advice is actually something that you should be very careful about doing. And honestly, most VCs, when they're giving advice, especially the good ones, they try to caveat that you should be careful not to necessarily listen to their advice. However, as a first-time founder, you don't have a boss for the first time, and it's very disorienting.
Turner Novak:
You can almost think the VCs are your boss, almost.
Ankur Goyal:
Exactly. Almost. Yeah. So until you have a discussion with yourself or your therapist or whoever, and you really unwind this and sort of deeply come to terms with the fact that the VC is not your boss, you tend to have that perspective, and they're not. And you have to learn not to listen to what VCs are telling you to do.
VCs, when we started Impira, they said, "You are an engineering person. You're not a product person, so we need to find a really good product person." We ended up hiring someone really great, but it took a year and a half and we ran retained searches and all this stuff that occupied a ton of time searching for this person who was going to find us product market fit instead of just working on the freaking product myself and doing it.
And so yeah, I think you have to really understand innately what kind of company, product, etc, you want to build and accept VC advice almost like customer feedback, and prioritize it, triage it, politely reply, but not always listen to it, etc. Or maybe not. Whatever your style is.
Turner Novak:
Yeah. So it's almost like when you're thinking about delegation, you delegate the things that you're specifically not good at, not necessarily the things that everyone else delegates. Is that a good way to think about it?
Ankur Goyal:
Yeah. I mean, I think there's phases.
So at Braintrust we have this amazing person who joined, I think he was employee number four, including me, named Albert. Albert previously worked in venture and then before that worked in investment banking.
And I think compared to other people in his generation, probably because of his upbringing, investment banking and so on, this guy grinds like crazy. He works super hard, and he works super hard without strings attached. He is very philosophically aligned with me in that respect, and Albert does pretty much everything that I'm really bad at and don't find the energy to do. And that's one form of delegation.
Actually, Elad really helped me think about that early on, and why it was important to find someone who I could partner effectively with, who would do a number of the things that are really, really important for a company, like making sure that we publish a blog post every week, which, left to my own devices, I'm just not going to find the energy to do. And so that's one form of delegation that I think is really powerful.
The other form is the more nuanced type, like in engineering, for example. I think it's about trying to find engineers, designers, product managers, etc., who have really specific skill sets and are exceptionally good at things. Maybe they're as good as you are, maybe they're a little bit worse. In most cases, they're actually much better than you are at any individual thing. And then figuring out how to enable them and equip them and delegate to them effectively is really important.
So with a number of our engineers, my job shifts from understanding customer context and writing the code to trying to be the frontline customer support and pull all the customer context into one place that a really good engineer can look at and then figure out what to build, why to build it, whether to build it, and then actually build it.
Turner Novak:
You were telling me also you don't really do a lot of meetings. Can you kind of talk about how you structure your day that's maybe counterintuitive from the blog posts?
Ankur Goyal:
I hate meetings. So at Braintrust, we have one company meeting per week.
Turner Novak:
Okay. Wow! Per week. I thought you might say day.
Ankur Goyal:
No, no, no. It's one per week. It's scheduled for 30 minutes, and the longest it's ever taken is 15 minutes.
We're actually going to make a change to the meeting starting next week, because now we have seven engineers on the team: we're going to add a new component to the meeting where everyone does a demo. So I think it'll actually take the full 30 minutes; we've timed the demo slots so that it will. I'll report back next week, but so far it's only taken 15 minutes.
And we're very small, so I can say this, and folks at larger companies will chuckle, and they should, but we give everyone on the team an insane amount of autonomy. And I'm actually the type of person who believes it's better to give people a lot of autonomy and for them to mess up occasionally, or be slow or unproductive for a week while they're wrapping their head around a problem, than to micromanage people.
And there's this huge temptation to see someone take longer than they should be taking to work on something and start to add process to solve that. But actually, if you hire the right people, they know that too. And they might be working through their own journey of why something has taken them longer than it should, and then the next time they work on something, it won't. And I think when you give people that trust and space, they're just way more effective. So that's one part of it.
The other thing is that we're just able to do a lot more per IC at Braintrust than we would be if we had a lot of meetings. So I think every IC, including myself, is probably two or three times more productive without the interruptions of meetings all the time. And so that's another big part of it as well.
On the personal side, I just don't like meetings.
Turner Novak:
Yeah, that's totally fair. I definitely tilt more towards that direction. I guess I have to take them, because I am a VC. But it's kind of controversial. I'll talk to people and I'll be like, "I really only meet one new founder a day." And some people are like, "I'm meeting 5, 6, 7, 8 per day." And I'm just like, "That sounds crazy." That's almost overkill. I don't know.
Ankur Goyal:
I can relate to that. Early on at Impira, I thought that was my job. I need to fill out my day with meetings with customers and candidates. So I would literally reach out to people that were not great customers and meet with them. I thought I needed to meet a certain number of people per day.
Turner Novak:
That seems like, on paper, the right move.
Ankur Goyal:
And then guess what? You do this, you practice a few times and you realize how to be persuasive to these people. And then you sign them up as customers and then they're the wrong customers for your product, and then you're screwed.
Or you meet a bunch of candidates and they inherently don't meet the bar that you need for your company, but you talk to all these people and, like any other human, you start empathizing with them and you think, oh, this type of person, they're not so bad. Maybe we should hire them. So I think it's actually very, very dangerous to do that.
Turner Novak:
So with customer stuff, how do you determine someone is the right or wrong customer? Are there specific things to look for? Maybe use Braintrust, if that's the easiest way to give examples.
Ankur Goyal:
When we started thinking about Braintrust, Elad and I wrote down a list of 50 companies that we wanted to talk to: initially to get feedback, and then, if the feedback was good, to get them to either use the product or invest in the company.
I think another area where we've been really successful is actually building a coalition of really good investors. Everyone from Greg from OpenAI to Olivier from Datadog, folks that are really in observability, AI, etc., and really know what they're talking about. Many of our customers' founders - the Zapier founders, Howie from Airtable, Simon from Notion - are investors in Braintrust as well. And so we were just very deliberate about working with the companies that were ahead in AI and technology.
And then the other thing that we do that sort of naturally helps select the right people into the product, you have to be careful about this, is to have a very self-service product. So this kind of happened by accident.
If people from Zapier hear this, I'm sure they'll chuckle, but Zapier actually also has very few meetings. And so they were our first user. And when we started working with them, I tried to get a bunch of them to talk to me to give me feedback about the product.
And Brian, who's the CTO, was like, "This is just not going to work." I was like, "Hey, can you please convince some of the engineers to talk to me?" And he said, "No. You need to make a video that pitches the dev workflow that they can have with Braintrust, and then maybe they'll give you some feedback."
And I made 30 versions of that video, and then Brian kept giving me feedback and finally it was good enough. So I sent it to the team, and then a few days later, they started using the product.
And by the way, creating the video actually helped create the product. So that was a pretty useful exercise, but we basically had to be self-service from the beginning because of who we were working with.
And now that helps: some people get our product right away, and for some people it's still really difficult or challenging to wrap their head around, or they need more tutorial content. We are always adding more content, we're always making the product easier to use, but in some ways the people that use the product today have self-selected, because they sort of get what we're all about.
Turner Novak:
So you said something really interesting earlier. You said you and Elad sat down and wrote what customers do you want. How did Braintrust start, that first day? What's the story there?
Ankur Goyal:
Elad and I have been pretty close for a while. He was one of the first investors in Impira. And then when Alana started her fund, he was the first investor and anchored her first fund.
And Impira had its ups and downs. One thing I'll say is when we were working through the acquisition, which was not a straightforward process, Elad who was one of the smallest dollar and equity investors in Impira, which is the furthest from the case in Braintrust, was one of the most helpful, probably the most helpful person through the acquisition process.
I mean, just like I mentioned the list of 50 companies, when we were working on the acquisition, he helped me come up with a list of companies to work through the acquisition with. And I talked to him daily for probably three months while we were working through it.
I think you all know how busy he is. It is crazy. And so I think super highly of him and I enjoy every interaction I have the opportunity to have with him.
And so when I was at Figma, we would continue chatting about various startups and "Oh, this company is working on database stuff, what do you think?" And so on. At one point we were talking about AI as it started picking up, and he was like, "Hey, what kind of stuff have you guys been building?" And so on.
And I pointed out that at both Figma and Impira evals were a big problem and we had to build our own solution to it, and that we had tried using other kinds of products that were built for ML research and data warehouses and stuff, and it never really worked. And so we had to build our own tooling.
And I was kind of joking about it, like, "Yeah, this is annoying." And Elad picked up on that and was like, "Interesting. You've built the same thing twice." And a year ago that maybe didn't matter, but now everyone is trying to figure out how to build with AI, and so probably everyone's going to have this problem.
And so we talked about potentially incubating a company that solved for it. And then we wrote down the list of 50 companies. We talked to a bunch of them, and Zapier, then Coda, Airtable, and a couple of others started using the product. And they started sending so much feedback that I looked at that and thought, wow, okay, there's something very real here.
Turner Novak:
Yeah, they obviously care if they're giving you so much feedback.
Ankur Goyal:
Exactly. And that's kind of how Braintrust got started.
Turner Novak:
How did you choose the list of companies, like design partners? You might've already mentioned it, but maybe as a broader question: how should you think about design partners as a founder starting a company?
Ankur Goyal:
Yeah, I think that's the big question. And I probably should have mentioned this when you asked about learnings from Impira.
The big question you need to ask yourself is what customer or type of customer do you want to slave the next 10 years of your life away for? The relationship that you have with your customers, it's not slavery, but it is.
Turner Novak:
You're serving them.
Ankur Goyal:
Yeah, it is servitude for sure. And you have to enjoy it. You have to understand them. You have to care about the same things that they care about. Because if you don't, and they care about something and they can't communicate it to you in a way that deeply resonates with you, then you're not going to build the right product for them.
And so I loved the customers that we worked with at Impira, and I especially loved the developers at those customers that we worked with at Impira, but I realized over time that I wasn't sort of naturally relating to them and I wasn't empathizing with their needs the way that I did when I was working with developers at MemSQL.
The first thing I would say is you just have to be really deliberate about what type of customer you want to work for. At some point, I was talking about Braintrust to one of my friends who's an advisor, and he was like, "Yeah, blah, blah, blah. It sounds really interesting, but developers are the worst market to sell to." And there's a lot of truth in that. I think grass is greener for everything.
But my thinking was, yes, that's true. There's a number of challenges. But there's just nothing else I can do. And so I'm going to do it.
Turner Novak:
Sometimes that's the best route.
Ankur Goyal:
And then once we sort of determined that, I think what we realized is that, and every market is different. If you're disrupting a legacy industry, your calculus might be a little bit different. But what we realized is that what the Zapiers and Airtables and Notions of the world are doing today, other companies are going to be doing in six months. And so, how do we work with the companies that are doing today what people later on are going to be doing?
And to solve for that, it was actually quite simple. We looked at all the companies in the tech ecosystem that had already shipped AI products, and there were very few of them at that point. And so we added them to the list. And then we added people that we just kind of knew, and knew were cooking up some interesting stuff. We added them to the list and there you go.
Turner Novak:
Straightforward. But also a lot of thought that went into it.
Ankur Goyal:
Yeah, I think this is the kind of thing that sounds very straightforward and obvious, but very few people do.
Turner Novak:
So then in terms of, you talked a little bit about just intentionality of hiring. What kind of hiring are you doing right now and who do you want to bring on board?
Ankur Goyal:
So our primary focus is hiring engineers at the moment. We're also actually starting to build out the go-to-market team.
So we're looking for our first DevRel, and we're also starting to look for more folks on the sales side. Those are the primary areas where we're hiring.
Braintrust is in-person, in San Francisco. We are very low on meetings and structure. You have to really love extreme autonomy and ambiguity. And I would say you have to be motivated by that kind of environment.
I will say that, compared to other companies I've worked on, the product market fit that we're experiencing is pretty insane. I don't want to overpromise, but you will have the opportunity to work with some of the best companies in the world. And the commercial traction that we're seeing is pretty unbelievable. It's really awesome. Maybe one thing not to be worried about.
Turner Novak:
People actually use the product.
Ankur Goyal:
Yeah, yeah. We do this really fun work trial thing actually for engineers who are willing to do it. And it obviously is a great way for us to figure out fit, but it actually ends up being a great recruiting tool as well. Because you might work on something that an engineer at one of these great companies is asking for, and then you get to ship it during your work trial.
Because we ship the product more than 10 times a day, you get feedback from them on Slack: they're very excited, and maybe there's a screenshot of how they used it. As an engineer, you don't get that opportunity very often. And getting to do that multiple times in a week during a work trial with us sort of makes it like, wow, why wouldn't I want to work here?
Turner Novak:
Why would I go back to whatever I was at before?
Ankur Goyal:
Exactly.
Turner Novak:
So then that work trial, how do you set that up if somebody else is like, "Oh, this is an interesting concept. I also want to do work trials." How would you recommend doing work trials?
Ankur Goyal:
We've been very inspired by Linear, who wrote a guide about how they do it.
Turner Novak:
We'll put it in the show notes for people.
Ankur Goyal:
Awesome. I would say the biggest surprise to me about the work trial is you have to separate how much you like the person from how productive they actually are.
And so what we do is, after a week, we run two mental thought exercises. One is: during this week, was the person's productivity less than zero, equal to zero, or greater than zero? It's actually very hard to have greater-than-zero productivity within a week.
Turner Novak:
Or you're saying in terms of what they contributed to Braintrust?
Ankur Goyal:
Correct, yeah.
Turner Novak:
So did they add something?
Ankur Goyal:
Did they add something? Were they neutral, or were they negative? Most people end up being neutral. A few negative, but mostly neutral. And very few people are positive.
I think we've extended full-time offers to maybe around 20% of the folks that have done work trials with us. So it's not a perfect process and I'm sure we can do better, and I think we're a little bit better now than we were before in terms of the qualification before the work trial. But that's okay. It's actually been a very net positive experience for us.
One is really to ask that question. If you believe the answer is greater than zero, debate it a little bit. Maybe have someone on the team who is a little bit more skeptical go and work through that puzzle.
And then the other thing that we do is the product market fit test: if you took this person away, how disappointed would you be? The work trials are often a week, and many of the folks that do them with us are in a sort of transition period between jobs, so they end up traveling. I think in almost every work trial, someone has traveled in the middle of it or at the end of it, just because it's their time off.
Turner Novak:
And they fly in or they just come into the office the full week or?
Ankur Goyal:
It's all over the place. We have people who do the work remotely and then come in to hang out with us for a little bit. We have people who come in, we have people that are in San Francisco, and so they're just already there. We're not too crazy about doing the work trial itself in person. We don't want to create too much of a barrier for someone to try it out.
But you can tell pretty quickly whether you sort of miss having the person around or miss having their presence. If you're sort of triaging issues and you wish you had this person around to help just throw something to, because you know that they can drive it to completion.
Sometimes you feel a bit of relief that the person isn't there because, oh, I don't need to come up with a piece of work for this person to do. So I think the product market fit test is actually a pretty good way of assessing it as well.
Turner Novak:
And the work trial candidates are pushing code into production that customers are using? How did you get comfortable with that as a founder, of just letting go of that with a candidate that's not even working at the company?
Ankur Goyal:
Yeah, I mean, we're very careful about code review and we leverage this really nice workflow on Vercel.
So just to get somewhat technical for a short period of time, one of the cool things about Braintrust is that we allow customers to deploy the infrastructure in their own cloud environment, and we host the UI in our environment.
What that means is if you're a customer, like the Notions of the world who care a lot about data security, you don't want to send all of your AI data and whatever to a random startup who's helping you. We enable them to store all the data in their own cloud environment, but they're still able to use our UI that's updating all the time.
And there's actually not many products that do that. And it's not easy to build. However, once you architect and build a product that way, it allows you to kind of have the best of both worlds between security at scale and rapid development.
So with that context in mind, we can change our UI constantly. And we can do all kinds of fun stuff. One of them is actually powered by Vercel.
We do these preview links. So if you're working on a change, let's say one of our customers says, "Hey, I have this piece of user feedback. I wish this button had this third option," or whatever. You can actually work on the change on a branch and then send the customer a link to that preview deployment. It accesses the data in their cloud environment in a very secure way, but they get to use software that's unreleased. It's pretty crazy.
And so if you're doing a work trial, you can actually work on fairly novel changes and while you're working on them, you can get user feedback without necessarily landing or shipping the change. And then once the feedback is vetted and the code is vetted via code review, then we can land it much more safely.
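To make the preview-link flow above a bit more concrete: Vercel's preview deployments give each git branch its own URL, typically derived from the project, branch, and team names. The sketch below is only an illustrative approximation of that URL scheme (Vercel's actual slug rules handle more cases, and the project and team names here are hypothetical):

```python
import re


def preview_url(project: str, branch: str, team: str) -> str:
    """Approximate a Vercel-style per-branch preview URL.

    Lowercases the branch name and collapses any run of
    non-alphanumeric characters into a single hyphen, which is
    roughly how branch names become URL-safe slugs.
    """
    slug = re.sub(r"[^a-z0-9]+", "-", branch.lower()).strip("-")
    return f"https://{project}-git-{slug}-{team}.vercel.app"


# A branch for the hypothetical "third button option" change:
print(preview_url("braintrust", "feat/third-button-option", "example-team"))
```

A link like this is what a customer would click to try the unreleased change against the data living in their own cloud environment.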
Turner Novak:
So you're basically pushing things into production that's used by the customers, but it's not actually the final product. So it's kind of this in-between state?
Ankur Goyal:
Yeah. I think we've been very deliberate about engineering the engineering so that we can have a lot of velocity with the UI and the product, but do it in a way that works at enterprise scale. And so, this is just an example of how we take advantage of that.
Turner Novak:
Have you learned anything just in terms of selling to some of these enterprise customers that want AI products, or that are trying to build with this stuff? Any lessons on just what they're looking for right now?
Ankur Goyal:
Let me start by saying, if you're trying to sell software to an enterprise company right now and it doesn't have to do with AI, then it's really hard to get their attention.
Turner Novak:
Really?
Ankur Goyal:
Everyone from the board to the CEO to the CIO, to the software engineering teams, the software engineers who themselves care about their individual resumes and want to make sure that they're working on things that protect their jobs over a long period of time. They all see the opportunity with AI and they're prioritizing it.
And so if you are able to help a company with their AI agenda, then you have an unusually high shot at doing business with them. If you are working on something unrelated to AI, then you have an unusually low shot of doing business with them, because it's going to be very hard to galvanize the amount of attention that's required for an enterprise company to actually sign a deal with you. It's a non-trivial process to do business with any company. And so I would just be mindful of that.
And every founder I talk to, I basically say, "I understand you're passionate about whatever. But rewire yourself to be passionate about AI. Because unless you are, you're going to have a really hard time selling to the enterprise for the next 10 years."
Turner Novak:
Would an analogy be like cloud versus on-prem-
Ankur Goyal:
For sure.
Turner Novak:
... or internet versus no internet? Just kind of, one of those paradigms where it's going to change and-
Ankur Goyal:
Cloud versus on-prem is an awesome example, one that I have thousands of scars and wounds from dealing with at MemSQL.
At MemSQL, we were selling on-prem and there was a lot of opportunity there. But the insane amount of technology and engineering that we had to do to get baseline opportunity and attention from customers was significantly harder, more expensive, slower, etc, than our rivals who were just killing it by selling cloud software.
And by the way, because of how cloud software works and because of how new it was, the quality bar was lower. With on-prem software, you can only deliver a release once every several months, and you have to convince people to upgrade and all this other stuff. With cloud software, you could actually ship broken software and then fix it immediately. It was so radical.
And then, if you're working with a company and you're trying to sell them AI stuff, I would be very careful about selling software that competes with internal software engineering efforts to accomplish the same thing.
The reason is, if you go to a company and you think about all the things that a company can do better because of AI, maybe let's use a canonical example like marketing ad placement or whatever, something like that.
Turner Novak:
Automating the creation of ads?
Ankur Goyal:
Let's use that. Perfect. Yeah. Automating creation of ads.
You might think as a founder, okay, great, I can help every company automate the creation of ads better. But guess what every software engineer at that company is thinking too? I need to figure out how to use AI to help my company do better. What are the opportunities in my company to do that?
And so there are a lot of companies trying to sell AI-powered software to lines of business right now, or AI innovation, and there's nothing else to it other than the AI innovation. And they're actually competing with the software engineers at that company, who are trying to engineer AI software to do the same thing, because they have the same goal as you do. You do not want to be doing that.
And so whether it's this software engineering dynamic or whatever, I would really study the dynamics of how customers, software engineers, the internal forces within a company relate to the problem that you're trying to solve. And then make the solution that you're selling something that enables the human beings at that company to be successful by betting on it.
Turner Novak:
So it's almost like solving new problems in a way or something that just people aren't really using software for at the moment or?
Ankur Goyal:
Potentially. Although I think there's a number of problems that people aren't using software for that the software engineers inside of that company might want to solve. You have to be careful.
The thing about Braintrust that we've been somewhat deliberate with, and by the way, we do sometimes compete with internal efforts or internal tooling, is that no one really wants to build logging and eval UIs, like graphs and tables. It's just like-
Turner Novak:
That's not really fun.
Ankur Goyal:
It's really not fun. And it has nothing to do with your company specifically. And so why would we be building this?
On the other hand, if you do something that those folks do want to build, then it can be very hard. There's no one size fits all answer to this. And it's kind of generic sales advice.
Sales is very much a human thing. So you have to understand how you are helping the humans at a company in a repeatable way with your product, and almost engineer what your product is or why it is that way to help those humans.
Turner Novak:
It's funny that that's the takeaway when we're talking about AI.
Ankur Goyal:
Yeah.
Turner Novak:
Do you have a favorite AI product that you've come across recently? Or something that you've liked? It can be something everyone's heard of, but any more under the radar too. Just cool ones that you've seen.
Ankur Goyal:
Yeah, good question.
Turner Novak:
That's from David Song, on Elad's team, he had that question.
Ankur Goyal:
I'll give you my thing that everyone's heard of, which is Suno. I earned a lot of brownie points with Alana and her family over Christmas break by making a Suno song for every member of her family. And it wasn't the only gift, but it was part of it. So thank you Suno for that.
Turner Novak:
Suno the, you type in a prompt and it creates a song out of it, basically?
Ankur Goyal:
Yeah, exactly.
And then under the radar stuff, it's actually really hard for me to answer that question because I have access to what everyone may have access to in the next few months, from almost all the cool companies.
Let me put it this way. I think the workflows around AI, around productivity, around data analysis, there's really, really exciting stuff coming, just really exciting stuff. Our internal workflows at Braintrust are very, very, very different than if we had started the company a year or two earlier.
The way that we, for example, do project management. The way that we collect customer feedback. The things that we choose to spend our time organizing data. It's all very different.
I would say in general, I'm probably most excited about the productivity space as a whole. Partly because I hate meetings and I think the AI innovation that's coming with productivity will actually enable most companies to have far fewer meetings.
Turner Novak:
That's great. I love that. So to kind of recap: there's a lot of big things coming that no one's really come up with yet. It's new features. It's not just the stereotypical ChatGPT wrapper. It's new form factors.
Ankur Goyal:
Oh, yeah. What I find very exciting is that there are, I mean, there's some things that are very hard to do with models. But there are some things that are, once you see the ideas, you're like, wow, why didn't I think of this?
But really the innovation is the UI, the UX, the grind work of collecting good eval data sets, and then tuning everything to just work often enough that you're not upset with it. And there's some just really great product UI surface area coming out.
Turner Novak:
So basically, when you talk about evals and fine-tuning, that's what you use Braintrust for.
Ankur Goyal:
Exactly.
Turner Novak:
So you guys are sitting right at the middle of all of it.
Ankur Goyal:
Yeah. I mean, if you want to build a really, really great AI product, you use Braintrust.
Turner Novak:
Well, that's a good way to end the conversation.
Thanks for doing this. This is a lot of fun.
Ankur Goyal:
Awesome. Yeah, I really enjoyed it.
Stream the full episode on Apple, Spotify, or YouTube.
Find transcripts of all other episodes here.