Unlocking AGI With Visual AI Agents | Joseph Nelson, Roboflow
Lessons from Stripe and Palantir, how to be good at Hacker News, dissecting developer tool business models, hiring former founders, and why David Sacks as AI & Crypto Czar is bullish for startups
Joseph Nelson is the Co-founder and CEO of Roboflow. And he thinks computer vision is the missing piece of AGI.
He started Roboflow in 2019 to make the world programmable, building computer vision tools for developers and enterprises. Fast forward to today, over 1.3 million developers have used Roboflow in the past month.
We talk about visual AI agents, and how computer vision opens a new paradigm for software to interact with the real world. We get into different commercial use cases for computer vision, including two live screen shares for everyone watching with video.
Joseph also shares lessons from Stripe and Palantir, dissects developer tool business models, advice on marketing to developers and performing well on Hacker News, growing up in Iowa, his experience working with David Sacks at Craft Ventures, and why his new position as AI & Crypto Czar is bullish for startups.
Get in front of 220,000+ readers and listeners in 2025
Click here for more info on sponsoring the podcast.
Stream on Apple and Spotify
Timestamps to jump in:
3:34 Computer vision is the missing piece for AGI
5:59 Vision as a new paradigm to collect data
10:55 Live examples of computer vision
13:45 The Magic Sudoku solver that led to Roboflow
18:13 Using computer vision for automation
24:49 Computer vision in sports
27:02 How vision unlocks new data sources
28:24 Inside developer tool business models
33:32 The "Collison Install" and hands-on customer service
36:45 When to adopt Palantir's Forward Deployed Engineers
43:44 Why AI companies need to combine PLG and enterprise sales
50:12 Advice on developer marketing
52:30 Roboflow's greatest hits on Hacker News
1:02:19 Benefits of David Sacks as AI & Crypto Czar
1:05:32 Why all new technology has bad actors
1:07:07 Why over-regulation holds back innovation
1:12:01 How to get on the front page of Hacker News
1:19:43 Multimodality, time recognition, and agentic vision
1:28:36 Image-to-image prompting
1:30:42 Growing up in Iowa
1:32:20 Making TI-84 calculator games in high school
1:36:32 Pioneer: Hunger Games for startups
1:40:16 Why Roboflow does weekly Ship Lists + Ship and Tell
1:42:46 Hiring former founders and "full stack people"
1:45:16 Designing a bottoms-up organization while scaling
1:50:35 Why candidates build with Roboflow in hiring process
1:55:08 Hiring someone to help me with the podcast
Referenced:
Try Roboflow
Paint.wtf game
Roboflow's NeurIPS Presentations
Careers at Roboflow
Find Joseph on X / Twitter and LinkedIn
Find on Apple, Spotify, and YouTube
Transcript - (read on Rev)
Find transcripts of all prior episodes here.
Turner Novak:
Joseph, welcome to the show.
Joseph Nelson:
Hey, thanks. Thanks for having me.
Turner Novak:
I'm excited to have you on. We're going to talk computer vision, we're going to talk building developer tools, a ton of fun stuff, marketing, some of the fun antics that you've been up to. Can you really quick before we get in, talk about Roboflow?
Joseph Nelson:
Roboflow enables developers and enterprises to use computer vision in production. What our mission is, we say is to make the world programmable, basically like every scene and everything that you and I can understand with our eyes, our software will be able to soon. And for that transition to happen, there needs to be tools, community, infrastructure to accelerate that transition. So Roboflow builds tools for dataset understanding, tools for model training, and tools for deployment so that enterprises and developers can deploy systems for visual understanding.
Turner Novak:
Yeah, I was going to say, so it's basically computer vision that's kind of like the layman's terms for it. And I think you actually think the opportunity in computer vision is probably a lot bigger than most people would think. Can you just explain that and make me think the same thing after hearing you talking about it?
Joseph Nelson:
It's pretty simple. I mean, think about the human brain. Your brain uses about 50% of its neurons for processing visual information. And as software increasingly gains the capabilities of reasoning and understanding and thinking like the intelligence humans have, you can imagine that it will approximate what humans can do. So if 50% of our processing power is spent on visual understanding, software has a long way to go. Even in a more practical way, for AI to really have the impact that we know it should, it needs to understand the world around us, the real world. That means being able to understand contexts, navigate environments, and make decisions. And as exciting and prominent as language is, vision is actually humans' first sense. We had our sense of sight before we even had the ability to construct language.
Turner Novak:
One way to think about it then is, I feel like you've mentioned this in the past of it's almost like this new input level, maybe, like visual as a way to input data into computers. Is that fair to say?
Joseph Nelson:
The reason that visual understanding will be at least as big as, or likely bigger than, language and LLMs and what you're seeing across AI is that for AI to really realize its full potential, it needs to understand the real world, and understanding the real world means having a sense of visual understanding, reasoning, and capability. And if you think about it in a very fundamental way, half of the neurons in our brains are used for visual processing of information. The same will be true for software. A very early example of this is the power of self-driving cars and how revolutionary that is. That same transition in vehicles will happen in the way that we produce things, the way that we ship things, the way that we purchase things, the ingredients that we stock our kitchens with.
You go through your day, and every part of the way you experience the world starts with your ability to see things. That means accelerating cancer research by automating the counting of what are called neutrophils after you perform an experiment, as I learned from one of our users. It means making underwater autonomous robots that go clean up plastic in the world's oceans. It means producing electric vehicles more effectively; one of our customers produces EVs and wants to do it right the first time.
Turner Novak:
EV is electric vehicles?
Joseph Nelson:
Yeah. Making EVs correctly the first time. It means shipping things from point A to point B. I kind of joke that sometimes Roboflow powers Santa Claus, because ensuring that this holiday season things show up where they should is dependent upon each item being cataloged correctly, received correctly, inventoried, stored, sent. It even means entertainment, making the world more fun, like the Snap lenses that you might use to interact with the world are adding a sense of delight. It permeates every part of how we interact with the world, and it starts with a sense of visual understanding. So we had this insight years ago and remain incredibly convinced that it's still dramatically underexplored and underutilized, and a big reason for that is the tooling and the community and the infrastructure are missing to accelerate that transition.
Turner Novak:
It seems like you've ... it's sort of this entirely new way to interact with a computer, interact with data, or collect data and inform decision-making. When you talk about AI influencing a model, building a model, it's like a whole new paradigm of how to control an electronic device, or a piece of software, or robotics or physical things like you talked about.
Joseph Nelson:
It's a great way to think about it. In some ways, we joke, our biggest competitor is a keyboard. The way that a computer gets to understand something is you either input the data manually or it just sees and understands and actions things for you. So a different input mechanism, I think, is a great way to characterize it.
Turner Novak:
A big thing right now has been voice in AI, interacting using voice as this interface to interact with an LLM. The first maybe year, two years was a chatbot basically, it's pretty simple, worked decently well. I think now we're starting to see a way of all these startups that are building products around voice and then computer vision feels like this whole new level on top of that where it's like you've got text, you've got voice, and the computer can actually see, that opens up an entirely new, exponentially different paradigm of what can be done.
Joseph Nelson:
Think about this too: voice, language, these things are both inherently human-centric. Humans produce a voice, humans produce language. Sight exists both where humans are and in places humans are not; you want to observe processes even if a person isn't there. So if you think of it in a very first-principles way, the set of things that need to be seen, observed, understood, and actioned across our universe is way, way, way wider than the things that are constrained to just what people interact with. And so visual sensing and visual understanding is, I think, the most underutilized sense that we, collectively humanity, have yet to put to use.
Turner Novak:
To put it on software, or digitize it, I guess.
Joseph Nelson:
Yeah, observe. I mean, I grew up in Iowa and my family has a farm. You can have a system that exists in a field with limited or no internet connection that needs to make decisions about where to plant things, where to deploy herbicide, and that system needs a sense of sight to do its role effectively, and there's no human in that loop. This is what I mean: there are so many places throughout the world where a sense of visual understanding, reasoning, and action is yet to make an impact, and it will.
Turner Novak:
I think one thing that could be interesting, we were going to do some live demos of examples. We've never done this on the show before, a live screen share. Any examples you can pull up to show us?
Joseph Nelson:
One of the very first things that motivated the aha moment for us, and a lot of people, that inspired what Roboflow grew into is building with augmented reality. In a lot of ways, AR is like the front end for computer vision. Computer vision is the back end: it understands the logic and the relation of things and spatial reasoning, and it gives you the context. And then AR is the front end, the UX and UI for interacting with visual understanding. So to create the aha moment, AR is sometimes a good way to show what's going on when computer vision's taking place. In 2017, Apple released ARKit, the augmented reality kit, and my co-founder and I, who were just friends at the time who were both interested in new technologies, made this app called Magic Sudoku that solves Sudoku puzzles.
And yeah, I can screen share that for you for folks that are on video; otherwise I'll just narrate it here. So the core insight here is you carry around this pocket supercomputer, your phone, which has more processing power than even what we used to land on the moon in the Apollo missions. And a lot of that compute's not put to use, especially in the context of understanding the world around us. So Brad had this idea: what if we made a Sudoku solver and ruined the game of Sudoku, would it be a fun aha moment? And so we made this app where you hold it over the top of a Sudoku puzzle, it understands the state of the board, the empty cells, where the correct numbers should be, and then it fills it all in magically right before your eyes.
And it's a good party trick, ruining the Sudoku puzzle. Sometimes I'm on a plane and I see someone doing Sudoku next to me, and I'm like, oh man, I'm really tempted. It blows your mind with what's possible. But it delivers this really powerful moment of, wow, I have the power to interact with the world in a way that we didn't previously understand. Magic Sudoku was very much a flash-in-the-pan moment; we built it as a side project out of interest in playing with augmented reality and AR applications in 2017. And that year it went mini internet viral, made it to the top of some subreddits, and won Product Hunt's AR App of the Year.
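The "fills it all in" step Joseph describes is, once vision has read the board state, a classic constraint-satisfaction problem. Here is a minimal backtracking sketch, purely illustrative and not the Magic Sudoku app's actual code; it assumes a 9x9 grid of ints where 0 marks an empty cell:

```python
# Illustrative Sudoku-solving sketch: once computer vision has extracted
# the board state, backtracking fills in the blanks. Hypothetical helper
# names; not the Magic Sudoku app's real implementation.

def candidates(board, r, c):
    """Digits 1-9 not already used in row r, column c, or the 3x3 box."""
    used = set(board[r]) | {board[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {board[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]

def solve(board):
    """Classic backtracking: find an empty cell (0), try each candidate."""
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for d in candidates(board, r, c):
                    board[r][c] = d
                    if solve(board):
                        return True
                    board[r][c] = 0  # undo and try the next digit
                return False  # no candidate fits: backtrack
    return True  # no empty cells left: solved
```

The vision side, reading digits off the physical puzzle, is the hard part; the solving itself has been a textbook exercise for decades.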
Turner Novak:
Yeah, it looks like the tweet that you have pulled up has like 4,000 likes. And this was pre-views, it doesn't show us views, but my guess is at least mid-hundreds of thousands, maybe a million views, somewhere between 100,000 and a million views on Twitter.
Joseph Nelson:
It's still Brad's pinned tweet. So I think folks continue to see and experience this example.
Turner Novak:
And Brad's your co-founder?
Joseph Nelson:
Brad's my co-founder. And the insight here is it creates, again, this aha moment of interacting with the world in a way that our software currently isn't. And while it's sort of a toy example of interacting with puzzles, there are so many implications here. Again, we made this as a side project well before Roboflow was even an idea of a company. And what we assumed would happen is that creating an example like this would inspire other developers to create AR apps to interact with the real world too. We thought there'd be kind of an explosion of developers building for this new paradigm. That didn't really happen from 2017, 2018, 2019; honestly, it still hasn't really happened. Maybe it's a good thing that we pivoted away from doing AR stuff.
But in 2019, Brad and I got to talking again. We had both exited the prior things we were working on, and we said, that flash-in-the-pan moment of building Magic Sudoku, why aren't more people realizing and building for this really important primitive, this sense of visual understanding? And so we began to ask ourselves what it would look like to build a better version of Magic Sudoku and eventually release SDKs so people could build their own experiences, where, you could think, you point your phone at something and it tells you the brand of something or the amount of calories in the food you're consuming.
Turner Novak:
Yeah, I mean couldn't you just use Google SDKs, Apple SDKs, was it not a part of what they released? Was it OpenAI? Did they have any developer tools yet? Was there existing stuff out there?
Joseph Nelson:
We're talking 2017 when we first built this and then 2019, which is pre a lot of high quality advances that have taken place. And even still actually we have yet to realize the promise of truly on device, real time visual understanding in AI. It's coming. But even with those APIs and those tools, there's still a lot of missing infrastructure. So for example, we got to work and we said, okay, if we wanted to make this AR app that understands everything in the world, what we need to do is just progressively add more things to this consumer app. And we decided to start with board games because they're social by default, it's fun, we had a history of success there. So we built a chess solver, we built a boggle solver, which is this four by four letter word game.
Turner Novak:
I feel like you told me about that before, yeah.
Joseph Nelson:
Yeah. And these apps, they gained a lot of popularity, then slowly died. It was very clearly like, oh, that's interesting, not like, oh, I need that and I can't wait to use it. But in pursuit of building those apps, we realized a lot of the infrastructure you just alluded to is missing. So for example, if you want to build an app that understands a chess board, there's hundreds, thousands of different chess boards, there's like Harry Potter chess, there's like US Chess Federation version of chess, there's custom pieces, there's rocks that represent pieces, there's different angles by which someone could look at the board, there's different lighting conditions, and all of those things mean that you need to have a model that can understand, represent, and interact in all those scenarios and all those settings. So we realized, okay, we have this data problem.
And then we got to the model training part, and we're like, oh man, we're using CreateML to train a CoreML model. And then we got to the deployment part and we're like, well, we need this model to be optimized for iPhone and eventually Android devices, and to validate that the model is seeing what it says it's seeing. So we built a lot of this internal tooling to release our consumer apps more quickly. And because of the so-called shark-fin graph we were experiencing with the consumer apps, a big shoot up and then a slow atrophy of usage, we realized perhaps we should go all in on the reason we were building those apps in the first place, which is empowering and inspiring people and giving them tools.
And so that's the core insight Roboflow came out of. And like I mentioned to you, the world still hasn't comprehended or really put to use what visual understanding is. Multimodal advances in AI are helping: they're giving folks the realization, the ease of use, the ability to start with a pre-trained model that can understand visual things a bit better, and that's encouraging inspiration and possibility and capability. But there's still just this huge gap of evaluating that the thing's doing what I want it to do, making it work with my data, making it work in real time, integrating it with the other systems that I have. To really realize the promise of visual understanding and truly give a sense of sight to our software, there's still a long way to go on the tools and capability.
So the present day Roboflow, we've made good progress against that: a little over 1.3 million developers downloading our open source tools in the last 30 days. A platform where, on the open source side, you have hundreds of thousands of models that people can take and use, and hundreds of millions of labeled images that can get people started. But there are still 70 million developers out there and billions of people that will experience the power of sight in their hand or in their machines or in their day-to-day life or in their factories or in their ports. So it's a good start, but such a long way to go for realizing what it means to give a sense of visual understanding.
Turner Novak:
So when you talk about all these pre-built models and data or images, I think you said hundreds of millions of images, maybe you said millions, I forget, but what does that mean exactly? Say I'm a developer working at, I don't know, a logistics company, we'll say I'm working at the ports, they just went on strike, we want to automate it. I'd go into Roboflow; what might I be able to take advantage of to just get started on automating the port, a system, some kind of software to do that?
Joseph Nelson:
Roboflow has tools for everything you need to deploy a system for visual understanding. So that's data tooling for preparing datasets, knowing you have the right data, automating labeling, curation, tools for training models and fine-tuning foundation models so that it works on your data, whether that's recent releases like PaliGemma 2, or point specific fast object detectors, or using GPT in tandem with some of your own models, and tools for deployment, so you can chain together models and create visual agents that you can basically give an objective and say, "Hey, watch this video feed and text me if you see a hazard," or something like that. And so we have the core tools, and of course you can use those tools with your existing favorite tools, you can use pieces and parts of our stack, our goal is ultimately to enable you to build the system the way that you want to use it with the tools that are best fit for your part of the task.
If you came and you said, "Hey, I have this logistics problem," the cool thing is that Roboflow is really invested in ensuring that the full potential of computer vision is understood, which means giving the community tools: giving people access to foundation models to automatically label data, or GPUs to train their own models, especially on the open source side of things. So those hundreds of millions of images, those aren't ours, that's the community. When you sign up for Roboflow, you have the option: similar to creating a public, open source project that advances the field of computer vision, you can also create a private project and build your own maybe proprietary models and systems, and work on creating advantages for your business or your enterprise.

Those open source models and datasets have become the largest collection of open source images and models on the web today. So if you wanted to start working on, I don't know, shipping containers or boxes going down an assembly line, we call the open source community Roboflow Universe. You go to Roboflow Universe and you can search for datasets and models that might already understand the labels for, say, doing recognition of the characters on the shipping container, counting the boxes, knowing how many boxes entered your facility, how long they were stuck at one point in time, ensuring that the cycle times in the facilities are matching your performance schema, even keeping track of what entered and left your facility.
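The "watch this video feed and text me if you see a hazard" agent Joseph mentions earlier boils down to a rule evaluated over a stream of per-frame detections. A hypothetical sketch of that pattern, where the `Detection` shape and `watch` function are invented for illustration and the detections would really come from a trained model:

```python
# Minimal sketch of a "visual agent": chain a detector's per-frame output
# with a rule, and fire an action when the rule trips. The detector is
# stubbed out here; in practice it would be a computer vision model, and
# alert() would send a text or post to a messaging system.

from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Detection:
    label: str        # predicted class, e.g. "person", "spill", "forklift"
    confidence: float  # model confidence in [0, 1]

def watch(frames: Iterable[list[Detection]],
          hazard_labels: set[str],
          alert: Callable[[int, Detection], None],
          min_confidence: float = 0.5) -> int:
    """Scan per-frame detections; call alert() for each confident hazard.

    Returns the number of alerts fired."""
    fired = 0
    for frame_idx, detections in enumerate(frames):
        for det in detections:
            if det.label in hazard_labels and det.confidence >= min_confidence:
                alert(frame_idx, det)
                fired += 1
    return fired
```

The interesting engineering lives upstream (the model) and downstream (deduplicating alerts so one lingering spill doesn't page you every frame), but the agent loop itself is this simple.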
Turner Novak:
Yeah. So these are all things that a human could pretty easily do, just count the boxes or whatever, but there aren't really good tools to have a computer do that. You need a camera, you need some kind of vision or image input, and then you need a way to recognize, like the human brain does, that's a box, this is a boat, this is a dog, or whatever.
Joseph Nelson:
Yeah, that's maybe the open secret is that there's a lot of visual understanding challenges that are not inherently complex. As you said, counting boxes isn't maybe the hardest thing for a human to do. Giving a camera the intelligence and deploying it and relying on it previously would maybe be a quarter's worth of work, and not so long ago, five years ago, like a PhD thesis of building a system that can do that. And now with current tools, you can do that in an afternoon and you can deploy it and make sure that it does what you want it to do, and maybe it sends you a text or integrates with your ERP if you're in an enterprise.
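The box counting Joseph calls an afternoon of work typically decomposes into a detector, a tracker, and a small piece of counting logic. Here is a sketch of just the counting step, under the assumption that an upstream tracker already assigns each box a stable ID and a per-frame position (function and data shapes are hypothetical):

```python
# Sketch of counting boxes entering a facility: given tracked object
# positions per frame, count objects that cross a virtual line (e.g. a
# doorway). The track IDs and y-positions are assumed to come from an
# upstream detector + tracker; only the counting logic is shown.

def count_crossings(tracks: dict[int, list[float]], line_y: float) -> int:
    """tracks maps a track ID to that object's y-position per frame.

    An object is counted once if it moves from one side of line_y to
    the other at any point in its trajectory."""
    count = 0
    for positions in tracks.values():
        for prev, curr in zip(positions, positions[1:]):
            if (prev < line_y) != (curr < line_y):
                count += 1
                break  # count each object at most once
    return count
```

Tracking IDs (rather than raw detections) are what keep one box from being counted once per frame, which is why the tracker matters as much as the detector.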
Turner Novak:
So like the camera inputting the boxes going down the line spits out and tracks your inventory count going in and out or something like that, which then could hit your P&L and all that stuff.
Joseph Nelson:
Exactly. We like to say, basically, any actionable insight from a video. Like you said, some of our customers have cameras deployed inside their factories and facilities. One of them, for example, makes wallboard and ceiling tile, so you can think of the ceiling tile you've seen in a common office, or wallboard for-
Turner Novak:
The drop ceiling, the tile, drop tiles.
Joseph Nelson:
Exactly. You remember in elementary school, you'd take your pencil, put a little sticky note on it, write "gullible" on the sticky note, and throw it up into the ceiling so it'd get stuck, and you'd be like, "Hey, gullible's written on the ceiling," and they'd be like, "Oh, no, no it's not." That ceiling tile the pencil gets stuck in? One of our customers makes it.
Turner Novak:
Okay.
Joseph Nelson:
That process, for example, they have some problems, or processes that they know how they want to run: they want to produce the right amount, it needs to be at the right angle and dimensions, the right formulation. And a lot of those are visual validation type problems. So any camera, maybe they already have cameras because they have security footage, or maybe they install a new low-cost camera, any camera immediately becomes smart, and you can add, "Hey, alert me if production is lower than the amount that I expected at a given point in time, or if the color or the formulation is different than what I expected it to be."
Turner Novak:
So you can make it where it's not even on the camera level, it's like the video feed, like the video data that's coming in, you can actually make decisions on and act on?
Joseph Nelson:
Exactly. Yeah. With video, think of it as stateful analytics: you might want to retain state of what's going on within an hour, within a minute. Another one of our customers, speaking of hard video problems, powers the broadcasts at Wimbledon and the US Open. And that video problem is super high stakes, because you have a live broadcast and you're going to action information about the position of the tennis ball, the position of the player, the position of the court, and the risk tolerance, or the error tolerance I should say, is minimal, because you're going to be showing replays, you're going to be dicing things up. So it's running in real time, it's running at the edge, and it's running on the video to make decisions about how they showcase their broadcast capabilities. And so video is, I think, a hugely growing and also underutilized source of this visual information.
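The stateful analytics Joseph describes, retaining state within a minute or an hour, can be sketched as bucketing detection timestamps into fixed time windows and flagging windows that fall short of an expected rate, which also covers the earlier "alert me if production is lower than expected" example. A hypothetical sketch, with timestamps in seconds:

```python
# Sketch of stateful, windowed analytics over a detection event stream:
# bucket event timestamps into fixed windows and flag windows whose count
# falls below an expected production rate. The event source would be a
# detection pipeline; names and units here are illustrative.

from collections import Counter

def low_output_windows(event_times: list[float],
                       window_seconds: float,
                       expected_per_window: int) -> list[int]:
    """Return indices of windows whose event count is below expectations.

    Window i covers [i * window_seconds, (i + 1) * window_seconds)."""
    if not event_times:
        return []
    counts = Counter(int(t // window_seconds) for t in event_times)
    last_window = int(max(event_times) // window_seconds)
    return [w for w in range(last_window + 1)
            if counts.get(w, 0) < expected_per_window]
```

In a live system you would keep a rolling version of these counts rather than re-scanning history, but the state being retained is the same: counts per window.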
Turner Novak:
Interesting. This is sort of related: I had a "job", and job is in quotes because I just got free tickets to the games, but I worked for the AHL, the American Hockey League team in Grand Rapids when I was in college, the Grand Rapids Griffins. I was on the stats team for the games. I did this maybe 10 or 15 times over the course of a season or two, where you sit way up above the rink, basically a full two stories above the last seats. There's a bunch of people that sit up there and keep stats for the game. And my job specifically was recording who was on the ice at any given time. So anytime someone got off, I had to swap them out and know who was on. And that feeds into who's-on-the-ice plus-minus stats, assists and goals.
That usually came down to the refs on the ice. But my job specifically was being able ... I was basically the computer vision, knowing who was on the ice at any given time. Didn't get paid. I mean, I got free tickets to a different game, basically, and it was fun because I like hockey and I played growing up. But it's one of those things that could be completely automated, and then you could pipe that to the screen too. You could have a live feed; every sports broadcast has stats that they show during the games. And you're even seeing now, I think during Thursday Night Football on Amazon Prime, there'll be a red circle around a guy who has a high probability of blitzing. I don't know if you've seen that. So it's cool how it can really play out into literally everything; every piece around us that has some kind of computer can probably in some way be impacted.
Joseph Nelson:
Sports analytics has been one of the places where we've created a lot of examples and attracted a lot of interest for this reason. As you mentioned, there are minor leagues, the Grand Rapids Griffins, which feed into the Red Wings, and they even have teams in minor league divisions below that. They have some scouts recording things, but they also have a lot of video, and they can automate, as you said, the collection of shift times, but also passes, zone entries. And every single sport has this problem. But it's a really visceral example of, wow, what would I do if I could just automatically know everything that took place in this video, and how would I action that information?
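Turner's shift-tracking job is a nice concrete case: once a model tells you, for each sampled frame, which players are on the ice, totaling up time on ice is a few lines of bookkeeping. A hypothetical sketch; in practice the player IDs would come from jersey-number recognition or re-identification models:

```python
# Sketch of automating shift tracking: given the set of player IDs
# detected on the ice in each sampled frame, total each player's time
# on ice. Frame sampling rate and player identification are assumed to
# be handled upstream by a vision pipeline.

from collections import defaultdict

def time_on_ice(frames: list[set[str]],
                seconds_per_frame: float) -> dict[str, float]:
    """Sum the time each player appears on the ice across sampled frames."""
    totals: dict[str, float] = defaultdict(float)
    for players in frames:
        for player in players:
            totals[player] += seconds_per_frame
    return dict(totals)
```

The same per-frame presence data also yields plus-minus directly: for each goal event, look up which players' IDs were in the frame set at that timestamp.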
Turner Novak:
Yeah, the data is there. You're basically unlocking a new data source, or it's like a latent data source where it all exists, you don't really have to do anything different as long as you're collecting the video feed, it's just you're creating new data that you previously didn't have access to.
Joseph Nelson:
That's a great way to think about it.
Turner Novak:
That's your new front page of the pitch deck.
Joseph Nelson:
Thank you. Speaking of sports, I just pulled up the screen share here. We build a lot of examples in soccer, for example, because it's so accessible, and you can do things like produce FIFA-like animations of the active player who has the ball and where the ref is on the pitch, and produce insights around possession or passes. It's an accessible way to unlock this idea of, I have a video, maybe a live stream, and I want to collect, action, and create insights on what's taking place. We can put this in the show notes: we made a 90-minute YouTube tutorial on building really comprehensive soccer analytics. We call it football analytics because the team member that worked on it is international. So I think sports is an accessible example to think through.
Turner Novak:
Well, so then another question that comes out of all this: it's this new technology, and there aren't really a lot of examples of how to price some of this stuff. Thinking about the business model, it sounds like there are some hobbyists, people doing it for fun, but you also have multinational, global shipping companies that could be customers. What does a business model for something like this look like?
Joseph Nelson:
Fortunately, developer tools are increasingly a known quantity and a known business. You have publicly traded companies like Elastic and HashiCorp and Mongo. What tends to work quite well is continuing to build open source capabilities to inspire and showcase what's possible and put tools in the hands of builders and developers. Like I mentioned, a big part of Roboflow is to just advance humanity's ability to process visual information, and we can't do that alone, so putting open source tools out there inspires, delights, and allows folks to create their own examples and showcase what's possible.
Then we also offer our platform, and the platform has a free way to get started so that folks can experience and see the power of building with computer vision on their own examples, open source or private. Then we have a self-serve tier, so if you're a hobbyist or a small business or a startup, a way that you can swipe a credit card and get up and going; the difference between the free and the paid tier is you get higher usage limits and a richer feature set that's particularly relevant when you're working on a business problem. And then we have an enterprise tier.
Enterprises have a unique set of needs around deploying things at high scale, deploying them securely, with performance guarantees. They have quite a bit of needs around data, access, and control, especially in AI. And so ensuring, demonstrating, and providing that level of certainty are all things that enterprises are willing to pay for and need to power their businesses. So that's the way we approach the go-to-market: continue to focus on providing value upfront, and where there are things that align with people producing their own value, being able to share in the upside of their growth.
Turner Novak:
Yeah, it sounds like basically give it away for free, but then if you use it for a commercial purpose, it sounds like that's when you start to make money as Roboflow, kind of monetize commercial usage.
Joseph Nelson:
That's a great way to align things: if you're building a business with some of the tooling that we provide, at some point ideally we're able to share in that. Now, on the limits, we err towards being more generous rather than less. I think there's just so much value in showcasing what's possible, though of course we need to make money to fuel our mission and continue to invest in providing the capabilities. So for example, we've given researchers and open source projects over a million hours of free GPU time to showcase and produce their own models on their own examples.
Turner Novak:
What's the dollar value of that?
Joseph Nelson:
GPUs range anywhere from 80 cents an hour to, during the H100 scarcity, as high as $3 to $4 an hour. So, at least a million dollars of compute provided.
Turner Novak:
So, we'll say 1 to 5 million in free compute? Something like that?
Joseph Nelson:
Yeah.
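The back-of-envelope range the two of them settle on can be checked directly; the hourly rates and the "over a million hours" figure are the ones quoted above:

```python
# Rough dollar value of the donated compute, using the quoted figures:
# over a million hours of GPU time at $0.80 to $4.00 per hour.
hours = 1_000_000
low_rate, high_rate = 0.80, 4.00

low_value = hours * low_rate    # $800,000
high_value = hours * high_rate  # $4,000,000

print(f"${low_value:,.0f} to ${high_value:,.0f}")  # $800,000 to $4,000,000
```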
Turner Novak:
Good range? Cover your bases in case it's higher or lower?
Joseph Nelson:
Yeah, yeah. I mean, we do our best to optimize our own costs by putting folks on instances that are available, these sorts of things. The underscored point there is that delivering the infrastructure so someone can explore, create, and advance what's possible with computer vision is as important as ensuring that we're unlocking commercial value for our customers. And so yeah, like you said, if you're doing open source work along the mission of advancing and showcasing what's possible, we want that to be free, available, and open. To subsidize that, we want to work with businesses to ensure that as they produce value for the problems they're solving, we're also sharing in the upside. And like you said, a really useful axis tends to be: do you need what you're doing to be proprietary? If what you're doing needs to be proprietary, generally you're aiming to build a business with it, and we want to be a partner to that success.
Turner Novak:
So, if you're doing any kind of developer tooling business model, that's like one angle you can start to price based on.
Joseph Nelson:
Yeah. I mean, the specific things that we price against, it's mostly usage based, meaning the more value we're able to provide in terms of the consumption of our products, then the more revenue we're able to obtain as well, which I think also really aligns with customer incentive.
Turner Novak:
Yeah, and one thing that Nader at Brev, acquired by NVIDIA, told me, he said you guys are super hands-on, to the point that, I think his words were, it felt like you were more of a services company than a software company, because you were so hands-on with helping them through implementing the product, customer service. How do you think about that? That almost sounds unsustainable when he tells me that. It's like, man, how are you so hands-on with the product and so helpful to customers?
Joseph Nelson:
Nader was one of our very first customers. He was actually in the YC batch before ours, and the company he was working on in YC, he pivoted into a pill counting startup, and he used Roboflow for the pill counting startup. And for us, just getting customers, seeing what's working, and understanding where someone wasn't able to get to value was probably more valuable than even the revenue we were going to obtain. And so, there's this concept of the Collison Install, popularized by Patrick Collison, where back when Stripe was getting started and he wanted companies to be able to process their payments with Stripe, Patrick would just show up at your office, get on your computer, and install Stripe in your app, so that you were able to successfully process payments.
And I think that's a really good mental model for how to think about servicing your first customers. You just learn so much, like what was in the way for them to be successful, what tooling and capabilities that they need. And then, over time you paper over that with building features that productize your learning. And so, Nader, being a customer in 2020, the tool was just first available-
Turner Novak:
Yeah, it sounds like he was an early customer, one of the first.
Joseph Nelson:
Yeah, and what's important there is you're basically begging to have someone even use your thing, let alone get value from it. And so, you're learning a ton about the capabilities there, and then you productize those learnings so that, as you said, it becomes more scalable and sustainable as a business. What's interesting is that on the enterprise side of the business, that's still a lot of what enterprises want when they work with software companies: they want the ability to be hands-on, they want the trust of integrating with their systems. And generally on the enterprise side, the amount of value that you're delivering to the problem they're solving justifies a higher price point.
And I mean, the most extreme version of this business model has been popularized by Palantir, where you have forward deployed engineers who show up and are like, "I'll build whatever you need," and then figure out what products ensue from doing that. And I think there's good discussion of how much you want to lean fully into that, versus solutions engineers, versus very little in the way of customer-side support.
Turner Novak:
Yeah. How would you think about tiers or thresholds? Because doesn't Palantir, I'm not too familiar, but they have some eight figure contracts where someone might be paying them 10 million, 50 million bucks, at which you could probably afford it. Like, "All right, we've got someone's salary that's there full-time." It's 1% of the margin, less than that. How would you think about what's the threshold of too much or too little?
Joseph Nelson:
The thing about Palantir, and folks have written about this at length, is they were aligned with their investors that in the early days they were going to sign contracts that they knew they were underwater on. And their thesis and their bet was they would discover the stickiest enterprise problems, and in discovering those problems two things would result. One, they'd be able to deliver so much value from some of the solutions that ultimately worked out, and two, they'd be able to productize some of those learnings. I think startups fail to mimic this, because a lot of them don't realize A, the capital intensity of deploying a strategy like that, or B, they're fooling themselves into thinking they're solving a problem where they can command Palantir-level contract values, when the problem that they're solving might not be that big. One thing that Palantir had in its favor is that by servicing government, there are higher barriers to entry, and that allows you to preserve your margin over a bit longer duration than purely private sector examples.
I think there's a lot of things that are unique to Palantir, and just cargo culting that is risky. The way we think about it at Roboflow is you have enterprise customers where a few things can be true. One is that the problem being solved with that customer, deployed at scale, will be so valuable that you can trust and invest in ensuring their rollout; those are the types of bets you should play. For example, some of our customers, as you mentioned, are the largest shippers in the world, and computer vision is a new capability. It flips their whole business model on its head. A lot of times they don't know where inventory is, and these businesses are understandably skeptical of something that could be so good or so useful.
Turner Novak:
Yeah, it just sounds like you're pulling one over on me. There's no way this is even possible.
Joseph Nelson:
Exactly. You want to have a product that's almost so good it doesn't sound like it can be true for your customer. Well, when you find yourself in that position, that means you need to realize that promise. And so, they're healthily skeptical, perhaps, of you delivering on that promise, which means you can say, "Okay, great. If we get this solution rolled out at the scale that you're operating at, it's going to deliver an insane amount of value, like hundreds of millions in ROI deployed at scale inside your business. Now, I understand you have hesitancy about whether this is really going to work the way you think, because you've not seen this technology. So, we'll work with you on maybe a smaller single site to validate, and prove, and demonstrate that this works. And-"
Turner Novak:
And it'll be super hands-on, just making sure.
Joseph Nelson:
And be super hands-on, and integrated with their systems and those capabilities. And as long as you can develop the rapport, trust, and mutual alignment with the customer that hey, everyone's on the same page: if this does what it says it's going to do, your business is fundamentally different, and we can realize the shared upside from doing so. And I think that's a key thing. A second way that some businesses can justify very hands-on rollouts in an enterprise context is if you're opening up a new market, like maybe you're building a product or a solution there. And that's not so different from a very first seed stage startup; definitionally, a seed stage startup has no customers a lot of the time. And so, you're building stuff and you're already of course burning more than you're obtaining if you have no revenue. And so, you're making a focused bet on a market. And the same thing can exist in a new enterprise deployment.
So, the hands-on strategy, there's a gradient to it, and it needs to be aligned to your customer's expectations, and you need to be realistic about the market opportunity, the "so what" if you get that rollout. And that's why I mentioned at Roboflow we have both the enterprise rollouts, some of which can look very hands-on, while some are actually still pretty turnkey from the perspective of us and the customer. And then you have the self-serve tier, where the expectation is such that the customer knows, "Oh, this is the product I need, this is the infrastructure I need, and it solves my problem." And I think it's a superpower for companies to be able to do both of these, to have both a self-serve tier and an enterprise motion. If you think about it, as a startup you want some customers that are building so fast that you're almost scared you're not going to be able to keep up with the pace of what they need.
And then on the flip side, you want customers that are so large that they fund the build. Startup customers typically fall into that first category, or at least small businesses, where they're often louder with their feedback, they're often challenging you, and they're giving you a lot of direction on how to build your product. And building in AI, speed is everything. Models are changing, the capabilities are changing, and you want to be in a position where the success of your business is aligned to your customers being able to deploy their solutions to production. You want to be a little bit healthily scared of being able to move fast enough to keep up with those customers. Concurrently, you're serving enterprises, which are generally slower moving and, interestingly, give less feedback about problems, capabilities, and deployments. A startup gives itself a huge advantage by having very loud, fast feedback to build a high-quality product, serving fast-moving customers concurrently with enterprises.
I'll give you another example of how this superpower shows up. One of our enterprise customers purchased Roboflow in June 2023, and they had some ambitions of what they wanted to roll out; they're a Fortune 100 customer. And the rollout would require quite a bit of hands-on work, especially if the product is not easy to use. But because we have this whole self-serve ethos in the business, we hold ourselves accountable that a user can show up, sign up, and get to value without talking to anybody. And so, this enterprise that purchased, by one year later, June '24, had rolled out hundreds of examples inside their company, and they did that in a self-serve way.
That's only possible because you've built a product that's so intuitive, and able to be adopted and consumed, that you get that type of viral adoption and scale inside the company. And again, you can wish for that, but the best way to realize being aligned to making a product that's so easy to use is to have a line of business connected to it. And for us that's the self-serve business line. And so, the two play off of each other, I think incredibly well.
Turner Novak:
Yeah, I've heard people say that folks get tripped up a little bit thinking that self-serve and PLG, product-led growth, is the only way to sell, when in reality it's almost a pipeline strategy of filling up your enterprise pipeline, which you then need to treat like a sales process. It's more of a traditional enterprise sale at that point.
Joseph Nelson:
This is something that a lot of businesses need to discover and be honest with themselves about: is our self-serve plan its own line of business, or is it lead gen for our enterprise business? And not every company knows on day one, though you have to be aware of why you have that self-serve plan. Segment is a great example of a business that was learning about this over time. Initially it was the self-serve business, and then they were like, "Okay, maybe it's just lead gen for enterprise." And then they're like, "Okay, actually there's a way to make both of these work." Optimizely is another business where they had a self-serve plan and then went all in on enterprise, and self-serve was meant to basically fuel leads to enterprise. Or, this is a deep enterprise SaaS business, but AppDynamics is a business that had a self-serve plan, though 50% of those leads went to enterprise contracts. It was purely lead gen.
And the way to answer the question of whether my self-serve business is lead gen or its own independent business line is, I think, net dollar retention. If the NDR of that segment of the business is above 100%, that means it can compound by itself year after year, and it's less likely to become a shrinking portion of the overall revenue of the business. If NDR is too low, and by too low here, I mean, HubSpot famously went public before they even had 100% NDR, but if you have net dollar retention well below that, call it 80%, that means at the one-year mark the revenue from the same set of customers has shrunk.
Turner Novak:
You go from $1 to 80 cents. Yeah.
Joseph Nelson:
Go from $1 to 80 cents. And that means as that part of the business grows, or rather as the business as a whole grows, that part of the business will be shrinking. And so, you should ask yourself, "Okay, is this an independent line of revenue? Or is this a way for us to fill," as you said, "the enterprise pipeline with leads?" My viewpoint is there's very low downside to ... It's very contextual to the business you're building.
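The NDR arithmetic in this exchange can be sketched in a few lines; the figures are illustrative, not Roboflow's:

```python
def ndr(start_revenue: float, end_revenue_same_cohort: float) -> float:
    """Net dollar retention: year-end revenue from the customers you started
    the year with (expansion minus contraction and churn), as a percentage
    of their starting revenue. New customers are excluded."""
    return end_revenue_same_cohort / start_revenue * 100

print(ndr(1.00, 0.80))  # 80.0 -> $1 becomes 80 cents; the segment shrinks

# Above 100% the segment compounds by itself; below, it decays.
for rate in (1.20, 0.80):
    revenue = 1.00
    for year in range(3):
        revenue *= rate
    print(round(revenue, 3))  # 1.728 at 120% NDR, then 0.512 at 80% NDR
```

The compounding loop is the whole argument: at 120% NDR a cohort nearly doubles in three years with zero new sales, while at 80% it halves, which is why the segment's share of total revenue shrinks even as the company grows.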
Turner Novak:
Yeah, so I guess that's a good way to think about it then, it's just like, is it its own business that's self-sustainable? And maybe you could also do both. Maybe it is its own business, but it's also a pipeline for-
Joseph Nelson:
Correct, yes. And that's the thing: lead gen is the floor, but if it's lead gen and its own line of business, that raises the ceiling. Twilio is a great example of a business that had both an incredible self-serve motion that was self-sustaining and brought them to IPO, as well as an enterprise business. And again, the test is basically, is that part of the business compounding or not? And if it's not compounding, that's a good signal about its utility. That doesn't mean it's suddenly a worthless part of the business; it means it's more likely to be, okay, how do we ensure that folks who are coming there ultimately become lead gen? The market is telling you how to treat that part of the business. So, at worst it's lead gen, at best it's lead gen and its own independent, self-sustaining ...
Vercel is another great example of a business that does both very well. They have a self-serve tier where hobbyists, hackers, and small businesses can go and build, then a pro plan where you can upgrade and deploy, and then enterprise plans. And their self-serve business is likely self-sustaining while also producing leads for the enterprise business. And so, I mean, like I mentioned, building a startup you're learning, and tweaking, and changing, and seeing what's possible, though I think the key thing is being honest with what the market is saying: should I think of this as lead gen alone, or also as a line of revenue to count on?
Turner Novak:
Yeah, and my intuition would say that having both combined, everything working together, probably leads to a stronger business model, higher margins. Just thinking about the examples you gave, they all seem like pretty good companies. Versus I'm sure there are a lot of companies with no self-serve, all enterprise. Maybe we haven't heard of them before, but some of those can be great businesses too.
Joseph Nelson:
Yeah, I mean, I think there's a ton of enterprise SaaS businesses where trials might not even make sense. ServiceNow is such a valuable, complex deployment, a $200 billion company, though it just doesn't make as much sense to have some of the ... I'll give AppDynamics as the example again, because they had a self-serve business whose intent was to fuel the enterprise business with leads. So, I think you're right, a low-CAC, self-expanding self-serve business is awesome, but frankly a lot of companies have this grass is greener effect, where you have a really good self-serve business and you're like, "Man, why can't I get the enterprise business to work?" Zapier is a good example of that, where Zapier exists, has an amazing self-serve business, goes on a tear, and then you have other competitors pop up that are basically Zapier for enterprise. And the founders there have spoken in public about, "Yeah, maybe we should have thought about that earlier."
Now I think they have it dialed in though; it's a good place to be. So basically, if you're a self-serve business, you're looking over at these giant enterprise contracts, like, "Man, my friends have it so great on that side of the fence." And then folks in enterprise land are looking over at self-serve, like, "Man, those low-CAC customers, leads constantly, fish jumping into the boat. Why can't I have that?" And so, there's a very classic grass is greener thing going on, and yeah, of course, the best is to have, why not both?
Turner Novak:
So, another thing then, talking about lead gen, filling up the pipeline: developer marketing. You seem to have done a pretty good job of that. What's been your strategy for getting in front of developers?
Joseph Nelson:
Well, I mean, I think one of the unfair advantages in getting in front of developers is to be a developer, to know the products that I use and consume and trust. You have your own smell test of, does this product feel like one I can use, and trust, and really get to value with? Now, in terms of turning that into a system in GTM, what does that look like, what does that feel like? Maybe counterintuitively, when you're marketing the capabilities of a non-developer-led product, you're leaning heavily or maybe even exclusively on value propositions of what it does for a business, like move way faster, be more efficient, cut costs.
And those things are of course relevant for engineers, builders, developers, but the what is essential. You're talking to an audience that can pretty quickly conceptualize: if you say what it is, I can make a value judgment of would I use that, why would I use it, and why wouldn't I? And so, I think maybe counterintuitively you need to be doing a lot of what, maybe even more than why-
Turner Novak:
So, these are examples almost, like giving them something to look at and be like, "Oh, I can see that"?
Joseph Nelson:
Yeah. I mean, I'm just talking maybe in the context of a landing page, but broadly, yeah, you're creating usage examples, you're creating high quality documentation, you're creating tutorials, you're showing what's possible, the art of the possible, very much showing, not telling. And the way it shows up on maybe a landing page is you're describing what the thing is, and then the subheader is why doing it that way is more efficient, or faster, or more capable, or what it allows you to do. And so, there's this tension of lead with what versus lead with why. And I think when building products for engineers, the what is extremely important, because I can make a determination of, oh, if that's what it does, I can see why that's useful for me.
Turner Novak:
And it seems like one place developers hang out is Hacker News. You've done a pretty good job at figuring out how to market your product on Hacker News. One of them I thought was pretty interesting, paint.wtf. Can you describe really quick what that was?
Joseph Nelson:
Yeah. So, we made a game called paint.wtf, it's still live, that's like AI Pictionary. So, you're given a prompt, for example, like a giraffe in the Arctic-
Turner Novak:
And this is made up, AI just generates something.
Joseph Nelson:
Yeah, AI generates a prompt. We wrote the first 10 prompts, and then when we built this, it was the GPT-3 era, not to date ourselves too much, but we were like, "Give us a bunch more prompts that are like these." So, we fed it a bumblebee that loves capitalism, a giraffe in the Arctic, an upside down dinosaur, and then the user's given a Microsoft Paint canvas in the browser, and you draw that prompt. So, I might draw a giraffe with a scarf and some icebergs in the background, then I submit that drawing, and then AI judges how close my drawing was to the prompt. And what was novel about Paint is it was built in 2021, right when OpenAI released an open source model called CLIP, Contrastive Language-Image Pre-training. What CLIP introduced is this concept of being able to associate texts and sentences with images in a way that previously was very limited.
So, what they did is they trained on 400 million image-text pairs, and you could, for example, show an image and then get words that are affiliated with it, like concepts affiliated with that image. So, the way we used CLIP is as the automated judge. I have an image, a giraffe in the Arctic that a user drew, and I'm going to compare how similar that image is to the text "a giraffe in the Arctic" using CLIP's feature space, and whichever image CLIP considers most similar to the prompt is number one on the leaderboard. So, basically AI allowed us to unlock scalable, uncapped, automated judging of people's drawings.
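The judging step described here amounts to ranking drawings by cosine similarity to the prompt in CLIP's shared embedding space. A minimal sketch, with toy vectors standing in for the embeddings CLIP's image and text encoders would produce (function names are illustrative, not paint.wtf's actual code):

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def rank_submissions(prompt_emb, submission_embs):
    """Indices of drawings, best first, by similarity to the prompt embedding."""
    scores = [cosine(prompt_emb, e) for e in submission_embs]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

# Toy vectors standing in for CLIP embeddings of the prompt and two drawings:
prompt = np.array([1.0, 0.0, 0.0])       # "a giraffe in the Arctic"
drawings = [np.array([0.9, 0.1, 0.0]),   # a decent giraffe drawing
            np.array([0.0, 1.0, 0.0])]   # something unrelated
print(rank_submissions(prompt, drawings))  # [0, 1]
```

In the real game, `prompt` would come from CLIP's text encoder and each entry of `drawings` from its image encoder; because CLIP trains both encoders into one space, this single comparison replaces a human judge.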
Turner Novak:
And before computer vision, it would literally be a human looking at it and judging, "This is the best giraffe, this is the best depiction of the Arctic, upside down dinosaur," whatever.
Joseph Nelson:
Exactly, exactly. And so, we put this out there, and it ended up going mini-viral; we hit the front page of some subreddits and trended on Hacker News. At peak, we were doing seven submissions a second. In the first week, 150,000 people played paint.wtf, and we learned some really funny things.
Turner Novak:
Really?
Joseph Nelson:
Yeah, about how AI thinks, because if you think about it, we train these models, like CLIP for example, and we don't actually know what they know. We don't know what concepts they associate or don't know. So, I'll give you an example. One of the prompts was the world's most fabulous monster, and let's say you draw Mike Wazowski in a scarf, a beautiful pink boa. And then, let's pretend I draw Sully, and he's on a catwalk in a fashion show. And we both submit these, and we would learn which things CLIP associates more with a fabulous monster. Does it think a catwalk is more fabulous than your boa? Does it think that Mike Wazowski is more fabulous than Sully? And here's another interesting thing we learned. Imagine you and I submitted the same drawing, but my background was pink and your background was purple, and let's imagine everything else was exactly the same about the drawing.
We learned that the embedding space for fabulous is more closely associated with pink than with purple. Or a really funny thing: we made a game on the internet, and so we had some interesting submissions I could talk about, but also we had people that would submit the prompt just written on the canvas. So, I'm given the prompt, and I just write the prompt back in Microsoft Paint. And we learned that CLIP could read, because CLIP would consider those written prompts more similar than some drawings. And so, then we learned that people were gaming the system, and realizing that CLIP could read was crazy. And then, again, we made a site on the internet where strangers can draw images and submit them. So, you know what you get from doing something like this on the internet.
And so, we would ask CLIP, "Hey, is this image that someone submitted more similar to NSFW, or more similar to the prompt?" And if CLIP said the image was more similar to NSFW, then we would blur it out. So users, even if they wanted to spam the site with stuff, couldn't degrade the quality. Basically, we started to use CLIP to moderate CLIP. We're using the model to moderate itself. And so, we learned all these fascinating things. One other interesting thing is that CLIP is trained on real world imagery, and these are people drawing in Microsoft Paint. And so, the fact that CLIP even understood these very poorly drawn, sometimes really brilliantly drawn, images in the browser was also insightful. We were like, "Wow, CLIP is a really powerful model." And so building paint.wtf was multifold. One was like, new model, let's make something fun with it.
And as mentioned, I feel like you don't really know a technology until you build with it. And so, building with CLIP exposed to us what was possible. And in addition to all these learnings about CLIP and building a fun viral experience, and folks still play this at Roboflow for internal games, or play it at your next onsite, or as a warmup on Monday at your all hands, have everyone play a paint.wtf prompt and you'll get some delightful things. But also, it allowed us to realize, one, how to build infrastructure that can serve CLIP at seven submissions per second, at high scale, without going down. So, that was an important thing.
And then a second thing was realizing the power of CLIP. Realizing that CLIP can understand things in a very valuable way lets users at Roboflow better understand their data sets: which images are most relevant for improving model performance? Which images are most similar in a given embedding space? Or doing so-called semantic search, where I can search a concept like "expensive" and then get back the images or video frames most similar to that concept. And so, building Paint, yeah, it was partially a fun thing to put out there in the community, and it's awesome that HN really enjoyed it. It was also a key way for us to stay current and realize what's possible with computer vision.
Turner Novak:
I think you had another one too that was fun, this Mountain Dew Super Bowl challenge. Can you talk about that one real quick?
Joseph Nelson:
The Super Bowl challenge is such a goofy story. So, I was watching the Super Bowl, I think this was the 2021 Super Bowl. And during the first half there was this ad that Mountain Dew ran, where John Cena, the wrestler, is in this pink car driving through a theme park, a Mountain Dew theme park. And across the screen there are hundreds of Mountain Dew bottles appearing all over the place, hidden in the background, jumping all over, on the steering wheel, in the carnival games. And at the end of the ad, Mountain Dew goes, "The first person-"
Joseph Nelson:
... end of the ad, Mountain Dew goes, "The first person to submit the correct number of Mountain Dew bottles that were seen in this ad wins a million dollars." And I was like, "This is a computer vision challenge in plain sight. We want to see all the Mountain Dew bottles, count them, and produce the number." And so I immediately turned off the Super Bowl.
Turner Novak:
Don't know what happened. Don't know who won.
Joseph Nelson:
I went and I found the ad. Mountain Dew was ready and they posted it on YouTube. So I grabbed it quickly, and then I started to annotate all the Mountain Dew bottles. I wrote Brad at halftime. I was like, "Hey, man, I know you're enjoying your football, but we're spending halftime not watching the halftime show. We're finding Mountain Dew bottles."
And so we annotated all these Mountain Dew bottles, and then we trained a model, deployed the model, and wrote up an example of how you can use computer vision, and when you have an MLOps pipeline, how quickly you can go from a video or your own dataset to a model that can see things, count things, and be deployed. And I posted that to HN and people loved it. They were like, "Oh my gosh, yeah," because it's using technology to get ahead; it's sort of a hack for getting ahead in some way.
One of the things that was really cool was a researcher at Google Brain tweeted a link to the blog post and was like, "This is actually a really good example of why you want a robust ML pipeline, as pedestrian as it is." And it was really encouraging to see folks with expertise in the field recognize the importance of it.
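The counting step of a pipeline like this is simple once a detection model returns labeled boxes with confidences. A hypothetical sketch (the label, threshold, and data shapes are illustrative, not the actual Roboflow output format; a real video pipeline would also need object tracking so the same bottle isn't counted twice across frames):

```python
def count_bottles(frames_detections, conf_threshold=0.5):
    """Count confident 'mountain_dew_bottle' detections across frames.
    frames_detections: list of per-frame lists of (label, confidence)."""
    return sum(
        1
        for detections in frames_detections
        for label, conf in detections
        if label == "mountain_dew_bottle" and conf >= conf_threshold
    )

# Two illustrative frames of detector output:
frames = [
    [("mountain_dew_bottle", 0.92), ("mountain_dew_bottle", 0.41)],  # 1 confident
    [("mountain_dew_bottle", 0.88), ("john_cena", 0.97)],            # 1 bottle
]
print(count_bottles(frames))  # 2
```

The confidence threshold is the knob that trades missed bottles against false positives; for a contest like this you would tune it on the frames you hand-annotated at halftime.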
Turner Novak:
It sounds like maybe it was also culturally relevant, very timely. So that also contributed to the interest around it, because it was the Super Bowl, everyone had seen this thing, so it was relatable. I mean, you see it all the time with internet marketing. It's like, how can you tie something into the current thing that's taking over the algorithm?
Joseph Nelson:
You're the expert at this. You have built a whole fund around being relevant and topical with the current online internet conversation. I'm envious of your ability to... The wittiest quote tweets and greatest takes. That's marketing, right? It's like how do you ensure the thing that you're working on remains topical, germane, interesting, and exciting to sometimes the flavor-of-the-day considerations?
Turner Novak:
Yeah. And then I think another interesting moment, you actually got mentioned on the All-In Podcast one time. Was that expected, unexpected? What happened there?
Joseph Nelson:
We didn't expect it. Craft Ventures led our Series A. David Sacks is on the Roboflow board and has been an awesome person helping support building a high-quality company, so he has awareness of the business, and actually, I mean, he was recently named AI czar, AI and crypto czar-
Turner Novak:
Our AI and crypto czar, yeah. What a title.
Joseph Nelson:
Something that I think folks should be encouraged by is Craft invested in Roboflow's seed round in 2020, and then got ahead and did our Series A in '21, and continue to invest in some of the future rounds that we've raised. The interest, intrigue, and conviction that Sacks in particular has had for open-source AI before it felt like maybe the obvious thing to work on, I think is really encouraging for what that means for AI policy and its implication.
Turner Novak:
Can you explain that for somebody who doesn't get what you just said?
Joseph Nelson:
Yeah. The current AI landscape is fragile. AI is a new capability. We're not exactly sure everything it's going to be able to do. We know it's extremely transformative, and usually when there's a new technology and you're not sure of all the things that are possible with it, you want to allow innovation to flourish. You want people to experiment. You want people to see what's possible. You want them to build and create and tinker. And one of the best ways to ensure that people have freedom to innovate and tinker and try things is to make it accessible, make it open source, make it so that anyone anywhere can start to build and create with that technology.
There's been a discussion of should AI, because it's an incredibly powerful technology, be less open source? Does it introduce risk because there's capabilities that someone could use AI for that we don't want AI to be used for? Therefore, would it be logical for us to limit and erect barriers to being able to adopt AI? And should there be bans on releasing models openly? And this is a well-discussed and debated matter.
Generally, ensuring that AI can be easily consumed, used, and built is going to unlock the most good. And right now, AI is at somewhat of a turning point of how impactful it's going to be. And the best way to realize the impact of AI is to put it in the hands of builders to see what it can do, to see what it can't do. And so for example, having someone like David Sacks thinking about ushering in AI policy and showing a preference for investing in and aligning to open-source companies is a good vote of confidence for ensuring that companies can continue to tinker with, explore, and use it.
And by the way, you can very credibly make the argument of, "Yeah, AI can be used for bad things." And there, I'd point to the early internet. You could use the internet to process payments. You could also use the internet to commit scams and wire fraud. You use the internet to share photos with your friends from vacation; you can use the internet to share unsavory photos. There are of course things that you can use technologies for, good and bad. Ultimately, where things I think often get mixed up is that saying the technology or the tool is bad misses what the use case is.
And so I'm really encouraged that we'll be able to have an increased open-source focus as the policy, to encourage and discover the best use cases. And concurrently, the way that we should ensure that misuse doesn't happen is you say, "Misuse is the thing we want to regulate against. You should not be able to commit scams; you should face consequences for a scam." And that's the same with AI. AI itself is an incredibly powerful tool. Allow it to flourish. And then the use cases that you don't like, which won't be new, because the bad things that people do online are not new, like deceit and fraud, those are the things where we should continue to say, "If you do those things, whether you use AI or the internet or not, we should not allow them to happen."
So overall, I'm incredibly excited by an emphasis on enabling discovery of new uses. I'll make this really real for you. Our users will work on creating models in healthcare, accelerating cancer research, where you run a series of wet lab experiments, you get cross-sections of cells, and you identify and count the cells that react to a given treatment, called neutrophils. Counting all those neutrophils underneath a microscope is a very tedious thing to do and a perfect one for computer vision. And so they built a model that automatically counts all of the neutrophils so that wet lab work could be done more effectively, which increased the rate at which they can accelerate their research from taking a full day between each of the runs of the experiment-
Turner Novak:
Oh, it would take a day to count them all?
Joseph Nelson:
Yeah, for all the trials that they were running. And so now you can do that in 30 seconds and you can just go and run the next trials. And think about it, that compounds, right? It's not just that one day of experiments, it's all those future experiments. Everything, everything gets slowed down if you don't have the accuracy of the model or the accuracy of the counts of the treatment that you applied. And that was only possible because someone out in the open was able to say, "Cool, I want to take a model and I want to train it."
Now, one could argue, logically, it'd be like, "Wow, healthcare, incredibly high-stakes space. Before you're allowed to use AI in healthcare, you need to apply for a permit," because healthcare has very real consequences on people.
Turner Novak:
Yep, I could definitely see that.
Joseph Nelson:
Right? And if you introduced a requirement like that, then the graduate student that identified and created this method assuredly would not have been in a position to navigate the regulation. Maybe you'd have to have a lawyer validate it, because the university is going to be like, "We don't want to incur the risk of someone violating healthcare policy in the US." It's use cases like that, that get stymied, that are so, so, so important.
And so while we're discovering and uncovering what's possible with AI in the real world, we need to ensure that open innovation can continue. Otherwise, things as important as cancer research get slowed down. We will get to cures for cancer more slowly if we limit the rate at which AI can help us understand health outcomes. So I'm encouraged and excited by the ability for AI to be in a position to flourish in the US.
Turner Novak:
Yeah, I think if you just think about it as any new technology, when we created steel, you could say, "Oh, cool, we can create buildings that don't fall down and people can live in these and they're safe. Your house isn't made of sticks and mud and it won't collapse when there's a storm," or you could also say, "Oh, we make swords out of it and we murder people with them and they should be banned. We should have no steel or no iron or whatever." Or even boats. It's like, "Oh, we use boats to travel faster, to fish, to get new food." But pirates, right? A thousand years ago, you're like, "There's pirates. No boats, because pirates use boats."
So yeah, it's definitely one of those things. You just got to be really nuanced in understanding that anything can be used badly. We could be doing that with podcasts: you can use podcasts to spread misinformation or whatever, so we should ban all podcasts because you can't fact-check what they're saying, or something. You can do that with any technology or any new form of anything.
Joseph Nelson:
And I get it, change is uncomfortable, change brings disruption. These things are not easy, low-nuance issues, but like you said, any prior technological advance has returned so much opportunity for improving life expectancy, improving earning potential, improving the type of leisure that can take place.
Turner Novak:
Well, like fire too. Imagine if we banned fire when we first invented fire, back like 100,000 years ago. Right? It's like, "Oh, it can burn you. It can burn down our mud huts." But that led to everything else.
Joseph Nelson:
Yeah, "The central village committee has deemed that fire can burn people and fire is no longer allowed in this village. Yeah, enjoy your sushi, not your cooked fish. Thank you."
Turner Novak:
Yeah. What about the wheel? Could the wheel be constrained? I mean, maybe if you use it to steal and you get away faster in your getaway cart?
Joseph Nelson:
I'm in Village A, an uphill village; you're in Village B, a downhill village. I'm rolling my wheels down the hill and they're taking out your village. You're like, "Man, I am going to campaign to ban wheels." These are obviously contrived examples, but the point that you're making is right, which is that the technology is something that can be used and wielded as a tool. We, as a society, should aim to discourage misuse, not discourage the technology itself.
Turner Novak:
One thing I also want to hit on before we... We've already gone past the Hacker News stuff, but I think it's interesting. If I'm listening to this and I'm like, "Okay, you obviously really understand and you really get this marketing stuff," how do you get at the top of Hacker News? How should I approach? If I'm a founder asking you, "Joseph, how do you guys do so well there?" How would you talk someone through just thinking about doing well, performing, and getting visibility on Hacker News?
Joseph Nelson:
To get to the top of Hacker News, you got to build things that are topical, kind of fun, or a deep dive into a given topic. Those are the things that typically perform well there.
At the end of the day, it's a news site, so usually there's a sense of recency and things that are topical. And then there's also a pretty strong bent towards science and technology interests. And so when there are really strong deep dives into how something works, or advances in energy development, supply chain, or software capability, those things will end up performing well too.
So there's not a perfect playbook, but I think there's some tips that I might give. One is authenticity. Things that you generally find interesting as an engineer, likely other engineers will find interesting.
Turner Novak:
So maybe think about it more on a personal level, versus projecting what you want to project from a marketing standpoint?
Joseph Nelson:
Exactly. I think that that's number one is that the second that you're trying to think of like, "This is something that I want to make to go viral on Hacker News," it's almost like-
Turner Novak:
It's not going to go viral.
Joseph Nelson:
Versus, "Here's something that's interesting to me and probably therefore interesting to others." And people have done these analyses of what stuff goes to the top often, and it's usually personal blogs and projects, open-source projects, and tech advances, and news for the most part. Those are the sorts of things that people are particularly interested in, which skews towards people that are giving an individual viewpoint on a capability.
So maybe some examples are useful for what we've seen people find interest in. One was, like you mentioned earlier, the Super Bowl story. That has some of the raw elements. It's topical, it's fun, you're breaking a system. And that was really interesting to me to see, could you use computer vision to count all the Mountain Dew bottles? Could I win a million dollars by building a vision model? And I knew that that was fun, but I also knew that it had to be urgent because the Super Bowl is a point-in-time thing.
Turner Novak:
Yeah, you probably have to do it within the next day. If you waited another day, people are just going to forget about it.
Joseph Nelson:
That's exactly right.
Turner Novak:
Yeah, and there's probably also an element of, I mean, you probably see this with Mr. Beast is he gives away money or earning money, any kind of monetary reward or incentive. I mean, that's game shows. All game shows on TV, they always give away money in some regard.
So I guess you can't just, yeah, it's like give away a million dollars to get to the top of Hacker News, but you played on someone else giving away a million dollars to get to the top.
Joseph Nelson:
Yeah, you're right. It was lucky that there was a big dollar value attached to it to draw views.
I mean, some other stuff that we've written that is, again, interesting to us and interesting to others: when GPT came out with the ability to have vision included, we were like, "How well does this work? Where does it work and where does it not work?" And so we wrote up our first impressions of GPT-4V, as the model was called at the time, and we wanted to do that work anyway so we could tell our users, "Here's what you can expect and here's what you can't expect." And fortunately, a lot of folks found that interesting.
Again, the thing about why that one was interesting is because it's topical. It's on a new-model release, so that's something people are intrigued by. OpenAI is a leader in advancing AI capability, and so things that demonstrate what you can expect from those models usually have a level of interest to them. It's genuine interest. There's utility, like, "What can I expect from this thing?" And yeah, it was rapid response. It was first impressions, the things that we were interested in.
We've had users build things that make it to the front page of Hacker News. Some of these are funny too.
Turner Novak:
That's UGC for you guys, like UGC Hacker News climbing.
Joseph Nelson:
Yeah. And we encourage people pretty regularly, like, "Hey, if you are interested in working at Roboflow, build something," and if you build things that are interesting, that's usually a good signal that you have a pulse on the types of projects that should be built.
But anyway, one person has built this thing that drops hats on New Yorkers from his apartment window.
Turner Novak:
Oh, yeah, I saw that one.
Joseph Nelson:
Yeah, he called it dropofahat.zone. And he had these signup slots where, if you went to an intersection in New York, a hat would magically drop and fall on your head from above. That's a great one where it's not topical, but it's just so like, "WTF? Hats in New York City falling on people?" And he's using computer vision to see when the person was there, where their hand was, and where the hat would land should he drop it.
That same guy, he made another project that rates your cat loaf. So I learned this concept: the way that cats will sometimes sit, people will call it a loaf of bread.
Turner Novak:
Oh, okay.
Joseph Nelson:
And so you take a photo of your cat, and if its paws aren't showing and you just get the outline of the body of the cat, then that's really a good cat loaf. And so he made this product that you upload an image of your cat and you get a score of how good of a cat loaf your cat was performing at the time.
That's another one where it's like, it's kind of like another, "WTF? I'm intrigued. What is that?" sort of thing. And clearly, this guy built it for... Again, a common trend. Both these projects were built for him. He wanted to give hats to people in New York City from his apartment. He wanted to see how loaf-like his cat was and see how that compares to other cats.
And so there's something to what you said there about just following your curiosity; that's generally going to be more authentic and more interesting. And maybe the last-mile piece is ensuring that you write well about the thing. Because I think a lot of engineers have really intriguing and thoughtful considerations about current events or releases or capabilities, and just doing the last step of putting pen to paper goes a long way.
But Hacker News also has memory, so if you get known for low-quality work, or for interesting work, that reputation can carry and curry favor. Some people, like simonw for example, are always on the front page of Hacker News. His blogs are really good, he has interesting things to say, he's evaluating the capabilities of models. So people trust his opinion and know him when they see simonw content. This is Simon Willison, if you're familiar.
Turner Novak:
Huh. It sounds like that could actually go the other way too, where if you don't make good work or it's low-quality, people would just be like, "Oh, this is from this Joseph Nelson guy. He's kind of an asshole or he's kind of..." You know?
Joseph Nelson:
Yeah. I think, yeah, it could definitely work against you because yeah, I mean, it is a community, so people develop a reputation.
And that's one thing that, at Roboflow, it's important to me that we continue to pursue our interests. Again, things that are interesting to us that are useful for us to know what's possible with computer vision, that's a good thread to pull on. If it's interesting for us, then it's probably interesting to others too, and so we should explore those curiosities naturally and put them out in the world.
So, spoiler alert, the secret of how to get on the front page of Hacker News is to pursue your curiosities and write about them.
Turner Novak:
Hot take, but very simple, actually. Very simple, but probably very complicated, very hard to execute on.
Then we can probably talk a little bit about what you think is the most interesting thing going on in AI right now. What are you following and paying attention to? Maybe in AI and computer vision specifically, what are you most excited about that the casual observer might not realize is going on or happening right now?
Joseph Nelson:
A lot of AI advances now are focused on making use of multiple sources of data. So there's images, there's text, there's voice, and this is known as multimodality: multiple modalities used in tandem to give understanding. And that's really promising because you can have more context. At the end of the day, AI is a reasoning system, and intelligence and context come from multiple places. And so using context together allows you to produce more powerful [inaudible 01:20:38].
Within computer vision, there's some nuance here, where language-text pairs are useful but insufficient for a lot of the problems people want to solve with vision. Previously, we were talking about counting things, and there's good progress on counting things with language-text pairs, but there's also a need to have precise, so-called beyond-token-level understanding of images.
Maybe a way to think about this is, if you remember, people were joking about counting the number of Rs in strawberry with one recent model. One of the reasons that models, as capable as they are, struggle with something that seems so simple is the way that tokenization happens. That is the way the model sees the characters in the text "strawberry" and processes them, and the tokenization that transformers perform can limit precision; it doesn't definitionally, but it can. And if you think about applying that to pixels, where an image is a construction of pixels, the precision of where edges start and stop, or the number of things present, is even more important. And so that's a pretty big opportunity, both for multimodal models and for the field in general, to make that increasingly easy.
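The tokenization point can be made concrete with a toy example. The vocabulary below is invented for illustration (it is not any real model's vocabulary), but it shows why a model that only sees token IDs has no direct view of individual characters:

```python
# Toy greedy tokenizer with multi-character tokens. A model consuming
# the token stream never sees individual letters directly.
VOCAB = ["straw", "berry", "st", "raw", "ber", "ry",
         "r", "s", "t", "a", "w", "b", "e", "y"]

def tokenize(text):
    """Greedy longest-match tokenization over the toy vocabulary."""
    tokens = []
    i = 0
    while i < len(text):
        match = next(t for t in sorted(VOCAB, key=len, reverse=True)
                     if text.startswith(t, i))
        tokens.append(match)
        i += len(match)
    return tokens

tokens = tokenize("strawberry")
print(tokens)  # ['straw', 'berry']
# The model receives 2 opaque tokens; the three r's are buried inside
# them, so "count the r's" needs character-level knowledge the token
# stream never exposes directly.
print(sum(t.count("r") for t in tokens))  # 3
```

The same loss of granularity applies, loosely, to how a vision model compresses pixels into patches or tokens: precise edges and counts get harder to recover.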
Now, traditionally, the way that we've solved that is by breaking into tasks focused on things like object detection, instance segmentation, or classification. And increasingly, I think those will get better and more performant in pre-training.
Another trend I think that's really exciting is transformers generally being able to power a lot of these capabilities. Convolutional neural networks, CNNs, have historically been the core backbone of computer vision. What transformers offer, especially pre-trained large models, is that you can have knowledge already and just use it.
So what does that mean for the field? Instead of needing to start by collecting a dataset, which can be arduous, maybe the data's not accessible, it adds a lot of time, it's complex, you need to prepare and curate and annotate, increasingly, you can just start with a model that sees, and do the stuff you already want to do. And large pre-trained models offer that promise, which is really, really, really exciting.
Turner Novak:
So that just lets you get going faster because you don't have to go and train your Mountain Dew bottles, like, "This is what a Mountain Dew can looks like," and I can just get up and going. Yeah.
Joseph Nelson:
Another modality that's interesting is the sense of time. So there's, as we were discussing, voice and image and text. There's also a time-based component that's particularly relevant for video. And so being able to have models that understand both spatial and temporal things means that they can do video reasoning way more effectively. You're seeing this with Gemini, for example, and Google has really led the way, I think, in large video context: understanding video question and answer of, "What's going on?" And I think that trend is poised to accelerate. Historically, computer vision models have been pretty naive in terms of how they view video, limited by compute and limited by model capability, where each image is viewed almost independently.
In reality, if I drop a ball, where that ball is in frames two, three, four, and five is non-random, because of gravity. And so I know that the ball is going to be one position lower. And so if you're building a model to find the ball, it makes sense that having a sense of which frame came before helps me understand where the ball is going to be after. And what that translates to is, yeah, the video understanding capability.
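The falling-ball intuition can be sketched as a simple motion prior: predict the next position from the previous two observations plus a gravity term, and only search near that prediction. The units and numbers here are made up for illustration:

```python
# Constant-velocity prediction with a gravity term: given a ball's
# position in two consecutive frames, guess where it will be next,
# so a tracker can search near the prediction instead of the whole
# frame. Units (pixels per frame) are illustrative only.
GRAVITY = 9.8  # px/frame^2, a made-up value for the sketch

def predict_next(prev, curr):
    """Predict (x, y) in the next frame from the last two frames."""
    vx = curr[0] - prev[0]          # horizontal velocity
    vy = curr[1] - prev[1]          # vertical velocity (y grows downward)
    return (curr[0] + vx, curr[1] + vy + GRAVITY)

# Observed ball positions (x, y) in frames 1 and 2:
frame1, frame2 = (100.0, 50.0), (100.0, 60.0)
predicted = predict_next(frame1, frame2)
print(predicted)  # roughly (100.0, 79.8): look here first in frame 3
```

Real trackers use richer models (e.g. Kalman filters), but the principle is the same: frames are not independent, so the prior frame constrains the next.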
Another thing in computer vision is a lot of systems need to run in low-compute environments. And so getting things to the edge, getting high-quality models distilled down into smaller models, is very important. You just don't always have the benefit of the cloud at your disposal for some of the problems that you want to solve. And so distillation and affiliated techniques are key to realizing that.
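As a rough sketch of the distillation idea (the logits below are made up, not from any particular model): the small "student" model is trained to match the softened output distribution of the large "teacher", not just hard labels.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw scores into a probability distribution."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student's distribution q is from the teacher's p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher_logits = [4.0, 1.0, 0.2]   # big cloud model's raw scores (made up)
student_logits = [3.5, 1.2, 0.1]   # small edge model's raw scores (made up)

T = 2.0  # a higher temperature softens both distributions
loss = kl_divergence(softmax(teacher_logits, T), softmax(student_logits, T))
print(round(loss, 4))  # a small positive number; training pushes it toward 0
```

Minimizing this divergence over many examples is what compresses the big model's knowledge into the small one that can run on the edge.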
And then maybe a last one, and this is the one that I'm pretty, that I see a lot of promise in and we're focusing on tooling for it, is visual agents. So the big trend in AI in '25 is going to be focused on the idea of agentic AI.
How agentic AI differs from what we've seen in the past is with an agent, I can provide an objective and then let a system learn to accomplish that objective. And so I can say like, "Hey, buy me the best Christmas gift for Turner, for Turner Novak." I can just give a shopping agent that open form of a goal, and that agent would need to maybe go look at your Twitter, maybe go listen to a prior podcast, go figure out your interests, realize that you're based in Michigan, hear that you worked on the Grand Rapids Griffins and grew up playing hockey. And it's going to be like, "Great, I'm going to go purchase tickets to a Red Wings game because I know that that's going to be relevant to your interests." An agent is going to go through and break down what the goal is and what the tasks are, execute on those tasks, and then perform it on your behalf.
Visual agents will do this with visual outcomes where it's like, "I want to... Hey, tell me whenever a package shows up on my doorstep. Tell me whenever there looks like there's going to be a jam on my line. Tell me when there's maybe suspicious activity taking place and I work in building facility security." And a visual agent can learn to reason about which models it should call, using those models, and then outputting results to other systems.
And so we've been working a lot on an open-source project built on top of inference, which is for serving models, but it's also for executing models intelligently and in a good order. So I'm going to use the first model to do this task, then another model to do this task, then we're going to connect it to some system to text you. And so we currently have a framework by which people can build these workflows. And increasingly, I am excited by the opportunity to enable people to just provide goals and one-shot their way to a system that can do what they want it to do.
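The model-then-model-then-integration ordering described here can be sketched roughly like this. The function names are invented stand-ins for illustration, not Roboflow's actual workflow API:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Detection:
    label: str
    confidence: float

def detect_objects(frame) -> List[Detection]:
    """Stand-in for a first model that finds objects in a frame."""
    return [Detection("person", 0.91), Detection("package", 0.88)]

def classify_event(detections: List[Detection]) -> Optional[str]:
    """Stand-in for a second step that reasons over the detections."""
    if any(d.label == "package" and d.confidence > 0.8 for d in detections):
        return "package_arrived"
    return None

def send_text(message: str) -> None:
    """Stand-in for the integration step (e.g. an SMS webhook)."""
    print(f"TEXT: {message}")

# The workflow: model -> model -> integration, executed in order.
event = classify_event(detect_objects(frame=None))
if event:
    send_text(f"Alert: {event}")
```

The interesting step beyond this fixed pipeline, as described, is letting an agent assemble the chain itself from a stated goal rather than hand-wiring it.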
So that's when you hear about agents and why people are so excited about them, that's what's going on. And in a visual context, it's going to be related to processing visual understanding and reasoning and providing me with results.
Turner Novak:
Well, it's probably also that you incorporate a written input and output, but then also the visual input and output, in terms of robotics or automating physical things, where all these different touchpoints interact with each other. Again, it sounds like you're at this crucial point of basically letting people use this new form of data in the process of interacting with software of some kind.
Joseph Nelson:
That's right, that's right. Yeah. And I mean, visual agent approach means you can show up to Roboflow and say, "I have this video and I want to get a text when blank thing happens, when someone spends too much time in this area, or when there's too many people doing this," and constructing a system that will understand that for you and then giving you the ability to tinker and integrate that system is something that I'm super, super excited about.
So yeah, the trends are both related to things that are broader within the field, but also, the cool thing about Roboflow is we've gotten to a size and scale now where we can also influence what the trends should be. And maybe one tiny example of that, I wouldn't call it a broader trend, but I think it's something that's under-indexed on, something that, oddly enough, people aren't paying the attention to that they should in visual understanding, is something called image-to-image prompting.
This is a little in the weeds, but if you have a model and it knows a lot about the world, you can access information from that model by providing text, like saying, "Hey," I'm going to type and say, "Hey, is this image outdoors or indoors?" You provide that as text, or I could provide a visual prompt where I say, "Hey, this part of the image, tell me everything that's in this region or segment everything that's here," or I can provide an image to an image model and I could say, "Hey, here's something that I'm looking for," or, "I want to know how many of these things exist."
So in the Mountain Dew bottle case, instead of going and typing, "Hey, count all the Mountain Dew bottles," you can actually provide an image of one of the Mountain Dew bottles, and then the entire rest of the, in this case, pre-trained model will be able to say, "Oh, I can tell you how many of those there are." And the reason that I think image prompting is so powerful is we've seen a lot of promise in it being a much more steerable, controllable way to prompt models, and I think it's under-indexed because people have just not explored and realized the potential.
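One way to picture the image-prompting idea is counting by embedding similarity: compare an exemplar image's embedding against candidate regions instead of issuing a text query. The embeddings below are made-up two-dimensional toys, standing in for what a real vision model would produce:

```python
import math

def cosine(a, b):
    """Cosine similarity between two 2-D embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

exemplar = [0.9, 0.1]             # embedding of the example bottle crop
regions = {                       # embeddings of candidate image regions
    "region_1": [0.88, 0.15],     # another bottle
    "region_2": [0.10, 0.95],     # something else entirely
    "region_3": [0.92, 0.08],     # another bottle
}

# Count regions whose embedding is close to the exemplar's.
matches = [name for name, emb in regions.items()
           if cosine(exemplar, emb) > 0.9]
print(len(matches))  # 2 regions look like the exemplar
```

The appeal is steerability: "find things like this one" is often easier to specify with an image than with words.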
So that's something that we're particularly excited about, and we're probably going to publish some benchmarks on this. We've already released some open-source dataset benchmarks in the past for novel tasks. And the week that we're recording this is NeurIPS, one of the three large machine learning conferences, and the Roboflow team gave a presentation on the state of machine learning and AI in 2024 and what might happen in '25. That's really machine-learning-audience-centric, but for folks that are interested in that, there's a livestream on Latent Space.
Turner Novak:
Interesting. Okay. I think one thing also is you're so in the weeds that you'd think, "Oh, this guy probably lives in San Francisco. He probably grew up in the Bay Area." That's not true. Where were you born, and where are you from originally?
Joseph Nelson:
Yeah. I grew up in Iowa and like I kind of mentioned, my family has a connection to maybe stereotypically agriculture in Iowa. Now, I grew up in a city, the booming metropolis of Des Moines.
Turner Novak:
Oh yeah. It's like tens of hundreds of thousands of people.
Joseph Nelson:
Yeah, there's at least dozens of us there. Yeah. I grew up in a home with, I guess, maybe the average amount of technology. We had the Gateway computer, the family computer, played RollerCoaster Tycoon and Backyard Sports games. This is really going to date me for the Gen Z folks, but I remember there was a day when I brought a floppy disk with an essay to school.
Turner Novak:
Oh, really? I never did that.
Joseph Nelson:
Yeah. Maybe earlier than that, I had to print it off and bring it in. Getting into technology meant learning and just exploring on the internet, the web. One of the very first things I got really into, there was this website called CNET that reviewed technology and reviewed new phones. I remember following very closely, too closely; I had too much time, I guess, when new phones would come out, their specs, the Razr and things like that. I don't know why that was so interesting to me, but I followed along with it.
Turner Novak:
Well, I feel like that was kind of the heyday of like, "Oh, the camera is so much better" or "It's connected to the internet. It has 2G or something," or "You can put games and it's like snake or something," but it was kind of bleeding edge at the time. There was some cool stuff that was happening.
Joseph Nelson:
It was. When I really got introduced to technology, though, was, do you remember the TI-84 calculators? The graphing calculator from calculus class, and you could load games on them, like Block Man and even Mario in some cases. My older brother would get some of the latest games and then share them with classmates. You'd plug your calculators in and transfer.
Turner Novak:
Yep. I remember doing that.
Joseph Nelson:
And I realized that you could actually make very, very basic programs on these calculators. The first thing was teachers were like, "Do not program the quadratic formula into your calculator."
Turner Novak:
Everyone had that program on their calculator.
Joseph Nelson:
That was the first no-go. So of course, it's the first thing we want to do. Then I kind of realized you could do a bit more, and so I made... The very first "program" that I made, programming in quotes, was pretty basic: this joke generator where every day in class I would think of three more jokes and I would store them basically in just a dictionary, and there was a random number generator. You generate a random number, then use that number to pull the joke from that spot in the dictionary, and the user would just get a joke. You'd just press enter, enter, enter on a calculator and get a bunch more jokes. Very, very simple, but it brought a lot of delight. And the funny thing is, I shared it with other people when I would give them Block Man, and I wouldn't say that I had made it, and it ended up going mini-viral through the school, because I think people thought they had to transfer it for the other games to work, like it was a utility.
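The calculator program described here, store jokes, generate a random number, pull the joke at that spot, maps to a few lines of modern Python (the jokes below are invented, of course):

```python
import random

# The "dictionary" of jokes: a list indexed by a random number,
# just like the TI-BASIC version described.
JOKES = [
    "Why did the scarecrow win an award? He was outstanding in his field.",
    "I told my computer a joke. It didn't get it, so I rebooted it.",
    "Why do programmers prefer dark mode? Because light attracts bugs.",
]

def random_joke():
    index = random.randrange(len(JOKES))  # the random number generator
    return JOKES[index]                   # pull the joke from that spot

print(random_joke())  # press enter again for another joke
```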
Turner Novak:
Yeah. I remember this. Everyone would select 10 things. You had to pick what got transferred.
Joseph Nelson:
Yeah. And I'd always transfer that, and then people, I think, just started transferring it, so whether intentionally or not, it ended up on a bunch of calculators. And so some of the jokes in there people thought were pretty goofy, and I just remember people talking about it and using it and finding a lot of excitement in this very pedestrian, simple thing. And that was very much a spark for me, of like, wow, you can write very, very simple jokes into a calculator program and they can be distributed in a way that people can enjoy and see. It just felt like I was making a thing that people were able to find delight in when I wasn't even around. It felt like one of those make-money-for-free hacks, except for entertainment. So I was like, "Wow, there's something here."
Turner Novak:
Yeah, so you couldn't make any money from it, I guess, back in the TI calculator days. I actually made a TI calculator game. I think I made maybe a couple. The one that was actually good was this RPG game about my friend. So we were all nerds, but one of our friends was really tall, so he was on the football team, and it was this RPG where it was his life and you try to get to the NFL. There were only a couple options because it was entirely text-based, but you could study, go to class, take a test, or work out, go to football practice, play a game, and then I think you could apply for the draft or something.
I basically, I don't know if you remember how the BASIC programming worked on the calculator, like you could store things as the letter variables, so there were only 26 variables that I could work with, but I used that to basically make this game where you'd have to study enough to have a high enough GPA to be able to play. You had to be good enough, and you could go to practice to get better at football. Yeah, it was fun. It was the first thing I ever really programmed. I don't even know if I still have it. It was very bad. Looking back on it, I think in college I played it again and I was like, "This game sucks," but it was kind of funny.
Joseph Nelson:
No, that's like layers of complexity, storing progression, different paths. That's way more complex than my silly random number joke generator. That's awesome.
Turner Novak:
My friend thought it was hilarious because we were all just these nerds, right? We played Halo and we played computer games in class. We all would take game design and tech drawing and architecture classes in high school, and we just played games the whole time, so we thought it was hilarious. One of our friends was on the football team and our football team was terrible, but we thought it was so funny. So anyways. And then, I think, well, Speedbowl games too. You did this thing called Pioneer in the early days of Roboflow, and maybe that's kind of how you got plugged into tech from Iowa. But what was Pioneer and how did that work?
Joseph Nelson:
Pioneer was this online community that gamified to-do lists. It was eventually framed like an accelerator, but the way that Brad and I described it is that it was like the Hunger Games for startups, in some ways. The way that Pioneer worked is anyone in the world could sign up, and on Sunday you would write the list of what you wanted to do for your project that week in a text box and submit it. Then the next Monday, you'd be presented with 10 side-by-side examples: Startup A said they were going to do this, and here's what they accomplished. Startup B said they were going to do this, and here's what they said they accomplished. Who did more? You'd vote through 10 of these head-to-head matchups of someone's to-do list versus what that person got done, and you would just say who seems to have had a more impactful week. And this totally owned Brad's and my psychology for a while.
When we first started working on Board Boss, which was the AR board game solving thing, Sunday was a ritual. We'd go and write what was going to be on our Pioneer list for the week, and check that we'd gotten everything accomplished from the last week. Because let me tell you, your Sunday self is a lot more ambitious than your Friday afternoon self. So you had to ensure that you did everything that you said you were going to do. We're competitive people, and there's a leaderboard globally of who is consistently getting voted as having done more head-to-head.
Turner Novak:
In public, right? Even people outside Pioneer.
Joseph Nelson:
Yeah. So there's this public leaderboard of the top 50 most productive projects in the world. And the way that scoring actually worked is it used Elo, meaning when you were head-to-head with another update, if you were higher ranked than them on the leaderboard, and a company is lower ranked than you by a ways, and someone votes that the lower ranked company did more than you, your rating drops way more. What does this mean? It means that when you're at the top, you've got to just continue to do your best to stay at the top. They ultimately shut down Pioneer. But when it ran, we were at the global number one spot for 20-some weeks in a row, and for I think a year plus, we were in the top five, just because we were so obsessed with it.
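The Elo mechanic Joseph describes can be sketched in a few lines. This is a minimal illustration using the standard chess-style formula (K-factor of 32, 400-point scale), not Pioneer's actual parameters: a top-ranked project that loses a vote to a much lower-ranked one sheds far more rating than one that loses to a near-peer.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32) -> tuple[float, float]:
    """Return new (rating_a, rating_b) after one head-to-head vote."""
    ea = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (score_a - ea)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - ea))
    return new_a, new_b

# A #1 project (rated 1800) losing to a mid-pack project (1400) drops
# roughly 29 points; losing to a near-peer (1780) costs only about 17.
loss_to_low, _ = elo_update(1800, 1400, a_won=False)
loss_to_peer, _ = elo_update(1800, 1780, a_won=False)
print(1800 - loss_to_low, 1800 - loss_to_peer)
```

The asymmetry is the point: the expected-score term makes an upset loss expensive, which is exactly why staying at number one on the leaderboard required doing your best every single week.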
And that was really critical in the formation of what we were doing and working on, because, for example, every week we would have as a given: write a blog and do two YouTube videos, no matter what, every week. That was two of the items, plus, I don't know, close the first customer, ship X key feature, debug blank thing. You'd have these updates, and that set such a good operating clip. So we just got into a habit. There were many Sundays that rolled around and I'd only written one blog for the week, and I was like, "Man, I said to myself I would do this, I've got to get this done."
And there was sort of this anonymity of accountability, and that was really formative. So now, actually, I really like to find ways to ensure that I'm doing the best that I can. To this day, I do a weekly thing we call a ship list. At Roboflow, each team will have a ship list of the things that they're proud of that they got in front of customers in the last week, and I'll do one individually of things that I've done as an IC within the last week. The whole company can see it and see whether or not I'm doing the things that I want to have done.
And it really helps set a sense of cadence and operating pace: what can I get done for customers this week? It's something that you want everyone asking in your organization, and plus, it's become a bit of a ritual. On Fridays, for example, we do "ship and tell," like show and tell, where you present things that are in production for customers that you've made, and it's all teams. This could be product engineering, but it could also be the operations team making a better careers page, or a better way to describe how to work at Roboflow, or getting business insurance so that enterprise customers that want to trust us can validate that we have what they need to do business with us. I think you could probably draw a very fuzzy dotted line between Pioneer and the idea of ship lists, but I think the truth is you've seen throughout the startup community that people have operated at this weekly clip.
Folks will regularly be like, a week is 2% of a year. Elon, famously, is like...
Turner Novak:
What did you get done this week?
Joseph Nelson:
There's just this sense that a week is enough time that you should be able to have something you can stand on that you did, but also a short enough time that continued progress should be evident. So Pioneer was very formative in that way. Plus, it was a program where you met a lot of other folks in the community that we've stayed close with to this day. We've had the chance to hire some Pioneers at Roboflow. We've had the chance to even share office space with Pioneers in San Francisco at a point in time. Some Pioneers have gone on to be investors in what we've been able to do. So it was a really special experiment that Daniel Gross and Nat Friedman had predominantly backed, and that Rishi from Ageless had stood up and worked on for a while. It was an awesome moment in time for both Roboflow and, I think, startup creation generally.
Turner Novak:
Yeah, it was cool. Well, I never did it, obviously, but I met founders that did, and everybody says good things. Talking about the ship and tell, you just mentioned you have ex-Pioneer people. Do you hire a lot of former founders at Roboflow?
Joseph Nelson:
Yeah, we actually have it on our careers page as an open role. The type of people that we find are especially successful at Roboflow, at least what works for us, are those that are, you could say, full-stack people. If you're an engineer, certainly writing code to produce a feature is key. But you should also be thinking about which feature to write, and how users can understand that that's the right feature to work on. And then you write the feature, and then you want to drive adoption of that feature: ensure that it's documented, that it works, that people are able to get value from it. So full-stack in the sense of, yes, you might be working on front-end and back-end technologies, or I'm a designer and I can also code to bring my ideas to life, as well as create experiences that deliver an intuitive and powerful capability for users.
Turner Novak:
But then make sure people can use it too.
Joseph Nelson:
And make sure people can use it. It's necessary but not sufficient to just make a thing. That's one belief we find in the folks that do well here. A second thing is being independently motivated: holding oneself to goals and being proactive in creating, establishing, and working against ambitious outcomes, saying, "Here's what I want to do, here's how I want to accomplish that, and here's how I think about how that's going." That's another thing that I think is really key. And then third is just folks that are healthily impatient: why can't we get things into users' hands more quickly? Or, I only have one life to live, so the impact that I can make is capped by time, therefore I should optimize and do things as quickly as I can to ensure that the results are felt. As you said, former founders often come with those attributes: a strong sense of autonomy and ownership and wanting to create these capabilities.
It's not the only way, of course, that someone comes to possess those attributes, but it's one pattern that we've found. And the thing that is really key: I think a lot of companies say that they really like to have folks that display those attributes. It's another thing to actually work effectively in that environment. There's no free lunch here. To enable someone that thinks that way to be successful, you have to actually be a far more bottoms-up organization, giving people the wherewithal, and not only permission but expectation, to decide what to work on and how to achieve the goal. We have this expression at Roboflow that someone is fully ramped when they've chosen their own loss function. A machine learning model is optimized against a loss function, and a person that realizes, "Hey, I want to ensure I know what my loss function is, and I can optimize against that."
And so what that means is creating the space for that. The way that we operate is also pretty... Sometimes it's a splash of cold water when folks come in. For example, we try to do things in a really open way: we use open Slack channels as much as possible to share progress. Even if you're collaborating with just one other person on a given thing, maybe you're writing a blog that you're going to publish and there's one other reviewer of the blog, you'll do that in an open channel, not a DM, because you never know who's like, "Oh wow, that blog that's going to come live, I wanted to send that to my customer." Or, I'm on the branding team, and I want to know how you made an image to describe that concept. Or someone knew that blog was going to go live but didn't know when.
And just the fact that you say you're working on something is an implicit status update on where it's progressing. Working out in the open has this massive, massive advantage of giving context and the ability for someone to be successful. And again, it's in concert with this idea: if you empower people who are motivated and want to do things, enabling them to have access to what's going on and the information to synthesize and make the right decisions, all these things move in concert. So it's not a free lunch. For sure, it can be...
Turner Novak:
It sounds like maybe information overload. If there are a thousand open Slack channels instead of 10, because everything's public instead of private, maybe it takes forever to get through them all.
Joseph Nelson:
Yeah, that's actually a thing that we've faced as we've scaled: how do we balance the ability to give necessary context without overwhelming someone?
Turner Novak:
How did you do it?
Joseph Nelson:
We're still figuring it out. The way that we've started to do it is kind of like mitosis of projects. For example, we used to have just one sales channel. Then we had a sales channel that the whole company was in, and a sales-internal channel that was just the sales team. And now you have sales-internal, plus individual customer channels, plus individual prospect channels, which is kind of like mitosis. And all of these are still public channels, which creates the ability for others to opt in. Let's say you were an engineer and you worked on a feature that you knew was for prospect ABC, and you want to know, is prospect ABC responding well to that feature? Well, you can join that channel to follow along with the progress of that deal, but you're not expected to either.
It allows a bit more opt-in of what someone wants to know. But just like you're mentioning, there's this difficult information architecture problem: what is the right amount of context someone needs to do their role successfully, while still having the opportunity to get more context if they want it? That's ever evolving and better understood as we grow at Roboflow. But the thing that we know is really key is that I've just continued to underestimate how useful it is for people to be able to learn from each other's progress and what they've done in the past, and to collaborate.
So I don't know that we've solved the problem so much as it's ever evolving, and the benefits outweigh the costs. Because the alternative is you can have a lot of siloed information, and it can be very difficult to know whether something is on track, or what features are coming along, these sorts of things. I think probably the next step of the evolution is having a bit more of a highlights-per-channel sort of thing. But I don't know, I'm hoping that maybe Slack AI finally gets good enough and it just solves all this for us, so I can just be like, "Hey, give me a summary of what's going on here. Ping me every time someone talks about customer ADD, because I know I want to help them."
Turner Novak:
Does Slack even have much AI stuff? I haven't really noticed any.
Joseph Nelson:
They added Slack AI.
Turner Novak:
What is it even?
Joseph Nelson:
It will summarize threads for you, give you more context. It's a really hard problem for Slack to know what one's organization wants shared across the organization. You have the hard question of what should go into the training data versus not, and what should answer it. It's really an access-to-information problem for your Slack. For us, maybe we're well-primed for that problem, because public channels help make that decision to some degree. But back to your question: someone that started a company in the past and wants to continue to have the autonomy and position to be successful, that really likes to own talking to users and getting things to production, and wants to be a part of a rapidly growing, high-performing group of people, that's the type of person that often does well here in that environment.
Turner Novak:
How do you guys do... What does your hiring or interview process look like? And we'll also throw a link into the show notes, I don't know, Roboflow/careers or whatever. I'll find it, I'll throw it in.
Joseph Nelson:
Yeah. Roboflow.com/careers. Every role lists the process for that role. Generally, it's a first conversation to understand someone's intent, what they want to do, whether their background is right, the way that we work, their ambitions, when they would be able to start, some of those things. Then they'll speak with the person who looks after the team that they'd have the opportunity to join. At this point, the process forks a bit: a technical screen for engineers, an exercise for, say, go-to-market roles.
I meet every person, and intend to for the foreseeable future, to get to know how they think and how they operate. Often someone's had the chance to speak with others across their team by the time I get to chat with them, so I can know areas that we'll want to dive into, whether it's how someone would be successful culturally, or a given skill set area, or maybe questions for me, that type of thing. One thing that might be notable about all of our roles, and that isn't the case at every company, is we expect everyone to build with Roboflow during the process, regardless of role. We find that this ends up being a really good signal for whether someone, A, is excited by the products that we're building and what we're making in the world.
And B, interviews are just very lossy for getting to understand someone. I think you get a lot of culture signal, but you don't know what someone's work is like. By creating a shared thing that someone's going to work on, you get a sense. And we deliberately make it open-ended, because then what you get back is an expression of how someone interprets an open-ended challenge. Build with Roboflow, show us what you made, why you made this decision, what you found intuitive, where you got stuck, where you got unstuck, do you have product feedback? That's a really useful way of assessing. And frankly, you could do this with anything. We could say go build with, I don't know, Notion or something, and we could get a similar signal of how you approach an unstructured task.
But it's even better, of course, that it's in the context of Roboflow, because the candidate is also then asking: this company says they make things that are easy and intuitive and fun. Are they? Am I finding myself able to build and create, and am I inspired by the stuff that I would be getting to work on? So it ends up being a two-way assessment: is someone excited by the things that we think should exist in the world, and do we get a sense that someone approaches their work with the seriousness and excitement that we hope the people we have the chance to work with would?
Turner Novak:
Interesting. You probably do a lot more hiring than I do, but one of my tricks, my general strategy, is to give them some kind of assignment or action item, and then just tell them to follow up with me and send it to me whenever they're done. That also gauges how quickly they do it and follow up. Is it 10 minutes after the call, or is it the next day and it's really well done, or do they never get back to me? And it's like, all right, cool. I didn't spend any time thinking about this afterwards, I put it in their shoes. I'm just like, "All right, I want to see an example of something. Go do it and send it to me." Because in that way, it's like, can you trust that person to just go do what they need to do, and do they have autonomy? I mostly want to hire people that I don't have to babysit. So it's like, "Go do this." And if they do it, then it's like, okay, cool, I can trust you to do that.
Joseph Nelson:
The interesting thing about the babysit comment: if you talk to people, no one wants to be babysat, and no one wants to hire someone that they need to babysit. So it's usually just a difference in what that looks like for each person. The closest way to align on what it means for one person to not be babysat, and for the other to not have to babysit, is to see an expression of one's work. If that work is close to the mark, as you're describing, then it gives you a positive signal that this person works diligently, quickly, and completely, and you have high confidence that they will be able to be successful. I think that's the thing a lot of people miss about hiring: it's really a matchmaking process rather than an inherent good-bad value judgment. It's like, okay, great, do we think similarly about what it means to do X thing? If so, then great, we're going to be able to do a lot of the same tasks together.
Turner Novak:
And actually, I didn't think I was going to do this, but I was just thinking about it over the last two minutes. One thing I've been trying to figure out is that I'm looking for somebody to help me, sort of part-time-ish, with a couple things on the podcast. I've talked to a couple of people about it, and my open-ended question is literally, "What would you do to market the podcast? What would you do to grow it?" And it's open-ended. I don't even care what you do. Just do something and I'll see if I like it. And that's actually a weed-out, because a lot of people will be like, "Oh, I'm really excited about helping you. I have all these ideas." Like, cool, just do one. And then not doing it, that actually weeds out a ton of people.
So if you're listening to this and you have ideas of how I should grow the podcast, seriously, reach out to me. I'm interested in finding somebody to help me out with this. I'm not looking for somebody full time. It can literally be a couple hours a week. You can be in school, you can be working at a startup, founder of another show. I don't care who you are, but it's something I've been trying to figure out. So good way to get through my interview process I guess is just do stuff. What did you get done this week?
Joseph Nelson:
Exactly. Yeah, agency: show up, do it well, be proud.
Turner Novak:
I think the secret too is I stick this at the very end of the episode. If somebody's listening, we'll probably be about 110 minutes into the episode, and if they're still listening, they probably like the podcast, so they might have ideas. Well, yeah, this was a lot of fun. Thanks for coming on. I feel like we hit a lot of good stuff.
Joseph Nelson:
Thank you for having me. A ton of fun.
Stream the full episode on Apple, Spotify, or YouTube.
Find transcripts of all other episodes here.