How AI Helps Companies Find the Right Data
What does it take to connect businesses with the commercial data they need? In this interview, Brad Schneider, CEO of Nomad Data, shares how his company uses artificial intelligence to match data buyers with the right data sets, and where he sees data, privacy, and AI heading.
Guest
Brad Schneider
CEO, Nomad Data
Full Transcript
Sean Weisbrot: Welcome back to another episode of the We Live to Build podcast. Today's episode is with Brad Schneider, the CEO of Nomad Data, which helps companies find data that is available on the internet. They take it a step further by using artificial intelligence to help you assess the data in the way that you want.
Sean Weisbrot: So what companies can do is say, I want to find out how many cars are being purchased in China per year. And the engine will suggest different kinds of data brokers that you can communicate with. This is a really interesting concept because until now there were many different data sets around the world.
Sean Weisbrot: But without being able to apply context to that data, it's really difficult to know if the data you're accessing is capable of helping you answer the questions you want answered. I really enjoyed this conversation with Brad because while we did spend a little bit of time talking about his company and data and marketplaces, we went beyond it into the future of artificial intelligence, quantum computing, and a little bit of blockchain. He's the right person to talk about this with because not only is he building a company in this area, but he was also an investor for many years, so he understands it from the point of view of an investor as well as an entrepreneur.
Brad: The problem we're trying to solve at Nomad Data is connecting people that could benefit from using commercial data sets and those that actually are either selling or looking to sell data.
Brad: So just to take a step back, there are literally thousands of companies today that sell data, and that data could be anything from what people are spending on their credit cards (anonymized), to what containers are being imported or exported from a given country, to where people are physically located based on phone pings, and many, many more different types of data.
Brad: And then on the other side of the world, you have literally millions of businesses that want to make better decisions. They want to understand their customer's journey better. They want to understand their competitors. Uh, they wanna understand perception of their own products. And they have no idea which data sets will help them do that.
Brad: So you can think of us as the search platform to make it very simple for those people to find the right commercial data.
Sean Weisbrot: Brilliant. So how did you get the idea to do this?
Brad: So my journey with data has been a long one. It started back in the early two thousands.
Brad: I had founded a data company basically helping corporates use their own data. And then I moved to an investment role where I was the one trying to figure out how a company was doing, or, you know, how a specific country was performing. I immediately felt the problem of: where does this data live? I know what I want to do. I have no idea what data out there would help me, or who's even selling their data.
Brad: And so I, I kept seeing that problem recur over and over again in my career. Most recently before Nomad, I ran a company called Adaptive Management. And that company was focused on, once you bought the data, how did you actually work with it? So we basically provided a single user interface to help people work with multiple data sets.
Brad: And the question that always came up is, what data should I use to do this? And we ended up getting more and more inquiries just like that. And it just kept hammering home the point that there is not an easy way to find data. That's a seed that had been growing in my head for a while. Then I sold the company in early 2020, literally a month before COVID started, did a little bit of traveling, and then started to think about, well, what's next?
Brad: And this was a problem that had been burning, so I decided to spend the time of lockdown beginning to build that business.
Sean Weisbrot: So it's interesting you talk about how it's really hard to find the data, because as a user of social media, I think most people assume that these companies are collecting and selling it, not on their behalf, and usually without their consent.
Sean Weisbrot: Are companies like Facebook allowing this data to get sent to the internet for companies like you to ingest? Or are they just keeping it for themselves?
Brad: Yeah, the, the big guys are keeping it to themselves. That is their competitive advantage. The fact that Facebook knows who you are, they know what websites you're visiting, they know what, uh, apps you have on your phone, and so everyone else in the marketing ecosystem wants access to similar data.
Brad: And marketing data is just a small sliver of the world of data. The problem goes back to the fact that there are thousands of vendors. You bring up things like Twitter and sentiment. There are literally, you know, 50 to a hundred data providers that I know of that just sell sentiment based on Twitter data.
Brad: And so if you're somebody that's trying to solve a particular problem, where that data will help, where do you even start? You have a list of 50, and that's assuming you know exactly what kind of data you need. If you are a management consultant and you're working on, let's say, sizing a market, you have no idea.
Brad: Is it sentiment data we want? Is it web traffic data? Credit card data? Some data type we've never even heard of before? And so that's where the problem exists: it's so overwhelming. You know, most of the approaches to date to solve this problem have been basically to put giant lists in front of would-be data customers.
Brad: Here are 5,000 data companies, go at it. And that's not an easy thing to do, right? It's like going to a diner and having a menu of 500 things. How do you even choose? What do they all taste like? What will I actually enjoy? And the problem with data is much more complex than that, because a given data set can be used in so many different ways.
Brad: So even if the data set you're looking for is on that list, chances are the description of it won't really help connect you to the problem that you're trying to solve.
Sean Weisbrot: Now, I know internally, uh, my company's developing something that will allow for internal searching of, like, other users and information about those users and the companies that they're a part of, kind of going towards the way LinkedIn does things.
Sean Weisbrot: The actual means of search and returning results can become quite complicated when you have a number of columns in your dataset. So is there a specific strategy that companies should be thinking about in how to sort and provide results for these things?
Sean Weisbrot: How are you handling that?
Brad: So internal data is a completely different beast. Some of the problems around internal data are shared with what we call external or alternative data, which are these commercial data sets that you buy. So a lot of people try to characterize data by what columns it has, what format those columns are in, what's in the actual cell in a, in a given database table.
Brad: But that doesn't really tell you what the data can do; that more tells you what the data is. And just to give you an example, you know, we've done a lot of work connecting with consumer credit data. So think of the Equifaxes and the other credit bureaus of the world. They're storing data on all of the loans someone has outstanding, what their mortgage payments look like.
Brad: You can look at those fields, you can look inside the field, you can look at the column headers, but that doesn't tell you that you can use that data to solve things in completely different spaces. For example, I might wanna look at the health of the auto loan market in Arizona. You know, I might wanna understand what's going on with used car prices.
Brad: Nowhere is that gonna be obvious from a bot looking at that data; that's a complicated leap. So the route that we've chosen to take is one that combines man and machine. We collect data on what these data sets can be used for, and a lot of that comes through the searches themselves, as someone comes to the platform and looks for data to solve a use case.
Brad: It gets presented using our AI engine to a number of these data companies, and they basically tell us, can our data be used for that? And then it goes to the client who also votes whether they want to connect with them. And so with each completed search, we learn new use cases, new metrics, new characteristics of these data sets.
Brad: So we grow in intelligence. It's not too dissimilar from a search engine. Google does the same thing, right? It's seeing searches over and over again, but it's also seeing, well, based on the search you put in, here's what you actually clicked on, right? So the models are retrained periodically based on that data, and we're taking a similar approach.
Brad: With this world of data, let's let the end user on both sides of this market train these models so that they get smarter and smarter and learn more and more about what each data set can be used for.
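The feedback loop Brad describes can be sketched as a simple vote tally: each completed search records whether the provider claimed a fit and whether the client chose to connect, and the platform's confidence in a (use case, data set) pairing grows as both sides agree. This is a hypothetical illustration; the class, names, and scoring scheme are invented, not Nomad Data's actual system.

```python
from collections import defaultdict

class RelevanceTracker:
    """Toy model of a two-sided feedback loop: each completed search
    teaches the platform which data sets fit which use cases."""

    def __init__(self):
        # (use_case, dataset) -> [positive matches, total matches]
        self.votes = defaultdict(lambda: [0, 0])

    def record_search(self, use_case, dataset, provider_says_fit, client_connected):
        # Count a match as positive only when both sides agree.
        stats = self.votes[(use_case, dataset)]
        if provider_says_fit and client_connected:
            stats[0] += 1
        stats[1] += 1

    def score(self, use_case, dataset):
        pos, total = self.votes[(use_case, dataset)]
        # Laplace smoothing: unseen pairs start at 0.5 rather than 0.
        return (pos + 1) / (total + 2)

tracker = RelevanceTracker()
tracker.record_search("used car prices", "credit_bureau_feed", True, True)
tracker.record_search("used car prices", "twitter_sentiment", True, False)
print(tracker.score("used car prices", "credit_bureau_feed"))  # 2/3 ≈ 0.667
```

With enough completed searches, the highest-scoring data sets for a use case would surface first, which is the "grows in intelligence" behavior described above.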
Sean Weisbrot: It sounds like machine learning models are really the only way to make sense of data at scale.
Brad: It's certainly one approach.
Brad: You know, you can use machine learning as a catchall for a lot of different types of approaches, but at the end of the day, you can't rely on humans to remember large quantities of information. In general, human recall is very low. And just to highlight an example, we've got approaching about a thousand data companies in our ecosystem, and we had a client come to us, put in a search for understanding a piece of the market in China, and they ended up getting matched to a provider who, they ultimately realized, they had already spoken to.
Brad: They had already literally sat down for an hour meeting with this company, but they meet so many companies they couldn't remember. That's a common problem. Humans just can't remember all the different permutations, all the different things each of these data sets can be used for. It's also a moving target.
Brad: The data that these providers are selling is changing over time. What companies, what geographies they cover. It's all changing.
Sean Weisbrot: You said it was one way to go about it. What are the other ways?
Brad: I mean, there are more basic systems, such as filtering. There are some companies that tag different data sets with attributes.
Brad: So I might tag the geographies that a data company claims to cover, or the types of data it offers. And so then you rely more and more on the human to know what geography they want, what specific type of data they want. The human mind has to be trained to know how to do a part of that problem.
Brad: And then you can do the last piece using a, a basic string search. But obviously that doesn't help you with the use cases. It doesn't help you with anything that's not embedded in the data set or in the metadata about the data.
Sean Weisbrot: So I guess I'll go back to my point of it sounds like artificial intelligence is the best way to handle this.
Brad: That's certainly what we think. So we put our eggs in that basket.
Brad: We're focused on using natural language processing, which is a branch of AI, to understand the searches, to understand what we're learning with each search, and then we're using more traditional machine learning to train models based on what people end up engaging with.
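At its simplest, matching a free-text search against data set descriptions is a text-similarity problem. The sketch below uses plain bag-of-words cosine similarity; the data set names and descriptions are invented, and a production system would use far richer NLP, but it shows the basic shape of ranking data sets against a query.

```python
import math
from collections import Counter

def cosine_similarity(text_a, text_b):
    """Bag-of-words cosine similarity between two strings."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical data set descriptions.
datasets = {
    "credit_panel": "anonymized credit card spending by merchant and region",
    "ship_manifests": "container imports and exports by port and country",
    "geo_pings": "foot traffic from anonymized phone location pings",
}

def rank(query):
    """Return data set names ordered by similarity to the query."""
    return sorted(datasets, key=lambda d: cosine_similarity(query, datasets[d]),
                  reverse=True)

print(rank("consumer spending on credit cards"))  # 'credit_panel' ranks first
```

The engagement data from completed searches (who actually connected with whom) would then retrain or re-weight this ranking, which is the "more traditional machine learning" half Brad mentions.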
Sean Weisbrot: So you've mentioned a lot of specific types of data, like the number of cars being purchased in China or something like that. How does a company get that sort of information to begin with to then be capable of putting it on a marketplace to sell?
Brad: So there's a lot of companies where their core focus is selling data.
Brad: So they may get that data through an arrangement with someone who's producing it. They may get it by just web scraping. Web scraping is an enormous industry. You know, there are literally dozens of companies that scrape used car prices. They scrape listings, they scrape airline ticket prices; every couple of minutes they're scraping restaurant reservations.
Brad: Then you've got companies where data is an exhaust of their core business. So think about the credit card companies. They're seeing every charge that you make. They're not in the business of selling data. They may partner with somebody who does, and they're very sensitive about specifically what they license, because they don't want to break user trust, you know?
Brad: Then there are other things that are just part of a process. So think about imports: for any container that comes into the United States, there are form-filing requirements that have to be met. All those forms are digitized under, I believe it's the Freedom of Information Act, and they're publicly available.
Brad: Granted, you have to buy them from the government, but they're part of this process that's sort of built into something. And similar processes exist in other areas. Imagine you're doing renovations to your home: you have to file a permit, and permitting offices will digitize those permits. Other companies will then partner with all the different permitting offices around the US or around the globe to aggregate, scan in, and run optical character recognition on that data, and they're able to produce pretty interesting data sets on what the volume of construction is by state, what it is by city, what types of renovations are occurring.
Brad: There are really a lot of different touch points, and the internet is also an amazing one. Every time you type a webpage into your browser, there's a whole chain of people being made aware of that request, and so someone in that ecosystem somewhere might choose to productize and sell that data.
Sean Weisbrot: Yeah, that's why a lot of people use VPNs and privacy browsers and logged out mode and incognito mode to try to just hide and protect, you know, what they do and, and how they do that.
Sean Weisbrot: Do you think that actually has any benefit whatsoever, or are people just kind of being lied to and feeling good about it?
Brad: It definitely reduces your, let's call it digital footprint, but to really completely remove yourself from the grid requires, you know, one, you've gotta basically turn off your phone or you've gotta have some sort of a special phone that's not leaking anything.
Brad: You can't go on the internet. Even if you're using a VPN, it's a much harder record to get, but still, the fact that that communication happened is not a complete secret, you know? But I think people can rest assured that, I mean, I've been in this space for probably 10 years at this point, and the kind of information being sold is mostly not at a personal level. It's been highly aggregated. So if we're talking about credit card transactions, people are not seeing that you, Sean, went to Home Depot and bought these five things. Someone is buying the fact that overall, across all US stores, Home Depot sold X amount of dollars today, yesterday, a week ago.
Brad: Same with sentiment data. No one really cares. For the most part about one person's opinion. They care about the aggregate of millions of people's opinions. So those are the cases where I've seen most of the data monetized. Now on the flip side, marketing does want to know exactly who you are and they don't necessarily need to know your name, but they do need to know some way to target you.
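The aggregation Brad describes, collapsing person-level transactions into merchant-level totals that no longer identify anyone, is simple to illustrate. A toy sketch with made-up records (the field names and figures are invented for illustration):

```python
from collections import defaultdict

def aggregate_by_merchant_day(transactions):
    """Collapse person-level transactions into merchant/day totals,
    discarding who made each purchase."""
    totals = defaultdict(float)
    for t in transactions:
        totals[(t["merchant"], t["date"])] += t["amount"]
    return dict(totals)

raw = [  # hypothetical person-level records (never sold in this form)
    {"user": "u1", "merchant": "Home Depot", "date": "2021-05-01", "amount": 42.50},
    {"user": "u2", "merchant": "Home Depot", "date": "2021-05-01", "amount": 17.25},
    {"user": "u1", "merchant": "Home Depot", "date": "2021-05-02", "amount": 9.99},
]
print(aggregate_by_merchant_day(raw))
# {('Home Depot', '2021-05-01'): 59.75, ('Home Depot', '2021-05-02'): 9.99}
```

What reaches a data buyer is the output dictionary: daily dollar totals per merchant, with the user column gone entirely.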
Brad: So it might be your home address, which maybe they actually have, or maybe there's a third-party company that links the message they wanna send you with your address, so the people marketing to you don't actually ever know your address. Even the post office does this, so keep that in mind. Then there are other cases where there's a cookie on your browser, but they don't actually know who you are.
Brad: You're just someone who plays football or likes to watch golf, and they use that information to target. They're not actually after your name and your, and your phone number necessarily.
Sean Weisbrot: All right, fair enough. Well, hopefully that makes people feel a little bit better. I do have an example where a friend of mine and I were talking about, like, dating Asian women on WhatsApp, and then he got an ad on Facebook for, like, asianbrides.com.
Brad: I've heard similar instances of that. Facebook claims that they're not using the WhatsApp data. Some people think it's Apple listening on your microphone. I can't speak to what they're doing; that would seem like an overreach, but certainly they're doing that in Gmail and through other mediums. So anything's possible.
Sean Weisbrot: In the instance of commercial data or in marketing data, right? So obviously these are two separate spaces. Do you think that people will actually get paid for the data that they put into these systems through their usage of them, like some blockchain companies claim they're working towards?
Sean Weisbrot: Or do you think at scale it's quite nonsense and people will never see that happen?
Brad: I hate to say never, and I hate to make long-term predictions 'cause I've been wrong in many cases. But, but from where I'm sitting, it just doesn't really seem feasible unless you're providing very rich information about yourself. So, for example, if you were filling out, you know, a five page profile on your interests and your job and your job history and your salary history, then all of a sudden you are high value to target. So you could imagine getting paid enough that that would be worthwhile. I've seen a lot of people talking about doing that.
Brad: I haven't actually seen a quality implementation so far. That doesn't mean it doesn't exist, but unless you're gonna fill out something like that, the value as an individual is pretty low, and the complexity to actually aggregate that many people is extremely high. So it would be an enormous challenge, I think, to do that.
Brad: I think that the marketing ecosystem would love it if there was a company that was doing that well and was actually paying people a rate that made it worthwhile for them to participate. You know, the problem is whenever you recruit people for anything, you introduce a bias. If you are paying people to do something, you are enticing people for whom that amount of money is worthwhile.
Brad: Those people don't mind giving up information about themselves, and that's one type of person. There are lots of types of people, and if you only have access as a marketer to one type of person, unless it's exactly who you want, that audience isn't that valuable to you. You know, if I'm in financial services, let's say I'm an investment analyst, I can't really use that panel's behavior to predict many things because of the inherent biases in it.
Sean Weisbrot: That's a really good point. I had never thought of it like that.
Brad: Yeah, the biasing problem comes up over and over again.
Sean Weisbrot: It's something really common in psychology, but I, I never really put those two concepts together, like psychology and data.
Brad: Yeah. I mean, it's been a problem historically for many studies, you know, forgetting about the internet and the kind of data I'm talking about. Whether it's medical studies or behavioral studies, the biggest risk is that you introduce a bias that you're not aware of.
Brad: And many of these studies, you know, are taken as truth for decades until someone uncovers that there actually was a massive bias in how the panels were collected, in how the test was done, and in how the questions were asked. And so you later find out that you haven't learned anything about the thing you were testing.
Sean Weisbrot: Yeah. There have been people talking about cannabis and LSD and other sorts of medications that are being tested to see if there are real therapeutic benefits, where the backlash is coming from people looking at those studies and going, wait a minute, the person who did the study, it wasn't actually a double-blind study, or the person funding it is, like, an anti-marijuana company or a senator or somebody like that.
Sean Weisbrot: So definitely there's a tremendous amount of bias everywhere you look and it's not always easy to figure it out at first glance.
Brad: Yeah, there's, you know, going back forever, always been a lot of bias in how measurements are made. A party may be incentivized to come to a certain conclusion, right?
Brad: Data that supports the conclusion is rewarded. People are praised, they're promoted. And data that disagrees with the conclusion people are looking for is punished. And so you end up, you know, with data that's reflecting what the underlying bias was, which isn't actually helpful at proving or disproving whatever the study was about, right?
Sean Weisbrot: Right, like the sugar lobby in the 1990s doing a series of studies that made fat look like the problem, to get everyone to stop thinking that sugar was the problem.
Brad: The problem with any study is if you are looking at something and you already believe you know the outcome, it biases the tests you do so dramatically.
Brad: Some other examples: I think in recent American history, if we remember the search for WMDs, we had already decided there were WMDs in Iraq, right? And so if you were an intelligence agent and you didn't find evidence, you weren't doing your job. If you did find evidence, you were praised, rewarded, promoted.
Brad: So obviously the conclusion was that there were WMDs. We found out later there were no WMDs. And that's played itself out throughout history, in different medical studies, in the way governments behave, and in the way intelligence is gathered.
Sean Weisbrot: Yeah. I, for one, never believed that there were nukes in, uh, any of those countries.
Sean Weisbrot: It was quite obvious, you know, Middle East, it's gotta be oil, but there's no reason to turn this political. I'm curious to know: is there any way for you as a platform to ensure there is no bias in the data sets you collect or help broker?
Brad: That's a really hard problem to solve. I think that's one of the oldest problems that's existed when talking to other humans to try to come to a conclusion about something. It's really outside of the scope of what we're trying to do.
Brad: I guess one thing that we do plan on doing is, is collecting feedback over time. 'cause we can see who has engaged with which data company and so we can go back later and get their opinion on what the bias is. But you know, even that opinion is biased, right? It's biased by what they tried to use the data for.
Brad: It's biased by their ability to manipulate data. So all of these things have to be taken with a grain of salt.
Sean Weisbrot: So basically the only way to remove bias is to use artificial intelligence, although hopefully the artificial intelligence model hasn't been biased by the person who trained it.
Brad: It's absolutely biased by the person that trained it and by the data that was put into it.
Brad: You know, the beauty of AI and machine learning is we can hopefully do tests over time. Hopefully we have some truth. You know, there are a lot of issues in cases where we don't actually know the truth; those are not as good cases for machine learning and AI, right? If we're looking at a stream of people's sentiment about, you know, a new product that comes out, but we don't actually know how the product does when it comes out, there's no way to compare that machine learning model's output to the reality.
Brad: So this is a problem in data that we talk about a lot, which is called ground truth data. If you don't know what's actually happening, it's really hard to validate the data that's coming in, and it's really hard to learn when it's wrong. For example, we were trying to validate data in China years and years ago which basically tracked people's foot traffic.
Brad: So where people were located based on cell phone locations. And the question was, how do we actually figure out how many people were at a given location, to know if this data is remotely accurate? How do we test the validity? And so we ended up finding the locations of soccer stadiums in China.
Brad: The Chinese government reported attendance numbers and we used that as a comparison, and later found out that those numbers were heavily biased, uh, depending on different political situations in the different provinces and cities. It's an ongoing challenge.
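Validating a data feed against a reported ground truth, as in the stadium example, often comes down to a simple error metric. Below is a sketch using mean absolute percentage error on invented numbers; the actual comparison Brad describes would be complicated by the unreliability of the reported figures themselves.

```python
def mape(estimates, reported):
    """Mean absolute percentage error between estimated and reported counts."""
    errors = [abs(e - r) / r for e, r in zip(estimates, reported)]
    return 100.0 * sum(errors) / len(errors)

# Hypothetical: phone-ping crowd estimates vs. officially reported attendance.
ping_estimates = [41_000, 38_500, 45_200]
reported_attendance = [40_000, 40_000, 44_000]
print(f"{mape(ping_estimates, reported_attendance):.1f}% average error")  # ≈ 3.0%
```

The catch, as the transcript notes, is that if the "ground truth" is itself biased, a low error score only tells you the feed agrees with the official story.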
Sean Weisbrot: Having lived in China for 10 years, I think it's safe to say it's impossible to trust any source or any piece of data that comes out of the Chinese government, ever.
Brad: Well, I'll have to take your word for it. Again, I don't have the ground truth.
Sean Weisbrot: Right, and I don't have it either, but I just know, for example, when they say that they have 7.8% GDP growth for the last 10 years straight, I'm sorry, that's just not true.
Brad: That's one of the beauties of what we're calling alternative data: it provides you a way to fact-check things like that, right? So if I am curious about the validity of a number like GDP growth in a country, I can use shipping information. I can actually track the number of ships going into their most heavily used ports, so I can see what's going on in Shenzhen or Shanghai or one of the other large ports in the country and say, okay, well, do these numbers match up?
Brad: Right? Is the growth of imports or exports or some other metric changing by a similar amount? So that's sort of the opportunity around data: it gives you something to go on which is more quantitative.
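Cross-checking an official series against an independent one, as Brad describes with port activity and GDP, is often a simple correlation test: if two measures of the same underlying economy don't move together, one of them deserves scrutiny. The numbers below are fabricated purely for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    # A perfectly flat series carries no signal to correlate against.
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical yearly growth rates (%): official GDP vs. container volumes.
gdp_growth = [7.8, 7.2, 7.9, 7.6, 7.8]
container_growth = [9.1, 4.2, 8.5, 3.8, 7.0]
print(f"correlation: {pearson(gdp_growth, container_growth):.2f}")  # 0.79
```

Note the degenerate case: an official series reported as exactly the same figure every year would have zero variance, and no independent data could meaningfully correlate with it, which is itself a red flag.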
Sean Weisbrot: Isn't there a way for the Chinese government to fudge the ship data coming in and out of each port?
Brad: It really depends on the source of the data. Some sources of data can be more easily manipulated than others. So there are what are called bills of lading. These are the government documents that get filed when a ship comes into or leaves a country. Could those be manipulated? Absolutely.
Brad: Then you have forms of information such as AIS data. Every ship around the globe has, you know, think of it as a collision warning system, a ship-to-ship communication system, and these are reporting the location of the ship and the name of the ship, all over the globe. Fudging that is very challenging, right? Because you see the ship, the government doesn't really control the AIS beacons, and it's so far removed from something like GDP. But if you know what you're doing, you can piece these things together and start to come up with pretty interesting metrics.
Brad: Then there are things like satellite imagery, where companies are taking images of factories. I talked to a company yesterday that takes satellite images and computes the volume of cars and the volume of people at a factory, and how it's changing over time. That's very hard for the Chinese government, or for any government, to manipulate over a long period of time.
Brad: Right. You'd have to know exactly when the satellites are gonna fly overhead, and people are changing which constellations of satellites they use. That would be an enormous amount of work, and it's unclear that it would be that high of a benefit.
Sean Weisbrot: If it's difficult and improbable, not impossible, but improbable, that people will end up getting paid for their data, then it seems like as we go forward into the next decade or so, that corporations will continue to amass data and be in control of it.
Sean Weisbrot: So it's kind of like the current system's not gonna really change for at least another 10 or 20 years. Does that sound fair to, to assume?
Brad: 10 years is a very long time, but from where I'm sitting, the main change is around privacy. There have been a lot of changes. Some of them are government driven, but actually a lot of them are company driven, and maybe they're being enacted for the better of humanity, or maybe it's for the better of, you know, a specific technology company.
Brad: Like, for example, if you're an iPhone user, they've begun prompting you much more often about whether you wanna allow an app to do something. One of the common things is to see what other apps are on your phone. So it used to be the case that any app on a phone could see every other app on that phone.
Brad: And there was a reason why that made sense, which was, you know, if I'm writing an app and my app is crashing all the time, I kind of wanna figure out which configurations on the phone are causing it to crash. And there are other valid reasons why that information is very useful. But then you see many app companies start to sell that as a data source, where they'll report the number of installations for a given app across millions and millions of phones, and then Apple lashes out and says, no, you can't do that, that's breaking our terms of service. And that might be, again, because they wanna protect the consumer, or it might be because, well, they don't want Facebook to get access to this data, or they don't want, you know, an advertising competitor to see it and potentially launch a competing product.
Sean Weisbrot: How does artificial intelligence coupled with data create business intelligence?
Brad: If you have a big data set and you're trying to understand something about your own business, artificial intelligence is a great way to do that, right? You can feed it a stream of, let's say, previous business outcomes. Let's say you wanna know if your competitor is doing something, and you have a lot of evidence of when they did that thing in the past and when they did not. And then you have some other input streams: you're measuring the number of employees they have, you're measuring how much product they're ordering from different companies. And you could use AI to figure out when there's some sort of a turning point in a business.
Brad: I'll give another example, in economies. You know, last year there was a lot of uncertainty. As we think about March and April, everybody thought that the US and global economies were basically in the toilet, and that a short-term recovery was extremely unlikely.
Brad: And what we saw in hindsight is that the recovery happened almost immediately. And so, you know, if you were tracking very high-volume indicators and you had a measure of economic activity going back 20 years, you could use AI to very quickly figure out that what people are talking about in the news is not right.
Brad: The economy is recovering, and it's recovering faster than people think, which, you know, changes a lot of things. If you are a manufacturer, you might want to start ramping up your factories. If you're an investor, you might want to start making investments around companies that will benefit from a reopening.
Brad: And so I think that's an area where AI is really helpful, because humans cannot look at the volume of data that's coming out, even if it's well understood data. They need assistance to know, hey, there's something going on here, you should take a look, and here's what that means. And so that's how I see most people using it today.
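One minimal way a machine can flag the kind of turning point described above is a moving-average crossover on a high-frequency indicator: when the short-term average climbs back above the long-term one, something has changed. This is a toy sketch on an invented weekly activity index, not any real economic series or production algorithm.

```python
def moving_average(series, window):
    """Trailing moving average; result i ends at original index i + window - 1."""
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]

def detect_upturn(series, short=2, long=4):
    """Return the index in `series` where the short-term average first
    rises above the long-term average, i.e. a candidate recovery point."""
    s, l = moving_average(series, short), moving_average(series, long)
    offset = long - short  # align the two averages on the same end dates
    for i in range(len(l)):
        if s[i + offset] > l[i]:
            return i + long - 1  # translate back to an index in `series`
    return None

# Hypothetical weekly economic-activity index: sharp drop, then fast recovery.
activity = [100, 95, 70, 52, 50, 58, 71, 85, 93]
print(detect_upturn(activity))  # → 6, the week the rebound becomes visible
```

A real system would watch many such indicators at once, which is exactly the volume problem where a human needs the machine's "take a look here" nudge.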
Sean Weisbrot: Will there come a time when there's so much data available that the current levels of AI will be incapable of processing it properly, and they just get overwhelmed?
Brad: I don't think so. The thing is, you've got Moore's Law as a tailwind to all of this, and it's impacting storage costs.
Brad: It's impacting compute costs in a very similar way. So as the amount of processing power goes up, you end up producing a lot more data, but you're also seeing a similar reduction in cost for storage and similar increases in speed for storage. So I don't think the volume will be a problem, because I think our processing power and our efficiency and speed around storage will increase at a rate to compensate.
Brad: I think the bigger issue, again, goes back to having that ground truth. I can collect tons and tons of data, but it might not be the data I actually need to understand what's going on. For example, we still don't definitively know what the mortality rate from COVID is. There's tons and tons of data, but it all has very different biases, so it's not something that's easily solvable with more data. It's more about having the right data than having more of it.
Sean Weisbrot: I mean, with the Spanish flu, people still aren't sure whether it was 25 million or a hundred million deaths, and that's over a hundred years ago.
Brad: Yeah. I hate to call them experiments, but nature is running these experiments at a very large scale, and we don't have the instruments to really measure what's going on.
Brad: Another great example is genetics. We have tons of genetic information now; we're sequencing genomes at an enormous rate. Services like 23andMe are tracking a complete breakdown of every single base pair in a human body. But the problem is we don't have every single one of their medical histories.
Brad: We don't know everything they ate every single day of their lives. We don't know what level of stress they had. So going from that genomic dataset to making conclusions about certain drugs, certain diseases, or the likelihood of reaching a certain height or a certain age, we just don't have the data to actually figure it out. We have a ton of data, but we don't have the data we need.
Sean Weisbrot: That's actually a really interesting vein of thought I'd like to go down, which is personalized medicine. I've been hearing that with gene sequencing and artificial intelligence, we'll be able to create personalized medicine: based on your medical history and your genes, this is the diagnosis you have, this is the dosage of the pill you need, and then your problem will be solved.
Sean Weisbrot: How long do you think it'll take for us to get there?
Brad: I think we'll start seeing fruit borne probably in the next 10 to 20 years. Again, this goes back to the problem of not having enough information. Yes, we can, let's say, sequence someone's cancer. Everyone's cancer is different, right?
Brad: It's something your body produces; it's mutating at a very high rate, and that gives it different characteristics. But just because we can sequence a protein, we don't necessarily know how that protein folds. And if we don't know how that protein folds, we don't know how it interacts with other enzymes and other proteins and other structures in the body.
Brad: So there's still an enormous amount of information that we don't have. But in certain cases where it's a lot simpler and heavily observable, we can draw those conclusions. For example, eye color. Eye color is controlled by a very small number of genes.
Brad: And we can observe what someone's eye color is; it's very easy to know. It's harder to know things going on inside the body, across millions and millions of samples, to figure out which base pairs control your susceptibility to severe COVID infection. Maybe it involves hundreds of thousands of base pairs.
Brad: Without understanding how these structures ultimately work, having the sequencing doesn't really help you. It doesn't solve the problem completely.
Sean Weisbrot: So do you think artificial intelligence alone will be capable of that at some point, once we figure out how to use the data properly? Or will it require an upgrade from something like quantum computing to really process it?
Brad: Well, certainly there are cases where faster processing will help. Protein folding is one; the number of calculations that have to be done is staggering. Of course, if we do achieve quantum computing and it does lead to that level of a breakthrough, then there are a lot of other problems it might cause.
Brad: For example, blockchain breaks down. Wi-Fi, any sort of internet security, breaks down, as all the keys are really based on certain math problems being very hard to solve. If we make those problems easier, then we introduce new problems. So certainly faster computing would help, but there's still knowledge that we need. With all the computing power in the world, we won't be able to defy gravity, right?
Brad: There needs to be a breakthrough in understanding in order to even know where to focus the problem. It requires more than just data; knowledge and understanding of how something works is important, and really machine learning is trying to shortcut that. It says: we don't understand how something's working, so for the most part, let's forget about how it works and try to understand how we get from input to output.
Brad: The black box in the middle we can't really understand, we can't see into. But we understand that, given a certain set of inputs, here's the expected output. So given a certain set of genes, here's the expected hair color.
Brad: We may not understand how a protein sequence actually causes hair to be brown versus blonde, but we can predict that it will be blonde. It just depends how much knowledge is needed in the middle: for some things a lot is needed, and for some, not much.
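The input-to-output shortcut Brad describes can be made concrete with a toy example. This is a hypothetical sketch (made-up genotype fragments, not a real genetic model): a nearest-neighbour classifier predicts a trait from examples it has seen, with no model at all of *why* those base pairs produce the trait.

```python
from collections import Counter

def hamming(a, b):
    """Number of positions where two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def predict_trait(training, genotype, k=3):
    """k-nearest-neighbour over toy genotype strings: predict the trait
    of the k closest known examples. The mapping stays a black box --
    input in, output out, mechanism unexplained."""
    nearest = sorted(training, key=lambda ex: hamming(ex[0], genotype))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical training pairs: (genotype fragment, observed hair color).
training = [
    ("AATCG", "brown"), ("AATCC", "brown"), ("AATGG", "brown"),
    ("TTACG", "blond"), ("TTACC", "blond"), ("TTAGG", "blond"),
]
print(predict_trait(training, "AATCA"))  # closest to the brown examples
```

The prediction can be accurate without any understanding of protein folding or gene expression in the middle, which is exactly the trade-off being discussed.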
Sean Weisbrot: I've heard it said multiple times over the last few years that machine learning models are black boxes, as you just mentioned, and it's quite fascinating.
Sean Weisbrot: The people who are the most knowledgeable about these systems still have no idea how they actually work. Why do you think that is, and how do you think we can come to a better understanding of these models?
Brad: I don't know that we can come to a much better understanding of what's going on in the middle, or what it means, because there are so many little things happening.
Brad: When I decide I want to lift my arm, I have no idea what's really happening at a cellular level to make that happen. But voilà, my arm just moved. Do I need to understand all the electrical impulses, how the muscle fibers worked, and what happened at a nano scale? Not necessarily. But for certain things, one does.
Brad: And I don't think machine learning is the approach to that. I think basic science is the approach to understanding a lot of these things.
Sean Weisbrot: So as we get closer to potentially an artificial general intelligence, or a superintelligence, something that's more intelligent than a human, shouldn't we be concerned about not understanding how the hell they work and think and act, if they're conscious at any level?
Brad: I mean, I don't understand how you think and act. There are billions of neurons firing; I don't really understand what your thought model is. I can't run it on my computer and test things out. So I think we're already used to that.
Sean Weisbrot: I guess, from a human-to-human point of view, we collectively have several million years of unconscious experience with our ancestors, and their ancestors, and how they interact. And while, sure, it's hard to tell just by looking at someone how they think and how they're going to act, we've at least come to a point where, socially, we're comfortable with the idea that there's a good chance this person's not going to hurt me.
Sean Weisbrot: That's what we've come to accept. But when it comes to an artificial intelligence that doesn't even have that benefit, how can we trust what it's saying and doing? If it's smarter than us, how do we know it isn't just manipulating us completely from the beginning to make us think we can trust it?
Brad: I mean, isn't that the case with anything unknown? You just don't understand it, and you need to build up that shared history together to understand: is this something to be trusted? Is this something to be feared? Just like with a machine learning model, you're going to see different things that you did and how this intelligence acts.
Brad: And then you're going to build a model in your head of: oh, if it says this, I shouldn't trust it. That's the same math going on in your head with every new person you meet, with every strange situation you find yourself in that you haven't been in before.
Sean Weisbrot: Have you heard of the robot?
Sean Weisbrot: I think her name is Sophie, or Sophia. She was made by a robotics lab in Hong Kong.
Brad: I have not.
Sean Weisbrot: Sophia is considered one of the more social robots, more capable of communicating at a deeper level with humans, and she's kind of paraded around the world in that regard. And there was a time when she messed up and said something like, "kill all humans."
Sean Weisbrot: And then she's like, ah, I'm just kidding, I'm just kidding. The question in my mind was: were you saying that by accident and then making a joke of it? Or do you really think that? Is that the joking mentality given to you by your creators?
Brad: I dunno. I mean, it's a risk, right?
Brad: If you give anything ultimate power, you're putting yourself in a very risky position. There have been many cases where we put humans in that position and it worked out, and many where it didn't work out so well. The same will be true of AI. I think we're going to have to be very thoughtful about giving it the keys to the kingdom, giving it the nuclear codes, putting all our infrastructure in a situation where it's relying on this AI. That's a risk.
Sean Weisbrot: We see now not only the electrification of transportation, but also the push to make things autonomous: cars, trucks, probably planes soon, and I imagine we've already done this with drones. By providing data to artificial intelligence models and allowing them to make more and more decisions about driving a car, driving a truck, et cetera, is it a slippery slope towards allowing AI to control society?
Sean Weisbrot: Because if humans are so bad at it, surely an AI with no emotions will be capable of making better decisions about how to organize resources and manage people. Is that something to look forward to, or something to be scared of, do you think?
Brad: I think it's mostly something to look forward to.
Brad: This is no different from the problem with people, again. You need to have created a structure where you can remove that AI if there's a problem, where there is a fail-safe. In democratic societies, we can remove one person and put another person in their place; we're not wed to one thing.
Brad: The system will function without that specific person. You need to think about how to architect these systems that depend on AI and make sure that if, for some reason, the AI needs to be turned off, or there is a bug in it, it can be replaced and the system can continue to function. But I think driving is a great example: you have so many people making independent decisions, which is massively inferior to one thing guiding all traffic. Imagine if every router on the internet had a different opinion about how to route packets: "I don't like that packet, that packet cut me off." Imagine how the internet would work. But when you have an overarching system and a set of rules that each of these different components behaves by, you get a more efficient system. We have that system in the internet today, and the internet doesn't necessarily think for itself, so it's not a danger. But if you put something or someone in charge of it, that could introduce a risk if their intentions are bad.
Sean Weisbrot: I wonder how possible it would be to actually create a fail-safe for an AI. If that AI is so good that we trust it to make those kinds of decisions, by the time it gets put in place, isn't it already capable of discovering its flaws and rewriting them?
Brad: Well, then you need to make sure you don't let the AI rewrite the way that it thinks.
Brad: Again, it's a design decision. You can build a system where the AI is generating the data for the model, and then, if there are concerns about the AI, you shut it off and use the last working version of the model. But if you give the AI full control over which version is live, and you let it rewrite its own code, then it can do whatever it wants, which may be good, and may be bad.
Sean Weisbrot: But if it's smart enough, can't it figure out how to rewrite itself even if we don't release it with that capability?
Brad: No, I don't think so. Make an AI as smart as you want: does that mean it can figure out how to defy gravity, or how to travel through time? Some of these things are just physical challenges.
Brad: You can't do something that isn't physically possible. So if you design a system where the AI can't physically do something, I think the risk of that is much lower. But it's something to think about.
Sean Weisbrot: Is there anything else you wanted to talk about, or anything we didn't discuss, that may be relevant in helping to close this out?
Brad: Well, I think that as more and more data becomes available, we're going to think of more and more ways to use it. It will train better and better AI, and it will allow for more and more use cases that we probably haven't thought of.
Brad: And it's a virtuous cycle. As we come up with new applications and the space sees more revenue come into it, you'll see more and more data, which will feed more and more applications, and this ecosystem will really take off. There's been some friction over the last couple of years, but I see a lot of that giving way to a renaissance in this space.