Tesla, Waymo, & Lyft: Hurdles to Autonomous Driving

Can you tell us about your role and responsibilities at Lyft today, as well as your previous role at Google?

Today at Lyft I lead the team called Lyft Level 5, which builds self-driving cars for the specific needs of ride-sharing. I founded this group three and a half years ago and have been leading it ever since. Prior to that, I was at Google for over 12 years. I came to Google to work on Google Books; I was intrigued by Google's ambition to scan all the world's books. At the time, they encouraged engineers to have side projects, called 20% Projects. I got interested in a project that was a collaboration between Stanford and Google, headed by Larry Page himself.

Larry Page is one of the co-founders of Google. The idea was to collect imagery at street level from vehicles driving around. His vision was, essentially, to bring not only the web to people, but the world. He wanted to collect data at street level and make it useful in some way. The goal was still a bit fuzzy, but it aligned with Google's overall mission. Since Google was growing very fast, there was nobody at the company at the time who had spent any time on this project, so I took that on as my 20% Project. I got some interest from seven interns who helped me out over the first summer.

We put together an end-to-end demo of what something like this could look like. We hacked a car together with a bunch of cameras, Lidar and other hardware, and then got help from the Google security team to drive this car around collecting data. We established an early, though clunky, way of getting data from the cars to a hard disk plugged in under somebody's desk. After uploading the data, we put together an end-to-end computer vision pipeline to make this data useful. A couple of weeks later, I gave a tech talk, followed by a review from the VP of engineering, who thought it was interesting and gave us the go-ahead to make this real and hire people.

We hired engineers to work on it, and launched in five cities about a year later. Why five? From day one, we wanted to show that the project was not just a California experiment, but an ambition to grow beyond a single city, so we chose five cities spread across the US. We only had a small amount of data for each city, but it was still useful. From there we wanted to see what happened. It was a brand-new product space and nobody had launched street-level imagery before in the context of maps. We received record traffic and press interest in the product and, from then on, we knew there was demand and that it was going to work.

We grew this project from an early-stage experiment to something that could be scalable. My focus over the next few years was that scale: to make it real, more robust, and to essentially rewrite everything about the software we had built, because everything was clunky. Until then, it had been all about moving fast and not about having something super reliable. Along the way, my career grew, with the team growing from a handful of people to over 100, with global operations. After that I also expanded my scope of interest to be involved in different kinds of imagery – captured from airplanes, satellites, even users' cell phones – with the mission to make sense of all this imagery in the context of maps; to derive data for maps. It was not only about presenting images to users through Street View or Google Earth, but to essentially go from raw pixels to knowledge and structured data.

Today, Google's and other companies' maps are built automatically by data mining imagery and extracting the corresponding street signs, street names and house numbers; all the information required to make a map. We pioneered this along the way and expanded the scope to derive knowledge from other kinds of imagery at Google.

After collecting data via Lidar or radar in the early days, you expanded to use many other sources?

The Street View car was primarily about imagery, but from the very beginning we felt there was going to be interesting information that we could not easily derive from the imagery, namely the 3D structure of the environment. We wanted to be able to give users a smooth transition between panoramas. Imagine each Street View image as a bubble; navigating from bubble to bubble can be very jarring if there is no transition. Context is lost and so, in order to create smooth transitions from one bubble to the next, the first bubble needs to be warped in a way that it moves to the second one, which involves an understanding of 3D. The imagery is projected onto a coarse 3D model of the environment and the user is moved smoothly to the next bubble.

In theory, you can reconstruct the 3D structure from the raw imagery itself through stereo, but it is complicated and does not always work. To aid us, we placed Lidars on these vehicles. In the early days of the project, our Lidar had a single line, but over time we moved to much more sophisticated puck-style Lidars that gave us more information. It was primarily about imagery but Lidar helped.

What are the differences in data quality between Lidar and radar?

They are completely different, and we can talk about this when we move into the realm of autonomous driving. Lidar is very rich data; it is a point cloud that gives you distance at every point in a scene. That is the case for puck-style Lidars, which sweep rotating scan lines around the sensor. Radar is very different; it is raw data and very hard to interpret. It gives you the distance and velocity of a dominant object in the scene. New generations of imaging radar, which are currently becoming commercial, give something that looks like a coarse image of objects, in addition to distance and velocity.

The two have different characteristics in terms of how they react to weather. The beauty of having Lidar, radar and imagery on an AV is that you have something working well, regardless of weather conditions and for all the situations you need to be able to deal with, so they are complementary.

Does Google Maps give Waymo an advantage in AV?

Some advantage, yes, but there is a difference between traditional maps like Google Maps, which are mainly for people, and AV or HD maps, which are designed for machines, robots and AVs. Humans do not require the level of detail an AV does. We have a lot of experience and intelligence to map the coarse information onto the world we see around us and navigate. AVs are different, and need to do a lot in a very short amount of time.

They need to understand the world around them and what other agents, like pedestrians, other cars and trucks, are doing, and then plot a safe course of action. Doing this all in real time is very expensive. If, on top of this, you also need to understand the static environment, lane configuration, intersections and stop signs, you need to do even more work, and you are CPU limited. Today every AV company relies heavily on HD maps, which are a pre-recording of the static environment pre-loaded into the vehicle. These maps are a 3D model of the environment, plus metadata that matters, such as the exact location of stop signs, lane boundaries and traffic lights.

Is all that metadata proprietary to each company?

At the moment, yes; however, there are companies that are trying to build a business in providing AV maps to the industry. The first one was a company called HERE, which has been trying to go into this business and provide HD maps. There are others like DeepMap, founded by former Google engineers, catering to the AV industry. There is, however, no agreed-upon format or level of detail – some companies do more online in real time, rather than relying on a map – and people place boundaries differently. So at the moment, it is still the Wild West in HD mapping. In future we will see more consolidation or standardized formats, but that has not happened yet. Going back to your question about Google, Waymo uses some of the Google Maps data for the underlying structure of the streets, but they augment this with their own HD mapping efforts.

Can you lay out the AV stack from hardware to the behavior and perceptions?

Think of it as a layered set of components. At the lowest level is the vehicle you need to interface with, which typically requires deep integration with vehicle components. This is so you can programmatically control the vehicle, essentially tell the vehicle to apply the brake, turn right, turn left, accelerate, indicate or even control the doors. On top of that, controls is a complicated piece. When you apply the brake, the vehicle will do different things, depending on whether it is on a slope, whether the road is slippery, and the weight of the vehicle. A whole bunch of other things are needed to understand exactly how the vehicle reacts when you apply the brakes, to give a safe and comfortable experience to riders.

I should go in the other direction. First of all, the vehicle needs to know where it is. Part of the stack is that localization piece, which is done using a combination of sensors. You cannot only rely on GPS as it is not precise enough, can be blocked by tall buildings, and there are too many issues with it. What you end up doing is relying on the maps we mentioned earlier, pre-loaded into the car, and using Lidar to get a coarse map of the environment around the car, and matching it with the known map.

The match is restricted using the knowledge of GPS and coarse localization. Once in a while, you match precisely where you are and then you can rely on other sensors like the IMU to know how much you have moved in between. This system offers centimeter-level localization accuracy, so you know precisely where you are in the environment. The sensing stack contains all the sensors – cameras, Lidars and radars – used for what we call perception. Once the vehicle knows where it is, it also needs to understand the dynamic agents. It already knows the static environment from the map.
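The localization idea described above can be sketched in a few lines. This is a toy illustration on a 2D grid, not any company's actual method: the function names, the grid representation and the overlap score are all assumptions made for the example. It searches only within a window around the coarse GPS guess for the position where the Lidar scan best overlaps the pre-loaded map, then dead-reckons between matches using IMU-style displacements.

```python
from itertools import product


def match_scan_to_map(map_cells, scan_offsets, gps_guess, search_radius):
    """Find the grid position near gps_guess where the scan best overlaps the map.

    map_cells: iterable of (x, y) occupied cells in the pre-loaded map.
    scan_offsets: (dx, dy) obstacle offsets seen by Lidar, relative to the car.
    gps_guess: coarse (x, y) position; search_radius: half-width of the window.
    All names and the scoring rule are hypothetical, for illustration only.
    """
    occupied = set(map_cells)
    gx, gy = gps_guess
    best_pos, best_score = gps_guess, -1
    # Restrict the search to the window implied by the coarse GPS estimate.
    for dx, dy in product(range(-search_radius, search_radius + 1), repeat=2):
        cx, cy = gx + dx, gy + dy
        # Score: how many scan points land on an occupied map cell.
        score = sum((cx + ox, cy + oy) in occupied for ox, oy in scan_offsets)
        if score > best_score:
            best_pos, best_score = (cx, cy), score
    return best_pos, best_score


def dead_reckon(position, displacements):
    """Advance the last matched position by IMU-derived (dx, dy) steps."""
    x, y = position
    for dx, dy in displacements:
        x, y = x + dx, y + dy
    return (x, y)


# Toy map: two short walls. The car is really at (2, 2); GPS says (3, 1).
walls = [(0, 5), (1, 5), (2, 5), (3, 5), (5, 0), (5, 1), (5, 2)]
scan = [(wx - 2, wy - 2) for wx, wy in walls]
position, score = match_scan_to_map(walls, scan, gps_guess=(3, 1), search_radius=2)
```

A real system matches 3D point clouds with sub-cell accuracy and fuses the result probabilistically; the sketch only shows why a coarse prior makes the search tractable.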

Dynamic objects are typically found through a combination of Lidar, radar and cameras, of which there are a lot looking around the vehicle. Once you have localized and detected objects around the vehicle, you have to predict their motion, which is the prediction piece. That is often done simultaneously; you can imagine doing perception and prediction at the same time, as one block. For each object or agent, you predict what it is going to do over the next few seconds. The simplest way of doing this would be a ballistic rollout – take the velocity and project it out – which is not sufficient. Vehicles move differently, depending on the street configuration. If the road curves, they will continue to curve; if there are other agents they need to avoid, they will avoid them.
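The ballistic rollout mentioned above is simple enough to write down. The following is a minimal sketch, assuming 2D positions in meters and a constant velocity in meters per second; the function name and default horizon are hypothetical. As noted in the answer, this baseline ignores road curvature and interactions with other agents, which is exactly why it is not sufficient on its own.

```python
def ballistic_rollout(position, velocity, horizon_s=3.0, dt=0.5):
    """Predict future positions by projecting the current velocity forward.

    position: (x, y) in meters; velocity: (vx, vy) in meters per second.
    Returns one predicted (x, y) per time step up to horizon_s seconds.
    """
    steps = int(round(horizon_s / dt))
    x, y = position
    vx, vy = velocity
    # Constant-velocity assumption: no turning, braking or avoidance.
    return [(x + vx * dt * k, y + vy * dt * k) for k in range(1, steps + 1)]


# A car at the origin moving at 10 m/s along x, predicted 2 s ahead.
future = ballistic_rollout((0.0, 0.0), (10.0, 0.0), horizon_s=2.0, dt=1.0)
# future == [(10.0, 0.0), (20.0, 0.0)]
```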

Your prediction can easily get pretty sophisticated. Once you have all the agents and their predicted behavior, you plot the right course around these agents, which takes you closer to your goal. To get to a destination, you have the map, all these agents and the rules of the road, and you do what is called the planning piece, which is to find the safest path around the agents. You then execute the plan through controls, turning the plan into a set of actions the car will execute, meaning turn left, turn right, apply the brakes, etc. This whole cycle is repeated very often, typically at 10 Hz to 20 Hz, which is why you need a lot of computing power in the vehicle.
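To make the planning step and the 10 Hz to 20 Hz figure concrete, here is a deliberately simplified sketch. It assumes the planner is handed a few candidate paths and the predicted positions of each agent at matching time steps, and it picks the candidate with the largest worst-case clearance. The names and the clearance criterion are assumptions for illustration; a real planner also weighs progress toward the goal, the rules of the road and rider comfort.

```python
import math


def cycle_budget_ms(rate_hz):
    """Time available for one full localize-perceive-predict-plan-control cycle."""
    return 1000.0 / rate_hz


def min_clearance(path, predicted_agents):
    """Smallest distance between the path and any agent at the same time step.

    Each agent path must have at least as many steps as the candidate path.
    """
    return min(
        (math.dist(p, agent_path[k])
         for agent_path in predicted_agents
         for k, p in enumerate(path)),
        default=math.inf,
    )


def pick_safest_path(candidate_paths, predicted_agents):
    """Choose the candidate whose closest approach to any agent is largest."""
    return max(candidate_paths, key=lambda path: min_clearance(path, predicted_agents))


# A stationary obstacle sits at (0, 3); going straight would hit it.
straight = [(0, 1), (0, 2), (0, 3)]
swerve = [(1, 1), (2, 2), (2, 3)]
obstacle = [(0, 3), (0, 3), (0, 3)]
chosen = pick_safest_path([straight, swerve], [obstacle])
```

At 10 Hz the whole cycle has a budget of 100 ms, and at 20 Hz only 50 ms, which gives a sense of why the compute requirements are so high.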

We talked earlier about the sensors in the vehicle. You need a ton of compute, and doing all of this is expensive. In an R&D environment when you are building your stack, you typically have a very beefy multi-core Xeon-type data center machine, with four to eight high-end GPUs primarily used by perception.

Is the differentiating piece of the stack that proprietary data set that defines who has the best solution?

Yes, data matters a lot. You need it for testing, which is specific to the ODD, the operational design domain: the environment the vehicle is currently designed to operate in. This is typically a geo-fenced area with specific routes and weather conditions. When you build your stack from the ground up, only good weather and daytime operation are considered, where it is easier. Typically, data is collected from your ODD and performance is measured only on specified routes.

The larger your ODD, the more data and testing you need. That is why Waymo have hundreds of vehicles and are testing in many cities. They need to be exposed to a large range of situations. The more advanced your capabilities, the more you will not see anything interesting for many miles, because you are able to deal with all these typical situations very well. In order to see long-tail situations, things you see very rarely, and to test the vehicles against those situations, you need to drive even more. That is a conundrum in this business: tons of data is required to safely bring an AV to market.

It is not simply the volume of miles, but the quality of those miles driven?

Exactly, if you only drive highway miles, it will not help you much. Typically, not very much happens on most highway miles. You need to somehow get exposed to new things. In the realm of perception for example, there are particular agents seldom seen, such as police cars, ambulances or fire trucks. To test your stack against those agents, it is a good idea to drive around police stations, hospitals or a fire station to increase your likelihood of seeing them and, therefore, be able to train on them and improve your stack.

How would you compare the different approaches companies in this space are taking to building out that stack, starting with Tesla, as Elon Musk seems to be the most vocal about their radar approach and use of neural networks?

Tesla is unique as they have a way of collecting data from their entire fleet of vehicles, which is a powerful asset. Other companies can also do that but Tesla is uniquely positioned; however, they are limited in what they collect. The vehicles have cameras and some radar, but don't have any Lidar, which limits their progression rate. In terms of compute, they have been advanced in building silicon quickly for their in-car machine learning.

They recently announced a new generation which will be even more powerful. This is important because, in order to have a commercial vehicle, it is impossible to use 5 kW of power for your AV stack. Not only would it dramatically reduce the range of your EV, it would create excessive heat and be prohibitively expensive. One has to move to silicon, which Tesla are very good at. Their challenge is the lack of Lidar, which is critical for training on fewer miles and building a stack which will perform safely, even without seeing those long-tail events. Tesla built their perception stack by training over many miles using only camera data.

When you encounter something completely new, there is no precedent. There are new examples every month, but a recent one was in Taiwan, where a Tesla in autopilot mode on the highway suddenly encountered an overturned truck. Because Tesla had never trained their perception stack on overturned trucks, it could not react appropriately. This is a conundrum where you think something may be off but you cannot always hit the brake in these cases, because then comfort and safety are compromised. If the brake is applied unexpectedly, you may be rear-ended.

There is a very interesting video of this on the internet, where you see a Tesla drive straight into the overturned truck. At the last second, maybe it applies the brakes. Why did this happen? As it relies purely on vision and had never encountered this object before, it did not know how to react. A Lidar-first approach, by contrast, has a chance of getting over that hurdle both more quickly and more safely. With Lidar you build a 3D model of the environment first and foremost. If you notice something big in your path, you apply the brake. This allows you to train on far fewer miles than the billions Elon Musk quotes, while still being safe.

Do they collect only images or do they also use video for those miles?

I honestly do not know, but I suspect they collect snippets of video. They cannot upload everything, as it would be too much data. My sense is they have developed a smart way of deleting the boring miles and uploading only the novel and interesting events that offer potential training value. I suspect only a small fraction of the data collected in each vehicle is uploaded back to base for training.

Will certain events trigger that sending, such as sudden braking or swerving?

That is exactly right. They use other sensors and events from perception logs, like hard braking, for which a whole logic has to be built.
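As a purely illustrative sketch of what such trigger logic might look like, the snippet below flags hard-braking moments in a speed log. Nothing here reflects how Tesla actually does it, which the interviewee says is not public; the function name, the log format and the -4.0 m/s² threshold are all assumptions.

```python
def hard_braking_events(speed_log, threshold_mps2=-4.0):
    """Return (timestamp, deceleration) pairs where braking exceeds a threshold.

    speed_log: list of (timestamp_s, speed_mps) samples in time order.
    threshold_mps2: hypothetical trigger level; more negative means harder braking.
    """
    events = []
    for (t0, v0), (t1, v1) in zip(speed_log, speed_log[1:]):
        if t1 <= t0:
            continue  # Skip duplicate or out-of-order samples.
        accel = (v1 - v0) / (t1 - t0)
        if accel <= threshold_mps2:
            events.append((t1, accel))
    return events


log = [(0.0, 20.0), (1.0, 19.0), (2.0, 13.0), (3.0, 13.0)]
flagged = hard_braking_events(log)
# Only the drop from 19 to 13 m/s within one second crosses the threshold.
```

A snippet of sensor data around each flagged timestamp would then be a candidate for upload, while the uneventful miles are discarded.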

Are they limited to what can be sent over the air?

I am sure they are, but they do not do it in real time. Again, this is all speculation as I do not know how they do it because they do not share that. My suspicion is that when a vehicle returns home and is being charged, that is when they upload interesting events. They must have a smart upload approach which operates when the vehicle is idle and also does not upload everything. Tesla are very interesting and unique in the industry. There are others who also collect data from cars, such as Mobileye, who are now owned by Intel. For a long time, they have built level 2 type safety systems for vehicles, and have also built silicon which is very efficient. In their case, this system is in many vehicles, and uploads only snippets of information. They observe 3D patterns which they use in aggregate to make 3D maps and keep them updated. It is also a very interesting approach which only works because the Mobileye system is installed in many cars.

Is there a situation where the technology and compute power of Tesla's neural network can bypass Lidar and go to the next level to create better maps? Can Lidar be avoided?

With enough training, you can bypass Lidar or use stereo to get depth information. Lidar is not 100% critical and there are probably ways around it, but Lidar is also becoming very inexpensive and less unsightly than it used to be. You no longer require a giant thing rotating on top of your car; there are now many companies making very small and high-performance solid-state Lidars that can be embedded around a vehicle. If I were Elon Musk, I would look at these very closely and, potentially, eventually put them in Teslas. Lidar simplifies the problem so it is a good idea to use it. As far as I know, 99% of companies use it; only Tesla do not.

Waymo have hundreds of vehicles with Lidar on the top, in areas like Phoenix, where they are specifically collecting localized data?

Lidar is not only for data collection; it is also a critical sensor for perception. It is used both in real time and offline for the map. I believe they have more than one – they have five Lidars – long-range and short-range ones. Some are behind the mirrors, one in front, one in the back; they may even have six, it is a large number of them.

Is the perception layer top of the stack almost emulating the behavior of a human to drive the vehicle in a certain way, meaning the vehicle requires as much quality data?

Five or 10 years ago, when the field of autonomous driving was quickly ramping up, the AV stack was often associated with perception. Drawing 3D bounding boxes around objects was what many people thought the hard part was. Since then, tremendous progress has been made thanks to deep learning and advanced sensors. Perception is no longer the thing to solve. There is a path to having something that is good enough and safe enough.

The biggest challenge is planning. How do you plan around the environment where it is always safe and provably safe? Most teams that started in this field were roboticists and applied robotic type approaches to the problem, essentially trying to come up with rules and tweaking those rules based on a large number of parameters. That approach is currently limited. There are so many different scenarios to deal with. In order to be good at it, there needs to be tens or even hundreds of different parameters involved.

For example, even when passing a cyclist in a bike lane on the right, you have to take into account speed, the lanes around you, other approaching vehicles or an upcoming traffic light. This easily adds dozens of different parameters you need to tweak, which makes for a challenging problem. Many engineers are required for each maneuver. One thing I personally believe in, and an approach we have focused on at Lyft Level 5, is to build more of a machine-learned approach, a system which will learn from more data. The kind of data Tesla collects from their vehicles, we will collect from the Lyft fleet, by adding sensors to some of the cars at large scale.

What is the challenge in planning for companies like Tesla, Lyft or Waymo?

The first challenge is to be able to handle the long-tail situations you encounter. Most cars spend their time lane following; they are in a lane and typically follow the car in front. That is the first thing you work on and quickly solve. The next thing they do is turn right at an intersection. You keep going down the list of things you need to be able to do and you reach the end of the spectrum, where you have to deal with a deer jumping from the bushes in front of your car. That is the challenge. How do you deal with all these situations and prove that you are safe enough?

What is the bar? Is the bar to be as safe as a human? Probably not; you want to be even safer. How safe and how do you prove it? There is also a challenge of demonstrating that you are safe in the presence of some other failure. It is very possible that a camera is suddenly obstructed by bird poop, or you lose a piece of your computer due to overheating; a number of things could happen. You have a flat tire, what do you do then? Are you able to do a safe maneuver and can you prove that? That is often done through safety decomposition of the whole problem and understanding the path. That can be proven but it is hard and requires much work. You have to prove that the right things happen in the presence of a hardware failure.

You also have to statistically prove that you are as safe as, or X times safer than, a human in the ODD you are operating in. That requires many miles, which is the biggest challenge. Finally, you also have to communicate this to both regulators and the public, which is the challenge today for AVs.
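A back-of-the-envelope calculation shows why the statistical proof requires so many miles. The sketch below assumes that incidents occur independently at a constant rate per mile (a Poisson model) and asks how many incident-free miles are needed before you can claim, at a given confidence, that your rate is below a target. The function name is hypothetical, and the benchmark of one incident per 100 million miles is an illustrative placeholder, not a figure from the interview.

```python
import math


def failure_free_miles_needed(target_rate_per_mile, confidence=0.95):
    """Miles to drive with zero incidents to bound the incident rate.

    Under a Poisson model, P(zero incidents in n miles) = exp(-rate * n).
    Requiring this to be at most (1 - confidence) at the target rate gives
    n >= -ln(1 - confidence) / target_rate_per_mile.
    """
    return -math.log(1.0 - confidence) / target_rate_per_mile


# Illustrative benchmark only: one incident per 100 million miles.
miles = failure_free_miles_needed(1e-8, confidence=0.95)
# Roughly 300 million incident-free miles for 95% confidence.
```

Showing that you are several times safer than the benchmark scales the requirement up by the same factor, and any incident along the way increases it further.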

Can we look at the different potential business models in AV? Waymo is partnering with Magna to build the fleet of Jaguars and Chryslers with Lidar on; would ride-hailers like Lyft and Tesla look at a different business model?

Broadly speaking, there are two or three. Some companies are building the SDS, the self-driving system, and their model is to sell this to the largest number of OEMs to integrate. Aurora, a start-up in Palo Alto, California, is doing this. The model is simple: they become a tier 1 provider to OEMs. Companies like Zoox, recently acquired by Amazon, and Cruise are in the robo-taxi business. They want to build a service and do not need to integrate with many vehicles. You only need one vehicle type to launch a service.

Their business model is to simply provide a service, as Lyft or Uber does, but powered by self-driving vehicles. Some are more hybrid, like Waymo, who is partnering with companies like JLR and others, and might want to license the Waymo driver to them. They are also interested in building a Waymo ride-hailing service, which is a robo-taxi service. In the midst of all this are ride-hailing companies like Lyft and Uber, who are building this themselves. Their business model is to complement drivers with AVs. In the case of Lyft, these AVs will be either our own AVs built by the Level 5 team, or those of a third party that we on-board. There are currently pilots on this, both with Waymo and Aptiv.

Is Tesla similar to the ride-hailers?

That is very interesting; Tesla are already selling you full AVs. Two thirds of Tesla buyers opt for the full self-driving package, even though it does not exist yet. Eventually it will, one would hope, and Elon's vision is that you can monetize your Tesla when you are not using it by putting it on a network where it operates as a robo-taxi. Most AVs will, initially, be operated as part of fleets, as they will still be prohibitively expensive.

All the equipment required – the collection of sensors, redundancy and compute – easily costs tens of thousands of dollars; probably more initially. That is not something a consumer would feel like spending to drive autonomously on occasion. That already seems too small a market. If you can make that super cheap, as Elon Musk is hoping to, then it becomes more attractive. Until that happens, only robo-taxi providers and ride-share companies will use AVs and get value from them, as only they can keep them on the road long enough. They can quickly amortize the price over many miles.
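The amortization argument can be made concrete with simple arithmetic. In this sketch every number is a made-up placeholder, chosen only to show the shape of the comparison: the same hardware cost is spread over the miles driven by an occasional consumer versus a fleet vehicle that is on the road most of the day.

```python
def amortized_cost_per_mile(hardware_cost, miles_per_year, years):
    """Spread a one-off hardware cost evenly over all miles driven."""
    return hardware_cost / (miles_per_year * years)


# Hypothetical figures: $50,000 of AV equipment over a five-year life.
consumer = amortized_cost_per_mile(50_000, miles_per_year=10_000, years=5)
fleet = amortized_cost_per_mile(50_000, miles_per_year=60_000, years=5)
# consumer is $1.00 per mile; fleet is about $0.17 per mile.
```

Under these assumed numbers the fleet vehicle carries one sixth of the per-mile hardware burden, which is the sense in which only high-utilization operators can absorb the early cost.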

Do you think businesses that already have direct relationships with customers and end users have an advantage?

Do you mean Tesla?

Tesla, Lyft and Uber have relationships with customers.

Yes, Lyft and Uber obviously have an advantage because the service exists. There are millions of users using these services daily, so AVs can naturally be deployed. The go-to-market approach is a very obvious advantage. Others, like Cruise and Waymo, will have to build up their own service. The service will likely be limited at first and only operate in some cities, perhaps not all the time, and not every desirable pickup and drop-off location will be supported. It is going to be a challenge to build a great product from day one with only self-driving cars.

Not just in terms of technical challenges but also in terms of building the brand, the service and the customer base. We believe people will not want to use five different apps when they want to go from A to B. They will probably end up with one or two. The question becomes, can these robo-taxi providers eventually become able to operate everywhere so that you only use them? It will take a while.

That is the advantage, if I already have Uber or Lyft on my phone, I am not going to download the Waymo app to hail an AV once a month when I do a specific journey. It is very hard for them to get to scale and put it into service.

We should not underestimate them; they will find ways to give you coupons and make it cheaper; they will find a way to attract you. They will get distribution from Google Maps or other places where they can get discovered. There will be ways for them to build the brand, but yes, over time, it will require them to be in many cities. Another way they can be discovered is, if you are using Google Maps for example and Google knows your route, you want to go from where you are to some destination, they can see if that route is supported by Waymo and if there is one nearby, propose it to you at that time. There are ways that they can build their brand over time.

There is definitely a play there as everyone uses their products. The question is, what scale do they need to reach to make it considering the cost of those rides in the early days?

Exactly, they are going to be super expensive. That is why we believe the right model to have, at least in the early days, is a combination of drivers and self-driving cars. First of all, AVs are very expensive, and the demand in this business is very spiky. You will find there is a peak of demand during the morning and evening commutes and when people go out for dinner. Every day is like that. Between the peak demand and the middle of the day, which is low demand, there could be a five-fold or 10-fold difference. If you are building a service, you want to be available when people need it. You need to be provisioned for peak demand, which means your cars will be idle doing nothing most of the time.

What do you do with them? Maybe they can deliver packages or something else, it is possible, but mostly they do nothing. If your cars are very expensive, your business model cannot work. We believe that when your cars are expensive, you only have a small number of them serving your base demand all day, complemented by drivers, for either peak demand times or rides the AVs are not able to do yet. The combination is very synergistic. The business is also growing, despite recent hiccups and headwinds with COVID; there is a secular trend of people moving away from owning a car and towards ride-share, using a car as a service. As this market keeps growing, even though you deploy AVs on your service, you are still going to need more drivers as the demand grows. We will see this play out over the next 10 to 20 years.
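The effect of spiky demand on fleet utilization can be sketched with a toy model. It assumes, purely for illustration, that each vehicle serves one ride per hour and that demand follows a made-up daily profile with a five-fold gap between peak and trough. The function name and all numbers are hypothetical.

```python
def split_demand(hourly_demand, av_fleet_size):
    """Serve demand with AVs first, then hand the overflow to drivers.

    Returns (rides served by AVs per hour, rides served by drivers per hour,
    average AV utilization), assuming one ride per vehicle per hour.
    """
    av_rides = [min(d, av_fleet_size) for d in hourly_demand]
    driver_rides = [d - a for d, a in zip(hourly_demand, av_rides)]
    utilization = sum(av_rides) / (av_fleet_size * len(hourly_demand))
    return av_rides, driver_rides, utilization


# Hypothetical profile: base demand of 20 rides/hour with two peaks of 100.
demand = [20, 100, 20, 20, 100, 20]

# AV-only service provisioned for peak: most cars sit idle most of the time.
_, _, peak_utilization = split_demand(demand, av_fleet_size=100)

# Hybrid service: AVs cover base demand, drivers absorb the peaks.
_, overflow, base_utilization = split_demand(demand, av_fleet_size=20)
```

In this toy profile the peak-provisioned fleet is busy under half of the time, while the small base-demand fleet is fully used and drivers pick up the 80 extra rides in each peak hour.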

Waymo partnered with Magna who then partnered with JLR and Chrysler; how do you see the role of OEMs evolving?

Over the past few years, car sales have not been doing great worldwide, with the exception of Tesla and a few others. The reason for this is that there has been a move to ride-share and fewer people owning cars and that trend will accelerate. OEMs are under pressure and need to reinvent themselves. They do not have the DNA or the people required to build the deep machine learning and self-driving engine they need. That is why you have seen lots of partnerships between tech companies focused on AV and OEMs.

These tech companies typically have a software DNA but do not know how to build cars. There is clearly a need to partner. You have named a few, but there are many more we could name if we did a web search. If you agree with the thesis that, in future, people will stop owning cars and rely more on transportation-as-a-service providers, like Lyft, Uber or others, then I believe that the brand power will migrate to these networks. This would mean that the role of the OEM will become more like a tier 1 to these services that rely on their vehicles. But building cars is very hard, it is regulated and requires decades of experience, so OEMs are not going away; they are critical to this ecosystem.

They are being pressured to become pure assemblers, but without the brand, because that will shift to those who have the relationship with the customer, like Uber and Lyft.

That is a logical scenario, but one that will play out over a very long time.

The tier ones are also in a tricky position. Everyone will get squeezed if the value is shifting to the planning and perception layer, which is software or technology, not owned by tier ones?

Some tier ones are also trying to build self-driving systems. Magna was one but they may have scaled back their efforts recently. Another one is Aptiv; they are a tier 1 and they are building a full SDS and testing it on Lyft.

What is your opinion on Aptiv, given that they have level 2 and level 3 systems? Is it a different ball game moving from level 2 and 3 to level 5?

I do not know Aptiv specifically, so I cannot comment on them. You are right that some players, typically those coming from the auto industry, whether they are OEMs or tier ones, will start their journey in autonomy by doing level 1, level 2 and level 3. The biggest hurdle is going from level 3 to level 4. They have to somehow fund their way to level 4 through products they can sell now, but the jump between level 3 and level 4 is very substantial. You will find that many pure AV players coming from a tech DNA or Silicon Valley go straight to level 4 and say the rest is a distraction. The crux of the problem is to build level 4 capabilities and they go straight to that.

One day we will know who is right. AV companies are usually focused on the raw autonomy capabilities. They are trying to demonstrate some level of performance of the autonomy system, without worrying too much about the underlying hardware stack in the vehicle. They take the view that it is something that can be solved later. OEMs and tier ones will put a lot more focus, early on, on the underlying foundation and getting that right: the right hardware stack, a real-time OS and an automotive-grade middleware layer that they can build on. It is a different approach, but we need both and over time we will see partnerships that build on the strengths of each. That is why we have seen all these partnerships between OEMs and tech companies.

How do you see the automotive value chain evolving?

If you agree with the thesis that, in future, more customers or users will ditch their cars and instead use a service for most of their needs, then you will probably see vehicles being designed for these services. That means they will be more autonomous, but will also last longer. They will be designed for very high duty cycles and, therefore, they need to be more robust. We will likely see fewer vehicle sales, because we will not need as many when they are shared in these networks. They will be much more robust and able to go half a million miles or more.

The other obvious trend is that they will be electric. That is happening now, so fast forward 10 years from today and you will see fewer cars worldwide. They will be electric, they will last a lot longer and they will be autonomous.

What is the time frame that you are realistically looking at for AVs?

That is a great question and all of us in the industry have lost some credibility here. Some extremely ambitious and optimistic predictions were made some time ago about deployment timelines. We could say it is going to be less than 50 years, but we are not there yet. The big players, such as Cruise and Waymo, have all announced deployments of fully autonomous services and then repeatedly pushed them out; none is yet deployed in any meaningful way.

Some of them are also moving into areas other than carrying people. Waymo is now going into trucking. I wonder why? One can speculate this is because they are still not ready to deploy a passenger AV service and are, therefore, trying to get value in an environment where the tech is less complex. Over the next couple of years, we will see some really small-scale deployments, which are going to be in simple ODDs (operational design domains).

We will see the Cruises and Waymos of the world scale back their ambition and try to deploy somewhere small. At the same time, there are other players that started small and might want to expand. Voyage, a California start-up, has been focused on retirement and gated communities, environments that are more controlled, where the speed limit is 20 miles an hour. It is a small market, but you can use that as a way to build your service and grow from there. We will see some small-scale success from both ends of this market: from the smaller players focused on very specific ODDs and from the bigger ones becoming more focused.

There are some in the industry who speculate that we need a breakthrough in AI to be able to deploy AVs at scale. It is true that our systems right now, which are built to imitate the human, cannot extrapolate to any situation thrown at them. They can only deal with what they have seen before and been trained on. A human is able to extrapolate; in any situation they are presented with, they can somehow react safely.

A jump in machine learning is required?

That jump hasn't happened yet, but there are many AI luminaries currently working on this extrapolation piece, which we may need in order to have safe AVs at scale.

Surely Google is in the best place to make this jump given their ML and AI?

They are certainly uniquely positioned, but who knows? I am not there anymore, so I cannot speak to what they are doing. One big factor which we have not discussed is that building an AV takes a lot of money. Why? In part, because you need a lot of expensive engineers, and regardless of how much you rely on simulations, you still need cars on the road to test. The more advanced you are, the more cars you need.

Burning cash.

Tens to hundreds of cars, with safety drivers, which are expensive to operate. You also need to collect tons of data for machine learning and, therefore, your cloud costs are significant, and they only grow because your data and models increase over time. It all adds up, so it is an expensive proposition, but some players have practically unlimited money. Apple is working on this too; they are not making much noise but they are definitely investing. Google and the biggest players have raised billions and are able to invest in this for the long run.

Do we need 5G because these cars are effectively moving data centers?

You should not build an AV assuming it will be permanently 5G connected because that may not happen and it will be unsafe. You have to assume the vehicle can operate on its own in the environment without connectivity. These vehicles collect a ton of data, some of which will need to return to base to be analyzed, to help refine the training and keep the map current. An AV typically collects multiple terabytes of data daily, so to upload all this you would want 5G.

But even with 5G, you would need to go back to base, to some sort of AV hub, to get that data uploaded over wire or even Wi-Fi. Admin becomes a big piece, which 5G could absolutely help with, but it is not critical.

Looking at Tesla's plans for full self-driving in the next couple of years, is it unlikely we will see mass Teslas driving everyone autonomously?

My best guess is that we will not see a ton of Teslas driving autonomously for many years, but Elon has succeeded against all odds and he has built an amazing company. I own a Tesla and it is a great car, so I will not rule him out.

Do you drive it on autopilot and autonomous?

No, we bought it five years ago, just before that was available. Maybe I should upgrade at some point.

Thank you very much William.


Luc Vincent

Current EVP, Autonomous Driving at Lyft and Former VP Street View at Google
