Current EVP, Autonomous Driving at Lyft and Former VP Street View at Google
Luc Vincent is the current Executive VP of Autonomous Driving at Lyft, responsible for building a Level 5 system to bring an autonomous ride-hailing service to market. Luc previously spent over 12 years at Google, where he started the Google Street View product, which forms the foundation of the current Maps product. Luc spearheaded the journey from collecting images to turning pixels into knowledge to map the world. He previously worked on image processing at Xerox and has a PhD in computer vision.
Tell us about your role and responsibilities at Lyft today, as well as your previous role at Google.
Today at Lyft I lead a team called Lyft Level 5, which is focused on building self-driving cars for the specific needs of ride-sharing. Three and a half years ago, I founded this group and have been leading it ever since. Prior to that, I was at Google for over 12 years. I came to Google to work on Google Books; I was intrigued by Google's ambition to scan all the world's books. At the time, engineers were encouraged to have side projects, known as 20% projects. I got interested in one that was a collaboration between Stanford and Google, headed by Larry Page himself.
Larry Page is one of the co-founders of Google. The idea was to collect imagery at street level from vehicles driving around. His vision was, essentially, to bring not only the web to people, but the world. He wanted to collect data at street level and make it useful in some way. It was still a bit fuzzy, but aligned with Google's overall mission. Since Google was growing very fast, nobody at the company at the time had spent any time on this project, so I took it on as my 20% project. I got seven interns interested, and they helped me out over that first summer.
We put together an end-to-end demo of what something like this could look like. We hacked together a car with a bunch of cameras, Lidar and other hardware, and then got help from the Google security team to drive it around collecting data. We established an early, though clunky, way of getting data from the cars to a hard disk plugged in under somebody's desk. After uploading the data, we put together an end-to-end computer vision pipeline to make it useful. A couple of weeks later, I gave a tech talk, followed by a review from the VP of engineering, who thought it was interesting and gave us the go-ahead to make it real and hire people.
We hired engineers to work on it, and launched in five cities about a year later. Why five? From day one, we wanted to show that the project was not just a California experiment, but an ambition to grow beyond a single city, so we chose five cities spread across the US. We only had a small amount of data for each city, but it was still useful. From there, we wanted to see what happened. It was a brand-new product space; nobody had launched street-level imagery before in the context of maps. We received record traffic and press interest and, from then on, we knew there was demand and that it was going to work.
We grew this project from an early-stage experiment to something scalable. My focus over the next few years was that scale: making it real and more robust, and essentially rewriting all the software we had built, because everything was clunky. It had been all about moving fast, not about having something super reliable. Along the way, my career grew, with the team going from a handful of people to over 100, with global operations. After that I also expanded my scope to be involved in different kinds of imagery – captured from airplanes, satellites, even users' cell phones – with the mission to make sense of all this imagery in the context of maps; to derive data for maps. It was not only about presenting images to users through Street View or Google Earth, but about essentially going from raw pixels to knowledge and structured data.
Today, Google's and other companies' maps are built automatically by data mining imagery and extracting the corresponding street signs, street names and house numbers; all the information required to make a map. We pioneered this along the way and expanded the scope to derive knowledge from other kinds of imagery at Google.
After collecting data via Lidar or radar in the early days, you expanded to use many other sources?
The Street View car was primarily about imagery, but from the very beginning we felt there was going to be interesting information that we could not easily derive from the imagery, namely the 3D structure of the environment. We wanted to be able to give users a smooth transition between panoramas. Imagine your Street View images are like a bubble; navigating from bubble to bubble can be very jarring if there is no transition. Context is lost, so in order to create smooth transitions from one bubble to the next, the first bubble needs to be warped in a way that moves it toward the second one, which involves an understanding of 3D. The imagery is projected onto a coarse 3D model of the environment and the user is moved smoothly to the next bubble.
In theory, you can reconstruct 3D from the raw imagery itself through stereo, but it is complicated and does not always work. To aid us, we placed Lidars on these vehicles. In the early days of the project, our Lidar had a single scan line, but over time we moved to much more sophisticated puck-style Lidars that gave us more information. It was primarily about imagery, but Lidar helped.
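The warping step described above can be sketched in a few lines: given a pixel's viewing direction and a depth value from the coarse 3D model, you can lift it to a 3D point and re-derive its direction as seen from the next camera position. This is a minimal illustration under a simple Cartesian setup, not Street View's actual code; all names here are assumptions.

```python
import math

def reproject_direction(direction, depth, cam1_pos, cam2_pos):
    """Lift a viewing ray from the first panorama to a 3D point using
    its depth, then return the unit direction to that same point as
    seen from the second panorama's camera position."""
    # 3D point hit by the ray: first camera position + depth * unit direction
    point = tuple(c + depth * d for c, d in zip(cam1_pos, direction))
    # Unit vector from the second camera toward that point
    delta = tuple(p - c for p, c in zip(point, cam2_pos))
    norm = math.sqrt(sum(d * d for d in delta))
    return tuple(d / norm for d in delta)

# A point 10 m straight ahead of camera 1; camera 2 sits 2 m further along x.
d2 = reproject_direction((1.0, 0.0, 0.0), 10.0, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
```

Applied per pixel, depth from the coarse mesh lets the first bubble be warped so it appears to slide smoothly toward the second one.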
What are the differences in data quality between Lidar and radar?
That is completely different, and we can talk about it when we move into the realm of autonomous driving. Lidar gives very rich data; it is a point cloud that gives you distance at every point in a scene – that is, if you are talking about puck-style Lidars, which sweep rotating scan lines. Radar is very different; it is raw data and very hard to interpret. It gives you the distance and velocity of a dominant object in the scene. New generations of imaging radar, which are currently becoming commercial, give something that looks like a coarse image of objects, in addition to distance and velocity.
The two have different characteristics in terms of how they react to weather. The beauty of having Lidar, radar and imagery on an AV is that you have something that works well regardless of weather conditions and in all the situations you need to deal with, so they are complementary.
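To make the contrast concrete, here is a toy sketch of the two data shapes and how a fusion step might fall back from one to the other. The types, field names and fallback rule are hypothetical assumptions for illustration, not any production interface.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class LidarScan:
    # Dense geometry: many (x, y, z) samples across the whole scene
    points: List[Tuple[float, float, float]]

@dataclass
class RadarReturn:
    # Sparse: range and radial velocity of a dominant reflector
    range_m: float
    velocity_mps: float

def nearest_obstacle_range(lidar: LidarScan, radar: RadarReturn) -> float:
    """Toy fusion: use lidar's dense geometry when available, fall
    back to radar's range (e.g. in fog, where lidar degrades)."""
    if lidar.points:
        return min((x * x + y * y + z * z) ** 0.5 for x, y, z in lidar.points)
    return radar.range_m
```

The point is the complementarity: the dense but weather-sensitive modality and the sparse but robust one answer the same question through different data.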
Does Google Maps give Waymo an advantage in AV?
Some advantage, yes, but there is a difference between traditional maps like Google Maps, which are mainly for people, and AV or HD maps, which are designed for machines and robots. Humans do not require the level of detail an AV does; we have enough experience and intelligence to map coarse information onto the world we see around us and navigate. AVs are different, and need to do a lot in a very short amount of time.
They need to understand the world around them and what other agents – pedestrians, other cars and trucks – are doing, and plot a safe course of action. Doing all this in real time is very expensive. If, on top of this, you also need to understand the static environment – lane configuration, intersections, stop signs – you need to do even more work, and you are CPU limited. Today every AV company relies heavily on HD maps, which are a pre-recording of the static environment pre-loaded into the vehicle. These maps are a 3D model of the environment, plus the metadata that matters, such as the exact locations of stop signs, lane boundaries and traffic lights.
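To make the "3D model plus metadata" idea concrete, here is a hypothetical sketch of what one tile of such a map might hold. Since, as discussed below, there is no agreed-upon format, every name and field here is an illustrative assumption.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point3D = Tuple[float, float, float]

@dataclass
class StopSign:
    position: Point3D

@dataclass
class Lane:
    left_boundary: List[Point3D]
    right_boundary: List[Point3D]

@dataclass
class HDMapTile:
    """One tile of a hypothetical HD map: static geometry plus the
    metadata an AV queries instead of recomputing in real time."""
    mesh_vertices: List[Point3D]  # coarse 3D model of the static scene
    lanes: List[Lane] = field(default_factory=list)
    stop_signs: List[StopSign] = field(default_factory=list)

    def stop_signs_within(self, pos: Point3D, radius: float) -> List[StopSign]:
        # Cheap lookup the planner can afford to run every cycle
        def dist(a: Point3D, b: Point3D) -> float:
            return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        return [s for s in self.stop_signs if dist(s.position, pos) <= radius]
```

The design point is the CPU trade-off from the answer above: anything static that can be looked up from a pre-built tile is work the vehicle does not have to do in real time.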
Is all that metadata proprietary to each company?
At the moment, yes; however, there are companies trying to build a business in providing AV maps to the industry. The first was HERE, which has been trying to go into this business and provide HD maps. There are others like DeepMap, founded by former Google engineers, catering to the AV industry. There is, however, no agreed-upon format or level of detail – some companies operate online in real time rather than from a pre-built map – and people place boundaries differently. So at the moment, it is still the Wild West in HD mapping. In the future we will see more consolidation or standardized formats, but that has not happened yet. Going back to your question about Google: Waymo uses some of Google Maps for the underlying structure of the streets, but augments this with its own HD mapping efforts.
Can you lay out the AV stack, from the hardware up to perception and behavior?
Think of it as a layered set of components. At the lowest level is the vehicle you need to interface with, which typically requires deep integration with the vehicle's components. This is so you can programmatically control the vehicle: essentially tell it to apply the brakes, turn right, turn left, accelerate, indicate, or even control the doors. On top of the controls is a complicated piece. When you apply the brakes, the vehicle will do different things depending on whether it is on a slope, whether the road is slippery, and the weight of the vehicle. A whole bunch of other things happen to understand exactly how the vehicle reacts when you apply the brakes, so that riders get a safe and comfortable experience.
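A minimal sketch of those two layers, under stated assumptions: the command fields and the toy physics below are illustrative, not any real drive-by-wire protocol. The lowest layer is just a command structure, while the layer above it has to translate a desired deceleration into a brake value that accounts for vehicle mass and road slope.

```python
import math
from dataclasses import dataclass

@dataclass
class ControlCommand:
    # Hypothetical low-level drive-by-wire command the stack sends
    brake: float     # 0..1 normalized brake effort
    throttle: float  # 0..1
    steering: float  # radians, positive = left

def brake_for_decel(target_decel: float, mass_kg: float,
                    slope_rad: float, max_brake_force: float) -> float:
    """Toy longitudinal model: the brake value needed for a given
    deceleration depends on mass and slope (slope_rad > 0 means
    downhill, where gravity works against braking)."""
    g = 9.81
    needed_force = mass_kg * (target_decel + g * math.sin(slope_rad))
    # Clamp to the 0..1 range the interface accepts
    return max(0.0, min(1.0, needed_force / max_brake_force))

# The same 2 m/s^2 deceleration needs more brake downhill than on the flat.
flat = brake_for_decel(2.0, mass_kg=2000.0, slope_rad=0.0, max_brake_force=10000.0)
downhill = brake_for_decel(2.0, mass_kg=2000.0, slope_rad=0.1, max_brake_force=10000.0)
```

This is exactly the point made above: the same abstract command ("slow down at 2 m/s²") maps to different low-level actuation depending on the vehicle's state and environment.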