Road to full autonomy - the open challenges
There have been multiple demonstrations of autonomous mobility, yet nothing we see is near market or ready for deployment. The road to full autonomy has two key open challenges that need to be addressed: the first is robust, failsafe perception, and the second is the reliance on HD maps. Unless the industry can break through these two major barriers, we are likely to see the hype of expectations disappear into the trough of disillusionment.
The driverless car industry is progressing at a rapid pace, with large automotive manufacturers, software giants, and nimble start-ups all competing to develop breakthrough technologies, plant their flag on the global playing field, and secure a position in the expected multi-trillion dollar market. Prominent figures within the industry have often spoken about the benefits that a fully driverless world could bring: reduced congestion and pollution, safer and more efficient roads, mobility for the elderly and the young, more productive land usage with fewer parking spaces, and many more. The potential socio-economic benefits of driverless car technology could go beyond the wildest expectations and speculations of experts today. Imagine trying to predict, at the time of the Wright brothers' first manned flight, the impact that this new technology would have on the global aerospace and aviation industry in the years that followed.
There are four main factors underpinning the hyperactivity in this new technology arena:
- Fear of obsolescence among large auto OEMs, who can see the disruption potential of self-driving technology to their traditional business model: fewer people will want to own cars, as more people now live in dense urban city centres and increasingly use shared transport.
- Billion-dollar valuations for small technology start-ups
- Huge revenue growth and potential for ride-sharing companies that can transform the operating cost model by eliminating the human driver expense via self-driving 24x7 fleet operations, resulting in a 10x increase in asset utilisation rate
- The likely disruption of the auto OEM business model by global software giants via software creep and control, relegating the platform hardware to an undifferentiated commodity.
These are powerful factors, but they have not yet been enough to accelerate the arrival of driverless cars on roads today. In this race, the main goal of invention has seemingly been pushed down the order in favour of quickly capturing the market. A common outcome in such situations is that actors become more willing to solve the easy problems. In my view, the state of play in self-driving cars is not much different. Many large companies are trying to tackle this challenge simply via resource allocation: more funding, more engineers, and more cars. However, the Wright brothers didn't invent the world's first aeroplane because they had the largest engineering teams or the most money. Surprisingly, one of the greatest innovations in human history came from two brothers who did not have a college degree and were funded only by their bicycle repair shop.
There is much the self-driving car industry can learn from the story of the Wright brothers, and the first and most important lesson is to solve the real problem. The Wright brothers realised, in contrast to their contemporaries, that aircraft control, not propulsion, was the key to manned flight, and they dedicated their efforts to controlling aeroplanes. By solving the right problems, one after another, they took off at Kitty Hawk and changed history forever.
The question is: if autonomy is already working across so many demonstrations of technological capability, where are the promised autonomous cars, and when will they get here? In fact, we are in the long tail of technology capability development, where the most critical open challenges still need to be solved before market-ready solutions can be commercially deployed. In short, we need breakthroughs in two areas related to autonomous capability. In the words of Sacha Arnoud, Director of Engineering and Head of Perception at Waymo, "when you're 90% done, you have 90% to go, in other words you need 10X improvement".
Today's autonomy approaches are not scalable: they only work in specific pre-mapped, geo-fenced areas and only in particular environmental conditions (not too sunny, no rain, daytime only, etc.).
The open challenges
There are two open technology challenges constraining the capability of autonomous mobility today. One is 'Perception' and the other is 'Three-Dimensional (3D) / High-Definition (HD) mapping'. Perception is the conversion of raw data captured by sensors on board the autonomous vehicle into 'scene understanding'. As humans, when we look, we see the world around us and instantly understand where everything is. Even if we don't know where we are in the world, we can still see where the road or path is, whether it is blocked by something (that we may or may not have seen before), where the kerbs are, where the lane markings are, and whether we are at a junction. When a sensor (a camera or LIDAR, for example) acquires input data from the environment, it outputs raw numbers, bits and bytes representing depth or pixel values. The major challenge at this stage is how to derive, from this raw information, an understanding of the scene that can guide the manoeuvres of the vehicle (stop, slow down, accelerate, avoid, etc.). Perception is extremely hard, and yet it is the core of autonomous capability. One can easily imagine the difference between a sighted person and someone with a visual impairment when negotiating a complex, crowded route safely.
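To make the gap between raw numbers and scene understanding concrete, here is a deliberately naive sketch. It assumes a hypothetical list of forward range readings in metres and made-up thresholds; none of the names or values come from a real sensor API or from any production system. It picks a manoeuvre purely from the nearest reading, so it knows nothing about what the obstacle is, where the road is, or whether a reading is noise, which is exactly the understanding that perception has to supply.

```python
from typing import Sequence

# Hypothetical thresholds in metres, chosen only for illustration.
STOP_DISTANCE_M = 5.0
SLOW_DISTANCE_M = 20.0


def naive_manoeuvre(forward_ranges_m: Sequence[float]) -> str:
    """Pick a manoeuvre from raw forward range readings (metres).

    This is a distance threshold, not scene understanding: it cannot tell
    a pedestrian from a plastic bag, or a kerb from a lane marking.
    """
    if not forward_ranges_m:
        # No data at all: stopping is the conservative default.
        return "stop"
    nearest = min(forward_ranges_m)
    if nearest < STOP_DISTANCE_M:
        return "stop"
    if nearest < SLOW_DISTANCE_M:
        return "slow down"
    return "continue"


if __name__ == "__main__":
    print(naive_manoeuvre([42.0, 37.5, 18.2]))  # nearest reading is 18.2 m
    print(naive_manoeuvre([3.1, 40.0]))         # nearest reading is 3.1 m
```

A rule like this reacts identically to a wall, a cyclist and a cloud of exhaust at the same distance, which is why thresholds on raw values are no substitute for perception.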
Why machine learning alone is not enough
The current approaches to Perception in most, if not all, autonomous technology development programmes rely almost entirely on Deep Convolutional Neural Networks (CNNs). These are Machine Learning frameworks that allow cars to learn to recognise what things look like (roads, buildings, people, etc.). This approach is riddled with constraints and issues.
- Building a neural network requires millions of images or LIDAR point clouds annotated with what things are, e.g. a bounding box around a person
- This is a challenging and time-consuming task, so many approaches also use simulators to create simulated, annotated data
- Using this data, the network begins to learn what things look like. However, networks don't learn the same way people learn and remain limited by the data that is provided as input. Neural networks are highly sensitive to the data used to train them. A good example comes from Michael Houston, Senior Distinguished Engineer, Deep Learning, NVIDIA, who said: "...Like when we were doing our car detector, somebody gave us video from Europe and said how does your car detector work, well, we had never seen a Citroen, they don't exist here, so we didn't know what it was, right, it wasn't in our dataset... your neural network is only as good as your dataset." This leads directly to the next point.
- It is very difficult to build networks that generalise: neural networks are good at doing only one specific thing (e.g. finding one type of traffic sign)
- This means neural networks trained on data from one place won't work in other places if they look different
- So if cars, people, buildings, traffic lights and traffic signs look different from the data used to train the network, it won't work and will need to be retrained on new data
- Different networks will need to be trained to operate in each country, and even in each city, all over the world because everything is so different, and this retraining will never end because it must account for changes (new concept cars, different types of signs, new infrastructure)
Taken together, the points above show that neural networks alone do not solve the open challenge of perception.
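The dependence on training data can be illustrated with a toy example. The sketch below is not a neural network: it swaps in a nearest-centroid classifier over a single made-up feature, and every number in it is synthetic and chosen purely for illustration. It only shows the mechanism described above: a model fitted to how signs look in one region can fail completely when the same object looks different elsewhere.

```python
from statistics import mean
from typing import Dict, List


def fit_centroids(samples: Dict[str, List[float]]) -> Dict[str, float]:
    """Learn one centroid (mean feature value) per label."""
    return {label: mean(values) for label, values in samples.items()}


def predict(centroids: Dict[str, float], x: float) -> str:
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def accuracy(centroids: Dict[str, float], values: List[float], true_label: str) -> float:
    """Fraction of values classified as true_label."""
    hits = sum(predict(centroids, v) == true_label for v in values)
    return hits / len(values)


# Synthetic one-dimensional "appearance" feature; all values are made up.
TRAIN = {
    "sign": [0.88, 0.90, 0.92],        # how signs look in the training region
    "background": [0.18, 0.20, 0.22],
}
REGION_A_SIGNS = [0.87, 0.91, 0.93]    # same region: signs look like the training data
REGION_B_SIGNS = [0.35, 0.40, 0.45]    # new region: signs look different

if __name__ == "__main__":
    model = fit_centroids(TRAIN)
    print("Region A accuracy:", accuracy(model, REGION_A_SIGNS, "sign"))
    print("Region B accuracy:", accuracy(model, REGION_B_SIGNS, "sign"))
```

In this toy setup every Region A sign is recognised and every Region B sign is mistaken for background; the only remedy available to the model is new data from Region B and a refit.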
3D/HD Maps are not navigation maps
3D/HD maps are point-in-time, detailed three-dimensional road surveys, typically relying on laser data. This survey data is annotated to mark the position of lanes, roads, traffic lights, signs and other important road infrastructure. When an autonomous vehicle drives along a surveyed road, it matches live sensor data to the stored map data to estimate its position relative to the survey. It can then pinpoint the location of road features as stored in the survey, like memorising what the world looks like in 3D. As the environment changes, surveys and annotations need to be updated.
3D/HD maps are costly, sensitive to environmental changes, and do not scale. One hour of mapping input data requires 800 man-hours of annotation effort, and 1 mile of driving data requires 2 terabytes of storage. For mapping-based autonomy to scale, 3D/HD maps must be made by manually driving every road on the planet. The road networks of the top 50 countries in the world by road network size add up to 60 million km. A finer point about the scale of this task is that the data has to be collected in both driving directions and updated frequently to keep track of changes, which means petabytes of storage. This is a monumental challenge, and all the current approaches will remain constrained to specific ring-fenced, pre-mapped locations with limited route capability.
There have been some suggestions to crowdsource the collection of 3D/HD maps from the autonomous vehicles themselves as they drive, so that the maps stay updated when the pre-mapped information is no longer accurate or fresh. The reality is that map failures are abrupt and severe (loss of signal to the cloud database, a sudden change in environment, roadworks, a fallen tree, etc.), and updating small changes doesn't fix this. Only an autonomous car that does not rely on the 3D/HD map can navigate the abrupt change and update the data for the other cars blindly following the map.
It is ironic that mapping-based autonomy will forever require a fleet of human-operated cars to keep its maps updated.
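The scale implied by the figures quoted above can be worked through as back-of-the-envelope arithmetic. The sketch below uses only the numbers from the text (60 million km of road, 2 terabytes per mile, 800 man-hours of annotation per hour of data, both driving directions), plus one assumption that is not in the text: an average survey speed of 30 mph, needed to convert distance into hours of data. It also assumes decimal units (1 petabyte = 1,000 terabytes) and a single survey pass with no refreshes.

```python
KM_PER_MILE = 1.609344  # definition of the international mile

# Figures quoted in the text.
ROAD_NETWORK_KM = 60_000_000
STORAGE_TB_PER_MILE = 2.0
ANNOTATION_HOURS_PER_DATA_HOUR = 800.0

# Assumption for illustration only; the text gives no survey speed.
ASSUMED_SURVEY_SPEED_MPH = 30.0


def survey_miles(road_km: float, directions: int = 2) -> float:
    """Miles to drive to cover the network once in each direction."""
    return road_km / KM_PER_MILE * directions


def storage_petabytes(miles: float) -> float:
    """Raw storage for one survey pass, in decimal petabytes."""
    return miles * STORAGE_TB_PER_MILE / 1000.0


def annotation_hours(miles: float, speed_mph: float = ASSUMED_SURVEY_SPEED_MPH) -> float:
    """Man-hours of annotation for one pass at the assumed average speed."""
    data_hours = miles / speed_mph
    return data_hours * ANNOTATION_HOURS_PER_DATA_HOUR


if __name__ == "__main__":
    miles = survey_miles(ROAD_NETWORK_KM)
    print(f"Survey distance:   {miles:,.0f} miles")
    print(f"Storage, one pass: {storage_petabytes(miles):,.0f} PB")
    print(f"Annotation effort: {annotation_hours(miles):,.0f} man-hours")
```

Under these assumptions a single pass comes to roughly 75 million miles of driving, on the order of 149,000 petabytes of raw data, and about two billion man-hours of annotation. Every refresh of the map repeats some share of that cost, and a different assumed speed scales the annotation figure proportionally.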
Model-free Perception
Propelmee has made a breakthrough in the state of the art in Perception by building and testing a 'model-free' technology that converts raw sensor data into scene understanding without training a network on datasets. Our perception gives autonomous vehicles unparalleled scene understanding in diverse, complex environments. Our perception technology is scalable all over the world and detects every obstacle and road, anywhere, unlike anything developed before. This capability sits at the core of our perception technology stack and:
- Is not region specific: works in any city in any country
- Finds every conceivable obstacle and every road surface (dirt road to highway)
- Generalises across environments, even in places an autonomous vehicle has never driven before
- Does not need any re-training in new cities/regions: works out of the box
- Does not suffer from corner cases or "black swan" incidents: we once detected a zorb ball
Propelmee: Detection of surfaces