Artificial intelligence (AI) applications have become part of everyday life, and the combination of AI and computer vision in particular offers fascinating new possibilities. Whether in the private sphere with facial recognition on smartphones, or in industry with automated product inspection and damage detection for quality control and predictive maintenance: AI and computer vision already achieve high-quality results that improve the user experience, simplify processes, and reduce effort. In the following, we show how real-time object detection on sonar data can support mission-critical tasks.
EvoLogics is a bionics and robotics company that has been developing innovative industrial products for 20 years. Based in Berlin, EvoLogics maintains worldwide connections in bionics research. The company offers efficient systems for underwater communication as well as autonomous survey robots, so-called USVs (unmanned surface vessels). Its newly developed Sonobot 5 is among the fastest sonar-equipped USVs in the world. Institutions worldwide use it for survey and search operations in lakes, canals, and coastal waters. These search operations mainly focus on detecting objects hidden under water, such as scrap metal in shipping routes or unexploded ordnance like aerial bombs and underwater mines. In addition, police and coast guard units use the Sonobot to locate and recover drowning victims.
During such missions, sonar image data is transmitted to the operators in real time. Sonar images differ from camera images in their physical nature in numerous ways: they are influenced by water depth, the nature of the bottom, the reflectivity of the search objects, and other parameters. In the images computed from sonar data, underwater objects are therefore more difficult to detect. Inexperienced users may misinterpret or overlook important signals, and even sonar experts make mistakes. In addition, continuously and attentively observing the incoming images is challenging.
Specifically supporting users in the interpretation of sonar images can greatly increase the speed and probability of success of mission-critical tasks. Steadforce's AI team therefore created an end-to-end solution for training and running modern machine learning models, based on state-of-the-art algorithms, directly on a USV (on the edge). This real-time object detection solution visually highlights relevant details in the sonar image data.
The current success of AI systems rests on several factors: advances in hardware, theoretical and practical progress in computer science, and the availability of large amounts of data.
Object detection is a special use case of computer vision, the analysis of images and videos. Large quantities of image data are available today: annotated, labeled, presorted, and freely available for various domains (e.g. from the Open Images project). These datasets have dramatically accelerated and improved the development of object detection methods.
Convolutional neural networks were developed over 30 years ago and are the basic building block of many computer vision methods. Combined with the latest generation of network architectures, error rates of these methods have dropped sharply in recent years. In some domains they have now reached or even surpassed human performance.
The necessary computing power is typically provided by GPUs or, increasingly, by dedicated tensor processing units (TPUs), which execute the mathematical operations of neural networks in a highly parallelized and thus optimized manner. Cloud providers make this hardware available on demand and in a highly scalable way, providing computing power as well as other necessary technical infrastructure such as data storage and specialized, preconfigured AI frameworks.
Object detection combines two complementary tasks: object localization and image classification. On the one hand, it identifies whether and where objects are present in a digital image; on the other hand, it assigns to each such image section the object type it most likely shows.
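For illustration, a single detection result can be thought of as a record combining a bounding box (the localization) with a class label and confidence score (the classification). The following Python sketch, with hypothetical field names, makes this concrete:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One detected object: where it is (localization) and what it is (classification)."""
    x_min: float  # bounding box corners, in pixel coordinates
    y_min: float
    x_max: float
    y_max: float
    label: str    # most likely object class for this image section
    score: float  # model confidence in [0, 1]

# A detector typically returns a list of such records per image, e.g.
# [Detection(120, 40, 310, 190, "wreck", 0.87), ...]
```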
The best results are achieved by machine learning models based on deep neural networks, which extract and combine a large number of visual signals from an image. The widely used convolutional neural networks mentioned above use filter kernels for this purpose. Among other things, these kernels recognize different types of edges or color values, separate the foreground of the image from the background, and generally react to informative patterns in the image. Multiple tiers of these convolutional layers are connected hierarchically, which enables learning of both coarser elements, e.g. outlines, and finer details that serve as features for object localization and classification. Current network architectures extend these convolutions with other building blocks and optimize them for object detection.
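To illustrate this hierarchical stacking of convolutional layers, here is a minimal PyTorch sketch of a tiny feature extractor. It is illustrative only, not the Sonobot's production architecture:

```python
import torch
import torch.nn as nn

# Early layers respond to simple patterns such as edges; deeper layers
# combine them into coarser shapes and outlines.
backbone = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # 1 input channel (sonar intensity)
    nn.ReLU(),
    nn.MaxPool2d(2),                              # halve resolution, widen receptive field
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # deeper layers see larger image regions
    nn.ReLU(),
)

# The resulting feature maps would feed a detection head for
# localization and classification.
features = backbone(torch.randn(1, 1, 256, 256))
print(features.shape)  # torch.Size([1, 64, 64, 64])
```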
Sonar images differ from optical camera images in several respects. Instead of a three-dimensional color space such as RGB, they only show the one-dimensional intensity of a reflected signal. Moreover, the location of a reflected signal can only be determined indirectly via the signal's transit time. The fan-shaped propagation of the sonar signals, wave motion at the surface, sonar shadows, and other physical effects create distortions and artifacts in the displayed images. These artifacts can be partially corrected, but a machine learning model must be robust enough to compensate for the remaining spurious signals and still detect the objects of interest.
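For example, assuming a typical speed of sound in water of roughly 1,500 m/s, the distance to a reflector follows directly from the two-way transit time of the ping:

```python
SPEED_OF_SOUND_WATER = 1500.0  # m/s, typical value; varies with temperature and salinity

def slant_range(transit_time_s: float) -> float:
    """Distance to a reflector from the two-way transit time of the sonar ping."""
    # The signal travels to the object and back, hence the division by two.
    return SPEED_OF_SOUND_WATER * transit_time_s / 2.0

print(slant_range(0.04))  # a 0.04 s round trip corresponds to a 30 m slant range
```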
The project required us to enable Sonobot users to configure their own model for selected object classes and to contribute their own sonar imagery. The model must run on the Sonobot itself and in isolation, i.e. without a network connection to additional data storage or computing resources. At the same time, it must be able to process large sonar images (several million pixels) at several frames per second to provide live results. The detector component must integrate seamlessly into the Sonobot's existing command & control (CC) software through suitable interfaces, and it should operate with the best possible accuracy in terms of false alarms and missed objects.
To support the complete process, EvoLogics required three elements that build on each other:
The newly developed Sonobot user portal Argos serves as a user-friendly central platform for data and model management. We designed the entire system for multi-tenancy: each user's data and processes remain strictly separated throughout all operations. With built-in data separation and security, the system can also handle confidential data. Every user can easily upload their own data to the portal, for example from previously performed survey missions, to supplement the training of future models. The user can label objects on uploaded images (see Figure 3) and assign each object to a class. The labeled data is kept in cloud storage, allowing storage capacity to be adjusted easily and dynamically as users add new data.
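A labeled object could, for instance, be stored as a record like the one below. The field names are hypothetical and merely illustrate a common COCO-style bounding-box-plus-class layout, not the actual Argos schema:

```python
# Hypothetical annotation record; all names and values are illustrative.
annotation = {
    "image_id": "mission_042/sidescan_0013.png",
    "tenant_id": "user-7f3a",              # data stays separated per tenant
    "objects": [
        {"bbox": [412, 118, 96, 64],       # x, y, width, height in pixels
         "category": "aerial_bomb",
         "reviewed": False},               # set to True after expert review
    ],
}
```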
During labeling, it is important not to miss relevant objects, to draw object boundaries that fit accurately, and to apply classes consistently. Otherwise, impurities and inconsistencies in the training dataset negatively influence the model training; the smaller the dataset for the affected object class, the stronger this influence. Argos therefore offers a review process in which experts check and verify both the annotations users add to uploaded images and annotations generated automatically by existing models. In this way, Argos minimizes errors in further use. In addition to privately usable data, EvoLogics provides a curated library of sonar images that users can freely combine with their private images. Any user is free to contribute images to this library and make them available to the Sonobot community.
The backend's data management and control functions, e.g. for data validation and updating, establish the connection to the cloud infrastructure. They are themselves implemented serverlessly and are fully scalable as well.
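As a sketch, such a serverless validation function could look like the following AWS-Lambda-style handler; the cloud provider, event schema, and field names are assumptions for illustration:

```python
import json

# Assumed minimal schema for an uploaded annotation record (see above).
REQUIRED_FIELDS = {"image_id", "tenant_id", "objects"}

def handler(event, context):
    """Reject uploads whose annotation records are incomplete."""
    record = json.loads(event["body"])
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        return {"statusCode": 400,
                "body": json.dumps({"error": f"missing fields: {sorted(missing)}"})}
    # ... persist the validated record to cloud storage here ...
    return {"statusCode": 200, "body": json.dumps({"status": "accepted"})}
```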
The user portal also controls model training. From all available object classes, users select the ones relevant to their use case and request a model that recognizes and distinguishes between these classes. At this point, they can also define advanced parameters, allowing a trade-off between accuracy, sensitivity, and speed. The training builds on a selection of models that were pre-trained on large image and object sets and are then fine-tuned for the selected sonar object classes.
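The following sketch illustrates this kind of transfer learning, using torchvision's Faster R-CNN as a stand-in; the article does not name the actual model family, so these details are assumptions:

```python
import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# Example: background + three user-selected sonar object classes.
num_classes = 1 + 3  # e.g. "mine", "aerial_bomb", "scrap_metal"

# Start from a detector pre-trained on a large image set ...
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")

# ... and replace its classification head for the selected classes.
in_features = model.roi_heads.box_predictor.cls_score.in_features
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

# Fine-tuning then trains mainly the new head (and optionally the backbone)
# on the labeled sonar images selected by the user.
```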
The actual training takes place in the background on GPU-enabled cloud resources, which are available on demand and practically limitlessly scalable. This means several users can train different models simultaneously without waiting times. The requested cloud resources can be matched to the size of the model and the amount of training data. The training runs are again strictly separated on all levels to ensure maximum security and confidentiality.
Depending on the number of images and the selected parameters, creating a customized model can take anywhere from a few minutes to a few hours. The user portal keeps users up to date on the progress and success of the training. They receive a detailed evaluation report for each model, which uses a variety of metrics to indicate the expected performance in terms of falsely detected, missed, or misclassified objects and the confidence of the model. Each created model is cryptographically signed and can be deployed directly to the user's Sonobot with a simple update.
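As an illustration of such signing, the sketch below uses an Ed25519 signature from Python's cryptography package; the actual scheme and artifact format used for the Sonobot are not disclosed, so both are assumptions:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Illustrative key pair; in practice the private key stays in the cloud backend.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# "model.onnx" is a hypothetical artifact name for the trained model.
model_bytes = open("model.onnx", "rb").read()
signature = private_key.sign(model_bytes)

# The device would ship with the public key and verify each update before
# installing it; verify() raises InvalidSignature for tampered artifacts.
public_key.verify(signature, model_bytes)
```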
Deploying machine learning systems on the edge, in this case on a USV, presents challenges due to limited hardware power and missing or unreliable network connections to external components. We also have to ensure the reliability of the deployment process to keep an embedded system productive.
We test new models for hardware compatibility and function before installing them as updates. For each model, users can set class-specific alarm thresholds to enable optimal, scenario-specific detections. The model architecture was evaluated and selected to run on the available hardware components of edge devices such as the Sonobot, so it can evaluate the sonar data off-grid and completely independently in real time. The real-time object detection solution indicates detected objects graphically to the operator, marks the geocoordinates of their locations, and logs the data for later evaluation. Sonar operators thus receive mission-specifically configurable support for their work. The ability to detect sonar objects in real time also provides the basis for further development stages, such as automatically finding objects and avoiding obstacles, and thus carrying out missions completely autonomously in the future.
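Conceptually, applying class-specific alarm thresholds amounts to filtering the raw detections, as in this sketch (reusing the hypothetical Detection record from above; class names and threshold values are made up):

```python
# Per-class alarm thresholds; values are illustrative only.
ALARM_THRESHOLDS = {"mine": 0.30, "aerial_bomb": 0.40, "scrap_metal": 0.75}

def filter_alarms(detections):
    """Keep only detections that exceed the threshold set for their class."""
    return [d for d in detections
            if d.score >= ALARM_THRESHOLDS.get(d.label, 0.5)]  # 0.5 as default
```

A low threshold for critical classes raises sensitivity (fewer missed objects) at the cost of more false alarms; a high threshold does the reverse.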
Alternatively, users can play back the sonar data of missions already performed in replay mode and apply models with a different object focus. This also allows the use of models that prioritize accuracy over frame rate and whose greater hardware requirements prevent real-time evaluation.
Through the integration with the user portal, users can upload detected objects again, validate them, and use them as new training material for ongoing model improvement.
All development steps are integrated via a cross-platform compatible CI/CD pipeline, enabling rapid agile development, automated testing, and seamless delivery of all components.
Steadforce's AI experts realized this project in close collaboration with EvoLogics' engineers. EvoLogics handled the ongoing technical implementation in the Sonobot and control systems. The Steadforce team was responsible for the design and operationalization of the real-time object detection solution for sonar data on the edge device. The team also planned and implemented the control functions and ML pipelines in the cloud.
Practical pilot applications are already running in real operational scenarios, in close cooperation with a European police authority.