vak | 2023-08-22

At last, the thing I have been working on for almost four years has been released into the world.

This coming week, SiMa.ai ships the Production Release 1.0 of its Palette™ ML Developer tools, enabling companies designing with its purpose-built MLSoC device to release production software for their end products.

This is a seriously cool thing. Plenty of people are trying to build specialized accelerators for neural networks, but it all comes down to having an efficient general-purpose compiler. For most of them it just never comes together. For us, it did. Not every accelerator architecture is equally useful.

Here are a few quotes from the corporate blog. I'll highlight the important parts.

SiMa’s MLSoC silicon delivers a truly integrated edge ML solution that can receive and process real time video, radar, lidar and other sampled data to produce a comprehensive edge ML solution without connectivity to the cloud. The MLSoC’s huge compute capabilities to process data in real time is only unlocked when it can be easily programmed by a broad range of programmers.

All the AI magic happens right on board your device, built around the chip from SiMa.ai. You can use the cloud as well - we offer that service - but it is a non-critical, secondary option.

SiMa focused on the very beginning that one of the key challenges in getting edge ML devices to market is the difficulty programming the embedded ML silicon to deliver an edge ML product. Many edge ML products require a small army of embedded programmers to hand code the algorithms to achieve the performance to process real-time data streams, otherwise many of the benefits of edge ML are lost. ML programming has been usually focused on the cloud, where the programming was simplified in exchange for utilizing massive amounts of computing power, consuming significant power and cost to operate. This trade-off made it possible for many data scientists to develop initial cloud based ML solutions, but they have remained difficult to scale due to high cloud computing costs. SiMa.ai focused on an approach to simplify the programming edge ML with key principles of Any, 10x and pushbutton. These principles have helped democratize the ability of companies large and small to migrate their cloud based algorithms to the edge.

And it works great. It's no accident that we outdid a giant like NVIDIA on the benchmarks.

The Palette 1.0 software release provides updates to SiMa’s industry leading ML Compiler, which converts 32-bit floating point machine learning models into binary run-time code for execution on the 50 TOPS MLA core and ARM A65 contained within the MLSoC. The compiler update provides for enhanced quantization techniques when converting the fp32 model to an 8-bit integer representation. The developer has the option to parse the network layers to execute on different precision processing elements within the MLSoC device, ensuring that the model accuracy can be obtained in the design process. The ML Compiler update also features the ability to support large tensor models, while providing layer parsing and buffering to ensure that these large tensor models can utilize off-chip DRAM memory to support the large model file sizes needed in these configurations.

Neural networks are usually created (trained) as so-called models in 32-bit floating-point format. Our compiler eats that up and turns it into code for a compute engine with a clever, energy-efficient architecture (we named it MLA). Quantization is a pressing problem: converting the model to 8-bit integer arithmetic without a noticeable loss of accuracy. Our software solves that task too. The size of the images being processed is limited only by the amount of external memory.
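To make the quantization step concrete, here is a minimal sketch in plain Python. It assumes the textbook affine (scale plus zero point) scheme, not whatever the Palette compiler actually does internally; the function names `quantize_int8` and `dequantize` are made up for illustration.

```python
def quantize_int8(values):
    """Affine quantization of a list of floats to int8 (illustrative only)."""
    lo, hi = min(values), max(values)
    # Widen the range to include 0.0 so that zero is exactly representable.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    # One int8 step in float units; fall back to 1.0 for an all-zero input.
    scale = (hi - lo) / 255.0 or 1.0
    # Integer code that represents the float value 0.0.
    zero_point = round(-128 - lo / scale)
    # Round to the nearest step, shift by the zero point, clamp to int8.
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q, scale, zero_point):
    """Map int8 codes back to approximate float values."""
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, 0.0, 0.5, 2.0]
codes, scale, zero_point = quantize_int8(weights)
restored = dequantize(codes, scale, zero_point)
# Largest rounding error introduced by going through int8 and back.
worst_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The round trip loses about half a quantization step per value at most. The hard part in a real compiler is choosing the ranges per layer so that this error does not add up to a visible accuracy drop.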

The Palette 1.0 has updated the pushbutton build process which assembles, schedules and orchestrates the heterogeneous processor execution, coordinating vision processing, ML inferencing and application code execution with a single simple development flow. This avoids the need for each processor executable to be painstakingly built with its own tools flow and then integrated with a manual and error prone process. The software update streamlines this coordination with an ability to support both Python based pipeline or Gstreamer based pipeline processing, with Python providing a methodology for quick functional results before developing more optimized Gstreamer based pipeline processing. The release has augmented the rich set of processing plug-ins and example ML pipelines to further aid the developer.

You can knock your application together the easy way in Python, or the hard way on the well-known GStreamer framework. The hard way, in return, is an order of magnitude more efficient. Numerous example applications are included.
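The idea of a pipeline that chains vision processing, ML inference and application code can be sketched in a few lines of plain Python. This is not the Palette API: `make_pipeline` and the three stage functions are hypothetical stand-ins, and the "model" is a dummy that averages its input.

```python
from functools import reduce


def make_pipeline(*stages):
    """Chain stages so that each stage's output feeds the next one."""
    return lambda frame: reduce(lambda data, stage: stage(data), stages, frame)


def preprocess(frame):
    # Vision processing stand-in: scale 8-bit pixels to the range [0, 1].
    return [pixel / 255.0 for pixel in frame]


def infer(tensor):
    # ML inference stand-in: a dummy "model" that returns the mean value.
    return sum(tensor) / len(tensor)


def postprocess(score):
    # Application code stand-in: turn the score into a decision.
    return "bright" if score > 0.5 else "dark"


pipeline = make_pipeline(preprocess, infer, postprocess)
label = pipeline([255, 255, 0, 255])
```

On real hardware each stage would run on a different processor, which is exactly the coordination the pushbutton build is meant to handle for you.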

The Palette 1.0 software release contains a significant set of features as well as the maturation of the underlying run-time environment on the MLSoC. The MLSoC features an embedded Linux operating system based on the latest Yocto distribution, version 4.0, incorporating the Linux kernel 6.1.22, glibc 2.35 and ~300 other recipe upgrades. This compact but powerful implementation of Linux enables the deployment of a run-time environment that manages the execution and scheduling of the ten embedded microprocessors as well as the Machine Learning Accelerator (MLA) contained within the MLSoC device.

It's all Linux in there, specifically Yocto - a well-known project for embedded applications, widely used in industry. Plus drivers and assorted runtime for the MLA and the other peripherals.

To complete the picture, here is the hardware available at the moment.

MLSoC Evaluation Board: