EDGE AI POD
Discover the cutting-edge world of energy-efficient machine learning, edge AI, hardware accelerators, software algorithms, and real-world use cases with this podcast feed covering all things from the world's largest EDGE AI community.
These are shows like EDGE AI Talks and EDGE AI Blueprints, as well as EDGE AI FOUNDATION event talks on a range of research, product, and business topics.
Join us to stay informed and inspired!
Faster Edge AI, Fewer Headaches
If you’ve ever shipped a model that flew in the cloud and crawled on a device, this conversation is a relief valve. We bring on Andreas from Embedl to unpack why edge AI breaks in the real world—unsupported ops, fragile conversion chains, misleading TOPS—and how to fix the loop with a unified, device-first workflow that gets you from trained model to trustworthy, on-device numbers in minutes.
We start with the realities teams face across automotive, drones, and robotics: tight latency budgets on tiny chips, firmware that lags new ops, and the pain of picking hardware without reliable performance data. Instead of guesswork, Andreas demos Embedl Hub, a web platform and Python library that standardizes compilation, static quantization, and benchmarking, then runs your models on real hardware through integrated device clouds. The result is data you can act on: average on-device latency, estimated peak memory, compute-unit usage, and detailed, layer-wise latency charts that reveal bottlenecks and fallbacks at a glance.
You’ll hear how to assess quantization safely with PSNR (including layer-level drift), why pruning and optimization must be hardware-aware, and how a consistent pipeline across ONNX/TFLite/vendor runtimes tames today’s fragmented toolchains. We also compare Embedl Hub’s scope to broader end-to-end platforms, touch on non-phone targets available via Qualcomm’s cloud, and talk roadmap: more devices, deeper analytics, and invitations for hardware partners to plug in.
If you care about edge AI benchmarking, hardware-aware optimization, ONNX/TFLite compilation, layer-wise profiling, and choosing devices with data instead of hope, you’ll leave with a practical playbook and a tool you can try today—free during beta. Listen, subscribe, and tell us the next device you want to see in the cloud lab. Your model isn’t done until it runs on real hardware.
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
Good morning.
SPEAKER_00Hi.
SPEAKER_02Or good afternoon, I should say.
SPEAKER_00Or good evening wherever you are.
SPEAKER_02Yeah, maybe even how's Amsterdam.
SPEAKER_00Yeah, it's great. It's getting cold. So it'll be nice to go to Taiwan next month and get a little more warmth.
Taipei Summit & Community PSAs
SPEAKER_02I know, I know. Yes. I will see you there. Uh well, first of all, welcome everyone to Edge AI Talks with our host Jenny Spielman and myself, Pete Bernard. Uh Jenny's actually dialing in from Amsterdam, and I am in Bellevue, Washington. So that's why we're talking about locations here. And we were talking about the upcoming extravaganza in Taipei, November 11th and 12th, EDGE AI Taipei, which is pretty cool. And I'll show the little banner here if people want to. There's still some seats available. Uh, Taipei 2025 at edgeaifoundation.org. We have discounted tickets for academia as well. And um and a bunch of folks are gonna be there. We're gonna have Dell there, Edge Impulse, of course. We're gonna have Advantech there, we're gonna have uh SqueezeBits, we're going to have everyone from startups to big tech, metal to cloud, in Taipei for a few days to uh talk about the state of the art in edge AI and what's happening in the space. So that's cool. Um, so that's my PSA for the morning. I'm trying to think of other PSAs. That's about it. I don't know. Do you have anything?
SPEAKER_00My my number one PSA is always ask questions in the chat if you're here live and we'll get those answered. Otherwise, shoot them over via email after and uh enjoy the talk today.
SPEAKER_02Yes, yes, definitely. Please uh engage. This is a live stream. So uh unless you're watching it on YouTube recorded, then it's not a live stream. But for all of the folks here, I think we see 23 and that'll climb. Um we had close to 100 registrants. So yeah, this is a time to hear some interesting stuff and ask some interesting questions, and everyone's gonna get a little more educated, so that's pretty cool. Hello from Singapore. Hello, Russ.
SPEAKER_03Hello, hello.
SPEAKER_02What time is it in Singapore-ish? I don't want to know. Um okay, so why don't we uh why don't we, oh sorry Russ, I'm sorry, I'm gonna take your icon off here. Um we're gonna bring on Andreas. Let's bring Andreas on. There he is.
SPEAKER_04Hey Pete, hey Jenny.
Guest Intro: Andreas from Embedl
SPEAKER_00Hello, welcome.
SPEAKER_02Thank you. Good afternoon. And uh actually let me let me put you into this. Um oops. Let me give you the you got the guest spot here.
SPEAKER_04Oh nice. Um the guest spot. I was just listening to you guys talk about Taipei and I look out at chilly Gothenburg in Sweden.
SPEAKER_02Yeah, so you're in Sweden.
SPEAKER_04I should talk to my boss about going to Taipei. Sounds like a lot of you should. Yeah, it should be good.
SPEAKER_02I like your microphone set up there. So you look prepared to do a proper proper pod. Some people like show up there like sitting in their office, like shouting into their laptops.
SPEAKER_04Yeah, yeah, yeah. I was sitting for a long time, it wasn't even plugged in. I thought I was talking to this one, but I realized it was not plugged in. Now it should be. It sounds like, it looks like you are hearing what I'm saying.
SPEAKER_02So yes, I'm hearing what you're saying. Good, good. Excellent. So we're gonna learn some stuff today about Embedl's cool tools. Is that the is that the uh is that the plan?
SPEAKER_04That is the plan. Um, yeah, and especially some new exciting stuff that we're coming out with.
SPEAKER_02All right.
SPEAKER_00Amazing.
SPEAKER_02Sounds good. Well, uh, as a reminder, yes, as Jenny mentioned, uh ask questions. Um, Russ says it's oh do we have two people from Singapore today. Interesting. What's that all about? Um so I don't know if you can see that, Andreas. So there's some some people from Singapore.
SPEAKER_04I can see. Hi, Singapore.
SPEAKER_02All right, two from Singapore, two out of two. Um yeah, so why don't we let you kind of jump into it and then you know folks can ask questions. We'll sort of we'll sort of uh gather them in the background, and and if Jenny and I reappear on the screen, then uh we will um throw a couple of questions at you and we'll we'll get the audience some answers. Yeah, yeah.
SPEAKER_04Yeah, don't hesitate to interrupt at any point. It's it's just nice to have a dialogue and a conversation and so on. Cool, cool.
SPEAKER_02So we will we'll hang back and give you the floor and let you kind of dig into it a little bit.
Embeddle’s Mission & Customers
SDK: Hardware‑Aware Optimization
SPEAKER_04Perfect. Uh well, so first off, thank you a lot for allowing us to talk about some of this new exciting stuff that we are coming out with here at Embedl, and recent developments. Uh, so what I'm gonna talk about is faster time to device with a new platform that we're releasing called Embedl Hub. Uh this is let's see. Oh, there's a slight delay in the changing uh slides. Okay. Before I do that, I want to give a little bit of a background about Embedl. So I'm not sure if everyone in this audience knows who we are. Uh, just to set the stage a bit and tell you where we come from, uh, the type of things that we usually work with, and why we are coming up with this new platform uh today and why we think this is a good time. Uh, I also want to set the stage a little bit and talk in general about some of the challenges broadly about developing edge AI as we see them at Embedl, uh, based on our insight from working with customers that are dealing with some really challenging edge AI tasks like autonomous driving and autonomous drones and these sorts of systems. And then I will set the stage for uh a demo of our new product called Embedl Hub. That's where we will end. Uh so for those who are more interested in product demos than slides, we will come to the product demo, but I will save it towards the end. Uh let's set some context first. So uh our mission around everything we do at Embedl is about accelerating the adoption of edge AI for all users and all applications. So that's a pretty broad statement. And we work with typically large enterprises and quite advanced deep learning organizations. For example, we have some listed here, like Zenseact, who is developing self-driving capabilities for Volvo cars. Uh, we have um uh autonomous drones and other types of applications. I would say some of the hardest edge AI applications or challenges that are out there. Um, and that's where we come from. Um and the way we interact with our customers, with these large enterprises, is through our optimization SDK. What is unique about our optimization SDK is that we have a toolbox in this SDK. We have different algorithms that are dealing with optimizing deep learning models. They can be neural architecture search, they can be pruning, we have quantization and knowledge distillation. And all of these tools are looking down on this toolchain or software and hardware stack beneath it through our hardware abstraction layer. So all of these tools in our toolbox, they take the specific toolchain and hardware into account, and that can be open source toolchains like ONNX Runtime, uh TensorFlow Lite or LiteRT. They can be vendor-specific runtimes like CVFlow, uh, OpenVINO, uh, TIDL, and so on. And ultimately they take the specific processor into account as well. And that's all about bringing the hardware into the model development. Uh, so we're really used to looking at the entire software stack and hardware stack and bringing that into the development uh through our optimization SDK. So typically, uh, the type of uh challenges that our customers will use our tools for, for example, can be here in an automotive setting. We have an example here to the left where we have optimized a model called Ultra Fast Lane Detection, which does exactly that, it detects the lane. Uh so you have a car driving on the road and it needs to see where the lane uh edges are.
Uh so in this particular use case with this customer, first the model didn't even run. And then, as we applied some manual modifications, uh we could make it run, and you had 12 frames per second, which was not good enough for this particular use case. Maybe it was for an autonomous braking system or something like this, but you need things to happen really fast, and then with our tools, you can boost this performance up. And in this example, you could reach 76 frames per second. So that could be a typical use case that our customers are using our optimization SDK for. And how they use it is to get started is really simple. Um, it's a Python library, so you can pip install it. And let's see, can you see my pointer on the screen?
SPEAKER_01Yeah, yes, we can see it.
Real Case: Lane Detection Speedups
Three‑Step Pruning Workflow
SPEAKER_04Yeah, um, so to get started, uh the only thing you have to do is basically import a pruning method, in this example, and this is a hardware-aware pruner. So this is gonna prune the model, taking the target hardware into account. So it's gonna make changes to the model based on the specific toolchain that you're using and the hardware that you want to target, that you want to deploy to. So after just defining a model, you uh define this hardware-aware pruner, letting it know which hardware you're targeting. In this case, it's a hardware from Texas Instruments, it's called TDA4VM. It has a neural uh accelerator, uh a DSP-based one. And here we say that we want to prune this to a target fraction of 0.5, which in this case means we're going to prune away 50% of the FLOPs in this model. And we can set different types of targets, could be do you want to reduce the on-device latency or you want to reduce something else by a certain fraction. And then you actually call the pruning dot prune method here. Uh, so it's just a simple three-step process to get started with this hardware-aware optimization that our customers are using our optimization SDK for. And all of this is then also embedded into our visualization tools that customers use to analyze and go a little bit deeper and try to also understand what is happening to the model as we are tailoring that model for this particular hardware. Uh that's what you see here to the right in our analyzed graph here. So, this is where we come from. Uh, working with larger enterprises on really, really hard challenges where state-of-the-art performance matters, or where it really matters to squeeze out those maybe extra 20% uh on latency, for example, uh, to get that performance that you need for your application. You also have a lot of different models running at the same time, perhaps doing different things. In a car, you can have up to maybe 100 different models running uh on device in the car. Let's see. So, so this is where we come from, and now we are branching out and we are launching this new web platform that we call Embedl Hub. And this is our way of taking some of that knowledge that we have gained over the last seven years working with these really advanced deep learning organizations, and putting together a package that we think would benefit all edge AI developers on a broader scale. And I'm gonna show you exactly how this looks, but just to give you a teaser, it's basically a three-part system. So it's a Python library, and what you can do there is some compilation and quantization of models, and it has a collection of optimizations, but it's not a full-fledged, say, optimization library like what I talked about before in our optimization SDK. And then there is a device cloud, so we integrate with different device clouds so that our users can test on real hardware. Because we think this is a key component. It's the key component for our optimization SDK, where you have the hardware and you take that into the development loop. So you can test things, you can optimize things for the target device. Now we are offering this here, but in a cloud setting instead. And then lastly, we have the website where you can analyze, you can compare the results, visualize the performance, uh, and start understanding a little bit what's going on, why a certain model might run faster on this device versus slower on another one, and so on.
And it's also where all the devices are stored so that you have everything in one place. So, this is our new platform in its essence, and we're going to dig a little bit deeper later on into each of these components. Um, to set the stage a bit, I thought about something that we did.
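For orientation, here is a rough sketch of the three-step pruning flow described above, written out as code. The names are paraphrased from the talk rather than taken from the SDK's documentation, so treat the import path, class, and arguments as assumptions; the real Embedl API may differ.

    # Sketch of the three-step hardware-aware pruning flow described above.
    # The commented import/class/argument names are assumptions paraphrased from
    # the talk, not the documented Embedl SDK API.
    import torchvision

    # Step 1: start from a trained PyTorch model.
    model = torchvision.models.resnet18(weights="DEFAULT")

    # Step 2: create a hardware-aware pruner that knows the target toolchain and
    # device (here the TI TDA4VM mentioned in the talk) and a target fraction,
    # e.g. remove roughly 50% of the FLOPs.
    #
    #   from embedl.pruning import HardwareAwarePruner   # hypothetical name
    #   pruner = HardwareAwarePruner(
    #       model,
    #       target_device="TDA4VM",   # TI SoC with a DSP-based accelerator
    #       target_fraction=0.5,      # prune away ~50% of the FLOPs (or latency)
    #   )
    #
    # Step 3: call the prune method; the result is a smaller model tailored to
    # that specific device and toolchain.
    #
    #   pruned_model = pruner.prune()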
SPEAKER_02Yes, yeah, I had a question from the audience. I just wanted to uh uh throw this one on here from a little uh yes, now I can see it as well. I'll bring Jenny on here too.
Why Hardware Matters for Models
SPEAKER_04Yeah, exactly. Yeah, yeah, yeah. Yeah, so for example, it might be, so if you have a CNN, a convolutional neural network, for example, and it takes a certain number of channels in one operation in particular. It could be that on one particular hardware, because of how it's set up, how it works, how it uh retrieves things in memory, going from, for example, 32 channels to 16 channels can lead to a huge speed up, because that particular layer was a bottleneck or somehow inefficient in that case. On another hardware, that might not be the case. Maybe they even have the same kernel running for 32 channels or 16 channels. So you prune it and you reduce those channels, but you see no speed up. That's why you need to take the hardware into account. You also can't look only at individual layers, you have to look at the entire picture, the entire model, and make calculated uh choices of where you prune for that particular hardware, if that makes sense.
unknownGot it.
SPEAKER_02Okay, just want to throw that one in there before you move to the next thing.
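To make that difference concrete, here is a small self-contained illustration (not Embedl's algorithm) of why pruning decisions change once the hardware is in the loop: ranking candidate layers by FLOPs saved can disagree with ranking them by latency actually saved on a given device, because some channel counts map onto efficient kernels and others do not. The per-device latency numbers below are invented for illustration.

    # Toy illustration: FLOPs-based vs hardware-aware ranking of pruning candidates.
    # All numbers are invented for illustration only.

    # Each candidate action halves the channels of one layer.
    # flops_saved: theoretical compute removed (MFLOPs).
    # latency_saved_us: measured latency saving on two hypothetical devices.
    candidates = [
        {"layer": "conv2 32->16", "flops_saved": 40.0,
         "latency_saved_us": {"device_A": 120.0, "device_B": 2.0}},
        {"layer": "conv4 64->32", "flops_saved": 80.0,
         "latency_saved_us": {"device_A": 15.0, "device_B": 160.0}},
        {"layer": "conv7 128->64", "flops_saved": 120.0,
         "latency_saved_us": {"device_A": 10.0, "device_B": 90.0}},
    ]

    def rank_by_flops(cands):
        # "Regular" pruning view: biggest theoretical compute saving first.
        return sorted(cands, key=lambda c: c["flops_saved"], reverse=True)

    def rank_by_latency(cands, device):
        # Hardware-aware view: biggest measured on-device latency saving first.
        return sorted(cands, key=lambda c: c["latency_saved_us"][device], reverse=True)

    print([c["layer"] for c in rank_by_flops(candidates)])
    print([c["layer"] for c in rank_by_latency(candidates, "device_A")])
    print([c["layer"] for c in rank_by_latency(candidates, "device_B")])
    # The FLOPs ranking is the same everywhere; the latency ranking changes per
    # device. The 32->16 cut is the best move on device_A and nearly worthless on
    # device_B, which is exactly the situation described above.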
Introducing Embedl Hub
Four Core Edge AI Challenges
Hardware Limits & Lagging Support
Choosing Hardware Without Data
SPEAKER_04Yeah, it's an excellent question. It's a very good question. Um, yes, so we did this recently, and I thought that it works nicely uh as an introduction to Embedl Hub. Um, we did a survey in-house, and we asked some of our engineers who are working very closely with our customers. Uh, so they are there on a weekly basis working with customers on these really hard problems. We asked them, if you're to generalize and zoom out a bit, so not focusing on the particular problem for this particular use case that you're working with, but in general, what are the biggest challenges that you see across all these different organizations of various sizes, some of them with teams that are in the hundreds, some of them which are just a few people, and some even maybe a single developer? What are the common themes that you see? What are the challenges? And I'm going to go through all 13 answers. Uh we have selected 13 answers that I like for this setting, but I also took the opportunity to batch them together, and I thought it was interesting that I could do that, because all of them are basically falling into one of these four categories, and it's an even spread as well, which I think is interesting. So, either it had to do with hardware limitations and performance and bottlenecks and these sorts of questions, maybe one of the most fundamental challenges in edge AI. One of the things that our engineers brought up was choosing hardware without real insights, without data, without knowing how something will perform in a real-life setting. And that's a really tricky question. And for us, for example, who are working in the automotive sector a lot, there you might, for example, need to procure hardware for five to ten years in advance. And it's really hard to know even how AI models perform on today's hardware. So when you're trying to make a choice of which hardware should I commit to on a five to ten year horizon, that's a really tricky question. I know that if you asked AI developers 10 years ago what type of AI models do you want to run in vehicles or on device in 10 years, they were not gonna say, yeah, we're gonna have these multimodal language-based models in the cars trying to interpret the scene. So looking 10 years ahead, it's really hard to know what we want to run. That makes choosing hardware a tricky question. A lot of answers also had to do with the fact that edge AI is cross-disciplinary, so it's touching everything from embedded engineering to model architecture and everything in between, and that can cause issues. And then one really big topic, I think, which is fragmented tooling. The fact that there is not a lot of uh consolidation around standards in the edge AI community. People might use ONNX, some might use TFLite, some might use TOSA, some might use something else, and you start chaining these together, you end up in a rather fragmented reality, and that can cause a lot of frustration and fragility and so on. Um, but I also think it's interesting that it was not just one of these where, say, 90% of answers fall into one category, but rather it's an even spread, which tells you something about the picture of where I think edge AI is today, that you need to address all of these different challenges at the same time. There's not just one big problem that we need to solve, we need to look at it holistically, and that is what we're trying to do with our new platform.
So, zooming in a bit into the first set of challenges that our engineers in our in-house survey mentioned. Uh, these were some of the answers that we got. The first one is what I would call maybe the defining feature of edge AI; anyone, at any point, trying to take it to deployment has been dealing with this. And that's a tight latency budget when you also have a really small processor. And it's sort of an equation that doesn't really add up, because at the same time we want to put more and more models on these devices, but we also want to keep the power consumption down, we want to uh use cheap hardware, preferably, so that we can make money from the products we're trying to sell. And that equation doesn't really add up. So you're always going to be crying for more compute, and as we get more compute, because embedded devices are growing a lot in terms of their performance and so on, also our models are growing, and they are growing faster than the devices or the processors, so it's a hard uh challenge. Then you have the fact that hardware support lags behind the ever-evolving AI. And I know from experience, and I'm the same myself, when we develop AI, developers like to use the new and fancy stuff, and we like to have the newest, fanciest architecture, the newest, fanciest LLM, and so on. And we're a little bit spoiled when we're working on the server side, where a lot of the development is happening. Then we move the application to the edge. Now all of a sudden the operations are not supported, because the software and firmware for embedded hardware is often lagging a little bit behind what we have on the server side, and that can cause frustration and issues. Of course, everything would be solved if we had a universal target device, but that's not the case. So every SoC comes with unique constraints, which also means that it's hard to generalize optimizations. Um, yeah. And lastly, device capabilities sort of define which techniques work. Uh, and the fact is that quantization and pruning, even if you learn it for one target device, is going to look completely different and behave a little bit different as you're changing the devices, you're changing the toolchain. Uh, the static quantization that you're using doesn't just look the same across devices, and sparsity, a certain type of sparsity might work in one setting, not in another, and so on. Uh, that's also causing uh some issues. So, my takeaway from this is that edge AI innovation is held back a bit by hardware fragmentation and the lagging support. Um, moving on to the second category uh about challenges of edge AI, and that was the question of choosing hardware uh when you don't always have access to the useful or good data that you would like. Um, so investing blindly in hardware where it's really difficult to estimate performance, and it's difficult before you actually go through the process of purchasing the device, setting it up, doing all the integration, and then testing, running something, and that's a really time-consuming and resource-consuming process. And I think we all know that the spec sheet is not a useful tool when it comes to evaluating the performance or estimating the performance of a certain AI model on a device. AI TOPS, for example, is commonly used to indicate the performance of a device. But it can be in reality that you maybe go from 8 TOPS to 16 TOPS, and you think, okay, great, all of a sudden my model is going to run twice as fast.
But what happened in reality was that the vendor added another core, but maybe there's no software for parallelizing the computation. So all that happens is that now you can run two models, but they're just as slow as before. Maybe they're sharing the same memory, maybe you can't even run two, and it's just much more complicated than just looking at, say, the spec sheets and the number of TOPS on a device and so on. So, in the end, you want to run things on device; that's the way to really understand how things perform. And in certain settings, this can actually have uh um, I mean, real implications, in that teams risk both over- and under-provisioning devices, and that can be very, very costly, especially in larger organizations. So the takeaway here is that hardware decisions are uh unfortunately often made without reliable performance data, even though there's a lot of uh activity in the community to try and deal with this issue. Um, so the third category um was the fact that edge AI today is, or maybe it has always been, very cross-disciplinary. That means that uh you basically need to be a full-stack engineer, more or less an expert in everything from model architecture to compiler flags, and then knowing about device deployment. And if you have a small team, that might be reality, you need to be basically a full-stack engineer. If you are in a slightly larger organization, instead you might have different teams, where you have one ML or AI model team who is doing the development of the models and training and so on. You have another team which is the embedded experts, and you have system engineers, and all of these need to align and work together. And this can be a very time-consuming, inefficient process, where maybe a model is being shipped from the ML team to the embedded team, and they can't compile it and they have to send it back and try to understand why. Sometimes these teams might not even sit in the same building, maybe not even on the same continent. We've seen that case as well. So the multidisciplinary process makes this uh a hard, hard thing to do. And this lack of alignment does lead to breakdowns. We've seen many examples of that, that the models fail once they are deployed, if you can even get to deployment. So I think edge AI is rather unique, at least compared to, say, AI development for the server side, in that it requires this rare full-stack expertise and a lot of rather complicated coordination across teams, especially teams with different backgrounds and expertise, and that can uh uh be quite um resource consuming. So the last set of challenges, or last category, was this one about fragmented tooling. And I think a lot of at least the developers in the audience are gonna uh relate a lot to this point. Uh, I do it myself, so let me paint a picture. So you have, you finally got your model to compile, and you're gonna try and run it on device, and you get an error message and it says unsupported operation. And then you sit there and you scratch your head because you have no more information to go on. You try and understand why this operation is not supported. I thought it was, and you start digging in some documentation, you find that well, in theory it should be supported, and then you realize that, well, your toolchain looks like this, maybe.
For some reason, you are going from PyTorch to ONNX to LiteRT, and then into the vendor-specific compiled format in the end, and now you try and understand where in this four-step chain of compilations something went wrong, and you have no information to go on. That requires quite a bit of detective work, I would say. It can also be very unique for each target device and for each situation, for each model. Um, so this fragmented tooling, different compilers, profilers, and runtimes across vendors, also makes it very fragile. So you have pipeline fragility here. The fact that, you know, it might be that uh there was an update, for example, and now this op set that you are using in your ONNX model is not compatible with the runtime version that you were using. So that makes it rather fragile and it can break easily. So the takeaway here: the fact that we have rather disconnected ecosystems, combined with, if we dare to say it, inconsistent documentation, can make edge AI development both fragile and time consuming. So with this said, how are we addressing, yep? Did I hear something?
SPEAKER_02Since there's a little pause here, um I don't think you heard anything, but there is a question, so your ears must be burning. Uh another one from Emola before you move on. Uh code optimization modules in the hub, staying on your great example of pruning, manipulating the C code for explicit memory allocation for filters and activation. So you can kind of dig in a little more on the hub functionality.
SPEAKER_04I mean, we will see exactly what the hub has, and all the features and so on, in just a moment. Um, but it doesn't have inference scheduling. I can tell you that straight away. But there's many other great things that we can look at.
SPEAKER_03Okay. All right. Yeah.
SPEAKER_04I was going to say, you have a comment: hardware fragmentation, hardware diversity, that's a feature, not a bug. Nice.
SPEAKER_02Yes. Well, you mentioned hardware fragmentation. I like the another way of saying that is hardware diversity.
SPEAKER_04Uh that's another way of looking at it.
SPEAKER_02Which is a yeah, certainly is a technical challenge. I mean, the world would be easier if all the hardware looked the same, if uh from a software development perspective, right? But um, but then we of course we lose all the value of of the edge computing world if um we have to stuff uh an Nvidia Jetson into everything.
SPEAKER_04That's true.
Cross‑Disciplinary Bottlenecks
SPEAKER_02So yeah, maybe what we would like is hardware diversity, but some consolidation in the tooling, perhaps, at least, I mean, you know, that's best practices for like model portability, and we'll talk about this toward the end, too. Best practices for model portability and optimization would be good, especially the portability side, because imagine that's where people come up with new architectures and new accelerators, and um, you know, obviously the time to market for those is going to be dependent on the tooling and getting those models moved over. So uh things like ONNX, for example, and its kind of intermediate formats are really important for that. Yeah, um, so there's another one from Emola. Emola's on a roll this morning. That's nice. How is hardware-aware pruning different from regular pruning? Oh, did we already talk about that one? I think we did, previously. You had your two good questions there, but feel free to throw more in there. Uh good. Emola was actually at our Milan event, she's been a part of our community for a long time at Electrolux. Okay, so maybe we should get out of your way and let you keep going.
Fragmented Tooling Pain
Audience Q&A: Pruning & Ops
Hub Solution: Unified Device Cloud
Workflow: Compile, Quantize, Benchmark
Live Demo: Projects & Runs
SPEAKER_04Yeah, yeah. So uh to deal with uh I would say uh some of these aspects, um this is the solution that we came up with, that we think addresses a lot of the key issues that I just talked about, this fragmentation and what makes edge AI sometimes time consuming and a little bit frustrating. So, in Embedl Hub, what we have is unified device cloud access. So, what do we mean by that? Well, there is the same way of accessing devices from different device clouds that we have integrated with, from different vendors. How you get access might look a little bit different, but the interface looks the same. And that allows people to run models on real hardware from multiple vendors with very little modification or changes to how they work. Things will be familiar. We have easy benchmarking and profiling, so that's what you will use these devices in the clouds for, so that you can compare the on-device real latency, not just look at, say, the hardware specs or try to guess somehow. Uh, but also uh dig a little bit deeper and try and understand how this hardware behaves or how this model behaves. It's a cloud-based platform, which means that we have done some integrated toolchains behind the scenes, or behind this one interface that we are providing as an option to the user. And then we have standardized optimization pipelines, which means that if you want to use a per-tensor static quantization for a particular hardware, for a particular toolchain, it's gonna look the same in your eyes, or in the tools that you are interacting with, as you are changing from one to another. Um, yeah. And we will see this in action in the demo. Uh, the workflow is rather simple. It's that you compile and you quantize your model with the library, you will verify the performance on a remote device, and lastly, you will analyze the performance on the web. Uh, and we think that this gets you pretty far into getting started with on-device and bringing models from the cloud to the device; you will be in a pretty good shape to then take it from there and work towards a functioning on-device application. And I think this is where, okay. I had one more slide just to make it really, really clear about the components that we have, so sort of a system overview. Uh the starting point is PyTorch, for those developers uh who are working there. Then we have the Python library, called Embedl Hub. There are the device clouds, which we are integrating with, that you will interact with via the Python library, and then all of this is talking to the web so that you can analyze performance. So, this is how the system looks. So, that brings me actually to the demo. So, let's see how this looks in practice. So, uh I hope that you can see that I switched tabs now in my window. You should be seeing the hero of our landing page here. Yes, very good. So uh it's in beta, but it's live, so anyone can check this out on your own. Uh, it's free. So the starting point is you just log in. You can log in with Google or GitHub today, and right away I will jump into projects. So, here, for example, and now I'm gonna illustrate how I would use this in practice, or how you could use it in practice. So, I have a selection of projects here that I'm working with. I've done some face detection exploration, I'm working on some pose detection app, and the one we're gonna have a look at now is drone detection.
So, here's an edge AI use case, considering maybe detecting drones that are flying, maybe from another drone or a camera or something like this. And we have the concept of runs. So, runs are the actions that you will take via the Embedl Hub. And there are three types of runs that you can do. Uh, we categorize them into quantize, compile, and benchmark. It's sort of illustrating the steps that you have to take as you leave the training phase of your model, and now you're gonna take it to device. You may or may not want to quantize it to compress it and get some performance gains; you can skip that part if you want. Then you have to compile it into some format, and it's gonna be different formats for different target devices, and then you will actually run it and profile it so that you get benchmarks on how the model performed. And the goal is to enable users to get to this point where you have a benchmark run and everything worked. So you have a model that did compile, you managed to run it on the device, and you will be greeted by this dashboard that tells you about its performance. So, what have I done here? So, I have an experiment now that I call ResNet-18. That's the name of a model. So, I'm doing exploration here, trying to find an appropriate backbone for my drone detection uh architecture or app. So I'm taking this model, ResNet-18. I have run it on a Samsung Galaxy S25 mobile phone. I'm using the TensorFlow Lite runtime. I did this job today, actually, it has finished successfully, and it took three minutes. So that sets the time perspective here, too. It's a three-minute job. And I used a device that was provided via our integration with the device cloud from Qualcomm called Qualcomm AI Hub. And I have a link here if you want to go to uh Qualcomm AI Hub and see some more information there. And what you see here is the key metrics, which is the latency, the inference latency on average on the device, the estimated peak memory, so how much memory did this model actually consume during repeated inference runs. You also see the top compute unit, which in this case is the NPU; everything was running on the NPU. So all we see here is 40 compute units running on the NPU. And if something did, for example, have a fallback to CPU, in this layer-wise graph I would see a different color indicating that this layer had a fallback and it was running on the CPU or the GPU instead. But here everything worked as intended and it ran uh really nicely on the NPU. And here I want to uh give a quick idea about why it's important to look at these metrics when you're developing edge AI. So, here, for example, in the layer-wise latency table, we have a y-axis here of microseconds indicating how much time was spent on this particular layer. So, here we have a view where we have flattened the graph of the model, even though a couple of things might happen in parallel. And what you see here is, for example, that this layer is a max pool 2D, that's the name of it, it's the sixth layer in the graph. It takes up a lot of time on this hardware. So maybe from just looking at this layer-wise graph, I can see that okay, maybe the max pool 2D layer that I have in this model is not that efficient for this particular accelerator, the NPU on this Snapdragon chip. Or there might be something else that is causing it, but now I have a starting point.
So I can go through here and look, okay, so this dynamic conv that I have early in the graph is also causing some issues, but it's actually running rather smoothly later on in the model. So maybe it's not the layer itself, maybe it's just that the positioning of this particular layer in this model is causing this bottleneck. So I can use that to gain an understanding and uh make a better judgment of what model I will choose for my particular application. So in this case, maybe based on this, I would have to check some other models. I would probably not choose a ResNet-18. Maybe there are more efficient networks on this particular target device. And here you have access to basically a performance dashboard where you can see more things, how much time is taken up per type of uh operation, and so on. And if I want to see exactly which layer is this bottleneck, I can click on it, and, oh, I get taken to... there's a slight delay because I'm sharing the screen. I get taken to here, the sixth layer in the model. So um okay, so this is what you see on the benchmark runs, no matter which device you choose, no matter from which vendor, what type of device, you will see this type of information. That means that you have a standardized way, you have one way of working, that will uh work across the different vendors. So let me go back and we will have a look at some of the other types of runs. So that was the benchmark run. I can also see here that, for this model I've experimented with, for the first two I tried a Galaxy S24, I tried a Galaxy S25, and it went from 0.33 to 0.29. And this last one is actually an error on my part, because I forgot to quantize this model, so it's a little bit slower. Um, I can filter on another run type. So, for example, quantize. So, as you quantize the model with the Embedl Hub library, you get access to other types of dashboards for quantization. So, here you see the output peak signal-to-noise ratio, which is an indication of how much the output, how much the predictions of this model differ between the quantized and unquantized graph. And usually a good rule of thumb is that anything above 30 dB is a good uh is a good measure. Anything below, there might be something you want to check out about the quantization, or try different settings. But we don't stop there, we also have layer-wise PSNR, so that's where we look at the intermediate output for each particular layer, so that if you have poor output PSNR, you can also start analyzing which layers are causing these issues. Where do the quantization errors come in? Okay, let me go back. So, and similarly, there is a compile uh run type as well. So, these are the three run types that you have. They have different colors here. Uh, for those who like it, you can just go on runs, and you have all the runs that I've made in Embedl Hub for all the different projects and experiments I'm working with. So, in this way, you not only have a way to analyze the performance, there's also a structured, organized way of working that we would like to promote. Good. But I also want to show, so for example, uh here we are in the docs section. Uh, if you're curious about what devices we support today, we have this supported device list. Uh so here you can search, for example, maybe you're interested in a certain chipset or a certain uh phone, for example. I can search Snapdragon, maybe 8 Elite I'm interested in.
Okay, I see these devices, the Samsung Galaxy S25, S25 Ultra, S25 Plus, all of them have the 8 Elite chip, for example. And right now, in the live version of the hub, we have one uh device cloud integration going, and that's the Qualcomm AI Hub. Uh so that gives you access to roughly 80 different Qualcomm devices or Snapdragon-powered devices. Uh and I will talk a little bit about uh some of the next integrations that we're making. Okay, any questions on uh on this part?
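The output PSNR check from the quantization dashboard is easy to reproduce offline if you have the predictions of both the float and the quantized model on the same inputs. Here is a minimal sketch using NumPy, applying the same rough rule of thumb from the demo that anything above about 30 dB is usually fine; this is an illustration of the metric, not code from the platform.

    # Minimal sketch: output PSNR between float and quantized model predictions.
    # In practice the two arrays would come from running the same inputs through
    # the unquantized and quantized models; random stand-ins are used here.
    import numpy as np

    def psnr_db(reference: np.ndarray, test: np.ndarray) -> float:
        """Peak signal-to-noise ratio of `test` against `reference`, in dB."""
        ref = reference.astype(np.float64)
        mse = np.mean((ref - test.astype(np.float64)) ** 2)
        if mse == 0.0:
            return float("inf")  # identical outputs
        peak = np.max(np.abs(ref))  # use the reference's peak value as the signal level
        return 10.0 * np.log10(peak ** 2 / mse)

    float_logits = np.random.randn(8, 1000)                        # stand-in outputs
    quant_logits = float_logits + 0.01 * np.random.randn(8, 1000)  # small quantization error

    score = psnr_db(float_logits, quant_logits)
    print(f"output PSNR: {score:.1f} dB")
    if score < 30.0:
        # Rule of thumb from the demo: below ~30 dB, inspect the quantization
        # settings and the layer-wise PSNR before trusting the quantized model.
        print("low PSNR: check quantization settings / layer-wise PSNR")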
SPEAKER_02We had a couple of things going on here. Um, first of all, thanks for all the live demo stuff. Uh, here's an external question. Uh a couple of things. One from Jordi: main difference of Embedl compared to Edge Impulse. Ooh, controversial. But there's a lot of tools out there. I mean, there's like you know, there's YASP and roofline, and there's lots of tools out there to help optimize, and so we love them all.
SPEAKER_00There's a ton of differences, but I'll let you answer, Andreas.
SPEAKER_03Sure, different different different animals, sort of.
SPEAKER_04Yeah, I'm not an Edge Impulse user. I think actually you are better positioned to answer this question, based on this demo that I just made on the hub. Uh, I think Edge Impulse offers a broader scope; what we have here is one particular use case. It's about taking a model, compiling it, running it on device, and then analyzing the performance that you get. So it's a cloud-based performance testing platform. That's what it is. Yeah, and maybe you can correct me, but Edge Impulse offers uh a slightly broader palette of things.
SPEAKER_00Yeah, Edge Impulse is more end-to-end, from data collection, designing and training your model, and deployment. We also analyze performance, but we don't have a testing farm yet for actually testing on a real device in the cloud. So that's a big difference.
SPEAKER_02There you go. Cool. Uh, yes, and also I think Edge Impulse, like you said, they do uh dataset management and collection and uh things like that.
SPEAKER_04Yeah, Embedl Hub is not dealing with uh with training models, basically.
unknownRight.
SPEAKER_02Okay, so um another question from Jordi. Is Embedl only oriented to smartphones?
SPEAKER_04No, we're not, but it is the first device cloud that we integrate now in the beta. Um and but you should you should stay tuned.
SPEAKER_01Stay tuned, okay. Sounds good.
SPEAKER_04And if you're in, I would say this actually, if you're interested in something else and you know what you're interested in, or you actually even if you don't know what you're interested in, you can reach out and maybe you can be one of the early access users, right?
SPEAKER_00But it looks like a lot of the uh Qualcomm AI Hub proxy hardware is supported, so like the QCS9075, a lot of those are Dragonwing, um, which are not just phones, so those are not just phones, yeah.
Reading Layer‑Wise Latency
SPEAKER_04That's true. So with uh the Qualcomm AI Hub integration that we have, you have access to some uh hardware, some automotive chipsets, some robotics things, some dev boards. It's not only phones, but I think the majority of those 80 devices are phones, but you're right, uh it has a slightly broader selection as well.
SPEAKER_02Cool, yeah. Uh, internal question, um, couple of things, and Jenny, you'd come up with one question related to that. What physical hardware is provided in the testing farm?
SPEAKER_04Uh yeah, so I mean, uh the way we have built this now, we said we integrate with different device clouds with different hardware farms. And so far, as we just talked about, it's the what we have live today, is Qualcomm AI Hub. Uh, we are coming with our own uh uh integration or our own versions soon, and then we're also looking and exploring for other partners, hardware partners uh that could be part of this. But we have a broad scope in mind, so again, I would say stay tuned here.
SPEAKER_02Okay, here's another one from Kurt. Can users view power estimates for specific model runs, power consumed, I assume that's what that means, or power estimate consumed. I don't know.
SPEAKER_04Sorry, one more time. Can users view power estimates? Okay, uh, very interesting. No, you cannot do it today. Uh it is something that is very interesting, and uh especially for us, learning if that is what the community is really interested in. That is something we can provide uh to a certain degree, right? It's gonna depend on how the device cloud integration looks um and what we can actually access.
SPEAKER_03Right.
SPEAKER_04It's easier for that when we have access to the device, uh, if it's an external device cloud integration, which we very much like to do as well.
SPEAKER_02Yeah, I mean power consumption is very hardware and board dependent, obviously. And you know, people try to fit their models inside of a memory envelope, but also inside of a power envelope. But yeah, it'd be tough to uh get anything reasonably accurate, maybe relative, one chip to another, kind of a relative thing.
SPEAKER_04Yeah, I mean there are different levels to this that you can look at. But yeah, I would say we're used to looking at power consumption. It's one of the aspects of our optimization SDK that I talked about in the beginning, because some of our users like to optimize for power consumption, uh, but it's not part of the hub today, I would say that. Right, right.
SPEAKER_02Okay, and then there is another internal question from Jenny. What security is ensured for sending your model to the hardware farm in the cloud? And does your model remain your own, or does it become Embedl IP?
SPEAKER_04It does not become Embedl IP, and you can use it for production use cases.
SPEAKER_00Awesome, and sort of piggybacking off that, what's the pricing for Embedl?
Quantization Metrics & PSNR
SPEAKER_04Right now, everything is free. Okay, so in this beta phase, right now, everything is open and free. Uh, we just want people to use it at the moment and reach out if you have problems. We like to keep it very open.
SPEAKER_01Nice.
SPEAKER_02All right, so uh that's kind of where we are with questions right now.
Supported Devices & Integrations
SPEAKER_04Do you have... I will try and wrap up then? Okay, well, but I wanted to show, because I know also that seeing is believing. So the final thing I wanted to do, we won't wait for the results, but here I have my terminal and I have the Embedl Hub library installed, and I have set everything up. It's an easy setup process. Uh so if I want to, for example, take my ResNet-18, which I've tuned beforehand, I have compiled it to TFLite, and now I want to benchmark it on this Samsung Galaxy S25, so the run that we just looked at. To get started, all you have to do is the Embedl Hub benchmark command: you specify the model, you specify the device, and you don't need to go deeper than that. If you want to start tweaking and changing things in the compilation, the quantization, or the benchmarking, you can do that. But this is basically all you do. It will start, submit the run. I can click on this. Oh, I have to log in. This is the run that I just started. I see nothing yet because it hasn't finished. But if you wait three minutes or so, we should see a performance dashboard here. Um, so this is working and this is live. So going back, I think I have one final slide, which is just uh what is next? So, as I said, it's open for beta testing today. Uh, you can go to hub.embedl.com. Uh if you want to, there are also spots left in our beta program, and that comes with some special perks. You get some access to our engineers and a lot of support, and it's a nice little community to be in. Uh so if you want that, just reach out to us. Uh in terms of device cloud integrations, the Qualcomm integration is new, but it's live. And what's coming now in Q4 is an even broader selection of mobile devices in our own Embedl device cloud. And we would also like to welcome new hardware partners here. There are a lot of other types of devices for robotics, IoT, and so on. Um we would love to partner with more. So, hardware vendors, I know there are a lot of hardware vendors that are part of this community. Um, we are welcoming new hardware partners, and we want to engage here with the community. For me, as a data nerd, I like our extended dashboards, which are coming soon. You get access to even more advanced analytics for profiling and quantization and compatibility and so on. Also, some of the porting that you mentioned before. Uh, so that's also something to stay tuned for. So um, yeah, with that said, come join us. Make on-device AI more accessible than ever.
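Once a benchmark run like the one just submitted finishes, the layer-wise latency breakdown is what points you at bottlenecks such as the max pool layer in the ResNet-18 example earlier. A small sketch of that kind of post-processing is below; the layer names and numbers are invented for illustration and are not taken from the actual run.

    # Toy post-processing of a layer-wise latency profile, like the dashboard chart
    # discussed in the demo. All numbers are invented for illustration.
    layer_latency_us = [
        ("conv1", 310.0),
        ("max_pool2d_6", 980.0),   # the kind of bottleneck called out in the demo
        ("conv2_block", 240.0),
        ("conv3_block", 260.0),
        ("conv4_block", 300.0),
        ("fc", 90.0),
    ]

    total = sum(t for _, t in layer_latency_us)
    # Sort layers by share of total latency to decide where to optimize first.
    for name, t in sorted(layer_latency_us, key=lambda x: x[1], reverse=True):
        print(f"{name:15s} {t:7.1f} us  {100.0 * t / total:5.1f}%")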
SPEAKER_02Sounds good. Awesome. Andreas, how can people get in touch with you or Embedl? What's the what's the call to action here?
SPEAKER_04Yeah, so either directly to me, uh it's ask at embedl.com, or just contact at embedl.com. You can visit our website. Uh I mean you can find everything at, oops.
SPEAKER_02How's that?
SPEAKER_04You will find everything here in the bottom. Uh there is a contact us, you can click there, or go to hub.embedl.com.
SPEAKER_02Awesome.
SPEAKER_00Great.
SPEAKER_02And this is live now, so people can go to it now, and yeah, it's live, hub.embedl.com.
SPEAKER_03Fantastic.
SPEAKER_04Yeah, there's also if you want to ask questions, you can click on the community section up here. That will bring you to a Slack workspace, and you can ask questions directly to us there.
SPEAKER_02Excellent. All right. Let me uh show that.
SPEAKER_03Cool.
SPEAKER_02Yeah, good stuff. Perfect. I think about this space of um optimization and portability. You talk a lot about optimization, but like how much of the tool is used for that versus portability? Because we also look a lot at people moving models, you know, like everyone starts in the cloud, everyone starts on a Jetson board, and it's like, okay, actually, I need to share this thing. So then they go, they put it on a Dragonwing, they put it on an NXP, whatever. So, how much of the tool is used, and I don't know if you have telemetry on this or whatever, for portability versus efficiency?
Edge Impulse vs Embedl Hub
SPEAKER_04I don't have the numbers, but I can say that I think as edge AI developers, we like to think that we spend a majority of our time optimizing, squeezing out maximum performance of the device. Then in reality, we spend 80% of the time just trying to make it run, just making it run moderately efficiently. I mean, that is the challenge, and it's the same; it's not only, you know, if you're a single developer struggling to get your mobile app working, we see it with large enterprises as well. You spend a lot of time just trying to get it to work, make it work, yeah. Um, but so you're not seeing right now, it's more of getting things to work and then getting them optimized than it is like, oh, I'm moving it from CUDA, yeah, you know, to... but there's, I mean, there's a minimum selection of just good practices that you want. You would like to use some INT8 quantization uh and make that work. There's just so much to gain if you can make that work uh on edge devices. So whether, you know, really spending the time to go from INT8 to INT4, if you have a hardware that supports that, for example, might not be worth it for everyone. Uh but I think there's a minimum set of optimizations that you would like to do, or that most people should do, and invest a little bit of time in.
SPEAKER_02Right, right. No, that makes sense. By the way, just a side note, I noticed you have an undergraduate degree in physics, a master's in physics, and a PhD in physics. So, how did you get into this AI thing? Oh, that's a good question. That's a good question.
Beyond Phones: Boards & Robotics
SPEAKER_04I think it was around the middle of, so my PhD is in theoretical quantum physics. So I worked on building a quantum computer. I don't know if you heard about that. Okay, yeah, we've heard about that. Um yeah, it's I mean it's a bit of a major challenge. Yeah, I think it was around the midpoint of this where AI sort of had a boom. Um, this is sort of the time where, for example, we thought that self-driving vehicles would be a reality in a couple of years. You know, these GAN networks becoming really good at producing uh really high-quality images and these sorts of things, and this really got me uh um got me excited, because now, in contrast to the quantum computer, AI was changing society today, whereas the quantum computer is probably something for the future. Sure, the quantum computer has been 10 years away for multiple decades, that's my feeling. So that got me really excited about AI, something that is really changing society right now.
SPEAKER_02By the way, I was in San Francisco last week and I I'm hooked on these Waymo cars, so I think the Waymo's you know, uh self-driving vehicles really make a lot of sense. Originally I was very skeptical of it, and I know there's people who will get comments about the Waymo's and stuff like that. But you know, the coolest thing about a self-driving car, and I have nothing against humans, but when you get into a self-driving car, my favorite thing is I can put on my own music. You know, you can go there and select your little that's fair.
SPEAKER_00That's actually fair, you know, and then that's it.
SPEAKER_02Then you just get to where you need to go. And and you know, when was the last time you got into a cab and asked the cab driver to change the music to something? You can't do it. So I think that's the killer app for self-driving cars, is you know, yeah.
SPEAKER_04Maybe that's the feature that would lead to adoption. That'll be it.
SPEAKER_00Being in the EU, I don't get any of this stuff, probably not for another five years. So I'm looking forward to it. I've never tried one, and I'm honestly maybe, I have a little trepidation towards it. So next time I'm gonna go. This is ironic coming from us, yeah.
SPEAKER_04But you can have self-driving in on in Germany, right? On the Autobahn, no? Oh, okay.
SPEAKER_00I have not been to Germany recently, so um, yeah, no, but I have I have a Tesla and it all of the features of self-driving are disabled because I'm in the Netherlands.
SPEAKER_02Oh well, such is life. But um, Andreas, thanks again for joining us and shining a light on this important area of edge AI. It's really cool tool development. The fact that you're making it available now for free through the hub is really exciting, and uh, you know, I'm sure we'll be learning more about Embedl in the future. So I really appreciate it.
SPEAKER_04Thank you so much for it.
SPEAKER_02Thanks to everyone for joining and asking questions. And you know, subscribe to our YouTube channel, get on our newsletter, learn more about Embedl, all kinds of good things.
SPEAKER_00So and see you in Taipei.
SPEAKER_02Yes, I'll see you, Jenny, in Taipei. All right, thanks, everybody. Take care.