EDGE AI POD
Discover the cutting-edge world of energy-efficient machine learning, edge AI, hardware accelerators, software algorithms, and real-world use cases with this podcast feed covering all things edge AI from the world's largest EDGE AI community.
These are shows like EDGE AI Talks and EDGE AI Blueprints, as well as EDGE AI FOUNDATION event talks on a range of research, product, and business topics.
Join us to stay informed and inspired!
From Lab to Low-Power: Building EMASS, a Tiny AI Chip That Runs on Milliwatts
What if the only way to get real gains at the edge is to redesign everything—from the silicon atoms to the app you deploy? That's the bet professor-founder Mohammed Ali made with EMASS, and the results are striking: continuous inference at milliwatts, microsecond wake/sleep cycles, and real benchmarks that hold up against the best in class while burning a fraction of the energy.
We walk through how a RISC-V core, dual AI accelerators, and an MRAM/RRAM-backed memory system work together to keep weights on-chip, slash data movement, and power-gate aggressively without losing state. The compiler handles pruning, quantization, and on-the-fly compression to achieve around 1.3 bits per weight without torpedoing accuracy, while a custom memory controller mitigates non-volatile quirks like endurance and read variability. Instead of chasing TOPS, the stack optimizes bandwidth, dataflow, and timing to match the realities of sensors and batteries.
The story gets especially interesting with drones. Since propellers—not processors—dominate energy use, EMASS applies tiny AI to the control problem, redistributing load across rotors in real time and extending flight endurance by 60% or more in hardware-in-the-loop simulations. We also dig into wearables and time-series workloads like ECG, audio, and vibration, where sparse sampling pairs perfectly with microsecond power gating. If you build at the edge, the dev experience matters: you'll hear about the virtual dev kit with remote access to real silicon, a compact evaluation board with modular sensors, and an SDK that plugs into TensorFlow, PyTorch, and Zephyr. Advanced users can map trained models via a CLI; newcomers can lean on a NAS-based flow that proposes architectures meeting strict memory and power budgets.
If you care about edge AI, battery life, and shipping reliable products, this conversation is a blueprint for co-designing across the stack to unlock 10–200x energy gains without giving up performance. Subscribe, share with a teammate who owns your edge roadmap, and leave a review with the one use case you’d optimize first.
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
Opening Banter & PSAs
SPEAKER_05Hello, hello, hello. Good morning. Okay, so here we are, EDGE AI Talks. We're back. Um, folks are rolling in. They're elbowing their way into the virtual lobby here, so that's cool. I know we had over a hundred folks registered, so we'll see who's showing up today. But um, good afternoon, Sarah, I should say. Good afternoon, because you're in Europe right now, right?
SPEAKER_02Hi, Pete, yeah, so based in Sweden. So afternoon still, and still light.
Guest Introduction: Mohammed’s Journey
SPEAKER_05Oh, nice. Good. I'm in Bellevue, Washington, where it is also still light because it is eight in the morning. So cool. All right, we're gonna have a good show today. Um, I actually talked to Mohammed recently. So every other week we have these EDGE AI Talks, these live streams, and then in between those weeks we have what we call EDGE AI Partners, where we have discussions with different new, innovative partners that people maybe haven't heard of. So I did talk to Mohammed recently about EMASS and what he was doing there, so I have a little bit of a head start on all the cool stuff they're doing. But before we get into it, let me do a couple of PSAs. One is, for those that are going to be in Amsterdam next week at the Things Conference, I'll be there. The whole EDGE AI FOUNDATION community will be there, so I look forward to seeing you there. Please find me. We're actually gonna launch a real-time scavenger hunt there where you can win cool prizes, and one of the prizes is this cool mug. So keep an eye out for that on our website. And then, if you haven't yet registered for EDGE AI Taipei 2025, it's happening November 11th and 12th in Taipei. Surprise, surprise, and twice as big as last year. It has some amazing keynotes and workshops and talks and evening events. So register for that. I think it's type2025.org; I should probably put that in a little banner. So those are the two calls to action for folks. And of course, if you have not yet subscribed to our YouTube channel, please do so. I think we may have crossed the 50,000 subscriber boundary today. I'm not sure. That's pretty exciting. Sarah, what is new from Imagimob on your end? You've been traveling like crazy, you mentioned.
SPEAKER_02So yeah, we just had a pretty exciting tour through Singapore and Japan. So I'm excited to hear our guest today, because he's from Singapore too. Hopefully he'll mention a bit about the industry there; what I found there was quite innovative, quite ahead. So, pretty cool. And looking forward to learning more from him.
SPEAKER_05Cool. Yeah, Singapore is a lot of fun. I mentioned I used to go there when I was at a previous company early in my career. I used to go there all the time. And it's such an incredible place, it's so unique. The chili crab and all that cool stuff going on there.
SPEAKER_02So we've been fed well, let's say that.
SPEAKER_05That's the thing, like you know, traveling is hard, but I always appreciate the evenings because you can get an interesting dinner and really understand the culture better through the food. So I think that's pretty cool. So yeah, hopefully I'll get back to Singapore pretty soon. Good. Well, let's see. Without further ado, why don't we bring Mohammed on here? He can tell you his background, but he's actually a professor and a founder. So it's actually not unusual to find folks in academia doing cool research who then decide, hmm, this could be a company. Mohammed's one of those folks, so let's bring him on. There he is. Welcome, sir.
SPEAKER_01Hi, Pete. Hi, Sarah.
SPEAKER_02Good to have you.
SPEAKER_01Uh pleasure is all mine.
SPEAKER_05Good. So where are you? You're calling in from Singapore?
SPEAKER_01Yeah, today I'm calling from Singapore, yeah. So uh I'm a bit of an international citizen, but today I'm in Singapore currently.
SPEAKER_05Yes, yes. I think the last time we spoke, you were in Egypt.
SPEAKER_01So yes, indeed. Yeah, I was in the midst of spending a little bit of summertime with the family and working with our R&D office in Cairo, yeah.
SPEAKER_05Yeah, that's cool. Different cities, same screen, as they say. Um, but cool. So, before you launch into talking about all the cool stuff you're working on, why don't you give us a little bit of background on yourself and how we got here?
The End-to-End Edge AI Philosophy
SPEAKER_01Yeah, sure. All right, so hi everyone. Good morning, good afternoon, good evening, depending on the time zone you're in. My name is Mohammed Ali. I'm wearing multiple hats, but the hat I'm wearing here: I'm the founder of an edge AI company called EMASS. In the meantime, I'm also a professor here at NTU Singapore. My journey started in Switzerland, where I did my degrees at EPFL, then I went to Stanford. I did research on the theme I've always been working on: how can you bring new nanotechnology devices and integration technologies all the way to the application level and benefit from them, while at the same time overcoming their challenges in a scalable way? So I joined here and spearheaded a research project spanning multiple research institutes here in Singapore, where we were looking into the next wave of AI hardware. That was in 2017, 2018. The main concept I've always been pitching is that if you want to achieve significant gains, you have to look at the entire stack and find, in between those layers, these tiny hidden knobs that, when you tweak them and meld everything together, give you significant gains, much larger than dealing with each layer of abstraction separately. So we were working on this, and we built a couple of very large chips, and then we said, hey, this could actually be very efficient in very low-power platforms. Back in the day, edge AI was still emerging; we're talking about 2019, 2020. And the challenge there was: can you really put AI there? And secondly, can you preserve the battery for a very long duration? We started playing with all the layers of abstraction, we came up with this nice concept, and then we said, hey, this is too good to just be shelved, waiting for the right entrepreneur to come and bring it into the light. So I said, you know what? I'm gonna take the jump. And I founded the company, and the rest is history. Four years down the road, we managed to get some chips developed. We worked so hard, we grinded, going crazy here and there with very few people. Then slowly, slowly, we started getting more traction. We've been acquired by Nanoveu, which is helping us significantly through connections and funding. We've grown now to the point that we're going for larger mass production, newer generations of chipsets, customer traction, even engaging in applications that we're actually excited about in various fields: wearables, drone technologies, predictive maintenance, and whatnot. So yeah, it started as research, and then we said, hey, this is too good, and now here we are.
SPEAKER_04Yeah. Are you still doing uh teaching? Are you still doing lectures and things?
SPEAKER_01Yes, indeed. Yeah, I mean, I think I'm getting more white hair than expected at my age because of the stress, but I'm currently really focused on the company. Teaching is something that I'm passionate about, though, so I still teach undergraduate students as well.
SPEAKER_05Yeah, yeah. No, that's cool. Yeah, I was saying it's not unusual to find in this space professors and educators doing some really cool research, and then it's sort of like, hey, maybe we should productize this and get it to market. I can probably name, just off the top of my head, three to five folks at least that I know who've done that, yeah.
SPEAKER_01I think it's the Silicon Valley mindset that got into me when I was at Stanford, and then when I came here, I said, okay, let me apply that.
SPEAKER_05Right. Just spin it out, raise some money, and yes, indeed. Yeah, that is awesome. And Nanoveu is based in Australia too, I believe, since we're talking about geographies.
SPEAKER_01Yeah, so Nanoveu has its main office in Australia, and they have an R&D office here in Singapore. So we now have the headquarters here in Singapore, an R&D office in Egypt, and we manage a marketing office in North America, in the United States. So we are spread across the globe as well.
SPEAKER_05Yes. Yeah, I was talking to someone the other day about, you know, we've done a good job sort of flattening the earth um with uh these tools to meet and collaborate, but we have not solved the curvature of the earth problem with time. So time zones are still challenging uh for all of us to kind of get everybody on the same page. But uh no, it sounds cool. Um yeah. Sarah, do you have any uh questions for Mohammed before he jumps into his uh his spiel?
SPEAKER_02No, I've been pretty curious, so let's get started. And I think the questions will pile up as you go.
Chip Architecture & Memory Strategy
SPEAKER_05Yes, yeah. So a reminder to our visitors, this is a live stream, emphasis on the live. And I see we've got a bunch of people here. So feel free to dig in, shout out where you're calling in from, and put your questions in here. Sarah and I will monitor the stream and politely cut in at various times to pepper Mohammed with your questions. So don't be shy on that stuff. Okay, so why don't we bring up your deck and give you the floor; we will hang back and let you get into it. And like I said, we'll monitor the chat and... oh, I see Tim here from San Jose. Okay. Actually, let's just throw this up here to show people how it works. So Tim from San Jose is looking at a discussion about a CubeSat launch built by high school students. What is CubeSat? I'm not sure what that is.
SPEAKER_01Oh wow, okay. So high school students doing some sort of, I mean, a satellite cube, and they're looking for a bit of low-power hardware to put into space.
SPEAKER_05Yes, that is interesting. So, a sidebar on that for Tim: there's a ton of action going on in the edge AI space in space. A lot of low Earth orbit satellite stuff happening to do analysis of crops and environmental detection and things like that. So doing that monitoring on the platform in space is kind of the next thing. Lots of companies doing stuff like that. And probably the EMASS silicon could help there too at some point, but I will defer to Mohammed to explain how that would work.
SPEAKER_00Yeah, absolutely. More than happy to explore that, I mean, beyond what's happening on the Earth.
SPEAKER_05Perfect. All right, well, we will we will hang back and let you let you do your thing.
Power Gating with MRAM/RRAM
Compression, Quantization, Pruning Explained
Benchmarks and Efficiency Claims
Dev Kits, SDK, and Tooling
Drone Energy Optimization Use Case
SPEAKER_01Sure. Okay, all right. Thanks, Pete and Sarah, for the introduction, the opportunity, and the time. And thanks everyone for showing up. I'm really excited and honored to give you a bit of a highlight of what we're working on, what EMASS is about, where we stand at the moment, and what the secret sauce behind it is, hoping that it can spark interest from potential collaborators or even others who can see how to innovate beyond that as well. So the main thing that I'm focused on, the idea, is going from atoms to applications, right? Which is really from the nanotechnology, from these devices, all the way to the application level. I mean, we're looking at the whole thing as one monolithic unit and seeing how we can optimize all of these layers together at the same time to achieve significant gains. So just a little bit of introduction, and I think we shed some light on this before. EMASS is short for Embedded AI Systems, and we were actually founded in 2020 in Singapore, and as the journey went by, we ended up with R&D centers here in Singapore and in Cairo, Egypt. I'm spearheading both of them, so sometimes I travel back and forth. The idea of being in Egypt is that I'm originally from Egypt and there's a lot of good talent there. On a side note, if you look at the logo of EMASS, it is actually one of what's called the shen symbols, an ancient Egyptian symbol, just rotated a little bit: the cartouche that kings always put their names inside. Just a fun fact as we go. We also have commercial business in the United States, and we're building a chip and the entire ecosystem around it, which is the compiler, the software, the board, that allows you to run AI with minimal retraining and to deploy any kind of solution with our chip in it. On the current leadership team, we have Mark Garnson, who was the CEO of the semiconductor branch of Nanoveu. He actually came out of retirement to lead this one on the executive front. We have Scott Smeiser, who I have the pleasure of working with; he's the vice president of sales and marketing. And myself: I'm currently acting as the CDO of EMASS and I'm the founder of it. At the same time, as I said, I'm an associate professor at NTU Singapore, where I had the pleasure of conducting all the research before spinning it off. Before that I was at Stanford, and before that I was at EPFL in Switzerland. So I consider myself somewhat of an international citizen; I've lived on many continents. All right, so since we are here in the edge AI community, we have the main challenge that we want to run a lot of applications in battery-constrained systems. And if you want to do this, you really have to look at the entire system as a whole, not at different pieces separately. Yes, you can solve some of the components there, but to get your maximum gains, you really want to bring your AI applications very close to where the sensors are, and redesign your semiconductor chip so that it directs energy to where it's actually needed and doesn't consume any other energy otherwise.
And at the same time, squeeze the application and fit it into the current chip with its tight constraints, while not reducing the accuracy, or otherwise you will not have a meaningful application coming back. This is a very complex equation to solve, but if you manage to get it right, it really unlocks new opportunities. There's a lot of work happening globally trying to tackle all of this, and we are fortunate to be one of the players on that field, trying to really crack this equation and get the whole thing serviceable and available to the masses in this context. So how do we approach this kind of problem? As I said, we look into it as one full monolithic stack, from atoms all the way to the application. When we design the silicon, we try to see what kind of emerging nanotechnology devices and integration techniques we can benefit from that give us an unfair advantage, significant benefits. And if there are any shortcomings left, can we solve them at a higher level of abstraction? So we don't have to wait for the technology to fully mature; if it's ready but there are some kinks around it, we can try to solve them at the application level. From there, we can develop modules that are more of a plug-and-play option, modular with different sensors, targeting different applications. But as you design the application and the AI for it, you have to also bear in mind that it will run on these small chips. So you have to be very careful about what kind of operations you can support, what kind of operations give you the answer you really need while maximizing accuracy, without requiring a significant amount of computational power, which would then mean a larger battery or a significantly shorter lifetime. So in the next part of this talk, I will go bit by bit into these abstraction layers and try to share some of the technical highlights. I will try to give it in layman's terms as much as I can, but if you want me to go deeper, I'm more than happy to do so. If you feel like it's too much, I'm more than happy to raise the abstraction level a little bit higher. So, as I said, it is a full end-to-end approach: we look at the entire system. We say that, okay, there's a link between the algorithms and the dataflow of your hardware. With a chip design, you have to know exactly what kind of algorithms you'll run. Try to make your algorithms more aware of the underlying compute fabric, and make the compute fabric, at the same time, have some sort of support for the algorithmic transformations that you would have at a higher level. Then you go to the lower level, even at the circuit design, when you actually lay out your chips: you try to make sure that there are not any kind of unnecessary delays, and at the same time provide power gating support, enabling the idea that you only deliver power to the components that need to run at that exact specific point in time, and the other ones are just turned off.
To be able to do this at such a fine granularity in time and in space, you need some new nanodevices that can help you switch things on and off in the microsecond range. And that's where new non-volatile memories come into play. Now, these new non-volatile memories actually have some shortcomings, so you have to redesign the algorithms to be aware of these shortcomings, so as not to stress them and expose those inefficiencies. That's why the whole thing is end-to-end and everything is interconnected. When you get everything done properly, then you achieve significant gains. We will see now, in each of these layers, how this can be done. So the chip itself, if you look at the dataflow that we have: it's a RISC-V-based processor, and we have two AI accelerators in it. Each one, as you can see in the figure on your right, is responsible for certain kinds of kernel operations, and we also have support for AI compression. They can run the vast majority of the kernels in convolutional neural networks, multi-layer perceptrons, and recurrent neural networks, and they can run them with very, very compressed weights, averaging about 1.3 bits per weight. That's a non-power-of-two number because some layers will have one bit, some layers a little more than one bit, and even the one-bit layers can be compressed further when you merge everything together. But at the application level, this does not harm the accuracy, and we will see in the next slide how this is possible. Now, to complement this, we need to increase the on-chip memory to support larger AI workloads. So we have SRAM, that's our temporary memory to store our variables, all the partial sums and whatnot. But for your AI workloads, and for the idea that you want to turn the system on and off, we have non-volatile memory: magnetic RAM is one example, resistive memory is another. So we're using MRAM or RRAM, and we have experimented with both technologies, to store the AI weights. So when the chip is not in use, you can turn the entire chip off and back on, and it will retain its state. There's no loss of information; you don't need to retrieve the data from any off-chip flash memory or even on-chip flash memory. And once your weights are there, you keep them there, and when you are done with your inference, or even layer by layer, you can selectively turn off your processor, or turn off this accelerator or that accelerator or any other component that is not in use. Now, another thing we think about is that AI is not just compute-intensive, it's also memory-intensive, it's memory-centric. So you need broad accessibility to memory. While we don't have a lot of capacity, you still need wide parallel access. That's why we have a lot of parallelism when accessing this memory, which can reach on the order of 12 gigabytes per second. Bear in mind that this runs at 50 megahertz, right? If it ran at a much higher frequency, we could reach multiple terabytes per second of access bandwidth.
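A quick aside for readers who want to check the arithmetic on those figures. The sketch below only replays numbers quoted in the talk (12 GB/s at 50 MHz, ~1.3 bits per weight, and the ~4 MB of on-chip memory mentioned later in the recap), so treat it as an illustration, not vendor data.

```python
# Back-of-envelope check of figures quoted in the talk (not vendor specs).
bandwidth_bytes_per_s = 12e9   # quoted parallel on-chip access bandwidth
clock_hz = 50e6                # quoted operating frequency

bytes_per_cycle = bandwidth_bytes_per_s / clock_hz
print(f"Implied access width: {bytes_per_cycle:.0f} bytes (~{bytes_per_cycle * 8:.0f} bits) per cycle")

on_chip_bits = 4 * 1024 * 1024 * 8   # ~4 MB on-chip memory (figure given later in the talk)
bits_per_weight = 1.3                # average after pruning/quantization/compression
max_weights = on_chip_bits / bits_per_weight
print(f"Rough weight capacity at 1.3 b/weight: {max_weights / 1e6:.1f} M parameters")
print(f"Same model stored as int8 would need ~{max_weights / 1e6:.1f} MB, about 6x more memory")
```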
But that's not the point here because we're running at the edge domain, very high speed is not needed in that context. But could having this non-volatile memory that acts also with this large uh or this high speed bandwidth or this wide bandwidth, right? Eliminated the need for any kind of high bandwidth memory or really aggressive DRAM that can give you this kind of bandwidth, but would be with either higher energy or the idea of having two-chip solutions, right, which would increase your uh footprint. And that could be limiting, right, for some applications such as smart rings or even specs or even any kind of application that require very small footprint applied there. So when you combine these things with the idea that you can compress your workload, you can get, we get actually around from one to five milliwatts running in continuous inference of the certain workloads. And with that, right, you can get that even in reality, when you have the idea that you can get data right from the uh environmental sensors, not continuously, because it the mother nature doesn't provide very high uh I mean uh bandwidth data, right? So you can turn off the system when you don't have data to run the AI, and then once you have sufficient data, you turn back the entire system and make it operate. So you would really save the juice. And not just we're talking about like going into ultra-low power, we're talking about complete shutoff. Now, to get this chip to be interfacing, we have the regular sensors, and also we say that it has a very small package, which is five plus five uh millimeter QFN package, and that would help us right to be run into these kind of systems that require very small footprint. So, how did we do the trick, right? So if you look at the algorithm right layer, I mean, I think many of uh experts here mean it's no surprise that looking into how we can reduce things, there are a number of options. You either you do some pruning, knowledge destination, which is this idea is removing unnecessary weights, right? Which remove unnecessary computations. So that's one way. Secondly, right, we do quantization, which basically means that each variable can use smaller and smaller bits at the same time, right, without any kind of affecting any kind of accuracy. Then, once you've done both of these together, now you have your weights. Can I really compress it right in such a way that when I bring it to the chip, it's really compressed and I can decompress it on the fly. So it goes to my AI accelerator. So we have applied the three of these techniques together, and that allows us to achieve significant production, right, and speed up compared to having a full precision or a full-blown workloads, right, uh, for this case. Now, this is something that's been available in the community. We look into this, we have done it in the convolution neural networks for recurrent networks, and currently we are also conducting a little bit of experiments to apply this in language models. So try to get small language models into tiny language models and large language models to medium-sized or small-sized language models as well. The other thing that I mentioned about is the on-chip memory, right? And we talk about we're using MRAM or RRAM, and that actually enables fine-grained power gating. What do I mean by fine-grained power gating is that you can select small instances of time, and we're talking about here milliseconds, right? You can turn off the entire system and then you can turn it back on. 
Now, using MRAM and RRAM allows you to do this at microsecond rates. You can turn off and turn on in microseconds, and that's almost 5,000 times quicker than doing this with flash. So suddenly you have this opportunity that once you finish running your AI workload and are waiting for the next data to come, instead of going into an ultra-low-power mode where leakage is still a component, you just turn the entire thing off. And when you turn the entire thing off, more than 90% of it is consuming no power at all. All that's needed is that once you have sufficient data, you trigger the system, turn it back on with the previously retained state, and resume the application execution. When you do something like this, combined with the fact that resistive memory and magnetic memory are even more efficient to read than flash, because you can read from them directly instead of going through SRAM first, you can achieve significant gains, up to 10x benefits in energy from this trick. Of course, this sounds too good to be true, so what's the catch? The catch is that RRAM and MRAM are not perfect memories; they have a lot of challenges. Some of them, when you read them, you don't read back the value properly. Some of them, when you write them, if the write doesn't go through properly, you have to rewrite a number of times. And they have very limited endurance, so their lifetime is low: it can be around 10^6 writes, a million times, and then that cell goes bad. So how do you tackle this while the technology matures at the foundries, and still make use of it? That's where we come in. When we program the applications, they are aware that you should not be doing a lot of writes here; that's one example. Another thing is that if you do a lot of writes, you have to do it in a smart way so that you don't stress the same location of your memory again and again and again, something called wear leveling. So with a combination of algorithm tuning on one side and the architecture on the other, you can overcome these kinds of limitations, and you don't have to wait years for the technology to be super mature; you can go ahead and become an early adopter of this technology, which gives you an edge compared to other players in the field. All right, so when we combine all of that together, if you look at where we stand compared to other providers of edge AI: we're using RISC-V, and we have two deep learning accelerators. Typical other ones will use Arm processors and their own version of an NPU; sometimes they use DSPs. With the memory here, we don't require any external memories; there are some other players also doing the same thing, but the majority require some sort of DRAM or external flash. We have one of the most highly compressed models on our side, and we can provide that to our chip, while others require the models to be kept at full precision, whether that's eight bits, four bits, or two bits; those models have to be supported in that way.
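The wear-leveling idea mentioned above can be illustrated with a toy remapping scheme. This is not the EMASS memory controller, just the generic trick of rotating a logical-to-physical block mapping so that repeated writes to one hot logical block are spread across all physical cells; a real controller would also migrate the stored data when it remaps and handle read/write retries.

```python
# Toy illustration of wear leveling (not the EMASS controller).
N_BLOCKS = 8

class WearLeveler:
    def __init__(self, n_blocks: int):
        self.n_blocks = n_blocks
        self.offset = 0                      # current rotation of the mapping
        self.writes = [0] * n_blocks         # per-physical-block write counters

    def write(self, logical_block: int) -> int:
        physical = (logical_block + self.offset) % self.n_blocks
        self.writes[physical] += 1
        # Rotate the mapping every n_blocks writes so hot logical blocks move around.
        # (A real controller would also migrate the stored data when it remaps.)
        if sum(self.writes) % self.n_blocks == 0:
            self.offset = (self.offset + 1) % self.n_blocks
        return physical

wl = WearLeveler(N_BLOCKS)
for _ in range(10_000):          # pathological workload: every write hits logical block 0
    wl.write(logical_block=0)

avg = sum(wl.writes) / N_BLOCKS
print("per-block writes:", wl.writes)                 # roughly even, not 10,000 on one cell
print("max/avg wear    :", round(max(wl.writes) / avg, 2))
```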
We do not require the user to compress the model, because we will do it for them. And as a result, we can get down to five milliwatts of active power, and that's running continuous inferencing. It's not just one inference; this is doing inferencing again and again and again. In reality, that doesn't happen, so usually the power drops down to one milliwatt or even less than that. So all of this enables us to run neural networks and classical applications, and we have multiple I/O ports that can take in a lot of sensor data, fuse it, and give it to the AI engine to produce even deeper insights. Despite the fact that this is a very, very small chip, it can have a lot of uses, a lot of applications in fields such as audio, vision, IoT, and even wearables such as rings or biometric devices and so on. Right. Do we have any questions, Pete, or should I still continue?
SPEAKER_05A couple of questions, some internally generated. There was a question here; I think you had also mentioned on your website around drone applicability. And certainly in kind of drone deployments, it's all about power and weight, right, and battery life. Can you talk a little bit more about the applicability for drones, and do you have any kind of drone trials that are in process?
Wearables & Time-Series Sweet Spots
SPEAKER_01Yeah, so actually the drone is one of the examples that I will shed some light on a little bit later. The use of AI in drones has captured interest for quite some time, and the points people usually talk about are enhancing the functionality of drones: providing vision, detecting objects, helping in navigation, and so on. But even if you have the magical chip that does all of this while consuming zero energy, when you speak to drone manufacturers, they will tell you: well, actually, our problem was never the electronics. Our problem is the energy consumed by the propellers. So that actually sparked our, quote unquote, research: can we use our chip to redistribute the workload across these propellers and save energy while you're flying, so you achieve the same thing but consume less energy? We did some analysis of that, and I will explain it a little bit later, but the bottom line is that we managed to enhance the flight time in simulations, granted, by up to 50 or 60 percent, sometimes even reaching 70 percent improvement in flight endurance, which means on the same charge you cover a wider distance or a longer flight time. And that's thanks to the idea that you have some sort of automatic updates of your AI application, and at the same time another AI performing this kind of prediction and relating it to the actuation, and that's where you achieve real savings in the drones. So yeah, 50 or 60 percent might not sound like much in some contexts, but in drones it actually is a big deal.
SPEAKER_05Yeah, so you're using the AI workloads to optimize the propeller usage, getting more energy efficiency and flight time out of the drones, which is really interesting. Um, we might be having a little bit of a bandwidth thing going on here. I'm not sure. Sarah, are you seeing a bandwidth thing going on here?
SPEAKER_02Yeah, I was thinking we may need some AI to piece together your your answer, but I think most of it we could follow.
SPEAKER_05There you go. Yeah. Mohammed, you might be getting a little choppy on us. Um, but yeah, I guess if you're gonna talk more about drones in a minute, that's pretty cool, so we can dig into that then. That is a pretty canonical example of where edge AI is helping. I mean, obviously there's the AI vision part.
SPEAKER_01Um, let me close a lot of things. Okay.
SPEAKER_05But certainly I've not heard about AI being used to optimize some of the propellers and the propulsion and stuff, so that's pretty cool. Another question we had here... um, well, maybe we should wait for you to get your bandwidth back together. We are live. Here, let me know.
SPEAKER_01Okay, so yeah, I'm back. I think it's just that there were a lot of workloads running on my laptop, and I needed to close some of them to keep the CPU free.
SPEAKER_05There you go. We got the CPU thing going on. Um, there's a question here about whether this architecture is applicable to generative image applications. I think you might have mentioned this. I mean, you know, this is not a TOPS type of platform, it's more of a GOPS platform. So, yeah.
SPEAKER_01So yeah, while the principle is applicable, with the current chip that we have, I think it would be a bit of a stretch to try to do this, unless you use very, very small images, yeah. Right, right, exactly.
SPEAKER_05So not yet, I would say. Another question here, from Alan: how do you scale to microwatts?
Q&A: Attention, Scaling, Controllers
SPEAKER_01All right, okay, very good question. The most obvious way to do this is to go down the technology roadmap. This chip was fabricated at 22 nanometers. Um, I'm still audible, right?
SPEAKER_05Yes, sorry, having a little bit of a layout issue here. Sorry, Sarah's getting hidden by this thing.
SPEAKER_01Yes. So if I bring it down all the way to three nanometers, we're talking about five or six generations, each reducing the power, and going from one milliwatt we would already be in the hundreds of microwatts, just with that technology trick. But I think if you want to really sustain that, you have to rethink the dataflow, and maybe incorporate a little bit of neuromorphic computing, but you have to do it in a scalable manner, so that you don't have to go back and forth inside and outside of the neuromorphic array. So it is possible; you have to do a number of tricks. One of them involves technology scaling, the other one is changing the dataflow a little bit.
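A crude way to see the node-scaling part of this answer is to assume some fixed power reduction per process step and compound it. The per-step factor below is an assumption for illustration only; real foundry data varies a lot per node, and the answer above also notes that dataflow changes are needed beyond this.

```python
# Crude back-of-envelope for the node-scaling argument above. The per-node
# reduction factor is an assumption; the point is only that several node
# steps compound into a few-x power reduction.
start_power_mw = 1.0          # quoted ~1 mW typical power at 22 nm
nodes = ["22nm", "16nm", "12nm", "7nm", "5nm", "3nm"]
reduction_per_node = 0.75     # assumed ~25% dynamic-power reduction per step

power = start_power_mw
for node in nodes[1:]:
    power *= reduction_per_node
    print(f"{node:>4}: ~{power * 1000:.0f} uW")
# With this assumption, five node steps land around 240 uW, i.e. "hundreds of
# microwatts", consistent with the ballpark given in the answer.
```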
SPEAKER_05Got it, got it. I had a quick question too, and then we have another question coming up here, but you spent a bunch of time talking about your memory controller. Is that a memory controller that was designed by EMASS as part of the system?
SPEAKER_01Yes. So we have a memory controller inside that was designed by us. The memory controller handles the traffic that goes from the processor and the AI accelerators to the memory, and it also handles the transactions that go to the MRAM or the RRAM to ensure that you don't stress it and cause any kind of endurance failures or temporary read or write failures.
SPEAKER_05Okay. Yeah, and for folks that don't know, I think the point you were making too is that a lot of your power optimization and your performance is in moving between the AI acceleration logic and the memory logic, back and forth, and making sure that that's all perfectly aligned and in sync. Because what you see out now in the more general-purpose computing world is, you know, completely separate systems for memory and processor and GPU, and that's even why companies like NVIDIA, when they get into data centers, highly optimize the paths between these different systems to minimize any performance loss. So you're doing it sort of on-chip, at the micro level.
SPEAKER_00Very, very small. Yes, exactly. Yeah, that's awesome. That's awesome.
SPEAKER_05Okay, before we get you back into your deck, Sarah, did you have any other questions you wanted to throw at him while we're oh yeah, actually, while we're at it, um I see that you mentioned a lot of applications.
SPEAKER_02I would love to know kind of what's your sweet spot application where you've seen the technology work really well and really be able to extract the last bit of performance that you want.
SPEAKER_01All right. I think from what I've seen from my experience, time-series data applications are the ones that benefit the most from this. Because in reality, these kinds of applications do not produce very high data rates, even the vibration ones; as long as you're dealing with mechanical signals or even sound, these are very, very slow compared to how electronics operate. And that gives you enough of a window between one sample and the next that you can run your AI, completely turn off the entire system, and wait until it needs to come back on. If we're talking about doing something in particle accelerators and all that, yeah, that's going to be very tough. Video is always complicated, but very small still images would also be a workable solution. Although when you work with still images, for example fingerprints, or facial identification, while we see some gains, a user would not really feel the difference between staying there for one second versus two seconds. So time-series data, wearables, anything that interacts with these kinds of sensors, that's where we see the benefits.
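The arithmetic behind this sweet spot is simple duty cycling. The numbers below are assumptions loosely based on figures quoted earlier in the talk (a couple of milliwatts active, microsecond-scale wake and sleep), just to show how sparse sampling pulls the average power far below the active power.

```python
# Illustration of why sparse time-series sampling pairs well with fast power
# gating. All numbers are assumptions for the sake of the arithmetic.
active_power_mw = 2.0        # assumed power while inferencing
sleep_power_mw  = 0.01       # assumed residual power when power-gated
inference_ms    = 20.0       # assumed time to process one window of samples
wake_sleep_us   = 10.0       # assumed wake + sleep transition overhead
window_ms       = 1000.0     # one inference per second (e.g. ECG/vibration window)

active_ms = inference_ms + wake_sleep_us / 1000.0
sleep_ms  = window_ms - active_ms
avg_power = (active_power_mw * active_ms + sleep_power_mw * sleep_ms) / window_ms

print(f"duty cycle : {active_ms / window_ms:.1%}")
print(f"avg power  : {avg_power * 1000:.0f} uW")   # ~50 uW vs 2 mW if always on
```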
SPEAKER_05Cool. All right, we have more questions, but why don't we let you finish your talk and then we'll circle back? So they're starting to pile up a little bit. So we're gonna exit and give you the stage for a minute.
SPEAKER_01Okay, okay. I will continue, right? So, as a result, we have here the chip, more of a recap: with all of these components together, we run at 30 gigaops at two milliwatts, which gives us an efficiency of 12 trillion operations per watt. So this is a very, very efficient chip at this scale, in this class. We have around four megabytes of on-chip memory, we can provide the compression of the models, and we have fabricated the chip at 22 nanometers, which means the chip is actually quite cheap to sell, but it also gives us a lot of leverage as we scale down the technology. As technology gets depreciated over time and we go down to 12, 10, 6, 4 nanometers, we will see more and more benefits and even opportunities for bigger systems. Now, one might say, okay, all of these numbers sound good, but did we actually run things, and how did we benchmark? We used the MLCommons Tiny benchmark suite; it has four or five benchmarks, and we compared against the publicly released results. And we see, in a consistent way, that in terms of execution time we are better than, if not the same as, the fastest in its class. So there's no degradation of any kind of performance, but simultaneously we get 10, 20, even 200 times lower energy compared to whichever platform has released their benchmarking results. These are two applications, image classification and visual wake words: you can see that we achieve 20 times lower energy while the execution time is within the same range as the fastest. The operating point is where the EMASS circle is. And we can show that also for the other benchmarks, as shown over here. Comparing with the best in class for each one of these benchmarks, the energy gain is consistently an order of magnitude or more, 10x, 20x, or in some cases 190x, while in performance we are doing very well, very close to or even faster than the current state-of-the-art solutions. So this makes the system really ideal for solutions in wearables, in drones, in IoT and robotics, in healthcare and medtech, even some kinds of automotive, and in agritech and environmental sensors. You can apply it in any system that has to operate on a battery, needs to run for a very long time, and works with time-series data. That's where our chip really shines. Now, if we take a step back, okay, we have a very good chip, we have amazing energy efficiency. But if the user cannot use it easily and in a very short time, or it requires a significantly steep learning curve to use this kind of chip, it becomes unappealing, and we've seen this again and again in the industry. It's no longer a technological advantage; the question becomes how well you integrate with the entire ecosystem. So in this case, we actually have the development kit, where we can give you a virtual board environment. So you can still say, I have an idea, but I don't want to have the chip right now.
We provide this kind of remote access to chips that are sitting in our server environment, so you can run workloads on them and get the real performance. If you want to have the board and work with the SDK, you can take the SDK, which is easily compatible with the common AI development suites: it's compatible with TensorFlow, compatible with PyTorch. So if you have your models already there, then as simply as a click of a button you can create the library that makes things run on our chip, and it will give you the API that you then use in your code. You can write that code bare-metal or on an operating system; one of them is the Zephyr operating system, which actually has a lot of other software layers. So if you have a full system with other applications beyond just AI, and you build it on top of Zephyr OS, you can easily port all of that and make it run on our chip. At the end of the day, we have a RISC-V core that can pretty much run any kind of application, depending on the performance operating point you would like your system to run at. To let users develop their applications very quickly, we also have this evaluation board; this is actually the third-generation evaluation board, which is around 2.5 by 4 centimeters. It's a very small board, connected over USB Type-C. And we have also provided support so you can integrate sensors with this sensor expansion board. That is done in collaboration with tier-one component providers, who provide us the components for the evaluation and sensor boards, and we have developed this one. And I'm happy to announce that at the Things Conference, we should be able to see a live demo running on these new evaluation boards. The sensor board is something very small like this, and you just attach it on top. We have developed the libraries, so once you press it on there, you can use the libraries; it's more of a plug-and-play, and then you can quickly build your own prototype for predictive maintenance, for wearables, for cold asset tracking, or for any other kind of sensing environment. We will be able to help you use the blueprint of these modular boards, attach the right sensors, and you should be able to have your entire PoC very quickly, on the hardware front and on the software front, since we provide you the APIs to access all of these sensors as well. So this brings the whole thing together: we have the AI, really innovative; we have the chip that can run things with very low power; we have the ability to integrate sensors really efficiently, and from a user perspective it doesn't have to take a lot of time. But then, what about applications? So we looked into a number of applications on that front. The one that I want to focus on is actually drones. As I mentioned earlier, we tackled the drone problem in a new way: we wanted to help the drone redirect its energy towards increasing flight time rather than enhancing the AI functionality, which is also possible with our chip. I'm not saying it's not possible, but we wanted to create a bigger impact there.
So what we have done is create a simulation environment using state-of-the-art tools, with hardware in the loop, which means that the drone simulation environment sends data to our chip, our chip controls the virtual drone in this environment, and then we compare against a drone running the default controller to see the difference in energy consumption and how much longer we can fly. We found that we can actually see a 60%, sometimes reaching 80%, increase in flight time, and that's in quadcopters, hexacopters, and octocopters, looked at under different weather conditions. Based on these findings, we are now looking to map this onto a real drone for live trials. We're starting that work now, and at the same time looking for OEM drone manufacturers who are willing to collaborate with us on it. The bottom line is that thanks to the availability of our chip, which can run at very low power while taking the time-series data from the drone, such as the altitude, the airspeed, the current orientation, the target destination, and the current state of charge of the battery, you can build a virtual model of the drone that is used to derive an AI model for what the best next action could be. And you're doing all of this in a power budget of less than two milliwatts. Because this runs at very low power, any energy being saved is not consumed by the electronics; you don't need a very big GPU to get that done, you can actually do it on our chip, no problem at all. And you also get enhanced functionality; in addition, you can even take the saved energy and do something else with it. But the main benefit is that instead of flying for one hour, you can make your drone go for an hour and 30 minutes or even slightly longer. That's a huge thing that would help in missions for drone delivery, for rescue missions, and all the other use cases that become possible with a drone with this kind of enhanced endurance. Another thing we were looking into is wearables. We have already examined our chip with fall detection and with prediction of arrhythmias or sleep rates. We have successfully integrated our chip with ECG sensors and gyroscopic sensors, and we managed to do some trials to detect if someone is falling so we can take quick action, especially in nations where the population is aging. It becomes a problem that there's not enough care for them, and these people may not be really technology-savvy, so if something happens to them, it raises an issue and stresses the government resources needed to take care of them. These kinds of solutions could help not just in fall detection, but sometimes also in preventive measures. So we're looking to slowly penetrate this field, and we're running trials and looking for potential partners in that field as well. So, the technology at the moment, before I conclude: we have developed it, examined it, and we have all the solutions that let the technology really sustain superior energy efficiency.
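To give a flavor of why redistributing load across rotors can save energy at all, here is a deliberately tiny toy model. It is not EMASS's controller and it ignores flight dynamics, wind, and torque balance; it only shows the static effect of shifting thrust away from a less efficient rotor under an assumed power-versus-thrust curve, so the savings it prints are much smaller than the full flight-time gains quoted in the talk.

```python
# Toy model of rotor load redistribution (not the EMASS controller).
import numpy as np

# Assumed per-rotor power model P_i = k_i * T_i**1.5, with one degraded rotor.
k = np.array([1.0, 1.0, 1.0, 2.0])   # made-up power constants (last rotor less efficient)
T_total = 4.0                        # total thrust demand, normalized units

def total_power(thrusts):
    return float(np.sum(k * thrusts ** 1.5))

# Baseline: split the thrust equally across the four rotors.
equal = np.full(4, T_total / 4)

# Redistributed: minimize sum k_i*T_i^1.5 subject to sum T_i = T_total.
# Lagrange condition 1.5*k_i*sqrt(T_i) = const  =>  T_i proportional to 1/k_i**2.
weights = 1.0 / k ** 2
opt = T_total * weights / weights.sum()

print("equal split power :", round(total_power(equal), 3))
print("redistributed     :", round(total_power(opt), 3))
print("saving            :", f"{1 - total_power(opt) / total_power(equal):.1%}")
```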
We're even advancing it with newer generations: a newer chip, on a smaller technology node, with enhanced functionality. And at the moment we are in the commercialization phase. We're working with potential partners to come up with PoCs and even do some design-ins, and we are already working with some of these partners, some of them actually tier-one electronic component providers whose names I'm not allowed to disclose yet. But to conclude, the idea of EMASS is to look into the entire stack, and hopefully by doing this you find these inner hidden knobs and achieve significant gains, which allows you to extend the battery lifetime of the electronic components and even have an impact on other components in the entire system as well. Thank you so much for listening, and I'm more than happy to answer other questions.
SPEAKER_05Yes, we have lots of questions. We do have a few minutes left, so hopefully we can power through some of these things. But thank you, Mohammed, for that presentation. I could tell you were leveraging your professorial lecture skills there. So that was good. A couple things to note before we jump into the questions: if people are at the Things Conference in Amsterdam, there will be a demo there happening next week. And a key item too is that Mohammed himself will be at our Taipei event in person. So if you want to meet the professor and the founder, come to the Taipei event and you'll be able to see him there. And there's our little banner for getting your tickets to the Taipei event, November 11th and 12th. So you can meet Mohammed in person. Okay. Let's see. Questions, questions. Well, you mentioned this before. So just a question from hybrid robotics, N7 PKT: you have the virtual dev kit as well as MCU boards that people can purchase, right? So those are currently available.
SPEAKER_00Yes.
SPEAKER_05Cool. All right, good. So and you can go to your website to get it.
SPEAKER_01Yes, indeed. If someone is very interested, we will help them set up the connection, and they will be able to develop their application, perform the live runs remotely, and then see the performance of it.
SPEAKER_05Yeah, here's a question that I wasn't quite sure how to interpret: attention layers. What attention layers does your AI model mapping flow support?
SPEAKER_01Oh, okay. That's actually an interesting question. At the moment, we accelerate the convolution layers, and all the attention layers currently run on the RISC-V core. So attention layers can run, like any other application code, but they don't have any hardware support yet. We are developing our newer AI accelerator that will look into that, but given that our main focus was edge AI, transformers and generative AI models do not run on that platform yet. So that's why we do not have hardware acceleration for attention layers at the moment, but convolutional layers and any kind of vector-matrix layer work very well on our accelerators.
SPEAKER_05Cool. Good. Here's a question from Alex Miller that got a thumbs up. So for the two DL accelerators, is the network architecture hard-coded while the weights are fully programmable? Can you mention what type of network architecture is hard-coded?
SPEAKER_01Um, actually, no, we don't have anything hard-coded. You can run kernels; let's take convolution for example: you can do 1x3, 3x1, 3x3, 5x5, 7x7. You can do residual networks. The idea of having two accelerators is that one of them is really, really superior for these kinds of shallow layers, for example if you're doing depth-first convolutions, while the other one handles more dense connections much more efficiently. So the two of them have a nice interplay with each other. But you can basically run any network, as long as it fits within convolutions and any kind of fully connected layers.
SPEAKER_05Cool. Uh Sarah, any of these questions here catch your eye for the next one?
SPEAKER_02I'm muted, let me see. Oh I like the question about the enablement as well. So if you can comment on that.
SPEAKER_05Which one's that?
SPEAKER_02Enablement: how do you enable building AI applications? Exactly.
SPEAKER_05All right.
SPEAKER_01Okay, so there are two approaches to this. Let's say you are someone who knows how to code your own AI application, and you develop it using PyTorch or TensorFlow. All you need to do is take the model that you have and run it through our mapping tool, which is basically a command line: you give it the latest checkpoint with the weights and the biases, you click convert, it does the conversion for you, and it gives you a library file that programs the AI accelerators, and another file that loads all the AI weights with the bias conversion. Then it gives you an API: use this API in your main code and your main application. At the end of the day, we run in C. If you want to run it from Python, you have to run it on top of the Zephyr operating system, which is still the same approach, but then we tell you this is the API you need to use, and you just need to import that library into Python. So that's one path, if you are someone who has developed their own application. But let's say you are someone who just has data and doesn't even know how to code an application. This is something that we have, let's say, a beta version of, and we are working to make it very mature, through the concept called neural architecture search. We take the data and we build the AI model for you: we let the AI search and come up with an AI model for you based on your accuracy metrics, while satisfying the constraints of our chip. So we say our chip has this amount of memory, so you can only have this many parameters; it tries to select from different predefined architecture skeletons or scaffolds, trains some of them, and comes up with the one that gives you the best accuracy. Or, if you want the entire Pareto frontier, it can spin that out for you and ask, okay, which one would you like to choose? This is for someone who's still exploring this a little further, and I think we're not the only ones; there are other companies doing this to expand the adoption of AI and edge AI by companies and use cases that do not have this integrated into their systems or enabled yet.
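A minimal sketch of the budget-constrained search idea described here, under stated assumptions: it is not the EMASS NAS tool. It enumerates small dense-network skeletons, keeps the ones under an assumed parameter budget, and ranks them with a placeholder score where a real flow would train and evaluate each candidate on the user's data.

```python
# Toy sketch of budget-constrained architecture search (not the EMASS tool).
import itertools, random

PARAM_BUDGET = 50_000           # assumed chip memory budget expressed as parameters
INPUT_DIM, NUM_CLASSES = 128, 5 # assumed time-series feature size / number of labels

def param_count(hidden_layers):
    dims = [INPUT_DIM, *hidden_layers, NUM_CLASSES]
    return sum(i * o + o for i, o in zip(dims, dims[1:]))   # dense weights + biases

def score(hidden_layers):
    # Placeholder: a real NAS flow would train the candidate and report accuracy.
    random.seed(hash(tuple(hidden_layers)) & 0xFFFF)
    return random.uniform(0.7, 0.95)

widths, depths = [32, 64, 128, 256], [1, 2, 3]
candidates = [list(c) for d in depths for c in itertools.product(widths, repeat=d)]
feasible = [c for c in candidates if param_count(c) <= PARAM_BUDGET]

best = max(feasible, key=score)
print(f"{len(feasible)}/{len(candidates)} candidates fit the budget")
print(f"picked hidden layers {best} with ~{param_count(best):,} params")
```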
SPEAKER_05So, using AI to help develop the AI models, basically. We're about to run out of time, but I had one more question that jumped up here. I'm a big fan of WASM, long story, but someone brought up: do you have a WASM runtime for your hardware?
SPEAKER_01Um, I think we haven't done that yet, right? But I mean we uh it's one of the things that we're looking into. Okay, cool.
Events, CTAs, and Closing
SPEAKER_05Yes, I'm a big fan of that. Cool. All right, great. We are at about time. I would say, call to action: go to Mohammed's website for EMASS, which is nanoveu.com slash emass. Yeah, there it is. Come see the folks from EMASS who'll be there in Amsterdam next week, meet Mohammed in person in Taipei, and follow us on YouTube if you haven't. I think we're about five subscribers short of 50,000 right now, so if everyone who's on the live stream subscribed, I think we'd reach our number there. But really appreciate the time, Mohammed. Thank you, Sarah, for co-hosting here. And yeah, fantastic. Really appreciate it.
SPEAKER_01Right. Thanks, Pete, for the opportunity. Yeah, a pleasure, Sarah. And thanks everyone for attending.
SPEAKER_04Cool. All right, bye-bye.
SPEAKER_01Bye.