EDGE AI POD

Smarter AI, Faster Hardware

EDGE AI FOUNDATION

Your phone, watch, and even your fridge want real-time intelligence—but power and latency won’t tolerate bloated models or generic compute. We walk through a practical path from Python to custom hardware using high-level synthesis, then invite you to prove it in our Efficient Inferencing Hackathon. With a ready-to-run RISC‑V Rocket Core baseline for MNIST, a full Siemens EDA toolchain, and on-demand training, you’ll learn how to cut latency and power while protecting accuracy through precision mapping, parallelism, and smarter dataflow.

We start by mapping the compute landscape—CPUs for flexibility, GPUs for throughput, TPUs/NPUs for tensors, and custom FPGA/ASIC designs for peak power-performance-area. From there, we get tactical: use quantization to right-size bit-widths; apply loop pipelining and unrolling to unlock throughput; partition memories and stream between layers to eliminate round-trips; and iterate quickly with HLS directives instead of rewriting RTL. You’ll see how a baseline inference in the millisecond range can be driven far lower with disciplined co-design, and how Catapult HLS, Questa, and PowerPro provide the feedback loop—latency, area, and power—to make confident trade-offs.

Participants receive a virtual machine, C kernels for convolution and dense layers, and a step-by-step path from Keras to synthesizable RTL. The goal is simple and demanding: deliver the fastest MNIST implementation that meets accuracy, area, and energy targets. Along the way, the HLS Academy community offers guidance from experts and peers, and winners will be announced at the Edge AI Foundation event in Taipei, with prizes including a 3D printer, an FPGA board, and Bose earbuds.

Ready to turn models into efficient silicon? Join the workshop series, claim your VM via the QR code at hls.academy, and use the promo code with two underscores to unlock full access. If this resonates, subscribe, share with a teammate who ships edge AI, and leave a review to help others find the show.

Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

Why Efficient Inferencing Now

SPEAKER_00

Hello, I'm here representing Siemens EDA and Catapult High-Level Synthesis, and I'm super excited to talk to you about our Efficient Inferencing Hackathon. Immediately following this, we are going to have a workshop presentation about high-level synthesis, so this is a little sneak peek into that presentation as well as the introduction to the hackathon. I'll talk a little bit about the introduction, some background, the challenge itself, the details, and the prizes. Now, you can't have a good hackathon without a problem. But what exactly is the problem that we're facing? Devices are getting smarter, and a lot smarter. You have smartphones, smart glasses, smart homes, smart watches, smart speakers, smart fridges, smart toilets, you get the idea. And as we get more of these devices and start implementing AI inferencing algorithms in them, we can see that there will be more and more different types of embedded devices. As of today, there are about 50 embedded devices for every personal computer. But you may ask, how do we actually go about creating these embedded devices? And that's what I'm here to show you today. Now, this graph is the computational load in gigaflops for the ImageNet

CPUs, GPUs, TPUs, and Custom Silicon

SPEAKER_00

algorithm. It has increased by a factor of a hundred in the past five years. Back in 2017, there were 30 billion floating-point operations, and today there are over 3 trillion. And this is only one model, the ImageNet one I mentioned. But there is also a variety of others with a lot of different implementations. So not only is there a large variety of models, but these models themselves are getting larger and larger. And you may ask, where exactly do these models run? Everyone is familiar with the building blocks of a computer, so you can start on a CPU. If your inferencing algorithm works with an off-the-shelf CPU, you're golden. They're cheap, they're easy to use, they're reprogrammable, but that doesn't work for every application. The performance and the energy efficiency may not be there. That's where you start looking at alternatives, one of which is targeting a GPU for your algorithms. That may work, but that may also not be enough. And you start targeting more and more specific pieces of hardware, one of which would be the TPUs or NPUs. And then lastly, there are these custom pieces of hardware, whether it be

What High-Level Synthesis Enables

SPEAKER_00

an FPGA or an ASIC, and this is where you get that highest performance and lowest energy. And that is what we focus on at Siemens EDA with our Catapult high-level synthesis tool: bringing you an easier development solution to get those custom pieces of hardware. And what exactly is that solution? That solution is high-level synthesis. So instead of designing hardware by hand using Verilog or VHDL, we are able to use a higher-abstraction language, which in this case will be C or SystemC. Now, instead of detailing every single register, every single operator, every single wire, we have some level of automation that helps handle these details. And not only that, we also allow for design space exploration by letting you try out a variety of implementations so you can pick the best solution. So this lowers your development time and enables what I like to call intelligent implementations, where you're able to try a variety of different implementations to figure out what works best for you. And there are other features that are available with this, like parallelism or precision tuning, which is finding the right number of bits for your operations. Something that I'd like to mention is that Python for AI development tends to work in float32. Not every single algorithm actually needs that full bit precision. Some can work on much smaller values. And Catapult allows you to tune those bit-widths in hardware so that you can run inferencing algorithms with the least amount of hardware needed.
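To make the point about not needing full float32 precision concrete, here is a minimal Python sketch of rounding values to a signed fixed-point format and measuring the error at a few bit-widths. It is only an illustration: the `quantize` helper, the example weights, and the chosen bit-widths are assumptions for this sketch, not anything shown in the talk or taken from the Catapult tool.

```python
def quantize(x, total_bits, frac_bits):
    """Round x to a signed fixed-point value, saturating at the format limits.

    total_bits: width of the word, including the sign bit.
    frac_bits:  number of bits to the right of the binary point.
    """
    scale = 1 << frac_bits
    lowest = -(1 << (total_bits - 1))
    highest = (1 << (total_bits - 1)) - 1
    code = max(lowest, min(highest, round(x * scale)))  # integer stored in hardware
    return code / scale  # the real value that integer represents


# Made-up example weights, purely for illustration.
weights = [0.7431, -0.1250, 0.0312, -0.9876]

for total_bits, frac_bits in [(16, 14), (8, 6), (4, 2)]:
    quantized = [quantize(w, total_bits, frac_bits) for w in weights]
    worst = max(abs(w - q) for w, q in zip(weights, quantized))
    print(f"{total_bits}-bit word: {quantized} (max error {worst:.4f})")
```

Narrower words need less hardware per operation but introduce more rounding error, so the practical question is how small the format can get before model accuracy drops below the target.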

Precision, Quantization, and Parallelism

SPEAKER_00

And that's exactly what brings us to our hackathon, where we are trying to help you design your hardware using high-level synthesis and show you all the different ways that you can optimize. You can start in software by making sure that the architecture of your model, your layers and channels, is at the correct level for the accuracy you need. We can bring you quantization, which is the precision, the number of bits, for your algorithms, and which lets you try a variety of different settings both while training the neural net and in the hardware itself. You can use parallelization to try a variety of different reuse factors and measure your throughput and your latency. And all of this can be done in our tool to compare the hardware and the software. And it's a much easier thing to do than with handwritten RTL. I'm not sure how many of you in this room have worked with handwritten RTL before. One, two, three, four. Oh, perfect. We've got a handful here. That's great. So you guys know that if someone were to come up to you and say, hey, I have this design for you, and you give them a lead time of about two years, or not even that, you give them a lead time of a year or six months. And then a month or two before you're done, someone comes up and they're like, hey, actually, I think I want to change the precision on this. Do you mind going back through your entire design and changing all of this? You would probably throw it in their face, right? You'd be like, I don't want to work here anymore.
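The reuse-factor trade-off mentioned here can be sketched with a very rough cost model. This is an idealized illustration under stated assumptions: the function name, the layer size, and the "one multiply per multiplier per cycle" model are all hypothetical, and a real tool report would also account for pipeline depth, memory bandwidth, and control overhead.

```python
import math


def dense_layer_tradeoff(n_inputs, n_outputs, reuse_factor):
    """Idealized cost of a dense layer for a given reuse factor.

    A reuse factor of R means each hardware multiplier performs R of the
    layer's multiplications one after another.
    """
    total_macs = n_inputs * n_outputs
    multipliers = math.ceil(total_macs / reuse_factor)  # parallel hardware (area)
    cycles = reuse_factor  # sequential steps per inference (latency)
    return multipliers, cycles


# Hypothetical 64-input, 10-output layer.
for reuse_factor in (1, 4, 16, 64, 640):
    multipliers, cycles = dense_layer_tradeoff(64, 10, reuse_factor)
    print(f"reuse={reuse_factor:>3}: {multipliers:>3} multipliers, ~{cycles} cycles")
```

A low reuse factor buys latency with area, and a high one does the opposite; sweeping it is one way to generate the many candidate designs the talk describes.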

Hackathon Challenge and Tooling

SPEAKER_00

That is what our tool is able to do: bring that development time down from that year or six months to a few weeks or a month, and let you try the variety. So instead of making that one design in six months, you're able to make 50 or 100 and determine what makes the most sense for you. Now, introducing the hackathon itself. Everyone in this room and everyone who attends our presentation today is going to get a head start and the ability to join our hackathon. We're going to be using the MNIST algorithm, which I'm going to show on the next slide for those who don't know it, and we are going to be judging you on your power, performance, area, and accuracy. We are also going to be providing full on-demand training that will take you from start to finish on how to use the tool and give you some tips on how to make the most efficient solution. Now, for those who are not aware, the MNIST algorithm is a very common AI inferencing algorithm, very much the hello world of the AI world. We recognize handwritten digits from 0 to 9, and we start with a very small neural net. It can be retrained easily and is well characterized, and we are going to be using the base data set that's in the TensorFlow library for both training and accuracy validation. Now, your mission, if you choose to accept it, will be to combine hardware and software to create the fastest-running implementation that fits the accuracy, area, and energy-efficiency requirements. You will be provided a bare-metal software app with MNIST already running on a Rocket Core, with a current latency of over one millisecond. Now that can be optimized down much, much further, down to around 10 microseconds or even faster. And this is something that we want you to try: take a solution that's already built for you and optimize it using our tool.
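The judging rule described here, fastest implementation among those that meet the accuracy, area, and energy requirements, can be sketched as a simple filter-then-minimize step. Everything in this example is assumed for illustration: the `Candidate` fields, the design names, the thresholds, and all of the numbers are invented and are not the hackathon's actual targets or results.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    latency_us: float  # time per inference
    accuracy: float    # fraction of test digits classified correctly
    area: float        # arbitrary area units
    energy_uj: float   # energy per inference


def pick_winner(candidates, min_accuracy, max_area, max_energy_uj):
    """Return the lowest-latency candidate that meets every constraint, or None."""
    feasible = [
        c for c in candidates
        if c.accuracy >= min_accuracy
        and c.area <= max_area
        and c.energy_uj <= max_energy_uj
    ]
    return min(feasible, key=lambda c: c.latency_us, default=None)


# Invented design points; none of these figures come from the talk.
candidates = [
    Candidate("software-only", 1200.0, 0.980, 1.0, 50.0),
    Candidate("float32-accel", 40.0, 0.980, 9.0, 12.0),
    Candidate("int8-unrolled", 15.0, 0.975, 4.0, 3.0),
    Candidate("int4-unrolled", 8.0, 0.930, 2.0, 1.5),
]

winner = pick_winner(candidates, min_accuracy=0.97, max_area=5.0, max_energy_uj=20.0)
print(winner.name if winner else "no design meets the constraints")
```

The fastest design overall is not necessarily the winner: in this made-up set the 4-bit variant is quickest but misses the accuracy floor, which is why precision has to be explored together with parallelism.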

Timeline, Academy, and Prizes

SPEAKER_00

After signing up and getting the virtual machine, you'll be provided the Keras CNN for MNIST, the RISC-V Rocket Core design, and the entire suite of Siemens EDA tools: Catapult high-level synthesis, Questa logic simulation, PowerPro power analysis, and the variety of other products needed to perform these operations. You'll also be provided C algorithms for the convolutional and dense layers, and you'll be provided the entire thing coded for high-level synthesis. Now, for this process of optimization, we want you to optimize your neural net, quantize your numbers, introduce parallelism, and optimize the data flow. All of these are going to help you bring the most efficient solution to our project and hopefully to other projects that you work on. Like I mentioned, we're going to step you through the basics and take you from Python, Keras, and TensorFlow through C to Verilog. We are also going to give you full access to all of the metrics that are available in the tool, and you'll be able to compile your customized hardware and see a variety of different feedback about it. And we'll make these steps easy for you. Like I mentioned, we're going to be providing an entire instructional suite for you to understand it from start to finish. Now, the hackathon is going to run from today to October 31st. We're going to be providing a full-featured virtual machine with all the tools, licenses, and IP needed. Your virtual machine will be available for 30 days. So the entire hackathon runs for about three months, but you will get 30 days with the actual tools to determine what works. And this will all be running through our brand new academy, which I'm excited to announce, which is hls.academy. This is an online forum that allows you to learn about high-level synthesis and talk to other people in industry.
You can talk to people like myself and other experts in the field, we are looking to expand this widely, and you'll be able to talk to other people working on the hackathon about this project. The winners are going to be announced in November at the Edge AI Foundation event in Taipei. Now, what exactly do you win? That's why we're all here, right? First place will get the Elegoo Neptune 4 Max 3D printer, second place will get an FPGA development board, third place will get Bose noise-canceling earbuds, and everyone who participates will get a LinkedIn badge that is provided through the Siemens EDA training academy. Now, here

Sign-Up Details and Workshops

SPEAKER_00

is a QR code to sign up now. This will link you to our HLS Academy, where you can create an account and sign up for the on-demand training. Online, there's going to be a promo code that's in the instructions, and just to say it out loud here: it is two underscores, not one underscore. That way you can get the full free discount and make sure that you can participate. This is the same QR code, don't worry. So I would like you to join us for the workshop that we'll be having over the next three days. It will be the same workshop on all three days, so if you can't make it today or tomorrow, you can make it on Friday, or come to all of them. Once again, I'd like to thank you for listening to me talk today, and I hope to see you at my presentation. I'll also be available at the Siemens EDA demo booth.