EDGE AI POD
Discover the cutting-edge world of energy-efficient machine learning, edge AI, hardware accelerators, software algorithms, and real-world use cases with this podcast feed covering all things from the world's largest EDGE AI community.
It features shows like EDGE AI Talks and EDGE AI Blueprints, as well as EDGE AI FOUNDATION event talks on a range of research, product, and business topics.
Join us to stay informed and inspired!
Comparative Analysis of NPU Optimized Software Framework
The future of AI isn't just in massive cloud servers—it's already sitting in your pocket. In this eye-opening presentation, Yeon-seok, CEO and co-founder of JTIC AI, reveals how his company is revolutionizing the AI landscape by tapping into the underutilized Neural Processing Units (NPUs) that have been standard in smartphones since 2017.
While tech giants pour billions into cloud infrastructure, JTIC AI has identified a critical opportunity: leveraging the powerful AI processors already in billions of devices worldwide. This approach delivers not just cost savings, but crucial advantages including offline functionality, enhanced data security, and real-time responsiveness—without depending on internet connectivity.
The technical journey involves three essential components: hardware utilization, model optimization, and runtime software. Yeon-seok breaks down sophisticated model optimization techniques like pruning, quantization, and knowledge distillation that make complex AI models deployable to mobile devices. However, the biggest challenge isn't hardware capability but software fragmentation. Unlike the GPU market dominated by NVIDIA and CUDA, mobile devices operate in a fragmented ecosystem where Apple, Qualcomm, MediaTek, and others maintain incompatible software stacks—creating significant barriers for AI engineers.
JTIC AI's innovative solution is an end-to-end automated pipeline that handles everything from model optimization to device-specific benchmarking. Their system can determine which runtime will deliver optimal performance for specific models on specific devices—something that's impossible to predict without comprehensive testing. With this approach, developers can deploy sophisticated AI across the mobile ecosystem without wrestling with manufacturer-specific implementations.
Ready to unlock the AI capabilities already sitting in your users' pockets? Discover how on-device AI can transform your applications with better privacy, offline functionality, and faster response times—all while reducing your cloud infrastructure costs.
Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org
Good afternoon, ladies and gentlemen. My name is Yeon-seok, and I'm the CEO and co-founder of JTIC AI. There are tons of hardware companies at this conference, but we are actually a software company, focused on utilizing the on-device AI environment. So today I'm going to present a comparative analysis of on-device AI software. We are JTIC AI: we leverage mobile hardware, especially the NPU, to build powerful on-device AI services, and we provide that kind of ecosystem for AI engineers.
Speaker 1: So let's think about the background first. Companies like OpenAI and Google are pouring money into AI infrastructure. That infrastructure mainly consists of GPU cloud servers, which are essential but expensive, and hard to avoid. On-device AI has grown by utilizing the hardware everyone already has at the edge, in their palm and pocket. It provides not only cost reduction but also other advantages, like offline functionality, more secure AI, and real-time responsiveness. To build on-device AI, three key elements are needed. The first is the target hardware, to move computation from GPU servers to edge devices. The second is optimizing the AI model, so the model is small enough to be deployable on the devices everyone has. The third is the runtime software that runs the AI model on each target device. So let's think about smaller model size first. As you might know, there are well-known techniques for model optimization. First, pruning literally means removing unimportant parameters from the neural network, shrinking the model by cutting those parameters away. Second, quantization represents the data types of the AI model with smaller bit widths. Back in the day, integer quantization was the norm, but recently there has been research on floating-point quantization, which represents 32-bit floating point values as 16-bit, 8-bit, or even 4-bit floating point. Third is knowledge distillation. Recently, knowledge distillation has been used to build small language models (sLLMs) that stay powerful despite their small size, distilled from large language models; companies like DeepSeek, with models like R1, or groups like Stanford, have shown impressive knowledge-distillation approaches. And let's think about the target devices after that.
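To make the first two optimization techniques concrete, here is a minimal Python sketch, assuming PyTorch as the training framework (the talk does not name one); the layer sizes and pruning ratio are illustrative only, and knowledge distillation is omitted because it needs a separate training loop.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# A toy model standing in for whatever network is being optimized.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10))

# Pruning: zero out the 30% of weights with the smallest magnitude in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # make the pruning permanent

# Quantization: store Linear weights as 8-bit integers instead of 32-bit floats.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
```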
Speaker 1: So there are NPUs out in the world, processors targeted at AI workloads, which have been shipping since 2017 and 2018, starting with the iPhone 8 and the Galaxy S10. So now, in 2025, everyone has their own AI processor in their pocket and palm, and we are trying to utilize those NPUs that everyone already has.
Speaker 1: But even though everyone already has an NPU, an AI-targeted processor, there are limits on utilizing it. As you might know, each manufacturer makes its own chip, but also maintains its own software, so to utilize NPUs globally, you have to support each manufacturer's NPU separately. On the GPU cloud server side there is CUDA: if I develop software that utilizes NVIDIA GPUs, I can deploy it anywhere, because everyone uses NVIDIA GPUs. But there is no dominant player in the mobile area, so if I build something for the Apple environment, I have to start again from the bottom to support a specific manufacturer on the Android side.
Speaker 1: On top of that, AI engineers don't really care about the AI infrastructure; they do their AI work in the Python environment. The embedded world is built with C and C++, and the manufacturers provide their AI environments in C and C++, so it is a big burden for AI engineers to learn all of that from the bottom up. Because of this barrier, AI engineers have a hard time utilizing the NPU for their AI models.
Speaker 1: So let's think about the software frameworks. There are TensorFlow and PyTorch, which everyone already knows; they are more or less the software standard for building AI models, but they run on GPU clouds. There are mobile versions of PyTorch and TensorFlow, but they are too generic and not really optimized for the mobile environment. There is another approach like GGML, which runs large language models on target devices with C++-based source code, but since it is a general-purpose approach, and since mobile manufacturers provide SDKs rather than C++ libraries, it is quite limited in how much it can utilize the NPU.
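As one concrete picture of the "mobile version" path mentioned above, here is a minimal sketch of running a model through the TensorFlow Lite Python interpreter; it assumes a converted file named model.tflite already exists, and the dummy input is a placeholder.

```python
import numpy as np
import tensorflow as tf

# Load a converted TensorFlow Lite model and prepare its tensors.
interpreter = tf.lite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()

inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Feed a dummy input with the model's expected shape and dtype, then run one inference.
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
result = interpreter.get_tensor(out["index"])
```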
Speaker 1: So to utilize the NPU we have to support the SDKs and software stacks that the manufacturers provide: Apple provides Core ML, Qualcomm provides the Qualcomm AI SDK, and MediaTek, which holds roughly 40% of the mobile market, provides NeuroPilot, which MediaTek builds itself. Other manufacturers have their own SDKs too, but they are not compatible at all. So AI engineers have to support and develop against each manufacturer's software separately; they cannot do it all at once. To deploy on-device AI with NPU utilization, they have to support all the different environments from all the different manufacturers, so it is hard for them, and it takes a long time before their AI services can make an impact.
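To illustrate that per-vendor burden, here is a hedged sketch: the same PyTorch model has to go through separate, vendor-specific export paths, such as coremltools for Apple's Core ML and an ONNX export that toolchains like Qualcomm's commonly ingest. The model choice, shapes, and file names are placeholders, not a recommendation of any particular workflow.

```python
import torch
import torchvision
import coremltools as ct

# A placeholder network standing in for the customer's model (no pretrained weights).
model = torchvision.models.mobilenet_v2(weights=None).eval()
example = torch.rand(1, 3, 224, 224)

# Path 1 (Apple / Core ML): trace the model and convert it with coremltools.
traced = torch.jit.trace(model, example)
coreml_model = ct.convert(traced, inputs=[ct.TensorType(shape=example.shape)])
coreml_model.save("mobilenet_v2.mlpackage")

# Path 2 (e.g. Qualcomm tooling): many vendor SDKs ingest ONNX, so a second,
# entirely separate export of the same network is needed.
torch.onnx.export(model, example, "mobilenet_v2.onnx", opset_version=17)
```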
Speaker 1: So from this environment, PyTorch and TensorFlow, those traditional AI standards, are trying to build new software frameworks for mobile devices. There is ExecuTorch, made by PyTorch, and LiteRT, meaning "Lite Runtime", from TensorFlow, which is a modified version of TensorFlow Lite. They aim to support the different manufacturers' mobile environments from within their traditional software stacks. But even with this kind of unifying software, on the PyTorch side ExecuTorch supports only the NNAPI module from its own structure, and on the TensorFlow side LiteRT supports only its own model format. So from this environment, the user still has to set up each manufacturer's software stack inside ExecuTorch or LiteRT as well.
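Extending the earlier interpreter sketch, here is a hedged illustration of the point about manufacturer backends: even with LiteRT / TensorFlow Lite as the "unified" runtime, reaching a particular NPU typically means loading that vendor's delegate library, whose filename and options differ per manufacturer. The delegate name below is a placeholder, not a real library.

```python
import numpy as np
import tensorflow as tf

# "libvendor_npu_delegate.so" is a hypothetical name; each manufacturer ships its own delegate.
try:
    delegate = tf.lite.experimental.load_delegate("libvendor_npu_delegate.so")
    interpreter = tf.lite.Interpreter(model_path="model.tflite",
                                      experimental_delegates=[delegate])
except (ValueError, OSError):
    # Fall back to plain CPU execution if the vendor delegate is not available on this device.
    interpreter = tf.lite.Interpreter(model_path="model.tflite")

interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
interpreter.set_tensor(inp["index"], np.zeros(inp["shape"], dtype=inp["dtype"]))
interpreter.invoke()
```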
Speaker 1: On top of that, even if we set up the AI runtime software to utilize the NPU, we cannot guarantee the best performance. I brought two examples: the first is ResNet-101 and the second is YOLOv7. The key point is that even when we use the same device, the same NPU, and the same AI model architecture, the runtime software makes a performance difference. For the ResNet-101 model, TFLite runs with the lowest latency, but for the YOLOv7 model, QNN, which comes from the Qualcomm AI SDK, gives the best performance with the lowest latency. So we cannot guarantee the best performance with one standard piece of software; we need to run an actual benchmark to guarantee the best performance for a given AI model on the target device. So we developed a pipeline that benchmarks and profiles every AI model on the actual, physically released devices out in the world. From those benchmark results, we can guarantee that the AI model runs with the best performance on each of the different target devices from each of the different manufacturers.
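A simplified sketch of that benchmarking idea: measure real latency for the same model under each candidate runtime on the actual device, then pick the fastest. Only the timing harness below is concrete; the runtime callables are hypothetical stand-ins for real TFLite, QNN, or Core ML execution paths.

```python
import time
import statistics

def benchmark(run_once, warmup=5, iters=50):
    """Return median latency in milliseconds for a callable that runs one inference."""
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def pick_best_runtime(runtimes):
    """runtimes: dict mapping a runtime name to a zero-argument callable doing one inference."""
    results = {name: benchmark(fn) for name, fn in runtimes.items()}
    best = min(results, key=results.get)
    return best, results
```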
Speaker 1: So I'd like to say that an end-to-end pipeline for on-device AI is needed. The first thing is AI modeling: starting from the AI model, we have to optimize it and get performance profiling, not only for latency but also for accuracy. On top of that, to support the AI model on the deployment target, we have to implement the target-specific software for each manufacturer's device. And AI inference never exists on its own: feature extraction, pre-processing, and post-processing are needed around the inference system, so the actual on-device application implementation is needed as well.
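To show where pre- and post-processing sit around the inference call, here is an illustrative sketch for an image-classification case (the talk does not specify a task); all function names are assumptions, and the inference callable wraps whichever on-device runtime was selected.

```python
import numpy as np

def preprocess(image_u8):
    """Normalize an HxWx3 uint8 image and add a batch dimension (1xHxWx3)."""
    x = image_u8.astype(np.float32) / 255.0
    return np.expand_dims(x, axis=0)

def postprocess(logits, labels):
    """Turn raw model output into a human-readable (label, probability) prediction."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    top = int(np.argmax(probs))
    return labels[top], float(probs[top])

def run_pipeline(image_u8, infer, labels):
    """infer: a callable that takes the preprocessed tensor and returns model outputs."""
    return postprocess(infer(preprocess(image_u8))[0], labels)
```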
Speaker 1: So let's say that to support on-device AI, to build an on-device AI application, an end-to-end approach is needed. We built an automated pipeline that brings the separate manufacturers' devices and SDKs together, a build system that enables AI engineers to build end-to-end on-device AI easily within a very short time. We provide two simple steps to make an AI model runnable on the target devices. We take the AI model from our customer and run actual benchmarks and profiling on all the different target devices. Then, after the user installs the AI application, we detect the user's device and, from the benchmark results, fetch the best-performing model for that device. This is how we guarantee the best-performing AI inference on the user's device for on-device AI.
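A hypothetical sketch of that two-step flow: benchmark results gathered beforehand per device and runtime, and a lookup at run time that picks the best-performing configuration for the detected device. Device names, runtime names, and latency numbers are illustrative placeholders only.

```python
# device model -> {runtime: median latency in ms}, measured on real hardware beforehand.
# All entries below are made-up placeholders for illustration.
BENCHMARK_TABLE = {
    "Galaxy S24": {"qnn": 4.1, "tflite-gpu": 6.8, "tflite-cpu": 21.3},
    "iPhone 15":  {"coreml": 3.7, "tflite-cpu": 18.9},
}

def select_runtime(device_model, table=BENCHMARK_TABLE, fallback="tflite-cpu"):
    """Pick the lowest-latency runtime recorded for this device, or a safe fallback."""
    per_device = table.get(device_model)
    if not per_device:
        return fallback
    return min(per_device, key=per_device.get)

print(select_runtime("Galaxy S24"))    # -> "qnn"
print(select_runtime("UnknownPhone"))  # -> "tflite-cpu"
```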
Speaker 1: With this kind of approach, we'd like to build an ecosystem for AI engineers to build on-device AI easily. As you might know, on-device AI brings network-independent functionality, so we can enable AI services without network infrastructure, underground or in rural areas, even in spacecraft and aircraft. Plus, we can guarantee real-time AI performance for AI services on the devices everyone already has. And you know, recently people don't really take care of their data when using AI services, but we are trying to build secure AI systems with on-device AI technology. All right, thanks for listening. Please ask me anything if you have questions in mind. Thank you.