EDGE AI POD

Honey, I Shrunk the LLMs: Edge-Deployed AI Agents

EDGE AI FOUNDATION


The landscape of artificial intelligence is experiencing a profound transformation, with AI capabilities moving from distant cloud servers directly to edge devices where your data lives. This pivotal shift isn't just about running small models locally—it represents a fundamental reimagining of how we interact with AI systems.

In this fascinating exploration, Dell Technologies' Aruna Kolluru takes us deep into the world of edge-deployed AI agents that can perceive their surroundings, generate language, plan actions, remember context, and use tools—all without requiring cloud connectivity. These aren't simple classification systems but fully autonomous digital partners capable of making complex decisions where your data is generated.

Discover how miniaturized foundation models like Mistral and TinyLlama, combined with agentic frameworks and edge-native runtimes, have made this revolution possible. Through compelling real-world examples, Aruna demonstrates how these systems are transforming industries today: autonomous factory agents detecting defects and triggering interventions, rural healthcare assistants providing offline medical guidance, disaster response drones generating situational awareness, and personalized retail advisors creating real-time offers for shoppers.

The technical journey doesn't stop at deployment. We examine the sophisticated optimization techniques making these models edge-friendly, the memory systems enabling contextual awareness, and the planning frameworks orchestrating multi-step workflows. Importantly, we tackle the critical governance considerations for these autonomous systems, including encrypted storage, tool access control, and comprehensive audit logging.

Whether you're a developer looking to build edge AI solutions, an enterprise decision-maker exploring AI deployment options, or simply curious about where AI is headed, this episode offers invaluable insights into a technology that's bringing intelligence directly to where it's needed most. Subscribe to our podcast and join the conversation about the future of AI at the edge!


Learn more about the EDGE AI FOUNDATION - edgeaifoundation.org

Speaker 1:

Our next speaker will take us, I would say, deep into the technical heart of Edge AI. It's my pleasure to introduce Aruna Kolluru from Dell Technologies, and her session is entitled Edge-Deployed AI Agents: Architecture, Optimization and Security. Aruna, the stage is yours.

Speaker 2:

Thank you so much. Let me share my screen. Can you see my screen? Yeah, hello everyone. I'm an AI workload specialist from Dell. It's funny that Evgeny mentioned the Dell AI laptops. I do work for Dell, though not in that division; I work in the AI specialist division, focusing more on enterprise deployments.

Speaker 2:

But AI agents have been very popular these days, so let's talk a little bit about them. I actually feel we are at a very pivotal moment in the evolution of AI. For years, AI has lived in the cloud and in data centers, where powerful models ran behind the scenes, but today the whole paradigm is shifting. AI is coming closer to where the data is generated, onto the devices in our environments, even in disconnected or low-connectivity zones. What we are talking about today is not just AI at the edge, but generative AI agents at the edge. That's a subtle but very powerful distinction. These agents don't just classify or recognize something. They generate language, summaries, decisions and even plans of action. They reason, they recall past interactions, they choose tools to accomplish goals, and they do it on-device, privately, securely and often autonomously.

Speaker 2:

In this talk, let's focus a little beyond model compression. Let's walk through some of the stack, as well as a couple of use cases, and why we are here. So Benedict Evans is a well-known technology analyst and thought leader. He has spent years analyzing trends in tech, especially around platforms, software and AI. He's known for his sharp insights and widely followed presentations and newsletters that explore how technology reshapes industries and societies. His quote, I thought, really captures that fundamental shift in how we interact with AI.

Speaker 2:

With old AI, we called a model to do a task, like asking a calculator for an answer. With the new AI, you collaborate with an agent, like working with a team member who understands the context, adapts and helps you reach your goals. It's not just about using AI anymore. It's about working with AI as a partner that can plan, reason and act alongside you. So let's quickly see what an AI agent is, just as a very quick introduction. An AI agent is a system that can perceive its environment, make decisions and take actions to achieve a specific goal. I know there are other considerations out there, ethical considerations and so on, but in essence that's what an AI agent is. Agentic AI refers to AI systems that exhibit agency, which means they can autonomously pursue goals, plan, adapt and make decisions over time, often in complex or dynamic environments. So agents are distinct; when we talk about agentic AI, it's about bringing all these AI agents together to work on something autonomously.

Speaker 2:

So we used to think of the edge as a place to run small, pre-trained models. These were typically lightweight classifiers or detectors, good for object recognition, sensor filtering, maybe basic NLP. The outputs were deterministic and narrow: classify, detect, match. But what's happening now is a massive leap: we are moving from inference to intelligence at the edge. In some of the demos we've actually seen generative AI at the edge, and here what we are talking about is generative AI.

Speaker 2:

Agents are a new class of AI systems that don't just answer, they explain; they don't just recognize, they actually generate and adapt; and instead of just inferring, they actually plan. So let me talk about an example here. Imagine a drone flying over a wildfire zone. It doesn't just capture footage; it generates captions about terrain changes, summarizes dangers and plans the next survey route, all on the device, without ever needing the cloud. Or think about a retail assistant running in-store. It watches shelves via a vision model, detects what's empty and generates a personalized promo message for shoppers walking by. All of this is happening offline, with edge inferencing, retrieval and speech synthesis. So how is it different? It's different because it's actually making those decisions at the edge itself. Now we've seen what agents are.

Speaker 2:

I think there are three major enablers for this shift. One is miniature foundation models. We are seeing the emergence of high-quality models that can be quantized down to run on smaller devices. Models like Mistral, TinyLlama and Gemma 2B can be compressed to 4-bit or even 3-bit optimized versions. They run comfortably in about 4 to 8 GB of VRAM. That kind of spec you'll find on maybe a Jetson Orin, a Raspberry Pi with swap, or even laptops. I've mentioned the Qualcomm and Dell laptops; the definition of the edge is changing, and the capacity at the edge is also changing. In some of these laptops the specs are quite fascinating. Right now I have a laptop which has 16 GB of GPU memory in it and, if possible, I'll show a demo, or I'll show some screenshots I've got from the demo, where I'm running multiple of these models simultaneously as AI agents at the edge. These AI agents also push the move towards more domain-specific models, and when we say domain-specific models, we want those models to be very small, trained on specific data, so that also drives smaller models.
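The VRAM figures above follow directly from parameter count and bit width. A rough sketch of that arithmetic, illustrative only, since real runtimes also need room for the KV cache, activations and per-block quantization scales:

```python
# Back-of-the-envelope memory footprint for a quantized LLM.
# Illustrative only: real runtimes add overhead for the KV cache,
# activations, and quantization metadata.

def model_size_gb(n_params_billion: float, bits_per_weight: float) -> float:
    """Approximate on-device weight size in gigabytes."""
    bytes_total = n_params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# A 7B model at 16-bit needs ~14 GB, beyond most edge devices;
# at 4-bit it shrinks to ~3.5 GB, inside the 4-8 GB window above.
print(model_size_gb(7, 16))   # 14.0
print(model_size_gb(7, 4))    # 3.5
print(model_size_gb(2, 4))    # 1.0 -- Gemma-2B class
```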

Speaker 2:

And then the second part of it is the agentic frameworks. Having the model is not enough; you need to give it a structure. Frameworks like LangGraph let you define DAG-based planning flows where the agent makes decisions and passes context from step to step. Then we have AutoGen, which allows agents to use tools, memory and history in conversational loops. And Crew AI organizes agents into role-based teams, like a planner, a verifier, an executor, really enabling coordinated workflows across different agents. These frameworks are the glue that turns a single model into an intelligent agent worker, or a swarm of them working together.

Speaker 2:

Then the last one is the edge-native runtimes and toolchains. Deployment is now easier than ever with projects like Ollama, a CLI and server interface, a runtime for running quantized LLMs locally with REST endpoints. We have llama.cpp, the optimized C++ runtime with GGUF model support, token streaming and flash attention. Then you have LM Studio, a desktop app to run and chat with local models, and WebLLM with WebGPU runs LLM models right in your browser, even on a phone. These toolchains make it possible to deploy agents on mobile devices, embedded systems or even in your browser tab, completely offline as well. So we have these agents in sync.
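As a concrete sketch of the "local REST endpoint" idea: Ollama serves models on localhost by default, so a script can query it with nothing but the standard library. The port, endpoint and "mistral" model name reflect Ollama's documented defaults, but treat the details as a sketch to adapt, not a definitive integration:

```python
# Minimal sketch of calling a local Ollama server over its REST API.
# Assumes Ollama is running on its default port (11434) with a model
# such as "mistral" already pulled -- adjust names to your setup.
import json
import urllib.request

def build_generate_request(model: str, prompt: str) -> dict:
    # stream=False asks for one complete JSON response instead of chunks
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str,
             host: str = "http://localhost:11434") -> str:
    payload = json.dumps(build_generate_request(model, prompt)).encode()
    req = urllib.request.Request(
        f"{host}/api/generate", data=payload,
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Uncomment with a running Ollama instance:
# print(generate("mistral", "Summarize today's sensor log in one sentence."))
```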

Speaker 2:

Once we have all these multiple models, agent communication gets very important, and the newest protocols have been released in the last few months. The Model Context Protocol allows an AI agent to talk to the environment and the tools around it. For example, if the planning agent we're talking about wants to look at a disaster from a drone and wants to talk to a weather app to check the weather before deciding the next steps, it can do that; it can interact with other tools directly. MCP provides a standard way of doing that. Before MCP, everyone had their own way to interact; most often JSON was used as the output and input to other agents, but with MCP you have a standard way to interact with things. And then A2A is an agent-to-agent protocol from Google which again enables a standard way for agents to talk to each other. You might have seen that lots of enterprises are releasing MCP support for their tools so these agents can talk to them, and the diagram there is a very high-level example of how multiple agents interact with MCP.
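To make the "standard way to interact" concrete: MCP messages are JSON-RPC 2.0 under the hood. The sketch below builds the shape of a tool-call request, something like the planner agent asking a weather tool for conditions. The `get_forecast` tool name and its arguments are invented for illustration; consult the MCP specification for the authoritative message format:

```python
# Hedged sketch of an MCP-style tool call. MCP uses JSON-RPC 2.0;
# the "tools/call" method follows the public spec, but the tool name
# and arguments here are illustrative only.
import json

def mcp_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    msg = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(msg)

# e.g. the planner agent asking a (hypothetical) weather tool:
wire = mcp_tool_call(1, "get_forecast", {"lat": 37.77, "lon": -122.42})
print(wire)
```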

Speaker 2:

And then we have the generative agent. Now that we've talked about the shift, what we're talking about here is not a single model running on a device. A true agent is more like a modular, intelligent system capable of perceiving, thinking, remembering, deciding and acting, all in a constrained environment. Let's break it down into the five essential traits of an agent. Perception: this is where the agent connects to the real world. Input might come from a microphone, a camera, video, or even a sensor like temperature or GPS. This covers things like audio transcription, image analysis or motion detection that can be done on the device, and this local perception is what makes edge agents situationally aware.

Speaker 2:

Then we have generation. This is the agent's expressive cognitive power. It can generate natural-language responses, task summaries, decisions. This is what makes it generative and not just reactive. The third one is planning. This is the intelligent loop. Instead of producing just one output, the agent might need to decide a sequence of actions, using frameworks like LangGraph or AutoGen. This is where reasoning happens: deciding what to do next, not just what to say.

Speaker 2:

Then we have memory. An edge agent isn't very useful if it forgets everything between sessions. Agents use local vector databases like FAISS, which is actually pretty lightweight if you don't compile the GPU version, to store relevant context. This lets them remember previous user interactions, learn from usage and retrieve past data, even offline. This memory makes them contextually aware and personalized. And then we have tool use, perhaps the most agent-like behavior. Agents can call APIs, run scripts, query databases or control hardware, like turning on the lights or activating a drone. This is where frameworks like AutoGen shine, combining generation with tool-invocation logic. This makes them actionable.
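The vector-memory idea above can be sketched in a few lines. This is FAISS-like in spirit but pure NumPy so it runs anywhere; on a real device you would swap in a FAISS index, and the toy vectors stand in for real embeddings:

```python
# FAISS-like local vector memory, sketched in pure NumPy.
# The 3-d "embeddings" are toys standing in for a real encoder.
import numpy as np

class VectorMemory:
    def __init__(self, dim: int):
        self.vecs = np.zeros((0, dim), dtype=np.float32)
        self.texts = []

    def add(self, vec, text: str):
        v = np.asarray(vec, dtype=np.float32)
        v = v / np.linalg.norm(v)            # normalize for cosine search
        self.vecs = np.vstack([self.vecs, v])
        self.texts.append(text)

    def search(self, vec, k: int = 1):
        q = np.asarray(vec, dtype=np.float32)
        q = q / np.linalg.norm(q)
        scores = self.vecs @ q               # cosine similarity
        top = np.argsort(-scores)[:k]
        return [self.texts[i] for i in top]

mem = VectorMemory(dim=3)
mem.add([1.0, 0.0, 0.0], "patient reported mild fever yesterday")
mem.add([0.0, 1.0, 0.0], "shelf 4 restocked at 9am")
print(mem.search([0.9, 0.1, 0.0]))  # recalls the fever note
```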

Speaker 2:

Not just smart assistants, but autonomous workers. Together, these five pillars form the complete agent architecture. Here are a few examples. These are not prototypes or far-future concepts; these are deployable AI agents running on edge hardware today. Let's explore a couple of these cases, starting with autonomous factory agents in industrial settings like manufacturing plants.

Speaker 2:

A vision transformer model can detect irregularities like misalignment or broken parts. A local LLM, maybe a Mistral 7B, generates a defect report in natural language. If needed, it triggers a robotic arm or files an alert for inspection. This whole inspection process is automated, and it can happen without any calls to the cloud or to a data center. Agents are embedded in-line, continuously learning from

Speaker 2:

failure patterns, stored maybe in lightweight databases like SQLite and queried via vector databases. And then rural health assistants. Imagine a health worker in a remote village. They speak into a mic: the patient has a mild fever and a cough since last night; gave maybe 400 milligrams of paracetamol at maybe 10 pm. The agent uses Whisper for local speech-to-text. Then quantized models like Phi summarize this into structured clinical notes. A local vector database allows the agent to recommend protocols or verify previous symptoms, and the response is read aloud to the clinician using text-to-speech, fully offline.

Speaker 2:

This setup brings contextual autonomous support to places where there's no connectivity, without compromising any privacy. And in disaster response, in emergency zones, a drone scans a flooded area, feeding images to an object detector like maybe YOLO, and we can also use SAM to segment if required. What I've done in the demo is use LLaVA, which is a multimodal model, to really identify the scenario itself, not just the objects in the image. This agent generates captions, maybe "floodwater has breached and cars stranded" or something like that, and then it uses a local LLM to plan the path for human responders or other drones. The agent can now log this to local memory, generate a visual map overlay, or broadcast a summary over maybe LoRa or ultra-low-bandwidth links. Again, there's no delay here, just actionable intelligence on the device itself.

Speaker 2:

The last one is the personalized retail advisor. I spoke about this. You can use a camera to track stock levels, like how much stock is there on the shelves. The AI agent identifies what's missing and creates real-time offers or stock requests. The system can interact with shoppers using audio: would you like a discount on organic oats, maybe? These agents are powered by a local Ollama runtime, image encoders and Whisper-based audio feedback. It's a personalized, private and adaptive experience powered by GenAI at the edge.

Speaker 2:

So these examples span different industries, but they share the same principles. I wanted to pick completely diverse examples, but the underlying pattern is the same: local inference, contextual generation, autonomy and privacy. They demonstrate the incredible versatility of GenAI agents in different areas.

Speaker 2:

And then in agentic AI applications, data traverses through the Model Context Protocol. It facilitates structured interaction between a language model and its evolving context, whether that's the memory, tools or the environment. The A2A protocol enables agents to communicate, delegate subtasks or coordinate workflows. And we have this shared infrastructure, which includes external APIs, tools, vector databases, file systems or even event buses. One thing I did not represent here, but was thinking about: wherever you see those agents, the scout agent, planner agent and communicator agent, each one should operate within its own isolated security zone, enforcing the principle of least privilege. This segmentation ensures that if one agent is compromised, the others remain unaffected. Inside each agent we see the Model Context Protocol. This governs how the agent's language model interacts with its evolving context, like previous tasks it has done, or the memory or tools it has accessed.

Speaker 2:

Maintaining the confidentiality and integrity of this context is crucial because it often contains sensitive information, for example customer profiles, intellectual property or workflow state. And then we have the A2A protocol between agents: the agent-to-agent handoff. This is where agents communicate, delegate tasks or share partial results. These handoffs must be tightly controlled and auditable to avoid data leaks and privilege escalation.

Speaker 2:

This layer is a potential vulnerability. If a summarizer agent, or say the scout agent, receives raw, unredacted data from the researcher agent, it might unintentionally expose sensitive information downstream. And then we have the shared infrastructure and its risk surfaces. At the bottom is the shared infrastructure that all the agents may access. This includes shared memory, where all the agents store and retrieve intermediate state or long-term memory; external APIs like web search, databases or internal services; tools like file processors, translators or calculators; and vector databases or file storage which hold embedding spaces or contextual data. These components form a risk surface. Why? Because they often cross boundaries. For example, an API might log queries, a tool might behave unexpectedly, or a vector DB might be queried by unauthorized agents. All of these widen the attack surface.

Speaker 2:

So far we've talked about what agents do and how they're structured. Now let's talk about how we get these large language models small and fast enough to run at the edge. Model optimization has advanced dramatically over the past 18 months. We know about quantization; that's the most impactful trick. It reduces the bit width of the model weights. We all spoke about that, so I'm not going deeper into it, but tools like GPTQ, AWQ and EXL2 help you quantize with minimal loss of performance. This reduces memory by four to eight times and improves inference speed significantly, a must for edge devices. Quantization compresses.
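The bare arithmetic of quantization can be shown in a toy form. Real tools like GPTQ and AWQ quantize per-group with calibration data to minimize accuracy loss; the sketch below is only the core round-to-grid step that buys the 4x memory saving over fp16:

```python
# Toy symmetric 4-bit quantization of a weight tensor: map floats onto
# 16 integer levels, then reconstruct. Real quantizers (GPTQ, AWQ) work
# per-group with calibration data; this shows the bare arithmetic only.
import numpy as np

def quantize_int4(w: np.ndarray):
    scale = np.max(np.abs(w)) / 7.0          # int4 levels: -8..7, use +/-7
    q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=1024).astype(np.float32)
q, s = quantize_int4(w)
err = np.abs(dequantize(q, s) - w).mean()
print(f"mean abs reconstruction error: {err:.4f}")  # small vs. weight scale
```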

Speaker 2:

But distillation actually learns. A smaller model is trained to mimic the outputs of a larger model. Phi is a great example of that: it only has 1.3 billion parameters but punches above its weight because it's distilled from GPT-3.5-like data. This technique helps you retain the knowledge while fitting the edge constraints. And then we have LoRA and QLoRA, which are fantastic for making edge-friendly fine-tuning practical. Instead of updating all model parameters, you train and store only low-rank weight deltas. These are tiny, usually under 100 MB, and can be merged into the base model at inference time. So yes, you can actually do on-device fine-tuning, which enables personalized or domain-specific agents without retraining them from scratch. And then there are some runtime tricks, the final layer of optimization. Use the GGUF model format, the standard in llama.cpp, for optimized disk and memory handling, and enable flash attention to speed up context-window operations. Use fused operators and quant-aware token streaming for low-latency interaction. With these optimizations you can serve a 4-bit model at 10-plus tokens per second on something like a Jetson.
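Why the LoRA deltas stay so small is easiest to see numerically: instead of a full d x d update, you store two low-rank factors B (d x r) and A (r x d) and add B @ A at merge time. A NumPy sketch, with an illustrative scaling factor (real LoRA uses a tuned alpha/r ratio) and random matrices standing in for trained weights:

```python
# LoRA in one picture: the trained delta is B @ A with rank r << d,
# merged into the frozen base weight at inference. Random matrices
# stand in for real weights; the 1/r scaling is illustrative.
import numpy as np

d, r = 4096, 8                       # hidden size, LoRA rank
rng = np.random.default_rng(0)
W = rng.normal(size=(d, d)).astype(np.float32)   # frozen base weight
B = rng.normal(size=(d, r)).astype(np.float32)   # trained delta factor
A = rng.normal(size=(r, d)).astype(np.float32)   # trained delta factor

W_adapted = W + (B @ A) * (1.0 / r)  # merge the low-rank delta

full_mb = W.size * 4 / 1e6           # fp32 bytes -> MB
delta_mb = (B.size + A.size) * 4 / 1e6
print(f"full layer: {full_mb:.0f} MB, LoRA delta: {delta_mb:.2f} MB")
```

For one 4096-wide layer the delta is roughly 0.26 MB against 67 MB for the full matrix, which is why whole-model adapter sets fit under 100 MB.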

Speaker 2:

So now let's explore one of the most powerful capabilities of these agents: multimodal reasoning and tool use. Multimodal means the agent understands more than text. It can take in images via encoders like SAM or MobileNet variants, and audio using whisper.cpp for speech-to-text. It can also read sensor values. Multimodal inputs make agents aware of the real world, not just what users say, but what's happening visually and physically around them. And then there's generative output.

Speaker 2:

These agents aren't just interpreting data, they're generating decisions: creating captions for images, like "tree fallen on the North Road"; summarizing events or commands, like "flood detected in sector so-and-so, rerouting this drone"; or responding verbally or visually via text-to-speech or a UI. And they do all this on-device. Then we have the tool-using behavior. This is what separates agents from assistants. Agents don't just respond, they act. Once they process the context and reason over it, they can call APIs locally or on a LAN, trigger scripts or shell commands, or actually control hardware, like actuating a motor, toggling a sensor or sending a LoRa signal. This turns the agent from passive to active. As for the demo itself, if we have time, let's go there.

Speaker 3:

Thank you... I don't know, we can't hear you anymore. I think you have some network issues, because you kept going silent and now you've dropped off. No, we don't hear anything. Yeah, sorry everyone; as Evgeny mentioned, we are live and that just happened. Okay, I'll bring you back in on the other connection, okay.

Speaker 2:

Can you hear me now?

Speaker 3:

Yes, yes, we can. Okay, we'll go with that. That's okay.

Speaker 2:

We've talked about input and perception, output and generation, and planning. Now we reach a critical capability that makes edge agents truly intelligent over time, and that's memory. When we say memory, we don't just mean saving data. We are talking about contextual memory that lets agents adapt to previous interactions, personalization that reflects the user's environment or situation, and retrieval-based reasoning that enriches generation using stored knowledge. Let's unpack this, starting with local vector databases. To store memory, edge agents use vector databases that live on-device; like I said, FAISS, in its CPU build, is a good example. You can say "summarize previous visits by patient so-and-so"; the agent queries its FAISS index and returns records, notes or even audio snippets. This is how agents build contextual intelligence. Then there's retrieval-augmented generation. We pair vector memory with a local LLM using RAG, which stands for retrieval-augmented generation. Tools like LlamaIndex and GPTCache make it possible to pull relevant memory chunks, feed them into a prompt and generate a personalized, up-to-date response. This bridges the gap between static LLM knowledge and dynamic, user-specific context. And then we have personalization: with memory and RAG in place, agents can remember a user's name, history and preferences, maintain tone or style consistency, and track repeated questions or issues. This enables use cases like a retail agent that recalls a shopper's previous interests. And some of the light embedding models, like MiniLM and E5-small, are small, fast and highly accurate, perfect for edge inference. All of them can run using sentence-transformers or even in llama.cpp-style environments with slight tuning. In short, this memory layer is what separates a generic chatbot from a true agent.
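
The "pull relevant memory chunks, feed them into a prompt" step of RAG can be sketched directly. The retrieved chunks below are made-up clinical notes, and the assembled prompt would then go to a local model via Ollama or llama.cpp:

```python
# Sketch of the RAG prompt-assembly step: splice retrieved memory
# chunks into the prompt before calling the local model. The chunks
# here are invented examples; a real system would fetch them from a
# FAISS (or similar) index.

def assemble_rag_prompt(question: str, retrieved: list) -> str:
    context = "\n".join(f"- {chunk}" for chunk in retrieved)
    return (
        "Use only the context below to answer.\n"
        f"Context:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )

chunks = ["Visit 2024-03-02: mild fever, given paracetamol 400 mg",
          "Visit 2024-03-09: cough resolved, no fever"]
prompt = assemble_rag_prompt("Summarize previous visits for this patient.",
                             chunks)
print(prompt)
```
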
And then, to really focus on these agents: they're not just generating, they need to plan, sequence and act across multiple stages. So let's explore three agent planning frameworks that are incredibly relevant here.

Speaker 2:

LangGraph, again, offers DAG-based planning. It lets you build agent flows as a graph, where each node represents a step: memory retrieval, planning, a tool call, validation. You define the edges that describe how state flows from one node to the next. You can loop, retry, verify or even branch based on outcomes. This is ideal for multi-step tasks: an agent might first retrieve prior context from a vector database, then plan a response, then call a tool or an API, then verify the result. Because LangGraph is locally executable, it fits edge workflows perfectly. Similarly AutoGen, which we spoke about, and then Crew AI, which provides role-based multi-agent teams. Crew AI takes it further by letting you build a team of agents, each with its own speciality: a navigator that plans the sequence, a verifier that checks the constraints, a responder that interacts with the user. This is powerful for multi-agent orchestration at the edge, especially when agents need to handle different modalities or responsibilities. You can use this even on a Jetson device by assigning roles to smaller models or tasks. We have seen how powerful and autonomous these edge agents can become, but this autonomy comes with responsibility, especially when agents are making decisions locally, generating content and interacting with the real world. So now let's talk about governance, auditing and control for edge-deployed agents.
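The node-and-edge pattern described above can be shown without any framework at all. This is not the LangGraph API, just a plain-Python sketch of the idea it encodes: nodes mutate shared state and return the name of the next node, so a node can branch or loop based on its outcome. The step names and the "inspect station 3" action are invented for illustration:

```python
# Plain-Python sketch of DAG-style agent planning: each node is a step
# that updates shared state and names the next node (or None to stop).
# Node names and actions are illustrative, not from any real framework.

def retrieve(state):
    state["context"] = "prior defect reports"
    return "plan"

def plan(state):
    state["action"] = "inspect station 3"
    return "verify"

def verify(state):
    state["verified"] = True
    return "act" if state["verified"] else "plan"   # branch / retry loop

def act(state):
    state["done"] = True
    return None                                      # terminal node

NODES = {"retrieve": retrieve, "plan": plan, "verify": verify, "act": act}

def run_graph(start: str) -> dict:
    state, node = {}, start
    while node is not None:          # follow edges until a terminal node
        node = NODES[node](state)
    return state

result = run_graph("retrieve")
print(result["done"])  # True
```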

Speaker 2:

Encrypted local stores: all local memory, especially user data or logs, must be stored securely. Use encrypted file systems or abstractions like OpenDAL, the Open Data Access Layer. This helps agents store memory, retrieve data or keep action logs in a way that is tamper-resistant and private. Edge does not mean insecure; in fact, local encryption is often more trustworthy than sending data over the network. Then tool-call whitelisting and scoping: since agents can call tools like scripts, APIs and hardware triggers, we need strict control. Define the scope of what they are allowed to do, and use whitelisting to restrict which tools are callable, and under which conditions. For example, a nurse assistant can generate reports but not modify medications. You want functional autonomy, not free-for-all behavior. And then we have audit logs: every agent decision, tool call and response should be logged, not just for debugging but for accountability and compliance. These logs should be structured, timestamped and stored locally, and optionally synced to secure storage. This enables auditability in regulated industries like healthcare, defense and finance.
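Tool whitelisting plus an audit trail fits in a few lines. A minimal sketch of the nurse-assistant example above; the tool names and the allowlist are invented, and a real deployment would also encrypt the log and scope allowlists per agent:

```python
# Minimal tool allowlist with an audit trail: the agent may only invoke
# whitelisted tools, and every attempt (allowed or denied) is logged
# with a timestamp. Tool names here are illustrative.
import datetime

AUDIT_LOG = []
ALLOWED_TOOLS = {"generate_report", "read_vitals"}   # scoped allowlist

def call_tool(agent: str, tool: str):
    allowed = tool in ALLOWED_TOOLS
    AUDIT_LOG.append({
        "ts": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "agent": agent, "tool": tool, "allowed": allowed,
    })
    if not allowed:
        raise PermissionError(f"{agent} may not call {tool}")
    return f"{tool} executed"        # dispatch to the real tool here

print(call_tool("nurse_assistant", "generate_report"))
try:
    call_tool("nurse_assistant", "modify_medication")
except PermissionError as e:
    print("denied:", e)
```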

Speaker 2:

And then the last one is hardware-backed trust. Modern edge devices often come with hardware-backed security: a trusted platform module, Arm TrustZone, encrypted SD cards or secure boot environments. Use these to verify that the agent runtime hasn't been tampered with, that the models are signed, and that memory is not leaked during execution. This is especially important for mission-critical or safety-sensitive agents. And then there are emerging protocols. We spoke about MCP and A2A as standard, secure ways to communicate, and there's also Cisco's ANP, an AI network protocol for secure agent-to-agent communications, and IBM's work on FHE, fully homomorphic encryption, for private model inference even on edge hardware. These are building the secure fabric needed for decentralized AI systems.

Speaker 2:

And then let's take a step back now and look at the patterns, the common blueprints of how edge agents are deployed in the real world. These are repeatable designs that show up again and again. By no means are these comprehensive; these are just some of the patterns.

Speaker 2:

On-demand agents: these agents are user-triggered. Think of a retail kiosk, an airport translator or a field assistant. They wake up when the user interacts, maybe a button press, a voice command or even a gesture. They load models, respond intelligently and shut down or idle when done. These are energy-efficient and suited to public-facing or battery-limited environments. And then we have the event-driven loops.

Speaker 2:

Here the agent is always listening to a sensor, maybe monitoring temperature, motion, light or audio, and when a threshold is crossed, like a spike in noise or an object detected, the agent is triggered. It then runs a pipeline: analyze, detect and act. You see this in smart factories, security cameras or environmental sensors. And then we have tool-enhanced reasoning.
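The event-driven loop in miniature: watch a stream of readings and only fire the analyze-and-act pipeline when a threshold is crossed. The decibel readings and the 80 dB threshold are made-up values for illustration:

```python
# Event-driven agent loop in miniature: stay idle below the threshold,
# run the analyze -> act pipeline only on a triggering reading.
# Readings and thresholds are invented example values.

NOISE_THRESHOLD_DB = 80.0

def analyze(reading: float) -> str:
    return "loud_event" if reading > 95 else "moderate_event"

def handle_stream(readings: list) -> list:
    actions = []
    for db in readings:
        if db < NOISE_THRESHOLD_DB:      # below threshold: no wake-up
            continue
        event = analyze(db)              # triggered pipeline
        actions.append(f"alert:{event}")
    return actions

print(handle_stream([42.0, 55.1, 97.3, 61.0, 83.2]))
# -> ['alert:loud_event', 'alert:moderate_event']
```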

Speaker 2:

Some agents are not just responders; they are planners with capabilities. They use local tools like maybe a calculator, GPS, a SQL function, a file system or external hardware. They reason over context and decide when to use a tool, like uploading a report, opening a gate or sending a local message. This is true autonomy, where the agent thinks through actions and executes them with local resources. And then swarm coordination.

Speaker 2:

This pattern is emerging. Think of multiple drones, robots or devices that each run their own local agent. They share sensor data, learn from each other or coordinate tasks, all among themselves. You'll see this in agriculture, wildlife monitoring, disaster response or even military field intelligence. Swarm agents act like distributed teams, adapting to conditions, collaborating and improving what they're doing. These patterns are not just theoretical; they're being deployed right now, and they help us design systems that are resilient to failure, fast to respond and flexible to deploy anywhere. How are we doing with the time? Is there time to switch to a live demo, or should I go with the slides showing screenshots?

Speaker 1:

You actually exceeded the time by 10 minutes, yeah.

Speaker 2:

Okay, I'll take one minute. This is actually three different agents taking the input, and here's what these agents are doing. I had multiple agents here: you can see for the scout agent I have a LLaVA model and a YOLO model, for the planner agent I had a Phi model, and for the communicator agent I have a Mistral model. When I upload an image it actually generates a report like this. This is the image I have uploaded, and each agent sends its information to the next agent. You can see it identifies the whole image and what's there; this is the output from the YOLO. Then it identifies the whole scene with LLaVA, it plans what has to be done from the output of the scout agent, and the last one, the communicator agent, generates a report to publish to everyone, communicating what is happening. Sorry, I don't have time.

Speaker 1:

Thank you. Thank you, Aruna, for the very informative talk. It was really interesting, this deep dive into the architecture and optimization of edge-deployed AI agents, and also the demos, which were really interesting. For the sake of time, I'm going to pick two questions from the audience. One question from George Williams: since agents are based on large or small language models, are they also vulnerable to hallucinations, and are there ways to quantify this in agentic deployments? What are the best practices to reduce hallucinations at the edge, especially on resource-constrained hardware?

Speaker 2:

Yeah, so irrespective of whether it's at the edge or in the data center, agents are usually domain-specific, right? So we are not going after a very broad range of questions. When you're asking them something, you have a set of data which you trained them on for a very domain-specific purpose. For example, when I used just the YOLO model for the demo, it was giving me a random answer, saying it's a boating experience even when there's a disaster. That's the reason I'm using it in combination with a LLaVA model, which has the holistic context. So domain-specific models trained on specific data are one way to reduce hallucinations: give them the data they are responding about. I expect there to be fewer hallucinations when domain-specific models respond just to those questions.

Speaker 1:

The last question is by Vikash Kodati, regarding the vision-to-action slide: flood detection and rerouting, are they already being offered as a service from Dell?

Speaker 2:

It's not a service from Dell. Dell actually offers multiple endpoints, like edge devices and ruggedized hardware to deploy in disaster-like situations or anywhere, maybe on a truck or something. We have had many use cases where these things were actually deployed on a truck: for disaster response, for defense, for agriculture, all of these were deployed on trucks in ruggedized environments. And also the latest versions of the laptops have quite a lot of GPU memory, and all the models I was showing in that demo were running on one laptop, and they're not very small, heavily optimized models. I was running Mistral, YOLO, LLaVA and Phi, all four of them, on the same laptop at the same time. So they're quite capable. The models are becoming more powerful and the devices are becoming larger.