<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Ensigncode Technical Blog</title><description>Practical CUDA, GPU optimization, inference, and AI engineering insights from the Ensigncode engineering team.</description><link>https://ensigncode.com/</link><language>en-us</language><copyright>Copyright 2026 Ensigncode</copyright><item><title>What Is CUDA and Why Should You Care? A Plain-English Primer</title><link>https://ensigncode.com/blog/what-is-cuda-plain-english-primer/</link><guid isPermaLink="true">https://ensigncode.com/blog/what-is-cuda-plain-english-primer/</guid><description>A jargon-free explanation of what CUDA is and why parallel GPU computing matters for modern workloads.</description><pubDate>Tue, 20 Jan 2026 00:00:00 GMT</pubDate><category>CUDA Basics</category><category>CUDA</category><category>Basics</category><category>GPU</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>Why Your AI Model Is Wasting GPU Memory (And How to Fix It)</title><link>https://ensigncode.com/blog/why-your-ai-model-is-wasting-gpu-memory/</link><guid isPermaLink="true">https://ensigncode.com/blog/why-your-ai-model-is-wasting-gpu-memory/</guid><description>The usual culprits behind wasted GPU memory in AI serving and the fixes that reclaim it.</description><pubDate>Wed, 04 Jun 2025 00:00:00 GMT</pubDate><category>GPU Optimisation</category><category>GPU</category><category>Memory</category><category>Optimization</category><category>LLM</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>Stop AI Overthinking: Controlling Inference Compute at Runtime</title><link>https://ensigncode.com/blog/stop-ai-overthinking-controlling-inference-compute/</link><guid isPermaLink="true">https://ensigncode.com/blog/stop-ai-overthinking-controlling-inference-compute/</guid><description>How to stop reasoning models from wasting compute on easy inputs using token budgets and adaptive reasoning.</description><pubDate>Wed, 14 May 2025 00:00:00 GMT</pubDate><category>Inference Optimisation</category><category>Inference</category><category>Cost</category><category>Reasoning</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>Real-Time AI Thinking: Changing Model Behaviour Mid-Inference</title><link>https://ensigncode.com/blog/real-time-ai-thinking-changing-model-behaviour-mid-inference/</link><guid isPermaLink="true">https://ensigncode.com/blog/real-time-ai-thinking-changing-model-behaviour-mid-inference/</guid><description>Techniques to steer an LLM while it generates, from logit control to dynamic stopping, for adaptive real-time output.</description><pubDate>Tue, 08 Apr 2025 00:00:00 GMT</pubDate><category>Advanced Inference</category><category>Inference</category><category>LLM</category><category>Control</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>GPU Rendering Optimisation: The Engineering Playbook</title><link>https://ensigncode.com/blog/gpu-rendering-optimisation-engineering-playbook/</link><guid isPermaLink="true">https://ensigncode.com/blog/gpu-rendering-optimisation-engineering-playbook/</guid><description>Practical techniques to cut frame time in GPU rendering: batching, culling, and pass-level profiling.</description><pubDate>Wed, 19 Mar 2025 00:00:00 GMT</pubDate><category>GPU Rendering</category><category>Rendering</category><category>GPU</category><category>Performance</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>Spend X, Save Y: The GPU Investment ROI for ML Teams</title><link>https://ensigncode.com/blog/gpu-investment-roi-calculator-ml-teams/</link><guid isPermaLink="true">https://ensigncode.com/blog/gpu-investment-roi-calculator-ml-teams/</guid><description>A framework for deciding when spending on GPU optimization pays for itself against ongoing infrastructure costs.</description><pubDate>Thu, 27 Feb 2025 00:00:00 GMT</pubDate><category>Business Strategy</category><category>Business</category><category>GPU</category><category>ROI</category><category>Cost</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>CUDA Memory Management: From Basics to Production Patterns</title><link>https://ensigncode.com/blog/cuda-memory-management-basics-to-production/</link><guid isPermaLink="true">https://ensigncode.com/blog/cuda-memory-management-basics-to-production/</guid><description>Understand the CUDA memory hierarchy and the pooling and coalescing patterns that keep production kernels fast.</description><pubDate>Thu, 16 Jan 2025 00:00:00 GMT</pubDate><category>CUDA Engineering</category><category>CUDA</category><category>Memory</category><category>Optimization</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>Profiling CUDA Workloads: Finding the Real Bottleneck</title><link>https://ensigncode.com/blog/profiling-cuda-workloads-finding-the-real-bottleneck/</link><guid isPermaLink="true">https://ensigncode.com/blog/profiling-cuda-workloads-finding-the-real-bottleneck/</guid><description>Use Nsight to distinguish compute-bound, memory-bound, and CPU-stalled workloads before you optimize anything.</description><pubDate>Wed, 20 Nov 2024 00:00:00 GMT</pubDate><category>Performance Engineering</category><category>CUDA</category><category>Profiling</category><category>Nsight</category><category>Performance</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>Multi-GPU CUDA: Scaling Beyond One Card</title><link>https://ensigncode.com/blog/multi-gpu-cuda-scaling-beyond-one-card/</link><guid isPermaLink="true">https://ensigncode.com/blog/multi-gpu-cuda-scaling-beyond-one-card/</guid><description>A guide to scaling CUDA workloads across multiple GPUs without letting communication overhead eat your gains.</description><pubDate>Wed, 02 Oct 2024 00:00:00 GMT</pubDate><category>Distributed Computing</category><category>CUDA</category><category>Multi-GPU</category><category>NCCL</category><category>Scaling</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item><item><title>Building a Production CUDA Inference Pipeline: End-to-End Guide</title><link>https://ensigncode.com/blog/building-a-production-cuda-inference-pipeline/</link><guid isPermaLink="true">https://ensigncode.com/blog/building-a-production-cuda-inference-pipeline/</guid><description>How to take a model from a notebook to a production CUDA inference service with batching, concurrency, and low latency.</description><pubDate>Wed, 14 Aug 2024 00:00:00 GMT</pubDate><category>Production Engineering</category><category>CUDA</category><category>Inference</category><category>TensorRT</category><category>Production</category><author>info@ensigncode.com (Ensigncode Engineering)</author></item></channel></rss>