📝 See https://ojitha.blogspot.com.au for my lengthy articles.

Google Gemma 4 MoE (26B) on AMD Ryzen AI

May 10, 2026
Overview:
This technical log documents the installation and optimisation of the Google Gemma 4 Mixture-of-Experts (MoE) model on the MINISFORUM AI X1 Pro, a mini PC featuring the AMD Ryzen AI 9 HX 470 processor. The report details the challenges of running a large 26-billion-parameter model on a consumer-grade Unified Memory Architecture, focusing on critical RAM allocation and BIOS UMA adjustments. It explains how to resolve memory-mapping failures and hardware-specific OOM errors by bypassing standard Linux kernel overcommit limits and fine-tuning the vLLM and ROCm software stack. Performance comparisons highlight that while Ollama offers higher speeds for individual users, the vLLM backend provides superior efficiency for multi-user API environments. Ultimately, the guide provides a comprehensive resolution matrix and a definitive Docker configuration to achieve stable inference on this specific RDNA 3.5 hardware.
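To see why RAM allocation and the BIOS UMA split matter so much for a 26B model, a back-of-the-envelope estimate helps. The figures below are illustrative arithmetic only, not the measured Gemma 4 MoE footprint: KV cache, activations, and quantisation overhead all come on top.

```python
# Illustrative arithmetic only: weight memory for a 26B-parameter model
# under common quantisation levels. Excludes KV cache and runtime overhead.

PARAMS = 26e9  # 26 billion parameters

BYTES_PER_PARAM = {
    "FP16": 2.0,
    "Q8_0": 1.0,
    "Q4":   0.5,  # ~4 bits per weight, ignoring quantisation metadata
}

def weights_gib(params: float, bytes_per_param: float) -> float:
    """Weight memory in GiB for a given per-parameter width."""
    return params * bytes_per_param / 2**30

for name, bpp in BYTES_PER_PARAM.items():
    print(f"{name}: ~{weights_gib(PARAMS, bpp):.1f} GiB")
```

Even at 4-bit quantisation the weights alone land north of 12 GiB, which is why the default UMA carve-out on this class of hardware is too small and the BIOS adjustment becomes unavoidable.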
More…

Running Gemma 4 E4B on the AMD ROCm

May 5, 2026
Gemma 4 · vLLM · ROCm · Ryzen AI · NPU

This deep-dive shows how to run the Google DeepMind Gemma 4 E4B model — a 4.5B-effective dense network with Per-Layer Embeddings — on a Minisforum AI X1 Pro driven by the AMD Ryzen AI 9 HX 470, Radeon 890M iGPU and XDNA 2 NPU. It walks through the verified vLLM Docker recipe on ROCm 7.2, decomposes the hybrid sliding-window plus global attention that makes a 128K context fit on a 16GB-class memory budget, and shows where MIGraphX can offload an ONNX sidecar to the NPU. The result is a layered guide from architecture math through tuning, quantisation, and benchmarking.
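The effect of mixing sliding-window and global layers on KV-cache size can be sketched numerically. All hyperparameters below (layer split, window span, head counts) are placeholders for illustration, not Gemma's published configuration:

```python
# Illustrative KV-cache arithmetic: sliding-window layers only cache the
# last WINDOW tokens, so a long context costs far less than all-global
# attention. Hyperparameters are assumed placeholders, not Gemma's.

def kv_cache_gib(tokens: int, layers: int, kv_heads: int,
                 head_dim: int, bytes_per_elem: int = 2) -> float:
    """KV cache size in GiB: K and V tensors per layer per cached token."""
    return 2 * layers * kv_heads * head_dim * tokens * bytes_per_elem / 2**30

CONTEXT = 128 * 1024            # 128K-token context
WINDOW = 4096                   # sliding-window span (assumed)
SW_LAYERS, GLOBAL_LAYERS = 28, 4  # assumed layer split

full = kv_cache_gib(CONTEXT, SW_LAYERS + GLOBAL_LAYERS, kv_heads=8, head_dim=128)
hybrid = (kv_cache_gib(WINDOW, SW_LAYERS, 8, 128)
          + kv_cache_gib(CONTEXT, GLOBAL_LAYERS, 8, 128))
print(f"all-global: {full:.2f} GiB, hybrid: {hybrid:.2f} GiB")
```

With these assumed numbers, all-global attention would need a 16 GiB KV cache for 128K tokens, while the hybrid scheme needs under 2.5 GiB — the kind of reduction that makes long context viable on a 16GB-class budget.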

More…

LLM Wiki

April 8, 2026
The Andrej Karpathy Obsidian method refers to his publicly shared system for using LLMs to build and maintain personal knowledge bases as interlinked Markdown wikis, all viewed and navigated through Obsidian. He calls the resulting system the LLM Wiki, or LLM Knowledge Base.
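Obsidian wires notes together with `[[wikilink]]` syntax, which is what makes such a wiki navigable as a graph. A minimal sketch of extracting that link graph from a vault of Markdown files (the regex and function are my own illustration, not part of Karpathy's published setup):

```python
# Sketch: build a note-to-note link graph from an Obsidian-style vault.
# Handles [[Note]], [[Note|alias]], and [[Note#heading]] forms.
import re
from pathlib import Path

WIKILINK = re.compile(r"\[\[([^\]|#]+)")

def link_graph(vault: Path) -> dict[str, set[str]]:
    """Map each note name (file stem) to the set of notes it links to."""
    graph = {}
    for md in vault.rglob("*.md"):
        text = md.read_text(encoding="utf-8")
        graph[md.stem] = {m.strip() for m in WIKILINK.findall(text)}
    return graph
```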
More…

Running AMD ROCm AI Workloads locally

March 7, 2026
This guide demonstrates how to run AMD ROCm AI workloads on the MINISFORUM AI X1 Pro-470 mini PC powered by the AMD Ryzen AI 9 HX 470 (12-core Zen5, up to 5.2GHz), featuring the integrated Radeon 890M (gfx1150) GPU and an 86 TOPS NPU. Running Ubuntu with the OEM kernel, the setup includes installing ROCm 7.2, verifying HSA agents with rocminfo, and deploying PyTorch in Docker containers. It also details configuring MIGraphX and ONNX Runtime with the MIGraphX Execution Provider via Docker Compose, enabling high-performance on-device ML inference — fully local, no discrete GPU required.
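For containerised ROCm work, the Compose service generally needs the GPU device nodes and group memberships passed through. A sketch along these lines is typical (the image tag is illustrative; pin one matching your ROCm release, and note the gfx-override value is hardware-specific):

```yaml
# Illustrative docker-compose.yml sketch for a ROCm PyTorch container.
services:
  pytorch-rocm:
    image: rocm/pytorch:latest   # pin a tag matching your ROCm release
    devices:
      - /dev/kfd                 # ROCm compute interface
      - /dev/dri                 # GPU render nodes
    group_add:
      - video
      - render
    ipc: host
    security_opt:
      - seccomp=unconfined
    # environment:
    #   HSA_OVERRIDE_GFX_VERSION: "11.5.0"  # sometimes needed for iGPUs;
    #                                       # value depends on your hardware
```

Inside the container, `rocminfo` listing the gfx1150 agent confirms the device passthrough worked before moving on to PyTorch or MIGraphX.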
More…

Kubernetes Introduction

January 3, 2026
This post provides a practical introduction to Kubernetes, focusing on essential networking concepts and hands-on debugging techniques within a Linux-based cluster. It guides readers through inspecting node roles, understanding Container Network Interfaces (CNIs) such as Flannel, and analysing overlay networks using tools such as `ip route` and `crictl`. The tutorial examines kube-proxy modes, service discovery via CoreDNS, and the deployment of multi-tier applications using Redis and Nginx. Furthermore, it demonstrates how to expose services using NodePort, scale deployments for high availability, and utilise `kubectl exec` and `kubectl cp` for effective pod interaction and troubleshooting.
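As a taste of the NodePort exposure step, a minimal Service manifest looks like this (names, labels, and port numbers are placeholders, not the tutorial's exact values):

```yaml
# Minimal NodePort Service sketch; names and ports are placeholders.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app: nginx        # must match the Deployment's pod labels
  ports:
    - port: 80        # cluster-internal service port
      targetPort: 80  # container port in the pod
      nodePort: 30080 # reachable on every node's IP (30000-32767 range)
```

Once applied, the service answers on `<any-node-ip>:30080`, which is what makes NodePort handy for quick external access in a lab cluster.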
More…