Accelerating AI development with Windows-based AI workstations - Windows Developer Blog


Jun 09, 2025

Today, we announced powerful capabilities for AI development with Windows AI Foundry, featuring components like Windows ML that enable developers to bring their own models and deploy them efficiently across a diverse silicon partner ecosystem, including AMD, Intel, NVIDIA and Qualcomm, spanning CPUs, GPUs and NPUs (Neural Processing Units). In the rapidly evolving field of AI development, robust hardware is just as important as reliable software and tools for optimizing performance and efficiency.
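Under the hood, Windows ML builds on ONNX Runtime, and the general pattern is straightforward: export your model to ONNX and let the runtime choose the best execution provider for the hardware at hand. The sketch below is a minimal illustration of that pattern using the onnxruntime Python package with the DirectML execution provider rather than the Windows ML API surface itself; the model file name and input shape are placeholders.

```python
# Minimal ONNX Runtime sketch of the bring-your-own-model pattern.
# Assumptions: the onnxruntime-directml package is installed, "model.onnx" is a
# placeholder for your exported model, and the input shape is illustrative.
# This uses ONNX Runtime directly, not the Windows ML API surface itself.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",
    # Prefer the DirectML execution provider (GPU/NPU via DirectX 12),
    # falling back to the CPU provider when no accelerator is available.
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

input_name = session.get_inputs()[0].name
dummy = np.random.rand(1, 3, 224, 224).astype(np.float32)  # placeholder input
outputs = session.run(None, {input_name: dummy})
print(outputs[0].shape)
```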

We heard from developers that they need powerful workstations with advanced CPUs, GPUs and specialized processors like NPUs, paired with blazing fast memory and storage for moving massive amounts of data, to handle the intensive computational demands of AI tasks such as data processing, model training, fine-tuning, inference and deployment. A robust Windows-based workstation combined with a reliable platform like Windows AI Foundry enables AI developers to streamline their workflows, reduce latency and achieve real-time processing capabilities.

Your AI stack, your machine: The case for Windows-based Workstations

For AI developers, a workstation is more than just a powerful machine; it’s a critical asset that enhances their ability to develop, test and deploy AI models efficiently and, more importantly, locally. It offers a few key benefits: privacy and security, cost efficiency, and speed and reliability.

Choosing a Windows-based AI workstation is especially important for Windows developers who require speed, flexibility and control in AI workloads. It is also ideal for leveraging AI Toolkit for Visual Studio Code and the full capabilities of Windows AI Foundry and its components, such as Foundry Local and Windows ML, which are optimized for local workflows.

AI developers on Windows 11 can choose from a wide array of PC hardware from OEM partners like Dell, HP, Lenovo and more. Not only are these systems optimized for AI model fine-tuning, inferencing and deployment, but they also provide flexibility in your developer workflow with different form factors and price points.

For developers requiring raw power, a desktop workstation with a powerful CPU and GPU can significantly reduce the time for local model fine-tuning. Those looking for physical space efficiency may consider a mini workstation, while developers on the move can benefit from a mobile workstation, which enables them to run inference on a proprietary model locally in the absence of a stable network connection.

At Build 2025, we featured the latest workstations from a few of our OEM partners, including the Dell Pro Max Tower T2, Dell Pro Max 16, HP ZBook Ultra G1a 14”, HP Z2 Mini G1a, Lenovo ThinkPad P14s Gen 6 and Lenovo ThinkPad P16s Gen 4.

Fine-Tuning on the Dell Pro Max Tower T2 with Intel Core Ultra 7 processor and the NVIDIA RTX PRO 6000 Blackwell GPU

Local fine-tuning on a workstation improves developer productivity by offering a quick, powerful and easy way to run and debug your prototypes locally before you take your final workflows to the cloud.

With the Dell Pro Max Tower T2, we fine-tuned Microsoft’s Phi-4-mini model on the NVIDIA GPU using LoRA (low-rank adaptation) in AI Toolkit and the alpaca-cleaned dataset, which contains 51,760 prompts. With a batch size of 8 prompts, the system backpropagated 2.16 batches per second, finishing 3 full epochs of fine-tuning in about 2 hours and 15 minutes. The same workflow could take up to a day to complete in the cloud due to round-trip latency, with the additional costs of cloud billing.
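For a sense of what this workflow looks like in code, here is a minimal LoRA fine-tuning sketch built on the Hugging Face transformers, peft and datasets libraries. It illustrates the technique rather than the exact AI Toolkit recipe; the model id, prompt template and hyperparameters are assumptions.

```python
# Minimal LoRA fine-tuning sketch, assuming the Hugging Face transformers, peft
# and datasets packages. The model id, hyperparameters and prompt template are
# illustrative -- not the exact AI Toolkit configuration used in the demo.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "microsoft/Phi-4-mini-instruct"  # assumed Hugging Face model id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="cuda")

# Attach low-rank adapters; only these small matrices are trained.
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"]))

# alpaca-cleaned: ~51,760 instruction/response pairs.
dataset = load_dataset("yahma/alpaca-cleaned", split="train")

def tokenize(example):
    prompt = example["instruction"] + ("\n" + example["input"] if example["input"] else "")
    text = f"### Instruction:\n{prompt}\n\n### Response:\n{example['output']}"
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(per_device_train_batch_size=8,  # batch size of 8 prompts
                           num_train_epochs=3,             # 3 full epochs, as in the demo
                           learning_rate=2e-4, bf16=True,
                           output_dir="phi4-mini-lora"),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("phi4-mini-lora")  # saves only the small LoRA adapters
```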

Example 1: Fine-tuning the Phi-4-mini model locally on the NVIDIA RTX PRO 6000 GPU on a Dell Pro Max Tower T2 (sped up 150x)

AI Anywhere: Harnessing GPU Power on the go with the HP ZBook Ultra G1a 14”

The new HP ZBook Ultra G1a 14” powered by the AMD Ryzen AI Max+ PRO 395 is a Copilot+ PC that leverages the power of a CPU, GPU and NPU to maximize performance and resources in various workflows.

Example 2: Loading and running the 70B-parameter DeepSeek R1 model locally on the HP ZBook Ultra G1a 14”

In one use case, we used AI Toolkit for VS Code to load and run a large 70B-parameter DeepSeek R1 model locally – an impressive feat for a mobile workstation.
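AI Toolkit drives this scenario from its model catalog UI, so no code is required. For readers who prefer a scriptable equivalent, the sketch below shows the same idea with llama-cpp-python and a quantized GGUF build of the model; the file name and quantization level are assumptions, not what the demo used.

```python
# Minimal local-inference sketch, assuming the llama-cpp-python package and a
# quantized GGUF build of the model. The file name and quantization level are
# illustrative -- the Build demo used the AI Toolkit for VS Code model catalog.
from llama_cpp import Llama

llm = Llama(
    model_path="DeepSeek-R1-Distill-Llama-70B-Q4_K_M.gguf",  # assumed local file
    n_gpu_layers=-1,  # offload as many layers as fit to the GPU
    n_ctx=4096,       # context window
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize why local inference matters."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```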

Example 3: Running SDXL and Phi-4 Mini concurrently on the HP ZBook Ultra G1a 14” laptop

In another use case, we ran the Stable Diffusion XL (SDXL) model and the Phi-4 Mini CPU ONNX model at the same time. Image generation ran at ~3 to 6 iterations per second while the concurrent text generation task produced 7-17 tokens per second. Task Manager shows the GPU being used for maximum processing performance, and the NVMe SSD makes the system well suited to high-performance workloads like these.

To run both the image generation and text generation concurrently, you typically need a system with enough processing power, memory and storage. Although a desktop setup can meet these requirements, the HP ZBook Ultra G1a 14” laptop is capable of handling such tasks within a compact form factor for productivity on the go.
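As an illustration of how such a split might look in code, the sketch below runs SDXL on the GPU and a Phi-4 Mini text pipeline on the CPU in parallel threads using Hugging Face diffusers and transformers. The model ids and the CUDA device string are assumptions; the demo itself used the CPU ONNX build of Phi-4 Mini and the ZBook’s integrated AMD GPU.

```python
# Concurrency sketch: image generation on the GPU while text generation runs on
# the CPU. Assumes the diffusers and transformers packages; the model ids and the
# "cuda" device string are illustrative (the demo used the CPU ONNX build of
# Phi-4 Mini and the ZBook's AMD GPU, which this sketch stands in for).
from concurrent.futures import ThreadPoolExecutor
import torch
from diffusers import StableDiffusionXLPipeline
from transformers import pipeline

def generate_image():
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")  # keep the diffusion workload on the GPU
    return pipe("a watercolor skyline of Seattle at dusk").images[0]

def generate_text():
    chat = pipeline("text-generation",
                    model="microsoft/Phi-4-mini-instruct",  # assumed model id
                    device=-1)  # -1 pins the text workload to the CPU
    return chat("Draft release notes for a local AI demo.", max_new_tokens=200)

# Run both workloads at once; each releases the GIL while its backend computes,
# so the GPU and CPU stay busy in parallel.
with ThreadPoolExecutor(max_workers=2) as pool:
    image_future = pool.submit(generate_image)
    text_future = pool.submit(generate_text)
    image, text = image_future.result(), text_future.result()

image.save("skyline.png")
print(text[0]["generated_text"])
```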

By 2027, it is expected that 60% of PCs shipped will feature on-device AI capabilities2, pushing the boundaries of AI models running locally. Having the right workstation not only accelerates the development cycle but also ensures that AI models run across a wide breadth of CPUs, GPUs and NPUs. Understanding the breadth of available options and investing in the right workstation can significantly enhance developers’ ability to innovate and succeed in the competitive landscape of AI.

To drive innovation, both software and hardware must align. Not only do we encourage developers to check out Windows AI Foundry, but we also encourage developers to leverage the latest hardware in Windows-based AI workstations for the best developer experience.

1 The Dell Pro Max Tower T2 is the World’s fastest for single-threaded application performance – Based on internal analysis of competitors within the entry level workstation space, Lenovo P3 Ultra, HP Z2 G9 Mini, HP Z2 G1a Mini and Lenovo P3 Tiny. February 2025

2 Based on report by Canalys “Now and next for AI-capable PCs: Revolutionizing computing: AI PCs and the market outlook.” (January 2024)
