Choosing a local large language model (LLM) that suits a particular machine can be daunting. A new tool, whichllm, simplifies the process. It auto-detects a system's GPU, CPU, and RAM, then ranks compatible models from Hugging Face using real-world performance metrics.
In the ranking whichllm produces, the top pick is Qwen/Qwen3.6-27B, with a score of 92.8 and a throughput of 27 tokens per second (t/s). It earns the highest score while using fewer parameters than the runner-up. In second place is Qwen/Qwen3-32B. It is faster at 31 t/s, but it scores lower, at 83.0, because of its weaker benchmark results.
Although the 32B model also fits the hardware and runs faster, whichllm ranks by quality rather than by parameter count, so it places the 27B model first for this setup. Speed is reported alongside quality, but it does not override it. Newer-generation models often post better benchmark results despite being smaller, as the 27B model does here.
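The trade-off can be made concrete with a small sketch. It assumes, for illustration only, that candidates are sorted by their quality score, with throughput used solely as a tie-breaker. whichllm's actual scoring formula is not described here, and the `Candidate`, `rank`, and `seconds_for` names are hypothetical. The sketch also converts each throughput figure into the time needed for a 500-token reply.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float         # quality score as reported in the ranking
    tokens_per_s: float  # generation throughput in tokens per second


def rank(candidates):
    # Hypothetical rule: highest score first; throughput only breaks ties.
    return sorted(candidates, key=lambda c: (-c.score, -c.tokens_per_s))


def seconds_for(c, n_tokens):
    # Time to generate n_tokens at the candidate's throughput.
    return n_tokens / c.tokens_per_s


candidates = [
    Candidate("Qwen/Qwen3-32B", 83.0, 31.0),
    Candidate("Qwen/Qwen3.6-27B", 92.8, 27.0),
]

for c in rank(candidates):
    print(f"{c.name}: score {c.score}, {seconds_for(c, 500):.1f} s per 500 tokens")
# Qwen/Qwen3.6-27B: score 92.8, 18.5 s per 500 tokens
# Qwen/Qwen3-32B: score 83.0, 16.1 s per 500 tokens
```

With these figures, the 27B model ranks first. The 32B model would finish a 500-token reply only about 2.4 seconds sooner, which is a modest gain for a much lower score.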
Performance Insights
The scores explain why whichllm recommends the 27B model. Qwen/Qwen3-32B is larger, but it scores lower on benchmarks. This is exactly what the tool is for: helping users choose models that both fit their hardware and perform well.
Efficient Hardware Utilization
whichllm can also help with hardware planning. The tool supports several model formats, which gives users flexibility in deployment. Models can run in different modes, including CPU-only, so machines without a suitable GPU are covered too. An auto-pick feature selects the top-ranked model, so users who are unfamiliar with model specifications can still find a good fit for their setup.
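The following rough fit check shows how detected hardware can narrow the choice. It is a hypothetical sketch, not whichllm's code. It estimates weight memory as parameter count times bytes per weight, then applies an assumed 1.2× overhead for the KV cache and runtime buffers. Finally, it picks GPU, CPU-only, or neither. The function names, the overhead factor, and the example VRAM and RAM sizes are all assumptions.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    # Weights only: (params in billions) x (bits / 8) bytes per param = GB.
    return params_billion * bits_per_weight / 8


def pick_device(params_billion, bits_per_weight, vram_gb, ram_gb, overhead=1.2):
    # 'overhead' is a hypothetical fudge factor for KV cache and buffers.
    need = weight_gb(params_billion, bits_per_weight) * overhead
    if need <= vram_gb:
        return "gpu"            # whole model fits in GPU memory
    if need <= ram_gb:
        return "cpu"            # fall back to CPU-only mode using system RAM
    return "does not fit"       # this sketch ignores partial GPU offloading


print(pick_device(27, 4, vram_gb=24, ram_gb=64))   # gpu (about 16.2 GB needed)
print(pick_device(27, 4, vram_gb=8, ram_gb=32))    # cpu
print(pick_device(32, 16, vram_gb=24, ram_gb=64))  # does not fit (76.8 GB)
```

Quantization matters a great deal in this estimate. The same 32B model needs about 16 GB for its weights at 4 bits per weight, but about 64 GB at 16 bits.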
whichllm also provides a short Python snippet for running the chosen model. This lowers the barrier for developers who want to use LLMs without deep technical expertise.
Looking Ahead
As demand for efficient local AI grows, tools like whichllm can help bridge the gap between complex models and everyday users. whichllm ranks models by measured performance rather than by size alone, which supports better model selection and more informed infrastructure planning. As models keep evolving, these tools will likely need to refine how they assess and rank LLMs across hardware configurations.
For anyone running LLMs locally, whichllm offers a practical way to find the most effective model for their hardware.
Quick answers
What is whichllm?
Whichllm is a tool that auto-detects hardware specifications and ranks compatible local LLMs based on performance benchmarks.
How does whichllm rank models?
Models are ranked based on real-world performance metrics, including speed and quality, rather than just size.
Can whichllm run models on CPU only?
Yes, whichllm includes a CPU-only mode for systems without a usable GPU.