Ollama model llama-guard3:8b - Tricked

Mashiane

Expert
Licensed User
Longtime User
Let me try this out...

With Claude Code

B4X:
ollama launch claude --model nemotron-3-nano:30b-cloud

With GitHub Copilot

B4X:
ollama launch vscode --model nemotron-3-nano:30b-cloud
Alright, alright, alright... time is gonna tell

 

Daestrum

Expert
Licensed User
Longtime User
@Mashiane It does run nicely locally, even on really low-end hardware (no dedicated GPU): I get 13-17 tokens/sec via LM Studio. I have 64 GB of RAM, of which the iGPU grabs 30 GB. That is only slightly slower than the Qwen 3.6 35B model running locally.
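For anyone wanting to compare their own numbers against the tokens/sec figure above: throughput is simply the number of generated tokens divided by the generation time. Here is a minimal Python sketch of that arithmetic. The function names are hypothetical, and the sample values are made up for illustration, not measurements from this thread. The nanosecond variant is there because some runtimes report durations in nanoseconds (as far as I know, Ollama's API does this with its `eval_count` and `eval_duration` fields, but check the docs for your setup).

```python
def tokens_per_second(token_count: int, elapsed_seconds: float) -> float:
    """Return generation throughput in tokens per second."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return token_count / elapsed_seconds


def tokens_per_second_from_ns(token_count: int, elapsed_ns: int) -> float:
    """Same calculation for a duration reported in nanoseconds."""
    return tokens_per_second(token_count, elapsed_ns / 1e9)


if __name__ == "__main__":
    # Made-up numbers for illustration only: 300 tokens generated in 20 seconds.
    print(f"{tokens_per_second(300, 20.0):.1f} tokens/sec")
```

Measure only the generation phase if you can; including model load or prompt processing time will make the figure look lower than what the UI reports.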
 