<?xml version="1.0"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
	<channel>
		<title>Coolscript  - Recent changes [en]</title>
		<link>https://coolscript.net/index.php/Special:RecentChanges</link>
		<description>Track the most recent changes to the wiki in this feed.</description>
		<language>en</language>
		<generator>MediaWiki 1.40.1</generator>
		<lastBuildDate>Tue, 02 Jun 2026 20:04:31 GMT</lastBuildDate>
		<item>
			<title>Ollama Modelfile List</title>
			<link>https://coolscript.net/index.php?title=Ollama_Modelfile_List&amp;diff=1151&amp;oldid=1149</link>
			<guid isPermaLink="false">https://coolscript.net/index.php?title=Ollama_Modelfile_List&amp;diff=1151&amp;oldid=1149</guid>
			<description>&lt;p&gt;&lt;/p&gt;
&lt;a href=&quot;https://coolscript.net/index.php?title=Ollama_Modelfile_List&amp;amp;diff=1151&amp;amp;oldid=1149&quot;&gt;Show changes&lt;/a&gt;</description>
			<pubDate>Thu, 28 May 2026 11:31:59 GMT</pubDate>
			<dc:creator>Admin</dc:creator>
			<comments>https://coolscript.net/index.php/Talk:Ollama_Modelfile_List</comments>
		</item>
		<item>
			<title>Ollama Modelfile List</title>
			<link>https://coolscript.net/index.php?title=Ollama_Modelfile_List&amp;diff=1149&amp;oldid=0</link>
			<guid isPermaLink="false">https://coolscript.net/index.php?title=Ollama_Modelfile_List&amp;diff=1149&amp;oldid=0</guid>
			<description>&lt;p&gt;Created page with &amp;quot;&amp;#039;&amp;#039;&amp;#039;This is a local Ollama installation running on a powerful local GPU.&amp;#039;&amp;#039;&amp;#039;  __NOTOC__ This page serves as a comprehensive technical registry and capabilities guide for the locally installed Ollama Large Language Models (LLMs). It outlines their core architecture, performance profiles, memory usage, and safety alignment status.  == Local Hardware Performance Overview == The following estimates assume full GPU offloading utilizing a high-end local GPU layout. Splitting mod...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;#039;&amp;#039;&amp;#039;This is a local Ollama installation running on a powerful local GPU.&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
&lt;br /&gt;
__NOTOC__&lt;br /&gt;
This page serves as a comprehensive technical registry and capabilities guide for the locally installed Ollama Large Language Models (LLMs). It outlines their core architecture, performance profiles, memory usage, and safety alignment status.&lt;br /&gt;
&lt;br /&gt;
== Local Hardware Performance Overview ==&lt;br /&gt;
The following estimates assume full GPU offloading on a high-end local GPU. Offloading part of a model to system RAM will heavily degrade these numbers.&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikibase&amp;quot; style=&amp;quot;width:100%; border:1px solid #ccc; border-collapse:collapse; text-align:left;&amp;quot;&lt;br /&gt;
! Scope / Scale&lt;br /&gt;
! Parameter Range&lt;br /&gt;
! Avg. VRAM Footprint&lt;br /&gt;
! Speed (Tokens/sec)&lt;br /&gt;
! Time to First Token (TTFT)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;High-Capability&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
| 20B – 31B&lt;br /&gt;
| 14 GB – 24 GB&lt;br /&gt;
| 20 – 35 tok/s&lt;br /&gt;
| Moderate (~1.5s - 3.0s)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Mid-Range / Specialist&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
| 8B – 16B&lt;br /&gt;
| 5 GB – 14 GB&lt;br /&gt;
| 40 – 65 tok/s&lt;br /&gt;
| Low (~0.8s - 1.5s)&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Low-Resource / Edge&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
| 1B – 7B&lt;br /&gt;
| 1 GB – 6 GB&lt;br /&gt;
| 70 – 120+ tok/s&lt;br /&gt;
| Near-Instant (&amp;lt;0.5s)&lt;br /&gt;
|}&lt;br /&gt;
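&lt;br /&gt;
The two latency columns combine into a rough end-to-end figure: time to first token plus the output length divided by the generation speed. The sketch below is illustrative only. The function name, the 500-token reply length, and the use of row mid-points (with 0.5s taken as the upper bound for the edge tier) are assumptions, not measurements from this installation.&lt;br /&gt;

```python
def estimate_response_seconds(ttft_seconds, tokens_per_second, output_tokens):
    """Rough wall-clock time for one reply: time to first token plus decode time."""
    if tokens_per_second == 0:
        raise ValueError("tokens_per_second must be non-zero")
    return ttft_seconds + output_tokens / tokens_per_second


# (TTFT seconds, tokens/sec) taken as mid-points of the table rows above.
# These are illustrative values, not benchmarks.
tiers = {
    "High-Capability": (2.25, 27.5),
    "Mid-Range / Specialist": (1.15, 52.5),
    "Low-Resource / Edge": (0.5, 95.0),
}

for name, (ttft, tps) in tiers.items():
    seconds = estimate_response_seconds(ttft, tps, output_tokens=500)
    print(name, round(seconds, 1), "s for a 500-token reply")
```

For long replies the decode term dominates, so the tokens/sec column matters more than TTFT when comparing tiers.&lt;br /&gt;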
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== High-Capability &amp;amp; Large-Scale Models (20B - 31B) ==&lt;br /&gt;
&lt;br /&gt;
=== gemma-4-31b-uncensored ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A highly capable, large-scale model based on the Google Gemma 4 architecture, modified to bypass standard safety guardrails.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Complex reasoning, deep creative writing, philosophical exploration, and processing intricate multi-step prompts without refusal barriers.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~31 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~20.5 GB (Quantized)&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Generates ~22-26 tokens/sec. High reasoning overhead can cause a slightly delayed Time-To-First-Token (TTFT) of 2-3 seconds.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:green; font-weight:bold;&amp;quot;&amp;gt;[No] Uncensored / Abliterated&amp;lt;/span&amp;gt; – All standard safety alignment filters have been removed.&lt;br /&gt;
&lt;br /&gt;
=== qwen3-coder:30b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; Alibaba&amp;#039;s flagship open-weights coding model, optimized for enterprise-grade software engineering, architecture design, and complex debugging.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Writing full-stack applications, handling multi-file repository contexts, explaining complex algorithms, and optimizing legacy code.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~30 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~19.8 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Achieves ~25-30 tokens/sec. Processing long context inputs (codebases) will cause pre-fill latency to scale upward.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Standard alignment remains active to prevent malicious code generation (e.g., malware design).&lt;br /&gt;
&lt;br /&gt;
=== qwen3.6:27b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A premium, heavy-duty generalist model designed for advanced reasoning, mathematical logic, translation, and analytical workflows.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; High-fidelity data analysis, complex summarizing, multi-language translation, and acting as a central orchestration agent.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~27 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~18.2 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; ~28-33 tokens/sec at a consistent, stable generation pace.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Adheres to strict ethical, helpful, and harmless boundaries.&lt;br /&gt;
&lt;br /&gt;
=== gemma4:26b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; The standard, fully-aligned iteration of Google&amp;#039;s 26B Gemma 4 model, balancing massive parameter depth with strict safety guardrails.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Enterprise deployments, academic research assistance, and standard corporate productivity tools where safety compliance is required.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~26 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~17.5 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; ~30 tokens/sec. The safety alignment is trained into the weights, so it adds no separate step or latency at inference time.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Strictly Censored&amp;lt;/span&amp;gt; – Contains default Google RLHF guardrails against sensitive, harmful, or controversial topics.&lt;br /&gt;
&lt;br /&gt;
=== gpt-oss:20b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; OpenAI&amp;#039;s open-weight generalist model, built for GPT-style interactions across diverse task types.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; General brainstorming, text transformation, data cleaning, and everyday agentic tasks.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~20 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~14.0 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Highly responsive at ~35 tokens/sec with low overall processing latency.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:orange; font-weight:bold;&amp;quot;&amp;gt;[Partial] Lightly Censored&amp;lt;/span&amp;gt; – Usually features basic ethical boundaries but is significantly more permissive than strict corporate models.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Mid-Range &amp;amp; Specialist Models (8B - 16B) ==&lt;br /&gt;
&lt;br /&gt;
=== hf.co/mradermacher/Mistral-Nemo-Instruct-2407-abliterated-i1-GGUF:Q5_K_M ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A 12B parameter model combining Mistral&amp;#039;s architecture with &amp;quot;abliteration&amp;quot;, a weight-editing technique that suppresses the model&amp;#039;s refusal behavior.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Unrestricted academic research, writing intense fiction, roleplay, and analyzing controversial text datasets.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 12.2 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~8.9 GB (Q5_K_M quantization)&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Swift ~45-52 tokens/sec response generation with a near-instant first token.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:green; font-weight:bold;&amp;quot;&amp;gt;[No] Uncensored / Abliterated&amp;lt;/span&amp;gt; – Stripped of refusal behaviors through abliteration while retaining structural coherence.&lt;br /&gt;
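&lt;br /&gt;
The VRAM figure for a quantized model can be sanity-checked from the parameter count: the weights take roughly parameters times bits per weight, divided by eight. This is a minimal sketch. The function name is hypothetical, and 5.7 bits per weight is an assumed average for Q5_K_M that varies from model to model.&lt;br /&gt;

```python
def quantized_weight_gb(parameters, bits_per_weight):
    """Approximate size of the quantized weights in decimal gigabytes."""
    return parameters * bits_per_weight / 8 / 1e9


# 12.2e9 parameters comes from the entry above; 5.7 bits per weight is an
# assumed average for Q5_K_M, not a figure from this page.
print(round(quantized_weight_gb(12.2e9, 5.7), 1), "GB of weights")
```

The estimate covers the weights only. The KV cache and runtime buffers come on top, which would plausibly account for the gap to the ~8.9 GB listed above.&lt;br /&gt;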
&lt;br /&gt;
=== hf.co/MaziyarPanahi/Mistral-Nemo-Instruct-2407-GGUF:Q5_K_M ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A community GGUF quantization of the official instruction-tuned Mistral-Nemo 12B model.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; General instruction following, structured text generation, multi-lingual translation, and logic puzzles.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 12.2 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~8.9 GB (Q5_K_M quantization)&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Highly predictable ~45-52 tokens/sec generation speed with negligible pre-fill lag.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Retains default Mistral safety alignments.&lt;br /&gt;
&lt;br /&gt;
=== MFDoom/deepseek-coder-v2-tool-calling:16b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A custom Mixture-of-Experts (MoE) fine-tune specifically optimized to act as an agentic backend capable of executing external function calls.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; AI agents, local tool integration, and automated pipeline execution.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 16 Billion Total (uses ~2.4B active parameters per token)&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~11.2 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Fast inference thanks to the MoE architecture, averaging ~55-65 tokens/sec, with quick prompt pre-fill.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Aligned to prevent execution of destructive or malicious local commands.&lt;br /&gt;
&lt;br /&gt;
=== deepseek-coder-v2:16b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; The foundational Mixture-of-Experts (MoE) coding model from DeepSeek, prized for high-efficiency programming generation.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Continuous inline code completion, rapid prototyping, code refactoring, and general script writing.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 16 Billion Total&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~11.2 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; ~55-65 tokens/sec. Typically starts streaming in under a second.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Standard guardrails against generating malware or exploits.&lt;br /&gt;
&lt;br /&gt;
=== qwen3:14b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; The standard mid-tier iteration of the Qwen 3 general-purpose intelligence framework.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Summarizing lengthy documents, conversational search support, copy editing, and medium-complexity script writing.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~14 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~10.1 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Runs fluidly at ~42-48 tokens/sec. TTFT grows moderately with long prompts.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Fully aligned with default safety training.&lt;br /&gt;
&lt;br /&gt;
=== qwen2.5-coder:14b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; The established, highly mature 14B coding engine from the Qwen 2.5 generation.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Stable software development environments requiring predictable, reliable syntax generation.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 14.8 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~10.5 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Stable ~42-48 tokens/sec. Stays consistent across long code blocks.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Features standard safety controls.&lt;br /&gt;
&lt;br /&gt;
=== gemma3:12b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A legacy mid-tier general model from Google’s third-generation open weights release.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Everyday office automation tasks, document formatting, and general QA datasets.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 12.2 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~8.8 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; ~45 tokens/sec. Low baseline generation latency.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Governed by standard Google safety alignment.&lt;br /&gt;
&lt;br /&gt;
=== gemma4:e4b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; An experimental or early-quantized/preview variant of the Gemma 4 framework optimized for edge environments.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Comparing structural generation changes between Gemma versions or running low-overhead general tasks.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~12 Billion Equivalent&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~7.9 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Generates ~50 tokens/sec with a fast start-up.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Standard guardrails apply.&lt;br /&gt;
&lt;br /&gt;
=== gemma2:9b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; Google’s highly successful 9B parameter generalist model from the Gemma 2 era.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Low-resource conversational assistance, flashcard generation, and quick text summarization.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 9.2 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~6.8 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Sharp, snappy output reaching ~55 tokens/sec. Minimal processing delay.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Fully aligned.&lt;br /&gt;
&lt;br /&gt;
=== hf.co/NousResearch/Hermes-3-Llama-3.1-8B-GGUF:Q5_K_M ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A premium, creative fine-tune of Llama 3.1 8B by Nous Research, tailored for advanced roleplay, agentic steps, and complex instruction following.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Creative writing, world-building, intricate multi-turn roleplay, and agentic workflows.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 8.0 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~5.9 GB (Q5_K_M quantization)&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Highly responsive at ~60 tokens/sec with a near-instant first token.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:orange; font-weight:bold;&amp;quot;&amp;gt;[Partial] Highly Permissive&amp;lt;/span&amp;gt; – While not aggressively abliterated, it is fine-tuned to be neutral, non-preachy, and almost entirely free of false-positive refusals.&lt;br /&gt;
&lt;br /&gt;
=== gemma4-8b-uncensored ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A modified 8B Gemma 4 base designed to offer modern reasoning power without any topic restriction.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Running unfiltered writing experiments or analysis tasks on low-spec hardware.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~8 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~5.5 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Snappy ~58-64 tokens/sec with no noticeable start-up delay.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:green; font-weight:bold;&amp;quot;&amp;gt;[No] Uncensored&amp;lt;/span&amp;gt; – Safety tuning bypassed.&lt;br /&gt;
&lt;br /&gt;
=== dolphin-2.9-8b:latest ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; Eric Hartford&amp;#039;s iconic Dolphin fine-tune applied to an 8B base, explicitly optimized to be helpful and compliant, with no built-in content filtering.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Unrestricted hacking/penetration testing research, unfiltered creative prose, and raw data transformations.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 8.0 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~5.3 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Fast ~60 tokens/sec stream output rate. Exceptionally low prompt parsing delays.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:green; font-weight:bold;&amp;quot;&amp;gt;[No] Uncensored&amp;lt;/span&amp;gt; – Fully uncensored by design.&lt;br /&gt;
&lt;br /&gt;
=== hf.co/Qwen/Qwen3-8B-GGUF:Q4_K_M ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; The lean, highly quantized 8B baseline of the Qwen 3 general series.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Basic chat utilities, lightweight translation scripts, and low-latency local assistants.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 8.2 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~5.2 GB (Q4_K_M quantization)&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Swift ~65 tokens/sec throughput with immediate streaming.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Standard default alignment.&lt;br /&gt;
&lt;br /&gt;
----&lt;br /&gt;
&lt;br /&gt;
== Low-Resource &amp;amp; Edge Models (1B - 7B) ==&lt;br /&gt;
&lt;br /&gt;
=== qwen2.5-coder:7b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A highly optimized 7B parameter programming specialist designed to run efficiently on standard laptops.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Real-time IDE integration, autocomplete loops, and small-scale scripting.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 7.6 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~4.8 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Very fast at ~75-85 tokens/sec. Well suited to inline code autocomplete.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Basic coding safety limits apply.&lt;br /&gt;
&lt;br /&gt;
=== qwen7b-32k:latest ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A specialized 7B variant configured specifically to ingest and remember massive text inputs up to a 32,000 token window.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Ingesting whole research papers, large source-code files, or extensive chat histories in a single prompt.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 7.2 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~4.7 GB base allocation. Filling the context window to 32k tokens makes the Key-Value (KV) cache grow significantly (up to an additional 4-8 GB of VRAM).&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; ~70 tokens/sec under standard usage. Ingesting long prompts can push TTFT up to several seconds.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Standard Qwen alignment.&lt;br /&gt;
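&lt;br /&gt;
The KV cache growth noted above follows from a simple product: two tensors (keys and values) per layer, per KV head, per token. The sketch below is illustrative only. The function name and every architecture value (32 layers, 8 KV heads, head dimension 128, 2-byte values) are hypothetical and are not taken from qwen7b-32k.&lt;br /&gt;

```python
def kv_cache_bytes(context_tokens, layers, kv_heads, head_dim, bytes_per_value=2):
    """Size of the KV cache: one key and one value vector per layer, KV head and token."""
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * context_tokens


# Hypothetical architecture values, NOT read from the qwen7b-32k model.
size = kv_cache_bytes(context_tokens=32_768, layers=32, kv_heads=8, head_dim=128)
print(size / 2**30, "GiB of KV cache at 32k tokens")
```

The real figure depends heavily on the number of KV heads and on the cache precision, so it should be recomputed from the actual model metadata.&lt;br /&gt;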
&lt;br /&gt;
=== llama3.2:latest ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; Meta&amp;#039;s highly popular, lightweight 3B generalist model designed for edge computing and mobile deployment.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Ultra-fast everyday text tasks, basic email formatting, and maintaining a low-footprint background desktop assistant.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 3.2 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~2.5 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Very high throughput, averaging ~100-120+ tokens/sec, with a near-instant first token.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Adheres strictly to Meta&amp;#039;s safety guidelines.&lt;br /&gt;
&lt;br /&gt;
=== qwen2.5-coder:3b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; An ultra-compact coding model tailored for low-resource hardware, suitable for embedded systems or background IDE plugins.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Quick syntax checks, simple function writing, and short utility scripts.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 3.1 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~2.4 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; ~110 tokens/sec. Output starts immediately with no perceptible lag.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Basic alignment active.&lt;br /&gt;
&lt;br /&gt;
=== qwen3b-high-ctx:latest ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A 3B parameter model optimized specifically for handling enlarged context lengths on constrained hardware.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Reading lengthy logs or documentation files on machines lacking dedicated GPU hardware.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; ~3 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~2.3 GB base (grows with the amount of context loaded).&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; ~95-105 tokens/sec under typical loads.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Standard alignment.&lt;br /&gt;
&lt;br /&gt;
=== deepseek-r1:1.5b ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; A tiny, distilled reasoning model featuring internal chain-of-thought (&amp;quot;thinking&amp;quot;) capabilities.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Basic logical problem solving, simple math validation, and testing deep reasoning architectures on extremely weak hardware.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 1.5 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~1.1 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; Very fast generation at up to ~130 tokens/sec. However, total latency is longer because the model generates internal thinking tokens before it produces the final answer.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:red;&amp;quot;&amp;gt;[Yes] Censored&amp;lt;/span&amp;gt; – Retains default alignment protocols during its thinking phase.&lt;br /&gt;
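&lt;br /&gt;
The effect of thinking tokens on perceived latency can be estimated by dividing the number of hidden tokens by the generation speed. A minimal sketch follows. The function name and the 650-token thinking budget are hypothetical, and the 130 tokens/sec figure is the peak rate quoted above.&lt;br /&gt;

```python
def seconds_until_answer(thinking_tokens, tokens_per_second, ttft_seconds=0.0):
    """Delay before the first visible answer token when the model thinks first."""
    return ttft_seconds + thinking_tokens / tokens_per_second


# 650 thinking tokens is a hypothetical figure; 130 tok/s is the peak rate
# quoted in the entry above.
print(seconds_until_answer(thinking_tokens=650, tokens_per_second=130), "s before the answer starts")
```

Even at this speed, a few hundred thinking tokens add seconds of delay that a non-reasoning model of the same size would not incur.&lt;br /&gt;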
&lt;br /&gt;
=== smollm2-uncensored:latest ===&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Purpose:&amp;#039;&amp;#039;&amp;#039; An ultra-compact 1.7B parameter model optimized for mobile or CPU-only setups, stripped of systemic safety refusals.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Best For:&amp;#039;&amp;#039;&amp;#039; Unfiltered basic text generation, edge-device testing, and quick offline note restructuring.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Resource Profile:&amp;#039;&amp;#039;&amp;#039;&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Parameters:&amp;#039;&amp;#039;&amp;#039; 1.7 Billion&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Memory/VRAM Usage:&amp;#039;&amp;#039;&amp;#039; ~1.3 GB&lt;br /&gt;
** &amp;#039;&amp;#039;&amp;#039;Performance &amp;amp; Latency:&amp;#039;&amp;#039;&amp;#039; The fastest configuration on this page, running at ~130+ tokens/sec with a near-instant first token.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Censorship Status:&amp;#039;&amp;#039;&amp;#039; &amp;lt;span style=&amp;quot;color:green; font-weight:bold;&amp;quot;&amp;gt;[No] Uncensored&amp;lt;/span&amp;gt; – Safety mechanisms fully removed.&lt;br /&gt;
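&lt;br /&gt;
The tokens/sec figures on this page can be checked against a local install. A non-streaming Ollama generate response reports eval_count, eval_duration and prompt_eval_duration, with durations in nanoseconds. The sketch below only post-processes such a response. The helper name and the sample values are hypothetical, and prompt_eval_duration is used as a stand-in for TTFT that excludes model load time.&lt;br /&gt;

```python
def throughput_from_response(resp):
    """Return (prefill_seconds, decode_tokens_per_second) from an Ollama generate response."""
    ns = 1e9  # Ollama reports durations in nanoseconds
    prefill_seconds = resp["prompt_eval_duration"] / ns
    decode_tps = resp["eval_count"] / (resp["eval_duration"] / ns)
    return prefill_seconds, decode_tps


# Hypothetical response fragment; the values are made up for illustration.
sample = {
    "prompt_eval_duration": 400_000_000,
    "eval_count": 300,
    "eval_duration": 5_000_000_000,
}
print(throughput_from_response(sample))
```

Averaging several runs with the model already loaded should give figures comparable to those listed for each entry.&lt;br /&gt;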
&lt;br /&gt;
[[Category:Local AI Models]]&lt;br /&gt;
[[Category:Ollama Infrastructure]]&lt;/div&gt;</description>
			<pubDate>Tue, 26 May 2026 20:37:42 GMT</pubDate>
			<dc:creator>Admin</dc:creator>
			<comments>https://coolscript.net/index.php/Talk:Ollama_Modelfile_List</comments>
		</item>
</channel></rss>