AI-powered programming assistant with local models
Russian version • Features • Tools • Tools Reference
Ollama Code is a CLI tool for AI-powered programming assistance using local Ollama models. The project provides full control over code and data, working completely offline.
Different models require different amounts of VRAM. Below is a guide for NVIDIA GPUs:
| Model Size | Min VRAM | Recommended GPU | Notes |
|---|---|---|---|
| 3B | 4 GB | RTX 3050, GTX 1660, RTX 5060 | Basic models, quantization recommended |
| 7B | 6 GB | RTX 3060, RTX 4060, RTX 5060 | Good balance of speed and quality |
| 8B | 8 GB | RTX 3070, RTX 4060 Ti, RTX 5070 | DeepSeek R1, Llama 3.1 |
| 14B | 12 GB | RTX 3080, RTX 4070, RTX 5070 Ti | Qwen2.5-Coder 14B |
| 30B | 20 GB | RTX 3090, RTX 4090, RTX 5080 | Qwen3-Coder 30B |
| 70B | 32 GB | RTX 5090, 2x RTX 3090 | DeepSeek R1 70B, Llama 3.1 70B |
| 120B+ | 48+ GB | 2x RTX 5090, A100 | Requires multi-GPU or cloud |
Performance tests conducted with standard tasks (code generation, refactoring, debugging):
| GPU | VRAM | Model | Quantization | Speed (tok/s) | Quality Score |
|---|---|---|---|---|---|
| RTX 3060 | 12 GB | llama3.2:3b | Q4_K_M | 45-55 | Good |
| RTX 3060 | 12 GB | qwen2.5-coder:7b | Q4_K_M | 28-35 | Very Good |
| RTX 3060 | 12 GB | deepseek-r1:8b | Q4_K_M | 22-28 | Excellent |
| RTX 3060 | 12 GB | qwen2.5-coder:14b | Q3_K_M | 12-18 | Excellent |
| RTX 3070 | 8 GB | llama3.2:3b | Q4_K_M | 55-65 | Good |
| RTX 3070 | 8 GB | qwen2.5-coder:7b | Q4_K_M | 35-42 | Very Good |
| RTX 3070 | 8 GB | deepseek-r1:8b | Q4_K_M | 28-35 | Excellent |
| RTX 3080 | 10 GB | qwen2.5-coder:7b | Q8_0 | 40-48 | Excellent |
| RTX 3080 | 10 GB | qwen2.5-coder:14b | Q4_K_M | 25-32 | Excellent |
| RTX 3080 | 10 GB | deepseek-r1:8b | Q8_0 | 32-40 | Excellent |
| RTX 3090 | 24 GB | qwen2.5-coder:14b | Q8_0 | 38-45 | Excellent |
| RTX 3090 | 24 GB | qwen3-coder:30b | Q4_K_M | 18-25 | Outstanding |
| RTX 3090 | 24 GB | deepseek-r1:32b | Q4_K_M | 12-18 | Outstanding |
| RTX 4070 | 12 GB | qwen2.5-coder:14b | Q5_K_M | 35-42 | Excellent |
| RTX 4070 Ti | 16 GB | qwen3-coder:30b | Q4_K_M | 22-28 | Outstanding |
| RTX 4080 | 16 GB | qwen3-coder:30b | Q5_K_M | 28-35 | Outstanding |
| RTX 4080 | 16 GB | deepseek-r1:32b | Q4_K_M | 20-28 | Outstanding |
| RTX 4090 | 24 GB | qwen3-coder:30b | Q8_0 | 45-55 | Outstanding |
| RTX 4090 | 24 GB | deepseek-r1:32b | Q5_K_M | 35-45 | Outstanding |
| RTX 4090 | 24 GB | deepseek-r1:70b | Q3_K_M | 8-12 | Exceptional |
| RTX 5070 | 12 GB | qwen2.5-coder:14b | Q6_K | 45-55 | Excellent |
| RTX 5070 Ti | 16 GB | qwen3-coder:30b | Q5_K_M | 35-45 | Outstanding |
| RTX 5080 | 16 GB | qwen3-coder:30b | Q6_K | 45-55 | Outstanding |
| RTX 5080 | 16 GB | deepseek-r1:32b | Q5_K_M | 38-48 | Outstanding |
| RTX 5090 | 32 GB | qwen3-coder:30b | FP16 | 80-100 | Exceptional |
| RTX 5090 | 32 GB | deepseek-r1:70b | Q4_K_M | 25-35 | Exceptional |
| RTX 5090 | 32 GB | llama3.1:70b | Q5_K_M | 30-40 | Exceptional |
Note: Speed varies based on context length, prompt complexity, and system configuration. Quality Score is subjective based on code generation accuracy and coherence. RTX 50 series shows significant performance improvements due to Blackwell architecture and GDDR7 memory.
| Quantization | Size Reduction | Quality Loss | Recommended For |
|---|---|---|---|
| Q4_K_M | ~70% | Minimal | Most use cases |
| Q5_K_M | ~65% | Very Low | Better quality |
| Q6_K | ~60% | Negligible | High quality needs |
| Q8_0 | ~50% | None | Maximum quality |
| FP16 | 0% | None | RTX 5090 with 32GB+ VRAM |
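As a rule of thumb, a quantized model's on-disk size can be estimated from its parameter count and the effective bits per weight of the quantization. The sketch below is an illustration only, not logic from this project; the bits-per-weight figures are approximate community values and vary by model, and `estimateModelSizeGB` is a hypothetical helper name.

```typescript
// Approximate effective bits per weight for common GGUF quantizations.
// These are rough community figures, not exact values.
const BITS_PER_WEIGHT: Record<string, number> = {
  Q3_K_M: 3.9,
  Q4_K_M: 4.8,
  Q5_K_M: 5.7,
  Q6_K: 6.6,
  Q8_0: 8.5,
  FP16: 16,
};

// Rough file-size estimate: params * bits / 8 bytes, in GB (1e9 bytes).
function estimateModelSizeGB(paramsBillions: number, quant: string): number {
  const bits = BITS_PER_WEIGHT[quant];
  if (bits === undefined) throw new Error(`Unknown quantization: ${quant}`);
  return (paramsBillions * 1e9 * bits) / 8 / 1e9;
}
```

For example, a 7B model at Q4_K_M comes out to roughly 4 GB, which matches the VRAM guidance in the tables above once you add room for the KV cache and context.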
```bash
# Clone the repository
git clone <repository-url>
cd ollama-code

# Install dependencies
npm install

# Build the project
npm run build
```

```bash
# Interactive mode
npm run start

# With specific model
npm run start -- --model llama3.2

# One-off query
npm run start -- "Explain how async/await works in JavaScript"

# Debug mode
npm run debug
```
Ollama Code now includes a full-featured web interface:
```bash
# Start Web UI (development)
cd packages/web-app
npm run dev

# Start with terminal support
npm run dev:server
```
Web UI Features:
| Tab | Features |
|---|---|
| Chat | Streaming responses, model selection, session management |
| Files | File browser, Monaco editor, syntax highlighting |
| Terminal | Full PTY terminal with xterm.js |
API Endpoints:
| Endpoint | Description |
|---|---|
| `/api/models` | List available Ollama models |
| `/api/chat` | Chat with streaming |
| `/api/generate` | Generate with streaming |
| `/api/fs` | Filesystem operations |
| `/terminal` | WebSocket terminal |
Full-featured web application with three main components:
| Component | Technology | Features |
|---|---|---|
| ChatInterface | React + Zustand | Streaming, model selection, session persistence |
| FileExplorer | Monaco Editor | Syntax highlighting, multi-language support, auto-save |
| TerminalEmulator | xterm.js + node-pty | Full PTY support, resize, 256 colors |
Terminal WebSocket Server:
Comprehensive API documentation for all packages:
```typescript
// SDK Usage
import { query, createSdkMcpServer, tool } from '@ollama-code/sdk';

const result = await query({
  prompt: 'Explain async/await',
  model: 'llama3.2',
});

// MCP Server
const myTool = tool({
  name: 'echo',
  description: 'Echo back a message',
  parameters: { message: { type: 'string' } },
  execute: async (params) => ({ echo: params.message }),
});
```
| Component | Description |
|---|---|
| PluginLoader | Discovery from builtin, user, project, npm sources |
| PluginManager | Lifecycle management with enable/disable hooks |
| PluginSandbox | Filesystem, network, command restrictions |
| PluginMarketplace | NPM-based search, install, update, uninstall |
Builtin Plugins (5):
- `core-tools` — echo, timestamp, get_env
- `dev-tools` — python_dev, nodejs_dev, golang_dev, rust_dev, typescript_dev
- `file-tools` — read_file, write_file, edit_file
- `search-tools` — grep, glob, web_fetch
- `shell-tools` — run_shell_command

| Model Size | Template | Prompt Size |
|---|---|---|
| <= 10B | 8b | ~500 tokens |
| <= 30B | 14b | ~800 tokens |
| <= 60B | 32b | ~1200 tokens |
| > 60B | 70b | ~1500 tokens |
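The size thresholds above can be sketched as a simple selection function. This is an illustration of the table, not code from the project; `selectPromptTemplate` is a hypothetical name.

```typescript
// Pick a prompt template by model parameter count (in billions),
// mirroring the Model Size / Template table above.
function selectPromptTemplate(paramsBillions: number): string {
  if (paramsBillions <= 10) return '8b';   // ~500-token prompt
  if (paramsBillions <= 30) return '14b';  // ~800-token prompt
  if (paramsBillions <= 60) return '32b';  // ~1200-token prompt
  return '70b';                            // ~1500-token prompt
}
```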
Completed migration to axios for all HTTP operations:
| File | Changes |
|---|---|
| `packages/core/src/utils/httpClient.ts` | Axios instance with interceptors |
| `packages/core/src/core/ollamaNativeClient.ts` | Streaming with axios |
| `packages/core/src/tools/web-search/providers/*.ts` | Provider migration |
Features:
Fixed monorepo TypeScript configuration:
- `composite: true` for referenced packages
- Separate server build config (`tsconfig.server.json`)
- Fixed `LoadingIndicator.tsx`, where `useMemo` hooks were called after an early return, violating the React Rules of Hooks

| Feature | Description |
|---|---|
| Plugin System v2 | PluginLoader, PluginCLI, PluginMarketplace, PluginSandbox |
| HTTP Client | Axios with interceptors, retry logic, timeout handling |
| React Optimization | 6 specialized contexts, 11 memoized components |
| Cancellation | CancellationToken, AbortController cleanup |
| Context Caching | KV-cache reuse for 80-90% faster conversations |
Major architectural enhancements for better performance and extensibility:
| Feature | Description |
|---|---|
| Zustand Migration | Replaced Context API, eliminates unnecessary re-renders |
| Event Bus | Typed pub/sub system for loose component coupling |
| Command Pattern | Full Undo/Redo support for reversible operations |
| Plugin System v1 | Dynamic tool loading, builtin plugins, lifecycle hooks |
| Context Caching | KV-cache reuse for 80-90% faster conversations |
| Prompt Documentation | Complete documentation of prompt formation system |
| Store | Purpose |
|---|---|
| `sessionStore` | Session state and metrics |
| `streamingStore` | Streaming state + AbortController |
| `uiStore` | UI settings with persistence |
| `commandStore` | Command pattern for undo/redo |
| `eventBus` | Event pub/sub system |
Dynamic plugin architecture with lifecycle hooks:
```typescript
const plugin: PluginDefinition = {
  metadata: { id: 'my-plugin', name: 'My Plugin', version: '1.0.0' },
  tools: [
    { id: 'hello', name: 'hello', execute: async () => ({ success: true }) },
  ],
  hooks: {
    onLoad: async (ctx) => ctx.logger.info('Loaded'),
    onBeforeToolExecute: async (id, params) => true,
  },
};
```
Builtin Plugins:
- `core-tools` — echo, timestamp, get_env
- `dev-tools` — python_dev, nodejs_dev, golang_dev, rust_dev, typescript_dev
- `file-tools` — read_file, write_file, edit_file
- `search-tools` — grep, glob, web_fetch
- `shell-tools` — run_shell_command

Typed events for cross-component communication:
```typescript
// Subscribe to events
eventBus.subscribe('stream:finished', (data) => {
  console.log('Tokens:', data.tokenCount);
});

// Emit events
eventBus.emit('command:executed', { commandId: '123', type: 'edit' });
```
New comprehensive documentation in docs/PROMPT_SYSTEM.md:
- `getCoreSystemPrompt()` — main system prompt construction
- `getCompressionPrompt()` — history compression to XML
- `getToolCallFormatInstructions()` — for models without native tools
- `getToolLearningContext()` — learning from past mistakes
- `getEnvironmentInfo()` — runtime environment context

Major performance improvement for multi-turn conversations:
| Feature | Description |
|---|---|
| 80-90% Faster | Subsequent messages use cached context tokens |
| KV-cache Reuse | Leverages Ollama’s native context caching |
| Auto Endpoint Selection | Switches between /api/generate and /api/chat |
| Session Tracking | Per-session context management |
```typescript
// Enable context caching
const config: ContentGeneratorConfig = {
  model: 'llama3.2',
  enableContextCaching: true, // Key improvement
};

// Performance gains:
// Message 1: 100% (baseline)
// Message 2: ~15% tokens processed (85% cached)
// Message 10: ~7% tokens processed (93% cached)
```
All context caching components are fully tested with 118 tests:
| Component | Tests | Coverage |
|---|---|---|
| ContextCacheManager | 50 | TTL, eviction, concurrency, edge cases |
| OllamaContextClient | 32 | Streaming, errors, session management |
| HybridContentGenerator | 36 | Endpoint selection, token counting |
See docs/CONTEXT_CACHING.md for full API documentation.
| Component | Description |
|---|---|
| Zustand Stores | Replaced Context API for better performance |
| Event Bus | Typed publish/subscribe for loose coupling |
| Command Pattern | Undo/Redo support for reversible operations |
| Plugin System | Dynamic tool loading at runtime |
```typescript
interface ContentGeneratorConfig {
  // Enable context caching for faster conversations
  enableContextCaching?: boolean;
  // Session ID for context tracking
  sessionId?: string;
}
```
The header now provides real-time context usage visualization:
| Feature | Description |
|---|---|
| Token Progress Bar | Visual indicator of context window usage |
| Model Context Size | Shows model’s context window (128K, 32K, etc.) |
| Capability Icons | Visual indicators for vision, tools, streaming support |
| Full-Width Display | Progress bar spans full info panel width |
Streamlined CLI by removing unused commands:
- Removed: `/bug`, `/docs`, `/help`, `/setup-github`
- Merged: `/stats` + `/about` → `/info`

The system now automatically learns from tool call errors and creates dynamic aliases:
| Feature | Description |
|---|---|
| Automatic Learning | Records tool call errors and creates aliases automatically |
| Fuzzy Matching | Uses Levenshtein distance to suggest correct tool names |
| Persistence | Learning data saved to ~/.ollama-code/learning/ |
| Dynamic Aliases | Runtime alias creation without code modifications |
How it works:
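As a sketch of the fuzzy-matching step, the classic Levenshtein distance can rank known tool names by edit distance from a mistyped one. The code below is an illustration under that assumption; `levenshtein` and `suggestTool` are hypothetical names, not the project's actual functions.

```typescript
// Single-row dynamic-programming Levenshtein distance between two strings.
function levenshtein(a: string, b: string): number {
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // dp[i-1][j-1] from the previous row
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j];
      dp[j] = Math.min(
        dp[j] + 1,                                // deletion
        dp[j - 1] + 1,                            // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1),   // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

// Suggest the closest known tool name for a mistyped one.
// Assumes `known` is non-empty.
function suggestTool(misspelled: string, known: string[]): string {
  return known.reduce((best, name) =>
    levenshtein(misspelled, name) < levenshtein(misspelled, best) ? name : best,
  );
}
```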
New comprehensive development tools have been added:
| Tool | Aliases | Description |
|---|---|---|
| `python_dev` | `py`, `python`, `pip`, `pytest` | Python development (run, test, lint, venv, pip) |
| `nodejs_dev` | `node`, `npm`, `yarn`, `pnpm`, `bun` | Node.js development with auto-detected package manager |
| `golang_dev` | `go`, `golang` | Go development (run, build, test, mod) |
| `php_dev` | `php`, `composer`, `phpunit`, `artisan` | PHP development with Composer and Laravel support |
The model now receives detailed environment information at session start, including:
New comprehensive documentation:
Models can now use short tool names:
| Alias | Tool Name |
|---|---|
| `run`, `shell`, `exec`, `cmd` | `run_shell_command` |
| `read` | `read_file` |
| `write`, `create` | `write_file` |
| `grep`, `search`, `find` | `grep_search` |
| `ls`, `list`, `dir` | `list_directory` |
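Resolution of these short names could be as simple as a lookup table built from the alias list above. This is a sketch of the idea only; `resolveToolName` is a hypothetical name and the codebase may structure the lookup differently.

```typescript
// Alias -> canonical tool name, mirroring the table above.
const TOOL_ALIASES: Record<string, string> = {
  run: 'run_shell_command', shell: 'run_shell_command',
  exec: 'run_shell_command', cmd: 'run_shell_command',
  read: 'read_file',
  write: 'write_file', create: 'write_file',
  grep: 'grep_search', search: 'grep_search', find: 'grep_search',
  ls: 'list_directory', list: 'list_directory', dir: 'list_directory',
};

// Unknown names pass through unchanged, so full tool names still work.
function resolveToolName(name: string): string {
  return TOOL_ALIASES[name] ?? name;
}
```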
Session ID is now shown in the header for easier debugging and log correlation.
Added startup warning if terminal encoding is not UTF-8.
```tsx
// Progress bar for model downloads
<ProgressBar
  progress={45}
  label="Downloading model"
  speed="5.2 MB/s"
  eta="2m 30s"
/>

// Thinking indicator for reasoning models
<ThinkingIndicator
  message="Analyzing code..."
  elapsedTime={45}
  showContent
/>

// Token usage display
<TokenUsageDisplay
  totalTokens={1500}
  promptTokens={500}
  completionTokens={1000}
  tokensPerSecond={45}
/>
```
> Execute SELECT * FROM users LIMIT 10 in SQLite database data.db
> Save database backup to /backup/db.sql
> Show schema of users table
> Run nginx container on port 8080
> Show logs of my-app container
> Stop all containers
> Build Docker image from current directory
> Get value of key session:user:123
> Set cache:data with 1 hour expiry
> Publish message to notifications channel
> Show all keys with user: prefix
```
ollama-code/
├── packages/
│   ├── core/              # Core: Ollama client, tools, types
│   ├── cli/               # CLI interface based on Ink
│   ├── web-app/           # Web UI: Next.js application (NEW)
│   ├── webui/             # Web components for UI
│   └── sdk-typescript/    # SDK for programmatic use
├── scripts/               # Build and run scripts
├── integration-tests/     # Integration tests
└── docs/                  # Documentation
```
| Guide | Description |
|---|---|
| CLI_GUIDE.md | Complete CLI usage guide |
| CORE_GUIDE.md | Core library developer guide |
| WEB_UI_GUIDE.md | Web UI complete usage guide |
| FEATURES.md | Feature reference |
| TOOLS.md | Tools reference |
| USAGE_GUIDE.md | Usage guide |
| EXAMPLES.md | Usage examples |
| OLLAMA_API.md | API documentation |
| Guide | Description |
|---|---|
| CLI_GUIDE.ru.md | Complete CLI guide (Russian) |
| CORE_GUIDE.ru.md | Core developer guide (Russian) |
| WEB_UI_GUIDE.ru.md | Complete Web UI guide (Russian) |
| FEATURES.ru.md | Feature reference (Russian) |
| TOOLS.ru.md | Tools reference (Russian) |
| README.ru.md | README in Russian |
| Document | Description |
|---|---|
| WEB_UI.md | Web UI technical docs |
| FEATURES.md | Complete feature reference |
| TOOLS.md | Detailed tools reference |
| USAGE_GUIDE.md | Usage guide |
| EXAMPLES.md | Usage examples |
| OLLAMA_API.md | API documentation |
| Document | Description |
|---|---|
| PROJECT_STRUCTURE.md | Project structure |
| ROADMAP.md | Development roadmap |
| CONTRIBUTING.md | Contribution guidelines |
| Document | Description |
|---|---|
| PLUGIN_SYSTEM.md | Plugin architecture and API |
| PLUGIN_MARKETPLACE.md | Plugin Marketplace usage guide |
| PLUGIN_SANDBOX.md | Plugin security and sandboxing |
| Document | Description |
|---|---|
| PROMPT_SYSTEM_V2.md | Model-size-optimized prompts (NEW) |
| PROMPT_SYSTEM.md | Legacy prompt system docs |
| Command | Description |
|---|---|
| `npm run build` | Build all packages |
| `npm run start` | Run CLI |
| `npm run dev` | Run in development mode |
| `npm run debug` | Run with debugger |
| `npm run test` | Run tests |
| `npm run lint` | Lint code |
| `npm run typecheck` | TypeScript type check |
```
Options:
  -d, --debug           Debug mode
  -m, --model           Specify model
  -s, --sandbox         Run in sandbox
  -y, --yolo            Auto-confirm all actions
  --approval-mode       Approval mode: plan, default, auto-edit, yolo
  --experimental-lsp    Enable experimental LSP support
  --ollama-base-url     Ollama server URL (default: http://localhost:11434)
  --ollama-api-key      API key for remote instances
```
| Variable | Description |
|---|---|
| `OLLAMA_BASE_URL` | Ollama server URL |
| `OLLAMA_API_KEY` | API key (optional) |
| `OLLAMA_MODEL` | Default model |
| `OLLAMA_KEEP_ALIVE` | Model memory retention time (default: 5m) |
| `DEBUG` | Enable debug mode (1 or true) |
| `OLLAMA_CODE_DEBUG_LOG_FILE` | Log to file |
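Reading these variables could look like the sketch below. The defaults for `OLLAMA_BASE_URL` and `OLLAMA_KEEP_ALIVE` come from the table above; the `llama3.2` fallback model is an assumption for illustration, not a documented default.

```typescript
// Sketch: assembling client configuration from the environment variables
// documented above. Falls back to documented defaults where they exist.
const config = {
  baseUrl: process.env.OLLAMA_BASE_URL ?? 'http://localhost:11434',
  apiKey: process.env.OLLAMA_API_KEY,                 // optional
  model: process.env.OLLAMA_MODEL ?? 'llama3.2',      // fallback is an assumption
  keepAlive: process.env.OLLAMA_KEEP_ALIVE ?? '5m',   // documented default
  debug: ['1', 'true'].includes(process.env.DEBUG ?? ''),
};
```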
The project includes ready-to-use VSCode debug configurations:
The project uses native Ollama APIs:
| Endpoint | Method | Description |
|---|---|---|
| `/api/tags` | GET | List local models |
| `/api/show` | POST | Model info |
| `/api/generate` | POST | Text generation |
| `/api/chat` | POST | Chat with model |
| `/api/embed` | POST | Embeddings |
| `/api/create` | POST | Create model |
| `/api/pull` | POST | Download model |
| `/api/ps` | GET | Running models |
| `/api/version` | GET | Ollama version |
Full API docs: OLLAMA_API.md
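A minimal call against one of these endpoints might look like this. It is a sketch assuming a local Ollama server at the default address and Node 18+ (for global `fetch`); `listModels` is a hypothetical helper, not a function from this project.

```typescript
// Sketch: list local models via the native /api/tags endpoint.
async function listModels(
  baseUrl = 'http://localhost:11434',
): Promise<string[]> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) throw new Error(`Ollama returned ${res.status}`);
  // /api/tags responds with { models: [{ name, ... }, ...] }
  const body = (await res.json()) as { models: { name: string }[] };
  return body.models.map((m) => m.name);
}
```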
| Model | Purpose | Size |
|---|---|---|
| `llama3.2` | General purpose | 3B |
| `qwen2.5-coder:7b` | Programming | 7B |
| `qwen2.5-coder:14b` | Programming | 14B |
| `qwen3-coder:30b` | Programming | 30B |
| `deepseek-r1:8b` | Reasoning (thinking) | 8B |
| `codellama` | Programming | 7B+ |
| `mistral` | General purpose | 7B |
| `nomic-embed-text` | Embeddings | 274M |
```bash
# Build core
npm run build --workspace=packages/core

# Build cli
npm run build --workspace=packages/cli

# All tests
npm run test

# Core package tests
npm run test --workspace=packages/core

# Integration tests
npm run test:integration:sandbox:none
```
- `packages/core/src/tools/BaseDeclarativeTool`
- `index.ts`
- `tool-names.ts`

Apache License 2.0
Documentation created with GLM-5 from Z.AI
See CONTRIBUTING.md for contribution guidelines.