Open WebUI 👋
Open WebUI is an extensible, feature-rich, and user-friendly self-hosted AI platform designed to operate entirely offline. It supports various LLM runners like Ollama and OpenAI-compatible APIs, with a built-in inference engine for RAG, making it a powerful AI deployment solution.
Passionate about open-source AI? Join our team →
Tip
Looking for an Enterprise Plan? – Speak with Our Sales Team Today!
Get enhanced capabilities, including custom theming and branding, Service Level Agreement (SLA) support, Long-Term Support (LTS) versions, and more!
For more information, be sure to check out our Open WebUI Documentation.
Key Features of Open WebUI ⭐
| Feature | Description |
|---|---|
| 🚀 Effortless Setup | Install seamlessly using Docker or Kubernetes (kubectl, kustomize or helm) for a hassle-free experience with support for both :ollama and :cuda tagged images. |
| 🤝 Ollama/OpenAI API Integration | Effortlessly integrate OpenAI-compatible APIs for versatile conversations alongside Ollama models. Customize the OpenAI API URL to link with LMStudio, GroqCloud, Mistral, OpenRouter, and more. |
| 🛡️ Granular Permissions and User Groups | By allowing administrators to create detailed user roles and permissions, we ensure a secure user environment. This granularity not only enhances security but also allows for customized user experiences, fostering a sense of ownership and responsibility amongst users. |
| 📱 Responsive Design | Enjoy a seamless experience across Desktop PC, Laptop, and Mobile devices. |
| 📱 Progressive Web App (PWA) for Mobile | Enjoy a native app-like experience on your mobile device with our PWA, providing offline access on localhost and a seamless user interface. |
| ✒️🔢 Full Markdown and LaTeX Support | Elevate your LLM experience with comprehensive Markdown and LaTeX capabilities for enriched interaction. |
| 🎤📹 Hands-Free Voice/Video Call | Experience seamless communication with integrated hands-free voice and video call features, allowing for a more dynamic and interactive chat environment. |
| 🛠️ Model Builder | Easily create Ollama models via the Web UI. Create and add custom characters/agents, customize chat elements, and import models effortlessly through Open WebUI Community integration. |
| 🐍 Native Python Function Calling Tool | Enhance your LLMs with built-in code editor support in the tools workspace. Bring Your Own Function (BYOF) by simply adding your pure Python functions, enabling seamless integration with LLMs. |
| 📚 Local RAG Integration | Dive into the future of chat interactions with groundbreaking Retrieval Augmented Generation (RAG) support. This feature seamlessly integrates document interactions into your chat experience. You can load documents directly into the chat or add files to your document library, effortlessly accessing them using the # command before a query. |
| 🔍 Web Search for RAG | Perform web searches using providers like SearXNG, Google PSE, Brave Search, serpstack, serper, Serply, DuckDuckGo, TavilySearch, SearchApi and Bing and inject the results directly into your chat experience. |
| 🌐 Web Browsing Capability | Seamlessly integrate websites into your chat experience using the # command followed by a URL. This feature allows you to incorporate web content directly into your conversations, enhancing the richness and depth of your interactions. |
| 🎨 Image Generation Integration | Seamlessly incorporate image generation capabilities using options such as AUTOMATIC1111 API or ComfyUI (local), and OpenAI's DALL-E (external), enriching your chat experience with dynamic visual content. |
| ⚙️ Many Models Conversations | Effortlessly engage with various models simultaneously, harnessing their unique strengths for optimal responses. Enhance your experience by leveraging a diverse set of models in parallel. |
| 🔐 Role-Based Access Control (RBAC) | Ensure secure access with restricted permissions; only authorized individuals can access your Ollama, and exclusive model creation/pulling rights are reserved for administrators. |
| 🌐🌍 Multilingual Support | Experience Open WebUI in your preferred language with our internationalization (i18n) support. Join us in expanding our supported languages! We're actively seeking contributors! |
| 🧩 Pipelines, Open WebUI Plugin Support | Seamlessly integrate custom logic and Python libraries into Open WebUI using Pipelines Plugin Framework. Launch your Pipelines instance, set the OpenAI URL to the Pipelines URL, and explore endless possibilities. Examples include Function Calling, User Rate Limiting to control access, Usage Monitoring with tools like Langfuse, Live Translation with LibreTranslate for multilingual support, Toxic Message Filtering and much more. |
| 🌟 Continuous Updates | We are committed to improving Open WebUI with regular updates, fixes, and new features. |
Want to learn more about Open WebUI's features? Check out our Open WebUI documentation for a comprehensive overview!
Sponsors 🙌
Emerald
| Logo | Description |
|---|---|
| n8n | Does your interface have a backend yet? Try n8n |
| Tailscale | Connect self-hosted AI to any device with Tailscale |
We are incredibly grateful for the generous support of our sponsors. Their contributions help us to maintain and improve our project, ensuring we can continue to deliver quality work to our community. Thank you!
How to Install 🚀
Installation via Python pip 🐍
Open WebUI can be installed using pip, the Python package installer. Before proceeding, ensure you're using Python 3.11 to avoid compatibility issues.
- Install Open WebUI: Open your terminal and run the following command to install Open WebUI:
pip install open-webui
- Running Open WebUI: After installation, you can start Open WebUI by executing:
open-webui serve
This will start the Open WebUI server, which you can access at http://localhost:8080
Quick Start with Docker 🐳
Note
Please note that for certain Docker environments, additional configurations might be needed. If you encounter any connection issues, our detailed guide on Open WebUI Documentation is ready to assist you.
Warning
When using Docker to install Open WebUI, make sure to include `-v open-webui:/app/backend/data` in your Docker command. This step is crucial, as it ensures your database is properly mounted and prevents any loss of data.
Tip
If you wish to utilize Open WebUI with Ollama included or CUDA acceleration, we recommend utilizing our official images tagged with either `:cuda` or `:ollama`. To enable CUDA, you must install the Nvidia CUDA container toolkit on your Linux/WSL system.
Installation with Default Configuration
| Command | Description |
|---|---|
| `docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main` | If Ollama is on your computer |
| `docker run -d -p 3000:8080 -e OLLAMA_BASE_URL=https://example.com -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main` | If Ollama is on a Different Server |
| `docker run -d -p 3000:8080 --gpus all --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:cuda` | To run Open WebUI with Nvidia GPU support |
Installation for OpenAI API Usage Only
| Command | Description |
|---|---|
| `docker run -d -p 3000:8080 -e OPENAI_API_KEY=your_secret_key -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main` | If you're only using OpenAI API |
Installing Open WebUI with Bundled Ollama Support
This installation method uses a single container image that bundles Open WebUI with Ollama, allowing for a streamlined setup via a single command. Choose the appropriate command based on your hardware setup:
| Command | Description |
|---|---|
| `docker run -d -p 3000:8080 --gpus=all -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama` | With GPU Support |
| `docker run -d -p 3000:8080 -v ollama:/root/.ollama -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:ollama` | For CPU Only |
Both commands facilitate a built-in, hassle-free installation of both Open WebUI and Ollama, ensuring that you can get everything up and running swiftly.
After installation, you can access Open WebUI at http://localhost:3000. Enjoy! 😄
Other Installation Methods
We offer various installation alternatives, including non-Docker native installation methods, Docker Compose, Kustomize, and Helm. Visit our Open WebUI Documentation or join our Discord community for comprehensive guidance.
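If you prefer Docker Compose, the following is a minimal sketch that mirrors the default docker run command above (assuming Ollama runs on the host and is reachable via host.docker.internal; the official compose file in the repository may differ):

services:
  open-webui:
    image: ghcr.io/open-webui/open-webui:main
    ports:
      - "3000:8080"
    extra_hosts:
      - "host.docker.internal:host-gateway"
    volumes:
      - open-webui:/app/backend/data
    restart: always

volumes:
  open-webui:

Start it with docker compose up -d.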
Look at the Local Development Guide for instructions on setting up a local development environment.
Troubleshooting
Encountering connection issues? Our Open WebUI Documentation has got you covered. For further assistance and to join our vibrant community, visit the Open WebUI Discord.
Open WebUI: Server Connection Error
If you're experiencing connection issues, it's often because the WebUI Docker container cannot reach the Ollama server at 127.0.0.1:11434 (host.docker.internal:11434) from inside the container. Use the `--network=host` flag in your Docker command to resolve this. Note that the port changes from 3000 to 8080, so the URL becomes http://localhost:8080.
Example Docker Command:
docker run -d --network=host -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://127.0.0.1:11434 --name open-webui --restart always ghcr.io/open-webui/open-webui:main
Keeping Your Docker Installation Up-to-Date
In case you want to update your local Docker installation to the latest version, you can do it with Watchtower:
docker run --rm --volume /var/run/docker.sock:/var/run/docker.sock containrrr/watchtower --run-once open-webui
In the last part of the command, replace open-webui with your container name if it is different.
Check our Updating Guide available in our Open WebUI Documentation.
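Alternatively, a manual update simply pulls the newer image and recreates the container; your chats and settings persist in the open-webui volume:

docker pull ghcr.io/open-webui/open-webui:main
docker stop open-webui && docker rm open-webui
# re-run the same docker run command you used during installation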
Using the Dev Branch 🌙
Warning
The `:dev` branch contains the latest unstable features and changes. Use it at your own risk, as it may have bugs or incomplete features.
If you want to try out the latest bleeding-edge features and are okay with occasional instability, you can use the :dev tag like this:
docker run -d -p 3000:8080 -v open-webui:/app/backend/data --name open-webui --add-host=host.docker.internal:host-gateway --restart always ghcr.io/open-webui/open-webui:dev
Offline Mode
If you are running Open WebUI in an offline environment, you can set the HF_HUB_OFFLINE environment variable to 1 to prevent attempts to download models from the internet.
export HF_HUB_OFFLINE=1
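If you run Open WebUI in Docker, the same variable can be passed to the container with -e, for example:

docker run -d -p 3000:8080 -e HF_HUB_OFFLINE=1 -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main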
What's Next? 🌟
Discover upcoming features on our roadmap in the Open WebUI Documentation.
License 📜
This project is licensed under the Open WebUI License, a revised BSD-3-Clause license. You receive all the same rights as the classic BSD-3 license: you can use, modify, and distribute the software, including in proprietary and commercial products, with minimal restrictions. The only additional requirement is to preserve the "Open WebUI" branding, as detailed in the LICENSE file. For full terms, see the LICENSE document. 📄
Support 💬
If you have any questions, suggestions, or need assistance, please open an issue or join our Open WebUI Discord community to connect with us! 🤝
Created by Timothy Jaeryang Baek - Let's make Open WebUI even more amazing together! 💪
Ollama
Get up and running with large language models.
macOS
Windows
Linux
curl -fsSL https://ollama.com/install.sh | sh
Docker
The official Ollama Docker image ollama/ollama is available on Docker Hub.
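For example, to start the container with a persistent model volume and the default API port exposed, then run a model inside it (CPU-only shown; see the Ollama Docker documentation for GPU flags):

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama run llama3.2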
Libraries
| Library | Description |
|---|---|
| ollama-python | Python library for Ollama |
| ollama-js | JavaScript library for Ollama |
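As a quick illustration of the Python library (a minimal sketch, assuming `pip install ollama` and a locally running Ollama server; see the library's documentation for the full API):

from ollama import chat

# send a single chat turn to a locally pulled model
response = chat(model="llama3.2", messages=[
    {"role": "user", "content": "Why is the sky blue?"},
])
print(response["message"]["content"])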
Community
| Platform | Link |
|---|---|
| Discord | Discord community |
| Reddit | Reddit community |
Quickstart
To run and chat with Gemma 3:
ollama run gemma3
Model library
Ollama supports a list of models available on ollama.com/library
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
|---|---|---|---|
| Gemma 3 | 1B | 815MB | ollama run gemma3:1b |
| Gemma 3 | 4B | 3.3GB | ollama run gemma3 |
| Gemma 3 | 12B | 8.1GB | ollama run gemma3:12b |
| Gemma 3 | 27B | 17GB | ollama run gemma3:27b |
| QwQ | 32B | 20GB | ollama run qwq |
| DeepSeek-R1 | 7B | 4.7GB | ollama run deepseek-r1 |
| DeepSeek-R1 | 671B | 404GB | ollama run deepseek-r1:671b |
| Llama 4 | 109B | 67GB | ollama run llama4:scout |
| Llama 4 | 400B | 245GB | ollama run llama4:maverick |
| Llama 3.3 | 70B | 43GB | ollama run llama3.3 |
| Llama 3.2 | 3B | 2.0GB | ollama run llama3.2 |
| Llama 3.2 | 1B | 1.3GB | ollama run llama3.2:1b |
| Llama 3.2 Vision | 11B | 7.9GB | ollama run llama3.2-vision |
| Llama 3.2 Vision | 90B | 55GB | ollama run llama3.2-vision:90b |
| Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
| Llama 3.1 | 405B | 231GB | ollama run llama3.1:405b |
| Phi 4 | 14B | 9.1GB | ollama run phi4 |
| Phi 4 Mini | 3.8B | 2.5GB | ollama run phi4-mini |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Starling | 7B | 4.1GB | ollama run starling-lm |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| Llama 2 Uncensored | 7B | 3.8GB | ollama run llama2-uncensored |
| LLaVA | 7B | 4.5GB | ollama run llava |
| Granite-3.3 | 8B | 4.9GB | ollama run granite3.3 |
Note
You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
Customize a model
Import from GGUF
Ollama supports importing GGUF models in the Modelfile:
- Create a file named `Modelfile`, with a `FROM` instruction pointing to the local file path of the model you want to import.
FROM ./vicuna-33b.Q4_0.gguf
- Create the model in Ollama
ollama create example -f Modelfile
- Run the model
ollama run example
Import from Safetensors
See the guide on importing models for more information.
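As a rough sketch of what the import guide describes, a Modelfile can point its FROM instruction at a directory of Safetensors weights (the path below is illustrative):

FROM /path/to/safetensors/directory

Then create the model as usual with ollama create example -f Modelfile.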
Customize a prompt
Models from the Ollama library can be customized with a prompt. For example, to customize the llama3.2 model:
ollama pull llama3.2
Create a Modelfile:
FROM llama3.2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
Next, create and run the model:
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
For more information on working with a Modelfile, see the Modelfile documentation.
CLI Reference
Create a model
ollama create is used to create a model from a Modelfile.
ollama create mymodel -f ./Modelfile
Pull a model
ollama pull llama3.2
This command can also be used to update a local model. Only the diff will be pulled.
Remove a model
ollama rm llama3.2
Copy a model
ollama cp llama3.2 my-model
Multiline input
For multiline input, you can wrap text with """:
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
Multimodal models
ollama run llava "What's in this image? /Users/jmorgan/Desktop/smile.png"
Output: The image features a yellow smiley face, which is likely the central focus of the picture.
Pass the prompt as an argument
ollama run llama3.2 "Summarize this file: $(cat README.md)"
Output: Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
Show model information
ollama show llama3.2
List models on your computer
ollama list
List which models are currently loaded
ollama ps
Stop a model which is currently running
ollama stop llama3.2
Start Ollama
ollama serve is used when you want to start ollama without running the desktop application.
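For example, in one terminal:

ollama serve

Then, in a separate shell, run a model as usual (e.g. ollama run llama3.2).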
REST API
Ollama has a REST API for running and managing models.
Generate a response
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt":"Why is the sky blue?"
}'
Chat with a model
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
See the API documentation for all endpoints.
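For example, assuming the default port, you can list the locally available models via the tags endpoint:

curl http://localhost:11434/api/tags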
Community Integrations
Web & Desktop
| Integration | Description |
|---|---|
| Open WebUI | Open WebUI integration |
| SwiftChat (macOS with ReactNative) | SwiftChat integration |
| Enchanted (macOS native) | Enchanted integration |
| Hollama | Hollama integration |
| Lollms-Webui | Lollms-Webui integration |
| LibreChat | LibreChat integration |
| Bionic GPT | Bionic GPT integration |
| HTML UI | HTML UI integration |
| Saddle | Saddle integration |
| TagSpaces | TagSpaces integration |
| Chatbot UI | Chatbot UI integration |
| Chatbot UI v2 | Chatbot UI v2 integration |
| Typescript UI | Typescript UI integration |
| Minimalistic React UI for Ollama Models | Minimalistic React UI integration |
| Ollamac | Ollamac integration |
| big-AGI | big-AGI integration |
| Cheshire Cat assistant framework | Cheshire Cat integration |
| Amica | Amica integration |
| chatd | chatd integration |
| Ollama-SwiftUI | Ollama-SwiftUI integration |
| Dify.AI | Dify.AI integration |
| MindMac | MindMac integration |
| NextJS Web Interface for Ollama | NextJS Web Interface integration |
| Msty | Msty integration |
| Chatbox | Chatbox integration |
| WinForm Ollama Copilot | WinForm Ollama Copilot integration |
| NextChat | NextChat integration |
| Alpaca WebUI | Alpaca WebUI integration |
| OllamaGUI | OllamaGUI integration |
| OpenAOE | OpenAOE integration |
| Odin Runes | Odin Runes integration |
| LLM-X | LLM-X integration |
| AnythingLLM (Docker + MacOs/Windows/Linux native app) | AnythingLLM integration |
| Ollama Basic Chat: Uses HyperDiv Reactive UI | Ollama Basic Chat integration |
| Ollama-chats RPG | Ollama-chats RPG integration |
| IntelliBar | IntelliBar integration |
| Jirapt | Jirapt integration |
| ojira | ojira integration |
| QA-Pilot | QA-Pilot integration |
| ChatOllama | ChatOllama integration |
| CRAG Ollama Chat | CRAG Ollama Chat integration |
| RAGFlow | RAGFlow integration |
| StreamDeploy | StreamDeploy integration |
| chat | chat integration |
| Lobe Chat | Lobe Chat integration |
| Ollama RAG Chatbot | Ollama RAG Chatbot integration |
| BrainSoup | BrainSoup integration |
| macai | macai integration |
| RWKV-Runner | RWKV-Runner integration |
| Ollama Grid Search | Ollama Grid Search integration |
| Olpaka | Olpaka integration |
| Casibase | Casibase integration |
| OllamaSpring | OllamaSpring integration |
| LLocal.in | LLocal.in integration |
| Shinkai Desktop | Shinkai Desktop integration |
| AiLama | AiLama integration |
| Ollama with Google Mesop | Ollama with Google Mesop integration |
| R2R | R2R integration |
| Ollama-Kis | Ollama-Kis integration |
| OpenGPA | OpenGPA integration |
| Painting Droid | Painting Droid integration |
| Kerlig AI | Kerlig AI integration |
| AI Studio | AI Studio integration |
| Sidellama | Sidellama integration |
| LLMStack | LLMStack integration |
| BoltAI for Mac | BoltAI for Mac integration |
| Harbor | Harbor integration |
| PyGPT | PyGPT integration |
| Alpaca | Alpaca integration |
| AutoGPT | AutoGPT integration |
| Go-CREW | Go-CREW integration |
| PartCAD | PartCAD integration |
| Ollama4j Web UI | Ollama4j Web UI integration |
| PyOllaMx | PyOllaMx integration |
| Cline | Cline integration |
| Cherry Studio | Cherry Studio integration |
| ConfiChat | ConfiChat integration |
| Archyve | Archyve integration |
| crewAI with Mesop | crewAI with Mesop integration |
| Tkinter-based client | Tkinter-based client integration |
| LLMChat | LLMChat integration |
| Local Multimodal AI Chat | Local Multimodal AI Chat integration |
| ARGO | ARGO integration |
| OrionChat | OrionChat integration |
| G1 | G1 integration |
| Web management | Web management integration |
| Promptery | Promptery integration |
| Ollama App | Ollama App integration |
| chat-ollama | chat-ollama integration |
| SpaceLlama | SpaceLlama integration |
| YouLama | YouLama integration |
| DualMind | DualMind integration |
| ollamarama-matrix | ollamarama-matrix integration |
| ollama-chat-app | ollama-chat-app integration |
| Perfect Memory AI | Perfect Memory AI integration |
| Hexabot | Hexabot integration |
| Reddit Rate | Reddit Rate integration |
| OpenTalkGpt | OpenTalkGpt integration |
| VT | VT integration |
| Nosia | Nosia integration |
| Witsy | Witsy integration |
| Abbey | Abbey integration |
| Minima | Minima integration |
| aidful-ollama-model-delete | aidful-ollama-model-delete integration |
| Perplexica | Perplexica integration |
| Ollama Chat WebUI for Docker | Ollama Chat WebUI for Docker integration |
| AI Toolkit for Visual Studio Code | AI Toolkit for Visual Studio Code integration |
| MinimalNextOllamaChat | MinimalNextOllamaChat integration |
| Chipper | Chipper integration |
| ChibiChat | ChibiChat integration |
| LocalLLM | LocalLLM integration |
| Ollamazing | Ollamazing integration |
| OpenDeepResearcher-via-searxng | OpenDeepResearcher-via-searxng integration |
| AntSK | AntSK integration |
| MaxKB | MaxKB integration |
| yla | yla integration |
| LangBot | LangBot integration |
| 1Panel | 1Panel integration |
| AstrBot | AstrBot integration |
| Reins | Reins integration |
| Flufy | Flufy integration |
| Ellama | Ellama integration |
| screenpipe | screenpipe integration |
| Ollamb | Ollamb integration |
| Writeopia | Writeopia integration |
| AppFlowy | AppFlowy integration |
| Lumina | Lumina integration |
| Tiny Notepad | Tiny Notepad integration |
| macLlama (macOS native) | macLlama integration |
| GPTranslate | GPTranslate integration |
| ollama launcher | ollama launcher integration |
| ai-hub | ai-hub integration |
| Mayan EDMS | Mayan EDMS integration |
Cloud
| Cloud | Link |
|---|---|
| Google Cloud | Google Cloud integration |
| Fly.io | Fly.io integration |
| Koyeb | Koyeb integration |
Terminal
| Terminal | Link |
|---|---|
| oterm | oterm integration |
| Ellama Emacs client | Ellama Emacs client integration |
| Emacs client | Emacs client integration |
| neollama | neollama integration |
| gen.nvim | gen.nvim integration |
| ollama.nvim | ollama.nvim integration |
| ollero.nvim | ollero.nvim integration |
| ollama-chat.nvim | ollama-chat.nvim integration |
| ogpt.nvim | ogpt.nvim integration |
| gptel Emacs client | gptel Emacs client integration |
| Oatmeal | Oatmeal integration |
| cmdh | cmdh integration |
| ooo | ooo integration |
| shell-pilot | shell-pilot integration |
| tenere | tenere integration |
| llm-ollama | llm-ollama integration |
| typechat-cli | typechat-cli integration |
| ShellOracle | ShellOracle integration |
| tlm | tlm integration |
| podman-ollama | podman-ollama integration |
| gollama | gollama integration |
| ParLlama | ParLlama integration |
| Ollama eBook Summary | Ollama eBook Summary integration |
| Ollama Mixture of Experts (MOE) in 50 lines of code | Ollama Mixture of Experts integration |
| vim-intelligence-bridge | vim-intelligence-bridge integration |
| x-cmd ollama | x-cmd ollama integration |
| bb7 | bb7 integration |
| SwollamaCLI | SwollamaCLI integration |
| aichat | aichat integration |
| PowershAI | PowershAI integration |
| DeepShell | DeepShell integration |
| orbiton | orbiton integration |
| orca-cli | orca-cli integration |
| GGUF-to-Ollama | GGUF-to-Ollama integration |
| AWS-Strands-With-Ollama | AWS-Strands-With-Ollama integration |
| ollama-multirun | ollama-multirun integration |
| ollama-bash-toolshed | ollama-bash-toolshed integration |
Apple Vision Pro
| Integration | Link |
|---|---|
| SwiftChat | SwiftChat integration |
| Enchanted | Enchanted integration |
Database
| Integration | Link |
|---|---|
| pgai | pgai integration |
| MindsDB | MindsDB integration |
| chromem-go | chromem-go integration |
| Kangaroo | Kangaroo integration |
Package managers
| Package Manager | Link |
|---|---|
| Pacman | Pacman integration |
| Gentoo | Gentoo integration |
| Homebrew | Homebrew integration |
| Helm Chart | Helm Chart integration |
| Guix channel | Guix channel integration |
| Nix package | Nix package integration |
| Flox | Flox integration |
Libraries
| Library | Link |
|---|---|
| LangChain | LangChain integration |
| LangChain.js | LangChain.js integration |
| Firebase Genkit | Firebase Genkit integration |
| crewAI | crewAI integration |
| Yacana | Yacana integration |
| Spring AI | Spring AI integration |
| LangChainGo | LangChainGo integration |
| LangChain4j | LangChain4j integration |
| LangChainRust | LangChainRust integration |
| LangChain for .NET | LangChain for .NET integration |
| LLPhant | LLPhant integration |
| LlamaIndex | LlamaIndex integration |
| LlamaIndexTS | LlamaIndexTS integration |
| LiteLLM | LiteLLM integration |
| OllamaFarm for Go | OllamaFarm for Go integration |
| OllamaSharp for .NET | OllamaSharp for .NET integration |
| Ollama for Ruby | Ollama for Ruby integration |
| Ollama-rs for Rust | Ollama-rs for Rust integration |
| Ollama-hpp for C++ | Ollama-hpp for C++ integration |
| Ollama4j for Java | Ollama4j for Java integration |
| ModelFusion Typescript Library | ModelFusion Typescript Library integration |
| OllamaKit for Swift | OllamaKit for Swift integration |
| Ollama for Dart | Ollama for Dart integration |
| Ollama for Laravel | Ollama for Laravel integration |
| LangChainDart | LangChainDart integration |
| Semantic Kernel - Python | Semantic Kernel - Python integration |
| Haystack | Haystack integration |
| Elixir LangChain | Elixir LangChain integration |
| Ollama for R - rollama | Ollama for R - rollama integration |
| Ollama for R - ollama-r | Ollama for R - ollama-r integration |
| Ollama-ex for Elixir | Ollama-ex for Elixir integration |
| Ollama Connector for SAP ABAP | Ollama Connector for SAP ABAP integration |
| Testcontainers | Testcontainers integration |
| Portkey | Portkey integration |
| PromptingTools.jl | PromptingTools.jl integration |
| LlamaScript | LlamaScript integration |
| llm-axe | llm-axe integration |
| Gollm | Gollm integration |
| Gollama for Golang | Gollama for Golang integration |
| Ollamaclient for Golang | Ollamaclient for Golang integration |
| High-level function abstraction in Go | High-level function abstraction in Go integration |
| Ollama PHP | Ollama PHP integration |
| Agents-Flex for Java | Agents-Flex for Java integration |
| Parakeet | Parakeet integration |
| Haverscript | Haverscript integration |
| Ollama for Swift | Ollama for Swift integration |
| Swollama for Swift | Swollama for Swift integration |
| GoLamify | GoLamify integration |
| Ollama for Haskell | Ollama for Haskell integration |
| multi-llm-ts | multi-llm-ts integration |
| LlmTornado | LlmTornado integration |
| Ollama for Zig | Ollama for Zig integration |
| Abso | Abso integration |
| Nichey | Nichey integration |
| Ollama for D | Ollama for D integration |
| OllamaPlusPlus | OllamaPlusPlus integration |
Mobile
| Integration | Link |
|---|---|
| SwiftChat | SwiftChat integration |
| Enchanted | Enchanted integration |
| Maid | Maid integration |
| Ollama App | Ollama App integration |
| ConfiChat | ConfiChat integration |
| Ollama Android Chat | Ollama Android Chat integration |
| Reins | Reins integration |
Extensions & Plugins
| Integration | Link |
|---|---|
| Raycast extension | Raycast extension integration |
| Discollama | Discollama integration |
| Continue | Continue integration |
| Vibe | Vibe integration |
| Obsidian Ollama plugin | Obsidian Ollama plugin integration |
| Logseq Ollama plugin | Logseq Ollama plugin integration |
| NotesOllama | NotesOllama integration |
| Dagger Chatbot | Dagger Chatbot integration |
| Discord AI Bot | Discord AI Bot integration |
| Ollama Telegram Bot | Ollama Telegram Bot integration |
| Hass Ollama Conversation | Hass Ollama Conversation integration |
| Rivet plugin | Rivet plugin integration |
| Obsidian BMO Chatbot plugin | Obsidian BMO Chatbot plugin integration |
| Cliobot | Cliobot integration |
| Copilot for Obsidian plugin | Copilot for Obsidian plugin integration |
| Obsidian Local GPT plugin | Obsidian Local GPT plugin integration |
| Open Interpreter | Open Interpreter integration |
| Llama Coder | Llama Coder integration |
| Ollama Copilot | Ollama Copilot integration |
| twinny | twinny integration |
| Wingman-AI | Wingman-AI integration |
| Page Assist | Page Assist integration |
| Plasmoid Ollama Control | Plasmoid Ollama Control integration |
| AI Telegram Bot | AI Telegram Bot integration |
| AI ST Completion | AI ST Completion integration |
| Discord-Ollama Chat Bot | Discord-Ollama Chat Bot integration |
| ChatGPTBox: All in one browser extension | ChatGPTBox integration |
| Discord AI chat/moderation bot | Discord AI chat/moderation bot integration |
| Headless Ollama | Headless Ollama integration |
| Terraform AWS Ollama & Open WebUI | Terraform AWS Ollama & Open WebUI integration |
| node-red-contrib-ollama | node-red-contrib-ollama integration |
| Local AI Helper | Local AI Helper integration |
| vnc-lm | vnc-lm integration |
| LSP-AI | LSP-AI integration |
| QodeAssist | QodeAssist integration |
| Obsidian Quiz Generator plugin | Obsidian Quiz Generator plugin integration |
| AI Summmary Helper plugin | AI Summmary Helper plugin integration |
| TextCraft | TextCraft integration |
| Alfred Ollama | Alfred Ollama integration |
| TextLLaMA | TextLLaMA integration |
| Simple-Discord-AI | Simple-Discord-AI integration |
| LLM Telegram Bot | LLM Telegram Bot integration |
| mcp-llm | mcp-llm integration |
| SimpleOllamaUnity | SimpleOllamaUnity integration |
| UnityCodeLama | UnityCodeLama integration |
| NativeMind | NativeMind integration |
| GMAI - Gradle Managed AI | GMAI integration |
Supported backends
| Backend | Link |
|---|---|
| llama.cpp | llama.cpp integration |
Observability
| Tool | Link |
|---|---|
| Opik | Opik integration |
| Lunary | Lunary integration |
| OpenLIT | OpenLIT integration |
| HoneyHive | HoneyHive integration |
| Langfuse | Langfuse integration |
| MLflow Tracing | MLflow Tracing integration |
OpenedAI Speech
Notice: This software is mostly obsolete and will no longer be updated.
Some Alternatives:
- https://speaches.ai/
- https://github.com/remsky/Kokoro-FastAPI
- https://github.com/astramind-ai/Auralis
- https://lightning.ai/docs/litserve/home?code_sample=speech
An OpenAI API compatible text to speech server.
- Compatible with the OpenAI audio/speech API
- Serves the /v1/audio/speech endpoint
- Not affiliated with OpenAI in any way, does not require an OpenAI API Key
- A free, private, text-to-speech server with custom voice cloning
Full Compatibility:
| Feature | Description |
|---|---|
| `tts-1` | `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer` (configurable) |
| `tts-1-hd` | `alloy`, `echo`, `fable`, `onyx`, `nova`, and `shimmer` (configurable, uses OpenAI samples by default) |
| `response_format` | `mp3`, `opus`, `aac`, `flac`, `wav` and `pcm` |
| `speed` | 0.25-4.0 (and more) |
Details:
| Detail | Description |
|---|---|
| Model `tts-1` via piper tts | Very fast, runs on CPU |
| Model `tts-1-hd` via coqui-ai/TTS xtts_v2 voice cloning | Fast, but requires around 4GB GPU VRAM |
| Custom cloned voices | Can be used for tts-1-hd |
| 🌐 Multilingual support | With XTTS voices, the language is automatically detected if not set |
| Custom fine-tuned XTTS model support | See: Custom fine-tuned XTTS model support |
| Configurable generation parameters | See: Generation parameters |
| Streamed output | While generating |
| Occasionally, certain words or symbols may sound incorrect | Can be fixed with regex via pre_process_map.yaml |
| Tested with python | 3.9-3.11, piper does not install on python 3.12 yet |
If you find a better voice match for tts-1 or tts-1-hd, please let me know so I can update the defaults.
Recent Changes
| Version | Date | Changes |
|---|---|---|
| 0.18.2 | 2024-08-16 | Fix docker building for amd64, refactor github actions again, free up more disk space |
| 0.18.1 | 2024-08-15 | Refactor github actions |
| 0.18.0 | 2024-08-15 | Allow folders of wav samples in xtts. Samples will be combined, allowing for mixed voices and collections of small samples. Still limited to 30 seconds total. Fix missing yaml requirement in -min image. Fix fr_FR-tom-medium and other 44khz piper voices (detect non-default sample rates). Minor updates |
| 0.17.2 | 2024-07-01 | Fix -min image (re: langdetect) |
| 0.17.1 | 2024-07-01 | Fix ROCm (add langdetect to requirements-rocm.txt). Fix zh-cn for xtts |
| 0.17.0 | 2024-07-01 | Automatic language detection |
| 0.16.0 | 2024-06-29 | Multi-client safe version. Audio generation is synchronized in a single process. The estimated 'realtime' factor of XTTS on a GPU is roughly 1/3, this means that multiple streams simultaneously, or speed over 2, may experience audio underrun (delays or pauses in playback). This makes multiple clients possible and safe, but in practice 2 or 3 simultaneous streams is the maximum without audio underrun |
| 0.15.1 | 2024-06-27 | Remove deepspeed from requirements.txt, it's too complex for typical users. A more detailed deepspeed install document will be required |
| 0.15.0 | 2024-06-26 | Switch to coqui-tts (updated fork), updated simpler dependencies, torch 2.3, etc. Resolve cuda threading issues |
| 0.14.1 | 2024-06-26 | Make deepspeed possible (--use-deepspeed), but not enabled in pre-built docker images (too large). Requires the cuda-toolkit installed, see the Dockerfile comment for details |
| 0.14.0 | 2024-06-26 | Added response_format: wav and pcm support. Output streaming (while generating) for tts-1 and tts-1-hd. Enhanced generation parameters for xtts models (temperature, top_p, etc.). Idle unload timer (optional) - doesn't work perfectly yet. Improved error handling |
| 0.13.0 | 2024-06-25 | Added Custom fine-tuned XTTS model support. Initial prebuilt arm64 image support (Apple M-series, Raspberry Pi - MPS is not supported in XTTS/torch). Initial attempt at AMD GPU (ROCm 5.7) support. Parler-tts support removed. Move the *.default.yaml to the root folder. Run the docker as a service by default (restart: unless-stopped). Added audio_reader.py for streaming text input and reading long texts |
| 0.12.3 | 2024-06-17 | Additional logging details for BadRequests (400) |
| 0.12.2 | 2024-06-16 | Fix :min image requirements (numpy<2?) |
| 0.12.0 | 2024-06-16 | Improved error handling and logging. Restore the original alloy tts-1-hd voice by default, use alloy-alt for the old voice |
| 0.11.0 | 2024-05-29 | 🌐 Multilingual support (16 languages) with XTTS. Remove high Unicode filtering from the default config/pre_process_map.yaml. Update Docker build & app startup. Fix: "Plan failed with a cudnnException". Remove piper cuda support |
| 0.10.1 | 2024-05-05 | Remove runtime: nvidia from docker-compose.yml, this assumes nvidia/cuda compatible runtime is available by default |
| 0.10.0 | 2024-04-27 | Pre-built & tested docker images, smaller docker images (8GB or 860MB). Better upgrades: reorganize config files under config/, voice models under voices/. Default listen host to 0.0.0.0 |
| 0.9.0 | 2024-04-23 | Fix bug with yaml and loading UTF-8. New sample text-to-speech application say.py. Smaller docker base image. Add beta parler-tts support (you can describe very basic features of the speaker voice) |
| 0.7.3 | 2024-03-20 | Allow different xtts versions per voice in voice_to_speaker.yaml, ex. xtts_v2.0.2. Quality: Fix xtts sample rate (24000 vs. 22050 for piper) and pops |
Installation instructions
Create a speech.env environment file
Copy the sample.env to speech.env (customize if needed)
cp sample.env speech.env
Defaults
TTS_HOME=voices
HF_HOME=voices
#PRELOAD_MODEL=xtts
#PRELOAD_MODEL=xtts_v2.0.2
#EXTRA_ARGS=--log-level DEBUG --unload-timer 300
#USE_ROCM=1
Option A: Manual installation
# install curl and ffmpeg
sudo apt install curl ffmpeg
# Create & activate a new virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate
# Install the Python requirements
# - use requirements-rocm.txt for AMD GPU (ROCm support)
# - use requirements-min.txt for piper only (CPU only)
pip install -U -r requirements.txt
# run the server
bash startup.sh
On first run, the voice models will be downloaded automatically. This might take a while depending on your network connection.
Option B: Docker Image (recommended)
Nvidia GPU (cuda)
docker compose up
AMD GPU (ROCm support)
docker compose -f docker-compose.rocm.yml up
ARM64 (Apple M-series, Raspberry Pi)
XTTS only has CPU support here and will be very slow; you can use the Nvidia image for XTTS with CPU (slow), or use the piper-only image (recommended).
CPU only, No GPU (piper only)
For a minimal docker image with only piper support (<1GB vs. 8GB).
docker compose -f docker-compose.min.yml up
Server Options
usage: speech.py [-h] [--xtts_device XTTS_DEVICE] [--preload PRELOAD] [--unload-timer UNLOAD_TIMER] [--use-deepspeed] [--no-cache-speaker] [-P PORT] [-H HOST]
[-L {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
OpenedAI Speech API Server
options:
-h, --help show this help message and exit
--xtts_device XTTS_DEVICE
Set the device for the xtts model. The special value of 'none' will use piper for all models. (default: cuda)
--preload PRELOAD Preload a model (Ex. 'xtts' or 'xtts_v2.0.2'). By default it's loaded on first use. (default: None)
--unload-timer UNLOAD_TIMER
Idle unload timer for the XTTS model in seconds, Ex. 900 for 15 minutes (default: None)
--use-deepspeed Use deepspeed with xtts (this option is unsupported) (default: False)
--no-cache-speaker Don't use the speaker wav embeddings cache (default: False)
-P PORT, --port PORT Server tcp port (default: 8000)
-H HOST, --host HOST Host to listen on, Ex. 0.0.0.0 (default: 0.0.0.0)
-L {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log-level {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Set the log level (default: INFO)
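For example, a CPU-only run on the default port (a sketch assuming the manual installation above; per the help text, the special device value 'none' makes piper handle all voices):

python speech.py --xtts_device none -P 8000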
Sample Usage
You can use it like this:
curl http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{
"model": "tts-1",
"input": "The quick brown fox jumped over the lazy dog.",
"voice": "alloy",
"response_format": "mp3",
"speed": 1.0
}' > speech.mp3
Or just like this:
curl -s http://localhost:8000/v1/audio/speech -H "Content-Type: application/json" -d '{
"input": "The quick brown fox jumped over the lazy dog."}' > speech.mp3
Or like this example from the OpenAI Text to speech guide:
import openai
client = openai.OpenAI(
# This part is not needed if you set these environment variables before import openai
# export OPENAI_API_KEY=sk-11111111111
# export OPENAI_BASE_URL=http://localhost:8000/v1
api_key = "sk-111111111",
base_url = "http://localhost:8000/v1",
)
with client.audio.speech.with_streaming_response.create(
model="tts-1",
voice="alloy",
input="Today is a wonderful day to build something people love!"
) as response:
response.stream_to_file("speech.mp3")
Also see the say.py sample application for an example of how to use the openai-python API.
# play the audio, requires 'pip install playsound'
python say.py -t "The quick brown fox jumped over the lazy dog." -p
# save to a file in flac format
python say.py -t "The quick brown fox jumped over the lazy dog." -m tts-1-hd -v onyx -f flac -o fox.flac
You can also try the included audio_reader.py for listening to longer text and streamed input.
Example usage:
python audio_reader.py -s 2 < LICENSE # read the software license - fast
OpenAI API Documentation and Guide
| Documentation | Link |
|---|---|
| OpenAI Text to speech guide | OpenAI Text to speech guide |
| OpenAI API Reference | OpenAI API Reference |
Custom Voices Howto
Piper
- Select the piper voice and model from the piper samples
- Update `config/voice_to_speaker.yaml` with a new section for the voice, for example:
...
tts-1:
ryan:
model: voices/en_US-ryan-high.onnx
speaker: # default speaker
- New models will be downloaded as needed, or you can download them in advance with `download_voices_tts-1.sh`. For example:
bash download_voices_tts-1.sh en_US-ryan-high
Coqui XTTS v2
Coqui XTTS v2 voice cloning can work with as little as 6 seconds of clear audio. To create a custom voice clone, you must prepare a WAV file sample of the voice.
Guidelines for preparing good sample files for Coqui XTTS v2
| Guideline | Description |
|---|---|
| Mono (single channel) 22050 Hz WAV file | |
| 6-30 seconds long | Longer isn't always better (I've had some good results with as little as 4 seconds) |
| Low noise | No hiss or hum |
| No partial words, breathing, laughing, music or backgrounds sounds | |
| An even speaking pace with a variety of words is best | Like in interviews or audiobooks |
| Audio longer than 30 seconds will be silently truncated |
You can use FFmpeg to prepare your audio files, here are some examples:
# convert a multi-channel audio file to mono, set sample rate to 22050 hz, trim to 6 seconds, and output as WAV file.
ffmpeg -i input.mp3 -ac 1 -ar 22050 -t 6 -y me.wav
# use a simple noise filter to clean up audio, and select a start time for sampling.
ffmpeg -i input.wav -af "highpass=f=200, lowpass=f=3000" -ac 1 -ar 22050 -ss 00:13:26.2 -t 6 -y me.wav
# A more complex noise reduction setup, including volume adjustment
ffmpeg -i input.mkv -af "highpass=f=200, lowpass=f=3000, volume=5, afftdn=nf=25" -ac 1 -ar 22050 -ss 00:13:26.2 -t 6 -y me.wav
Once your WAV file is prepared, save it in the /voices/ directory and update the config/voice_to_speaker.yaml file with the new file name.
For example:
...
tts-1-hd:
me:
model: xtts
speaker: voices/me.wav # this could be you
You can also use a sub folder for multiple audio samples to combine small samples or to mix different samples together.
For example:
...
tts-1-hd:
mixed:
model: xtts
speaker: voices/mixed
Where the voices/mixed/ folder contains multiple wav files. The total audio length is still limited to 30 seconds.
Multilingual
Multilingual cloning support was added in version 0.11.0 and is available only with the XTTS v2 model. To use multilingual voices with piper simply download a language specific voice.
Coqui XTTSv2 has support for multiple languages: English (en), Spanish (es), French (fr), German (de), Italian (it), Portuguese (pt), Polish (pl), Turkish (tr), Russian (ru), Dutch (nl), Czech (cs), Arabic (ar), Chinese (zh-cn), Hungarian (hu), Korean (ko), Japanese (ja), and Hindi (hi). When not set, an attempt will be made to automatically detect the language, falling back to English (en).
Unfortunately, the OpenAI API does not support a language parameter, but you can create your own custom speaker voice and set the language for it.
- Create the WAV file for your speaker, as in Custom Voices Howto
- Add the voice to `config/voice_to_speaker.yaml` and include the correct Coqui `language` code for the speaker. For example:
xunjiang:
model: xtts
speaker: voices/xunjiang.wav
language: zh-cn
- Don't strip high-Unicode characters in your `config/pre_process_map.yaml`! These filter lines were added to the config file by default before version 0.11.0; if your config still contains them, remove them:

Remove:

- - '[\U0001F600-\U0001F64F\U0001F300-\U0001F5FF\U0001F680-\U0001F6FF\U0001F700-\U0001F77F\U0001F780-\U0001F7FF\U0001F800-\U0001F8FF\U0001F900-\U0001F9FF\U0001FA00-\U0001FA6F\U0001FA70-\U0001FAFF\U00002702-\U000027B0\U000024C2-\U0001F251]+'
- ''
- Your new multi-lingual speaker voice is ready to use!
Custom Fine-Tuned Model Support
Adding a custom xtts model is simple. Here is an example of how to add a custom fine-tuned 'halo' XTTS model.
- Save the model folder under `voices/` (all 4 files are required, including the `vocab.json` from the model)
openedai-speech$ ls voices/halo/
config.json vocab.json model.pth sample.wav
- Add the custom voice entry under the `tts-1-hd` section of `config/voice_to_speaker.yaml`:
tts-1-hd:
...
halo:
model: halo # This name is required to be unique
speaker: voices/halo/sample.wav # voice sample is required
model_path: voices/halo
- The model will be loaded when you access the voice for the first time (`--preload` doesn't work with custom models yet)
Generation Parameters
The generation of XTTSv2 voices can be fine-tuned with the following options (defaults shown below):
tts-1-hd:
alloy:
model: xtts
speaker: voices/alloy.wav
enable_text_splitting: True
length_penalty: 1.0
repetition_penalty: 10
speed: 1.0
temperature: 0.75
top_k: 50
top_p: 0.85


