Stream text with GPT-4, transcribe and translate audio with Whisper, or create images with DALL·E.

- [Extra Headers per Client](#extra-headers-per-client)
- [Verbose Logging](#verbose-logging)
- [Azure](#azure)
- [Ollama](#ollama)
- [Counting Tokens](#counting-tokens)
- [Models](#models)
- [Examples](#examples)

#### Azure

To use the [Azure OpenAI Service](https://learn.microsoft.com/en-us/azure/cognitive-services/openai/) API, you can configure the gem with your Azure credentials and endpoint.

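A configuration sketch, assuming the gem's standard `OpenAI.configure` block (the environment variable names and the `api_version` here are example values and may need updating for your deployment):

```ruby
OpenAI.configure do |config|
  config.access_token = ENV.fetch("AZURE_OPENAI_API_KEY") # Example variable name.
  config.uri_base = ENV.fetch("AZURE_OPENAI_URI")         # Example variable name.
  config.api_type = :azure
  config.api_version = "2023-03-15-preview" # Example version; check the Azure docs for current values.
end
```
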
where `AZURE_OPENAI_URI` is e.g. `https://custom-domain.openai.azure.com/openai/deployments/gpt-35-turbo`

#### Ollama

Ollama allows you to run open-source LLMs, such as Llama 3, locally. It [offers chat compatibility](https://github.com/ollama/ollama/blob/main/docs/openai.md) with the OpenAI API.
You can download Ollama [here](https://ollama.com/). On macOS you can install and run Ollama like this:
```bash
brew install ollama
ollama serve
ollama pull llama3:latest # In a new terminal tab.
```
Create a client using your Ollama server and the pulled model, and stream a conversation for free. A minimal sketch, assuming Ollama is serving on its default port, 11434:

```ruby
client = OpenAI::Client.new(uri_base: "http://localhost:11434")

client.chat(
  parameters: {
    model: "llama3", # Required: the model pulled above.
    messages: [{ role: "user", content: "Hello!" }], # Required.
    temperature: 0.7,
    stream: proc do |chunk, _bytesize|
      print chunk.dig("choices", 0, "delta", "content")
    end
  }
)
# => Hi! It's nice to meet you. Is there something I can help you with, or would you like to chat?
```

### Counting Tokens
OpenAI parses prompt text into [tokens](https://help.openai.com/en/articles/4936856-what-are-tokens-and-how-to-count-them), which are words or portions of words. (These tokens are unrelated to your API access_token.) Counting tokens can help you estimate your [costs](https://openai.com/pricing). It can also help you ensure your prompt text size is within the max-token limits of your model's context window, and choose an appropriate [`max_tokens`](https://platform.openai.com/docs/api-reference/chat/create#chat/create-max_tokens) completion parameter so your response will fit as well.
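
One way to count tokens before sending a prompt, as a sketch using the third-party [tiktoken_ruby](https://rubygems.org/gems/tiktoken_ruby) gem (an assumption of this example, not a dependency of this library):

```ruby
require "tiktoken_ruby"

# Pick the encoding that matches your target model.
enc = Tiktoken.encoding_for_model("gpt-4")

prompt = "OpenAI parses prompt text into tokens, which are words or portions of words."
tokens = enc.encode(prompt)

puts tokens.length # Number of tokens the prompt will consume.
```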