# Explore semantic caching to reduce your OpenAI/LLM API bill

This repository contains a Python application that demonstrates semantic caching: instead of calling the LLM for every query, it first searches a cache for a semantically similar question that has already been answered. It compares the performance of two embedding methods: OpenAI and ONNX.
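The core idea can be sketched in a few lines: embed each question, and on a new query return the cached answer whose embedding is similar enough. This is a minimal illustration with a toy embedding function and an illustrative similarity threshold, not the project's actual implementation (which builds on gptcache with OpenAI or ONNX embeddings):

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


class SemanticCache:
    """Toy semantic cache: store (embedding, answer) pairs and return a
    cached answer when a new query's embedding is similar enough."""

    def __init__(self, embed_fn, threshold=0.9):
        self.embed_fn = embed_fn    # any embedding function (OpenAI, ONNX, ...)
        self.threshold = threshold  # illustrative cutoff; tune for your data
        self.entries = []           # list of (embedding, answer)

    def get(self, question):
        q = self.embed_fn(question)
        for emb, answer in self.entries:
            if cosine(q, emb) >= self.threshold:
                return answer       # cache hit: the paid LLM call is skipped
        return None                 # cache miss: call the LLM, then put()

    def put(self, question, answer):
        self.entries.append((self.embed_fn(question), answer))
```

A linear scan like this is fine for a toy; the project uses FAISS indices so the lookup stays fast as the cache grows.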
## Features

- Streamlit web application to test and evaluate semantic caching.
- CLI for comparing exact caching, semantic caching, and no caching.
- ONNX and OpenAI embeddings.
- FAISS indices for fast similarity search.
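At its simplest, the similarity search FAISS performs is a brute-force nearest-neighbour scan. The sketch below reproduces the contract of a flat L2 index in plain NumPy, as an illustration of what the lookup computes, not the FAISS API itself:

```python
import numpy as np


def flat_l2_search(index_vectors, queries, k):
    """Brute-force k-nearest-neighbour search by squared L2 distance,
    mimicking what a flat (non-approximate) index computes.

    index_vectors: (n, d) array of cached embeddings
    queries:       (m, d) array of query embeddings
    returns (distances, indices), each of shape (m, k)
    """
    # Squared distance |q - x|^2 for every (query, index) pair.
    d2 = ((queries[:, None, :] - index_vectors[None, :, :]) ** 2).sum(axis=2)
    idx = np.argsort(d2, axis=1)[:, :k]         # indices of the k closest vectors
    dist = np.take_along_axis(d2, idx, axis=1)  # their squared distances
    return dist, idx
```

FAISS implements the same contract far more efficiently (and with approximate variants), which is why the project stores its cache as FAISS indices rather than scanning embeddings by hand.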
## Installation

To install this project, you need Python 3.10. Then, follow these steps:

- Clone the repository.
- Enter the project directory.
- Install the project: `poetry install`
- Set up your OpenAI API key in the `.env` file.
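A `.env` file is a plain `KEY=value` list read by python-dotenv, one of this project's dependencies. The variable name below is the conventional one for OpenAI clients; check the project's code for the exact name it expects:

```
OPENAI_API_KEY=<your-api-key>
```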
## Usage

To run the CLI, use the following command:

`poetry run cli run <cache_type>`

Replace `<cache_type>` with `no_cache` or `semantic_cache`.
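The project's actual CLI lives in `scripts/`, but since typer is among the dependencies, a hypothetical sketch of such a `run` command could look like the following. The command name, accepted values, and messages here are assumptions for illustration, not the project's real code:

```python
import typer

app = typer.Typer()

# Cache strategies the sketch accepts (assumed, mirroring the README).
VALID_CACHE_TYPES = {"no_cache", "semantic_cache"}


@app.callback()
def main() -> None:
    """Benchmark LLM calls with different cache strategies."""


@app.command()
def run(cache_type: str) -> None:
    """Run the benchmark with the chosen cache strategy."""
    if cache_type not in VALID_CACHE_TYPES:
        raise typer.BadParameter(f"unknown cache type: {cache_type}")
    typer.echo(f"running with {cache_type}")


if __name__ == "__main__":
    app()
```

Declaring an empty `@app.callback()` keeps `run` as an explicit subcommand, matching the `cli run <cache_type>` shape of the command above.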
To run the Streamlit web app, use the following command:

`poetry run webapp`

The app will be available at `localhost:8501`.
## Project Structure

- `pyproject.toml`: TOML file that contains the project metadata and dependencies.
- `scripts/`: Folder containing the Streamlit app and CLI scripts.
- `semantic_caching/`: Folder containing the core caching logic.
- `cache/`: Folder to store cache files (FAISS indices and SQLite databases).
## Dependencies

- langchain
- openai
- streamlit
- python-dotenv
- gptcache
- tiktoken
- rich
- torch
- typer
## Contributing

We welcome contributions to this project! Please feel free to submit issues or pull requests.
## License

This project is licensed under the MIT License.