

What helped me in my project was a RAG implementation. Instead of trying to cram all the information into the prompt, which becomes inconsistent once it gets too big (especially when working with a local LLM), I keep a fairly large knowledge base in a RAG. Then when the model is prompted it's a two-step process: first search the RAG for the best match in the knowledge base, then feed that to the LLM to generate the answer. You can also store noteworthy stuff in the RAG on the fly. A rough sketch of that flow is below.
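Here's a minimal Python sketch of that two-step flow, just to make the idea concrete (not how the plugin below does it internally). It assumes the sentence-transformers library for embeddings; the knowledge-base entries are made up, and the final LLM call is left as a placeholder:

```python
# Minimal RAG sketch: retrieve the best-matching knowledge-base entry,
# then build the prompt you'd hand to the LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Example entries -- in practice this is your game lore, NPC memory, etc.
knowledge_base = [
    "The blacksmith's name is Gerta; she lost her brother in the war.",
    "The old mill north of town is said to be haunted.",
]
kb_vectors = model.encode(knowledge_base)  # embed once up front

def retrieve(query: str) -> str:
    """Step 1: find the knowledge-base entry most similar to the query."""
    q = model.encode([query])[0]
    sims = kb_vectors @ q / (
        np.linalg.norm(kb_vectors, axis=1) * np.linalg.norm(q)
    )
    return knowledge_base[int(np.argmax(sims))]

def remember(fact: str) -> None:
    """Store noteworthy stuff in the RAG on the fly."""
    global kb_vectors
    knowledge_base.append(fact)
    kb_vectors = np.vstack([kb_vectors, model.encode([fact])])

def build_prompt(player_prompt: str) -> str:
    """Step 2: feed the retrieved context plus the prompt to the LLM.
    Returns the combined prompt; pass it to whatever local LLM you use."""
    context = retrieve(player_prompt)
    return f"Context: {context}\n\nPlayer: {player_prompt}"
```

In an actual game you'd do this through the engine plugin's API rather than Python, but the retrieval step is the same idea.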
I haven't tried it with an API yet. With a local model it's super quick though; it adds maybe a couple of seconds at most.
If you have a GPU and want to try it with a local model, there's a plugin for Godot and Unity called "NobodyWho" that implements the RAG approach out of the box, with examples as well. So it wasn't something I came up with.