The only thing I can think of is that you might be light on RAM; I've seen 32GB recommended for running LLMs efficiently. Do your own research, though, because I'm not 100% confident in that claim. What you're doing does sound like fun. I'm curious how much faster AI models run with 32GB of RAM vs 16GB.
I could upgrade to 32GB, and I probably will at some point, but what I have now is enough for most tasks. A few days ago I was testing some WAN2.2 video generation models and realized my machine simply could not handle the load. That led me to try quantized models, and they have been surprisingly great, even on my modest 16GB setup.
It mostly comes down to the tool doing some RAM gymnastics: sending data to the paging file, dynamically unloading unused models, and reallocating resources as needed. I am not entirely sure what is happening under the hood, but the terminal shows messages that amount to "I'm sending this here and unloading this" or "we've run out of memory, so I'm trying something else instead." Those messages at least give me an idea of what is going on. I haven't taken the time to look into the exact details, though. I'm just following tutorials, and my machine handles the quantized models really well, giving me 10-second outputs in about 5 minutes with very decent quality, which is great.
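To give a rough idea of the "dynamically unloading unused models" part, here's a minimal Python sketch of an LRU (least recently used) cache that keeps loaded models under a RAM budget. This is just my own illustration, not how any particular tool actually does it: the class name, model names, and sizes are all hypothetical, and real tools can also push data to the paging file or disk instead of only unloading it.

```python
from collections import OrderedDict


class ModelCache:
    """Hypothetical sketch: keep loaded models under a RAM budget (in GB),
    unloading the least recently used ones when a new model does not fit."""

    def __init__(self, budget_gb: float):
        self.budget_gb = budget_gb
        self.loaded = OrderedDict()  # name -> size in GB, least recently used first

    def used_gb(self) -> float:
        return sum(self.loaded.values())

    def load(self, name: str, size_gb: float) -> list:
        """Load a model and return the names of any models unloaded to make room."""
        if size_gb > self.budget_gb:
            raise MemoryError(f"{name} ({size_gb} GB) exceeds the {self.budget_gb} GB budget")
        if name in self.loaded:
            self.loaded.move_to_end(name)  # already loaded: mark as most recently used
            return []
        unloaded = []
        while self.used_gb() + size_gb > self.budget_gb:
            victim, _ = self.loaded.popitem(last=False)  # evict the least recently used model
            unloaded.append(victim)
        self.loaded[name] = size_gb
        return unloaded


# Hypothetical pipeline on a 12 GB budget (sizes made up for illustration)
cache = ModelCache(budget_gb=12)
cache.load("text_encoder", 5)
cache.load("video_model", 6)
print(cache.load("vae", 2))  # prints ['text_encoder'], unloaded to make room
```

The LRU choice is just the simplest policy to show; a real tool might instead decide based on which pipeline step runs next.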
Edit: I just asked an AI, and it said 2 seconds to start a reply is quite good and that extra RAM is unlikely to speed it up. What extra RAM allows is running a larger, more capable model.
Yes. In the case of WAN2.2, for example, I would be able to use the full models instead of the quantized ones, and the difference is outstanding, to be honest. The full models' outputs are incredible, and if you do image-to-video (I2V), they can even sharpen your image and produce a video with better quality than the source. It's amazing what these things can do nowadays.
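For a rough sense of why full models need so much more RAM than quantized ones: the weights alone take about (parameters × bits per weight ÷ 8) bytes. Here's a quick Python sketch of that arithmetic. The 14B parameter count is just an assumed example size, and the estimate ignores activations, text encoders, the VAE, and runtime overhead. Real quantized formats also typically store extra scaling data, so actual files run somewhat larger than the plain 4-bit figure.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Rough weights-only footprint in GB (1 GB = 1e9 bytes):
    billions of parameters x bits per weight / 8 bits per byte."""
    return params_billions * bits_per_weight / 8


# Assumed 14B-parameter model at different precisions
for label, bits in [("fp16/bf16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"14B model at {label}: ~{weight_memory_gb(14, bits):.1f} GB")
# prints ~28.0 GB, ~14.0 GB and ~7.0 GB respectively
```

At 16-bit, the weights alone wouldn't fit in 16GB of RAM, while the 4-bit version would, with room left over.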
You give it an identity: a soul.md file for who it is, a tools.md file for what tools it can use, etc. You can even get ClawBot to create a whole team of different agents for you, where it acts as a meta-agent that controls the sub-agents. It will create all the soul.md, tools.md, etc. files for you.
Awesome, thank you, that's what I've been looking for. I've been trying to train an LLM to output normative citations, and the training takes a long time, mostly because of the preparation I have to do; I could set it up so it does everything for me. I'll look into it. That's pretty much the only use for it I can think of right now. I'm not sold on most of the things it's advertised for, but I should give it a try.
You can always wait, but I believe it's fun to test these tools and see what you're capable of building alone.