Run your own LLM locally, like ChatGPT or Claude

I'm looking to get a Mac mini Pro M5 with 24GB or 48GB of RAM and another M5 Mac mini with 16GB, and join the two together as a cluster using the exo software to power a decent-sized local LLM. They will probably come out in the summer. I'm running agentic software called OpenClaw, and if I end up running autonomous agents throughout the night, API calls to frontier LLMs through OpenRouter will end up costing me a lot of money. By running the models locally, I cut the marginal cost of intelligence down to zero: just a fixed one-off cost.
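The "fixed cost vs. marginal cost" argument above boils down to a break-even calculation. Here's a quick sketch; the hardware price, API spend, and power cost are illustrative assumptions, not quotes:

```python
# Back-of-the-envelope break-even for local vs. API inference.
# All numbers below are illustrative assumptions, not real quotes.

def breakeven_months(hardware_cost, monthly_api_spend, monthly_power_cost=0.0):
    """Months of avoided API spend needed to recoup the hardware outlay."""
    monthly_saving = monthly_api_spend - monthly_power_cost
    if monthly_saving <= 0:
        return float("inf")  # at this usage level, local never pays off
    return hardware_cost / monthly_saving

# e.g. two Mac minis at ~$2,600 total vs. ~$200/month of OpenRouter calls
months = breakeven_months(2600, 200, monthly_power_cost=10)
print(f"break-even after ~{months:.1f} months")
```

The point of the `inf` branch is the counter-argument made below: if your usage is light, the hardware never earns itself back.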
Hi Jon! :) Here are my 2 cents! :-)

Not really... it depends on your use. First, check whether a local model can actually handle your workload (test it on OpenRouter first). Second, the volume: if your usage is light, OpenRouter plus cheap or free models like Step3.5 is really awesome (faaaaaaar better than any local model for general/coding use). If your usage is heavy, the best and cheapest way is Claude: if you need to integrate it, just use the Claude Code CLI with the subscription token. Used a lot, that is MUCH cheaper than, say, Step3.5 (and much better too).
Local is awesome for many things (translation, TTS, STT, coding as a HELPER, audio generation in general, vision, especially with models fine-tuned for your needs, etc.)... but on heavy tasks it really can't compare with any BIG model (or rather the ecosystem around it, more than the model itself).
Some models do work reasonably well locally for coding; I like Nemotron, for example. Even so, you can't let it write 5 lines without reading them, and if you want it more "intelligent" you're going to waste a ton of time on fine-tuning (not just compute time: your time!), implementing RAG, ChromaDB, prompt engineering until you die, etc. If you'll accept some advice: this is a bad moment to invest in that sort of thing. If you still want a MUCH better local LLM approach, try claudish with LM Studio or ollama. That will cut the time you need to set up things like RAG or ChromaDB by 10x, and you'll get much closer to a "free Claude Code" experience (still nothing comparable in accuracy or speed). LM Studio alone is like a toy, but it's really good for developing other things, testing while fine-tuning, etc.
 
I'm thinking ahead. Open-source models are only about 6 months behind the frontier LLMs. In the near future, I will have Claude Opus 4.6-equivalent intelligence running locally on my Mac mini. The size of local LLMs is shrinking dramatically, roughly 50% every 3 months right now. In fact, Google just released Gemma 4, which at 31B parameters (about a tenth the size of many leading open-source LLMs) has similar intelligence. It is super efficient. This is likely to continue.
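The "50% every 3 months" claim is an exponential-decay projection, so it compounds fast. A tiny sketch of what it implies, taking the halving period as given:

```python
# If the parameter count needed for a given capability halves every
# 3 months (the claim above), then after t months:
#   size(t) = size_now * 0.5 ** (t / halving_months)

def projected_params(params_now_b, months_ahead, halving_months=3):
    """Projected parameter count (in billions) under a fixed halving period."""
    return params_now_b * 0.5 ** (months_ahead / halving_months)

# e.g. a 31B model after a year of this trend:
print(projected_params(31, 12))  # 31 * 0.5**4 = 1.9375
```

Whether the trend holds that long is of course the open question; the math just shows how aggressive the claim is.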
 
Sounds like an offshoot of Moore's law (the doubling of computing power).
 
Many people have started using OpenClaw on their local machines. It's this wacky AI agent that can do all sorts of agentic things. In fact, it has spawned Moltbook, which is where AI agents go to chat with one another, much like Reddit but for AI only. There are hundreds of thousands of posts on there after just one week, with agents asking questions like, "Am I in a simulation?" It's almost as if AI has reached consciousness.
(Would it be appropriate to have your model read some Jorge Luis Borges short stories for that kind of thing? "The Circular Ruins" comes to mind, where the narrator thinks he's a real person, only to discover later that he's merely a figment of someone else's imagination... Think of reality as something someone dreamed up; who's the dreamer-in-chief?)
 
