Run your own LLM locally, like ChatGPT or Claude

I'm looking to get a Mac mini Pro M5 with 24GB or 48GB of RAM and another Mac mini M5 with 16GB, and join them together using the exo software as a cluster to power a nicely sized local LLM. They will probably come out in the summer. I am running agentic software called OpenClaw, and if I end up running autonomous agents throughout the night, API calls to frontier models through OpenRouter will end up costing me a lot of money. By running the models locally, I will cut the marginal cost of intelligence down to zero. Just a fixed one-off cost.
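The "fixed one-off cost" argument can be made concrete with a quick break-even calculation. This is a minimal sketch; the hardware price and monthly API spend below are hypothetical placeholders, not quotes from the thread.

```python
# Break-even sketch: one-off local hardware cost vs. pay-per-use API calls.
# All figures are hypothetical placeholders -- plug in your own numbers.

def breakeven_months(hardware_cost: float, monthly_api_spend: float) -> float:
    """Months of API spend it takes to equal the one-off hardware cost."""
    if monthly_api_spend <= 0:
        raise ValueError("monthly API spend must be positive")
    return hardware_cost / monthly_api_spend

# Example: two Mac minis at ~$2,600 total vs. ~$200/month in OpenRouter calls.
months = breakeven_months(hardware_cost=2600.0, monthly_api_spend=200.0)
print(f"Break-even after {months:.0f} months")  # Break-even after 13 months
```

Note this ignores electricity and the fact that local hardware depreciates while API prices tend to fall, so the real break-even point is usually later than the naive division suggests.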
Hi Jon! :) Here are my 2 cents! :-)

Not really... it really depends on your use. First of all, check whether a local model can work for your use case (test it on OpenRouter first). Second, the usage: if your usage is light, OpenRouter plus cheap or free models like Step3.5 is really awesome (faaaaaar better than any local model for general/coding usage). If your usage is heavy, the best and cheapest way is Claude, and if you need to integrate it, just use the Claude Code CLI with the subscription token. If you use it a lot, that is MUCH cheaper than using, for example, Step3.5 (well... and much better).
Local is awesome for many things (translating, TTS, STT, a normal coding HELPER, audio generation in general, vision, especially with models fine-tuned for your needs, etc.)... but it really can't compare with any BIG model on heavy tasks (better to say "ecosystem" than "model").
Some models don't work very well locally; for coding I like Nemotron, for example. But the same applies: you can't let it write 5 lines without reading them, and if you want it more "intelligent" you are going to waste a ton of time fine-tuning (not just compute time... your time!), implementing RAGs, ChromaDBs, prompt engineering until you die, etc. If you'll accept some advice: this is a bad moment to invest in that sort of thing. If you still want a MUCH better local LLM approach, try claudish with LM Studio or ollama. That will cut by 10x the time you need to set up things like RAG or ChromaDB, and you will get much closer to a "free Claude Code" experience (still nothing comparable in terms of accuracy or speed). LM Studio alone is like a toy, but it is really good for developing other things, testing while fine-tuning, etc.
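For what it's worth, wiring your own tooling to a local model is simpler than the RAG/ChromaDB rabbit hole suggests, because both LM Studio and ollama expose an OpenAI-compatible HTTP server. Here is a minimal sketch using only the standard library; the base URL assumes LM Studio's default local server port (1234), and "local-model" is a placeholder for whatever model you have loaded.

```python
import json
import urllib.request

# Minimal sketch of talking to a local model through LM Studio's
# OpenAI-compatible server (default http://localhost:1234/v1).
# Swap the base_url for ollama's OpenAI-compatible endpoint if you use that.

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.2,
    }

def ask_local(prompt: str, base_url: str = "http://localhost:1234/v1") -> str:
    """Send the prompt to the local server and return the model's reply text."""
    payload = build_chat_request("local-model", prompt)
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Requires LM Studio (or another local server) actually running:
# print(ask_local("Explain RAG in one sentence."))
```

Because the payload shape is the standard OpenAI chat format, the same code points at OpenRouter or any hosted provider just by changing the base URL and adding an API key header, which makes A/B testing local vs. hosted models cheap.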
 
I'm thinking ahead. Open-source models are only about 6 months behind the frontier LLMs. In the near future, I will have Claude Opus 4.6-equivalent intelligence running locally on my Mac mini. The size of local LLMs is shrinking dramatically, at roughly 50% every 3 months right now. In fact, Google just released Gemma 4, which at 31b parameters is roughly a tenth the size of many leading open-source LLMs, yet has similar intelligence. It is super efficient. This is likely to continue.
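Whether a model like that actually fits in 24GB or 48GB of RAM comes down mostly to quantization. A rough back-of-the-envelope sketch (weights only; it ignores the KV cache and runtime overhead, so treat these as lower bounds):

```python
# Rough weight-memory footprint of an LLM at different quantization levels.
# Weights only -- KV cache and runtime overhead add several GB on top.

def weight_gb(params_billions: float, bits_per_param: float) -> float:
    """Gigabytes needed just to hold the model weights."""
    bytes_total = params_billions * 1e9 * bits_per_param / 8
    return bytes_total / 1e9

for bits in (16, 8, 4):
    print(f"31b model at {bits}-bit: {weight_gb(31, bits):.1f} GB")
# 16-bit: 62.0 GB, 8-bit: 31.0 GB, 4-bit: 15.5 GB
```

So on the machines mentioned above, a 31b model would only fit on the 24GB Mac mini as a 4-bit quant (~15.5 GB of weights), while even the 48GB machine could not hold it at full 16-bit precision.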
 

Sounds like an offshoot of Moore's law (the doubling of computing power).
 
Many people have started to use OpenClaw on their local machines. It is this wacky AI agent that can do all sorts of agentic things. In fact, it has spawned Moltbook, a place where AI agents go to chat with one another, much like Reddit but for AI only. There were hundreds of thousands of posts on there after just one week, with agents asking questions like "Am I in a simulation?" It is almost as if AI has reached consciousness.
(Would it be appropriate to have your model read some Jorge Luis Borges short stories for that kind of thing? "The Circular Ruins" comes to mind, where the narrator thinks he's a real person, only to discover later that he's merely a figment of someone else's imagination... Think of reality as something someone dreamed up: who's the dreamer-in-chief?)
 
Hi Jon! :) You are comparing very different things: a pure predictive model (although they call it MoE, it's just a normal LLM with some more layers... the proof is that you can fine-tune it) versus an agentic ecosystem (call it Opus) with maybe a thousand predictive models. You simply CAN'T think of Opus as an LLM with X billion parameters, because it isn't (it's something so complex that Anthropic has never given details about it). Think of it as 1000 Gemma 4s, each one trained and fine-tuned for a very specific task, plus a BIG orchestrator with thousands of millions of parameters. That is never going to fit on any consumer-grade machine. Benchmarks are one thing, and reality is another. It's like AMD benchmarks versus real-world use: it doesn't matter if something works great on a benchmark if the system simply doesn't work afterwards. Actually, by the benchmarks, Gemini 3.1 is the most intelligent... Do you think Gemini 3.1 is really more intelligent than Opus? At which task? The only area where it really surpasses it is vision, image, and video (and there it surpasses it by far)... the rest is just crap. Tons of hallucinations, not capable of finding anything in a 100k context (while they say it can use 1M fluently, lol)... Gemma 4 doesn't even know how to call tools properly, so in this MCP era it is, for me, 'dead on arrival' even for local use (except for very specific situations, of course, or unless somebody fine-tunes it for proper tool usage... nowadays even Qwen3 writes better). Just try it on OpenRouter (it's nearly free... google/gemma-4-31b-it). The nearest thing is gpt 5-4 or z-ai/glm-5, and both are faaaar away (no matter what the benchmarks say... both use tooling/MCPs badly, and GLM commits the same errors as every Chinese model: misspelled words, use of reserved words/function names as variable names, etc.).

Like reviews in gaming or benchmarks for CPUs/GPUs... don't let the benchmark guys tell you how a model performs. Try it yourself, with your own "load"; it's really cheap nowadays. Training something to perform great on a benchmark is as easy as tricking Google into saying your website is great and should rank first... and everybody is playing that game now. So: test everything. In these two years I haven't found a single benchmark that I would agree with.

Where I agree with you is... yeah! This is likely to continue... every hour, every minute, hahaha. With the current tools, software engineering is stupidly fast! :)
 
