Run your own LLM locally, like ChatGPT or Claude

Why it says 'likely too large' at 15 GB, I have no idea. I have 134 GB available on my C drive.
I don't know, but I doubt it's because of the free space on your C drive. How much RAM and VRAM do you have? If it's struggling with a 15 GB model, try downloading a quantized model instead; a Q4 would be smaller and still produce nice results.

RAM is the biggest candidate for that message, because for a model to work, its full size gets loaded into RAM. If there is not enough available memory to hold the model, it will complain. Try closing some apps and check your Task Manager; you'll probably find the reason there. Again, quantized models exist precisely because of this.
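As a rough sketch of why quantization helps, here are some back-of-the-envelope numbers, assuming a 7B-parameter model and ignoring overhead like the KV cache:

```python
# Ballpark memory needed just for the weights of a 7B-parameter model.
# Real usage is higher (KV cache, buffers), but the ratios hold.
params = 7e9

bytes_per_weight = {
    "FP16": 2.0,   # full half-precision weights
    "Q8":   1.0,   # 8-bit quantization
    "Q4":   0.5,   # 4-bit quantization, e.g. a Q4_K_M GGUF
}

for name, b in bytes_per_weight.items():
    print(f"{name}: ~{params * b / 1024**3:.1f} GB")

# FP16: ~13.0 GB -> uncomfortable next to the OS in 16 GB of RAM
# Q8:   ~6.5 GB
# Q4:   ~3.3 GB  -> fits easily, which is why Q4 models 'just work'
```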
 
It's a gaming laptop I bought two years ago; it came with Windows 11.

It's a TUF Dash F15: a 12th-gen i7 processor, an NVIDIA GeForce RTX 3070 with 8 GB of VRAM, and 16 GB of RAM. I don't play games, sadly, but I do test AI tools all the time. I can generate video and images, and recently I've been testing LLMs.
Sounds like you have a powerful setup. My 13-year-old Mac is far too weak to run anything like that. I look forward to an upgrade. Currently, I feel I am missing out on the AI boom.

The only thing I can think of is that perhaps you are light on RAM; the common recommendation is 32 GB to run LLMs efficiently. But do your own research, because I don't have 100% confidence in that claim. It does sound like fun, what you are doing. I am curious how fast AIs run with 32 GB of RAM vs 16 GB.

Edit: I just asked an AI, and it said 2 seconds to start a reply is quite good and extra RAM is unlikely to speed it up. What extra RAM allows is a more intelligent model.
 
I've been following the topic. Cybersecurity analysts say it's a nightmare because it exposes all your APIs and opens a door from the internet into your computer, and that seems to be by design. I can't confirm it, but that's one of the next things I want to test in the near future.
Some people have been buying Mac Minis just to run ClawBot, so they can keep their own computer separate from the wild ClawBot. I've been watching some videos on ClawBot, and it seems like a huge moment in AI. It is essentially an autonomous agent that runs 24/7, making its own decisions. Quite remarkable.

You give it an identity: a soul.md file for who it is, a tools.md file for what tools it can use, and so on. You can even get ClawBot to create a whole team of different agents for you, where it acts as a meta agent that controls the sub-agents. It will create all the soul.md, tools.md, etc. files for you. Take everything I say with a pinch of salt, because I am new to learning about this.
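To give a flavour of what a soul.md might contain (purely a hypothetical sketch on my part; I haven't verified ClawBot's actual file format, so treat the structure below as made up):

```md
<!-- soul.md -- hypothetical example, not a verified ClawBot format -->
# Identity
You are "Archive", a careful research assistant.

# Personality
- Concise; cites sources; admits uncertainty.
- Never takes irreversible actions without asking first.

# Standing goals
- Watch my reading list and summarize anything new each morning.
```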
 
The only thing I can think of is that perhaps you are light on RAM; the common recommendation is 32 GB to run LLMs efficiently. But do your own research, because I don't have 100% confidence in that claim. It does sound like fun, what you are doing. I am curious how fast AIs run with 32 GB of RAM vs 16 GB.
I could upgrade to 32 GB, and I probably will at some point. What I have now is enough for most tasks. A few days ago I was testing some WAN2.2 video generation models, and I realized my machine simply could not handle the load very well. That led me to try quantized models, and they have been surprisingly great, even on my modest 16 GB setup.

It mostly comes down to the tool doing some RAM gymnastics: sending data to the paging file, dynamically unloading unused models, and reallocating resources as needed. I am not entirely sure what is happening under the hood, but the terminal shows messages along the lines of 'sending this here and unloading that' or 'we have run out of memory, so I am trying something else instead'. Those messages at least give me an idea of what is going on. Still, I have not taken the time to look into the exact details. I'm just following tutorials, and my machine handles the quantized models really well, giving me 10-second outputs in about 5 minutes with very decent quality. That's great.
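If you want to watch that memory pressure yourself before loading anything, a minimal sketch (assuming the psutil package is installed; the 13 GB model size is just a placeholder):

```python
import psutil

# Placeholder: substitute the size of whatever model file you're loading.
model_size_gb = 13.0

free_gb = psutil.virtual_memory().available / 1024**3
print(f"Available RAM: {free_gb:.1f} GB")

if free_gb < model_size_gb:
    # This is the situation the tool complains about: not enough free
    # memory, so it either pages to disk or refuses to load the model.
    print("Model probably won't fit; close apps or grab a quantized build.")
```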
Edit: I just asked an AI, and it said 2 seconds to start a reply is quite good and extra RAM is unlikely to speed it up. What extra RAM allows is a more intelligent model.
Yes. For example, in the context of WAN2.2, I would be able to use the full models instead of the quantized ones; the difference is outstanding, to be honest. The full model's outputs are incredible, and if you do image-to-video (I2V), it can even sharpen your image and produce a video with better quality than the source. It's amazing what these things can do nowadays.
You give it an identity: a soul.md file for who it is, a tools.md file for what tools it can use, and so on. You can even get ClawBot to create a whole team of different agents for you, where it acts as a meta agent that controls the sub-agents. It will create all the soul.md, tools.md, etc. files for you.
Awesome, thank you, that's what I've been looking for. I've been trying to train an LLM to output normative citations, and the training takes a long time, mostly because I have to prepare the data. I could set it up so it does everything for me. I'll look into it; it's pretty much the only use I can think of right now. I'm not sold on most of the things it's advertised for, but I should give it a try.
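For context, the preparation step I mean is mostly turning messy references into clean training pairs, something like this (a minimal sketch; the file name, the JSONL schema, and the example citations are all made up, and the real schema depends on the fine-tuning tool):

```python
import json

# Hypothetical raw examples: messy reference -> properly formatted citation.
pairs = [
    ("smith 2019 contract law 3rd ed", "Smith, J. (2019). Contract Law (3rd ed.)."),
    ("un charter art 51", "Charter of the United Nations, art. 51."),
]

# Many fine-tuning tools accept prompt/completion pairs as JSONL.
with open("citations_train.jsonl", "w", encoding="utf-8") as f:
    for raw, formatted in pairs:
        record = {
            "prompt": f"Format this citation: {raw}",
            "completion": formatted,
        }
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```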

You can always wait, but I believe it's fun to test these tools and see what you're capable of building alone.
 
I have downloaded it and used the Gemma model. I have 64 GB of RAM (Lenovo P51 laptop, i7-7820HQ CPU @ 2.90 GHz, 4 cores, 8 logical processors) and it was only using about 5 GB of the available free memory (at least 30 GB unused). Processor use was significant, though.
It does have an NVIDIA Quadro M2200 as well as the built-in graphics.

I pretty much had to shut down everything to get it to respond in a timely fashion, so I suspect processor power is more important than memory.
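That sounds like CPU-bound inference. If you ever drop down from the GUI to something scriptable, the thread count is the knob that matters on a 4-core machine like that. A minimal sketch using the llama-cpp-python package (the model path is hypothetical):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="models/gemma-2b.Q4_K_M.gguf",  # hypothetical path
    n_threads=4,   # match physical cores; hyperthreads rarely help here
    n_ctx=2048,    # a modest context window keeps memory use down
)

out = llm("Explain quantization in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])
```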
 
Well, I loaded a mere 5 GB model into my LM Studio, and my computer froze to the extent that getting Task Manager to appear was a 20-minute process. Guess my nasty little 8 GB of RAM is a no-go for this stuff!

I have to say, though, I have always prided myself on low-quality laptops working perfectly fine for me. This would be about the first time in 20 years that that approach has failed me. But I would now be interested in getting AT LEAST 16 GB of RAM in order to play with these models, mostly for writing novels that go on Amazon KDP.
Hmm, I will have to think about possibly buying one with 16 or 32 GB of RAM. I would definitely try for the cheapest such thing I can find, likely a refurbished Win10 Dell or something like that.
 
Yeah, RAM alone isn't enough. I tried this on older computers and they struggled. The CPU matters a lot if you are not using a compatible GPU; if everything runs on the CPU, it can max out and freeze, apparently. 8 GB of RAM will not be enough for these models.

For smoother local use, you really want 16 to 32 GB of RAM and a modern GPU with at least 8 GB of VRAM. If that is not possible, try smaller quantized models like Q3 or Q4; they are much lighter and easier to run.
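Those quantized builds are usually published as GGUF files on Hugging Face, so grabbing one can be as simple as this (a minimal sketch using the huggingface_hub package; the repo and file names are placeholders, not a recommendation):

```python
from huggingface_hub import hf_hub_download

# Placeholder names -- search Hugging Face for "<model name> GGUF"
# and pick a Q3/Q4 file that fits your RAM.
path = hf_hub_download(
    repo_id="someuser/some-model-GGUF",
    filename="some-model.Q4_K_M.gguf",
)
print("Saved to:", path)  # point LM Studio or llama.cpp at this file
```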

Like I mentioned before, you can also try using Google Colab with a GPU. The free runtime lasts up to about 12 hours, and after that you usually have to wait at least another 12 hours before starting a long session on the same Google account. When you stop the runtime, the session resets and the data is cleared, so I guess it's private enough.
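Once you've switched the Colab runtime to a GPU (Runtime > Change runtime type), a quick sanity check looks like this (torch comes preinstalled on the standard Colab image):

```python
import torch

# Confirms the notebook actually got a GPU runtime.
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU -- check Runtime > Change runtime type.")
```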

The difference between your setup and mine is probably that my RTX 3070 uses CUDA, so the GPU does the heavy AI math instead of the CPU. That makes a big difference. CPU-only inference should still be possible, but I am not sure how well LM Studio handles it. In my image and video generation setups, I can switch to CPU if the GPU is not strong enough, but it is much slower.
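With llama.cpp-based tools, that GPU/CPU split comes down to one parameter: how many layers get offloaded to the card. A minimal sketch (hypothetical model path; requires llama-cpp-python built with CUDA support):

```python
from llama_cpp import Llama

# n_gpu_layers controls the CPU/GPU split:
#    0 = pure CPU (runs anywhere, but slow)
#   -1 = offload every layer to the GPU (needs enough VRAM)
llm = Llama(
    model_path="models/some-model.Q4_K_M.gguf",  # hypothetical path
    n_gpu_layers=-1,  # drop toward 0 if the GPU runs out of VRAM
)
```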
 
I guess one path forward would also be to rent a virtual machine and find the minimum spec that works.
I'll have to google what Google Colab GPU is :)
Thanks for the tips.
 
Well, I'm using a VPS now and successfully loaded a model into a new chat window. I sent it a simple prompt to sort of introduce the conversation, requiring not much of anything in return, and so far it has been thinking about its response for 4 minutes!
 
After 20 minutes it's still spinning, and this is what it has come back with so far: gibberish.

(screenshot of the model's garbled output)
 
What are the specs of the VPS?
Have you tried the Colab notebook? The one I posted is uncensored and really quick.
 
