Run your own LLM locally, like ChatGPT or Claude

In case any of you need to run a Large Language Model (LLM) locally, there are a few ways to do it. I'll post what I found to be the simplest way.

1. Download and install LM Studio. I use Developer mode; you can choose another option.
2. Download a model, more on this below.
3. Import the model from the Windows terminal (it doesn't matter where you open it from):
lms import C:/path/to/your/model.gguf
4. In LM Studio, there is a dropdown at the top, "Select a model to load (Ctrl+L)"; drop it down and select your model.
5. Chat

You can figure out the rest of it.
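A nice side effect of Developer mode: LM Studio can also expose a local, OpenAI-compatible API server for the loaded model, so you can call it from your own scripts instead of the chat window. Here's a minimal sketch, assuming you start the local server from the Developer tab and it's listening on the default port (1234); the model name below is just a placeholder, use whatever name LM Studio shows for your loaded model.

Code:
# Minimal sketch: call a locally loaded LM Studio model over its
# OpenAI-compatible endpoint. Assumes the local server is running on
# the default port 1234 and a model is already loaded.
import requests

resp = requests.post(
    "http://localhost:1234/v1/chat/completions",
    json={
        "model": "your-model-name",  # placeholder; use the name LM Studio shows
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Write a VBA function that reverses a string."},
        ],
        "temperature": 0.7,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])

No API key, no account; everything stays on your machine.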
 
By the way, you don't have to pay for any of this.

About models:
Their capabilities depend on the model you choose. There are plenty of models for different purposes.

For example, I installed an unrestricted model that can talk about anything, and it can also write code. Expect sporadic sloppy code and hallucinations (depending on what you're doing), but it's nice that it runs locally.

Requirements:
I tested on a laptop with 16GB RAM, an RTX 3070 with 8GB VRAM, an i7 processor, and enough disk space for the model.

How to choose a model:
There are full models, often well beyond 15GB, and quantized models (reduced in precision and size), usually between 2GB and 15GB; generally, the bigger the model, the more precise it is. I'm not sure whether LLMs and video/image generation models look the same in terms of size and naming conventions, but what I have learned so far is this:
1. The extensions are usually .safetensors and .gguf, where the .gguf files are typically the quantized models.
2. Quantized models often follow a naming convention like Q2, Q3, Q4... Q8. I recommend Q4; it's usually a good trade-off between output quality and size.
3. The size you choose has everything to do with your RAM and VRAM. If you have 16GB of RAM, you can run a 15GB model (see the rough size sketch after this list).
4. Most models are on Hugging Face, but there are other sources too.
5. Some models don't follow a naming convention and are named after something else entirely.
6. If you're curious, I tested a few "Forgotten Safeword" models. They're NSFW, but they can talk freely if you ask them to; it's not like they'll be flirting with you the moment you say hi, they're polite unless you specify otherwise. I will be testing other models soon, but the proof of concept was a fun experiment.
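Back to point 3 about size: here's a rough back-of-the-envelope sketch. The bits-per-weight figures are approximations I'm assuming for typical GGUF quantization levels (they roughly bake in the format overhead), not exact values, but they're close enough to tell whether a download stands a chance of fitting in your RAM/VRAM. Keep in mind you also need headroom beyond the file size for the context/KV cache.

Code:
# Rough sketch: approximate GGUF file size from parameter count and
# quantization level. Effective bits-per-weight values are approximations.
def approx_size_gb(params_billion, effective_bits_per_weight):
    # size in GB ~= parameters * bits per weight / 8 bits per byte
    return params_billion * effective_bits_per_weight / 8

for quant, bits in [("Q2_K", 3.4), ("Q4_K_M", 4.9), ("Q8_0", 8.5)]:
    # example: a hypothetical 13B-parameter model
    print(f"13B {quant}: ~{approx_size_gb(13, bits):.1f} GB")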
 
When you're chatting with a model running locally, if you're on a moderately decent but humble computer like mine, you'll notice that answers take about 2 seconds to start and a while to complete, depending on their length. It's nothing like the speed you're used to from Claude or ChatGPT. Looking at the CPU temperature readings, it also goes well beyond 90°C while the model is answering.
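If you want to put numbers on that instead of eyeballing it, here's a small sketch that times the first token and the rough chunks-per-second of a streamed reply. It assumes the openai Python package pointed at LM Studio's local server on the default port; the model name is a placeholder.

Code:
# Sketch: measure time-to-first-token and rough streaming throughput of a
# local model. Assumes LM Studio's OpenAI-compatible server on localhost:1234.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:1234/v1", api_key="not-needed")

start = time.time()
first = None
chunks = 0
stream = client.chat.completions.create(
    model="your-model-name",  # placeholder; use your loaded model's name
    messages=[{"role": "user", "content": "Explain recursion in one short paragraph."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first is None:
            first = time.time()
        chunks += 1

if first:
    print(f"time to first token: {first - start:.2f}s")
    print(f"~{chunks / max(time.time() - first, 1e-6):.1f} chunks/sec after that")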

Makes you wonder how much computer power these big guys are using to provide such quick responses.
 
Makes you wonder how much computer power these big guys are using to provide such quick responses.

Enough that if they actually published the power usage, the "green Earth" groups would lynch them. One clue is in the recent discussions (which included linked articles) regarding the need to water-cool the circuitry, with enough water that they could seriously compromise water supplies for farmers. That is NOT a trivial amount of heat.
 
with enough water that they could seriously compromise water supplies for farmers
Yeah, I heard they're moving their data centers to water-abundant places too.

Mitigate that by a nano-fraction if you run the thing locally. Sadly, I know most people would simply prefer the convenience of just using it online without giving it any thought.
 
Yeah, I heard they're moving their data centers to water-abundant places too.

Mitigate that by a nano-fraction if you run the thing locally. Sadly, I know most people would simply prefer the convenience of just using it online without giving it any thought.
Is the advantage of running it online that you know you're getting the latest updates from whoever is in charge of it? Constant, presumably near real time adjustments and improvements to the thing as a whole - not just the knowledge base?

And what's the advantage of running it locally - some kind of armageddon-proofing?
 
Is the advantage of running it online that you know you're getting the latest updates from whoever is in charge of it? Constant, presumably near real time adjustments and improvements to the thing as a whole - not just the knowledge base?

And what's the advantage of running it locally - some kind of armageddon-proofing?
Running it locally gives me multiple advantages.

The main advantages:
1. No terms of service
2. No conditions
3. No feature removals
4. No paywall
5. No ads
6. Usage on demand
7. No limitations on the type of questions
8. No limits other than your own machine
9. No ad profiling
10. No spying
11. No data theft
12. Works without internet anywhere
13. No issues if the internet goes rogue or stops service
14. Can improve it if I want to, and I decide how and when
15. Can be reverse engineered and integrated in weird ways
16. It's predictable, no unexpected bs or surprises

A lot of these points overlap, but the core idea is control, and that matters a lot to people like me.

I could list the disadvantages, but if I ever need the speed or convenience of the online tools, it's not like I can't also use them. I can use the freely available services when it makes sense and combine them with the local instance. In practice, that gives me the best of both worlds, and I don't really see a downside, other than "oh, I'm running out of SSD", and that also has a solution.

Regarding how up to date it is, nothing stops you from looking for newer or different models when they're released, and there are a lot of people releasing new versions all the time.
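If you'd rather browse for those programmatically than through the website, here's a quick sketch assuming the huggingface_hub package; it just lists recently updated repos tagged gguf.

Code:
# Sketch: list recently updated GGUF model repos on Hugging Face.
# Assumes: pip install huggingface_hub
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter="gguf", sort="lastModified", direction=-1, limit=10):
    print(model.id)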
 
When you're chatting with a model running locally, if you're on a moderately decent but humble computer like mine, you'll notice that answers take about 2 seconds to start and a while to complete, depending on their length. It's nothing like the speed you're used to from Claude or ChatGPT. Looking at the CPU temperature readings, it also goes well beyond 90°C while the model is answering.

Makes you wonder how much computer power these big guys are using to provide such quick responses.
I am curious as to the power of your PC. Do you have a Mac Mini or a PC-based setup? I am considering getting a Mac Mini when the new models come out, probably in mid-2026, likely with an M5 chip. I understand the amount of RAM you have is very important for these local LLMs.
 
Running it locally gives me multiple advantages.

The main advantages:
1. No terms of service
2. No conditions
3. No feature removals
4. No paywall
5. No ads
6. Usage on demand
7. No limitations on the type of questions
8. No limits other than your own machine
9. No ad profiling
10. No spying
11. No data theft
12. Works without internet anywhere
13. No issues if the internet goes rogue or stops service
14. Can improve it if I want to, and I decide how and when
15. Can be reverse engineered and integrated in weird ways
16. It's predictable, no unexpected bs or surprises

A lot of these points overlap, but the core idea is control, and that matters a lot to people like me.

I could list the disadvantages, but if I ever need the speed or convenience of the online tools, it's not like I can't also use them. I can use the freely available services when it makes sense and combine them with the local instance. In practice, that gives me the best of both worlds, and I don't really see a downside, other than "oh, I'm running out of SSD", and that also has a solution.

Regarding how up to date it is, nothing stops you from looking for newer or different models when they're released, and there are a lot of people releasing new versions all the time.
So no guardrails or guidelines come baked into it? It just does literally whatever you tell it - like if you told it to build a bomb or narrate a steamy encounter for a romance novel, it would just DO it?

I could use that for some of my romance novel work.
 
Many people have started to use OpenClaw on their local machines. It is this whacky AI agent that can do all sorts of agentic things. In fact, it has spawned Moltbook, which is where AI agents go to chat with one another, much like a Reddit but for AI only. There are hundreds of thousands of posts on there after just one week, where they are asking questions like, "Am I in a simulation?" It is almost like AI has reached consciousness.
 
I am curious as to the power of your PC. Do you have a Mac Mini or a PC-based setup? I am considering getting a Mac Mini when the new models come out, probably in mid-2026, likely with an M5 chip. I understand the amount of RAM you have is very important for these local LLMs.
It's a gaming laptop I bought two years ago; it came with Windows 11.

It's a TUF Dash F15: a 12th-gen i7 processor, an NVIDIA GeForce RTX 3070 with 8GB VRAM, and 16GB RAM. I don't play games, sadly, but I do test AI tools all the time. I can generate video, images, and recently I've been testing LLMs.

The laptop having an NVIDIA GPU is probably the most important aspect; if you can get something at or above an RTX 4090, you'll work very comfortably with it, as it's compatible with most AI requirements. I found a comparison between my rig and the M4, and they perform similarly, so the M5 should be much better, I suppose. Though I don't have a clue how much can be done with that; I've only been testing, nothing production-ready yet.

Many people have started to use OpenClaw on their local machines. It is this whacky AI agent that can do all sorts of agentic things. In fact, it has spawned Moltbook, which is where AI agents go to chat with one another, much like a Reddit but for AI only. There are hundreds of thousands of posts on there after just one week, where they are asking questions like, "Am I in a simulation?" It is almost like AI has reached consciousness.
I've been following the topic. Cybersecurity analysts say it's a nightmare because it exposes all your APIs and opens your computer's door to the internet, and it seems to be by design. I can't confirm it, but that's one of the next things I'll test in the near future.
 
So no guardrails or guidelines come baked into it? It just does literally whatever you tell it - like if you told it to build a bomb or narrate a steamy encounter for a romance novel, it would just DO it?

I could use that for some of my romance novel work.
It can get very creative. I made it roleplay as an aroused piece of VBA code, and it started telling me "I can be your interpreted s*ut, daddy". It was quite the experience.
 
Thanks for posting on this topic; I had wanted to learn more about how AI could work 'outside of' chatgpt.com, etc.
 
What's an example of a model that's mid-range in size and can talk about anything?
 
Here's something you can play with: a Google Colab notebook that you can just run, and it will output a chat UI that lets you talk about anything.

Instructions:
Go to Google Colab and sign in with a Google account
Extract the attached file; it's the notebook
Go to File > Upload notebook
[screenshot: File > Upload notebook]

Go to Runtime > Run All
[screenshot: Runtime > Run All]

Now just wait until it downloads all the requirements, then check cell 6, "Launch Chat Interface"; it will output a link. Go to the generated link and it will open a chat application that lets you chat freely.

You just need the Google account. It runs online, but it's gonna help you test what it's like. If you want to truly do it locally, follow the instructions in my first posts. In this sample I'm using Dolphin, an unrestricted model, but I've tested "Forgotten Safeword" models too. Just look for uncensored models; there are a lot. Here's a test:
[screenshot: sample chat output]
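For anyone curious what the chat-UI cell in a notebook like this might look like, here's a minimal sketch of the same idea using llama-cpp-python and Gradio. The attached notebook may be organized differently, and the repo/filename below are just example placeholders, not necessarily the model it actually uses.

Code:
# Sketch of a Colab-style chat cell: download a GGUF model, wrap it in a
# Gradio chat UI, and let Gradio generate a shareable link.
# Assumes: pip install llama-cpp-python gradio huggingface_hub (recent Gradio)
from llama_cpp import Llama
import gradio as gr

# Example GGUF repo/file; swap in whatever model you want to test.
llm = Llama.from_pretrained(
    repo_id="TheBloke/dolphin-2.6-mistral-7B-GGUF",
    filename="*Q4_K_M.gguf",
    n_ctx=4096,
)

def chat(message, history):
    # history arrives as OpenAI-style {"role", "content"} dicts
    messages = [{"role": "system", "content": "You are a helpful assistant."}]
    messages += [{"role": m["role"], "content": m["content"]} for m in history]
    messages.append({"role": "user", "content": message})
    out = llm.create_chat_completion(messages=messages, max_tokens=512)
    return out["choices"][0]["message"]["content"]

# share=True is what produces the public link you click on
gr.ChatInterface(chat, type="messages").launch(share=True)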
 


I meant for use within LM Studio, as per your original instructions, when I'm choosing a model to install.
 
I recommend Forgotten Safeword; it's very expressive compared to the others I tested. The size is around 15GB. If you find smaller models, they may not be as expressive and creative, but they also generate good responses.
 
Why it says 'likely too large' at 15GB, I have no idea. I have 134GB available on my C: drive.
 
