Creating reasonably realistic fake data in SQL Server

MadPiet · Jan 10, 2026

I know there are add-ins that do it (Redgate makes one), but is there an easy way to create semi-random data in SQL Server (or Access?) In my case it's bakery stuff (ingredient costs). I have a sort of "index" for wheat, so I can use that for flours (just set the value for the date I have as 100% and scale from there). I can do something like cross joining IngredientID and some time element (week/date, etc). and then I can do something like create a randomish float... that's not too hard. My problem is that I'll create the random numbers and then do a running total over time, and the values just explode. Basically, I create a running total with a windowing function and then add 1 (for the 100%) and multiply that by the original price that I have. Is there a better way to make the data less perfectly linear? Because if I start with a base value (b) and a rate of change (m), that's a flat line. (yeah, y = mx + b )

(I'm trying to simulate price changes over time during covid/war in Ukraine, where some prices increased by about 50% (like flour/wheat) and others by even more. (But flour is the big one). It's all to show the effect of the cost of ingredients (mostly flour, since about 70% of the total ingredient weight is flour in a loaf of bread) over time (well, the shrinking margins etc). I could take (say) white flour, which increases from about $24 per 50lb bag to about $36 or $38...

Or should I just do all of it in Excel? (basically add the starting price, then add a few records then fill down...). It's not going to be perfect, and that's fine. It's a one-off. How would I create an element of randomness to the price changes? (I could do something silly like generating a random int or float and anything that's above or below some threshold gets overwritten as zero (just use IIF() or similar).

I know, sounds like a silly question, but I'm finding that creating realistic fake data is way harder than I thought!

dalski · Jan 10, 2026

Sounds like you're doing a pretty good job so far. Unsure how many rows are relevant (obvs more rows advocates Access). Excel would be easier for me. I imagine you could use dynamic SQL to populate each column & set some sort of mathematical floor & ceiling in conjunction with the random fn. Or put it in a cacl'd column of a view.

Quick Google in Excel:

For Decimals/ Floating Decimal Point Numbers - use the random fn to generate a random number 0 to 1; then scale it.
For Integers - Use RandBetween fn.

Code:

=RAND() * (150 - 0.001)

=RANDBETWEEN(1, 150)

EDIT - Sorry see you've figured this out in your OP. y = mx + b; cannot see x's definition. Not wanting to offend, I'd think you'd get more responses with a rewritten concise thread, focusing on bare essential math.

GPGeorge · Jan 10, 2026

At the San Francisco AUG meeting yesterday, Crystal shared a link to a site that could be useful here. However, I didn't capture it, it appears. Maybe she'll see this and share it here?

MadPiet · Jan 10, 2026

Thanks, George. I'll keep an eye out for it.

This is how I'm doing it now... (I'm assuming an increasing price over time for now...)
SELECT
IngredientID,
UnitPrice,
WeekOffset,
Multiplier,
SUM(Multiplier) OVER (PARTITION BY IngredientID ORDER BY WeekOffset) AS rt, /* I'm trying to create a price that changes sort of randomly over time */
UnitPrice * (1 + SUM(Multiplier) OVER (PARTITION BY IngredientID ORDER BY WeekOffset)) AS NewWeeklyPrice
FROM
(
SELECT
IngredientID,
UnitPrice,
t.N AS WeekOffset,
--SomeRandomFloat = RAND(CHECKSUM(NEWID())) * @Range + @StartValue
Multiplier = RAND(CHECKSUM(NEWID())) * 100 / 10000
FROM Ingredient i
CROSS JOIN Itchy.dbo.Tally t
WHERE t.N <= 13
) q

The tally table is just a table of numbers from 1 to some number. (I stole the code from Jeff Moden... on SQLServerCentral.) I guess I could do something like setting the range of the random number higher and then if it's above a threshold, reset to zero (then the price could remain flat for random periods).

tvanstiphout · Jan 10, 2026

The link George referred to is https://www.mockaroo.com/

I am working on an Anonymizer that can generate plausible-looking random data. Send me a PM and maybe I can share it with you.

Gasman · Jan 10, 2026

If you want to anonymize data, an old member supplied this a good while,back.

Isaac · Jan 10, 2026

MadPiet said:
I know there are add-ins that do it (Redgate makes one), but is there an easy way to create semi-random data in SQL Server (or Access?) In my case it's bakery stuff (ingredient costs). I have a sort of "index" for wheat, so I can use that for flours (just set the value for the date I have as 100% and scale from there). I can do something like cross joining IngredientID and some time element (week/date, etc). and then I can do something like create a randomish float... that's not too hard. My problem is that I'll create the random numbers and then do a running total over time, and the values just explode. Basically, I create a running total with a windowing function and then add 1 (for the 100%) and multiply that by the original price that I have. Is there a better way to make the data less perfectly linear? Because if I start with a base value (b) and a rate of change (m), that's a flat line. (yeah, y = mx + b )

(I'm trying to simulate price changes over time during covid/war in Ukraine, where some prices increased by about 50% (like flour/wheat) and others by even more. (But flour is the big one). It's all to show the effect of the cost of ingredients (mostly flour, since about 70% of the total ingredient weight is flour in a loaf of bread) over time (well, the shrinking margins etc). I could take (say) white flour, which increases from about $24 per 50lb bag to about $36 or $38...

Or should I just do all of it in Excel? (basically add the starting price, then add a few records then fill down...). It's not going to be perfect, and that's fine. It's a one-off. How would I create an element of randomness to the price changes? (I could do something silly like generating a random int or float and anything that's above or below some threshold gets overwritten as zero (just use IIF() or similar).

I know, sounds like a silly question, but I'm finding that creating realistic fake data is way harder than I thought!

I like the last thing you said. That is so true. I can tell you that every company I've worked for that has had a test environment, their biggest challenge is rigging it to really be realistic like production. And the more rigged it is the more you are just rigging the tests to pass rather than having an equal chance of failing. I don't have any specific tool for you to recommend although you could probably write an interesting script in VBA that does it and populates it in Excel and then paste append it to a table that's linked to SQL server

MadPiet · Jan 11, 2026

Isaac,
(Oh, sorry, Week Number is just the number of weeks from some sort of random date... It's just a kind of surrogate for dates because the sales were always on just Saturdays)
Since I don't have the real data (yet, if ever). I was just going to fake it. I was thinking of doing something like cross joining Ingredient and Week and then generating a really small decimal value for each (IngredientID, WeekNo). Then I can just create a running total in TSQL using a windowing function and just "sample" every N weeks or whatever to simulate price changes. I think getting it just good enough to be reasonably realistic is okay. Then in my costing query, I can do similar, but use the Week number or date to figure out what the most recent price paid was for each Ingredient.

Pretty sure I'd almost never do this in Access... unless there's something I absolutely can't get done with TSQL. (once I got my head around windowing functions, that was the end of Access for me..=( )

Isaac · Jan 12, 2026

yes, windowing functions are the bomb for sure. I'm not as familiar in practice with them as I'd like, only because I lack practical applications to practice them on, but I do use row_number over partition by with a sort pretty often to figure out the first item in a group -that's the main one I use regularly. the rest if I had to use them I'd probably have to check out the documentation and chatgpt-it a bit

But great for you that you are using them, they are SO powerful. Probably come at a pretty big speed cost, but powerful as heck

MadPiet · Jan 12, 2026

They're covered in both Kathi Kellenberger's book (from a long time ago now) and Itzik Ben Gan's book on TSQL. Itzik's books are always great.

MadPiet · Jan 12, 2026

Just wondering... Does anybody know of a source for price indices for various categories of foods? I have a sort of snapshot of unit prices, but I need an easy (and reasonably realistic) set of prices for the same ingredients across time. I need prices for 2020-2022 or so. (I have recipes with weights and package sizes (so I can get price per Kg or whatever). But I only have data for a single date and wanted to generate at least reasonably realistic price data for a wider span of time.

tvanstiphout · Jan 12, 2026

I asked Chatty "where can I download ingredient cost information for bakery goods?" and got a few promising leads.

sonic8 · Jan 12, 2026

MadPiet said:
Does anybody know of a source for price indices for various categories of foods?

Look on the website of the statistics bureau of your country of interest.

For Germany you find price index information here: https://www.destatis.de/DE/Themen/W...Tabellen/sonderauswertung-nahrungsmittel.html

MadPiet · Jan 13, 2026

Thanks! Some days I'm just blindingly stupid or just plain blind.

(wow, this website is umm... not my favorite) Found a site that has grain etc prices for the Ignited States, which should work fine ... the biggest thing I wanted to find was wheat. Just gotta figure out a good threshold for % of total weight to determine which ingredients to look up. (even if I can!) I can just scale the price to the day I do have and the rest is crazy simple.

Thanks again. Back to it.

DickyP · Jan 13, 2026

MadPiet said:
Ignited States

Don't you just love predicted text

Isaac · Jan 13, 2026

DickyP said:
Don't you just love predicted text

Ahhhhhh! hopefully not

Creating reasonably realistic fake data in SQL Server

MadPiet

Member

dalski

Active member

GPGeorge

George Hepworth

MadPiet

Member

tvanstiphout

Well-known member

Gasman

Enthusiastic Amateur

Isaac

Lifelong Learner

MadPiet

Member

Isaac

Lifelong Learner

MadPiet

Member

MadPiet

Member

tvanstiphout

Well-known member

sonic8

AWF VIP

MadPiet

Member

DickyP

Active member

Isaac

Lifelong Learner

Similar threads

Users who are viewing this thread