Creating reasonably realistic fake data in SQL Server (2 Viewers)

MadPiet

Member
Local time
Today, 14:44
Joined
Oct 5, 2010
Messages
56
I know there are add-ins that do it (Redgate makes one), but is there an easy way to create semi-random data in SQL Server (or Access?) In my case it's bakery stuff (ingredient costs). I have a sort of "index" for wheat, so I can use that for flours (just set the value for the date I have as 100% and scale from there). I can do something like cross joining IngredientID and some time element (week/date, etc). and then I can do something like create a randomish float... that's not too hard. My problem is that I'll create the random numbers and then do a running total over time, and the values just explode. Basically, I create a running total with a windowing function and then add 1 (for the 100%) and multiply that by the original price that I have. Is there a better way to make the data less perfectly linear? Because if I start with a base value (b) and a rate of change (m), that's a flat line. (yeah, y = mx + b )

(I'm trying to simulate price changes over time during covid/war in Ukraine, where some prices increased by about 50% (like flour/wheat) and others by even more. (But flour is the big one). It's all to show the effect of the cost of ingredients (mostly flour, since about 70% of the total ingredient weight is flour in a loaf of bread) over time (well, the shrinking margins etc). I could take (say) white flour, which increases from about $24 per 50lb bag to about $36 or $38...

Or should I just do all of it in Excel? (basically add the starting price, then add a few records then fill down...). It's not going to be perfect, and that's fine. It's a one-off. How would I create an element of randomness to the price changes? (I could do something silly like generating a random int or float and anything that's above or below some threshold gets overwritten as zero (just use IIF() or similar).

I know, sounds like a silly question, but I'm finding that creating realistic fake data is way harder than I thought!
 
Sounds like you're doing a pretty good job so far. Unsure how many rows are relevant (obvs more rows advocates Access). Excel would be easier for me. I imagine you could use dynamic SQL to populate each column & set some sort of mathematical floor & ceiling in conjunction with the random fn. Or put it in a cacl'd column of a view.

Quick Google in Excel:
  • For Decimals/ Floating Decimal Point Numbers - use the random fn to generate a random number 0 to 1; then scale it.
  • For Integers - Use RandBetween fn.
Code:
=RAND() * (150 - 0.001)

=RANDBETWEEN(1, 150)

EDIT - Sorry see you've figured this out in your OP. y = mx + b; cannot see x's definition. Not wanting to offend, I'd think you'd get more responses with a rewritten concise thread, focusing on bare essential math.
 
Last edited:
At the San Francisco AUG meeting yesterday, Crystal shared a link to a site that could be useful here. However, I didn't capture it, it appears. Maybe she'll see this and share it here?
 
Thanks, George. I'll keep an eye out for it.

This is how I'm doing it now... (I'm assuming an increasing price over time for now...)
SELECT
IngredientID,
UnitPrice,
WeekOffset,
Multiplier,
SUM(Multiplier) OVER (PARTITION BY IngredientID ORDER BY WeekOffset) AS rt, /* I'm trying to create a price that changes sort of randomly over time */
UnitPrice * (1 + SUM(Multiplier) OVER (PARTITION BY IngredientID ORDER BY WeekOffset)) AS NewWeeklyPrice
FROM
(
SELECT
IngredientID,
UnitPrice,
t.N AS WeekOffset,
--SomeRandomFloat = RAND(CHECKSUM(NEWID())) * @Range + @StartValue
Multiplier = RAND(CHECKSUM(NEWID())) * 100 / 10000
FROM Ingredient i
CROSS JOIN Itchy.dbo.Tally t
WHERE t.N <= 13
) q

The tally table is just a table of numbers from 1 to some number. (I stole the code from Jeff Moden... on SQLServerCentral.) I guess I could do something like setting the range of the random number higher and then if it's above a threshold, reset to zero (then the price could remain flat for random periods).
 
I know there are add-ins that do it (Redgate makes one), but is there an easy way to create semi-random data in SQL Server (or Access?) In my case it's bakery stuff (ingredient costs). I have a sort of "index" for wheat, so I can use that for flours (just set the value for the date I have as 100% and scale from there). I can do something like cross joining IngredientID and some time element (week/date, etc). and then I can do something like create a randomish float... that's not too hard. My problem is that I'll create the random numbers and then do a running total over time, and the values just explode. Basically, I create a running total with a windowing function and then add 1 (for the 100%) and multiply that by the original price that I have. Is there a better way to make the data less perfectly linear? Because if I start with a base value (b) and a rate of change (m), that's a flat line. (yeah, y = mx + b )

(I'm trying to simulate price changes over time during covid/war in Ukraine, where some prices increased by about 50% (like flour/wheat) and others by even more. (But flour is the big one). It's all to show the effect of the cost of ingredients (mostly flour, since about 70% of the total ingredient weight is flour in a loaf of bread) over time (well, the shrinking margins etc). I could take (say) white flour, which increases from about $24 per 50lb bag to about $36 or $38...

Or should I just do all of it in Excel? (basically add the starting price, then add a few records then fill down...). It's not going to be perfect, and that's fine. It's a one-off. How would I create an element of randomness to the price changes? (I could do something silly like generating a random int or float and anything that's above or below some threshold gets overwritten as zero (just use IIF() or similar).

I know, sounds like a silly question, but I'm finding that creating realistic fake data is way harder than I thought!
I like the last thing you said. That is so true. I can tell you that every company I've worked for that has had a test environment, their biggest challenge is rigging it to really be realistic like production. And the more rigged it is the more you are just rigging the tests to pass rather than having an equal chance of failing. I don't have any specific tool for you to recommend although you could probably write an interesting script in VBA that does it and populates it in Excel and then paste append it to a table that's linked to SQL server
 
Isaac,
(Oh, sorry, Week Number is just the number of weeks from some sort of random date... It's just a kind of surrogate for dates because the sales were always on just Saturdays)
Since I don't have the real data (yet, if ever). I was just going to fake it. I was thinking of doing something like cross joining Ingredient and Week and then generating a really small decimal value for each (IngredientID, WeekNo). Then I can just create a running total in TSQL using a windowing function and just "sample" every N weeks or whatever to simulate price changes. I think getting it just good enough to be reasonably realistic is okay. Then in my costing query, I can do similar, but use the Week number or date to figure out what the most recent price paid was for each Ingredient.

Pretty sure I'd almost never do this in Access... unless there's something I absolutely can't get done with TSQL. (once I got my head around windowing functions, that was the end of Access for me..=( )
 

Users who are viewing this thread

Back
Top Bottom