Random sampling without replacement from an array

andreas_udby

Registered User.
Local time
Today, 15:18
Joined
May 7, 2001
Messages
76
I could certainly use some input on how to structure this.

I have a table that's 20 rows by 12 columns. Each cell has a particular value that I want to draw by random sample; each run of the sample will pull five random values.

To do this, I set up two columns, each with five cells that contain the =RANDBETWEEN() formula. The Down column cells contain =RANDBETWEEN(1,20) and the Across column cells contain =RANDBETWEEN(1,12).

Then I have five cells that each use the =INDEX() formula to look up the resulting cell. For instance, cell H6 contains =INDEX(Number_Matrix,D6,E6), where column D is the Down column and column E is the Across column.

But I realized that I'm sampling without eliminating replacements. Sure, it only happens once in a while (I ran the sample about 25 times before I came up with a duplicate value), but I'd like to eliminate it entirely.

I've found examples on the web where people use VBA to draw random samples without replacement, but only where the population data is one column only, not in a table or array. True, I could just change my table into a column that's 240 rows long, but that just seems so inelegant.

Does anyone have an idea about how to perform sampling without replacement on a table or array? I've been racking my brain on this for a while, and I just can't think about where to start. I can follow most the VBA code I see online, but the leap of logic from using a single column to using a table for the original data is leaving me behind.

Thanks,
Andreas
 
Yeah, I'd found that same article. Sadly, I just can't figure out how to apply it to my situation; the VBA is a little over my head, and I can't make out how to apply it to five different cells, one response in each, drawn from a 12 x 20 table. I'll have to sit down and try to dissect the VBA piece by piece to see if I can comprehend each segment.
 
Yeah, I'd found that same article. Sadly, I just can't figure out how to apply it to my situation; the VBA is a little over my head, and I can't make out how to apply it to five different cells, one response in each, drawn from a 12 x 20 table. I'll have to sit down and try to dissect the VBA piece by piece to see if I can comprehend each segment.

I've modiufied the code I linked to.

Code:
Function Sample(Source As Range, Optional Replacing As Boolean = False, _
    Optional Unique As Boolean = False)
     
     ' Function by Patrick Matthews
     
     ' Function looks at specified range (Source argument) and returns an array of randomly-
     ' sampled data from that range.  Take argument specifies the number of items in the sample
     
     ' Optional argument Replacing indicates whether sampling is done with replacement (i.e.,
     ' any item in the source may be selected more than once, indicated by True) or without
     ' replacement (any given item may only be selected once, indicated by False)
     
     ' Optional argument Unique indicates whether the samples are drawn from all items in the
     ' Source range (False), or just from the unique elements (True)
     
    Dim Dict As Object
    Dim Coll As Object
    Dim xItem As Long
    Dim cel As Range
    Dim Results()
    Dim Counter As Long
    Dim xKeys As Variant
    Dim Take As Long
    Dim CRows As Long
    Dim CCols As Long

    
    If IsObject(Application.Caller) Then
    
        With Application.Caller
            CRows = .Rows.Count
            CCols = .Columns.Count
        End With
    
            Take = CRows * CCols
    
    End If

     ' Reset VBA random number generator
    Randomize
    
    
    
     ' Number of items to draw cannot be larger than the population drawn from
    If Take < 1 Or Take > Source.Cells.Count Then
        Sample = CVErr(xlErrValue)
        Exit Function
    End If
     
     ' If sample is taken from just unique elements, use Dictionary object
    If Unique Then
         ' instantiate Dictionary
        Set Dict = CreateObject("Scripting.Dictionary")
         ' populate Dict with unique keys
        For Each cel In Source.Cells
            If Not Dict.Exists(cel.Value) Then Dict.Add cel.Value, cel.Value
        Next
         ' Retest to see if Take is smaller than our potentially reduced population of unique elements
        If Take > (UBound(Dict.Keys) + 1) Then
            Sample = CVErr(xlErrValue)
             ' if Take if OK, proceed with the draw
        Else
            For Counter = 1 To Take
                 ' randomly select keys from Dict and put them into dynamic array called Results
                If Counter = 1 Then ReDim Results(1 To 1) Else ReDim Preserve Results(1 To Counter)
                xKeys = Dict.Keys
                xItem = Int(Rnd * (UBound(xKeys) + 1))
                Results(Counter) = xKeys(xItem)
                 ' if we are not replacing, then remove the key we just used so it will not be picked again
                If Not Replacing Then Dict.Remove xKeys(xItem)
            Next
             ' set function equal to our array
            Sample = Results
        End If
        Set Dict = Nothing
         ' using all elements, so use collection object, which allows repeats
    Else
         ' instantiate collection
        Set Coll = New Collection
         ' populate collection
        For Each cel In Source.Cells
            Coll.Add cel
        Next
        For Counter = 1 To Take
             ' randomly select items from Coll and put them into dynamic array called Results
            If Counter = 1 Then ReDim Results(1 To 1) Else ReDim Preserve Results(1 To Counter)
            xItem = 1 + Int(Rnd * Coll.Count)
            Results(Counter) = Coll(xItem)
             ' if we are not replacing, then remove the item we just used so it will not be picked again
            If Not Replacing Then Coll.Remove xItem
        Next
         ' set function equal to our array
        



        Sample = Results
        Set Coll = Nothing
End If

     
End Function

For example if you want 5 random values all you need to do is select 5 cells in a ROW and in the first cell type "=Sample(A1:H27,false,false)". Similarly if you want 10 values then select 10 cells in a row.

Obviously replace "A1:H27" with the range where your data is and when you have typed the formula in press "Control Shift and Enter" so that excel knows it is an array formula.

If you want the results in a column you will need to type "=transpose(Sample(A1:H27,false,false))".

If you want to output the results using vba then do not use my modified code, use the original code, it would be something like:

Code:
 Sub blah()
 
 Dim rng As Range
 Dim lngArr As Variant
 Dim i As Long
 
 Set rng = ThisWorkbook.Worksheets("Sheet1").Range("A1:h27")
 
 lngArr = Sample(rng, 5, False, False)
 
 For i = LBound(lngArr) To UBound(lngArr)
 
    
    ThisWorkbook.Worksheets("Sheet1").Range("J" & i + 1).Value = lngArr(i)
    
Next i

 
 
 End Sub
 
Thanks for your help! I was getting a lot of #NAME? errors, and once I fixed those I would get no results other than 0's. Once I fixed that, I found that it was giving me the same set of results every time! So I had to do some tweaking and poking around on the internet, but I finally ended up with some code that does what I needed to do. Here's the end product:

Code:
Public Function SampleNR(rSource As Range) As Variant

      Dim vTemp As Variant
      Dim nArr() As Long
      Dim nSource As Long
      Dim nDest As Long
      Dim nRnd As Long
      Dim nTemp As Long
      Dim i As Long
      Dim j As Long
        
      Application.Volatile

      nSource = rSource.Count

      With Application.Caller
         ReDim vTemp(1 To .Rows.Count, 1 To .Columns.Count)
         nDest = .Count
      End With

      If nDest > nSource Then
         SampleNR = CVErr(xlErrNA)
      Else
         ReDim nArr(1 To nSource)
         For i = 1 To nSource
            nArr(i) = i
         Next i
         For i = 1 To nDest
            nRnd = Int(Rnd() * (nSource - i + 1)) + i
            nTemp = nArr(nRnd)
            nArr(nRnd) = nArr(i)
            nArr(i) = nTemp
         Next i
         nTemp = 1
         For i = 1 To UBound(vTemp, 1)
            For j = 1 To UBound(vTemp, 2)
               vTemp(i, j) = rSource(nArr(nTemp))
               nTemp = nTemp + 1
            Next j
         Next i
         SampleNR = vTemp
      End If
   End Function

I can tell I really need to get back into studying my Excel VBA. Thanks again for your help!

Andreas
 

Users who are viewing this thread

Back
Top Bottom