Remove Duplicates From Arrays

I have a number of arrays that can’t contain duplicates. Not just within themselves, but between them as well. It would be easy enough to dump them into a big array, sort it, and remove the duplicates, but I need to put the non-duplicate numbers back into the original array. Eventually, I need to select a limited number of random entries from each array and the number selected needs to be somewhat equally distributed among the arrays.

I don’t know of any way to remove a single element from an array, so I would have to load the non-duplicate elements into another array. My first attempt at this failed because I wasn’t able to ReDim a dynamic array that was contained in another array. For instance,

Dim aMain(0 To 0) As Variant
Dim aOne() As Variant
 
aMain(0) = aOne
 
redim amain(0)(0 to 1)

The ReDim is a syntax error. I ended up attempting to add each element of each array to a Collection to identify the duplicate elements. If the Add method of the Collection object produced an error, I knew the element was a duplicate and I changed the element to an empty string so I could identify it later. Because I would be pulling elements out of each array later and I wanted the selected elements to be roughly equally distributed among the arrays, I needed to remove the duplicates from the larger arrays. The first operation I had to perform was to sort the arrays by size. Luckily the order of the secondary arrays within the primary array doesn’t matter.

I believe that all the secondary arrays need to be Variants for this to work. Normal dynamic arrays like Dim aMyArr() as Variant won’t work because of the way the temporary array is reassigned back to the original. The variables have to be declared like Dim vaMyArr as Variant.

Sub StartArrays()
   
    Dim vaOne As Variant
    Dim vaTwo As Variant
    Dim vaThree As Variant
    Dim vaMain As Variant
   
    'Set up some secondary arrays
   vaOne = Array(1, 2, 3, 4, 5)
    vaTwo = Array(5, 6, 7)
    vaThree = Array(1, 5, 8, 9)
   
    'load the primary array with the secondary ones
   vaMain = Array(vaOne, vaTwo, vaThree)
   
    'pass the primary array to remove the dupes
   'and display the results
   RemoveDuplicates vaMain
    ShowResults vaMain
   
End Sub
 
Sub RemoveDuplicates(ByRef vaMain As Variant)
   
    Dim i As Long, j As Long, k As Long
    Dim vItm As Variant
    Dim cDupes As Collection
    Dim aTemp() As Variant
    Dim vaSort As Variant
   
    'check for array arguments
   If Not IsArray(vaMain) Then Exit Sub
   
    For i = LBound(vaMain) To UBound(vaMain)
        If Not IsArray(vaMain(i)) Then Exit Sub
    Next i
   
    'Sort arrays by number of elements so as to remove duplicates from
   'the most populous arrays
   For i = LBound(vaMain) To UBound(vaMain) - 1
        For j = i + 1 To UBound(vaMain)
            If UBound(vaMain(i)) > UBound(vaMain(j)) Then
                vaSort = vaMain(i)
                vaMain(i) = vaMain(j)
                vaMain(j) = vaSort
            End If
        Next j
    Next i
   
    'blank out array elements that are duplicates.  Later
   'I'll test the length of the element to remove those
   'elements that were duplicates and are now empty strings
   Set cDupes = New Collection
   
    For i = LBound(vaMain) To UBound(vaMain)
        For j = LBound(vaMain(i)) To UBound(vaMain(i))
            On Error Resume Next
                cDupes.Add vaMain(i)(j), CStr(vaMain(i)(j))
               
                If Err.Number > 0 Then
                    vaMain(i)(j) = ""
                    Err.Clear
                End If
            On Error GoTo 0
        Next j
    Next i
   
    'put non-blank elements in a temp array and reassign
   'the temporary array back to the secondary array
   For i = LBound(vaMain) To UBound(vaMain)
   
        'reinitialize variables for each secondary array
       ReDim aTemp(0 To 0)
        k = 0
        For j = LBound(vaMain(i)) To UBound(vaMain(i))
       
            'skip elements that are empty strings
           If Len(vaMain(i)(j)) > 0 Then
           
                'assign non-empty strings to the temporary array and
               'increment the counter
               ReDim Preserve aTemp(0 To k)
                aTemp(k) = vaMain(i)(j)
                k = k + 1
            End If
        Next j
       
        'clear out the secondary array once the temp array is filled
       Erase vaMain(i)
       
        'assign the temporary array back to the recently-erased secondary array
       vaMain(i) = aTemp
    Next i
   
End Sub
 
Sub ShowResults(ByRef vaMain As Variant)
   
    Dim i As Long, j As Long
   
    For i = LBound(vaMain) To UBound(vaMain)
        For j = LBound(vaMain(i)) To UBound(vaMain(i))
            Debug.Print i, vaMain(i)(j)
        Next j
        Debug.Print "---------------"
    Next i
   
End Sub

The actual secondary arrays I’m working with are two dimensional, so my next task is revise this code to work with two dimensional arrays.

4 Comments

  1. Tushar Mehta says:

    Interesting exercise. FWIW, I wrote up an alternative approach to the task and posted the result to he programming NG.

    I would have posted it here but the code always gets all messed up. And, yes, if there is a way to fix that I don’t know it. :)

    The post’s ID is MPG.1e399e4a4bdd1b3598b2fa@msnews.microsoft.com

    The direct google.com archive is http://groups.google.com/group/microsoft.public.excel.programming/msg/b923181ccbdff1e1

  2. ross says:

    test some code:

    ”’Checks if SO, PO, YO, CO need to be corrected
    ”’ Function is passed and will ruten the WHOLE outcode
    Function O_Excprtions(sOutcode As String)
    Dim sFirstLetter As String
    sFirstLetter = UCase(Left(sOutcode, 1))
    ‘ if one of the do-dars
    If sFirstLetter = “S” Or sFirstLetter = “P” Or _
    sFirstLetter = “Y” Or sFirstLetter = “C” Then
    ‘replace with O
    O_Excprtions = WorksheetFunction.Replace(sOutcode, 2, 1, “O”)
    Else
    O_Excprtions = sOutcode
    End If
    End Function

  3. How to make a cell constant during sum.

    For example:

    Date Files Done Daily Total Sum

    1/20/2006 5 30
    1/21/2006 5
    1/22/2006 5
    1/23/2006 5
    1/24/2006 5
    1/25/2006 5

    I review daily certain number of files.
    I want to make the total sum constant.
    that whenever i review more files daily it should be added automatically to Total Sum.

    And its also should not effect the Total Sum, if i a value under, Files Done Daily.

    Thanks,

  4. J.O. says:

    This code does NOT remove duplicates!

Posting code or formulas in your comment? Use <code> tags!

  • <code lang="vb">Block of code goes here</code>
  • <code lang="vb" inline="true">Inline code goes here</code>
  • <code>Formula goes here</code>

Leave a Reply