Euler Problem 102

Euler Problem 102 asks:

'Three distinct points are plotted at random on a Cartesian plane, for which
'-1000 LTE x, y LTE 1000, such that a triangle is formed.
'
'Consider the following two triangles:
'
'A(-340,495), B(-153,-910), C(835,-947)
'
'X(-175,41), Y(-421,-714), Z(574,-645)
'
'It can be verified that triangle ABC contains the origin, whereas
'triangle XYZ does not.
'
'Using triangles.txt (right click and 'Save Link/Target As...'), a 27K text
'file containing the co-ordinates of one thousand "random" triangles, find
'the number of triangles for which the interior contains the origin.
'
'NOTE: The first two examples in the file represent the triangles in the
'example given above.

In the above, LTE is "less than or equal", to outsmart the html bugs.

Any triangle that has all-positive x-coordinates, or all-negative x-coordinates, or similarly all-positive or all-negative y-coordinates, can not contain the origin. A pre-screen will throw all those out. When I first solved this problem, I looked for triangles with both a positive and a negative y-intercept, coupled with both a positive and negative x-intercept, and counted those triangles. That was a successful strategy. LINEST() will return x-intercepts it you swap known-xs and known-ys in the formula. The code looked like this:

YIntercept(3, 1) = Application.WorksheetFunction.Index( _
                            Application.WorksheetFunction.LinEst(Known_Ys, Known_Xs), 2)
 
XIntercept(3, 1) = Application.WorksheetFunction.Index( _
                            Application.WorksheetFunction.LinEst(Known_Xs, Known_Ys), 2)

It's right out of the Help for LINEST(), or at least the y-intercept part is. However, after I checked my answer in I read another strategy that was just so flat-out neat that I coded it up. It uses Heron's Law which calculates the area of a triangle based on a calculation of the semi-perimeter, or 1/2 the sum of the sides: The area of a triangle, given the lengths a,b,c of the sides, is A = sqrt s*(s-a)*(s-b)*(s-c), where s is the semiperimeter 0.5*(a+b+c). If we calculate the area of the suspect triangle, and then the area of the three sub-triangles with a common vertex of the origin, if the sum of the sub-triangle areas equals the area of the whole, then the origin must be within the suspect triangle. That code looked like this:

Option Explicit
Option Base 1
Sub Problem_102B()
   Dim Triangle(1000) As Variant
   Dim i       As Long
   Dim j       As Long
   Dim Most    As Long
   Dim Impossible As Boolean
   Dim IsTest  As Boolean
   Dim x(3)    As Long  'X Coordinates
   Dim Y(3)    As Long  'Y Coordinates
   Dim d(3)    As Double   'Distance between points
   Dim O(3)    As Double   'Distance from points to (0,0)
   Dim Area    As Double   'Triangle area
   Dim a(3)    As Double   'Sub-triangle area
   Dim T       As Single
   Dim Answer  As Long
   Const text  As String = "D:\Downloads\Euler\triangles.txt"
 
   T = Timer
   IsTest = False
   If IsTest Then
      Most = 2
   Else
      Most = 1000
   End If
 
   i = 1
   Open text For Input As #1   '1000 lines, comma delimited
   Do While Not EOF(1)
      Line Input #1, Triangle(i)
      Triangle(i) = Split(Triangle(i), ",")   'zero-based array
      i = i + 1
   Loop
   Close #1
 
   For i = 1 To Most
      Impossible = False
 
      x(1) = Triangle(i)(0)   'zero-based array
      Y(1) = Triangle(i)(1)
      x(2) = Triangle(i)(2)
      Y(2) = Triangle(i)(3)
      x(3) = Triangle(i)(4)
      Y(3) = Triangle(i)(5)
 
   'For Triangles all above or all below the X-axis
      If Y(1)> 0 And Y(2)> 0 And Y(3)> 0 Then Impossible = True
      If Y(1) <0 And Y(2) <0 And Y(3) <0 Then Impossible = True
   'For Triangles all to left of or all to right of the Y-axis
      If Not Impossible Then 'yet
         If x(1)> 0 And x(2)> 0 And x(3)> 0 Then Impossible = True
         If x(1) <0 And x(2) <0 And x(3) <0 Then Impossible = True
      End If
 
      If Impossible Then GoTo Next_i
 
      d(1) = TriSides(CDbl(x(1) - x(2)), CDbl(Y(1) - Y(2)))
      d(2) = TriSides(CDbl(x(2) - x(3)), CDbl(Y(2) - Y(3)))
      d(3) = TriSides(CDbl(x(1) - x(3)), CDbl(Y(1) - Y(3)))
      Area = Heron(d(1), d(2), d(3))
 
      O(1) = TriSides(CDbl(x(1)), CDbl(Y(1)))
      O(2) = TriSides(CDbl(x(2)), CDbl(Y(2)))
      O(3) = TriSides(CDbl(x(3)), CDbl(Y(3)))
      a(1) = Heron(d(1), O(1), O(2))
      a(2) = Heron(d(2), O(2), O(3))
      a(3) = Heron(d(3), O(1), O(3))
 
      If CSng(Area) = CSng(a(1) + a(2) + a(3)) Then
         Answer = Answer + 1
      End If
 
Next_i:
   Next i
 
   Debug.Print Answer; "  Time:"; Timer - T
End Sub

Those are the escape code for > and < . Can't figure out how to get it right inside the vb blocks. We can escape it outside, but we truncate if we use the angle brackets, and don't render the brackets right if we escape. It's been a conundrum for a while.

I used two functions, TriSides(), which returns the square root of the sum or difference of two squares, so I could do Pythagorean math, and Heron(), which implements Heron's law:

Function TriSides(a As Double, b As Double, Optional Sum)
   If IsMissing(Sum) Then Sum = True
   If Sum Then
      TriSides = Sqr(a ^ 2 + b ^ 2)
   Else
      TriSides = Sqr(Abs(a ^ 2 - b ^ 2)) ' order doesn't matter
   End If
End Function
 
Function Heron(a As Double, b As Double, C As Double) As Double
'Use Heron 's formula for the area of a triangle
'given the lengths a,b,c of the sides
'A = sqrt s*(s-a)*(s-b)*(s-c)
'where s is the semiperimeter 0.5*(a+b+c).
 
   Dim s       As Double
   s = 0.5 * (a + b + C)
   Heron = Sqr(s * (s - a) * (s - b) * (s - C))
End Function

I admire those who saw the application of Heron going in. One thing that I don't understand is why I had to convert doubles to singles at the end for the areas to total per Heron. I presume is has to do with the square root routine, but I invite comment. Code ran in one-tenth the time of 102A, at about .01 seconds.

...mrt

Recursion as a performance boost!

A common misconception some (many?) have about recursive solutions is that they are not fast. There are many reasons to use recursion, both in code and in data, mainly that solutions based on recursion are easy to implement, understand, and maintain. For an introduction see Recursion (http://www.tushar-mehta.com/publish_train/book_vba/07_recursion.htm).

What is interesting is that recursive solutions can also provide an improvement over a non-recursive solution. This happened recently when I finally decided to tackle Project Euler problem 14 (http://projecteuler.net/index.php?section=problems&id=14). In the problem, one had to find the number under 1,000,000 that had the longest Collatz chain.

I had postponed addressing the problem because until earlier today I could think of only a "brute force" approach1. But, then, I realized something. If one started from 1 and worked their way up to 999,999, one could build a database of answers for all the numbers encountered in earlier calculations. Then, whenever one encountered a number already in the database, a simple look up of the associated would short circuit the need for any further computations.

For the recursive implementation as well as the intuitive approach see Project Euler - Problem 14 (http://www.tushar-mehta.com/misc_tutorials/project_euler/euler014.html)

1 A approach that should work but that I consider as somewhat unethical is to assume that the solution is a high number. So, work backwards from 999,999 to 900,000. Check this solution with the Project Euler system. If it rejects your answer, try the range 899,999 to 800,000. If that isn't OK, try the next range down. Sooner or later you will find the correct answer. Just keep in mind that the PE website has a 20 minute wait before one can resubmit an answer.

Euler Problem 22

Euler Problem 22 asks:

'Using names.txt (right click and 'Save Link/Target As...'), a 46K text file containing over
'five-thousand first names, begin by sorting it into alphabetical order. Then working out the
'alphabetical value for each name, multiply this value by its alphabetical position in the list
'to obtain a name score.
 
'For example, when the list is sorted into alphabetical order, COLIN, which is worth
'3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of
'938 * 53 = 49714.
 
'What is the total of all the name scores in the file?

The general task is to time the calculation.
The specific tasks are:

  1. Open the file
  2. Clean it up (it's one long line of data, with names wrapped in quotes, and comma-delimited, as in ...,"COLIN",...)
  3. Sort the names
  4. Determine each name's alphabetical value
  5. Multiply the position by the value
  6. Sum the scores

Here is my code:

Option Explicit
Sub Problem_022()
Dim NameArray As Variant
   Dim TEMP    As String
   Dim T       As Single
   Dim i       As Long
   Dim j       As Long
   Dim Score   As Long
   Dim Answer  As Long
   Const namestext As String = "D:\Downloads\Euler\names.txt"
 
   T = Timer 'start timing
   Open namestext For Input As #1 ' open the file
   Do While Not EOF(1)
      Line Input #1, TEMP
   Loop
   Close #1
   TEMP = VBA.Replace(TEMP, Chr(34), vbNullString) 'strip quotes -- chr(34)
 
   NameArray = Split(TEMP, ",")   'TEMP a comma delimited file, split on the comma
   'creating an array to sort
   'BubbleSort
   For i = LBound(NameArray) To UBound(NameArray) - 1
      For j = i To UBound(NameArray)
         If NameArray(i)&gt; NameArray(j) Then
            TEMP = NameArray(j)
            NameArray(j) = NameArray(i)
            NameArray(i) = TEMP
         End If
      Next j
   Next i
 
   For i = LBound(NameArray) To UBound(NameArray)
      Score = LexValue(CStr(NameArray(i))) 'computes the alphabetic value
      Answer = Answer + (Score * (i + 1))   ' NameArray is zero-based
      'multiplies and sums
      Score = 0
   Next i
 
   Debug.Print Answer; "  Time:"; Timer - T
 
End Sub

NameArray is zero-based. Euler's names aren't. The first name has position 1. We need to offset by 1.
The alphabetic value is just the sum of the ascii codes (offset by 64, so "A" gets 1) for each letter. This little function does that. It'll be used in other Euler problems.

Function LexValue(Word As String) As Long
   Dim i       As Long
   For i = 1 To Len(Word)
      LexValue = LexValue + (Asc(Mid(Word, i, 1)) - 64)
   Next i
End Function

This runs in 15 seconds
...mrt

Bubble Sorts

Two problems so far, Eulers 22 and 29, have required sorts, at least for my implementations. Euler 22 sorts over 5000 name strings, and Euler 29 sorts nearly 10,000 numerics. In both cases I used a bubble sort. Bubble sorts get their name from the image of the greater-valued item "bubbling up" to its place in line.

Bubble sorts have some advantages and disadvantages. The BIG disadvantage is that no matter how nearly-sorted the list is at start, you will still go through it (n-1)^2 times, n being the number of items. The advantages of bubble sorts are:

  1. They're easy to code. Typically just nine lines. And
  2. To make a descending sort, you reverse just one inequality

To do a bubble sort, you need three things:

  1. An indexed list or array with a sortable value or property
  2. Explicit or implicit knowledge of the count or quantity of items to sort. You may know the count (n) because you set it, or the computer knows it via Item.count or UBound(Item), for example
  3. A TEMP variable of the same type as being sorted, often also called SWAP

If you think of sorting a deck of cards, the outer loop sorts cards from 1 to 51, and the inner loop compares those values with cards from 2 to 52. With fast computers, bubble sorts are probably "fast enough." Both problems for me took less than 20 seconds.

This is my implementation of a bubble sort. You'll see it used in the next post, on Euler 22.

Sub BubbleItUp()   'an ascending sort
 
   Const n     As Long = 5
   Dim i       As Long
   Dim j       As Long
   Dim Char(n) As String * 1
   Dim TEMP    As String * 1
 
   For i = 1 To n
      Char(i) = CStr(n + 1 - i)
   Next i
 
   Debug.Print Char(1); Char(2); Char(3); Char(4); Char(5)
   For i = 1 To n - 1
      For j = i + 1 To n
         If Char(i)&gt; Char(j) Then   'flip the inequality for a descending sort
            TEMP = Char(j)
            Char(j) = Char(i)
            Char(i) = TEMP
         End If
      Next j
      Debug.Print Char(1); Char(2); Char(3); Char(4); Char(5)
   Next i
 
End Sub

...mrt

Excel Sample Data

I love finding big Excel tables, mostly because I hate manually creating sample data for testing. I've written about Contextures Sample Data. Then there's the IRS Tax Statistics page that has many downloadable Excel files. The IRS tables aren't always clean tables. Sometimes they are over-formatted, but they still work.

My latest find was a couple of tables used in the book Irrational Exuberance. Robert Shiller says:

# One can access an Excel file with the data set (used and described in the book) on stock prices, earnings, dividends and interest rates since 1871, updated.
# One can access an Excel file with the data set (used and described in the book) on home prices, building costs, population and interest rates since 1890, updated.

I haven't read the book, but it sounds interesting.

I Resolve

For 2009, I resolve to stop keeping detailed track of my time at work. It started as a small experiment to see where I was spending my time. There weren't any real surprises in the data. I guess one surprise was how easy it was to use the tool I built to keep track. So easy, in fact, that I just kept on recording my time. Until today, that is. Here's how I've spent the last four months.

I further resolve to make this tool a proper add-in. I'll be taking J-Walk's advice and storing the data in a csv file and some other suggestions I got about it. I'll add some simple reporting options for pulling the data into Excel. And menus, ribbons, what-have-you. Then sell it. I'm thinking free for personal use and $20 for commercial use. Do you think anyone would buy it?