Carolina Godiva Track Club Summer Track Series 2018: Corrected (but not validated) Max Hamlyn Results

Max_W_Final
CORRECTED AT BOTTOM! However, still: Awards, I believe, go to the top person under age 40 and the top person age 40 or over, so congratulations Cindy Peters-Freeman and Roxanne Springer!
Max_Men_Final
CORRECTED AT BOTTOM! However, still: Awards, I believe, go to the top person under age 40 and the top person age 40 or over, so congratulations Leif Rasmussen and Tom Hoerger!

These data will be used in my segment of a Linear Regression basics talk I am doing Tuesday, September 25, 2018 with Kevin Feasel at Vaco Raleigh. Register for free HERE (may have to join the group first, also free).

A few data concerns that caused my prior output to be flawed:

  • Walking events caused distances to appear more than once (e.g., 1500m walk vs. 1500m run) in both the results data and the age-graded tables. I distinguish between these now by adding a marginal amount to the distance for walks.
mutate(dist_m = case_when(
grepl("WALK", toupper(Event)) ~ (1000 * `dist(km)`) + 0.11,
TRUE ~ 1000 * `dist(km)`))
  • Names occasionally appeared more than once for the same date and event, which is not correct

dupe_result_godiva.png

  • I previously only displayed adults.
  • Also, a note that I used the dplyr dense_rank function so ties caused the same ranking.
tie_top_n.png
Because I used the dplyr dense_rank function, the Max Hamlyn points (third column from the right) at the top are equal for Rick Pack and Evan Nelsen, who tied for first place. The online results do not show that Evan ran faster by a fraction of a second.
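To illustrate the walk offset from the first bullet above, here is a minimal sketch. The `results` and `age_factors` tables and their `factor` values are invented for illustration and are not the real age-graded tables. It shows why a join on distance alone matches each result to both the run and the walk row, and how the 0.11 m offset avoids that.

```r
library(dplyr)

# Hypothetical results and lookup table; the factor values are made up.
results <- tibble(
  Event      = c("1500 M Run", "1500 M Racewalk"),
  `dist(km)` = c(1.5, 1.5)
)
age_factors <- tibble(
  Event      = c("1500 M Run", "1500 M Racewalk"),
  `dist(km)` = c(1.5, 1.5),
  factor     = c(0.90, 0.80)
)

# Same rule as the mutate() above: walks get 0.11 m added to the distance.
add_dist_m <- function(df) {
  df %>%
    mutate(dist_m = case_when(
      grepl("WALK", toupper(Event)) ~ (1000 * `dist(km)`) + 0.11,
      TRUE ~ 1000 * `dist(km)`
    ))
}

results_m <- add_dist_m(results)
factors_m <- add_dist_m(age_factors) %>% select(dist_m, factor)

# Joining on the raw distance pairs every result with every lookup row.
# suppressWarnings() hides the many-to-many warning newer dplyr versions emit.
naive <- suppressWarnings(
  inner_join(results, select(age_factors, `dist(km)`, factor), by = "dist(km)")
)

# Joining on the offset distance keeps one row per result.
fixed <- inner_join(results_m, factors_m, by = "dist_m")

nrow(naive)  # 4
nrow(fixed)  # 2
```

The offset only has to be small enough not to collide with any real event distance; 0.11 is the value used in this post.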

[Late day 2018-09-24 update]

Kevin McCabe, a fellow Carolina Godiva member, indicated that my calculations might be wrong. He calculated Leif Rasmussen as scoring “much fewer points as the sprints were usually dominated by the 20’s and 30’s age groups”.

Let’s look at how I calculated Leif’s points with a focus on a particular day and event.

  • R code that produces the track_res data frame is available at:

https://github.com/RickPack/TriPASS_DataSci_LinearReg_201809/blob/master/R/SumTrack2018.Rmd

  • The track_res data frame as a CSV is located here:

https://github.com/RickPack/TriPASS_DataSci_LinearReg_201809/blob/master/godiva_summer_track_res_2018.csv

First, overall points for Leif, showing only where the points exceed 0:

track_res %>%
dplyr::filter(grepl("LEIF", toupper(Name))) %>%
arrange(Date_Meet, ct_evt) %>%
select(Name, Sex, Age, Date_Meet, Event, dist_m, ct_evt, Date_Meet, Max_Hamlyn_pts) %>%
dplyr::filter(Max_Hamlyn_pts > 0) %>%
print(tbl_df(.), n = nrow(track_res))

# A tibble: 25 x 8
   Name           Sex     Age Date_Meet  Event            dist_m ct_evt Max_Hamlyn_pts

 1 Leif Rasmussen M        15 2018-05-30 Mile Run          1609    1000              2
 2 Leif Rasmussen M        15 2018-05-30 200 Meter Run      200    2000              2
 3 Leif Rasmussen M        15 2018-05-30 Mile Racewalk     1609    3000              5
 4 Leif Rasmussen M        15 2018-06-06 1500 M Run        1500    1000              1
 5 Leif Rasmussen M        15 2018-06-06 100 M              100    2000              2
 6 Leif Rasmussen M        15 2018-06-06 1500 M Race Walk  1500    3000              5
 7 Leif Rasmussen M        15 2018-06-06 400 M Run          400    4000              2
 8 Leif Rasmussen M        15 2018-06-13 200 M Run          200    2000              2
 9 Leif Rasmussen M        15 2018-06-13 Mile Racewalk     1609    3000              5
10 Leif Rasmussen M        15 2018-06-20 100 M              100    2000              4
11 Leif Rasmussen M        15 2018-06-20 1500 M Racewalk   1500    3000              5
12 Leif Rasmussen M        15 2018-06-20 400 M              400    4000              2
13 Leif Rasmussen M        15 2018-06-27 Mile Racewalk     1609    3000              5
14 Leif Rasmussen M        15 2018-07-04 1500 M Run        1500    1000              4
15 Leif Rasmussen M        15 2018-07-04 100 M Run          100    2000              3
16 Leif Rasmussen M        15 2018-07-11 Mile Run          1609    1000              5
17 Leif Rasmussen M        15 2018-07-11 200 Meter Run      200    2000              1
18 Leif Rasmussen M        15 2018-07-11 Mile Racewalk     1609    3000              5
19 Leif Rasmussen M        15 2018-07-18 1500 M Run        1500    1000              4
20 Leif Rasmussen M        15 2018-07-18 100 M Run          100    2000              2
21 Leif Rasmussen M        15 2018-07-18 1500 M Racewalk   1500    3000              5
22 Leif Rasmussen M        15 2018-07-25 1K Run              NA    1000              4
23 Leif Rasmussen M        15 2018-08-01 1 Mile Run        1609    1000              4
24 Leif Rasmussen M        15 2018-08-01 1 Mile Racewalk   1609    3000              5
25 Leif Rasmussen M        15 2018-08-01 800 M Run          800    4000              3

I already see one problem. I am scoring points for an event based on a grouping that is NA (dist_m).

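# Assumption: the pipeline starts from track_res; the source name was garbled in the original.
track_eval <- track_res %>%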
    select(Name, Sex, Age, Date_Meet, Event, dist_m, ct_evt, Date_Meet, Max_Hamlyn_pts) %>%
    arrange(Event, Date_Meet, Max_Hamlyn_pts) %>%
    dplyr::filter(Max_Hamlyn_pts > 0 & is.na(dist_m))

print(track_eval)

# A tibble: 166 x 15
   Name             Sex     Age Time  ct_evt Event   mins secs  track_time temp_dist dist_m Place Max_Hamlyn_pts Date_Meet  agegrp

 1 Jackson Steffens M        12 3:23    1000 1K Run     3 23           203 1K            NA     6              0 2018-07-25 Age < 30

    ... %>%
    arrange(desc(n))

   Event  n   percent
1 5K Run 51 0.5604396
2 3K Run 30 0.3296703
3 1K Run 10 0.1098901
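The code that produced the tally above did not survive intact, so here is a hedged sketch of one way to get a table of that shape. It uses dplyr's count() as a stand-in (not necessarily the function I used originally) and a made-up `na_rows` data frame whose event labels are repeated to match the counts shown.

```r
library(dplyr)

# Stand-in data: event labels repeated to match the counts in the tally above.
na_rows <- tibble(
  Event  = rep(c("5K Run", "3K Run", "1K Run"), times = c(51, 30, 10)),
  dist_m = NA_real_
)

# Count the rows with a missing distance per event and add each event's share.
tally <- na_rows %>%
  filter(is.na(dist_m)) %>%
  count(Event) %>%
  mutate(percent = n / sum(n)) %>%
  arrange(desc(n))

print(tally)
```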

This is a problem. How bad is the problem?


track_res %>% distinct(Event, dist_m) %>% print(n = 21)

# A tibble: 21 x 2
   Event            dist_m

 1 200 Meter Run       200
 2 800 M Run           800
 3 Mile Run           1609
 4 Mile Racewalk      1609
 5 5K Run               NA
 6 100 M               100
 7 400 M Run           400
 8 1500 M Run         1500
 9 1500 M Race Walk   1500
10 3K Run               NA
11 200 M Run           200
12 400 M               400
13 1500 M             1500
14 1500 M Racewalk    1500
15 3 K Run               3
16 100 M Run           100
17 200.2 M Run         200
18 1K Run               NA
19 1998 M Racewalk    1998
20 1 Mile Run         1609
21 1 Mile Racewalk    1609

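This means the fix for walking events (see above where “Walking events caused distances” appears) was not implemented: the run and the racewalk share the same dist_m. Also, where K appears in the event name, dist_m was not calculated correctly: it is NA for “5K Run”, “3K Run”, and “1K Run”, and 3 rather than 3000 for “3 K Run”.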

I replaced the dist_m code so that walks receive the same 0.11 addition as in my age-graded calculator work, and so that the different spellings of the K events (“3 K Run”, “1K Run”) are converted to meters.

dist_m0    = case_when(
               str_detect(toupper(Event), "K RUN") ~ as.numeric(str_extract(Event, "(\\d)+")) * 1000,
               str_detect(temp_dist, "Mile") & !is.na(as.numeric(str_extract(Event, "(\\d)+"))) ~
                   as.numeric(str_extract(Event, "(\\d)+")) * 1609,
               # No number found for Mile then assume 1 mile
               str_detect(Event, "Mile") ~ 1609,
               # to make Max Hamlyn special 200.2 m into 200m
               str_detect(Event, "200.2") ~ 200,
               TRUE ~ as.numeric(temp_dist)),
           dist_m    = case_when(
               grepl("WALK", toupper(Event)) ~ dist_m0 + 0.11,
               TRUE ~ dist_m0)) %>%

Let’s check.

print(track_res %>% distinct(Event, dist_m) %>% arrange(dist_m))
              Event  dist_m
1             100 M  100.00
2         100 M Run  100.00
3     200 Meter Run  200.00
4         200 M Run  200.00
5       200.2 M Run  200.00
6         400 M Run  400.00
7             400 M  400.00
8         800 M Run  800.00
9            1K Run 1000.00
10       1500 M Run 1500.00
11           1500 M 1500.00
12 1500 M Race Walk 1500.11
13  1500 M Racewalk 1500.11
14         Mile Run 1609.00
15       1 Mile Run 1609.00
16    Mile Racewalk 1609.11
17  1 Mile Racewalk 1609.11
18  1998 M Racewalk 1998.11
19           3K Run 3000.00
20          3 K Run 3000.00
21           5K Run 5000.00
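For anyone who wants to try the parsing rules without the full data, here is a small self-contained sketch on a few of the event names listed above. It is a simplification, not the exact code in the repository: `num` (the leading number in the event name) is my stand-in for temp_dist, and if_else() replaces the second case_when().

```r
library(dplyr)
library(stringr)

# Event names taken from the listing above.
events <- tibble(Event = c("5K Run", "3 K Run", "1K Run", "Mile Racewalk",
                           "1 Mile Run", "200.2 M Run", "1500 M Race Walk", "400 M"))

parsed <- events %>%
  mutate(
    # Assumption: the distance is the leading number in the event name, if any.
    num = as.numeric(str_extract(Event, "^[0-9.]+")),
    dist_m0 = case_when(
      # "5K Run", "3 K Run": kilometers to meters
      str_detect(toupper(Event), "K RUN") ~ num * 1000,
      # "1 Mile Run": miles to meters
      str_detect(Event, "Mile") & !is.na(num) ~ num * 1609,
      # "Mile Racewalk": no number found, so assume 1 mile
      str_detect(Event, "Mile") ~ 1609,
      # The Max Hamlyn special 200.2 m is treated as 200 m
      str_detect(Event, fixed("200.2")) ~ 200,
      TRUE ~ num
    ),
    # Walks get the 0.11 m offset so they do not share a distance with runs.
    dist_m = if_else(str_detect(toupper(Event), "WALK"), dist_m0 + 0.11, dist_m0)
  )

print(select(parsed, Event, dist_m))
```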

Also, as Kevin prompted me to consider, dropping athletes with a missing age caused a loss of scoring athletes. These rows are now kept, but they are not passed to the age-graded calculating functions.


# A tibble: 10 x 16
   Name           Sex     Age Time  ct_evt Event           mins  secs  track_time temp_dist dist_m0 dist_m Place Max_Hamlyn_pts Date_Meet  agegrp

 1 Shauna Griffin F        NA 15:28   5000 3 K Run         15    28           928 3            3000  3000      1              5 2018-07-04 NA
 2 Alex Ramirez   M        NA 13.00   2000 100 M Run       0     13.00         13 100           100   100      2              4 2018-07-04 NA
 3 Alex Ramirez   M        NA 1:05    4000 400 M Run       1     05            65 400           400   400      2              4 2018-07-04 NA
 4 Shauna Griffin F        NA 7:07    1000 1500 M Run      7     07           427 1500         1500  1500      2              4 2018-07-04 NA
 5 Jim Clabuesch  M        NA 13:54   5000 3 K Run         13    54           834 3            3000  3000      2              4 2018-07-04 NA
 6 Shauna Griffin F        NA 12:48   3000 1500 M Racewalk 12    48           768 1500         1500  1500.     4              2 2018-07-04 NA
 7 Dana Lorelle   F        NA 6:25    1000 1500 M Run      6     25           385 1500         1500  1500      5              1 2018-07-18 NA
 8 Dana Lorelle   F        NA 1:21    4000 400 M Run       1     21            81 400           400   400      6              0 2018-07-18 NA
 9 Jim Clabuesch  M        NA 7:24    1000 1500 M Run      7     24           444 1500         1500  1500      9              0 2018-07-04 NA
10 Ed Davis       M        NA 3:55    4000 800 M Run       3     55           235 800           800   800     20              0 2018-05-30 NA   

I also saw a need to move the Max_Hamlyn_pts code out of the date-iterating function and make a few other rapidly identified edits.

As I studied the data, scoring athletes with missing ages appeared. Should they have scored points?

track_res_all %>% dplyr::filter(is.na(agegrp))

# A tibble: 10 x 19
   Name           Sex     Age Time  ct_evt Event           mins  secs  track_time temp_dist dist_m0 dist_m Place Max_Hamlyn_pts Date_Meet  agegrp rownum age_grade_track_time agefactor

 1 Ed Davis       M        NA 3:55    4000 800 M Run       3     55           235 800           800   800     20              0 2018-05-30 NA          1                   NA        NA
 2 Alex Ramirez   M        NA 13.00   2000 100 M Run       0     13.00         13 100           100   100      2              4 2018-07-04 NA          1                   NA        NA
 3 Alex Ramirez   M        NA 1:05    4000 400 M Run       1     05            65 400           400   400      2              4 2018-07-04 NA          1                   NA        NA
 4 Shauna Griffin F        NA 7:07    1000 1500 M Run      7     07           427 1500         1500  1500      2              4 2018-07-04 NA          1                   NA        NA
 5 Jim Clabuesch  M        NA 7:24    1000 1500 M Run      7     24           444 1500         1500  1500      9              0 2018-07-04 NA          1                   NA        NA
 6 Shauna Griffin F        NA 12:48   3000 1500 M Racewalk 12    48           768 1500         1500  1500.     4              2 2018-07-04 NA          1                   NA        NA
 7 Jim Clabuesch  M        NA 13:54   5000 3 K Run         13    54           834 3            3000  3000      2              4 2018-07-04 NA          1                   NA        NA
 8 Shauna Griffin F        NA 15:28   5000 3 K Run         15    28           928 3            3000  3000      1              5 2018-07-04 NA          1                   NA        NA
 9 Dana Lorelle   F        NA 1:21    4000 400 M Run       1     21            81 400           400   400      6              0 2018-07-18 NA          1                   NA        NA
10 Dana Lorelle   F        NA 6:25    1000 1500 M Run      6     25           385 1500         1500  1500      5              1 2018-07-18 NA          1                   NA        NA

Does Leif’s data look better?


# now need track_res2 because code above removed duplicate rows, which
# could be cleanly removed with row_number because Max_Hamlyn points scored
# the same
track_res2 %>%
dplyr::filter(grepl("LEIF", toupper(Name))) %>%
arrange(Date_Meet, ct_evt) %>%
select(Name, Sex, Age, Date_Meet, Event, dist_m, ct_evt, Date_Meet, Max_Hamlyn_pts) %>%
dplyr::filter(Max_Hamlyn_pts > 0) %>%
print(tbl_df(.), n = nrow(track_res2))

# A tibble: 25 x 8
   Name           Sex     Age Date_Meet  Event            dist_m ct_evt Max_Hamlyn_pts

 1 Leif Rasmussen M        15 2018-05-30 Mile Run          1609    1000              2
 2 Leif Rasmussen M        15 2018-05-30 200 Meter Run      200    2000              2
 3 Leif Rasmussen M        15 2018-05-30 Mile Racewalk     1609.   3000              5
 4 Leif Rasmussen M        15 2018-06-06 1500 M Run        1500    1000              1
 5 Leif Rasmussen M        15 2018-06-06 100 M              100    2000              2
 6 Leif Rasmussen M        15 2018-06-06 1500 M Race Walk  1500.   3000              5
 7 Leif Rasmussen M        15 2018-06-06 400 M Run          400    4000              2
 8 Leif Rasmussen M        15 2018-06-13 200 M Run          200    2000              2
 9 Leif Rasmussen M        15 2018-06-13 Mile Racewalk     1609.   3000              5
10 Leif Rasmussen M        15 2018-06-20 100 M              100    2000              4
11 Leif Rasmussen M        15 2018-06-20 1500 M Racewalk   1500.   3000              5
12 Leif Rasmussen M        15 2018-06-20 400 M              400    4000              2
13 Leif Rasmussen M        15 2018-06-27 Mile Racewalk     1609.   3000              5
14 Leif Rasmussen M        15 2018-07-04 1500 M Run        1500    1000              4
15 Leif Rasmussen M        15 2018-07-04 100 M Run          100    2000              3
16 Leif Rasmussen M        15 2018-07-11 Mile Run          1609    1000              5
17 Leif Rasmussen M        15 2018-07-11 200 Meter Run      200    2000              1
18 Leif Rasmussen M        15 2018-07-11 Mile Racewalk     1609.   3000              5
19 Leif Rasmussen M        15 2018-07-18 1500 M Run        1500    1000              4
20 Leif Rasmussen M        15 2018-07-18 100 M Run          100    2000              2
21 Leif Rasmussen M        15 2018-07-18 1500 M Racewalk   1500.   3000              5
22 Leif Rasmussen M        15 2018-07-25 1K Run            1000    1000              4
23 Leif Rasmussen M        15 2018-08-01 1 Mile Run        1609    1000              4
24 Leif Rasmussen M        15 2018-08-01 1 Mile Racewalk   1609.   3000              5
25 Leif Rasmussen M        15 2018-08-01 800 M Run          800    4000              3

It is easy to miss, but the period at the end of dist_m values like 1609. means that the number is not a whole number: it has a decimal part (the 0.11 walking adjustment) that the tibble printout does not display.
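A quick way to confirm that the decimal part is really there is sketched below. The two-row `walks` tibble is made up for illustration.

```r
library(tibble)

# Made-up two-row example: a run and a walk over the same nominal distance.
walks <- tibble(Event = c("Mile Run", "Mile Racewalk"), dist_m = c(1609, 1609.11))

# Default tibble printing shortens the walk distance, as in the output above.
print(walks)

# Show the stored values with two decimals.
formatted <- format(walks$dist_m, nsmall = 2)
formatted

# Flag rows whose distance has a fractional part (the walk offset).
has_offset <- walks$dist_m %% 1 != 0
has_offset
```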

Let’s examine the athletes who scored points in the 1 Mile events on 2018-08-01, for both the run and the racewalk.

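# Assumption: the pipeline starts from track_res2; the source name was garbled in the original.
milecheck <- track_res2 %>%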
dplyr::filter(Date_Meet == ymd('2018-08-01') & grepl("1", Event)) %>%
arrange(dist_m, Event) %>%
dplyr::filter(Max_Hamlyn_pts > 0 )

print(milecheck, n = nrow(milecheck))

# A tibble: 22 x 16
   Name              Sex     Age Time  ct_evt Event           mins  secs  track_time temp_dist dist_m0 dist_m Place Max_Hamlyn_pts Date_Meet  agegrp

 1 Brendan Murray    M        30 5:22    1000 1 Mile Run      5     22           322 1            1609  1609      1              5 2018-08-01 Age 30 - 34
 2 Leif Rasmussen    M        15 5:31    1000 1 Mile Run      5     31           331 1            1609  1609      2              4 2018-08-01 Age < 30
 3 Ted Richardson    M        48 5:45    1000 1 Mile Run      5     45           345 1            1609  1609      3              3 2018-08-01 Age 45 - 50
 4 Michael Fields    M        26 5:49    1000 1 Mile Run      5     49           349 1            1609  1609      4              2 2018-08-01 Age < 30
 5 Nicholas Min      M        15 5:50    1000 1 Mile Run      5     50           350 1            1609  1609      5              1 2018-08-01 Age < 30
 6 Kevin Nickodem    M        61 5:50    1000 1 Mile Run      5     50           350 1            1609  1609      5              1 2018-08-01 Age 61 - 64
 7 Brian Stull       M        47 5:50    1000 1 Mile Run      5     50           350 1            1609  1609      5              1 2018-08-01 Age 45 - 50
 8 Roxanne Springer  F        54 6:17    1000 1 Mile Run      6     17           377 1            1609  1609      1              5 2018-08-01 Age 51 - 55
 9 Kaina Morey       F        17 6:19    1000 1 Mile Run      6     19           379 1            1609  1609      2              4 2018-08-01 Age < 30
10 Robin Richardson  F        48 6:34    1000 1 Mile Run      6     34           394 1            1609  1609      3              3 2018-08-01 Age 45 - 50
11 Amy Cummings      F        44 7:25    1000 1 Mile Run      7     25           445 1            1609  1609      4              2 2018-08-01 Age 40 - 44
12 Amy Lowman        F        41 7:36    1000 1 Mile Run      7     36           456 1            1609  1609      5              1 2018-08-01 Age 40 - 44
13 Leif Rasmussen    M        15 9:40    3000 1 Mile Racewalk 9     40           580 1            1609  1609.     1              5 2018-08-01 Age < 30
14 Roxanne Springer  F        54 9:42    3000 1 Mile Racewalk 9     42           582 1            1609  1609.     1              5 2018-08-01 Age 51 - 55
15 Deb Springer      F        44 10:26   3000 1 Mile Racewalk 10    26           626 1            1609  1609.     2              4 2018-08-01 Age 40 - 44
16 John Min          M        48 11:01   3000 1 Mile Racewalk 11    01           661 1            1609  1609.     2              4 2018-08-01 Age 45 - 50
17 Tim O'Brien       M        66 11:06   3000 1 Mile Racewalk 11    06           666 1            1609  1609.     3              3 2018-08-01 Age 65 - 69
18 Isaac Mathias     M        14 11:17   3000 1 Mile Racewalk 11    17           677 1            1609  1609.     4              2 2018-08-01 Age < 30
19 Barbara Hindenach F        67 11:22   3000 1 Mile Racewalk 11    22           682 1            1609  1609.     3              3 2018-08-01 Age 65 - 69
20 Kaina Morey       F        17 11:42   3000 1 Mile Racewalk 11    42           702 1            1609  1609.     4              2 2018-08-01 Age < 30

# Assumption: milecheck_all is built from milecheck; the start of this pipeline was garbled in the original.
milecheck_all <- milecheck %>%
    arrange(dist_m, Event, Sex, track_time) %>%
    dplyr::filter(Max_Hamlyn_pts > 0 ) %>%
    select(Name, Sex, Age, Event, dist_m, track_time, Place, Max_Hamlyn_pts)

print(milecheck_all, n = nrow(milecheck_all))

# A tibble: 22 x 8
   Name              Sex     Age Event           dist_m track_time Place Max_Hamlyn_pts

 1 Roxanne Springer  F        54 1 Mile Run       1609         377     1              5
 2 Kaina Morey       F        17 1 Mile Run       1609         379     2              4
 3 Robin Richardson  F        48 1 Mile Run       1609         394     3              3
 4 Amy Cummings      F        44 1 Mile Run       1609         445     4              2
 5 Amy Lowman        F        41 1 Mile Run       1609         456     5              1
 6 Brendan Murray    M        30 1 Mile Run       1609         322     1              5
 7 Leif Rasmussen    M        15 1 Mile Run       1609         331     2              4
 8 Ted Richardson    M        48 1 Mile Run       1609         345     3              3
 9 Michael Fields    M        26 1 Mile Run       1609         349     4              2
10 Nicholas Min      M        15 1 Mile Run       1609         350     5              1
11 Kevin Nickodem    M        61 1 Mile Run       1609         350     5              1
12 Brian Stull       M        47 1 Mile Run       1609         350     5              1
13 Roxanne Springer  F        54 1 Mile Racewalk  1609.        582     1              5
14 Deb Springer      F        44 1 Mile Racewalk  1609.        626     2              4
15 Barbara Hindenach F        67 1 Mile Racewalk  1609.        682     3              3
16 Kaina Morey       F        17 1 Mile Racewalk  1609.        702     4              2
17 Robin Richardson  F        48 1 Mile Racewalk  1609.        727     5              1
18 Leif Rasmussen    M        15 1 Mile Racewalk  1609.        580     1              5
19 John Min          M        48 1 Mile Racewalk  1609.        661     2              4
20 Tim O'Brien       M        66 1 Mile Racewalk  1609.        666     3              3
21 Isaac Mathias     M        14 1 Mile Racewalk  1609.        677     4              2
22 Brendan Murray    M        30 1 Mile Racewalk  1609.        705     5              1

Interestingly, all three of Nicholas Min, Kevin Nickodem, and Brian Stull scored a point for 5th place in the mile run. Each was recorded at 350 seconds (5 minutes and 50 seconds) on 2018-08-01, and dense_rank gives tied times the same place.
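The tie can be reproduced with the men's mile times from the table above. This is a sketch: the points rule (5 for 1st down to 1 for 5th, 0 otherwise) is inferred from the Place and Max_Hamlyn_pts columns in this post, not taken from an official scoring document.

```r
library(dplyr)

# Men's 1 Mile Run times in seconds, from the 2018-08-01 table above.
mile <- tibble(
  Name = c("Brendan Murray", "Leif Rasmussen", "Ted Richardson", "Michael Fields",
           "Nicholas Min", "Kevin Nickodem", "Brian Stull"),
  track_time = c(322, 331, 345, 349, 350, 350, 350)
)

ranked <- mile %>%
  mutate(
    # dense_rank gives tied times the same place and leaves no gaps.
    Place = dense_rank(track_time),
    # Inferred rule: 5 points for 1st down to 1 point for 5th, otherwise 0.
    Max_Hamlyn_pts = pmax(6 - Place, 0)
  )

print(ranked)
```

With row_number() in place of dense_rank(), only one of the three 350-second runners would have been placed 5th and scored the point.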

Aug1_sametime

Max_W_Final_updated
Max_Men_Final_updated

[2018-09-26 UPDATE]

This is what I manually calculated using the results web pages, and it suggests a problem.

ManCalc_20180926

3 thoughts on “Carolina Godiva Track Club Summer Track Series 2018: Corrected (but not validated) Max Hamlyn Results”

  1. Cool, analysis (I’ve always wanted to get started in R-I’ve always used SPSS myself) but I’m curious about how you calculated the points, as they are very different from the scores I had calculated. Did you award points within individual age groups prior to aggregations for the +/- 40 categorization? I had Leif with much fewer points as the sprints were usually dominated by the 20’s and 30’s age groups (while he would still be the top under 20). Also, did you only include people with a minimum number of appearances? I know there were a lot of one hit wonders that affected the standings.
