High-volume search of 2013 Boston Marathon runners to rapidly identify community members
For the full list of Boston Marathon runners produced by the first script, click here.
In the hours following the 2013 Boston Marathon tragedy, a friend from my school paper, the Daily Princetonian, asked if I could compile a list of any Princeton students who might have run in the race. I admired the paper’s effort to confirm the safety of community members: as of Tuesday morning, all known university affiliates are confirmed safe.
The news of the Boston Marathon tragedy shocked me as much as it did other members of the Princeton University community, and my thoughts go out to the city of Boston and anyone affected by the events. Here, I’ve documented how to rapidly search the participant list for a large number of names, in the hope that it might help other communities confirm the safety of their members.
Scraping Marathon Runners
The first task was to compile a complete list of marathon runners from the Boston Marathon database. The results pages appeared to list up to 1000 runners at a time, so scraper.py fetches data across 27 results pages and stores it in a file, runners.txt. The final field of each record holds the runner’s details URL, which is used to look up ages in the final step below.
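The paging loop might look roughly like the sketch below. This is not the actual scraper.py: the fetch_page callable, the 1-based start offset, and the tab-separated output format are all assumptions made for illustration, since the real request format of the results database is not reproduced here.

```python
from typing import Callable, List

PAGE_SIZE = 1000  # runners per results page, as described above
NUM_PAGES = 27    # number of results pages, as described above


def scrape_all(fetch_page: Callable[[int, int], List[List[str]]],
               num_pages: int = NUM_PAGES,
               page_size: int = PAGE_SIZE) -> List[str]:
    """Collect one tab-separated line per runner across all results pages.

    fetch_page(start, count) is a hypothetical helper that returns the
    runners on one page as lists of string fields, with the details URL
    as the last field.
    """
    lines = []
    for page in range(num_pages):
        start = page * page_size + 1  # assumed 1-based offset of the page
        for fields in fetch_page(start, page_size):
            lines.append("\t".join(fields))
    return lines


def write_lines(lines: List[str], path: str = "runners.txt") -> None:
    """Write the collected records to disk, one runner per line."""
    with open(path, "w", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
```

Passing the fetcher in as an argument keeps the loop testable without touching the network; a real fetcher would request and parse one results page per call.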
The second task was to prepare the list of Princeton University undergraduates for easy matching with the Boston Marathon name format (“Last Name, First Name”). names.py does this and writes the result to a file called names.txt.
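As a rough illustration of that reformatting step, the sketch below converts a name to the “Last Name, First Name” form. It assumes the input names arrive as “First Last” with the final word being the last name; the real directory format and the actual names.py may differ, and the function name is hypothetical.

```python
def to_marathon_format(full_name: str) -> str:
    """Convert 'First [Middle] Last' to 'Last, First [Middle]'.

    Assumes the final whitespace-separated token is the last name,
    which will be wrong for multi-word surnames.
    """
    parts = full_name.split()
    if len(parts) < 2:
        # A single-word name cannot be split; return it unchanged.
        return full_name.strip()
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def convert_names(names):
    """Apply the conversion to every non-empty name in an iterable."""
    return [to_marathon_format(n) for n in names if n.strip()]
```

For example, to_marathon_format("Jane Doe") gives "Doe, Jane". Multi-word surnames such as "van der Berg" would need special handling that this sketch does not attempt.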
Finally, grep.py runs ‘grep’ regular expression matching to print those lines in runners.txt that contain any name in names.txt.
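The same matching idea can be sketched in pure Python, without shelling out to grep. This is an assumption-laden stand-in rather than the actual grep.py: it treats every name as a literal string (escaping any regex metacharacters) and joins the names into one compiled alternation, so each runner line is scanned once. The function names are hypothetical.

```python
import re
from typing import Iterable, List, Optional, Pattern


def build_matcher(names: Iterable[str]) -> Optional[Pattern[str]]:
    """Compile one regex that matches any of the given names literally."""
    escaped = [re.escape(n.strip()) for n in names if n.strip()]
    if not escaped:
        return None  # an empty alternation would match every line
    return re.compile("|".join(escaped))


def matching_lines(names: Iterable[str],
                   runner_lines: Iterable[str]) -> List[str]:
    """Return the runner lines that contain any name as a substring."""
    matcher = build_matcher(names)
    if matcher is None:
        return []
    return [line for line in runner_lines if matcher.search(line)]
```

Because this is substring matching, “Smith, John” also matches a runner listed as “Smith, Johnathan”, so the output can contain false positives that need a follow-up check.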
Because there were several common names in the grep output, one final script called consolidate.py fetches the ages of the matched runners. This proved useful when searching for undergraduates because most students were in the 18-22 age range.
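A minimal sketch of that age filter follows. It is not the actual consolidate.py: it assumes each matched line is tab-separated with the details URL as the final field, and it takes a hypothetical fetch_age callable that returns a runner’s age for a given URL (or None if the age cannot be determined), because the layout of the details page is not described here.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def filter_by_age(matched_lines: Iterable[str],
                  fetch_age: Callable[[str], Optional[int]],
                  min_age: int = 18,
                  max_age: int = 22) -> List[Tuple[str, int]]:
    """Keep matched runners whose age falls within [min_age, max_age].

    fetch_age is a hypothetical helper that looks up the age behind a
    details URL; runners with an unknown age are dropped.
    """
    kept = []
    for line in matched_lines:
        # The details URL is assumed to be the last tab-separated field.
        details_url = line.rstrip("\n").rsplit("\t", 1)[-1]
        age = fetch_age(details_url)
        if age is not None and min_age <= age <= max_age:
            kept.append((line, age))
    return kept
```

Dropping runners with an unknown age is a design choice for this sketch; when the goal is confirming safety, it may be better to keep them and review those records by hand.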
You can find the source code for this project on GitHub.