Rebuilding StravaDataAnalysis: From First Python Script to Proper Architecture
I started StravaDataAnalysis in 2021 as a way to learn Python. And by learn, I mean these were literally my first Python scripts; I had never written a line before. At my day job, we were looking to move our data warehouse from SQL Server and Analysis Services to SQL Server, AWS Lambdas, Redshift and NiFi, so I figured I should get comfortable with Python before I found myself debugging a production pipeline at 2 AM.
I’ve now had over a year of real-world Python experience, and I recently came back to my Strava project. The goal? Apply what I’ve learned to clean up, restructure, and harden the code. This post covers what I changed, why I changed it, and how it’s shaped the project for future improvements.
What Did the Old Code Look Like?
The original project worked, but it was a mess:
StravaDataAnalysis/
├── LICENSE
├── README.md
├── dataPredication.py # not being used
├── visualiseData.py # 1000+ lines of pretty graphing
├── getData.py # Gather data from Strava
├── getTokens.py # Get initial token from Strava
├── processData.py # Gather data from SQLite database
├── refreshTokens.py # Get token from SQLite database or refresh from Strava if expired
├── databaseAccess.py # SQLite queries to save and retrieve data
└── strava.sqlite # SQLite database
Everything was tightly coupled. I had matplotlib setup, SQL queries, API auth, and file I/O all jumbled into a few monolithic scripts. There was no clear flow, and even small changes (like tweaking a chart title or changing how a metric was calculated) became a pain. The project grew organically as I figured things out. It was useful as a learning tool, but not something I’d want to maintain. I was also embarrassed that another engineer had forked it and used it as the basis of their own Strava project.
Why Rewrite It?
Now that I’ve spent time working with Python in an enterprise capacity, I’ve learned the difference between scripting and architecture. I wanted the new version of this project to have clear modular separation, single-responsibility files, and shared utilities. I also wanted to stop repeating myself every time I added a chart.
New Architecture
The rewritten structure is far more modular and maintainable:
StravaDataAnalysis/
├── src/
│ ├── strava_data/
│ │ ├── db/
│ │ │ ├── __init__.py
│ │ │ ├── dao.py
│ │ │ └── models.py
│ │ ├── strava_api/
│ │ │ ├── processing/
│ │ │ │ ├── __init__.py
│ │ │ │ └── transform.py
│ │ │ ├── visualisation/
│ │ │ │ ├── __init__.py
│ │ │ │ ├── graphs_distance.py
│ │ │ │ ├── graphs_distribution.py
│ │ │ │ ├── graphs_effort.py
│ │ │ │ ├── graphs_pace.py
│ │ │ │ └── utils.py
│ │ │ ├── __init__.py
│ │ │ └── client.py
│ │ ├── __init__.py
│ │ ├── auth.py
│ │ └── config.py
│ ├── utils/
│ │ └── logger.py
│ ├── generate_readme.py
│ ├── get_tokens.py
│ └── main.py
├── LICENSE
├── README.md
└── strava.sqlite
Let’s walk through what’s changed.
Chart Module Refactor
The old visualiseData.py was over 1,000 lines. Now, charts are grouped by purpose:
- graphs_distance.py - distance-based stats
- graphs_distribution.py - histograms and heatmaps
- graphs_effort.py - TRIMP (training impulse) and load
- graphs_pace.py - pace trends
Reusable logic (axis formatting, saving plots) is in utils.py. For example, plot_with_common_setup() standardises figure layout and output. This means all charts now follow the same conventions, and I only need to change that logic in one place.
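As a rough sketch of the idea (simplified, not the actual code from the repo; the signature, parameter names and sample data here are assumptions for illustration), a shared setup helper can own the figure creation, labelling and saving, while each chart only supplies its drawing function:

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # headless backend so charts can be saved without a display
import matplotlib.pyplot as plt


def plot_with_common_setup(title, xlabel, ylabel, output_path, plot_func, figsize=(10, 6)):
    """Create a figure, let plot_func draw on it, then apply shared styling and save.

    Illustrative signature only; the real helper may take different arguments.
    """
    fig, ax = plt.subplots(figsize=figsize)
    try:
        plot_func(ax)  # chart-specific drawing only
        ax.set_title(title)
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        ax.grid(True, alpha=0.3)
        fig.tight_layout()
        output_path = Path(output_path)
        output_path.parent.mkdir(parents=True, exist_ok=True)
        fig.savefig(output_path, dpi=150)
    finally:
        plt.close(fig)  # always release the figure, even if drawing fails
    return output_path


def draw_weekly_distance(ax):
    """Example chart body: made-up sample data, purely for illustration."""
    weeks = [1, 2, 3, 4]
    distance_km = [12.5, 18.0, 15.2, 21.1]
    ax.bar(weeks, distance_km)
```

A chart then becomes a single call such as plot_with_common_setup("Weekly distance", "Week", "Distance (km)", "output/weekly_distance.png", draw_weekly_distance), and a change to grid style or DPI happens in one function rather than in every chart.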
Config as a First-Class Citizen
Previously, I was hardcoding paths and toggles. Now:
- All constants live in a dedicated Config class
- Everything from save paths to data windows is defined centrally
- Modules that need config take it as an argument rather than importing magic globals
This makes the project easier to configure, test, and extend.
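To make that concrete, here is a minimal sketch of the pattern. The field names and default values are hypothetical, chosen for illustration rather than copied from config.py:

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Config:
    """Central place for paths and toggles (field names are illustrative)."""

    db_path: Path = Path("strava.sqlite")
    output_dir: Path = Path("output")
    rolling_window_days: int = 28
    activity_type: str = "Run"


def chart_path(config: Config, chart_name: str) -> Path:
    """Build an output path from the config passed in, not from a global."""
    return config.output_dir / f"{chart_name}.png"


# Production code uses the defaults; a test can swap in its own values
default_config = Config()
test_config = Config(output_dir=Path("/tmp/strava_test"), rolling_window_days=7)
```

Because chart_path() receives its config as an argument, a test can point it at a throwaway directory without patching anything, and frozen=True stops a module from quietly mutating shared settings.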
Strava API Cleanup
The original getData.py file handled everything from OAuth to inserting data into SQLite. Now I have:
- client.py - wraps the Strava API and handles pagination
- dao.py - abstracts SQLite access
- transform.py - processes API data into model objects
Each part has one job. I can now test API auth independently of database code.
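The split looks roughly like the sketch below. This is a simplified illustration rather than the code in the repo: the Activity model, the function names, the table layout and the fake page data are all assumptions, and the HTTP call is replaced by an injected function so the example runs offline.

```python
import sqlite3
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass(frozen=True)
class Activity:
    """Hypothetical model object; the real models.py may differ."""

    activity_id: int
    name: str
    distance_m: float


def fetch_all_pages(fetch_page: Callable[[int], list], start_page: int = 1) -> Iterator[dict]:
    """Client role: keep requesting pages until one comes back empty."""
    page = start_page
    while True:
        items = fetch_page(page)
        if not items:
            return
        yield from items
        page += 1


def to_activity(raw: dict) -> Activity:
    """Transform role: turn one raw API record into a model object."""
    return Activity(int(raw["id"]), str(raw["name"]), float(raw["distance"]))


class ActivityDao:
    """DAO role: the only place that knows about SQL."""

    def __init__(self, connection: sqlite3.Connection):
        self._conn = connection
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS activities ("
            "activity_id INTEGER PRIMARY KEY, name TEXT, distance_m REAL)"
        )

    def save(self, activity: Activity) -> None:
        with self._conn:  # commits on success, rolls back on error
            self._conn.execute(
                "INSERT OR REPLACE INTO activities VALUES (?, ?, ?)",
                (activity.activity_id, activity.name, activity.distance_m),
            )

    def count(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM activities").fetchone()[0]


# Stand-in for the real HTTP call: two pages of made-up data, then an empty page
FAKE_PAGES = {
    1: [{"id": 1, "name": "Morning run", "distance": 5000.0}],
    2: [{"id": 2, "name": "Long run", "distance": 16100.0}],
}


def fake_fetch_page(page: int) -> list:
    return FAKE_PAGES.get(page, [])


dao = ActivityDao(sqlite3.connect(":memory:"))
for raw in fetch_all_pages(fake_fetch_page):
    dao.save(to_activity(raw))
```

Since the paging logic only depends on a callable and the DAO only depends on a connection, each piece can be tested on its own: the client with a fake fetcher, the DAO with an in-memory database.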
What I Learned
- Python rewards clarity. Modular code > clever code.
- Common setup functions make visualisation code readable.
- Treat config as data, not code.
- Even personal projects deserve structure.
What’s Next?
I’m working on:
- ML-based pace forecasting
- Classifying run types based on intensity and duration
- Generating a weekly natural language summary: volume, intensity, comparisons, and suggestions
Those are likely to evolve in their own branches and might not be worth a full blog post.
For now, I’m just pleased that the project finally reflects how I actually write Python in 2025 — not how I stumbled through it in 2021.
Repo is at “github.com/c-wilkinson/StravaDataAnalysis” if you fancy a browse.