TrainerRoad's Big Data


#21

I’m not a lawyer either, so there is little difference to me.


#22

@bbarrera A large number of athletes have open Strava profiles and their activity data can be downloaded via Strava’s API.

Nice addition @DaveWh! Yes, that’s exactly what my first bullet was getting at. It would augment athletes’ profiles and also allow for tailored training plans based on your rider type, strengths, weaknesses, etc.

@bherbers that’s exactly the case. Plenty of people use TSS and just call it something else; for example, I believe this is what Strava’s weighted average power is.
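For anyone who hasn’t seen it written down, here is a minimal sketch of the TSS formula as it is commonly published. The function and parameter names are my own, and `np_watts` is assumed to be a normalized (or similarly weighted) power figure for the ride, so treat this as an illustration rather than anyone’s official implementation.

```python
def training_stress_score(duration_s: float, np_watts: float, ftp_watts: float) -> float:
    """Commonly published TSS formula; one hour at FTP scores 100."""
    # Intensity factor: how hard the ride was relative to threshold.
    intensity_factor = np_watts / ftp_watts
    return (duration_s * np_watts * intensity_factor) / (ftp_watts * 3600) * 100


# Hypothetical ride: 30 minutes at 200 W for a rider with a 250 W FTP.
print(training_stress_score(1800, 200, 250))
```

Since the score only needs duration, a weighted power figure and FTP, anyone with those three numbers can compute an equivalent metric under a different name.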


#24

I could be wrong but I believe the big data is publicly available right now if you search for it.


#25

The difference is that you can’t stop someone from using the underlying formula you come up with.


#26

The TR data set is really strong on workout-specific data (Power, HR, cadence data etc.), but weak on “big picture” type data.

So why doesn’t TR send out a survey to all users to collect data such as:

  • How many years they’ve been cycling.
  • What their cycling background is (road, TT, MTB etc.).
  • What other training they carry out, plus the volume for each discipline.
  • Typical diet (if on LCHF etc.).

If enough data is collected, some useful trends might start to emerge (when tied with the workout data)… experienced cyclists respond better to plans x/y/z. Those cyclists on a LCHF diet are getting more gains from certain plans. If you’ve got a big road cycling background certain plans aren’t very effective etc…

No-one likes completing surveys, but if there’s a decent prize at stake (win a smart trainer, or a free year TR subscription) then enough people are likely to complete it :slight_smile:

Based on this data, TR could add a new “Plan Builder” option to the website; so instead of selecting an off-the-shelf plan, you’d answer a series of questions (required phase / available training hours per week / training experience & background / diet etc.) that would generate a custom plan for you… all based on what the data has shown to be most effective for riders like you.

As time goes on this “big picture” data set would become more and more comprehensive (based on the data collected in the “Plan Builder”), leading to better custom plans.


#27

Only after using oauth to authenticate the user, as far as I know. I don’t believe you can randomly download fit files.


#28

This is correct


#29

What you’re suggesting is essentially a fairly basic qualitative questionnaire. Unfortunately that sort of approach introduces plenty of sources of bias, which means whatever is collected won’t necessarily complement the data they already hold.

Only a proportion of users would respond (would they be representative?) and there are reasons to doubt the truthfulness/utility of self-assessment answers.

EDIT: also, a potentially more fruitful area to pursue (without needing new data) is to build some kind of algorithm which can automatically adjust plans/intensities/workout type on the fly depending on your history (and data from everyone else based on what seems to be effective for the whole population).


#30

After posing the original question I started thinking about what you could usefully get out of the data…

One of the first big steps that could be taken is to look at the effectiveness of plans: FTP increase, completion rate etc. On its own that might not be too useful to the end user, but it could be the basis of a ‘plan chooser’, an idea I really like the sound of.

The problem is that there may not be enough data on where you started - Plan X is very effective, but only when you start from State Y. How are we going to know this?
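To make that concrete, here is a rough sketch of how plan effectiveness could be summarised per starting state. Everything in it is made up for illustration: the records, the plan names, and the 250 W cut-off used as a crude stand-in for “State Y”. Real data would need a much richer description of where each rider started.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (plan, starting FTP in W, ending FTP in W, completed?)
records = [
    ("Plan X", 180, 198, True),
    ("Plan X", 190, 190, False),
    ("Plan X", 300, 303, True),
    ("Plan Z", 185, 190, True),
    ("Plan Z", 310, 325, True),
]


def start_band(ftp: float) -> str:
    """Crude stand-in for 'starting state': an arbitrary 250 W cut-off."""
    return "under 250 W" if ftp < 250 else "250 W and over"


def summarise(rows):
    """Group riders by (plan, starting band) and report completion and FTP gain."""
    groups = defaultdict(list)
    for plan, start, end, completed in rows:
        groups[(plan, start_band(start))].append((start, end, completed))

    summary = {}
    for key, members in groups.items():
        # FTP gain is only measured for riders who finished the plan.
        finished = [(s, e) for s, e, done in members if done]
        gain = mean((e - s) / s * 100 for s, e in finished) if finished else None
        summary[key] = {
            "riders": len(members),
            "completion_rate": len(finished) / len(members),
            "mean_ftp_gain_pct": gain,
        }
    return summary


for key, stats in sorted(summarise(records).items()):
    print(key, stats)
```

Splitting by starting band is what would reveal a plan that only works from a particular starting point, but each group shrinks as you add more splits, which is exactly the “not enough data” worry.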


#31

True. Good point.

I think the “Plan Builder” functionality could really work though… over time the data collected in the process of generating a custom plan would become pretty comprehensive. People are likely to be truthful/accurate in that process as it’d be in their interest to answer truthfully in order to get the best plan possible.


#32

I agree, knowing the FTP of someone in isolation is not too useful. It’s when you understand their past training history/experience that you can start automating the selection of the best workouts for a custom plan.

Capturing training experience / training background, diet etc. when generating a custom plan would start to build this data. The more this is used, the more accurate it becomes.


#33

Fair point. There are no checks on who can register an app but there are rate limits.


#34
  1. Register an app.
  2. The athlete has to give the app permission to access their data.
  3. Strava rate limits apply.

You can’t just start downloading data from Strava without the permission of each user.
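The steps above can be sketched roughly as follows. This only builds the authorisation link and the token-exchange form fields, with no network calls. The endpoint URLs, the `activity:read` scope and the field names are as I remember them from Strava’s developer documentation, so treat them as assumptions and check the current docs; the client ID, secret and redirect URI are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoints; verify against Strava's current developer documentation.
AUTHORIZE_URL = "https://www.strava.com/oauth/authorize"
TOKEN_URL = "https://www.strava.com/oauth/token"


def build_authorize_url(client_id: str, redirect_uri: str) -> str:
    """Step 2: the link each athlete must visit to approve the app."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "activity:read",
    }
    return f"{AUTHORIZE_URL}?{urlencode(params)}"


def build_token_request(client_id: str, client_secret: str, code: str) -> dict:
    """Form fields to POST to TOKEN_URL once the athlete has approved."""
    return {
        "client_id": client_id,
        "client_secret": client_secret,
        "code": code,
        "grant_type": "authorization_code",
    }


# Placeholder values from step 1 (registering an app).
print(build_authorize_url("12345", "https://example.com/callback"))
```

The key point is that the `code` only exists after an individual athlete has clicked through and approved, so there is no way to obtain a token for someone who hasn’t opted in.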


#35

Ah of course, even public activities will need user authorisation if you’re going via the API, and writing a scraper would be a lot of faff. I didn’t think that through thoroughly!


#36

Without quickly testing, I’m not even sure a screen scraper would give you all/most/enough data.


#37

Actually that’s easy. If you go to https://www.strava.com/activities/<activity-id>/streams you get a nice JSON! You do need to be logged in though.


#38

Hire a lawyer? Screen scraping violates the Strava terms of service. And I don’t know about privacy laws across the world, but surely screen scraping crosses ethical boundaries.

Those are a few of the reasons why I proposed building an opt-in community of like-minded individuals.


#39

Ha ha, I wasn’t suggesting I’ve done it. My point was it could be done if you so chose. I imagine screen scraping is against every company’s terms and conditions, yet there is an entire industry based around selling scraped data.

Anyway, you’re right, an opt-in community is indeed a much better idea.


#40

Given how much TR has accomplished, my guess is that you already have plenty of ideas. But one simple and valuable thing would be optimizing the training plans for riders after they’ve logged a given amount of riding. This might rely more on getting ALL of a rider’s power+HR data into the system, assuming that most cyclists don’t exclusively do TR, that they’re riding outside plenty, racing etc. You should be able to see what sort of approaches tend to yield what sort of results.

For instance, I’m sure there are riders who’ve read all the papers lionizing polarized training who select TR workouts and do their own plans. You can likely identify what sort of results (i.e., changes to FTP) they’re getting.

OR, you might even up-end certain training assumptions, like the hegemony of FTP in the first place :wink:


#41

I’d like to know the percentage of users whose TSS drops at the 3rd week of a mesocycle.

I seem to do well with the first 2 weeks then I tend to have trouble meeting the 3rd week TSS before a rest period in the 4th week.
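For what it’s worth, the calculation itself would be simple if the weekly totals were available. A minimal sketch with invented numbers, where “drops” is assumed to mean completed week 3 TSS coming in below completed week 2 TSS (comparing completed against planned week 3 TSS would be another reasonable definition):

```python
# Hypothetical completed weekly TSS for one four-week mesocycle per rider:
# (week 1, week 2, week 3, recovery week 4)
mesocycles = {
    "rider_a": (400, 450, 380, 250),
    "rider_b": (350, 390, 420, 220),
    "rider_c": (500, 540, 470, 300),
    "rider_d": (300, 330, 360, 200),
}


def week3_drop_share(data: dict) -> float:
    """Percentage of riders whose week 3 TSS is lower than their week 2 TSS."""
    dropped = sum(1 for weeks in data.values() if weeks[2] < weeks[1])
    return 100 * dropped / len(data)


print(f"{week3_drop_share(mesocycles):.0f}% of riders dropped in week 3")
```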