Reverse Engineering YouTube’s Demonetization Algorithm

This is a relatively simple, though time-consuming, way to gather information on what metadata YouTube is using to demonetize videos.

First, let’s look over all the data available to YouTube and the likelihood that each piece of data is used by their algorithms to determine monetization status.

Metadata                                          Likelihood of use in analysis when demonetizing
video title                                       HIGH
video description                                 HIGH
video tags                                        HIGH
what playlists video is in (and their metadata)   MEDIUM
upload time                                       LOW
video duration                                    LOW
raw filename                                      MEDIUM
comment count                                     LOW
public | private | unlisted status                LOW
enhancements applied                              LOW
audio tweaks                                      LOW
end screens / annotations                         MEDIUM
subtitles / closed captioning                     VERY HIGH
user reports                                      UNSURE (possible, but they'd likely want to run the algo against all videos, even ones that haven't been reported)

Given recent comments from Dave Rubin and others that private uploads without metadata have been demonetized, it seems very likely that the automatic subtitling is a primary source of information for the algo.

While it’s possible that certain channels receive additional scrutiny from the algorithms due to user reporting, I would not expect that to change the end result, only the priority in which videos are scanned.

It’s also, of course, possible that YouTube is targeting channels directly regardless of metadata, but I think we need to first assume they are exclusively using their algos.

Reverse engineering the algorithm

  1. Upload a typical video
    1. The video should have a random filename
    2. The video should be set at upload time to ‘private’
    3. Fill in NO metadata
    4. The video would ideally be medium- or long-form, with well-known controversial topics discussed in it
  2. If the video becomes demonetized then it’s clear that:
    1. The channel is targeted OR
    2. The subtitles were used as input to the algo and it was determined that the video is not advertiser friendly
  3. Assuming the video is now demonetized: Go to the subtitle editor and look through the subtitles. Identify any and all words which could be considered controversial.
  4. Create new videos (based on the original cut) with each controversial word independently removed, plus one video with all of them removed. For example, if there are 5 different words, there should be 6 new videos.
  5. Upload each video with random names and wait to see which of those become demonetized.
  6. Document which of the edited videos were demonetized and correlate them with words removed.
  7. It is possible that similarity to other demonetized videos plays a role in the algorithm. If so, it would be necessary to replace the actual video with something else or modify it in some way (invert, rotate, etc.).
  8. If even the completely censored version becomes demonetized it’s possible that:
    1. There is controversial visual content being picked up: words within the video itself, such as signs, posters, billboards, burned-in subtitles, etc. In those cases, block them out.
    2. The algorithm uses the similarity to other demonetized videos to help in determining advertiser friendliness. If so, it would be necessary to replace the actual video with something else entirely or modify it in some way (invert, rotate, etc.). Audio similarity could also be a factor, in which case white noise could be added to parts of the video, perhaps to cover up the spoken controversial words.
    3. The analysis of the content goes beyond keyword searches and is determining tone or topic from secondary information: the context of the sentences rather than keywords. Testing this further would require removing full sentences from the original video or reordering the conversations.
    4. YouTube is manually targeting channels.
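
The bookkeeping for steps 4 to 6 can be sketched in a few lines of Python. This is only a rough illustration and not a tool that talks to YouTube: the names `Variant`, `plan_variants` and `words_clearing_flag` are made up for this example, the `.mp4` extension is an assumption, and the demonetization results are assumed to be recorded by hand once the uploads have been scanned.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    filename: str       # random name, so the filename itself carries no signal
    removed: frozenset  # flagged words cut out of this edit of the video


def plan_variants(flagged_words):
    """One variant per flagged word removed alone, plus one with all removed."""
    words = sorted(set(flagged_words))
    removal_sets = [frozenset([w]) for w in words]
    if len(words) > 1:
        # The fully censored cut (step 4: 5 words -> 6 videos).
        removal_sets.append(frozenset(words))
    return [Variant(f"{secrets.token_hex(8)}.mp4", r) for r in removal_sets]


def words_clearing_flag(variants, demonetized):
    """Words whose removal alone left the video monetized.

    `demonetized` maps filename -> bool, filled in by hand after waiting
    for each upload to be scanned (step 6).
    """
    return sorted(
        next(iter(v.removed))
        for v in variants
        if len(v.removed) == 1 and not demonetized[v.filename]
    )
```

If `words_clearing_flag` comes back empty but the cut with every word removed stays monetized, more than one word is probably triggering the flag on its own. If even that cut is demonetized, the possibilities in step 8 apply.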

While it could take a significant amount of time to perform every permutation described above (and to do so more than once to confirm any results), it should shed a lot of light on how YouTube’s algorithm works and what information it acts on. If, after performing the above, a channel’s videos continue to be demonetized and we’ve not missed any other possible sources of information, it’d probably be safe to say there is something more nefarious being done.

