
The Payload Bug


The Payload bug, also known as the KOTH bug, was a long-standing bug on the SCG servers. It is arguably the oldest bug the servers ever had, and is believed to have existed as early as 2010. It was officially marked as fixed by Rowdy and Fraeven on March 26th, 2019. The bug was caused by an oversight in the donator plugin that made it delete seemingly random game entities. The bug itself had two crash conditions, but it was believed to be the cause of several other issues as well.

  1. Description of the issue
  2. Related issues
  3. Cause
  4. Difficulty in investigation
  5. Discovery
  6. Fix
  7. References

Description of the issue

The Payload bug got its name from how often it noticeably occurred on Payload maps. It presented itself as an irregularity in the payload cart's tracks. The cart would appear to be further along the path than it actually was, or would appear on the HUD to have reached the end and would no longer move forward. This was almost always accompanied by a server crash. Proper crash logs weren't available in the early days of SCG, and once they were, they provided no insight into what had happened.

In 2017, a new symptom appeared. A new crash condition was occasionally found on KOTH maps, giving this bug its alternative name, the KOTH bug. The issue presented itself as one of the team timers in a KOTH round displaying [0:00] at round start. If that team captured the point, the server would crash. A temporary fix was eventually created to automatically restart the round if a timer began at [0:00]. This symptom became significant because every other symptom had some sort of explanation somewhere; this one did not.

Related issues

The following issues were believed to be caused by the bug.

  1. Payload tracks being broken occasionally (crash potential)
    (Image: an example of the Payload track bug)
  2. KOTH timers being broken occasionally (crash potential)
  3. Doors would sometimes be missing, allowing players to leave their spawns prematurely during setup
  4. HDR lighting would randomly break at the start of a round, making the map look extremely bright
    (Image: an example of the HDR lighting breaking)
  5. Players would occasionally be missing cosmetics
  6. Players would, on rare occasions, spawn without their weapons

In theory, the bug could also have caused other issues with missing entities, or even server crashes that had no other explanation.

Cause

The cause of the bug was an oversight in the donator plugin, specifically the part that handles donator sprites. When a new sprite was created and assigned to a player, the plugin stored the sprite's raw entity index. Entity indexes can shift, meaning that the index a sprite had in one round may belong to a different entity in the next. As a result, when a round ended and the plugin deleted the donator sprite entities by their stored indexes, it was probably deleting something important instead.
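The failure mode can be illustrated with a minimal Python sketch. It assumes a hypothetical, simplified entity list in which each new entity takes the lowest free index (the Source engine's real allocator differs), and the class, method, and entity names are made up for illustration. The plugin keeps only the raw index of its sprite; after a map change that index belongs to a different entity, and the cleanup deletes that entity instead.

```python
class EntityList:
    """Hypothetical, simplified model of a server's entity list.

    New entities take the lowest free index. This only illustrates the idea;
    it is not the Source engine's real allocation algorithm.
    """

    def __init__(self) -> None:
        self.slots: dict[int, str] = {}  # entity index -> entity name

    def create(self, name: str) -> int:
        index = 0
        while index in self.slots:  # find the lowest unused index
            index += 1
        self.slots[index] = name
        return index

    def remove(self, index: int) -> None:
        self.slots.pop(index, None)  # deletes whatever occupies the slot now

    def clear(self) -> None:
        self.slots.clear()


ents = EntityList()
ents.create("koth_timer")
# The plugin remembers only the raw index of the sprite it created.
sprite_index = ents.create("donator_sprite")

# Map change: everything is torn down and the next map's
# entities are loaded in a different order.
ents.clear()
ents.create("payload_track_0")
ents.create("payload_track_1")
ents.create("spawn_door")

# Later cleanup deletes "the sprite" by its stale raw index,
# which now points at a piece of the payload track.
ents.remove(sprite_index)
print(sorted(ents.slots.values()))  # ['payload_track_0', 'spawn_door']
```

Which entity gets hit depends entirely on what happens to be loaded when the stale index is used, which matches how unpredictable the symptoms were.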

Entity indexes are handed out to every entity loaded at a given time, and which entity ends up with which index is effectively random. It depends on how many entities a given map uses, how many players are connected, how many cosmetics they are wearing, how many of them are donators, and so on. It is speculated that the KOTH timer symptom appeared as late as 2017 because of an overall increase in the number of entities that might be running at a given time, which introduced more possibilities for failure.

The fix was to assign entity references to players instead of raw indexes, which is the proper way of keeping track of entities.
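The idea behind references can be sketched in Python, assuming a hypothetical model in which each slot carries a serial number that changes whenever the slot is freed (this loosely mirrors how Source entity handles pair an index with a serial number). The class and method names are illustrative only, and this is not the actual patch. In a SourceMod plugin the equivalent step is typically to store the result of EntIndexToEntRef when the sprite is created, then call EntRefToEntIndex before deleting it; that call returns -1 if the referenced entity no longer exists.

```python
class EntityList:
    """Hypothetical model of an entity list whose slots carry serial numbers.

    Loosely mirrors how Source entity handles pair an index with a serial
    number; it is not the engine's or SourceMod's actual implementation.
    """

    def __init__(self, size: int = 8) -> None:
        self.names: list = [None] * size  # slot -> entity name, or None if free
        self.serials: list = [0] * size   # bumped every time a slot is freed

    def create(self, name: str) -> int:
        index = self.names.index(None)  # lowest free slot; ValueError if full
        self.names[index] = name
        return index

    def remove(self, index: int) -> None:
        if self.names[index] is not None:
            self.names[index] = None
            self.serials[index] += 1  # old references to this slot go stale

    def to_ref(self, index: int) -> tuple:
        """Turn an index into a reference: (index, current serial)."""
        return (index, self.serials[index])

    def from_ref(self, ref: tuple) -> int:
        """Return the index if the referenced entity still exists, else -1."""
        index, serial = ref
        if self.names[index] is not None and self.serials[index] == serial:
            return index
        return -1


ents = EntityList()
ents.create("koth_timer")
# The plugin stores a reference instead of a raw index.
sprite_ref = ents.to_ref(ents.create("donator_sprite"))

# Map change: every entity is removed, then the next map loads.
for i in range(len(ents.names)):
    ents.remove(i)
ents.create("payload_track_0")
ents.create("payload_track_1")

# Cleanup resolves the reference first and skips stale ones.
index = ents.from_ref(sprite_ref)
if index != -1:
    ents.remove(index)
print(index, ents.names[:2])  # -1 ['payload_track_0', 'payload_track_1']
```

Because the slot's serial number changed when the map was torn down, the stale reference no longer resolves, so the payload track survives the cleanup.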

Difficulty in investigation

There were several problems in attempting to fix the bug over the years. First, the symptoms were never believed to be related until the day the bug was fixed. Each individual issue was assumed to be a general TF2 bug, as players would comment that they had seen it happen elsewhere, even on Valve servers. The original Payload issue was assumed to be a TF2 bug because it is listed on the Valve Wiki: "Payload carts occasionally derail on their own. Even on stock maps!", and "May cause a crash if the payload cart is pushed onto a node that isn't enabled."[1]. Crash logs also indicated that servers outside of SCG had the same KOTH timer crashes, so it was assumed to be a general issue.

Rowdy sought insight on the AlliedModders forums, the AlliedModders Discord, and the Team Fortress 2 community Discord, all of which yielded nothing. Rowdy also opened a ticket for the KOTH timer crash on Valve's GitHub, but it received no comments before the bug was fixed.

Additionally, the conditions that triggered any of the problems depended on the current map (as well as the previous map), the connected players, their loadouts, the number of donators and whether they were using sprites, the overall state of TF2 and what was occupying the entity list, and the index numbers that happened to be assigned. As a result, it was nearly impossible to reproduce the issue without knowing what to look for.

The donator plugin wasn't used on many servers other than our own, so few others reported similar issues. Only two posts were ever written about the issue, and neither was detailed enough to connect those reports to the problems we were having.

Discovery

Repeated attempts were made to recreate the issue(s) with no success. If a consistent way could be found, it would then be possible to experiment with removing plugins until the issue stopped. Trial-and-error testing, the use of bots, multiple maps, poring through the newly leaked TF2 source code, and luck eventually culminated in success.

According to Rowdy...

"It's actually a funny story even, we almost missed it entirely. I wasn't paying attention to my screen and caught it at the last moment during the waiting for players screen on pl_halfacre. With coffee in my mouth I made a silly noise and slammed down on my screenshot button"

With a way to trigger the issue finally in hand, the following discoveries were made.

  1. The issue required a specific combination of conditions.
  2. Both maps involved in the map change were a factor (the map played before the payload map, then the payload map itself).
  3. The number of players on the server was a factor, though to what degree is unknown; the issue could not be reproduced on an empty server.
  4. It was sometimes triggered by a round being won and the game then being ended abruptly.
  5. Which team won was irrelevant. (Noted because it was believed to be relevant in the case of the KOTH timer issue.)

Ultimately, the method used to recreate the bug was as follows.

  1. Playing on koth_arctic_b3 with two players.
  2. Winning one round and forcing a map change using mp_timelimit 1 and mp_match_end_at_timelimit
  3. Having the map change to pl_halfacre

Other combinations that were discovered and used for testing were as follows.

  1. koth_arctic_b3 -> pl_upward : Cart broken at ~45% mark on track, persisted through waiting for players
  2. koth_sawmill -> pl_upward : Cart broken at ~10% mark on track

Fix

The bug was fixed using entity index references[2] and patched on the servers on March 25th, 2019. It was marked as fixed on March 26th, 2019. A post was shared to the donator plugin thread on the AlliedModders forums detailing the issue and providing a patch.

References