Input lag experiment Blog
HalfwayDead here to report that I am publishing my thesis today! In case you forgot (which is totally expected with how long it's been), it's called "Effects of induced latency on performance and perception in video games" and uses the data from the Rocket League experiment. First of all, I'd like to apologize to you all for stopping the updates. I am not the fastest writer, so the updates took as much as an entire afternoon. I also felt like it wouldn't be responsible to release results that I haven't triple checked yet. It just didn't feel worth it to report on.
The thesis overall still required a much greater effort than what I had expected at the time of the last update. Thankfully, COVID rules gave me extra time. I ended up spending most of it on teaching myself advanced statistical methods, and reading 150+ papers related to the subject area. Originally I wasn't aware of the majority of the research, due to the way it is labeled and categorized. But once I started going down every cited paper, the number of relevant sources expanded quite a bit. Unfortunately, that doesn't mean that any of those sources has given a satisfactory, complete model of latency (and neither have I). I was also shocked that some of the research published in renowned conferences has grave errors. Multiple articles confused network and local latency. Although historically some games have had network handling where the network latency causes local latency, this wasn't always the case for the games in the research. Aside from the related work summary that appears in the thesis, I originally wrote a longer chronological summary of the research. If you are interested in reading it, just email me. I haven't double-checked it, but it has sources attached.
On November 8, 2021 I handed in my thesis and had to wait quite a while to hear back about the result. This week it was ready. I got a 1.3 for my effort. The German grading system allows for grades from 1.0 (best) to 4.0 (worst) in steps of 0.3. Anything not passing is considered 5.0. I am very happy about this.
The thesis is over 100 pages with all material included, so I totally understand if you are not interested in reading that much. I am absolutely planning on releasing the results in video form. However, in order to make it work for a YouTube audience, the content will obviously be adapted, and any implementation details will be left out. I also don't have an ETA when I will get that done. I will not provide a written summary (beyond the abstract).
This is actually not latency related, but I did already make a video about shot power using the data of the experiment. I forgot to make a post about that. However, I am assuming that most of the people reading this are subscribed to my YouTube anyway.
I will post the next update when the video summarizing the thesis is out. Beyond that I will likely kill the email "newsletter" as it has served its purpose. If you think it would be great to keep it running for a specific kind of update, feel free to email me and tell me what you want to see on it.
If you want to get a mail on new updates you can still send me a mail with the subject "Notify me" to email@example.com to get added to the mailing list. You can find the previous update posts below this one.
A month and a half has passed since the last update and there has been some interesting progress. Still not the final data, which I might not provide at all before I publish the thesis itself, but at least something. If you want to get a mail on new updates you can still send me a mail with the subject "Notify me" to firstname.lastname@example.org to get added to the mailing list. You can find the previous update post below this one.
Progress was slow over the Christmas days but other than that, quite a bit happened. I've been especially productive in the last 2 weeks. For a while now I've been working on a list of your monitors. If you recall, during the experiment many things were collected, like your monitor, your input device (controllers, etc.), personal info, and even graphics and performance data. Some of those things were standard survey information, but the other information serves mainly one purpose: estimating the baseline input lag of your system. The most important pieces are the delay of the monitor, input device, and the time the game needs to calculate and render everything. I can only figure out the input lag of your monitor by using data from trusted reviewers. Different measurement methods can give different results, which further restricts the usable sources.
The most common lag testing method is the Leo Bodnar Input Lag Tester. This is a great all-in-one device. It creates a video signal and measures the response with a sensor for reliable single spot measurements. Unfortunately, it can only generate a 60 Hz signal. Even though a monitor could technically have the same signal delay at 60 Hz as at 144 Hz or more, measurements often find otherwise. On top of the extra delay you get from refreshing the screen only 60 times a second, monitors often do not scan the image out immediately but wait for a few ms, which adds further delay. This doesn't happen if you use Freesync or GSync and the refresh rate lowers to 60 because of the framerate. It only happens when you select 60 Hz in the control panel.
The solution: rtings. They have created their own very similar tool which connects to a PC. A PC with a proper graphics card can then drive the monitor at any refresh rate that a user would be able to. Hardware Unboxed have had something of the same kind for a bit more than a year. So these 2 sources were my gold standard when available, and their variance for the same monitors is rather minor. For 60 and 75 Hz monitors, I allowed any reputable site using a Leo Bodnar testing tool. Other than that, I'm using tftcentral.co.uk and pcmonitors.info, who determine input lag via SMTT 2.0. Their numbers should not be compared 1 to 1 to those of rtings, etc.! So why am I doing that anyway? Because I know how both measuring technologies, pixel response times, and monitor scanouts work. That will allow me to compensate for the differences, and additionally, I'm not trying to get 1 ms accuracy.
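To put the "refreshing the screen only 60 times a second" part into numbers, here is a quick sketch. It only computes the time between two refreshes. It says nothing about the extra few ms that some monitors wait at 60 Hz, and the function name is just made up for illustration.

```python
def refresh_interval_ms(refresh_rate_hz):
    """Time between two screen refreshes in milliseconds."""
    return 1000 / refresh_rate_hz


for hz in (60, 144, 240):
    print(f"{hz} Hz -> {refresh_interval_ms(hz):.1f} ms per refresh")
# 60 Hz -> 16.7 ms per refresh
# 144 Hz -> 6.9 ms per refresh
# 240 Hz -> 4.2 ms per refresh
```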
To get back to the main point of the progress. At the end of the day, looking this up is manual work, and with 763 submissions, how much work could that possibly be... ... ....... Turns out you guys use 449 unique monitors. That was more than I expected, but worse was the problem of identifying the true model based on the information collected. I collected the monitor you chose from a list, or you could submit your own name. There was additional info collected which is where the name in the list came from. There is something called a DeviceID e.g. BNQ7F51, which will tell me that you're using a BenQ ZOWIE XL 2540 on DisplayPort. So much for the theory. In reality, many manufacturers don't seem to care too much about making these unique or anything. I even occasionally found monitors of completely different sizes using the same ID. There is also no database that will just give you the names of the monitors by ID. The name field can be equally useless. The Asus VG279 could be a 144 Hz, 165 Hz, or 280 Hz monitor. Those models will obviously be significantly different, but the name is just a few letters different, and those letters are missing in the name that I could read out from the monitor's info. Regardless, I was able to identify the majority of monitors or narrow them down to seemingly identical versions, which is still useful.
At the end of the day, here is where I stand: I have 361 datapoints with a monitor of which I know the input lag. That is only half, but still a few hundred. More importantly, the vast majority of you are using 120 Hz+ gaming monitors. Only 1 of the tested monitors in that category has above 10 ms of input lag (and it's still less than 11). The plan is to take the results of the people using monitors for which I have good data (and low input lag) and compare them to the results of the people using the untested gaming monitors. If the 2 groups have no statistical difference, then it can be assumed that including the group of people using untested gaming monitors doesn't skew results.
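I haven't settled on the exact test for this group comparison yet. Purely as an illustration, one simple option would be a permutation test on the difference of the mean scores. The function name and the scores below are made up and are not real experiment data.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value: how often a random relabeling of the
    pooled scores gives a difference at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The +1 counts the observed labeling itself and avoids a p-value of 0.
    return (hits + 1) / (n_permutations + 1)


# Made-up scores, NOT real data from the experiment.
tested_monitors = [610, 580, 700, 655, 540, 620, 690, 575]
untested_monitors = [600, 590, 710, 640, 560, 615, 670, 585]
print(permutation_test(tested_monitors, untested_monitors))
```

Keep in mind that a large p-value here only means no difference was detected. It is not strict proof that the two groups are identical.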
Ok, enough about monitors for now. Let's do the same with controllers. 90% of you used a controller and only 10% keyboard/mouse. I don't have the full statistics regarding the exact input devices right now, because that data is only in a raw state right now, but it should be over 3/4 DS4 and different Xbox One controllers, nearly all of which I have tested. There were still 152 unique controller entries, but it turns out that the majority of the manually entered ones were DS4s that didn't show in the HID devices. I hope that most people who do not have an original Xbox One controller, entered that fact and didn't just select the Xbox option, because it is possible for a manufacturer to imitate it 100% on a driver level. Thus, I have no way of knowing that for certain. Other different versions were just due to different connection methods, which shows a different product id. If you call those different controllers, I apparently own 29 different controllers :D.
Now for the juicy part. I have made some progress on the processing, and thankfully, when I tell the computer to add up 160 shots each for 763 people, it takes less than a second. First of all though, out of 763 submissions, 16 people submitted twice, nobody more than twice. The second submission of those people will obviously be removed which leaves 747. Someone fiddled with the graphics settings despite being told not to, so that submission will have to be removed. I haven't run all the tiny checks yet, but I'll continue with that, after I remove people for input lag reasons.
This time around I focused on calculating the scores for everyone. I only collected raw data, so the processing will all be done on my end and not on the users' end, which would make it easier to cheat. So with that done, I can now show a highscore list:
Most of you will have forgotten your score by now. If you haven't reinstalled BakkesMod since then, it should be at the top of the results.json player_data.json file that you can find in the bakkesmod/data/inputlagexperiment/results folder. The average score was 605, the standard deviation was 122. The average rating of participants was 1350, aka Champ 2. Not sure if any of them are reading this, but shoutout to the 4 people with negative score who still finished the experiment. If you didn't touch the ball at all, it reset and subtracted points, so those people had to have played longer than anyone else. I am aware that the experiment was targeted towards very experienced players, so I appreciate it all the more when people are pushing through to give me data on the lower end.
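Removing the duplicate submissions is the simple part. A minimal sketch, assuming the submissions are sorted by time and each one carries some player identifier (the "player_id" field is a made-up name, not the real data format):

```python
def keep_first_submission(submissions):
    """Keep only the first submission of each player.

    `submissions` is a list of dicts ordered by submission time. The
    'player_id' key is a hypothetical field name used for illustration.
    """
    seen = set()
    kept = []
    for submission in submissions:
        player = submission["player_id"]
        if player in seen:
            continue  # second submission of the same player, drop it
        seen.add(player)
        kept.append(submission)
    return kept


# Synthetic example mirroring the counts above: 747 players, 16 submitted twice.
first_round = [{"player_id": i, "attempt": 1} for i in range(747)]
second_round = [{"player_id": i, "attempt": 2} for i in range(16)]
all_submissions = first_round + second_round
print(len(all_submissions), "->", len(keep_first_submission(all_submissions)))
# 763 -> 747
```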
I think that's all the data I will show for now. On the next update I will probably dive into some fun things like ranks vs hours, ranks vs score etc. Of course rank vs score will be tainted by the fact that players had input lag. It's why I'm calling them "fun things". They wouldn't be findings that should be taken as scientific fact, but something worth researching and verifying further. As I already said, it would be inappropriate to share any of the scientific experiment results before I have triple checked them and am certain about their validity.
It's been quite some time since I made the original video about my input lag experiment. I wanted to give out an update on progress and give some extra information that I was deliberately withholding before the experiment was finished. I'll also talk about the methodology I'll be using to analyze the dataset. If you want to get a mail on new updates you can still send me a mail with the subject "Notify me" to email@example.com to get added to the mailing list.
Overall, the experiment has been a success. There were over 700 submitted results which is more than what you'd reasonably be able to get in any laboratory setting. Thank you for your participation! Unfortunately, there isn't too much I can report on the progress. It's been going pretty slow, mostly due to personal reasons, but I also don't want to proclaim anything too early. As such, I cannot even tell you so far, how many of the submitted results are usable data points. More on what that entails is in the section about the methodology. Without that part being done, I have of course not calculated any of the interesting tests on the results.
While the experiment was ongoing, I was asked many times about how much input lag I added, as well as whether the user could be given a score on how well they guessed input lag rather than just how well they did. I kept those values a secret, as I didn't want users to be influenced in any way. There are those that claim they can sense 1 ms of input lag, so if they knew how much I added, they might end up just choosing only 0 or 6 (on the 0-6 scoring scale). There are those that claim anything below 100 ms is irrelevant and might only select 0 because of what they know. And lastly, if some people knew how poorly they're doing and they were telling others, then those might conclude that the amount is unnoticeable anyway and therefore end up choosing something random/not trying.
So how much did I add? I added 0, 1, 2, 4, or 6 physics ticks of input lag. That is (ms) 0, 8.3, 16.7, 33.3, 50. If you recall, the way the experiment was structured, there were 3 different graphics settings in random order: your own, minimum, maximum. Each had 50 shots. One scenario equated to the 5 shots of the training pack done in random order with the same input lag. Then you were asked what you thought the input lag was. So that's 10 different scenarios and there were 5 different input lag amounts. Each showed up exactly twice for everyone, but the order is completely random, and getting the same input lag twice in a row is very possible.
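As a rough sketch (this is not the actual plugin code), the tick conversion and the randomization described above could look like this. The 120 Hz physics rate follows from 1 tick being 8.3 ms. All names are made up for illustration, and I'm treating "each showed up exactly twice" as applying within each graphics setting.

```python
import random

PHYSICS_TICK_HZ = 120  # 1 tick = 8.3 ms, as implied by the numbers above
LAG_TICKS = [0, 1, 2, 4, 6]
GRAPHICS_SETTINGS = ["own", "minimum", "maximum"]
SHOTS_PER_SCENARIO = 5


def ticks_to_ms(ticks):
    """Convert a number of physics ticks to milliseconds."""
    return ticks * 1000 / PHYSICS_TICK_HZ


def build_schedule(rng):
    """Return a list of (graphics, lag_ticks, shot_order) for one participant."""
    graphics_order = GRAPHICS_SETTINGS[:]
    rng.shuffle(graphics_order)  # graphics settings in random order
    schedule = []
    for graphics in graphics_order:
        lags = LAG_TICKS * 2  # every lag amount shows up exactly twice
        rng.shuffle(lags)  # same lag twice in a row is possible
        for lag in lags:
            shots = list(range(1, SHOTS_PER_SCENARIO + 1))
            rng.shuffle(shots)  # the 5 training pack shots in random order
            schedule.append((graphics, lag, shots))
    return schedule


schedule = build_schedule(random.Random(42))
print([round(ticks_to_ms(t), 1) for t in LAG_TICKS])
# [0.0, 8.3, 16.7, 33.3, 50.0]
print(len(schedule), "scenarios,", sum(len(s[2]) for s in schedule), "shots")
# 30 scenarios, 150 shots
```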
The idea is, of course, that all of these random orders balance each other out over the large dataset, so it's not biased towards any graphics setting, input lag scenario, or shot. An individual's experiment does not have this measure of safety. It's another reason I shied away from giving any individual statistics beside a total score.
There were a couple of suggestions and criticisms made about the experiment. The most common was with regard to the test not being optimal to notice input lag. I was very well aware of that, as I discussed this with my supervisor. Unfortunately, there is no such thing as a perfect experiment and the primary goal of the experiment is not whether or not you can notice it. The goal is to see how performance degrades and whether different levels of visual effects (graphics) change how input lag affects the player. The experiment was chosen in a way that the player has a clear goal to execute and the extent to which they're able to do it has to be easily measurable. This is not really possible with dribbles or something of the kind.
Other points were regarding the length. Too little warmup, too few shots per input lag scenario, etc. Again, unfortunately it's not really possible to design the perfect experiment. The longer I make it, the fewer people are going to participate, which is also problematic in many ways. The randomization is supposed to take care of the learning that goes on over the course of the experiment. Although a single person's performance will change over time, each scenario has the same chance of being affected by that. This does of course not get rid of the problem of constantly changing input lag. A player is likely going to perform better at a constant input lag that they have time to get used to. To set up an experiment that takes this into account, however, is almost impossible. I'd argue that if your input lag changes, it could take days of readjusting your muscle memory until you're at the peak performance that you could be at that input lag. Although I would love to see such a long term study with players subjecting themselves to different input lag over time, I doubt it will ever happen. This experiment just focuses on short term effects, and I will acknowledge that players can likely perform better if they have time to adjust to input lag.
Explaining the analysis
This will contain quite a few terms that you won't know if you don't have a scientific background, so you might have to do some googling. I don't think that I would do a particularly good job at explaining those anyway, so I'll leave it like that.
There are a couple of things that have to be taken into account for the analysis. Because we are not in a laboratory setting, we cannot control for every variable and thus the baseline input lag that everyone has on their system is not going to be the same. Furthermore, we don't exactly know what each player's input lag is, as we can't measure it for everyone. The baseline input lag is still very important. As you can imagine, if I play with 1 second of lag and add a further 50 ms, it will not decrease my likely terrible performance much further. On a system with an incredibly low input lag of 5 ms, an extra 50 ms will make a huge difference.
So we have to estimate baseline input lag in some form. For that, I've collected multiple performance metrics of the game while the experiment is going. These should, with my additional testing, allow me to very accurately determine the amount of in-game/in-engine lag there is. That's only one aspect though. Monitors and input devices are a large part of input lag too. That's why I tracked those too. Through my own testing and a couple of other high quality sources, I hope to get an accurate estimate for many of the used devices. There will be a large number of results that have to be thrown out due to having no way to reliably estimate the input lag.
With the input lag out of the way, let's talk about the things that are going to be evaluated. First the players will be split into groups of different skill levels. How large of a range one skill group is will depend on the final count of usable results at each skill level. Then we'll run a mixed design ANOVA on the three factors that we want to check for "impact" on the player: Latency, graphics, FOV. FOV here refers to the real-life field of view of the player. This depends on how large the monitor is and how far away from it you sit. Since it's constant for a player, this is our between-subject variable. Graphics and input lag are within-subject variables since every subject plays with every setting. The "impact" on the player will be measured by 4 things: The score (which is in large part accuracy with a bonus for power), just accuracy, just power, and the subjective input lag perception. A significance level of 0.05 will be used.
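To give an idea of what "real-life field of view" means in numbers, here is a minimal geometric sketch: the horizontal angle the screen covers from where you sit. It assumes a flat monitor viewed head-on from the center, and a 16:9 aspect ratio unless stated otherwise. The function names and the 24 inch / 60 cm example are made up for illustration and are not necessarily the exact definition used in the analysis.

```python
import math


def screen_width_cm(diagonal_inch, aspect_w=16, aspect_h=9):
    """Physical screen width in cm from the diagonal and the aspect ratio."""
    diagonal_cm = diagonal_inch * 2.54
    return diagonal_cm * aspect_w / math.hypot(aspect_w, aspect_h)


def horizontal_fov_deg(width_cm, distance_cm):
    """Horizontal angle (degrees) the screen covers for a centered viewer."""
    return math.degrees(2 * math.atan(width_cm / (2 * distance_cm)))


# Hypothetical setup: 24 inch 16:9 monitor, viewed from 60 cm away.
width = screen_width_cm(24)
print(f"{width:.1f} cm wide, {horizontal_fov_deg(width, 60):.1f} degrees")
```

A bigger monitor or a shorter viewing distance both increase the angle, which is why two players with the same monitor can still end up with a different FOV.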
There is a chance that I will run further analysis beyond my thesis. However, there is obviously a good scientific reason for why anything beyond should be taken with 2 grains of salt. When you throw a bunch of tests at a dataset, you're bound to find something that isn't actually there and is just random variance. It's also not a good idea to try and extract data that is clearly not about what the experiment was meant to do. So anything that I do check should not be treated as scientific proof but as an indicator of what a proper experiment in that direction might return.
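The "bunch of tests" problem is easy to put into numbers. A small sketch, under the simplifying assumption that the tests are independent of each other (which they usually aren't on a single dataset, so treat this as a rough guide): the chance of getting at least one false positive at a significance level of 0.05 grows quickly with the number of tests.

```python
ALPHA = 0.05  # significance level used for the analysis


def family_wise_error_rate(n_tests, alpha=ALPHA):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n in (1, 5, 20):
    print(n, "tests ->", round(family_wise_error_rate(n), 2))
# 1 tests -> 0.05
# 5 tests -> 0.23
# 20 tests -> 0.64
```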