This was definitely one of those projects that seems straightforward in theory, but every step had me fumbling around in the dark trying to get things working.
The idea: 1) detect objects on screen, 2) plan movement/action, and 3) send commands to Logitech receiver. Easy, right?
This video intentionally doesn't go into too much technical detail - not sure if that's something people want or not. I tried to present enough so that you can at least understand what this bot can and can't do, and also understand some of the problems it's having.
I used off-the-shelf CV models to do the object detection. Specifically, Faster R-CNN, which I later abandoned in favor of YOLOv5.
The advantage of this is that most of the work is done for you, and you can then do "transfer learning" to adapt the model to custom objects. I think the theory behind this is that many of the layers in the NN learn to handle feature extraction and shape recognition, and transfer learning reuses that foundation.
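If you're curious what that looks like in code, here's a toy PyTorch sketch of the freeze-the-backbone idea. This is a stand-in model, not YOLOv5's actual architecture, and the layer names are made up for illustration:

```python
import torch.nn as nn

# Pretend this small stack came pretrained on a big dataset
backbone = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
)
# New head for our custom classes (e.g. enemy / teammate / head / body)
head = nn.Linear(32, 4)

# Freeze the backbone: reuse its learned feature extraction as-is
for p in backbone.parameters():
    p.requires_grad = False

# Only the new head's parameters get updated during fine-tuning
trainable = [p for p in head.parameters() if p.requires_grad]
```

Because only the small head is being trained, a couple of hours on a consumer GPU can be enough, which matches my experience above.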
Anyway, it worked really well considering I only trained the model for about 2 hours on either my GTX 1070 or my laptop's mobile RTX 2070 (don't remember which, to be honest). If this is something you want to try yourself, YOLOv5 comes with a tutorial on custom training. Don't have a good GPU? I realize that as of May 2021, Nvidia GPUs are still in short supply, but Google Colab gives you access to a powerful GPU for free (at least for now). Plus, people often share their ML tutorials in the form of Google Colab notebooks, which saves users from worrying about setting up their software environment.
But Python is still the PITA it was yesterday. I wrote a bunch more about my Python takeaways from this project if you want to read it.
Like I mentioned in the video, it's not just Logitech wireless devices that can be vulnerable to eavesdropping and hijacking. The reason I used Logitech is that there is a lot of Logitech hardware out there, the protocol is relatively well understood, and it's easy* to talk to them using the CrazyRadio PA, which uses the same type of Nordic chip (nRF24 series) that speaks their "Enhanced ShockBurst" ("ESB") protocol.
To be clear, even the latest Logitech wireless mice do not use encryption, making it trivial to eavesdrop and hijack. Their logic is that mouse movements don't give away sensitive data like passwords, so it's not a big deal. The security hole (now patched) that I used is that Unifying Receiver dongles will accept unencrypted keyboard keystrokes even if no keyboard is paired. Using this, you would even be able to send Ctrl-Alt-Del to a victim's computer, avoiding the issue of Ctrl-Alt-Del being a privileged shortcut.
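For the curious, the unencrypted-keystroke trick boils down to crafting a valid-looking ESB payload. Here's a minimal sketch of the Logitech checksum; the byte layout follows what's described in published MouseJack-style research, so treat the exact byte positions as illustrative rather than authoritative:

```python
def with_checksum(payload):
    # Logitech's frame checksum: the final byte is chosen so that the
    # sum of all bytes in the frame is 0 mod 256
    cksum = (-sum(payload)) & 0xFF
    return bytes(payload + [cksum])

# Illustrative unencrypted keystroke frame: 0xC1 marks an unencrypted
# keyboard report, followed by a HID modifier byte and a HID keycode
KEY_A, MOD_NONE = 0x04, 0x00
frame = with_checksum([0x00, 0xC1, MOD_NONE, KEY_A,
                       0x00, 0x00, 0x00, 0x00, 0x00])
```

A vulnerable dongle happily passes a frame like this to the OS as a real keypress, paired keyboard or not.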
If you want security, go wired.
* not actually easy
I think the video capture card added somewhere around 20-70ms of latency. I didn't put much effort into choosing a device - I got some generic-ish USB 3 device. (I actually bought mine off eBay. Reduce e-waste! Buy used!)
Theoretically, there should be some kind of PCI-Express capture card that can DMA frames with very low latency so I might explore that in the future.
Latency makes everything worse, especially shooting moving targets, because the actual target location has moved by the time you process the frame. Even worse, moving your mouse towards the target also changes where the target is in relation to the center of the screen, and it can get very tricky to compensate for that.
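A crude way to compensate is to extrapolate the target's motion over the known pipeline latency. A minimal sketch, assuming constant velocity (the function name and units here are mine, not from the actual bot):

```python
def lead_target(target_px, velocity_px_s, latency_s):
    """Predict where the target will be by the time our frame-old
    observation is acted on: naive constant-velocity extrapolation."""
    x, y = target_px
    vx, vy = velocity_px_s
    return (x + vx * latency_s, y + vy * latency_s)

# Target seen at (100, 50), moving right at 400 px/s, 50 ms of latency:
aim_point = lead_target((100, 50), (400, 0), 0.05)
print(aim_point)  # (120.0, 50.0)
```

The hard part is that this doesn't account for your own mouse movement shifting the target on screen during those same 50 ms, which is exactly the feedback loop that makes compensation tricky.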
Faster RCNN: https://github.com/ShaoqingRen/faster_rcnn
How to control games with OpenCV: https://learncodebygaming.com/blog/tutorial/opencv-object-detection-in-games
Reddit: https://www.reddit.com/r/VALORANT/comments/nmb9c7/i_tried_teaching_a_computer_to_play_valorant/ and https://www.reddit.com/r/programming/comments/nmavks/i_tried_making_a_valorant_ai_using_pytorch_opencv/
This was actually me covering up some technical difficulties... I couldn't actually get the USB radio working on Windows, because Windows doesn't seem to let me directly talk to a USB device if it doesn't have a working driver, which isn't a problem on Linux. So for the footage showing computer control, the USB radio was actually plugged into another computer (off screen) running Linux which takes commands over UDP and then blasts them out with the radio. The added latency of having the separate computer is less than 2ms.
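The forwarding itself is simple. Here's a loopback sketch of the idea; the two-int16 wire format is a hypothetical stand-in for illustration, not necessarily what the bot sends:

```python
import socket
import struct

def send_mouse_delta(sock, addr, dx, dy):
    # Hypothetical wire format: two little-endian signed 16-bit deltas
    sock.sendto(struct.pack("<hh", dx, dy), addr)

# Receiver standing in for the off-screen Linux box with the radio
rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
rx.bind(("127.0.0.1", 0))

# Sender standing in for the Windows machine running the bot
tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
send_mouse_delta(tx, rx.getsockname(), -12, 7)

dx, dy = struct.unpack("<hh", rx.recvfrom(4)[0])
print(dx, dy)  # -12 7
```

On the real Linux box, the receive loop would hand each delta to the radio instead of printing it; over a LAN, the round trip stays well under the ~2ms I measured.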
At 13:13, I talk about how "the AI thought it was looking downwards when it wasn't" but the AI actually thought it was looking upwards when it wasn't. The φ indicator on the right side shows its believed up/down pitch in degrees, where negative is up. Yet another case of getting positive and negative confused.