Teaching a Computer to Play Mario Bros… And The Security Implications

Guy teaches his computer to play Super Mario Bros… neat!

He does his teaching via a method that’s a little unusual. The program looks at some “success metrics” (things like score and level count) and then just mashes buttons until it figures out what works best.
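To make the "mash buttons and keep what scores well" idea concrete, here is a minimal sketch in Python. It is not Tom Murphy's actual program (his whitepaper describes something far more involved). Everything in it is an assumption made up for illustration: the toy one-dimensional level, the `PITS` and `GOAL` values, and the `play` and `mash_buttons` functions. The search is plain random hill climbing: change one button press, replay, and keep the change if the score did not get worse.

```python
import random

# Hypothetical toy "game", invented for this sketch (not Super Mario Bros).
BUTTONS = ["left", "right", "jump"]
PITS = {4, 9, 13}   # landing on one of these cells ends the run
GOAL = 16           # reaching this cell ends the run with the top score


def play(inputs):
    """Replay a button sequence and return the score (furthest safe cell reached)."""
    pos = 0
    best = 0
    for button in inputs:
        if button == "right":
            pos += 1
        elif button == "left":
            pos = max(0, pos - 1)
        elif button == "jump":
            pos += 2            # a jump skips over the next cell
        if pos in PITS:
            break               # fell into a pit: run over
        best = max(best, pos)
        if pos >= GOAL:
            break               # level cleared
    return min(best, GOAL)


def mash_buttons(length=12, rounds=2000, seed=0):
    """Random hill climbing over button sequences, guided only by the score."""
    rng = random.Random(seed)
    best_inputs = [rng.choice(BUTTONS) for _ in range(length)]
    best_score = play(best_inputs)
    for _ in range(rounds):
        candidate = list(best_inputs)
        candidate[rng.randrange(length)] = rng.choice(BUTTONS)  # mutate one press
        score = play(candidate)
        if score >= best_score:  # accept ties so the search can drift sideways
            best_inputs, best_score = candidate, score
    return best_inputs, best_score


if __name__ == "__main__":
    inputs, score = mash_buttons()
    print(score, inputs)
```

Note that the search never looks at where the pits are. It only sees the score, which is exactly why this kind of approach will happily use any quirk of the game that makes the number go up.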

In fact, this method is very reminiscent of the “FPGAs learning to exploit quantum effects” article I linked to a while back… and the results are eerily similar.

Like the FPGAs, the Super Mario Bros algorithm isn’t bad at playing the game. Unlike a human player, though, the algorithm starts discovering bugs in the game, and exploiting them. Kind of like how the FPGAs started exploiting bizarre on-chip phenomena that no human designer could devise.

Security implications: It’s not hard to imagine [redacted] building an algorithm on this principle to [redacted] [redacted] [use your imagination].

Prediction: if AIs get prevalent in the “real world” — I’m looking at you, Big Data — either we humans learn to exploit the “bugs” in the real world better than they do, or we die :/

http://hackaday.com/2013/04/14/teaching-a-computer-to-play-mario-seemingly-through-voodoo/
http://www.cs.cmu.edu/~tom7/mario/

“Some people know [Tom Murphy] as [Dr. Tom Murphy VII Ph.D.] and this hack makes it obvious that he earned those accolades. He decided to see if he could teach a computer to win at Super Mario Bros. But he went about it in a way that we’d bet is different than what 99.9% of readers would first think of. The game doesn’t care about Mario, power-ups, or really even about enemies. It’s simply looking at the metrics which indicate you’re doing well at the game, namely score and world/level.

The link above includes his whitepaper, but we think you’ll want to watch the 16-minute video (after the break) before trying to tackle that. In the clip he explains the process in layman’s terms which so far is the only part we really understand (hence the reference to voodoo in the title). His program uses heuristics to assemble a set of evolving controller inputs to drive the scores ever higher. In other words, instead of following in the footsteps of Minesweeper solvers or Bejeweled Blitz bots which play as a human would by observing the game space, his software plays the game over and over, learning what combinations of controller inputs result in success and which do not.”
