
有关遗传编程的介绍及其作用

发布时间:2013-11-13 13:48:14

作者:Petter Hansson

几年前,我把自己中学时就会玩的一款纸牌游戏做成了手机应用。我为它编写了AI,但并不满意其技术水平。由于不确定是否值得继续手动编码去寻找更好的方案,我研究了各种AI算法,最终选定了遗传编程(GP)。长话短说:它的效果好到让AI玩得比我自己还好。GP在概念上很容易理解,在基础层面上实现起来也相当简单,而这正是本文的主题。需要提醒读者的是,本文并不会教你如何编程。

本文分为两部分:一部分介绍GP,另一部分介绍示例项目。

介绍GP

GP以“遗传”开头,是因为它受到自然界中名为进化的过程的启发。它是遗传算法(GA)的一个子类,处理以树形图表示的“程序”。通过运行一个遗传程序,你可以执行计算、做出决策,或者更一般地执行某个事件序列。下面是一个计算A + B * 7的遗传程序示例。

(图片来源:gamedev)

下面是另一个遗传程序,它执行《星际争霸2》的“8池”(8 pool)开局建造顺序(免责声明:这是我根据某个SC2网站上的描述自己理解出来的)。注意它仍然是一个树结构,尽管画法并不标准。除了决策节点之外,对这棵树的求值应该是直截了当的;决策节点的条件可能较难实现,比如“高破坏潜能”就是一个模糊的概念。

(图片来源:gamedev)

“遗传”二字源于这些树/程序可以被当作基因组,以多种方式“突变”(改变)。让由不同基因组/程序组成的种群逐代突变,同时施加适应度/选择标准,我们就能影响遗传程序擅长解决什么任务。让那些最能解决任务的个体程序对下一代产生最大的遗传影响:这就是进化!

下图展示了一个(小得不切实际的)由4个个体组成的种群,通过随机颜色变化和节点的添加/删除进化了5代,其任务是拥有尽可能多的蓝色节点。当然,这是一个人为设计的例子,因为我们本来就知道如何造出一棵巨大的蓝色之树。

(图片来源:gamedev)

注意最好的树要么被克隆,要么经受突变,而突变产生的后代不一定适应度更高。克隆确保优秀的树能以不变的形式存活下来,但会拖慢突变(在巨大的种群中克隆并非必要)。同样要注意,选择规则不一定100%公平;这无关紧要。

最后,让我说明GP与一般遗传算法的区别。总体而言,GA是关于通过进化来优化参数,这是一种随机搜索。这些参数如何组合并不是GA关心的事,GP这样的特例除外。而GP提供了一种方法,不仅能进化参数值,还能进化这些参数的组合方式,因此是一种更强大的学习手段。即使你不知道某个计算的表达式,GP也能替你学出该表达式,而不只是优化一组输入参数!

关于GP树可传达什么的例子

算术树

我猜最简单也最常见的GP用法,是让程序表示算术表达式,最常见的是关于“实”数的表达式(好吧,先不管计算机浮点精度)。前面我们已经看过这样的例子,但这里想强调的是,为某种目的进化出一个数学函数是一种非常强大的能力。下面是为我的游戏中“中等”难度AI进化出的Lua评估函数:

(((((env.rounds)+((env.suit)*(env.rank)))+(env.safe))+((env.block)*(env.block)))*(((-4.0745883)+(0.025403308))+((((env.selfblock)*(env.dist))+((env.selfblock)*(env.safe)))+(math.random()))))

注意其中有终端节点(叶子),如env.rounds、math.random()和常量,还有二元非终结符+和*。算术树中的非终结符通常不需要任何额外参数,它们只是把输入当作分支接收。在线性文本文件里隔着一堆括号阅读这些实在让人困惑,所以我们把这棵树画出来:

(图片来源:gamedev)

进化和模拟运行期间,内存中实际存储的就是这种树形;线性代码只是为我的目标游戏优化的一种导出形式。另一个有趣之处是常量计算 -4.0745883 + 0.025403308,这类冗余和过度生长(作用可疑的巨大子树)可以用适应度函数加以控制,后文会谈到。

执行树

另一个选择是让遗传程序表示更传统意义上的计算机程序,即控制流图。当然,与普通计算机程序相比,一个主要限制是这个图必须是一棵树。这可能是问题,也可能不是,而且可以通过创造性的节点类型来弥补循环的缺失。

前面的《星际争霸2》建造顺序就是执行树的典型例子;该例中的执行从根到叶进行,中间穿插决策。节点可能是决策(需要附加参数),也可能只是执行一条结果确定的指令,如“建造工人”。有些指令节点可能根本没有孩子,因此实际上也能充当叶子。

还有一个不那么通用但值得一提的选择:你可以颠倒流向的定义,沿着可预测的线性路径,从某个起始叶子一路执行到根。这样,执行过程可以看作从具体前提走向更泛化的结论,不同的具体起始条件共享同一个结果。

决策树

决策树是从根到叶求值的树形图:沿着从根出发的决策路径,最终到达代表结果/分类的叶子。非终结符中没有任何非标准的执行,所有非终结符都是纯粹的决策。注意决策树已被广泛研究,如果你真的需要构建一棵决策树,很可能有比GP更好的算法。不过这种可能性仍值得一提。

注意在决策树的流程图记法中,非终结符要用多个图形元素来表示:

(图片来源:gamedev)

与执行树一样,决策树中的非终结符比算术树中的更复杂,因为需要附加像条件或概率这样的额外参数。为了更高效地学习,这些参数可能也需要被直接突变,而不只是在节点创建时随机初始化。

遗传操作

尽管到目前为止只提到了克隆和突变,实际上有好几种基本的遗传操作可以作用于GP树。

克隆

最容易理解的操作是克隆,即返回父代/源树的一份完全相同的副本。我会避免使用“父代”一词,因为它与树形图的术语相冲突,在下文中改称源树。同样我也会避免使用“孩子”一词(尽管这些术语出现在源代码中)。下面是对一棵源树进行克隆操作的例子:

(图片来源:gamedev)

突变

突变是对单棵源树做出某种改变。在GP树中,可以把源树某个分支上的子树替换成一棵随机树。这实际上涵盖了你能想到的大多数树变更情况:取决于生成的随机树的最大深度,它既能让树生长,也能让树收缩。

(图片来源:gamedev)

随机子树的最大深度取2似乎比较合适(最多允许一个非终结符)。如果取1,树永远无法通过突变生长;如果取更大的值,你的树会迅速膨胀,塞满各种随机垃圾,恕我直言。

对于非终结符需要参数的情况,你也许会想加入代码,在不替换子树的前提下突变这些参数。同样,如果算术树中有常量终端节点,你也会想以可控的方式突变它们。想想看,更好的解往往出现在已有的好解附近;像子树替换那样把常量整个扔掉、换成一个完全随机的值,不太可能特别成功。

(图片来源:gamedev)

上面是我使用的常量更新规则。注意幅值按指数方式改变,因子在0.5到2之间,而符号永远不会改变。在像A * B这样的表达式中,改变其中一个操作数的符号会产生相当剧烈的影响:结果的符号也会跟着改变。要改变符号,我转而依靠前面介绍的节点替换。

交叉

你是否好奇过,为什么自己有两个亲生父母而不是一个?现在你就会知道答案!

突变的局限在于,随机数据不太可能做出什么有用的事。与此同时,你的种群(但愿)已经包含了多种多样有用的遗传程序。这里的关键洞察是:遗传树的子树也许正是该任务的有用子解。因此,与其把分支替换成随机数据,不如把它替换成一棵被证明有用的源树的子树。

这就是所谓的交叉,意味着把两棵源树的特征结合起来。结果并不总是成功,但总体上产生有趣结果的机会相当大。交叉会加速进化,因为不同的成功子解可以在不同谱系之间共享,而无需各自通过突变独立发展出来。

我没有为此配图,因为交叉看起来和突变很像,只是供体树不是随机生成的,而是从某个个体中随机选出的子树。

适应度与限制过度生长

整篇文章我们都假设进化背后存在一股推力,引导遗传程序朝着解决某个特定任务的方向进化。这股推力就是针对该任务的适应度。

如何定义适应度取决于具体领域,但对GP而言,一个好的起点是把它定义为一个可用于给种群排序的函数。个体的适应度函数应该基于它完成任务的效率(无论任务是获胜还是生存等等)。举例来说,在有胜/负结果的游戏中,你可以用 N_wins / (N_wins + N_losses) 来定义适应度。

适应度函数还有另一个重要作用:它可以用来防止过度生长。前面简单提过过度生长,它在某些GP设置中对进化效率危害极大。想想表达式 0 * (abs(sin(x)) ^ 5.88 + 72.21 - 28.01 * min(-827.17, x)) 实际上有多大用处。在实践中,当进化试图摆脱无用子树时,你看到的往往不是精确的0,而是常量进化成一些非常小的值。其最终后果可能是无用子树在整个种群中无限制地生长,把进化拖慢到蜗牛的速度,甚至更糟。

当然,问题在于游戏模拟中的AI代码里的无用算术不会受到任何惩罚。但与其把惩罚加进模拟(那会破坏游戏规则),不如把惩罚加到适应度函数里。一个好的起点(虽然粗糙)是根据GP树的大小来设置惩罚。举例来说,下面是我在《Sevens》中使用的适应度函数:

float getFitness() {
    // 两棵树的总节点数;超过20的部分按每节点0.0004扣分
    int size = play.getTreeSize() + send.getTreeSize();
    int factor = Math.max(0, size - 20);
    return getWinRatio() - factor * 0.0004f;
}

与算术树相比,过度生长对执行树来说可能问题没那么大,因为执行指令的模拟成本会直接影响适应度;就《星际争霸2》而言,建造一个工人并非零成本。另一方面,决策节点可能始终留有一个未被访问的分支,这时该分支上的子树就成了内存上的死重。

进化

还有一个重要细节要讨论:每一代中我们如何选择对个体施加哪种遗传操作。首先,下面是我那相当粗糙的实现在每一代(大致)的做法:

New population := {}, population = (already initialized)
Run simulation on population
Sort population according to fitness in descending order
For each genetic operation O:
N := split_O * N_population
For i = 1, N:
Source := population[i]
Emit individual in new population using O(Source)
Population := new population

各个split_O值加起来应为1,以保证种群规模在代际间不变。这些值让你能分配每个种群中应交给每种遗传操作的比例。实践中,我使用10%克隆、45%常量突变、22.5%树突变和22.5%交叉。怎样的分配适合你,需要自己实验。

为什么不进行GP

GP很棒,因为它(举例来说)真的能独立学会整个函数,对吧?是的,但只在特定条件下成立。以用算术树做AI代理评估函数为例:首先,你需要明确函数输出的含义,以及它们适用于什么情境。其次,你需要用终端节点准确地描述环境状态,代理才能做出有根据的决策。再次,你需要考虑非终结符节点的集合。把你能想到的所有实数二元函数统统包含进来未必有用;使用一个有限的非终结符算子集合,进化很可能更快也更稳定。

遗传程序与一般GA有同样的弱点:要产生任何学习/优化效果,必须计算大量世代、每代包含大量个体。这基本上意味着GP太慢,无法在线调整基因组。你需要能够通过批处理预先(离线)执行海量试验,而这要求拥有对世界的模拟。因此,GP更适合游戏(前提是它们有可批处理的模拟核心),而不是机器人学等领域,后者更难对代理所处的环境进行模拟。

GP和GA都是随机搜索,会被更适合问题空间的搜索算法全面超越,应当被视为最后的手段,或“多花点时间也无所谓”的选项。AI中没有银弹;在选定方法之前务必充分调研,否则只会浪费时间。

实例项目

本文附带的代码,是我在开发应用的同时用Java/Eclipse编写的命令行模拟与进化工具。我原始应用的Lua源码反正也不会有人想看。

游戏

在《Sevens》这款游戏中,代理需要做的决策性质非常基本:在两种不同情境下(正常出牌时,或向缺少可出之牌的玩家“传”一张牌时)选出最佳的牌。因此,我们可以为某个行动(一张可出的牌)计算一个(效用)值,然后选择数值最高的行动。这是为该游戏的决策结构建模的一种合理方式,而其简单性意味着我们可以用GP进化出基本的算术函数,根据游戏状态为特定行动和情境计算数值。

(图片来源:gamedev)

《Sevens》是一款高度随机的游戏,结局往往由发到的牌决定,所以某位玩家碰巧赢下一局并不说明他更强。但大数定律在此适用:如果玩1000局,其中代理A对阵B和C赢了500局,就可以合理地认为A是更强的代理。一句警告:有些游戏在比较不同代理的技能水平时会表现出非传递关系。虽然我在《Sevens》中没有发现这一点,但这可能意味着你需要非常小心地安排代理之间的对阵,确保它们面对多样化的对手。

运行工具

这个工具直接从SevenGP.main()运行,不接受任何参数或输入。因此最好在Eclipse里运行,以便按需修改main函数。如果你忍不住想吐槽这一点,想想看:你并不会拿着不同的输入参数把这种工具运行很多次。

工具会在项目目录下生成一个名为computer.txt的文件,其中包含导出的Lua评估函数。我在那里留了一份默认生成的文件供你参考。注意在简单和中等难度下,许多函数简单到只含一个终端节点:这是游戏的随机性所致,而且这些难度只执行了0或1个世代。只有在执行了11个世代的困难难度下,你才开始看到更有趣的评估函数。我把专家难度注释掉了,因为演示用不上它,而111个世代要跑上一阵子。

如何适应自己的需求

我建议你用自己喜欢的语言、为自己的游戏重写这份代码(顺便做些改进),这样才能真正理解它。实现时尽量避免像本项目中Java/Lua这样的语言不匹配,因为能共享游戏模拟代码显然好过用两种语言各写一遍。如果你担心中间树表示的效率,也没什么能阻止你比如用C#生成C#代码(在.NET里需要的话甚至可以在运行时生成)。

你也许会发现这份代码可以在JVM项目中直接复用;请随意,不过仍建议先重构。我没打算把它做成程序库,也无意亲自动手改写成库。

说到程序库,网上大概能找到一些现成方案,供你在不想自己动手时使用。如果你自信已充分理解GP,或者需要的是商业级游戏的解决方案,用现成库很可能是更好的选择。

代码结构

sevengp包:包含主类SevenGP,你多半会想立刻从它读起、用起。

sevengp.game包:包含游戏模拟。想彻底弄清游戏规则可以细读,但其实并不必要。

gp包:一些面向GP开发的基类和接口,不过不太可能有通用价值。Individual类相当有趣,因为遗传操作的实现就在它的构造函数里。

sevengp.gp包:对基础GP包的扩展,以及《Sevens》的玩家类。

sevengp.gp.alphabet包:《Sevens》的终端与非终结符。许多操作都以面向对象的方式放在这里(用面向切面编程也许更合适)。非常建议读一读这些内容。

代码中“个体”一词的用法有些混乱。Individual类只表示单棵树,而Specimen类才是真正的个体,它专为《Sevens》定制,带有两棵不同的树(分别用于出牌和传牌情境)。事先声明:还有更多可能被混淆的术语。

结论

希望本文能帮助新手更好地理解GP,或许还能推上最后一把,让拼图各就各位,获得足以实现GP的理解。在我(主观)看来,GP根本不是什么复杂的概念,而且可以成为游戏玩法程序员武器库中非常有用的工具。只是在花时间把它用于某个问题之前,先确认它是不是最佳选择。

本文为游戏邦/gamerboom.com编译,拒绝任何不保留版权的转载,如需转载请联系:游戏邦

Light Introduction to Genetic Programming

By Petter Hansson

A few years ago I turned a card game I knew from my high school years into a mobile app. I had written an AI for it, but I wasn’t satisfied with its skill level. Not convinced it was worth further manual coding to find a better solution, I looked into AI algorithms and ended up at Genetic Programming (GP). To make a long story short, it worked so well that the AI plays better than I do. GP is easy to understand conceptually and reasonably simple to implement at a basic level, which is what this article is about. Be warned: this article does not teach you how to program in general.

This article has two sections: One section to introduce GP, and one section to describe the example project.

Introduction to GP

GP starts with ‘genetic’ because it is inspired by a process in nature called evolution. It is a subclass of Genetic Algorithms (GA) that deals with “programs” represented by tree graphs. By running a genetic program, you can perform a computation, make a decision, or generally execute some sequence of events. An example is shown below of a genetic program that calculates A + B * 7.
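The tree-as-program idea can be made concrete in a few lines. Below is a minimal sketch (class and method names are my own, not from the article's project) of a node hierarchy that evaluates A + B * 7 recursively against an environment of named inputs:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: a genetic program as a tree of nodes evaluated recursively.
public class GpTreeDemo {
    interface Node { double eval(Map<String, Double> env); }

    static class Const implements Node {
        final double v;
        Const(double v) { this.v = v; }
        public double eval(Map<String, Double> env) { return v; }
    }

    static class Var implements Node {
        final String name;
        Var(String name) { this.name = name; }
        public double eval(Map<String, Double> env) { return env.get(name); }
    }

    static class Add implements Node {
        final Node l, r;
        Add(Node l, Node r) { this.l = l; this.r = r; }
        public double eval(Map<String, Double> env) { return l.eval(env) + r.eval(env); }
    }

    static class Mul implements Node {
        final Node l, r;
        Mul(Node l, Node r) { this.l = l; this.r = r; }
        public double eval(Map<String, Double> env) { return l.eval(env) * r.eval(env); }
    }

    // A + B * 7 as a tree: Add(Var A, Mul(Var B, Const 7))
    public static double demo(double a, double b) {
        Node tree = new Add(new Var("A"), new Mul(new Var("B"), new Const(7)));
        Map<String, Double> env = new HashMap<>();
        env.put("A", a);
        env.put("B", b);
        return tree.eval(env);
    }

    public static void main(String[] args) {
        System.out.println(demo(1, 3)); // 1 + 3 * 7 = 22.0
    }
}
```

Evaluating with A = 1 and B = 3 walks the tree bottom-up and yields 22.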

Here’s another genetic program that executes an 8 pool Starcraft 2 build order intro (disclaimer: this is my interpretation based on a description on a popular SC2 site). Note that this is still a tree structure, although it’s drawn in a non-standard manner. How to evaluate this tree should be straightforward except for the decision nodes, whose conditions might be difficult to implement; “high damage potential” is a fuzzy concept.

The ‘genetic’ part comes from the fact that these trees/programs can be considered genomes that can be “mutated” (changed) in various ways. By having a population of different genomes/programs mutating through successive generations, while applying fitness/selection criteria, we can influence what tasks the genetic programs become good at solving. Let those individual programs that best solve the task have the largest genetic influence on the next generation – this is evolution!

The image below shows an (unrealistically tiny) population of four individuals undergoing evolution for five generations through random color change and node addition/removal, where the task is to have as many blue nodes as possible. Of course, this is a contrived example given that we know how to make a huge blue tree.

Note how the best trees are either cloned or subjected to mutation, which may or may not result in a more fit child. Cloning ensures good trees survive in unaltered form, but slows down mutation (in huge populations cloning isn’t really necessary). Also note how selection rules are not necessarily 100% fair; it doesn’t matter.

Finally, let me clarify what makes GP different from genetic algorithms. GA in general is about optimization of parameters through evolution, a form of stochastic search. How those parameters are combined is nothing GA concerns itself with, except in specific cases such as GP. GP, on the other hand, provides a way of evolving not only parameter values but also how those parameters are combined, making it a more powerful learning approach. Even if you don’t know the expression for a certain computation, GP can learn it for you rather than just optimize a set of input parameters!

Some examples of what a GP tree can represent

Arithmetic tree

The easiest and, I suspect, most common use of GP is to let programs represent arithmetic expressions, most commonly expressions on “real” numbers (OK, computer floating-point precision aside). We have already looked at this earlier, but suffice it to say that the ability to evolve a mathematical function for some purpose is a powerful one. Below is an evaluation function in Lua evolved for a “medium” level AI in my game:

(((((env.rounds)+((env.suit)*(env.rank)))+(env.safe))+((env.block)*(env.block)))*(((-4.0745883)+(0.025403308))+((((env.selfblock)*(env.dist))+((env.selfblock)*(env.safe)))+(math.random()))))

Note how there are terminals (leaves) such as env.rounds, math.random() and constants, and the binary nonterminals + and *. Note that nonterminals in arithmetic trees don’t in general need any extra parameters – just receive them as input branches. The whole thing is very confusing when read with parentheses in a linear text file, so let’s draw the tree:

This tree form is what’s actually stored in memory while evolution and simulation are running; the linear code is only an export optimized for my target game. Another interesting thing to note is the constant computation -4.0745883 + 0.025403308; such redundancy and overgrowth (huge sub-trees of dubious utility) may be controlled with the fitness function, as we shall see later.
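As a sketch of the tree-versus-export distinction, the recursive printer below (class names are my own stand-ins, not the project's) produces the same fully parenthesized style as the Lua line above:

```java
// Sketch: the tree lives in memory; a recursive printer emits the
// parenthesized Lua-style string as a linear export.
public class ExportDemo {
    interface Expr { String export(); }

    static class Leaf implements Expr {
        final String name;
        Leaf(String name) { this.name = name; }
        public String export() { return "(" + name + ")"; }
    }

    static class Op implements Expr {
        final String symbol;
        final Expr left, right;
        Op(String symbol, Expr left, Expr right) {
            this.symbol = symbol; this.left = left; this.right = right;
        }
        public String export() {
            return "(" + left.export() + symbol + right.export() + ")";
        }
    }

    public static String demo() {
        // ((env.suit)*(env.rank)): a tiny piece of the expression above
        Expr tree = new Op("*", new Leaf("env.suit"), new Leaf("env.rank"));
        return tree.export();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // ((env.suit)*(env.rank))
    }
}
```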

Execution tree

Another option is to let a genetic program represent a computer program in a more traditional sense, i.e. a control flow graph. Of course, a major restriction compared to a normal computer program is the fact that the graph must be a tree. This may or may not be a problem, and there are ways to work around the lack of loops through innovative node types.

The Starcraft 2 build order example earlier is an example of an execution tree; execution in that case occurs from root to leaves with decisions in-between. Nodes may either be decisions (needing annotated parameters) or simply execute an instruction with a linear result, such as “build worker”. Some instruction nodes may have no children at all and thus effectively can function as leaves as well.

A less generally useful but mentionable option is that you might actually reverse the definition of flow direction and execute from a certain start leaf to the root in a predictable linear path. In that case, execution might be seen as going from a specific to a more generalized conclusion, in which different specific start conditions share a consequence.

Decision tree

Decision trees are tree graphs that are evaluated from root to leaves, following a path of decisions from the root that eventually end up with a leaf consequence/classification. This is without any non-standard execution in the nonterminals – all nonterminals are pure decisions. Note that decision trees have been studied extensively and that there are likely better algorithms than GP if you need to build a decision tree. Yet, it’s worth mentioning the possibility.

Note that in decision tree flow graph notation, nonterminals are indicated with multiple graphical elements:

Like for execution trees, nonterminals in decision trees are more complex than for arithmetic trees since extra parameters such as conditions or probabilities need to be annotated. Possibly, these parameters also need to be mutated directly for more efficient learning, rather than just being initialized randomly at node creation.

Genetic operations

Although only cloning and mutation were mentioned so far, there are in fact several fundamental genetic operations that can be applied on a GP tree.

Cloning

The easiest operation to understand is cloning, in which an identical copy of the parent/source tree is returned. I will avoid the term “parent” because it conflicts with tree graph terminology, and instead call parents source trees further on in this text. Similarly, I will avoid the term child (even though these terms are present in the source code). Anyway, here is an example of a cloning operation on a source tree:

Mutation

Mutation involves making some sort of change to a single source tree. In a GP tree, it is possible to replace a subtree on a branch of the source tree with a random tree. This will in fact cover most cases of tree alteration you can think of: it can either grow or shrink the tree, depending on the maximum depth of the random tree generated.

A good max depth for the random subtree seems to be 2 (allowing at most one nonterminal). If it were 1, trees would never grow through mutation; with a much larger value, your trees will rapidly grow to be filled with loads of random crap, excuse the expression.
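A sketch of this mutation scheme, under the assumptions above (max depth 2 for the random replacement subtree; the alphabet, class and method names are mine):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch: grow() builds a random tree up to a maximum depth; mutate()
// replaces a random branch of a random internal node with grow(rng, 2).
public class MutationDemo {
    static final String[] TERMINALS = { "x", "y", "1" };
    static final String[] NONTERMINALS = { "+", "*" };

    public static class Node {
        public String label;
        public List<Node> children = new ArrayList<>();
        Node(String label) { this.label = label; }
        public int depth() {
            int d = 0;
            for (Node c : children) d = Math.max(d, c.depth());
            return d + 1;
        }
        public int size() {
            int s = 1;
            for (Node c : children) s += c.size();
            return s;
        }
    }

    // Build a random tree of at most maxDepth levels; depth 1 forces a terminal.
    public static Node grow(Random rng, int maxDepth) {
        if (maxDepth <= 1 || rng.nextBoolean()) {
            return new Node(TERMINALS[rng.nextInt(TERMINALS.length)]);
        }
        Node n = new Node(NONTERMINALS[rng.nextInt(NONTERMINALS.length)]);
        n.children.add(grow(rng, maxDepth - 1));
        n.children.add(grow(rng, maxDepth - 1));
        return n;
    }

    // Replace a random branch of a random internal node with a fresh subtree.
    public static void mutate(Random rng, Node root) {
        List<Node> internals = new ArrayList<>();
        collectInternals(root, internals);
        if (internals.isEmpty()) return; // a lone terminal has no branch to replace
        Node target = internals.get(rng.nextInt(internals.size()));
        int slot = rng.nextInt(target.children.size());
        target.children.set(slot, grow(rng, 2)); // max depth 2, as suggested
    }

    static void collectInternals(Node n, List<Node> out) {
        if (n.children.isEmpty()) return;
        out.add(n);
        for (Node c : n.children) collectInternals(c, out);
    }

    public static void main(String[] args) {
        Random rng = new Random(42);
        Node tree = grow(rng, 4);
        mutate(rng, tree);
        System.out.println("size=" + tree.size() + " depth=" + tree.depth());
    }
}
```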

For cases where parameters on nonterminals are required, you might want to include code to mutate these parameters without replacing trees. Similarly, if you have constant terminals in an arithmetic tree, you might want to controllably mutate these. Consider that a better solution often lies close to an already good solution, and completely throwing a constant out of the window to replace it with a completely random value as occurs with subtree replacement is not likely to be particularly successful.

Above is the constant update rule I use. Note that the magnitude is exponentially altered, from 0.5 to 2. The sign is intentionally never changed. In expressions such as A * B, it turns out that changing the sign of one of the operands has a rather dramatic effect – it changes the sign of the result as well. For changing signs, I instead rely on the node replacement covered earlier.
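The rule itself appears only as a figure, so the sketch below is a guess at a formula with the stated properties: the magnitude is multiplied by an exponentially drawn factor between 0.5 and 2 (here 2^u with u uniform in [-1, 1], my own choice), and the sign is never touched:

```java
import java.util.Random;

public class ConstMutation {
    // Multiply magnitude by 2^u, u uniform in [-1, 1]: factor in [0.5, 2],
    // sign preserved. This formula is a guess, not the article's exact rule.
    public static double mutateConstant(Random rng, double c) {
        double u = rng.nextDouble() * 2.0 - 1.0;
        return c * Math.pow(2.0, u);
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        System.out.println(mutateConstant(rng, -4.0745883)); // stays negative
    }
}
```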

Crossover

Did you ever wonder why you have two biological parents and not one? Now you will know!

The limitation of mutation is that random data isn’t likely to do anything useful. Meanwhile, your population hopefully already contains a diversity of useful genetic programs. A key insight here is that the subtrees of a genetic tree may be useful sub-solutions to the task. Thus, rather than replacing a branch with random data, we replace it with a subtree of a proven useful source tree.

Crossover, as this is known, means you combine the traits of two source trees. The results will not always be successful, but overall the chances of something interesting resulting are comparatively good. Crossover will speed up evolution since different successful sub-solutions will be shared among different lineages without having to develop independently through mutation.
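A sketch of crossover under these assumptions (names are mine; the donor subtree is chosen uniformly from the donor's nodes, and the recipient's root is kept so the result stays a well-formed tree):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch: crossover copies a randomly chosen subtree out of a donor
// individual and splices it over a randomly chosen branch of a copy of the
// recipient, instead of splicing in random data as mutation does.
public class CrossoverDemo {
    public static class Tree {
        public String label;
        public List<Tree> children = new ArrayList<>();
        public Tree(String label, Tree... kids) {
            this.label = label;
            for (Tree k : kids) children.add(k);
        }
        Tree copy() {
            Tree t = new Tree(label);
            for (Tree c : children) t.children.add(c.copy());
            return t;
        }
        void collect(List<Tree> out) {
            out.add(this);
            for (Tree c : children) c.collect(out);
        }
    }

    public static Tree crossover(Random rng, Tree recipient, Tree donor) {
        Tree child = recipient.copy();
        List<Tree> sites = new ArrayList<>();
        child.collect(sites);
        sites.remove(0); // keep the root; overwrite some branch below it
        if (sites.isEmpty()) return child; // recipient was a lone terminal
        Tree site = sites.get(rng.nextInt(sites.size()));
        List<Tree> donorNodes = new ArrayList<>();
        donor.collect(donorNodes);
        Tree graft = donorNodes.get(rng.nextInt(donorNodes.size())).copy();
        site.label = graft.label;       // overwrite the chosen node in place
        site.children = graft.children;
        return child;
    }

    public static void main(String[] args) {
        Tree recipient = new Tree("+", new Tree("x"), new Tree("y"));
        Tree donor = new Tree("*", new Tree("a"), new Tree("b"));
        System.out.println(crossover(new Random(3), recipient, donor).label);
    }
}
```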

I didn’t include an image because crossover looks similar to mutation, except the donor tree isn’t random, but a randomly selected subtree of an individual.

Fitness and limiting overgrowth

Throughout this article we have assumed that there is a driving force behind evolution, guiding the evolution of genetic programs towards solving a particular task. That force is the fitness for the task.

How you define fitness is domain specific, but for GP a good start is as a function that can be used to order the population. The fitness function of an individual should be based on how efficient it is at the task (whether it is to win, to survive, etc). For instance, in a game with win/lose outcomes you might base fitness on N_wins / (N_wins + N_losses).

There is another important aspect of the fitness function: it can be used to protect against overgrowth. Overgrowth was briefly mentioned earlier, and it is a huge danger to the efficiency of evolution in some GP setups. Consider how useful the expression 0 * (abs(sin(x)) ^ 5.88 + 72.21 - 28.01 * min(-827.17, x)) really is. In practice, rather than exactly 0, you might see constants evolving to some very small values as evolution tries to get rid of useless subtrees. The full effect of this can be unlimited growth of useless subtrees throughout the population, slowing evolution to a snail’s pace, or worse.

The problem, of course, is that there’s no penalty for useless arithmetic in the AI code included in the game simulation. But rather than trying to add that to the simulation (which would break the game rules), we can add the penalty to the fitness function. A good start (if rough) is basing the penalty on the size of the GP tree. For instance, here’s the fitness function I use in Sevens:

float getFitness() {
    // total node count of both trees; sizes above 20 are penalized
    int size = play.getTreeSize() + send.getTreeSize();
    int factor = Math.max(0, size - 20);
    return getWinRatio() - factor * 0.0004f;
}

Overgrowth might be less of a problem with execution trees than with arithmetic trees, since fitness will be influenced directly by the simulation cost of executing instructions; building a worker in Starcraft 2 does not take zero resources. On the other hand, decision nodes might consistently leave a branch unvisited, in which case the subtree on that branch will be dead weight memory-wise.

Evolution

There’s another important detail yet to discuss, and it is how we select what genetic operations to apply to the individuals in each generation. First, here’s how my rather crude implementation works (approximately) for each generation:

New population := {}, population = (already initialized)
Run simulation on population
Sort population according to fitness in descending order
For each genetic operation O:
N := split_O * N_population
For i = 1, N:
Source := population[i]
Emit individual in new population using O(Source)
Population := new population

The split_O values should add up to 1 to keep the population at the same size between generations. These values let you assign how large of a percentage of each population should be submitted to each type of genetic operation. In practice, I use 10% cloning, 45% constant mutation, 22.5% tree mutation and 22.5% crossover. You’ll have to experiment with what works for you.
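The generation step in the pseudocode above can be sketched as follows; to keep the selection mechanics visible, a toy genome (a single double, with stand-in operations) replaces a real GP tree, and the splits mirror the 10/45/22.5/22.5 mix. All names are mine:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.UnaryOperator;

public class GenerationDemo {
    // One generation step: sort by fitness (best first), then let each
    // operation consume its share of the sorted population, best sources first.
    public static List<Double> step(Random rng, List<Double> population,
                                    Comparator<Double> byFitnessDesc) {
        List<Double> sorted = new ArrayList<>(population);
        sorted.sort(byFitnessDesc);
        Map<UnaryOperator<Double>, Double> ops = new LinkedHashMap<>();
        ops.put(x -> x, 0.10);                            // cloning
        ops.put(x -> x + rng.nextGaussian() * 0.1, 0.45); // "constant" mutation
        ops.put(x -> rng.nextDouble() * 10.0, 0.225);     // "tree" mutation
        ops.put(x -> (x + sorted.get(rng.nextInt(sorted.size()))) / 2.0,
                0.225);                                   // crossover
        List<Double> next = new ArrayList<>();
        for (Map.Entry<UnaryOperator<Double>, Double> e : ops.entrySet()) {
            int n = (int) Math.round(e.getValue() * sorted.size());
            for (int i = 0; i < n && next.size() < sorted.size(); i++) {
                next.add(e.getKey().apply(sorted.get(i)));
            }
        }
        return next;
    }

    public static void main(String[] args) {
        Random rng = new Random(1);
        List<Double> pop = new ArrayList<>();
        for (int i = 0; i < 40; i++) pop.add(rng.nextDouble() * 10.0);
        // Toy task: evolve toward the value 5; smaller distance = fitter.
        Comparator<Double> fit = Comparator.comparingDouble(x -> Math.abs(x - 5.0));
        for (int gen = 0; gen < 30; gen++) pop = step(rng, pop, fit);
        System.out.println(pop.get(0)); // clone of the previous generation's best
    }
}
```

With a real GP tree, the four lambdas would be replaced by the cloning, constant mutation, subtree mutation and crossover operations discussed above; the selection loop itself stays the same.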

Why not do GP

GP is great since it (for example) can really learn entire functions independently, right? Yes, but only under certain conditions. To use arithmetic trees for AI agent evaluation functions as an example: First, you need to specify what the outputs of your functions mean and in what context they will apply. Secondly, you need to describe the state of the environment with terminal nodes accurately enough for agents to be able to make informed decisions. Thirdly, you need to think about the set of nonterminal nodes. Including every binary function on real numbers you can think of is not necessarily useful, and evolution will likely be much faster and more stable with a limited set of nonterminal operators.

Genetic programs share the same weakness with GA in general; a large number of generations with a large number of individuals must be computed for any learning/optimization to happen. This pretty much means GP is too slow for any online adjustment to the genomes. You need to be able to do batch-processing to execute a huge number of trials in advance (offline), and that entails having a simulation of the world. Consequently, GP is more suitable for games (on the condition they have batch processable simulation cores) than e.g. robotics, for which it is harder to produce simulations of the environment the agents operate in.

GP and GA are random searches that will be outclassed by search algorithms that are better suited to the problem space, and should be seen as a last resort or “I don’t care if it takes longer”-option. There are no silver bullets in AI; always investigate a lot before picking your approach or you end up wasting your time.

Example project

The code attached with this article is the command line simulation and evolution tool in Java/Eclipse I developed in tandem with my app. The Lua source of my original app is not something anyone would like to view anyhow.

The game

In the game of Sevens, the decisions that need to be made by an agent are of a very basic nature: Select the best card in two different contexts, when playing or “sending” a card to a player lacking a playable card. So, we can calculate a (utility) value for a certain action (a playable card), and we can pick the action with the best value. This is a reasonable way of modelling the decision structure of the game, and its simplicity means we can evolve basic arithmetic functions with GP for calculating the values from the state of the game for a particular action and context.
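The score-every-action-and-take-the-argmax scheme described above can be sketched generically (names are mine, not the project's):

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToDoubleFunction;

public class UtilityPick {
    // Score every legal action and return the one with the highest utility.
    public static <A> A best(List<A> actions, ToDoubleFunction<A> utility) {
        A bestAction = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (A a : actions) {
            double s = utility.applyAsDouble(a);
            if (s > bestScore) {
                bestScore = s;
                bestAction = a;
            }
        }
        return bestAction;
    }

    public static void main(String[] args) {
        // Toy stand-in for an evolved evaluation function: prefer higher ranks.
        List<Integer> playableCards = Arrays.asList(3, 7, 5);
        System.out.println(best(playableCards, c -> c)); // prints 7
    }
}
```

In the real project the utility function would be the evolved arithmetic tree, evaluated on the game-state terminals for each candidate card.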

Which one???

Sevens is a highly stochastic game in which the outcome is often decided by the cards dealt, so the fact that a player happens to win a single round does not imply superiority. However, the law of large numbers applies and if you play 1000 games of which agent A pitted against B and C wins 500 you can reasonably assume A is the better agent. A word of warning though: Some games display non-transitive relations when comparing skill levels of various agents, and although I couldn’t find this to be the case in Sevens, it may make it necessary to be very careful in how you select agents to be pitted against one-another to ensure they meet varied opposition.

Running the tool

The tool simply executes from SevenGP.main() without taking any parameters or input. Thus, it’s preferable to work from within Eclipse so you can modify the main function as needed. If you feel an urge to rant about this, consider that you will not be executing this kind of tool with changed input parameters very many times.

The tool generates a file called computer.txt in the project directory, containing the Lua exported evaluation functions. I’ve left a default generated file there for your interest. Notice how at both easy and medium, many of the functions are downright trivial, consisting of a single terminal; this is due to the stochastic nature of the game and the fact that 0 and 1 generations respectively were executed for these skill levels. Only at hard, with 11 generations, do you start seeing more interesting evaluation functions. I commented out the expert level as it’s not necessary for demo purposes and it takes a while with 111 generations.

How to adapt for own needs

I would recommend you rewrite the code in your favorite language and for your own game (while making improvements), in order to understand it. Try to avoid language mismatches like the Java/Lua mismatch in this implementation, since it is certainly preferable to be able to share game simulation code rather than rewriting it in two languages. Nothing prevents you from generating e.g. C# code from C# (heck, in .NET even at runtime if desirable) if you’re concerned about the efficiency of the intermediate tree representation.

You might find it possible to directly reuse this code in JVM projects – feel free to do so, although refactoring is still recommended. I did not intend this to be used as a library and I have no desire to start reworking it into one myself.

Speaking of libraries, there probably are a couple to be found on the Internet in case you want to use a ready-made solution rather than do-it-yourself. If you feel confident you understand GP sufficiently or need a solution for a production level game, it’s likely the better option.

Structure of code

sevengp package: Contains the main class SevenGP which you’ll probably want to get reading/playing around with right away.

sevengp.game package: Contains the game simulation. Read up on this if you want to understand the game rules clearly, but you don’t really need to.

gp package: Some base classes and interfaces for GP development, though not particularly likely to be of generic use. The Individual class is quite interesting, since it contains implementations of the genetic operations in its constructors.

sevengp.gp package: Extensions to the base GP package and player class for Sevens.

sevengp.gp.alphabet package: Terminals and nonterminals for Sevens. Many operations reside here in object-oriented fashion (aspect-oriented programming might have been more appropriate). Reading these is quite recommended.

There’s some confusion in the code regarding the term “individual”. The Individual class only represents a single tree, whereas the Specimen class is a true individual, specifically tailored for the game of Sevens with two different trees for play and send contexts. There are more terms that might be confused, as a disclaimer.

Final words

Hopefully this helps beginners understand GP a bit better, and possibly gives them that final nudge that causes all the pieces of the puzzle to fall into place with sufficient understanding to implement GP. In my (subjective) opinion, GP isn’t a difficult concept at all, and it can be a very useful tool in the gameplay programmer’s arsenal. Just make sure it’s the best option before you spend time applying it to a problem.(source:gamedev)

