Marcus Orochena

Keeping things simple

Goal Oriented Action Planning Concepts

2018-01-25

Goal Oriented Action Planning is a technique used in game AI development to reduce the complexity of designing state transitions. GOAP lets a programmer specify goals for an AI, and lets the AI figure out the best sequence of actions to achieve those goals instead of being explicitly told what to do. In the context of game development it's well suited to games focused on emergent gameplay with complex AI interactions. That said, it's not a silver bullet: it's relatively costly to compute, and it won't add behaviors that you wouldn't be able to otherwise implement. One way to look at it is that you're trading CPU time for development time.

Game AI developers commonly use Finite State Machines (FSMs) to model the execution of AI behavior. An FSM is a computational model that describes a set of states a machine can be in, and the transitions between them. The machine can only be in one state at a time, and transitions are conditionally activated, so it's easy to understand its execution flow.

To illustrate, this is an example of a very simple Zombie AI modeled as an FSM.

  • A zombie will chase a human, unless it's eating
  • A zombie will attack a human until the human dies
  • If a zombie loses sight of a human, it will go back to wandering
  • If a zombie sees a corpse it will begin eating it

These states and transitions are coded directly, and an example state could look like this:

class WanderState < FSMState
  # called when entering the state
  def on_enter(actor)
  end

  # the current state's execute() method is called every frame
  def execute(actor)
    if actor.sees_food?
      actor.transition(EatState)
    elsif actor.sees_human?
      actor.transition(ChaseState)
    else
      actor.continue_wander()
    end
  end

  # called when transitioning out of the state
  def on_exit(actor)
    actor.speed += 10 # he's hungry!
  end
end

If the zombie is in this state, he will wander until he sees food or a human and then transition into the appropriate state and increase his speed. Easy!

This is a great technique and is widely used in AI programming. However, you can already see that to add a new state, you would have to account for and create all possible transitions to and from the existing states. That might be fine for this example, but what if you have 20, 30, or 100 states? Any more complex sequences of transitions (for "smarter" AI) would also need to be hard-coded. It's definitely possible, but it can get complicated quickly.

This is where GOAP comes in. It builds upon FSMs - you're still creating the states, but you're removing the hard-coded transitions. Instead, you simply give the AI Goals and arm it with Actions for achieving them. GOAP will generate the sequence of actions necessary to achieve the AI's goals, without the behavior being hard-coded.

How does it work? We'll go into detail below, but in simple terms you can think of GOAP as dynamically generating FSMs based on an agent's goals, environment, and available actions.

Components of GOAP

Agent

This is the 'brain' of the entity that is being managed by the AI. It has a list of goals and available actions. It also usually keeps track of its environment, often implemented in terms of 'sensors', which is outside the scope of this post.

Goals

These are the primary directives of the agent. When the agent is planning its action sequence, it attempts to meet these goals.

Goal Examples:

Name: Has a Sword
Name: Not Hungry
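
In the implementation later in this post, a goal is simply a named key in the agent's world-state hash, and it counts as met when that key is truthy:

# how the planner below checks a goal against the world state
world_state = { 'Has a Sword' => true, 'Not Hungry' => false }
world_state['Has a Sword'] # => true, so the 'Has a Sword' goal is met
world_state['Not Hungry']  # => false, so the 'Not Hungry' goal is not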

Actions

These are things an agent can do which change the state of the world. Generally they consist of a list of preconditions which must be met to execute the action, as well as a list of postconditions (or 'effects') that apply once the action is done. They also have an associated 'energy' cost which the Planner will use to help determine the most efficient series of actions to achieve a goal.

Action Example:

Action: 'Mine Ore', 
Preconditions: ['Has Pick', 'Not Tired', 'Inventory not Full'], 
Effects: ['Adds 1 Ore to Inventory']

Planner

The Planner is the control center of GOAP. It takes the agent's goals, actions, and the current state of the world, then calculates the best sequence of actions to fulfill the chosen goal. Usually, this is done through a search algorithm, like A* search, which examines different sequences of actions and their outcomes to find the most efficient path to the goal.

How does the Planner pick a goal from multiple possibilities? It usually evaluates the 'urgency' or priority of each goal, and selects the one most critical to the agent at that moment. For instance, if the agent is low on health, a 'Survive' goal might take precedence over a 'Collect Resources' goal.
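
The implementation below keeps things simple and plans against all of the agent's goals at once, but a priority-based selection might look something like this sketch (the priority attribute is an assumption; the Goal class later in this post doesn't carry one):

# hypothetical sketch: pick the most urgent goal that isn't met yet
# (assumes each goal carries a numeric priority, which the Goal class
# below does not)
def select_goal(agent)
  unmet = agent.goals.reject { |goal| agent.world_state[goal.name] }
  unmet.max_by { |goal| goal.priority }
end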

The Planner also incorporates 'costs' associated with each action. Costs can be represented in terms of energy, time, or any other resource. The aim is to find not just any sequence of actions but the most cost-effective one.
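
The implementation below assumes a uniform cost of 1 per action for simplicity, but per-action costs are a small change. A sketch of what that might look like (the cost attribute is an assumption, not part of the Action class later in this post):

# hypothetical sketch: an action that carries its own energy cost
class CostedAction
  attr_reader :name, :preconditions, :effects, :cost

  def initialize(name, preconditions, effects, cost)
    @name = name
    @preconditions = preconditions
    @effects = effects
    @cost = cost
  end
end

# generate_successors would then accumulate it per action:
# new_cost = current_plan.cost + action.cost

The planning loop itself, in pseudocode: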

Planner.plan(agent):
1. Initialize open_list and closed_list as empty lists
2. Add initial_plan containing agent's current world_state to open_list
3. While open_list is not empty:
a. Pick current_plan with lowest cost from open_list
b. If current_plan meets agent's goals, return actions in current_plan
c. Generate successors of current_plan based on agent's available actions
d. Add valid successors to open_list if not already in closed_list
e. Move current_plan from open_list to closed_list
4. Return nil (if no plan found)

Implementation

# a candidate plan: the world state it results in, the actions taken
# to get there, and the accumulated cost
class Plan
  attr_reader :world_state, :actions, :cost

  def initialize(world_state, actions, cost)
    @world_state = world_state
    @actions = actions
    @cost = cost
  end
end

class Planner
  def plan(agent)
    open_list = []
    closed_list = []

    initial_plan = Plan.new(agent.world_state.clone, [], 0)
    open_list.push(initial_plan)

    until open_list.empty?
      # pick the cheapest candidate so far; this is uniform-cost search,
      # a full A* would add a heuristic estimate of the remaining cost
      current_plan = open_list.min_by(&:cost)

      if meets_goal?(current_plan.world_state, agent.goals)
        return current_plan.actions
      end

      generate_successors(current_plan, agent.actions).each do |successor|
        # compare world states rather than Plan objects, otherwise
        # already-explored states get revisited forever
        unless closed_list.any? { |closed_plan| closed_plan.world_state == successor.world_state }
          open_list.push(successor)
        end
      end

      open_list.delete(current_plan)
      closed_list.push(current_plan)
    end

    return nil
  end

  # a goal is met when its name is a truthy key in the world state
  def meets_goal?(world_state, goals)
    goals.all? { |goal| world_state[goal.name] }
  end

  # build one successor plan per action whose preconditions hold in
  # the current plan's world state
  def generate_successors(current_plan, available_actions)
    successors = []
    available_actions.each do |action|
      if action.preconditions.all? { |pre| current_plan.world_state[pre] }
        new_world_state = current_plan.world_state.clone
        action.effects.each { |eff| new_world_state[eff] = true }

        new_actions = current_plan.actions.clone.push(action.name)
        new_cost = current_plan.cost + 1 # assuming a uniform cost for simplicity

        successors.push(Plan.new(new_world_state, new_actions, new_cost))
      end
    end
    successors
  end
end

class Agent
  attr_accessor :goals, :actions, :world_state

  def initialize
    @goals = []
    @actions = []
    @world_state = {}
  end
end

class Goal
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

class Action
  attr_reader :name, :preconditions, :effects

  def initialize(name, preconditions, effects)
    @name = name
    @preconditions = preconditions
    @effects = effects
  end
end

# Initialize Agent, Goals, Actions, and World State
agent = Agent.new
agent.world_state = { 'Has Pick' => true, 'Not Tired' => true, 'Inventory not Full' => true }
agent.goals << Goal.new('Has a Sword')
agent.goals << Goal.new('Not Hungry')
agent.actions << Action.new('Mine Ore', ['Has Pick', 'Not Tired', 'Inventory not Full'], ['Adds 1 Ore to Inventory'])

# Initialize Planner and Generate Plan
# (note that no available action produces 'Has a Sword' or 'Not Hungry',
# so the planner exhausts its options and returns nil here)
planner = Planner.new
plan = planner.plan(agent)

# Output
puts "Agent's Goals:"
agent.goals.each { |goal| puts " - #{goal.name}" }

puts "\nAgent's Actions:"
agent.actions.each do |action|
  puts " - #{action.name}"
  puts " Preconditions: #{action.preconditions.join(', ')}"
  puts " Effects: #{action.effects.join(', ')}"
end

puts "\nGenerated Plan:"
if plan
  plan.each { |action_name| puts " - #{action_name}" }
else
  puts "No plan could be generated."
end