Non-deterministic planning

actions with uncertain outcomes where we don’t have probabilities

Motivation

Classical Planning
- ~~Deterministic events~~
- ~~Environments change only as the result of an action~~
- ~~Perfect knowledge (omniscience, the state of knowing everything)~~
- ~~Single actor (omnipotence, the quality of having unlimited or very great power)~~
MDPs
- Drop deterministism
FOND, Fully-Observable Non-Deterministic Planning
- Drop probabilities further ~~, assume nothing about the probabilities between these multiple outcomes~~
  - ~~why~~
    - ~~don’t have enough data to learn them using reinforcement learning~~
    - ~~probabilities non-stationary~~
      - ~~change by the time we learn them~~
        
        ~~games, wars~~
        
        ~~opponent can exploit figures out probability distributions~~
    - ~~a new environment~~
    - ~~Actions outcomes are not random~~
      - ~~but don’t know which will occur~~
  - ~~examples~~
    - ~~Asking someone to make you a coffee~~
      - ~~the environment could be entirely deterministic, but we just do not know what outcome will occur, so our model of the environment must be non-deterministic.~~
    - ~~Sending a drone out to scout over a hill for enemy combatans~~
    - ~~Sending a query to a web-server for a page~~
    - ~~The Triangle Tireworld~~
      - ~~a canonical example of non-deterministic (non-probabilistic) planning~~
      - ~~when the car moves, it can get a flat tire~~
      - ~~some locations have a spare tire, others do not~~
Classical vs MDP vs FOND

FOND

Definition
Non-deterministic PDDL
- (:requirements :typing :non-deterministic)
Solutions
- produce a policy, rather than a plan
  - like MDPs
  - ~~a policy, π : S → A, is a mapping from states to actions~~
    - ~~specifies which action to choose in every possible state~~
- types
  - Weak plan
    - achieves the goal for at least one possible execution
  - Strong cyclic plan
    - achieves the goal for all possible executions, possibly revisiting states
  - Strong acyclic plan
    - achieves the goal and never visits a state more than once
- fairness
  - Not all problems have strong solutions.
  - Some problems have strong cyclic solutions but not strong acyclic solutions.
    - relies on the assumption of fairness
      - If we revisit apply the same action in the same state an infinite amount of times, we will encounter each non-deterministic effect an infinite amount of times.
  - Some problems are inherently unfair
    - e.g. if the non-determinism is caused by an adversary trying to stop you reaching your goal

Solving FOND Problems

current state-of-the-art approaches
- use heuristic search to enumerate weak plans until a strong plan results
All-outcomes Determinisation of a FOND problem
- a deterministic version of the FOND problem
  - new set of actions
    - by creating a unique action for each possible outcome of a non-deterministic action
- weak plans
  - solve the all-outcomes determinisation using classical search
- strong plans
  - iterate over many search problems to ﬁnd a set of solutions that form a strong plan if one exists, or a stronger plan (compared to a weak plan) if a strong plan does not exist.
  - deadends
    - classical search algorithm close a deadend state and never re-visit it
    - FOND algorithm re-starts classical search many times
      - could re-visit a deadend state
        
        mark it and never expand the deadend states during the classical search
  - improvements
    - Using the policy
      - use the previous policy for the previous weaker plan
    - Planning locally
      - search from unexpected states towards the expected states
        
        expected states
        
        states on a weak plan
        
        unexpected states
        
        states resulting from the other non-deterministic outcomes
      - effectively the same as Using-the-policy
        
        but explicitly search for a state that in the policy
        
        search the expected states as the goal
      - not working for tireworld
    - Using relevance
    - etc.
interesting points
- the use of classical search
  - state-of-the-art FOND can make use of state-of-the-art classical search planner
- use classical search to find weak plans, iteratively build up strong plans (if possible), and can solve a non-deterministic problem,

Complexity

EXPSPACE
- Definition
  - Decision problems for which there exists a deterministic Turing machine that runs in space exponential in the size of its input
    - runs in exponential time as well
- why
  - Some of the complexity comes from the all-outcomes determinisation
    - N actions, averaged k outcomes -> N^{k} actions for all-outcomes determinisation

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

9.FOND.md

9.FOND.md

Non-deterministic planning

Motivation

FOND

Solving FOND Problems

Complexity

Files

9.FOND.md

Latest commit

History

9.FOND.md

File metadata and controls

Non-deterministic planning

Motivation

FOND

Solving FOND Problems

Complexity