actions with uncertain outcomes where we don’t have probabilities
- Classical Planning
Deterministic eventsEnvironments change only as the result of an actionPerfect knowledge (omniscience, the state of knowing everything)Single actor (omnipotence, the quality of having unlimited or very great power)
- MDPs
- Drop deterministism
- FOND, Fully-Observable Non-Deterministic Planning
- Drop probabilities further
, assume nothing about the probabilities between these multiple outcomeswhydon’t have enough data to learn them using reinforcement learningprobabilities non-stationarychange by the time we learn themgames, warsopponent can exploit figures out probability distributions
a new environmentActions outcomes are not randombut don’t know which will occur
examplesAsking someone to make you a coffeethe environment could be entirely deterministic, but we just do not know what outcome will occur, so our model of the environment must be non-deterministic.
Sending a drone out to scout over a hill for enemy combatansSending a query to a web-server for a pageThe Triangle Tireworlda canonical example of non-deterministic (non-probabilistic) planningwhen the car moves, it can get a flat tiresome locations have a spare tire, others do not
- Drop probabilities further
- Classical vs MDP vs FOND
- Definition
- Non-deterministic PDDL
(:requirements :typing :non-deterministic)
- Solutions
- produce a policy, rather than a plan
- like MDPs
a policy, π : S → A, is a mapping from states to actionsspecifies which action to choose in every possible state
- types
- Weak plan
- achieves the goal for at least one possible execution
- Strong cyclic plan
- achieves the goal for all possible executions, possibly revisiting states
- Strong acyclic plan
- achieves the goal and never visits a state more than once
- Weak plan
- fairness
- Not all problems have strong solutions.
- Some problems have strong cyclic solutions but not strong acyclic solutions.
- relies on the assumption of fairness
- If we revisit apply the same action in the same state an infinite amount of times, we will encounter each non-deterministic effect an infinite amount of times.
- relies on the assumption of fairness
- Some problems are inherently unfair
- e.g. if the non-determinism is caused by an adversary trying to stop you reaching your goal
- produce a policy, rather than a plan
- current state-of-the-art approaches
- use heuristic search to enumerate weak plans until a strong plan results
- All-outcomes Determinisation of a FOND problem
- a deterministic version of the FOND problem
- new set of actions
- by creating a unique action for each possible outcome of a non-deterministic action
- new set of actions
- weak plans
- solve the all-outcomes determinisation using classical search
- strong plans
- iterate over many search problems to find a set of solutions that form a strong plan if one exists, or a stronger plan (compared to a weak plan) if a strong plan does not exist.
- deadends
- classical search algorithm close a deadend state and never re-visit it
- FOND algorithm re-starts classical search many times
- could re-visit a deadend state
- mark it and never expand the deadend states during the classical search
- could re-visit a deadend state
- improvements
- Using the policy
- use the previous policy for the previous weaker plan
- Planning locally
- search from unexpected states towards the expected states
- expected states
- states on a weak plan
- unexpected states
- states resulting from the other non-deterministic outcomes
- expected states
- effectively the same as Using-the-policy
- but explicitly search for a state that in the policy
- search the expected states as the goal
- but explicitly search for a state that in the policy
- not working for tireworld
- search from unexpected states towards the expected states
- Using relevance
- etc.
- Using the policy
- a deterministic version of the FOND problem
- interesting points
- the use of classical search
- state-of-the-art FOND can make use of state-of-the-art classical search planner
- use classical search to find weak plans, iteratively build up strong plans (if possible), and can solve a non-deterministic problem,
- the use of classical search
- EXPSPACE
- Definition
- Decision problems for which there exists a deterministic Turing machine that runs in space exponential in the size of its input
- runs in exponential time as well
- Decision problems for which there exists a deterministic Turing machine that runs in space exponential in the size of its input
- why
- Some of the complexity comes from the all-outcomes determinisation
- N actions, averaged k outcomes -> N^{k} actions for all-outcomes determinisation
- Some of the complexity comes from the all-outcomes determinisation
- Definition