Synthesis of Provably Correct Autonomy Protocols for Shared Control

Murat Cubuktepe, Nils Jansen, Mohammed Alsiekh, Ufuk Topcu

Abstract—We develop algorithms to synthesize shared control protocols subject to probabilistic temporal logic specifications. More specifically, we develop a framework in which a human and an autonomy protocol can issue commands to carry out a certain task. We blend these commands into a joint input to the robot. We model the interaction between the human and the robot as a Markov decision process that represents the shared control scenario. Additionally, we use randomized strategies in a Markov decision process to incorporate potential randomness in the human's decisions, which is caused by factors such as the complexity of the task specifications and imperfect interfaces. Using inverse reinforcement learning, we obtain an abstraction of the human's behavior as a so-called randomized strategy. We design the autonomy protocol to ensure that the robot behavior resulting from the blending of the commands satisfies given safety and performance specifications in probabilistic temporal logic.

Additionally, the resulting strategies generate behavior as similar to the behavior induced by the human's commands as possible. We formulate the underlying problem as a quasiconvex optimization problem, which is solved by checking feasibility of a number of linear programming problems. We show the applicability of the approach through case studies involving autonomous wheelchair navigation and unmanned aerial vehicle mission planning.


I. INTRODUCTION

We study the problem of shared control, where a robot is to execute a task according to the goals of a human operator while adhering to additional safety and performance requirements. Applications of such human-robot interaction include remotely operated semi-autonomous wheelchairs [13], robotic teleoperation [16], and human-in-the-loop unmanned aerial vehicle mission planning [9]. Basically, the human operator issues a command through an input interface, which maps the command directly to an action for the robot. The problem is that a sequence of such actions may fail to accomplish the task at hand, due to limitations of the interface or failure of the human in comprehending the complexity of the problem. Therefore, a so-called autonomy protocol provides assistance for the human in order to complete the task according to the given requirements.

At the heart of the shared control problem is the design of an autonomy protocol. In the literature, there are two main directions, based on either switching the control authority

M. Cubuktepe and U. Topcu are with the Department of Aerospace Engineering and Engineering Mechanics, University of Texas at Austin, 201 E 24th St, Austin, TX 78712, USA. Nils Jansen is with the Department of Software Science, Radboud University Nijmegen, Comeniuslaan 4, 6525 HP Nijmegen, Netherlands. Mohammed Alsiekh is with the Institute for Computational Engineering and Sciences, University of Texas at Austin, 201 E 24th St, Austin, TX 78712, USA. Email: {mcubuktepe,malsiekh,utopcu}@utexas.edu, [email protected].

between human and autonomy protocol [26], or on blending their commands towards joint inputs for the robot [7], [15]. One approach to switching the authority first determines the desired goal of the human operator with high confidence, and then assists towards exactly this goal [8], [18]. In [12], switching the control authority between the human and the autonomy protocol ensures satisfaction of specifications that are formally expressed in temporal logic. In general, the switching of authority may cause a decrease in the human's satisfaction, who usually prefers to retain as much control as possible [17]. The second direction is to provide another command in addition to the one of the human operator.

To introduce a more flexible trade-off between the human's control authority and the level of autonomous assistance, both commands are then blended to form a joint input for the robot. A blending function determines the emphasis that is put on the autonomy protocol in the blending, that is, it regulates the amount of assistance provided to the human. Switching of authority can be seen as a special case of blending, as the blending function may assign full control to the autonomy protocol or to the human. In general, putting more emphasis on the autonomy protocol in blending leads to greater accuracy in accomplishing the task [6], [7], [20].

However, as before, humans may prefer to retain control of the robot [16], [17]. None of the existing blending approaches provide formal correctness guarantees that go beyond statistical confidence bounds. Correctness here refers to ensuring safety and optimizing performance according to the given requirements. Our goal is to design an autonomy protocol that admits formal correctness while rendering the robot behavior as close to the human's inputs as possible.

We model the behavior of the robot as a Markov decision process (MDP) [23], which captures the robot's actions inside a potentially stochastic environment. Problem formulations with MDPs typically focus on maximizing an expected reward (or, minimizing the expected cost).

However, such formulations may not be sufficient to ensure safety or performance guarantees in a task that includes a human operator. Therefore, we design the autonomy protocol such that the resulting robot behavior is formally verified with respect to probabilistic temporal logic specifications. Such verification problems have been extensively studied for MDPs [2].

Take as an example a scenario involving a semi-autonomous wheelchair [13] whose navigation has to account for a randomly moving autonomous vacuum cleaner, see Fig. 1.

The wheelchair needs to navigate to the exit of a room, and the vacuum cleaner moves in the room according to a probabilistic transition function. The task of the wheelchair is to reach the exit gate while not crashing into the vacuum cleaner. The human may not fully perceive the motion of the vacuum cleaner.

Fig. 1. A wheelchair in a shared control setting. (a) Autonomy perspective. (b) Human perspective.

Note that the human's commands, depicted with the solid red line in Fig. 1(a), may cause the wheelchair to crash into the vacuum cleaner. The autonomy protocol provides another set of commands, which is indicated by the solid red line in Fig. 1(b), to carry out the task safely without crashing. However, the autonomy protocol's commands deviate highly from the commands of the human. The two sets of commands are then blended into a new set of commands, depicted using the dashed red line in Fig. 1(b).

The blended commands perform the task safely while generating behavior as similar to the behavior induced by the human's commands as possible.

In [15], we formulated the problem of designing the autonomy protocol as a nonlinear programming problem. However, solving nonlinear programs is generally intractable [3]. Therefore, we proposed a greedy algorithm that iteratively repairs the human strategy such that the specifications are satisfied without guaranteeing optimality, based on [22]. Here, we propose an alternative approach for the blending of the two strategies. We follow the approach of repairing the strategy of the human to compute an autonomy protocol. We ensure that the resulting behavior induced by the repaired strategy (1) deviates minimally from the human strategy and (2) satisfies given temporal logic specifications after blending. We formally define the problem as a quasiconvex optimization problem, which can be solved efficiently by checking feasibility of a number of convex optimization problems [4].

A human may be uncertain about which command to issue in order to accomplish a task. Moreover, a typical interface used to parse the human's commands, such as a brain-computer interface, is inherently imperfect. To capture such uncertainties and imperfections in the human's decisions, we introduce randomness to the commands issued by humans. It may not be possible to blend two different deterministic commands. If the human's command is "up" and the autonomy protocol's command is "right", we cannot blend these two commands to obtain another deterministic command.

By introducing randomness to the commands of the human and the autonomy protocol, we therefore ensure that the blending is always well-defined. In what follows, we call a formal interpretation of a sequence of the human's commands the human strategy, and the sequence of commands issued by the autonomy protocol the autonomy strategy.

The question remains how to obtain the human strategy in the first place. It may be unrealistic that a human can provide the strategy for an MDP that models a realistic scenario. To this end, we create a virtual simulation environment that captures the behavior of the MDP. We ask humans to participate in two case studies to collect data about typical human behavior. We use inverse reinforcement learning to get a formal interpretation as a strategy based on the human's inputs [1], [28].

We model a typical shared control scenario based on autonomous wheelchair navigation [13] in our first case study. In our second case study, we consider an unmanned aerial vehicle mission planning scenario, where the human operator is to patrol certain regions while keeping away from enemy aerial vehicles. In summary, the main contribution of this paper is to synthesize the autonomy protocol such that the resulting blended or repaired strategy meets all given specifications while only minimally deviating from the human strategy.

We introduce all formal foundations that we need in Section II. We provide an overview of the general shared control concept in Section III. We present the shared control synthesis problem and provide a solution based on linear programming in Section IV. We indicate the applicability and scalability of our approach on experiments in Section V and draw a conclusion and critique of our approach in Section VI.

II. PRELIMINARIES

In this section, we introduce the required formal models and specifications that we use to synthesize the autonomy protocol, and we give a short example illustrating the main concepts.

1) Markov Decision Processes: A probability distribution over a finite set X is a function µ : X → [0, 1] ⊆ R with Σ_{x∈X} µ(x) = µ(X) = 1. The set of all distributions over X is Distr(X).

Definition 1 (MDP). A Markov decision process (MDP) M = (S, s_I, A, P) has a finite set S of states, an initial state s_I ∈ S, a finite set A of actions, and a transition probability function P : S × A → Distr(S).

MDPs have nondeterministic choices of actions at the states; the successors are determined probabilistically via the associated probability distribution.

We assume that all actions are available at each state and that the MDP contains no deadlock states. A cost function C : S × A → R≥0 associates a cost with each transition. If only a single action is available at each state, the MDP reduces to a discrete-time Markov chain (MC). We use strategies to resolve the choices of actions in order to define a probability measure and expected cost on MDPs.

Definition 2 (Strategy).

A randomized strategy for an MDP M is a function σ : S → Distr(A). If σ(s, α) = 1 for some α ∈ A and σ(s, β) = 0 for all β ∈ A \ {α}, the strategy is deterministic. The set of all strategies over M is Str^M.

Resolving all the nondeterminism for an MDP M with a strategy σ ∈ Str^M yields an induced Markov chain M_σ.

Fig. 2. MDP M with target state s2, shown in (a), and in (b) the MC induced by the strategy σ_unif.

Definition 3 (Induced MC). For an MDP M = (S, s_I, A, P) and a strategy σ ∈ Str^M, the MC induced by M and σ is M_σ = (S, s_I, A, P_σ), where

P_σ(s, s') = Σ_{α∈A} σ(s, α) · P(s, α, s') for all s, s' ∈ S.
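As a concrete illustration of Definitions 1-3, the following sketch builds the MDP of Fig. 2(a) and computes the Markov chain induced by the uniform randomized strategy of Example 1. This is not code from the paper; the dictionary layout and the function name `induced_mc` are our own assumptions.

```python
def induced_mc(P, sigma):
    """Definition 3: P_sigma(s, s') = sum_a sigma(s, a) * P(s, a, s').

    P[s][a][s'] is the transition probability function of the MDP;
    sigma[s][a] is the probability of choosing action a in state s."""
    P_sigma = {}
    for s, actions in P.items():
        P_sigma[s] = {}
        for a, dist in actions.items():
            for s2, p in dist.items():
                P_sigma[s][s2] = P_sigma[s].get(s2, 0.0) + sigma[s][a] * p
    return P_sigma

# The MDP of Fig. 2(a): actions a, b in s0 and c, d in s1; the states
# s2, s3, s4 only have self-loops, modeled here as a single action "loop".
P = {
    "s0": {"a": {"s1": 0.6, "s3": 0.4}, "b": {"s1": 0.4, "s3": 0.6}},
    "s1": {"c": {"s2": 0.6, "s4": 0.4}, "d": {"s2": 0.4, "s4": 0.6}},
    "s2": {"loop": {"s2": 1.0}},
    "s3": {"loop": {"s3": 1.0}},
    "s4": {"loop": {"s4": 1.0}},
}
# The uniform randomized strategy sigma_unif of Example 1.
sigma_unif = {
    "s0": {"a": 0.5, "b": 0.5},
    "s1": {"c": 0.5, "d": 0.5},
    "s2": {"loop": 1.0}, "s3": {"loop": 1.0}, "s4": {"loop": 1.0},
}
print(induced_mc(P, sigma_unif)["s0"])  # {'s1': 0.5, 's3': 0.5}
```

The induced chain no longer has action choices: each state carries a single distribution over successors, matching Fig. 2(b).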

The occupancy measure of a strategy σ gives the expected number of times an action is taken at a state in an MDP. In our formulation, we use the occupancy measure of a strategy to compute an autonomy protocol.

Definition 4 (Occupancy Measure). The occupancy measure x_σ of a strategy σ is defined as

x_σ(s, α) = E[ Σ_{t=0}^∞ P(α_t = α, s_t = s) ]   (1)

where s_t and α_t denote the state and action in an MDP at time step t. The occupancy measure x_σ(s, α) is the expected number of times the action α is taken at state s under the strategy σ.

2) Specifications: A probabilistic reachability specification P_{≤λ}(◇T) with the threshold λ ∈ [0, 1] ∩ Q and the set of target states T ⊆ S puts an upper bound λ on the probability to reach T from s_I in M. Similarly, an expected cost property E_{≤κ}(◇G) restricts the expected cost to reach the set G ⊆ S of goal states by an upper bound κ ∈ Q. We also use until properties of the form Pr_{≥λ}(¬T U G), which assert that the probability of reaching the set of states G, while not reaching the set of target states T beforehand, is at least λ.
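To illustrate Definition 4 (a sketch with our own helper names, not the paper's implementation): over the transient states, the expected number of visits y(s) satisfies y(s) = [s = s_I] + Σ_{s'} y(s') · P_σ(s', s), and the occupancy measure then factors as x_σ(s, α) = σ(s, α) · y(s). For the MDP of Fig. 2(a) under the uniform strategy:

```python
def expected_visits(P_sigma, transient, init, iters=100):
    """Fixed-point iteration for y(s) = init(s) + sum_{s'} y(s') * P_sigma(s', s),
    restricted to transient states (absorbing states are visited infinitely often)."""
    y = {s: 0.0 for s in transient}
    for _ in range(iters):
        y = {s: init.get(s, 0.0)
                + sum(y[s2] * P_sigma[s2].get(s, 0.0) for s2 in transient)
             for s in transient}
    return y

# Induced chain of the uniform strategy, restricted to transient states s0, s1.
P_unif = {"s0": {"s1": 0.5, "s3": 0.5}, "s1": {"s2": 0.5, "s4": 0.5}}
sigma_unif = {"s0": {"a": 0.5, "b": 0.5}, "s1": {"c": 0.5, "d": 0.5}}

y = expected_visits(P_unif, ["s0", "s1"], {"s0": 1.0})
x = {(s, a): sigma_unif[s][a] * y[s] for s in y for a in sigma_unif[s]}
print(x[("s1", "c")])  # 0.25: s1 is visited 0.5 times in expectation
```

Here y(s0) = 1 and y(s1) = 0.5, so each of the two actions at s1 has occupancy 0.25 under the uniform strategy.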

The synthesis problem is to find a particular strategy σ for an MDP M such that the given specifications φ1, . . . , φn are satisfied in the induced MC M_σ, written σ |= φ1, . . . , φn.

Example 1. Fig. 2(a) depicts an MDP M with initial state s0. In state s0, the available actions are a and b. Similarly, for state s1, the two available actions are c and d. If action a is selected in state s0, the agent transitions to s1 and s3 with probabilities 0.6 and 0.4. For states s2, s3, and s4 we omit actions because of the self-loops. For a safety specification φ = P_{≤0.21}(◇s2), the deterministic strategy σ1 ∈ Str^M with σ1(s0, a) = 1 and σ1(s1, c) = 1 induces the probability 0.36 of reaching s2. Therefore, the specification is not satisfied; see the induced MC in Fig.

2(b). Likewise, the randomized strategy σ_unif ∈ Str^M with σ_unif(s0, a) = σ_unif(s0, b) = 0.5 and σ_unif(s1, c) = σ_unif(s1, d) = 0.5 violates the specification, as the probability of reaching s2 is 0.25. However, the deterministic strategy σ_safe ∈ Str^M with σ_safe(s0, b) = 1 and σ_safe(s1, d) = 1 induces the probability 0.16, thus σ_safe |= φ.

Fig. 3. Shared control architecture: the human command and the autonomy command are blended by the blending function b into a blended command for the robot execution, given the formal model M_r and the specifications φ1, . . . , φn.
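The probabilities in Example 1 are easy to reproduce (an illustrative check, not part of the paper; the transition probabilities are read off from Fig. 2(a) and the text): s2 can only be reached via s1, so the reachability probability factors into the step from s0 to s1 and the step from s1 to s2.

```python
# Transition distributions of the MDP in Fig. 2(a); s2, s3, s4 are absorbing.
P = {
    "s0": {"a": {"s1": 0.6, "s3": 0.4}, "b": {"s1": 0.4, "s3": 0.6}},
    "s1": {"c": {"s2": 0.6, "s4": 0.4}, "d": {"s2": 0.4, "s4": 0.6}},
}

def reach_s2(sigma):
    """Probability of eventually reaching s2 from s0 under strategy sigma."""
    to_s1 = sum(p * P["s0"][a].get("s1", 0.0) for a, p in sigma["s0"].items())
    to_s2 = sum(p * P["s1"][a].get("s2", 0.0) for a, p in sigma["s1"].items())
    return to_s1 * to_s2

print(round(reach_s2({"s0": {"a": 1}, "s1": {"c": 1}}), 10))  # 0.36, violates phi
print(round(reach_s2({"s0": {"a": 0.5, "b": 0.5},
                      "s1": {"c": 0.5, "d": 0.5}}), 10))      # 0.25, violates phi
print(round(reach_s2({"s0": {"b": 1}, "s1": {"d": 1}}), 10))  # 0.16, satisfies phi
```

Only the last strategy stays below the threshold 0.21 of the specification φ = P_{≤0.21}(◇s2).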

III. CONCEPTUAL DESCRIPTION OF SHARED CONTROL

We now detail the general shared control concept adopted in this paper. Consider the setting in Fig. 3.

As inputs, we have a set of task specifications, a model M_r for the robot behavior, and a blending function b. The given robot task is described by certain performance and safety specifications φ1, . . . , φn. For example, it may not be safe to take the shortest route because there may be too many obstacles on that route. In order to satisfy performance considerations, the robot should prefer to take the shortest route possible while not violating the safety specifications.

We model the behavior of the robot inside a stochastic environment as an MDP M_r. In our setting, a human issues a set of commands for the robot to execute. It may be unrealistic that a human can grasp an MDP that models a realistic shared control scenario. Indeed, a human will likely have difficulties interpreting a large number of possibilities and the associated probabilities of paths and payoffs [11], and it may be impractical for the human to provide the human strategy to the autonomy protocol, due to the possibly large state space of the MDP. Therefore, we compute a human strategy σ_h as an abstraction of a sequence of the human's commands, which we obtain using inverse reinforcement learning [1], [28].

We design an autonomy protocol that provides another strategy σ_a, which we call the autonomy strategy. Then, we blend the two strategies according to the blending function b into the blended strategy σ_ha.

The blending function reflects the preference over the human strategy or the autonomy strategy. We ensure that the blended strategy deviates minimally from the human strategy. At runtime, we can then blend commands of the human with commands of the autonomy strategy. The resulting "blended" commands will induce the same behavior as the blended strategy σ_ha, and the specifications are satisfied. This procedure requires checking feasibility of a number of linear programming problems, which can be expensive, but it is very efficient to implement at run time. The shared control synthesis problem is then the synthesis of the repaired strategy such that, for the repaired strategy σ_ha, it holds that σ_ha |= φ1, . .

. . , φn while deviating minimally from σ_h. The deviation between the human strategy σ_h and the repaired strategy σ_ha is measured by the maximal difference between the two strategies.
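The blending function b is kept abstract in this section. One common instantiation in the shared-control literature, sketched below purely as an illustration (the function names and the example weight b(s) = 0.3 are our own assumptions, not taken from the paper), is state-wise linear blending. It also shows why randomized commands make blending well-defined for the "up" versus "right" example from the introduction.

```python
def blend(sigma_h, sigma_a, b):
    """sigma_ha(s, a) = b(s) * sigma_a(s, a) + (1 - b(s)) * sigma_h(s, a).

    b(s) in [0, 1] is the weight placed on the autonomy strategy at state s;
    b(s) = 1 gives full authority to the autonomy protocol and b(s) = 0 to
    the human, so switching authority is a special case of blending."""
    blended = {}
    for s in sigma_h:
        acts = set(sigma_h[s]) | set(sigma_a[s])
        blended[s] = {a: b(s) * sigma_a[s].get(a, 0.0)
                         + (1 - b(s)) * sigma_h[s].get(a, 0.0)
                      for a in acts}
    return blended

# Two deterministic commands that cannot be merged into a single
# deterministic command blend into a well-defined randomized command.
sigma_h = {"s0": {"up": 1.0}}     # human command: "up"
sigma_a = {"s0": {"right": 1.0}}  # autonomy command: "right"
print(sorted(blend(sigma_h, sigma_a, lambda s: 0.3)["s0"].items()))
# [('right', 0.3), ('up', 0.7)]
```

Since each state-wise result is a convex combination of two distributions over actions, the blended object is again a randomized strategy.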