iMorphia – Navigation

At the last workshop, a number of participants expressed the desire to be able to enter into the virtual scene. This would be difficult in the 2D environment of PopUpPlay but entirely feasible with iMorphia, implemented in the 3D games engine Unity.

Frank Abbott, one of the participants, suggested the idea of architectural landscape navigation, with a guide acting as a storyteller, and proposed that the short story “The Domain of Arnheim” by Edgar Allan Poe might be inspirational in developing navigation within iMorphia.

The discussion continued with recollections of the effectiveness of early narrative- and navigation-driven computer games such as “Myst”.

Steve Dixon in “Digital Performance” suggests four types of performative interaction with technology (Dixon, 2007, p. 563):

  1. Navigation
  2. Participation
  3. Conversation
  4. Collaboration

The categories are ordered in terms of complexity and depth of interaction, 1 being the simplest and 4 the most complex. Navigation is where the performer steers through the content; this might be spatial, as in a video game, or via hyperlinks. Participation is where the performer undergoes an exchange with the medium. Conversation is where the performer and the medium engage in a back-and-forth dialogue. Collaboration is where participants and media interact to produce surprising outcomes, as in improvisation.

It was with these ideas in mind that I began investigating the possibility of realising performative navigation in iMorphia. First I added a three-dimensional landscape, ‘Tropical Paradise’, an asset supplied with an early version of Unity (v2.6, 2010).


Some work was required to fix shaders and scripts in order to make the asset run with the later version of Unity (v4.2, 2013) I was using.

I then began implementing control scripts that would enable a performer to navigate the landscape. The intention was to make navigation feel natural, enabling the unencumbered performer to move seamlessly from a conversational mode to a navigational one. Using the Kinect Extras package I explored combinations of spatial location, body movement, gesture and voice.

The following three videos document these developments. The first video demonstrates the use of gesture and spatial location; the second, body orientation combined with gesture and voice; and the third, voice and body orientation with additional animation to enhance the illusion that the character is walking rather than floating through the environment.

Video 1: Gesture Control

Gestures: left hand out = look left, right hand out = look right, hand away from body = move forwards, hand pulled in = move backwards, both hands down = stop.

Step left or right = pan left/right.
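The mapping above can be sketched in code. The following is a minimal, hypothetical classifier, not the actual iMorphia script (which was written against the Unity Kinect SDK); the joint names and all thresholds are illustrative guesses:

```python
# Hypothetical sketch of the Video 1 gesture-to-command mapping.
# Joints are (x, y, z) positions in metres in Kinect camera space:
# x grows to the skeleton's right, y upwards, z away from the sensor.

def classify_gesture(left_hand, right_hand, shoulder_centre):
    lx, ly, lz = left_hand
    rx, ry, rz = right_hand
    cx, cy, cz = shoulder_centre

    if ly < cy - 0.5 and ry < cy - 0.5:
        return "stop"            # both hands dropped well below the shoulders
    if lz < cz - 0.4 or rz < cz - 0.4:
        return "forward"         # a hand pushed away from the body
    if lx < cx - 0.5:
        return "look left"       # left hand held out to the side
    if rx > cx + 0.5:
        return "look right"      # right hand held out to the side
    if abs(lz - cz) < 0.1 or abs(rz - cz) < 0.1:
        return "backward"        # a hand pulled in towards the chest
    return "none"
```

Even in this toy form the conditions overlap at their boundaries, which hints at why gestures were sometimes misrecognised in practice.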

The use of gesture to control navigation proved problematic: it was actually very difficult to follow a path in the 3D world, and gestures were sometimes incorrectly recognised (or performed), resulting in navigational difficulties where a view gesture acted as a movement command or vice versa.

In addition, the front view of the character did not marry well with the character moving into the landscape.

Further scripting, together with upgrading the Kinect assets and Unity to v4.6, enabled the successful implementation of a combination of speech recognition, body and gesture control.

Video 2: Body Orientation, Gesture and Speech Control

Here the gesture of both hands out activates view control, where body orientation controls the view. This was far more successful than the previous version, and following a path proved much easier.

Separating the movement control out to voice activation (“forward”, “back”, “stop”) helped remove gestural confusion; however, voice recognition delays resulted in overshooting when one wanted to stop.
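One simple mitigation for this overshoot, sketched here as a speculation rather than something implemented in iMorphia, would be to rewind the avatar by the distance it covered during the recognition delay once “stop” finally arrives:

```python
def compensated_stop(position, speed, recognition_delay):
    """Pull the avatar back by the distance it covered while the
    speech recogniser was still processing the "stop" command.

    position          -- avatar's position along its heading (metres)
    speed             -- walking speed (metres per second)
    recognition_delay -- estimated recognition latency (seconds)
    """
    return position - speed * recognition_delay
```

At a walking speed of 1.5 m/s and a 0.4 s recognition delay the avatar would be stepped back 0.6 m; an abrupt correction like this would itself want smoothing over a few frames to avoid a visible jump.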

The rotation of the avatar to face the direction of movement produced a greater sense of believability that the character was moving through a landscape. The addition of a walking animation would enhance this further – this is demonstrated in the third video.

Video 3: Body orientation and Speech Control

The arms-out gesture felt a little contrived, so in the third demonstration video I added the voice command “look” to activate the change of view.

Realising the demonstrations took a surprising amount of work, with much time spent scripting and dealing with setbacks and pitfalls due to Unity crashes and compatibility issues between differing versions of assets and Unity. The Unity Kinect SDK and Kinect Extras assets proved invaluable in realising these demonstrations, whilst the Unity forums provided insight, support and help when working with quaternions, transforms, cameras, animations, game objects and the sharing of scripting variables. At some point in the future I intend to document the techniques I used to create the demonstrations.

There is much room for improvement and creating the demonstrations has led to speculation as to what an ideal form of performative interaction might be for navigational control.

For instance, a more natural alternative to voice control would be to recognise the dynamic gestures that correspond to walking forwards and backwards. According to the literature this is technically feasible, using for instance neural networks or Dynamic Time Warping, but these complex techniques are felt to be well beyond the scope of this research.
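Of the two techniques mentioned, Dynamic Time Warping is the simpler to illustrate: it aligns two gesture traces that unfold at different speeds and returns a distance, so a live joint-angle trace could be matched against a recorded “walking on the spot” template. A textbook sketch (the knee-angle framing is my own example, not something implemented here):

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences,
    e.g. knee-angle traces sampled from the Kinect skeleton."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # a[i-1] repeats
                                 cost[i][j - 1],       # b[j-1] repeats
                                 cost[i - 1][j - 1])   # step both
    return cost[n][m]
```

A live trace would then be classified as “walking forwards” if its distance to that template fell below a threshold, which still leaves the harder problems of segmenting the live stream and meeting real-time cost, the kind of complexity that puts the technique beyond this project's scope.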

The object here is not to produce fully working, robust solutions; instead the process of producing the demonstrations acts as a proof of concept, identifying the problems and issues associated with live performance, navigation and control. The enactment and performance to camera serves to test out theory through practice and raises further questions and challenges.

Further Questions

How might navigation work with two performers?

Is the landscape too open, and might it be better if constrained by fences, walls, etc.?

How might navigation differ between a large outside space and a smaller inside one, such as a room?

How might the landscape be used as a narrative device?

What are the differences between a gaming model of navigation, where the player(s) are generally seated, looking at a screen and using a mouse/keyboard/controller, and a theatrical model with free movement of one or more unencumbered performers on a stage?

What are the resulting problems and issues associated with navigation and the perspective of performers and audience?