FaceFX TTS Developer Guide

Welcome to the FaceFX TTS Developer Guide.  FaceFX has partnered with Cereproc to combine world-class text-to-speech technology with industry-leading facial animation technology.  This document has all of the information you need to get started developing cutting-edge talking character applications.

Developer ID

To get started, request a developer id by emailing info@oc3ent.com.  Each developer id starts with 10,000 credits (one character of text = one credit) and is refreshed to 10,000 once a month.  To purchase additional credits or to inquire about partnerships or bulk purchasing, please contact info@oc3ent.com.  To find out how many credits your developer id has remaining, check out the Developer Demo.

Sending a Request

Requests are sent to https://ttsanim.facefx.com/request.php or http://ttsanim.facefx.com/request.php

Send the following variables as URL variables or HTML form variables.  The voice and format variables are optional and fall back to their defaults.

*         developer_id   - Your developer id.  To keep it secure, only use form variables to send it, and only connect with https.

*         text  -  The text you want spoken.  Maximum 1024 characters.

*         voice  - The Cereproc voice you want to use.  Defaults to Sarah.  This also determines the language.  Valid voices are: Adam, Nathan, Isabella, Katherine, Hannah, Heather, Kirsty, Stuart, Sarah, William, Jack, Jess, Caitlin, Sue, Nicole, Dodo, Claire, Anne, Suzanne, Laura, Laia, Sara, Alex, Gudrun, Leopold, Lucia, Gabriel, Yuki, Laurent, Nuria, Giles, Lauren.  For more information, please check out the Cereproc Cloud Services Guide: https://www.cereproc.com/files/CereVoiceCloudGuide.pdf

*         format  - The audio format you want to use.  Defaults to ogg.  Acceptable values are ogg, wav, mp3.

A very simple (and insecure) request would look like:

http://ttsanim.facefx.com/request.php?text=hello&developer_id=YOUR_DEVELOPER_ID
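A more secure request sends the variables as form variables over https, as recommended for developer_id above.  The following is a minimal Python sketch of that idea, not an official client.  It assumes the server accepts a standard URL-encoded POST body for the form variables; the helper name build_request is hypothetical.  The request is only constructed here, not sent.

```python
import urllib.parse
import urllib.request

ENDPOINT = "https://ttsanim.facefx.com/request.php"
MAX_TEXT_LENGTH = 1024                 # limit stated in the guide
AUDIO_FORMATS = ("ogg", "wav", "mp3")  # formats listed in the guide


def build_request(developer_id, text, voice="Sarah", audio_format="ogg"):
    """Build (but do not send) a form-encoded POST request.

    Hypothetical helper: assumes the server reads form variables from a
    URL-encoded POST body.
    """
    if len(text) > MAX_TEXT_LENGTH:
        raise ValueError("text exceeds the 1024-character limit")
    if audio_format not in AUDIO_FORMATS:
        raise ValueError("format must be one of ogg, wav, mp3")

    form = {
        "developer_id": developer_id,
        "text": text,
        "voice": voice,
        "format": audio_format,
    }
    body = urllib.parse.urlencode(form).encode("utf-8")
    # Passing data makes this a POST, so the developer id stays out of the URL.
    return urllib.request.Request(ENDPOINT, data=body, method="POST")
```

To send the request, pass the returned object to urllib.request.urlopen and read the response body.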

Receiving a Response

Responses are returned in JSON format.  They look something like:

{ "audio":"http://link-to-audio.ogg", "anim":"http://link-to-animation.json" }

Responses can have the following elements:

*         audio – a link to the audio file that was generated

*         anim – a link to the animation that was generated, in JSON format (see below)

*         error – if present, the request was not successful and details of the error can be found here

*         animcontents – if present, this contains the contents of the animation JSON file referenced above.  It is not always present, but when it is available it can reduce your application’s latency by saving a second download.
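The elements above can be handled along the following lines.  This is a sketch under assumptions: the function name parse_response is hypothetical, and because the guide does not say whether animcontents arrives as an embedded JSON object or as a JSON-encoded string, the code accepts either.

```python
import json


def parse_response(body):
    """Return (audio_url, anim_url, anim_contents) from a response body.

    anim_contents is None when the server did not include animcontents,
    in which case the animation has to be downloaded from anim_url.
    """
    data = json.loads(body)

    # An "error" element means the request was not successful.
    if "error" in data:
        raise RuntimeError("TTS request failed: {}".format(data["error"]))

    anim_contents = data.get("animcontents")
    # Assumption: animcontents may be a JSON-encoded string; decode it if so.
    if isinstance(anim_contents, str):
        anim_contents = json.loads(anim_contents)

    return data.get("audio"), data.get("anim"), anim_contents
```

Checking animcontents first and falling back to the anim link only when it is missing avoids a second round trip whenever the server inlines the animation.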

Animation Data Format

The animation data contains a dictionary of animation curves.  Each dictionary key is a curve name, and each dictionary value is a flat array of floats holding that curve's keys.  The number of floats is always a multiple of 4, because each key consists of four values: (time, value, slope-in, slope-out).  Time is in seconds.  Values between keys should be evaluated with a Hermite function.

A simple animation JSON file would look like:

{ "curves": { "open":[0, 0, 0, 0, 1, 0.5, 0, 0], "ShCh":[2, 0, 0, 0] } }

This animation has two curves, open and ShCh.  The open curve has two keys: the first at time zero with a value of zero, the second at one second with a value of 0.5.  ShCh has a single key at time 2 seconds with a value of zero.  All slopes in this example are 0.
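The format described above can be decoded and sampled roughly as follows.  This is a sketch, not the FaceFX implementation.  It assumes the "Hermite function" is the standard cubic Hermite basis, with slope-out of the left key and slope-in of the right key used as tangents in value per second, and it assumes a curve holds its first or last key value outside its time range.  The function names are hypothetical.

```python
import json
from bisect import bisect_right


def parse_curves(anim_json):
    """Group each curve's flat float array into 4-tuples:
    (time, value, slope_in, slope_out)."""
    curves = json.loads(anim_json)["curves"]
    parsed = {}
    for name, flat in curves.items():
        if len(flat) % 4 != 0:
            raise ValueError("curve {!r} is not a multiple of 4 floats".format(name))
        parsed[name] = [tuple(flat[i:i + 4]) for i in range(0, len(flat), 4)]
    return parsed


def evaluate(keys, t):
    """Sample a curve at time t (seconds) with cubic Hermite interpolation."""
    if not keys:
        return 0.0
    # Assumption: hold the end values outside the keyed time range.
    if t <= keys[0][0]:
        return keys[0][1]
    if t >= keys[-1][0]:
        return keys[-1][1]

    # Find the segment [t0, t1) that contains t.
    times = [k[0] for k in keys]
    i = bisect_right(times, t) - 1
    t0, v0, _, slope_out = keys[i]
    t1, v1, slope_in, _ = keys[i + 1]

    dt = t1 - t0
    s = (t - t0) / dt  # normalized position in the segment, 0..1

    # Standard cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    return h00 * v0 + h10 * dt * slope_out + h01 * v1 + h11 * dt * slope_in


example = '{"curves": {"open": [0, 0, 0, 0, 1, 0.5, 0, 0], "ShCh": [2, 0, 0, 0]}}'
curves = parse_curves(example)
halfway = evaluate(curves["open"], 0.5)  # midpoint between the two open keys
```

With zero slopes on both keys, the open curve eases from 0 to 0.5, so the value halfway through the segment is the midpoint of the two key values.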

The curves that are output correspond to the targets in the FaceFX default character setup.  They include targets for moving the mouth, as well as head and eye rotations, blinks, squints, and eyebrow raises.  You can make any 3D character talk with these targets (including non-human characters).

Example Integration in Unity

An example implementation is available in either the Developer Demo or the Pandorabot Demo, which brings the character to life using Pandorabot technology.  You can download the source code for the demo to see how to communicate with the server, parse the animation results, and use them to drive a character in a Unity-based 3D application.