By inserting a proxy between Siri, running on the iPhone, and the Apple server, spoken user input can be converted to text that other programs can act on. The first application of this breakthrough is to control the setting of a thermostat by voice command. Update: Siri has now been used to start and control a car, to control a TV and to post tweets. New videos added.
It seems that Apple's Siri is too good to leave locked up inside an iPhone 4S. When I first heard that the Siri protocol had been decoded by Applidium I wasn't greatly interested, but I couldn't have been more wrong.
The problem that had to be solved is that Siri communicates with the server that does all of the clever stuff via a slightly modified, encrypted HTTPS connection that includes your iPhone's unique identifier. Once the protocol has been decoded, you can use this information to move the operation of Siri from your phone to another device. While this didn't seem particularly promising at first, it turns out to be a really worthwhile idea.
Ruby programmer Pete Lamonica has placed a proxy between Siri running on the iPhone and the Apple server. The result is that a user can talk to Siri and have what they say processed to text by the Apple server. The first application of this is to control the setting of a thermostat by voice command.
The approach is possible because Siri doesn't do any of the heavy lifting on the iPhone. All that happens on the phone is that the audio is recorded, encoded in Ogg Speex format and sent to the server. The server performs the voice recognition and returns the text complete with confidence ratings. Using the decoded protocol simply to move the whole procedure to another machine doesn't seem particularly exciting, but by placing a proxy between the iPhone and the server you can intercept the recognized speech and act on it. That is, the proxy passes the data from the iPhone to the Apple server but intercepts the reply. So when the user says "Siri increase the temperature 10 degrees", the proxy can pick up the key words and make the change to the thermostat. The proxy can also change the text that is finally delivered to Siri so that status data can be reported back to the user.
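The interception step can be sketched in a few lines of Python. This is a minimal illustration, not Pete Lamonica's Ruby code: it assumes the proxy has already decrypted the server's reply and extracted the candidate transcriptions as (text, confidence) pairs, and the Thermostat class, the trigger pattern and the reply wording are all hypothetical. The handler picks the most confident transcription, acts on it if it matches the temperature command and otherwise passes Apple's reply through untouched.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Thermostat:
    """Hypothetical stand-in for a networked thermostat the proxy can reach."""
    setpoint: int = 68

    def adjust(self, delta: int) -> int:
        self.setpoint += delta
        return self.setpoint


# Hypothetical trigger phrase, e.g. "increase the temperature 10 degrees".
TEMP_PATTERN = re.compile(
    r"\b(increase|raise|decrease|lower) the temperature (?:by )?(\d+) degrees?\b",
    re.IGNORECASE,
)


def best_candidate(candidates: List[Tuple[str, float]]) -> str:
    """Return the transcription with the highest confidence rating."""
    text, _confidence = max(candidates, key=lambda c: c[1])
    return text


def handle_reply(candidates: List[Tuple[str, float]],
                 original_reply: str,
                 thermostat: Thermostat) -> str:
    """Act on a recognized command, or pass Apple's reply through unchanged.

    Returns the text the proxy should deliver back to Siri on the phone.
    """
    text = best_candidate(candidates)
    match = TEMP_PATTERN.search(text)
    if match is None:
        return original_reply  # not a command we handle: forward the reply as-is
    verb, amount = match.group(1).lower(), int(match.group(2))
    delta = amount if verb in ("increase", "raise") else -amount
    new_setpoint = thermostat.adjust(delta)
    # Rewrite the reply so Siri reports the thermostat's new status.
    return f"The thermostat is now set to {new_setpoint} degrees."


thermostat = Thermostat(setpoint=68)
heard = [("Siri increase the temperature 10 degrees", 0.92),
         ("Siri increase the temperature ten degrees", 0.41)]
print(handle_reply(heard, "Sorry, I didn't get that.", thermostat))
# The thermostat is now set to 78 degrees.
```

Matching on a regular expression keeps the proxy simple, but it only catches phrasings the pattern anticipates; if the most confident transcription spelled out "ten degrees", for example, the request would fall through to Siri's normal reply.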
You can see it all in action in the following video:
Of course this isn't the end of the story as the proxy can be coded to do almost anything as long as the device being controlled is connected to it - and this is what the internet of things is supposed to be about. In the future you could be using Siri to control your microwave, TV or car.
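One way to let a single proxy control many devices is a small plugin registry: each plugin registers a trigger pattern and a handler that talks to its own device, and the proxy runs the first handler whose pattern matches the recognized text. The Python sketch below assumes this design; CommandRouter, listen_for and the TV and car handlers are hypothetical names chosen for illustration, and the handlers only update local state rather than driving real hardware.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# A handler receives the regex match and returns the text Siri should say.
Handler = Callable[[re.Match], str]


class CommandRouter:
    """Hypothetical plugin registry for a Siri-style proxy."""

    def __init__(self) -> None:
        self._plugins: List[Tuple[re.Pattern, Handler]] = []

    def listen_for(self, pattern: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for phrases matching `pattern`."""
        compiled = re.compile(pattern, re.IGNORECASE)

        def register(handler: Handler) -> Handler:
            self._plugins.append((compiled, handler))
            return handler

        return register

    def dispatch(self, text: str) -> Optional[str]:
        """Run the first matching plugin; None means forward Apple's reply."""
        for pattern, handler in self._plugins:
            match = pattern.search(text)
            if match:
                return handler(match)
        return None


router = CommandRouter()
tv_state: Dict[str, int] = {"channel": 1}  # stand-in for a networked TV


@router.listen_for(r"\bchange the channel to (\d+)\b")
def tv_channel(match: re.Match) -> str:
    tv_state["channel"] = int(match.group(1))
    return f"The TV is now on channel {tv_state['channel']}."


@router.listen_for(r"\b(start|lock) my car\b")
def car_command(match: re.Match) -> str:
    # A real plugin would presumably send this command to the car itself.
    return f"OK, I'll {match.group(1).lower()} your car."


print(router.dispatch("Change the channel to 5"))   # The TV is now on channel 5.
print(router.dispatch("Start my car"))              # OK, I'll start your car.
print(router.dispatch("What's the weather like?"))  # None
```

Returning None for unmatched text lets the proxy fall back to forwarding Apple's original reply, so ordinary Siri queries keep working alongside the plugins.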
Since the above was written a few days ago, someone has indeed set up a similar arrangement and used it to start and control a car. This works in the same general way, via a proxy, but in this case using a PHP script. The car responds to commands like "Start my car", "Lock my car" and so on. See the video:
The original proxy server has now been made available on GitHub, and it is being rapidly adopted by other projects creating Siri add-ins. So far there are four such projects: a hockey score reporter, a voice tweet poster, a Dreambox controller and a Plex TV controller. You can see a short video of each one below:
Because the system works through a proxy, you don't have to jailbreak the phone, and Apple really shouldn't notice that anything different is happening. Not only this, but it is difficult to see how the protocol could be changed to stop it from being hijacked. It would require the next level up in secure encryption, such that the proxy couldn't read the data. This isn't an easy task, but clearly Apple isn't going to want to provide the computing power to allow everyone to voice-control their thermostats and all the other things that are being worked on.