
#tweet2voice: Sociolinguistic Experiment on Tweets

The origin of the idea

In the middle of the night, I had an idea: Twitter users could include a special hashtag to have their tweets read out loud by strangers. While it started merely as a novelty, I believe it could offer an interesting way to transform a vast text feed into human speech and add a layer of information (such as emotional nuance) that was not present before. Before I describe the idea further, here is the thought process I followed as I was building this.

Role of speech

Traditionally, speech serves to convey information and express social relationships. Its functions are outlined below:
  1. Expressive - express speaker's feelings
  2. Directive - get others to do things
  3. Referential - provide information
  4. Metalinguistic - comment on language itself
  5. Poetic - aesthetic language
  6. Phatic - language for solidarity and empathy

Where the inspiration came from

This experiment was partially inspired by a project I worked on called Audil, an environmental system for the visually impaired. One of the frustrations we empathized with is that blind people have virtually no choice in how information is delivered to them. For example, they use computer speech synthesis software frequently throughout the day to absorb information and interact with the world. However, this technology also creates a social divide between the visually impaired and sighted people. We felt that we could design technology in a way that brings people together rather than simply substituting technology for human presence.

Another inspiration came from an app called Umano. It's essentially an audiobook player for blogs, and it's very useful when your hands are occupied with complex tasks such as driving. Umano differs from its competitor, SoundGecko, which uses server-side text-to-speech software to read articles and documents. Instead, Umano relies on professional voice actors and announcers to read the articles out loud. In terms of listening experience, today's computer algorithms still cannot compete with a human's ability to fine-tune tone, speed and pitch to make the content more engaging to our brains.

How it works
  1. An Amazon Mechanical Turk worker reads the instructions below.
  2. The worker opens a Google Spreadsheet containing the latest tweets with the hashtag #tweet2voice.
  3. The worker then calls a toll-free number (VoIP) and reads the tweet out loud.
  4. Line2 sends a voicemail notification email with an MP3 attachment.
  5. IFTTT identifies the email with the attachment, places the MP3 into a Dropbox folder and then uploads the MP3 to SoundCloud and Tumblr.
  6. The admin tweets the SoundCloud link to the original Twitter user.
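
The steps above were carried out by hand and with off-the-shelf services, not custom code. Purely as an illustration, the two bookkeeping ends of the pipeline (picking tagged tweets for the spreadsheet and composing the reply to the author) could be sketched in Python as follows. The `Tweet` class, the function names, the three-column row layout and the reply wording are all assumptions made for this sketch; none of them come from the original setup.

```python
from dataclasses import dataclass

HASHTAG = "#tweet2voice"


@dataclass
class Tweet:
    """Hypothetical stand-in for a tweet; not a real Twitter API object."""
    user: str  # handle without the leading "@"
    text: str


def has_hashtag(tweet: Tweet) -> bool:
    """Return True if the tweet contains the hashtag as a whole word (case-insensitive)."""
    words = tweet.text.lower().split()
    return HASHTAG in (w.rstrip(".,!?") for w in words)


def build_rows(tweets):
    """Turn tagged tweets into spreadsheet rows.

    Assumed layout: [statement, read yet ("No"/"Yes"), date & time submitted].
    """
    return [[t.text, "No", ""] for t in tweets if has_hashtag(t)]


def reply_text(tweet: Tweet, soundcloud_url: str) -> str:
    """Compose the reply the admin tweets back to the original author (wording is made up)."""
    return f"@{tweet.user} A stranger read your tweet out loud: {soundcloud_url}"
```

In this sketch, every new row starts with "No" in the second column, which matches the column the worker instructions ask to be changed to "Yes" after the voicemail has been left.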

Instructions for Amazon Mechanical Turk

Summary: You will be calling a toll-free number and reading a statement out loud for the voicemail.
  1. Go to this link.
  2. Find a statement next to "No."
  3. Call the toll-free number 888-707-2925.
  4. When the voicemail beeps, begin reading the statement out loud. Please be expressive when speaking. You can simply read, exaggerate a bit or be emotional, angry, happy, funny, weird, etc.
  5. When completed, replace "No" with "Yes" next to the statement you just spoke.
  6. Insert the current date and time (in Pacific Standard Time) under "Date & Time Submitted."
  7. Finally, check the box below and submit.
I have called the number and left a voicemail according to the instructions.

Testing HumanWare Victor Reader for Visually Impaired

During an interview with a person with visual impairment, I got to play with his nifty device called Victor Reader. Essentially, it’s a glorified MP3 player without a screen that reads its menu and content aloud (audio books, ebooks, DAISY books, text files, notes, music, etc.). This particular model also records voice notes, while the newer model offers streaming radio over Wi-Fi and improved speech output.

Notice that the buttons are shaped differently according to their functions, and some even have a tactile or braille texture on their surface.

The behavior of a button can go beyond its physical shape. For example, the LG TV remote uses a subtle affordance that helps users know which button they have pressed without looking at the remote: each directional button generates a unique pitch. I discovered this when I pressed the buttons repeatedly and realized that the “click” sounds varied slightly. In fact, the up, down, left and right buttons all produce slightly different pitches, generated by the mechanical buttons/membranes. Furthermore, the volume and channel buttons have slightly different contours that help users feel where their fingers are placed.

As you can see, the pitch that individual buttons generate is unique.

Left Button

Down Button

Right Button

Up Button

Enter Button
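
For readers who want to check this kind of observation on their own recordings, one hypothetical approach is to estimate the dominant frequency of each click and compare the results. The sketch below is not how the comparison above was made; it uses a naive discrete Fourier transform in plain Python, and `synthetic_click` is only a made-up stand-in (a decaying sine burst) for a real recording. The sample rate, burst length and function names are all assumptions.

```python
import cmath
import math


def dominant_frequency(samples, sample_rate):
    """Return the frequency (Hz) of the strongest DFT bin, skipping the DC bin."""
    n = len(samples)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2 + 1):
        # Naive DFT for bin k: O(n) per bin, fine for short clicks.
        acc = sum(
            s * cmath.exp(-2j * math.pi * k * i / n)
            for i, s in enumerate(samples)
        )
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n


def synthetic_click(freq, sample_rate=8000, n=400):
    """Made-up stand-in for a recorded click: a decaying sine burst at freq Hz."""
    return [
        math.exp(-5 * i / n) * math.sin(2 * math.pi * freq * i / sample_rate)
        for i in range(n)
    ]


# Two hypothetical buttons whose clicks differ in pitch.
left = dominant_frequency(synthetic_click(1000), 8000)
right = dominant_frequency(synthetic_click(1200), 8000)
print(left, right)
```

With 400 samples at 8000 Hz, the frequency resolution is 20 Hz per bin, so two clicks would need to differ by at least that much to be told apart by this method. Real recordings are noisier than the synthetic bursts used here, and a library FFT would be faster for longer clips.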

Deliberate Use of Female Computer Voice in Tech

While I was researching voice interfaces and how to design one properly, I stumbled upon this article.

While the story about why users preferred female voices was certainly interesting, I was quite intrigued by what Rebecca Zorach, the Director of the Social Media Project at the University of Chicago's Center for the Study of Gender and Sexuality, said in the article.

"What's interesting to me is how they seem to intentionally make her [Siri’s] speech sound artificial -- they could choose to make her speech more seamless and human-like, but they choose instead to highlight the technology," she said. "That makes you aware of how high-tech your gadget is."

via CNN