Text-To-Speech – Software Comparison

Text-To-Speech – Software as a Service (SaaS)

Introduction

Welcome to the first of a few informative articles, which we will be sharing over the coming months, helping people understand digital accessibility, assistive technology, and the resources that are currently available on the web.

Here at the Digital Accessibility Centre, we have many users with many differing disabilities, each with their own unique perspective on using assistive technology and accessing digital products through the PC, via a mobile device (phone or tablet), from a gaming device and on a Digital TV platform.

In this article we examine various text-to-speech technologies, more specifically SaaS (Software as a Service). The software is purchased and installed by the website developer, at no cost to the end user and with no download required; these details have interested many of the people we have spoken to over the years.

We wanted to look at software which does not require a download, for ease of use, and see how straightforward it is for a user to access the text on a web page. This also meant being able to view the page with colour changes to both text and background, referred to as synchronised highlighting.

These tests were carried out with the latest popular browsers: on PCs through Internet Explorer 9 and Firefox 19, on a Mac through Safari 5.1.7 and Firefox 19, and on the latest version of iOS (6.0.2), using its standard interface with no accessibility options active.

It is important to note that, whilst there are quite a few text-to-speech packages available, we decided to review the five products we come across most frequently when testing clients’ sites. These products also offered several unique additional functions which, we hope, can be revisited in a future article.

There are many criteria that could be used for comparison when testing text-to-speech programs, but I tried not to make this too technical from the outset. I hope I succeeded.

PC Performance in Internet Explorer (IE) and Firefox (FF)

First up was BrowseAloud Plus

Once activated, the program appeared in the top-right corner of our screen. When using it for the first time on a larger screen, users were not immediately aware of its presence, especially those with lower vision. In Firefox, BrowseAloud Plus offered exactly the same functionality; however, on activation it also provided an animation that travelled from the image link to the top-right corner, giving the user an indication of where the toolbar had appeared on-screen. With browser zoom in use, this proved even more important for inexperienced users.

Additionally, when activated in Internet Explorer the icon disappeared for 2-3 seconds, reappeared in the same spot, and a duplicate appeared in the top right corner of the browser. Unusually, both options could be clicked on to change the state of this software, but the first image would then disappear. Our users found this very disorientating. One of our users also noted that their choice in using this product did not appear to be applied to every page visited during their time on the website. In almost every instance the user had to re-initiate the service.

BrowseAloud Plus Activation Icon for IE9

BrowseAloud have made an impressive leap forward recently with their voice technology, introducing a new Nuance Vocaliser Expression range. This has made a stark difference to the product’s diction and flow. Previously, listening to a large paragraph of 5-6 lines produced several very noticeable and unusual inflections, which affected the reading flow with occasional synthetic resonance. It appears that this latest upgrade has now rectified the issue, a clear benefit for the end user. There were some minor pronunciation issues, but these are issues that a website manager or developer, in control of the software, would need to consider adjusting.

The start-up default offered users a mouse “on-hover” option, which worked well. Although the product’s response time is purported to be 250 ms from mouse-hover to speech, in our test environment the area of text surrounding a motionless mouse pointer was highlighted in less than 1 second on average, and speech started approximately 1 second after the highlighting. Both the sentence and each word being read received a different contrast highlight for background and text (dual-colour highlighting). This feature appears to be available in the majority of the software we looked at, and was very strong in this case.

It is fairly likely that companies which release products such as these have substantial research to back up their software development decisions. In the case of BrowseAloud I am aware that substantial user feedback is also incorporated. It could be argued, however, that this assumes every user will be happy with these preselected colours: there were no options to help the user customise their experience; in fact, there were no options for the user to do anything additional.

One of our users also mentioned a preference for a change of state always to be visible when pressing what appears to be a button. The power and play buttons were effective in this regard; however, the pause and stop buttons, shown in the image below, barely indicated that any interaction had occurred when selected. A very small movement occurred on each mouse-click, but for all intents and purposes it was undetectable. Furthermore, our users stated that they expected these buttons to “grey out” when selected, with a highlight placed on the play button, which could then be used to resume reading.

BrowseAloud Plus Toolbar

BrowseAloud Floating toolbar

On start-up the toolbar is set as shown above, with mouse-hover in operation, which appears to be the sensible option; the default mouse-hover technique read content without issue. Accessing the “power” button highlighted both the power and play buttons in green, which allowed us to select areas of content and have them read back. Our users found that an incomplete section of selected text could not be read out on its own: selecting part of a sentence and pressing play always read back the whole sentence up to the next full stop, forcing the user to listen to whole chunks of text unnecessarily.

Next up was ReadSpeaker

ReadSpeaker was clean, simple, and straightforward to use; the toolbar was nicely located at the start of each page’s main content, just below the breadcrumb trail. Users were able to click anywhere on the bar to begin read-back immediately from the top of the main content, without requiring any further mouse interaction.

The default contrast was a pleasant surprise for dual-colour highlighting; it was not as strong as the BrowseAloud default but offered a nice selection of alternative synchronised highlight contrasts together with options for word highlighting, sentence highlighting, or no highlighting at all. We were also able to change the speed of reading.

ReadSpeaker Toolbar Closed

Fixed and floating toolbar - closed

ReadSpeaker Toolbar Open

Fixed and floating toolbar - open

The software allowed users to move through a page’s playback with immediate effect, as with an online video, and to download any selected text as an MP3 file for later listening. Mid-sentence selections of content could be read out, instead of the user having to listen to whole “chunks” of text, and these could also be downloaded as MP3 files via the pop-up selector, which our users were also very happy with. ReadSpeaker worked just as well in Firefox. It had very nice diction and flow, and was clearly a front-runner for natural voice quality.

Dixerit Plus was third on our list

It worked well on both Internet Explorer and Firefox on a PC. The program appeared as a bar across the top of every page, accessed by an image link at the top of the page. The placement of the image link in this scenario made it far easier to notice where the new toolbar had appeared.

Dixerit Activation Icon

Image Link

Clicking ‘Read Text’ started the reader, which read from the start of the main content; no highlighting options were offered without a download. The only other options present changed the toolbar’s icon size and the voice speed, and there were a few lines of text describing how to change the webpage ‘Text size’ via the menus in both IE and Firefox. The ‘read links’ icon and ‘jump to link’ edit box are a feature unique to this program: it read out the links only, and we could then enter the desired link number in the edit field to access it. The service also offered a slider to move back and forth through the audio file which, after some frustration, our users learned was not an interactive slider. Only clicking with the mouse on either side of it pulled the bar one way or the other, an unorthodox method.

TTS section of fixed toolbar

Download Services section of fixed toolbar

Mouse-hover pop-up

Hovering over an area of text introduced the pop-up button, but users were unable to select part of a sentence. It was also noted that the voice was unpleasant to listen to; it had poor diction and flow. Other features (Magnifier, Contrast, and WebRead) were offered through toolbar icons, but only on download.

Rok-Talk was our penultimate service

Rok-Talk offered a unique way of running the program. The icon appeared as a switch, turning the toolbar on and off at the bottom of the screen. Just as Dixerit’s icon sat at the top of the webpage, close to its toolbar, this icon was placed at the bottom of the page, in the footer, just above the area where the toolbar would appear. Our users had no trouble noticing the toolbar once it was turned on.

Rok-Talk Activation Icon - not active Rok-Talk Activation Icon - active

Toolbar not active Toolbar active

Nothing was active by default for this service. Rok-Talk offers mouse-hover text highlighting with read-back, and a select-to-speak option; users could not simply press a play-type button and hear the page read out. Additionally, we noted the lack of a pop-up option on selected text: the user had to return to the toolbar to start and stop playback. This should have made use consistent and straightforward on all devices, platforms, and browsers, but this was not the case (see later sections for Mac and mobile feedback).

The mouse-hover option also includes a contrast highlight change. Although there are also ten different contrast style sheets available (including the default), the text highlight on-hover is always the same combination. It is worth noting that the style sheets offer a very diverse and effective level of immediate customisation for the whole page.

Standard Rok-Talk Toolbar

This service also offers MP3 downloads of the page’s audio, and translation with captioning, as shown in the following image.

Rok-Talk captioned play-back

Captioned play-back

Unfortunately, when using the Text Size option through the toolbar, a horizontal scroll bar appeared in Internet Explorer only, which in itself is not best practice. Moreover, if it was used to scroll the screen horizontally the toolbar disappeared, although mouse-over showed that it was still there on the page. We also found that, while audio was playing back, any mouse movement stopped the manual speech option in mid-speech. There were no issues within Firefox on the PC.

The last service up for review was Recite

The interface was accessed through a link at the top of the page, as with Dixerit Plus, and the toolbar opened across the top of the page directly above it. Recite was highly customisable, offering not only 15 pre-set style sheets but also any colour of the spectrum through full palettes, specifically targeting dual-colour highlighting for background, normal text, and text links.

Standard Recite Toolbar

On start-up, the default was set to mouse-hover, which read back whole areas of text only; this also produced a pop-up allowing control over the audio’s play-back. The Play, FF, and FR buttons allowed the user to move through a page as a keyboard-only user would with the tab key, with dual-colour highlighting applied as we moved from one item to the next. The play button allowed the user to replay the current item.

Recite pop-up with play-back controls

Pop-up with play-back controls

Several comments were received regarding the icons used. The video-type controls were an issue because, without instructions, our users had to figure out how to use the product as best they could: trying the play button several times in combination with selecting text, then realising that the mouse-hover option could not be switched off. One user gave up trying to use these buttons. It was also noted that audio-to-MP3 download was available, which was useful for taking the web content away and listening without an Internet connection.

Other features that did not require a download included a converter to a text-only page, a dictionary with an A-Z icon which one user felt was a little ambiguous, and a font change option with text resizing. One user, relatively new to accessibility, found the Ruler to be very useful in helping him read ‘chunks’ of text. Again, there were no issues using this product in Firefox.

Mac Performance (Safari and Firefox)

There were not many differences between use on a PC and on an iMac. Opening animations for BrowseAloud Plus were present in both Safari and Firefox. ReadSpeaker was as consistent and seamless with both browsers as it was on the PC, as were Dixerit Plus and Recite.

Rok-Talk was the only product to have an issue on the iMac, and it was a big one. In Safari we were unable to produce any audio of the webpage using this service. No content could be read out. We tried clearing the cache and restarting the computer, but reproduced the same issue ten times out of ten consecutively after restart.

On Mobile

Before looking at this section, we have to concede that, because of the hover-and-read and select-to-read nature of TTS software services, and their relatively recent introduction to the mobile market, there is an inherent problem in trying to duplicate the same access on a touchscreen device. Our users were unable to find an official equivalent action for, for example, reading a link without actually activating it. With this in mind, the following results were reached.

BrowseAloud Plus read the page well. Content could be selected and read, including link text, and the floating bar could be moved around the screen at will, although this was not always the case. On smaller devices the bar could not always be moved easily; our users had to zoom in to increase its size and then move it around the screen. Accessing it at a relatively small size sometimes meant the user would activate the image link next to the close button by accident, or would select content or activate links slightly behind the top of the toolbar.

At times the interface did not handle the updating of its position on-screen very well either. On occasion, when trying to read text-only content, we had two toolbars present on the page, one of which was visually present but had no functionality.

Apart from the slightly different look, the Android device offered a similar experience to the iPhone. The interface did not seem to produce duplicated toolbars when moving and re-sizing the screen, but the same issue applied when moving the toolbar around the page: either the user activated the image link next to the close button by accident, or they selected or activated content slightly behind the top of the toolbar.

ReadSpeaker worked very well. Although the touchscreen interface is not completely compatible with text-to-speech techniques for reading link content, as with BrowseAloud, areas of content including link text could be selected and read back to our users; the pop-up appeared next to the selected content and was easy to activate. Our users were impressed with this feature; they also liked that the software interface duplicated itself in its entirety when selecting a text area. It followed them around the screen, and they could zoom and move the page at will without worrying about finding it again.

On Android, the main interface worked as smoothly as on the PC, Mac and iOS; however, we were not able to obtain the pop-up from specific text areas. Our users either got no response, or held for too long and simply selected the text in the normal way.

Dixerit also appeared to work well on a mobile device. A single-finger touch gesture brought up the pop-up feature, allowing the underlying area of text to be read out, although the user could not affect the area of content being read. Due to the underlying issues with a touchscreen interface it had to be a quick touch, as the pop-up would not appear for selected text; again, link text could not be read without activation.

The Android experience was a little better. We were able to use the default text reader without issue; we were also able to produce the pop-up efficiently for an area of text (including link text).

Rok-Talk was, unfortunately, not very compatible with mobile devices. We were unable to make a text selection read back to us in iOS or Android. Resizing the page with the pinch-to-zoom gesture also highlighted a problem with the toolbar. It would not stay at the bottom of the screen; instead it travelled up and down while our users tried scrolling, obscuring page content, especially if the page was scrolled too quickly.

In the Android environment the toolbar disappeared off the bottom of the screen and did not return until the zoom was almost back to normal.

Recite was comparable to Dixerit in terms of compatibility and capability. It worked fine on non-link content; the same quick one-touch gesture opened a pop-up which could be used to read back text. However, it unexpectedly disabled the pinch-to-zoom gesture. Before cold-booting the device to duplicate this problem, we opened two other random websites, each in its own tab, and were able to use the zoom feature with ease. Although the toolbar offers the user an option to increase text size in the browser, they are unable to magnify the screen for any other content if required.

Additionally, the Android device allowed our users a limited amount of re-sizing through the zoom option but, more importantly, the video-type controls were not present on the Android mobile version of the toolbar, which relied on the touch-to-speak option that we were unable to access.

                        BrowseAloud Plus   ReadSpeaker   Dixerit Plus   Rok-Talk   Recite
Features
Ease of use             5*                 5*            5*             3*         4*
Voice Clarity           5*                 5*            2*             4*         3*
Default Text Contrast   5*                 5*            1*             2*         3*
Contrast Options        3*                 4*            3*             4*         5*
Compatibility
iOS                     3*                 4*            4*             2*         4*
Android                 3*                 4*            5*             1*         1*
IE9 (PC)                3*                 5*            5*             3*         5*
FF19 (PC)               4*                 5*            5*             5*         5*
Safari (Mac)            4*                 5*            5*             1*         5*
FF (Mac)                4*                 5*            5*             5*         5*

Evaluation

To summarise, our users generally agreed that there was a certain level of frustration when controls did not perform their presumed functions properly or were not straightforward enough to pick up quickly. In most cases, instructions at the time and place of initiation, or direct links to instructions, were not present to help with first-time use in particular. Similarly, it was thought that, in some cases, information about the product’s capabilities may have been available to read, but quick access to simple instructions had not been deemed necessary. Rok-Talk was the only service with a direct link to a user manual on their website.

In conclusion, there are several good web services that go above and beyond in terms of features in the text-to-speech arena. In terms of straightforward use on both mobile and desktop devices for converting text to speech, synchronised highlighting, good diction and flow of speech synthesis, and options for changing text to enhance readability, both ReadSpeaker and Recite seem to offer the broadest features, compatibility, and user-centric capabilities. However, with recent upgrades such as textHELP’s BrowseAloud Plus and more releases on the horizon, including a known development commitment from ReadSpeaker, we can see some really great products, both available now and in planning, that can deliver a high quality of service and easily manage the user’s needs and expectations.