Rearranging The Deckchairs

Frank O'Dwyer's blog

Word on the Street Beta Test Review

My Word on the Street iPhone app is now in the queue for approval into the Apple app store (just over 1 week and counting). Meanwhile I’ve taken the opportunity to write up some experiences and stats from the alpha/beta period, which started around the end of April this year. This is mainly for my own benefit, but it may be of use to others who are looking to set up a beta test for a service-based app, so I’m sharing it here.

What the app does

The app allows people to leave notes wherever they are. You can also search for notes that other people have left nearby. You can rate locations, tag them, and share them with people, even if they don’t have an iPhone. You can also rate and edit other people’s entries. The idea is to share local knowledge not just about restaurants, bars, etc., but about absolutely anything. I’ve also added a feature which allows charities and non-profits who need volunteers at a physical location to post their requests, so that app users in the area can respond. If you know of any charity or non-profit that may be interested in this, please send them here.

Beta participation

About 80 users expressed an interest in testing and signed up via invitation and other venues, but mostly via ibetatest. Their devices were provisioned in the ‘ad hoc distribution’ system that Apple requires. Of these, only about half got as far as downloading the program and registering as users. Registration for an account is optional, so I don’t have stats on users who used the program without registering. However, I believe that almost everyone who got as far as installing the program also registered, since some features like writing and editing entries require an account.

I suspect that much of the gap between the number registering for the beta and the number getting as far as installing the program and registering for an account is down to the godawful ‘ad hoc distribution’ method Apple provides for distributing apps outside the app store. This is notoriously buggy and fussy, and I suspect many users got errors, or couldn’t follow the pretty convoluted steps, and simply gave up somewhere along the road. And who can blame them. Life is too short. Many of the users who did successfully install the program could only do so with some handholding from me. Some were able to install some versions and not others. Around the time OS 3.0 came out, my developer certificate had to be renewed, and some users were no longer able to install the beta. A nightmare.

I also had to turn some users away because I hit the Apple-imposed 100 device limit. Even though some users upgraded to the new iPhone 3GS and no longer needed their 3G device provisioned, deleting their old device did not free up a provisioning slot in Apple’s system - this is actually by design on Apple’s part! At least one user who upgraded to a 3GS was unable to participate further because of this. So in short, due to badly implemented restrictions imposed by Apple, I probably had less than 40% of the active testers I could have had, even sticking to a 100 device limit.
And of course, if the device limit were not so ridiculously low, or were not present at all, I could have had many more. Please Apple, come up with something better for this. Preventing abuse should not mean preventing use. Given this is a free app, the rationale for limiting to 100 devices - that it might be sold outside the app store - makes no sense here.

How active were users?

Of the users who signed up, some were more active than others. The graph below shows total users over time versus users who had connected within time windows of 1 to 8 weeks.

[Graph: Active vs Total registered users over the last year]

(This graph is set up to show a window of a year; however, the beta started around 4 months ago, so there is only 4 months of data so far.) It looks like somewhere between 25% and 50% of the users tried the program on a semi-regular basis, and at least half only tried it once or twice and then gave up. There is also a small spike in interest any time there is a new version of the app to test, but this diminishes over time. Also, though not shown on the graph, not everyone installs each new version, and a lot of returning users stuck with older versions. It is difficult to ascribe reasons why some users didn’t return much or at all, as it may simply have been that users didn’t like the app and lost interest. However, here are some other possible reasons:
  • Installing new versions via ad hoc provisioning is a pain for users, and it doesn’t always work. A proportion of users had trouble with some versions and not others, and some users upgraded to 3GS and couldn’t be provisioned. That 100 device limit again.
  • Initial versions of the app were more alpha than beta - many features were unfinished and buggy. Even though this was made pretty clear, probably some users gave up because of this.
  • Some users only had iPod touches - the program is usable but a lot less useful on these, not because these devices lack GPS, but because currently the app really needs on-the-move internet. Again, some users probably gave up because of this.
  • The program is less interesting until it has a lot of users, as ‘nearby’ entries tend to be your own. I’ve addressed this issue as much as I can - for example, the program will show entries that are far away if necessary - but still it is a basic limitation until there are more users.
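The windowed active-user counts in the graph above amount to a simple sweep over last-connection times. Here is a minimal sketch of that calculation - the user IDs, dates, and log layout are made up for illustration, not the app’s actual analytics:

```python
from datetime import datetime, timedelta

# Hypothetical connection log: (user_id, last_connected) pairs.
# Names and dates are invented, not real beta data.
connections = [
    ("alice", datetime(2009, 8, 10)),
    ("bob",   datetime(2009, 7, 1)),
    ("carol", datetime(2009, 5, 2)),
]

def active_users(log, as_of, window_weeks):
    """Count users whose last connection falls within the given window."""
    cutoff = as_of - timedelta(weeks=window_weeks)
    return sum(1 for _user, last_seen in log if last_seen >= cutoff)

as_of = datetime(2009, 8, 15)
total = len(connections)
for weeks in (1, 4, 8):
    print(f"{weeks}-week window: {active_users(connections, as_of, weeks)} of {total}")
```

Plotting that count for each window width against the running total of registrations gives the active-vs-total curves described above.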

Lesson: choose your testers carefully. Also bear in mind that ‘release early, release often’ may not work so well for an iPhone app, especially not within the 100 device limit imposed by Apple. The ‘release often’ part helps a little, but not that much. You can also see the spikes in tester signups on this graph.

How much data?

The graphs below show growth in the amount of data collected over the beta period. You can see that even though the number of users hit the maximum pretty early on, growth in the data itself has remained pretty steady, though it has tailed off recently with fewer users posting entries. This is pretty much what I expected, as once you have written about places near your usual haunts, the tendency is then to add entries only when you make an unusual trip. It is also sometimes difficult to add entries when abroad, as data roaming charges on iPhone are crazy, so you need to be able to get on free wifi, which currently limits what you can post about. I have ideas to address this in future versions.

[Graph: Growth over the last year]
[Graph: Growth over the last month]
[Graph: Growth over the last week]

Who wrote what?

The graph below shows the number of entries per user. Most users posted at least one entry. However, by far the most entries were posted by me :-) Aside from that, even within the small set of beta testers there is a clear Long Tail effect, with a small percentage of users accounting for almost all other entries. This is pretty much what I expected, and is yet another reason why Apple’s 100 device limit is a pain in the ass - you need an awful lot of users to get a lot of active posters.

[Graph: Entries vs User (all time, 1 or more entries)]
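The Long Tail claim above boils down to asking what share of all entries the most prolific users account for. A rough sketch of that check, using made-up per-user counts shaped like the distribution described (one prolific poster, a few moderate ones, many single entries) - the real beta numbers are not reproduced here:

```python
from collections import Counter

# Invented per-user entry counts, illustrative only.
entries_per_user = Counter({
    "me": 120, "u1": 15, "u2": 9, "u3": 4,
    "u4": 1, "u5": 1, "u6": 1, "u7": 1,
})

def top_share(counts, fraction):
    """Fraction of all entries contributed by the top `fraction` of users."""
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(len(ranked) * fraction))
    return sum(ranked[:k]) / sum(ranked)

print(f"top 25% of users wrote {top_share(entries_per_user, 0.25):.0%} of entries")
```

With a distribution like this, the top quarter of users account for the overwhelming majority of entries - which is why a 100-device cap leaves so few genuinely active posters.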

Who rated place descriptions?

I was surprised that few users rated the helpfulness of other people’s entries. The graph below shows who rated what. Again, most ratings were done by me - the data set is small enough so far that I was able to rate everything that other people posted. But only 10 users (25%) rated entries at all, and though there is not yet enough data to really say for sure, again there is a sign of a Long Tail effect, with a small percentage of users accounting for almost all ratings.

[Graph: Ratings vs User (all time, 1 or more ratings)]

Note, these ratings are only about the helpfulness of descriptions. This is separate from ratings of the places themselves, which are not shown on the above graph; I get into that later. In fact, slightly more users rated places than posted entries about them. Generally, anyone who posted an entry also rated the place, because the app generally requires it. Initially, some users tried to rate their own entries. Although I had planned to allow this, as I think it can be useful to know whether a user thinks their own entry is better or worse than their other entries, I wound up discarding these ratings and disabling them. This was because it was too much of a pain to filter out self-ratings when doing other calculations about entry helpfulness and user ‘karma’ etc., to prevent gaming of the system. However, I will probably reinstate this capability at some point.
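Discarding self-ratings, as described above, can be as simple as filtering on rater vs. author before any helpfulness or karma calculations run. A minimal sketch - the record fields and names are assumptions for illustration, not the app’s real schema:

```python
# Hypothetical rating records; field names are illustrative.
# Dropping self-ratings up front keeps later helpfulness and
# 'karma' calculations simple, at the cost of losing that signal.
ratings = [
    {"rater": "alice", "author": "bob",   "score": 4},
    {"rater": "bob",   "author": "bob",   "score": 5},  # self-rating, discarded
    {"rater": "carol", "author": "alice", "score": 3},
]

def drop_self_ratings(records):
    """Discard any rating a user gave to one of their own entries."""
    return [r for r in records if r["rater"] != r["author"]]

usable = drop_self_ratings(ratings)
```

Reinstating self-ratings later would mean carrying them through but excluding them at each aggregation point instead - the extra bookkeeping that made discarding them the easier option here.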

How helpful were the place descriptions?

With the caveat that not many people rated place descriptions, the graph below shows the breakdown of how helpful the descriptions were.

[Graph: Place description helpfulness]

You can see that most entries were not rated at all (this is because most people didn’t rate descriptions at all), but the ones that were rated were mostly at least “OK”. Only a handful were rated less than OK, or close to ‘didn’t help at all’. Of course it is difficult to draw any conclusions from this small sample, but it looks like most entries are pretty helpful overall. Most of the ones that didn’t help at all were people posting about their own home and rating it awesome :-) This is really pretty common, and I expect a lot of it once the app goes to a wider audience. Surprisingly, a handful of entries also had a completely incorrect location and were voted down because of that. These were posted using an iPod touch, which does not have a GPS and relies on WiFi triangulation and some kind of WiFi database to get a fix. It looks like for some areas, this database is completely broken. For example, some entries that were posted in Vietnam showed up on the map in Kansas! The iPhone reports an ‘accuracy’ estimate, but this too was completely broken for these entries. Again, I expect to see more of this once the app goes to a wider audience, and because of this I distinguish between entries posted using an iPod versus ones posted using a phone.
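Since the reported accuracy estimate can’t be trusted for WiFi-only fixes, one way to make the iPod-vs-iPhone distinction is to record the device class with each entry and flag the WiFi-only ones. A hypothetical sketch - the field names and values are illustrative, not what the app actually stores:

```python
# Hypothetical entry records; "device" and "accuracy_m" are invented
# field names. The accuracy value is kept for reference but is NOT
# used for filtering, since it can itself be wrong on WiFi-only fixes.
entries = [
    {"id": 1, "device": "iPhone",     "accuracy_m": 17},
    {"id": 2, "device": "iPod touch", "accuracy_m": 50},
]

def wifi_only_fix(entry):
    """True for devices that rely on WiFi triangulation for a location fix."""
    return entry["device"] == "iPod touch"

suspect_ids = [e["id"] for e in entries if wifi_only_fix(e)]
```

Flagged entries can then be displayed or weighted differently, rather than silently trusted alongside GPS-backed ones.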

How good were the places?

The graph below shows the breakdown of ratings of the places themselves, rounded to the nearest star on a 5-star scale - i.e. how good was the location/restaurant/bar etc., according to the poster and anyone else who rated it.

[Graph: Place rating]

Again, this is a small sample, so it is hard to draw conclusions. But interestingly, more places are rated better than just “OK” - perhaps because users are generally more motivated to write about places they like. Very few people rated a location as really bad, and no place was rated in the region of no stars at all. Far more people rated places as ‘very good’ or ‘awesome’ (5 stars)!