My Word on the Street iPhone app is now in the queue for approval into the Apple App Store (just over 1 week and counting). Meanwhile, I’ve taken the opportunity to write up some experiences and stats from the alpha/beta period, which started around the end of April this year. This is mainly for my own benefit, but it may be of use to others who are looking to set up a beta test for a service-based app, so I’m sharing it here.
What the app does
The app allows people to leave notes wherever they are. You can also search for notes that other people have left nearby. You can rate locations, tag them, and share them with people, even if they don’t have an iPhone. You can also rate and edit other people’s entries. The idea is to share local knowledge not just about restaurants, bars etc., but about absolutely anything. I’ve also added a feature that allows charities and non-profits who need volunteers at a physical location to post their requests, so that app users in the area can respond. If you know of any charity or non-profit that may be interested in this, please send them here.
Beta participation
About 80 users expressed an interest in testing and signed up, mostly via ibetatest.com, but also by invitation and through other channels. Their devices were provisioned in the ‘ad hoc distribution’ system that Apple requires. Of these, only about half got as far as downloading the program and registering as users. Registration for an account is optional, so I don’t have stats on users who used the program without registering. However, I believe that almost everyone who got as far as installing the program also registered, since some features, like writing and editing entries, require an account.
I suspect that much of the gap between the number who signed up for the beta and the number who went on to install the program and register is due to the godawful ‘ad hoc distribution’ method Apple provides for distributing apps outside the App Store. It is notoriously buggy and fussy, and I suspect many users got errors or couldn’t follow the pretty convoluted steps, and simply gave up somewhere along the road. And who can blame them. Life is too short. Many of the users who did successfully install the program could only do so with some handholding from me. Some were able to install some versions and not others. Around the time OS 3.0 came out, my developer certificate had to be renewed, and some users were no longer able to install the beta. A nightmare.
I also had to turn some users away because I hit the Apple-imposed 100 device limit. Even though some users upgraded to the new iPhone 3GS and no longer needed their 3G device provisioned, deleting their old device did not free up a provisioning slot in Apple’s system. This is actually by design on Apple’s part! At least one user who upgraded to the 3GS was unable to participate further because of this. So in short, due to badly implemented restrictions imposed by Apple, I probably had less than 40% of the active testers I could have had, even sticking to a 100 device limit.
And of course, if the device limit were not so ridiculously low, or not present at all, I could have had many more. Please, Apple, come up with something better for this. Preventing abuse should not mean preventing use. Given this is a free app, the rationale for the 100 device limit (that the app might otherwise be sold outside the App Store) makes no sense here.
How active were users?
Of the users who signed up, some were more active than others. The graph below shows total users over time, versus users who had connected within time windows of 1 to 8 weeks.
(The graph is set up to show a one-year window, but the beta started around 4 months ago, so there are only 4 months of data so far.)
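For anyone wanting to produce a similar chart: each point in the ‘connected within N weeks’ series is just a count of users whose most recent connection falls inside the window. Here is a minimal Python sketch, assuming a hypothetical `last_seen` mapping of user ID to last-connection time (the app’s real server code and schema aren’t described here). It computes the counts for a single moment `now`, so plotting the series over time would need a log of past connections rather than just the latest one. The user names and dates are made up.

```python
from datetime import datetime, timedelta


def active_within_windows(last_seen, now, max_weeks=8):
    """Return {weeks: number of users seen within that many weeks of `now`}.

    last_seen: hypothetical dict of user id -> datetime of most recent connection.
    """
    counts = {}
    for weeks in range(1, max_weeks + 1):
        cutoff = now - timedelta(weeks=weeks)
        counts[weeks] = sum(1 for ts in last_seen.values() if ts >= cutoff)
    return counts


# Made-up sample data.
now = datetime(2009, 8, 31)
last_seen = {
    "alice": now - timedelta(days=3),
    "bob": now - timedelta(days=20),
    "carol": now - timedelta(days=70),
}
print(active_within_windows(last_seen, now))
```

Total registered users would simply be `len(last_seen)`; the windowed counts sit below that line on the graph.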
It looks like somewhere between 25% and 50% of the users tried the program on a semi-regular basis, and at least half tried it only once or twice and then gave up. There is also a small spike in interest whenever there is a new version of the app to test, but this diminishes over time. Also, though it is not shown on the graph, not everyone installed each new version, and a lot of returning users stuck with older versions. It is difficult to say why some users returned rarely or not at all, as they may simply not have liked the app and lost interest. However, there are some other possible reasons:
- Installing new versions via ad hoc provisioning is a pain for users, and it doesn't always work. Some users had trouble with some versions and not others, and some users who upgraded to the 3GS couldn't be provisioned. That 100 device limit again.
- Initial versions of the app were more alpha than beta - many features were unfinished and buggy. Even though this was made pretty clear, some users probably gave up because of it.
- Some users only had iPod touches. The program is usable on these but a lot less useful, not because these devices lack GPS, but because the app currently needs internet access on the move. Again, some users probably gave up because of this.
- The program is less interesting until it has a lot of users, as ‘nearby’ entries tend to be your own. I’ve addressed this as much as I can (for example, the program will show entries that are far away if necessary), but it remains a basic limitation until there are more users.
Lesson: choose your testers carefully. Also bear in mind that ‘release early, release often’ may not work so well for an iPhone app, especially within Apple’s 100 device limit. The ‘release often’ part helps a little, but not that much. The graph above also shows the spike in tester signups due to ibetatest.com.
How much data?
The graphs below show growth in the amount of data collected over the beta period. Even though the number of users hit the maximum pretty early on, the data itself has grown fairly steadily, though growth has tailed off recently as fewer users post entries. This is pretty much what I expected: once you have written about places near your usual haunts, you tend to add entries only when you make an unusual trip. It is also sometimes difficult to add entries when abroad, as data roaming charges on the iPhone are crazy, so you need to find free wifi, which currently limits what you can post about. I have ideas to address this in future versions.

Who wrote what?
The graph below shows the number of entries per user. Most users posted at least one entry. However, by far the most entries were posted by me :-) Aside from that, even within the small set of beta testers there is a clear Long Tail effect, with a small percentage of users accounting for almost all other entries. This is pretty much what I expected, and it is yet another reason why Apple’s 100 device limit is a pain in the ass - you need an awful lot of users to get a lot of active posters.
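One rough way to put a number on this kind of concentration is to measure what share of entries the most prolific posters wrote. The sketch below is hypothetical: the function name, the `exclude` parameter for leaving out the developer’s own account, and the sample data are all made up for illustration, and it only counts users who posted at least one entry.

```python
from collections import Counter


def top_share(entry_authors, top_fraction=0.2, exclude=()):
    """Return the fraction of entries written by the top `top_fraction` of posters.

    entry_authors: one (hypothetical) author id per entry.
    exclude: author ids to leave out entirely, e.g. the developer's own account.
    Only users with at least one entry count as posters.
    """
    counts = Counter(a for a in entry_authors if a not in exclude)
    if not counts:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)  # most prolific first
    k = max(1, round(len(ranked) * top_fraction))  # size of the "head"
    return sum(ranked[:k]) / sum(ranked)


# Made-up data: the developer plus seven testers.
authors = ["dev"] * 50 + ["u1"] * 12 + ["u2"] * 6 + ["u3", "u4", "u5", "u6", "u7"]
print(top_share(authors, top_fraction=0.3, exclude={"dev"}))  # top 2 of 7 posters
```

The same function works unchanged for ratings, by passing one rater ID per rating instead of one author ID per entry.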

Who rated place descriptions?
I was surprised that few users rated the helpfulness of other people’s entries. The graph below shows who rated what. Again, most ratings were done by me; the data set is still small enough that I was able to rate everything that other people posted. But only 10 users (25%) rated entries at all, and though there is not yet enough data to say for sure, there are again signs of a Long Tail effect, with a small percentage of users accounting for almost all ratings.
Note that these ratings are only about the helpfulness of descriptions. They are separate from ratings of the places themselves, which are not shown on the above graph; I get into those later. In fact, slightly more users rated places than posted entries about them. Generally, anyone who posted an entry also rated the place, because the app requires it in most cases. Initially, some users tried to rate their own entries. I had planned to allow this, as I think it can be useful to know whether a user thinks one of their entries is better or worse than their others. However, I wound up discarding these ratings and disabling the feature, because it was too much of a pain to filter out self-ratings in the other calculations about entry helpfulness and user ‘karma’, which need to exclude them to prevent gaming of the system. I will probably reinstate this capability at some point.
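The filtering problem comes down to one extra check in every calculation that aggregates ratings. A minimal sketch, assuming hypothetical `entries` (entry ID to author) and `ratings` (entry, rater, score) structures rather than the app’s real schema:

```python
from statistics import mean


def helpfulness(entries, ratings):
    """Mean helpfulness score per entry, ignoring self-ratings.

    entries: dict of entry id -> author id (hypothetical schema)
    ratings: list of (entry id, rater id, score) tuples
    Returns a dict of entry id -> mean score, or None if no eligible ratings.
    """
    scores = {entry_id: [] for entry_id in entries}
    for entry_id, rater, score in ratings:
        if entry_id not in scores:
            continue  # rating for an unknown entry
        if entries[entry_id] == rater:
            continue  # self-rating: skip so authors can't boost their own entries
        scores[entry_id].append(score)
    return {e: (mean(s) if s else None) for e, s in scores.items()}


entries = {"e1": "alice", "e2": "bob"}
ratings = [("e1", "alice", 5), ("e1", "bob", 3), ("e1", "carol", 4), ("e2", "bob", 5)]
print(helpfulness(entries, ratings))
```

Any other aggregate, such as user karma, would need the same author-versus-rater check repeated, which is the kind of overhead that made disabling self-ratings the simpler option.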
How helpful were the place descriptions?
With the caveat that not many people rated place descriptions, the graph below shows the breakdown of how helpful the descriptions were.
Most entries were not rated at all (because most people didn’t rate descriptions), but of the ones that were, most were rated at least “OK”. Only a handful were rated below OK or close to ‘didn’t help at all’. Of course it is difficult to draw conclusions from such a small sample, but it looks like most entries are pretty helpful overall. Most of the ones that didn’t help at all were people posting about their own home and rating it awesome :-) This is really pretty common, and I expect to see a lot of it once the app reaches a wider audience.
Surprisingly, a handful of entries also had a completely incorrect location and were voted down because of that. These were posted using an iPod touch, which does not have GPS and relies on WiFi triangulation against some kind of WiFi database to get a fix. It looks like this database is completely broken for some areas. For example, some entries that were posted in Vietnam showed up on the map in Kansas! The device reports an ‘accuracy’ estimate, but this too was completely broken for these entries. Again, I expect to see more of this once the app reaches a wider audience, and because of this I distinguish between entries posted using an iPod and those posted using a phone.
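Since the reported accuracy can’t be trusted for these Wi-Fi-only fixes, distinguishing iPod and phone entries means keying on the device type rather than on accuracy alone. A minimal sketch, assuming each entry stores a hypothetical `device_model` string (something like the value of UIKit’s `UIDevice` `model` property, e.g. "iPod touch") and the accuracy reported with the fix in metres (as in Core Location’s `horizontalAccuracy`, where a negative value marks an invalid fix). The 1000 m threshold is an arbitrary assumption, and this is not how the app itself necessarily does it.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    entry_id: str
    device_model: str           # hypothetical field, e.g. "iPhone" or "iPod touch"
    reported_accuracy_m: float  # accuracy reported with the fix, in metres


def location_source(entry: Entry) -> str:
    """Guess how the location fix was obtained, from the device model alone."""
    if entry.device_model.startswith("iPod"):
        return "wifi"   # no GPS: Wi-Fi positioning only
    return "phone"      # cell/Wi-Fi positioning, plus GPS on the iPhone 3G and later


def needs_location_caveat(entry: Entry, max_accuracy_m: float = 1000.0) -> bool:
    """Flag entries whose map position should be treated with suspicion.

    Wi-Fi-only entries are always flagged, because their reported accuracy
    can be wrong too; phone entries are flagged when the reported accuracy
    is invalid (negative) or worse than max_accuracy_m (an arbitrary threshold).
    """
    if location_source(entry) == "wifi":
        return True
    return entry.reported_accuracy_m < 0 or entry.reported_accuracy_m > max_accuracy_m


for e in [Entry("a", "iPod touch", 50.0), Entry("b", "iPhone", 30.0), Entry("c", "iPhone", 5000.0)]:
    print(e.entry_id, location_source(e), needs_location_caveat(e))
```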
How good were the places?
The graph below shows the breakdown of ratings of the places themselves, rounded to the nearest star on the 5-star scale, i.e. how good the location, restaurant, bar etc. was according to the poster and anyone else who rated it.
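Rounding an average rating to the nearest star has one small trap in Python: the built-in `round()` rounds halves to even, so `round(2.5)` is 2. A minimal sketch of the bucketing behind a chart like this, assuming a hypothetical 0 to 5 star scale and a dict of ratings per place (the app’s actual calculation isn’t described here):

```python
import math
from collections import Counter


def star_bucket(avg_rating: float) -> int:
    """Round an average rating to the nearest whole star, with halves rounding up.

    Python's round() rounds halves to even (round(2.5) == 2), so
    floor(x + 0.5) is used instead; the result is clamped to 0..5.
    """
    return min(5, max(0, math.floor(avg_rating + 0.5)))


def place_histogram(place_ratings):
    """Count places per star bucket.

    place_ratings: dict of place id -> list of star ratings from the poster
    and anyone else who rated the place (hypothetical layout).
    Places with no ratings are skipped.
    """
    buckets = Counter()
    for ratings in place_ratings.values():
        if ratings:
            buckets[star_bucket(sum(ratings) / len(ratings))] += 1
    return {stars: buckets[stars] for stars in range(6)}


sample = {"p1": [5, 4], "p2": [3], "p3": [5, 5, 4], "p4": [2, 3]}
print(place_histogram(sample))
```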
Again, this is a small sample, so it is hard to draw conclusions. But interestingly, more places are rated better than just “OK”, perhaps because users are generally more motivated to write about places they like. Very few people rated a location as really bad, and no place was rated anywhere near zero stars. Far more people rated places as ‘very good’ or ‘awesome’ (5 stars)!