Happiness: it’s what most people say they want. So how do we know how happy people are? You can’t improve or understand what you can’t measure. In a blow to happiness, we’re very good at measuring economic indices, and this means we tend to focus on them. With hedonometer.org we’ve created an instrument that measures the happiness of large populations in real time.
Our hedonometer is based on people’s online expressions, capitalizing on data-rich social media, and we’re measuring how people present themselves to the outside world. For our first version of hedonometer.org, we’re using Twitter as a source but in principle we can expand to any data source in any language (more below). We’ll also be adding an API soon.
So this is just a start — we invite you to explore the Twitter time series and let us know what you think.
Hedonometer.org is based on the research of Peter Dodds and Chris Danforth and their team in the Computational Story Lab, including visualization by Andy Reagan, at the University of Vermont Complex Systems Center, and the technology of Brian Tivnan, Matt McMahon and their team from The MITRE Corporation.
The economist Francis Edgeworth coined the term "hedonometer" in the late 1800s to describe "an ideally perfect instrument, a psychophysical machine, continually registering the height of pleasure experienced by an individual." [wikipedia]
To quantify the happiness of the atoms of language, we merged the 5,000 most frequent words from each of four corpora: Google Books, New York Times articles, Music Lyrics, and Twitter messages, resulting in a composite set of roughly 10,000 unique words. Using Amazon’s Mechanical Turk service, we had each of these words scored on a nine-point scale of happiness: (1) sad to (9) happy. You can explore the average scores of each word on our words page, or download the entire list from the publication supplement here.
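The merge step can be pictured with a small sketch. This is an illustration under stated assumptions, not our production pipeline: the function name `merge_top_words`, the toy corpora, and the cutoff `top_n=2` are all hypothetical stand-ins for the real corpora and the 5,000-word cutoff.

```python
from collections import Counter

def merge_top_words(corpora, top_n):
    """Return the union of the top_n most frequent words of each corpus.

    `corpora` maps a corpus name to a list of word tokens.
    """
    merged = set()
    for tokens in corpora.values():
        counts = Counter(tokens)
        # Keep only the words, dropping their counts.
        merged.update(word for word, _ in counts.most_common(top_n))
    return merged

# Toy data standing in for the four corpora (hypothetical).
corpora = {
    "books": "the the the of of love war".split(),
    "news": "the the said said said market".split(),
    "lyrics": "love love love baby baby the".split(),
    "tweets": "lol lol lol the the haha".split(),
}

composite = merge_top_words(corpora, top_n=2)
print(sorted(composite))
```

Because frequent words overlap across corpora, the composite set is smaller than the sum of the individual lists, which is why four lists of 5,000 words yield roughly 10,000 unique words rather than 20,000.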
hedonometer.org currently measures Twitter’s Gardenhose feed, a random sample of roughly 50 million messages (about 10% of all messages posted to the service), comprising 100GB of JSON each day. Words in messages written in English are thrown into a large bag (containing roughly 100 million words per day), and the bag is assigned a happiness score based on the average happiness score of the words contained within.
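The bag-of-words average can be sketched in a few lines. This is a minimal illustration, assuming a simple lowercase tokenizer and a handful of made-up word scores; the names `WORD_SCORES`, `bag_of_words`, and `average_happiness` are hypothetical, and the real parsing rules may differ.

```python
import re
from collections import Counter

# Hypothetical scores on the 1 (sad) to 9 (happy) scale; not the real ratings.
WORD_SCORES = {"love": 8.4, "happy": 8.3, "rain": 5.0, "hate": 2.3}

def bag_of_words(messages):
    """Count every word across all messages, ignoring message boundaries."""
    counts = Counter()
    for text in messages:
        # Assumed tokenizer: runs of lowercase letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def average_happiness(counts, scores):
    """Frequency-weighted mean score of the words that have a score."""
    total = sum(n for word, n in counts.items() if word in scores)
    if total == 0:
        return None  # no scored words in the bag
    weighted = sum(scores[word] * n for word, n in counts.items() if word in scores)
    return weighted / total

messages = ["I love love this", "I hate rain"]
bag = bag_of_words(messages)
h = average_happiness(bag, WORD_SCORES)
print(h)
```

Words without a score (here "i" and "this") simply do not contribute, and a word that appears twice counts twice.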
Is that even a question? Well, we do have a knob. It allows us to tune the relative importance of the most emotionally charged words by removing neutral words from consideration when determining the happiness of a given day. It also allows us to remove words that receive widely varying scores when rated on Mechanical Turk. Many profanities, for example, received average ratings between 4 and 6 due to the bimodal nature of their word score distribution. (Details on the choice of Δh_avg = 1 can be found in figure 2 of the publication listed below.)
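One way to picture the knob is as an exclusion band around the neutral midpoint of the scale. The sketch below assumes the midpoint is 5 and that a word is dropped when its average score lies strictly within Δh_avg of it; the boundary handling, the function names, and the example scores are assumptions for illustration only.

```python
NEUTRAL = 5.0  # assumed midpoint of the 1-9 scale

def is_neutral(score, delta=1.0):
    """True when a score falls strictly inside the band NEUTRAL +/- delta."""
    return abs(score - NEUTRAL) < delta

def filtered_average(counts, scores, delta=1.0):
    """Frequency-weighted mean score after dropping neutral words."""
    kept = {
        word: n
        for word, n in counts.items()
        if word in scores and not is_neutral(scores[word], delta)
    }
    total = sum(kept.values())
    if total == 0:
        return None  # every scored word was filtered out
    return sum(scores[word] * n for word, n in kept.items()) / total

# Hypothetical word counts and scores.
counts = {"love": 2, "the": 10, "hate": 1}
scores = {"love": 8.4, "the": 4.98, "hate": 2.3}

print(filtered_average(counts, scores, delta=1.0))  # "the" is excluded
print(filtered_average(counts, scores, delta=0.0))  # nothing is excluded
```

With the band switched off, the very frequent neutral word dominates and pulls the average toward 5. Widening the band lets the emotionally charged words determine the score.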
Tweets represent a non-uniform subsampling of all utterances made by a non-representative subpopulation of all people. However, there are hundreds of millions of people presently using the website to express their activities and interests, and as such it is an important social signal.
Yes! And Twitter’s demographics have also changed over time. Nevertheless, we’re using Twitter as our initial data source for a few reasons:
Many people presume this day will be one of clear positivity. While we do see positive words such as “celebration” appearing, the overall language of the day on Twitter reflected that a very negatively viewed character met a very negative end. It was a day of complex emotion which is best explored in the word shift for the day, rather than the single number of its average happiness.
In our Computational Story Lab blog we describe research projects in which we use our hedonometer to characterize happiness variations with respect to geography, network topology, demographics, and socio-economic data. For example, here’s a map of the US with cities colored by happiness:
For the full story of our hedonometer algorithm, please read our foundational paper describing its construction:
We are currently in the process of scoring the most frequently used words in a dozen other languages, and hope to have these measurements incorporated into the instrument by the end of the year.
We are currently developing a principled method to identify relevant phrases, for example to deal with the multitude of both positive and negative uses of profanity. We expect to be scoring phrases instead of words, where appropriate, in the near future.
We will soon add text from other online sources, including Google Trends (what people are searching for), bit.ly (what people are viewing online), and the BBC (what people are reading), which will serve as different lenses through which to explore societal trends.
We are currently building a large-scale database of word-based measures for emotions other than happiness and sadness such as fear, anger, and surprise. We intend to incorporate these emotions into future versions of the hedonometer.
We use Amazon Web Services and the Vermont Advanced Computing Core (VACC) to compute happiness vectors from the Twitter Gardenhose, and preprocess some of the data on the Linode server. For specifics on how the tweets are analyzed, including the parsing details (regular expressions), sample code to compute happiness is available on GitHub here.
Peter Dodds, Chris Danforth, Andi Elledge, Sharon Alajajian, Nicholas Allgaier, Catherine Bliss, Melody Burkins, Eric Clark, Emily Cody, Kameron Decker Harris, Suma Desu, Mike Foley, Morgan Frank, Bill Gottesman, Isabel Kloumann, Paul Lessard, Lewis Mitchell, Kate Morrow, Eitan Pechenick, Michael Pellon, Aaron Powers, Andy Reagan, Matt Tretin, Lindsay Van Leir, and Jake Williams.
Brian Tivnan, Matt McMahon, Ivan Ramiscal, Mike Shadid, Pete Carrigan, Zach Furness, Zoe Henscheid, Garry Jacyna, Matt Koehler, and Karine Megerdoomian.
Mike Austin, Josh Brown, Jim Burgmeier, Kate Danforth, Tyler Gray, John Kaehny, Jim Lawson, Aimee Picchi, Andrew Reece, Tony Richardson, John Tucker and Toph Tucker.
And special thanks go to Jonathan Harris and Sep Kamvar for their initial inspiration.