Creating the BFFS conference Twitter archive
This is an unusually geeky post for this blog, but several people asked how I’d created the combined page of all the tweets for the BFFS national conference and Film Society of the Year awards in a permanent, readable form.
Twitter makes it easy for developers to work with the data on its service by publishing an API. This is why there are so many Twitter applications, visualisation tools and much else: they’ve invariably been developed by someone outside Twitter. Twitter quite reasonably takes the view that it can’t predict what users might want to do with this data, so it makes sense to give them the tools to build what they want.
There are several useful blog posts about how to download Twitter content, and the most useful for me was the one on the IBM site.1 This in turn pointed me to the Twitter API documentation2, and using the two sources together meant that I could quickly find all of the conference tweets on the two hashtags using:
http://search.twitter.com/search.atom?q=%23BFFSnatconf+OR+%23FSOY2012&page=1&rpp=100
This generated a feed in Atom format3, so I could save the underlying XML data. Since the maximum number of results per page (“rpp”) this query can return is 100, I needed to change the value of the page parameter several times and re-run the query to get all the data. Clearly, this is fine for the few hundred items we have, but it is not a scalable solution.
Next, I concatenated the important content (the entry elements) of all six XML files to create a single well-formed file containing all the tweets. This gave us over 500 items, each of which had this structure:
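The paging step can be sketched in Python. This is an illustration rather than the method used for the archive: the function name build_page_urls is hypothetical, the endpoint is the 2012 search URL quoted above (which may no longer respond), and the sketch only builds the list of URLs to fetch, one per page.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the post; assumed here for illustration only.
BASE_URL = "http://search.twitter.com/search.atom"


def build_page_urls(hashtags, pages, rpp=100):
    """Return one search URL per results page (hypothetical helper)."""
    # Join the hashtags into a single "A OR B" query string.
    query = " OR ".join(hashtags)
    urls = []
    for page in range(1, pages + 1):
        # urlencode escapes "#" as %23 and spaces as "+".
        params = {"q": query, "page": page, "rpp": rpp}
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls


# Six pages, matching the six XML files mentioned in the post.
urls = build_page_urls(["#BFFSnatconf", "#FSOY2012"], pages=6)
for url in urls:
    print(url)
```

The first URL printed is the same query string as the one shown above; only the page value changes in the later ones.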
<entry>
  <id>tag:search.twitter.com,2005:249906722150424576</id>
  <published>2012-09-23T16:23:00Z</published>
  <link type="text/html" href="http://twitter.com/NewcastleCinema/statuses/249906722150424576" rel="alternate"/>
  <title>Get yourself to the bar!! @MinicineYorks #BFFSnatconf</title>
  <content type="html">Get yourself to the bar!! @<a class=" " href="https://twitter.com/MinicineYorks">MinicineYorks</a> <em><a href="http://search.twitter.com/search?q=%23BFFSnatconf" title="#BFFSnatconf" class=" ">#BFFSnatconf</a></em></content>
  <updated>2012-09-23T16:23:00Z</updated>
  <link type="image/png" href="http://a0.twimg.com/profile_images/1640504972/twitterAvatar_reasonably_small_normal.gif" rel="image"/>
  <twitter:geo/>
  <twitter:metadata>
    <twitter:result_type>recent</twitter:result_type>
  </twitter:metadata>
  <twitter:source><a href="http://twitter.com/download/iphone">Twitter for iPhone</a></twitter:source>
  <twitter:lang>en</twitter:lang>
  <author>
    <name>NewcastleCinema (NCC)</name>
    <uri>http://twitter.com/NewcastleCinema</uri>
  </author>
</entry>
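The concatenation of the saved pages into one well-formed feed can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used for the archive: merge_feeds is a hypothetical helper, the two sample pages and their ids are made up for the example, and skipping entries whose id has already been seen is an added precaution in case adjacent pages overlap.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
# Serialise Atom elements without a generated "ns0:" prefix.
ET.register_namespace("", ATOM_NS)

# Two tiny stand-ins for saved result pages (ids are placeholders).
PAGE_1 = f"""<feed xmlns="{ATOM_NS}">
  <entry><id>tag:example,2005:1</id></entry>
  <entry><id>tag:example,2005:2</id></entry>
</feed>"""
PAGE_2 = f"""<feed xmlns="{ATOM_NS}">
  <entry><id>tag:example,2005:2</id></entry>
  <entry><id>tag:example,2005:3</id></entry>
</feed>"""


def merge_feeds(feed_documents):
    """Copy every distinct <entry> into a single new <feed> element."""
    merged = ET.Element(f"{{{ATOM_NS}}}feed")
    seen_ids = set()
    for document in feed_documents:
        root = ET.fromstring(document)
        for entry in root.findall(f"{{{ATOM_NS}}}entry"):
            entry_id = entry.findtext(f"{{{ATOM_NS}}}id")
            if entry_id in seen_ids:
                continue  # assumed: duplicates across pages are dropped
            seen_ids.add(entry_id)
            merged.append(entry)
    return merged


merged = merge_feeds([PAGE_1, PAGE_2])
print(ET.tostring(merged, encoding="unicode"))
```

Parsing and re-serialising, rather than pasting text together, keeps the result well-formed because only one root feed element is ever written.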
In addition, some of the tweets reference pictures from the weekend, and it would be nice to include them in the stream rather than have to follow the links. That required a bit of hand-editing of the data to insert them, using a link element with rel="enclosure" to distinguish them in the data:
<link rel="enclosure" href="https://pbs.twimg.com/media/A3i_RJVCYAE-TuT.jpg"/>
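The pictures were added by hand, but the same edit could be scripted. The sketch below is a swapped-in alternative, not the approach described above: add_enclosure is a hypothetical helper and the sample entry id is a placeholder. It appends a link element with rel="enclosure" to a parsed Atom entry.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def add_enclosure(entry, image_url):
    """Append <link rel="enclosure" href="..."/> to an Atom entry."""
    link = ET.SubElement(entry, f"{{{ATOM_NS}}}link")
    link.set("rel", "enclosure")
    link.set("href", image_url)
    return link


# Placeholder entry; a real one would come from the merged feed.
entry = ET.fromstring(
    f'<entry xmlns="{ATOM_NS}"><id>tag:example,2005:1</id></entry>'
)
add_enclosure(entry, "https://pbs.twimg.com/media/A3i_RJVCYAE-TuT.jpg")

# The stylesheet can then pick these out with link[@rel='enclosure'].
enclosures = entry.findall(f"{{{ATOM_NS}}}link[@rel='enclosure']")
print(len(enclosures), enclosures[0].get("href"))
```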
Now, there’s lots of stuff in there that we don’t really need, and it would be easier to read them all if they were in chronological order, for instance, so I then sorted them and tidied up the rendering to make a single HTML page using XSLT (see example below). In addition, the default date/time format that you get back from Twitter is the ISO 8601 standard (“2012-09-24T11:12:30Z”), which isn’t very readable; that can also be modified in the XSLT, for which I used a slightly modified version of code on an Austrian blog4:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:fn="http://www.w3.org/2005/xpath-functions"
    xmlns:google="http://base.google.com/ns/1.0"
    xml:lang="en-US"
    xmlns:openSearch="http://a9.com/-/spec/opensearch/1.1/"
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:twitter="http://api.twitter.com/"
    xmlns:georss="http://www.georss.org/georss"
    exclude-result-prefixes="google openSearch twitter georss xs fn atom"
    xpath-default-namespace="http://www.w3.org/2005/Atom">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
  <xsl:template match="@*|node()">
    <xsl:apply-templates select="@*|node()"/>
  </xsl:template>
  <xsl:template match="/">
    <html>
      <head>
        <title>#BFFSnatconf | #FSOY2012 : A conference in tweets</title>
      </head>
      <body>
        <h1>#BFFSnatconf | #FSOY2012</h1>
        <h2>A conference in tweets</h2>
        <h3>BFFS National Conference and Film Society of the Year awards, 2012</h3>
        <xsl:for-each select="atom:feed/atom:entry">
          <!-- Sort the items in chronological order -->
          <xsl:sort select="atom:published" order="ascending"/>
          <div>
            <p class="dateline">On <a>
                <xsl:attribute name="href"><xsl:value-of select="atom:link[@rel='alternate']/@href"/></xsl:attribute>
                <span class="date">
                  <xsl:call-template name="FormatDate">
                    <xsl:with-param name="DateTime" select="atom:published"/>
                  </xsl:call-template>
                </span></a>, <a>
                <xsl:attribute name="href"><xsl:value-of select="atom:author/atom:uri"/></xsl:attribute>
                <span class="author"><xsl:value-of select="atom:author/atom:name"/></span>
              </a> wrote:</p>
            <p class="content">
              <xsl:value-of select="atom:content" disable-output-escaping="yes"/>
              <xsl:if test="atom:link[@rel='enclosure']"><xsl:apply-templates select="atom:link[@rel='enclosure']" /></xsl:if>
            </p>
          </div>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>
  <!-- Convert an ISO 8601 date/time into a readable form -->
  <xsl:template name="FormatDate">
    <xsl:param name="DateTime" />
    <xsl:variable name="mo">
      <xsl:value-of select="substring($DateTime,6,2)" />
    </xsl:variable>
    <xsl:variable name="day">
      <xsl:value-of select="substring($DateTime,9,2)" />
    </xsl:variable>
    <xsl:variable name="year">
      <xsl:value-of select="substring($DateTime,1,4)" />
    </xsl:variable>
    <xsl:variable name="time">
      <xsl:value-of select="substring($DateTime,12,8)" />
    </xsl:variable>
    <xsl:variable name="hh">
      <xsl:value-of select="substring($time,1,2)" />
    </xsl:variable>
    <xsl:variable name="mm">
      <xsl:value-of select="substring($time,4,2)" />
    </xsl:variable>
    <xsl:variable name="ss">
      <xsl:value-of select="substring($time,7,2)" />
    </xsl:variable>
    <xsl:value-of select="$day"/><xsl:text> </xsl:text>
    <xsl:choose>
      <xsl:when test="$mo=1">January</xsl:when>
      <xsl:when test="$mo=2">February</xsl:when>
      <xsl:when test="$mo=3">March</xsl:when>
      <xsl:when test="$mo=4">April</xsl:when>
      <xsl:when test="$mo=5">May</xsl:when>
      <xsl:when test="$mo=6">June</xsl:when>
      <xsl:when test="$mo=7">July</xsl:when>
      <xsl:when test="$mo=8">August</xsl:when>
      <xsl:when test="$mo=9">September</xsl:when>
      <xsl:when test="$mo=10">October</xsl:when>
      <xsl:when test="$mo=11">November</xsl:when>
      <xsl:when test="$mo=12">December</xsl:when>
    </xsl:choose><xsl:text> </xsl:text>
    <xsl:value-of select="$year"/><xsl:text>, </xsl:text>
    <xsl:value-of select="$hh"/><xsl:text>.</xsl:text>
    <xsl:value-of select="$mm"/><xsl:text>:</xsl:text>
    <xsl:value-of select="$ss"/>
  </xsl:template>
  <!-- Handle the manually-added pictures -->
  <xsl:template match="atom:link[@rel='enclosure']">
    <div class="image"><img alt="Image referenced in the post" ><xsl:attribute name="src"><xsl:value-of select="@href"/></xsl:attribute></img></div>
  </xsl:template>
</xsl:stylesheet>
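As a cross-check on what the FormatDate template and the sort produce, the same two steps can be sketched in Python. This is an illustration only and plays no part in the XSLT pipeline: format_tweet_date is a hypothetical name, and the sketch assumes every timestamp is in UTC with a trailing "Z", as in the examples quoted in this post. It relies on the fact that ISO 8601 timestamps in that form sort chronologically when sorted as plain strings.

```python
from datetime import datetime

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_tweet_date(iso_timestamp):
    """Mirror the stylesheet's output format: DD Month YYYY, HH.MM:SS."""
    moment = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    month_name = MONTHS[moment.month - 1]
    return (
        f"{moment.day:02d} {month_name} {moment.year}, "
        f"{moment.hour:02d}.{moment.minute:02d}:{moment.second:02d}"
    )


# The two timestamps quoted in the post, deliberately out of order.
timestamps = ["2012-09-24T11:12:30Z", "2012-09-23T16:23:00Z"]
ordered = sorted(timestamps)  # string order equals chronological order here
formatted = [format_tweet_date(stamp) for stamp in ordered]
for line in formatted:
    print(line)
```

The month names are listed explicitly rather than taken from the system locale, so the output does not depend on the machine's language settings.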
So, creating the Twitter stream for a single event was actually quite quick, though it did require some knowledge of XML and XSLT. Now what else can we do with it? Well, it wouldn’t be hard now to create the book of the conference, using a tool like Blurb or Lulu, though you’d obviously lose all the benefit of the links. That can be something for the future…
Notes
1. Carey, Brian M. ‘Using the Twitter Search API: Create automated tweet searches.’ 4 Aug 2009. IBM developerWorks. 24 Sep 2012 <http://www.ibm.com/developerworks/xml/library/x-twitsrchapi/>.
2. ‘REST API v.1.1 Resources.’ Twitter Developers. 24 Sep 2012 <https://dev.twitter.com/docs/api/1.1>.
3. Nottingham, M. and Sayre, R. ‘The Atom Syndication Format.’ RFC 4287. Dec 2005. 24 Sep 2012 <http://tools.ietf.org/html/rfc4287>.
4. Beer, Florian. ‘XSLT: convert ISO 8601 DateTime format.’ 10 Apr 2008. Blog.No-Panic.At. 24 Sep 2012 <http://blog.no-panic.at/2008/04/10/xslt-convert-iso-8601-datetime-format/>.
Other tools for collecting Twitter streams include TweetDoc and Storify.