{"id":975,"date":"2012-10-02T11:53:51","date_gmt":"2012-10-02T10:53:51","guid":{"rendered":"http:\/\/forestrowfilmsociety.org\/news\/?p=975"},"modified":"2013-03-21T17:57:27","modified_gmt":"2013-03-21T16:57:27","slug":"creating-the-bffs-twitter-archive","status":"publish","type":"post","link":"https:\/\/forestrowfilmsociety.org\/news\/2012\/10\/creating-the-bffs-twitter-archive\/","title":{"rendered":"Creating the BFFS conference Twitter archive"},"content":{"rendered":"<p>This is an unusually geeky post for this blog, but several people asked how I&#8217;d created the <a href=\"http:\/\/forestrowfilmsociety.org\/natconf2012.html\">combined page of all the tweets for the BFFS national conference and Film Society of the Year awards<\/a> in a permanent, readable form.<\/p>\n<p>Twitter makes it easy for developers to work with the data on its service by publishing an API. This is why there are so many Twitter applications, visualisation tools and much else: most of them have been developed by people outside Twitter. Twitter quite reasonably takes the view that you can\u2019t predict what users might want to do with this data, so it makes sense to give them the tools to build what they want.<\/p>\n<p>There are several useful blog posts about how to download Twitter content, and the one I found most useful was on the IBM site.<a href=\"#n1\"><sup>1<\/sup><\/a> This in turn pointed me to the Twitter API documentation<a href=\"#n2\"><sup>2<\/sup><\/a>, and using the two sources together I could quickly find all of the conference tweets on the two hashtags using:<\/p>\n<pre>http:\/\/search.twitter.com\/search.atom?q=%23BFFSnatconf+OR+%23FSOY2012&amp;page=1&amp;rpp=100<\/pre>\n<p>This generated a feed in Atom format<a href=\"#n3\"><sup>3<\/sup><\/a>, so I could save the underlying XML data. 
Since this query can return at most 100 results per page (\u201c<tt>rpp<\/tt>\u201d), I needed to change the value of the <tt>page<\/tt> parameter and re-run the query several times to get all the data. Clearly, this is fine for the few hundred items we had, but it is not a scalable solution.<\/p>\n<p>Next, I concatenated the relevant content of all six XML files to create a single well-formed file containing all the tweets. This gave us over 500 items, each of which had this structure:<\/p>\n<pre>&lt;entry&gt;\r\n   &lt;id&gt;tag:search.twitter.com,2005:249906722150424576&lt;\/id&gt;\r\n   &lt;published&gt;2012-09-23T16:23:00Z&lt;\/published&gt;\r\n   &lt;link type=\"text\/html\" href=\"http:\/\/twitter.com\/NewcastleCinema\/statuses\/249906722150424576\"\r\n       rel=\"alternate\"\/&gt;\r\n   &lt;title&gt;Get yourself to the bar!! @MinicineYorks #BFFSnatconf&lt;\/title&gt;\r\n   &lt;content type=\"html\"&gt;Get yourself to the bar!! @&amp;lt;a class=\" \"\r\n       href=\"https:\/\/twitter.com\/MinicineYorks\"&amp;gt;MinicineYorks&amp;lt;\/a&amp;gt; &amp;lt;em&amp;gt;&amp;lt;a\r\n       href=\"http:\/\/search.twitter.com\/search?q=%23BFFSnatconf\" title=\"#BFFSnatconf\" class=\"\r\n       \"&amp;gt;#BFFSnatconf&amp;lt;\/a&amp;gt;&amp;lt;\/em&amp;gt;&lt;\/content&gt;\r\n   &lt;updated&gt;2012-09-23T16:23:00Z&lt;\/updated&gt;\r\n   &lt;link type=\"image\/png\"\r\n       href=\"http:\/\/a0.twimg.com\/profile_images\/1640504972\/twitterAvatar_reasonably_small_normal.gif\"\r\n       rel=\"image\"\/&gt;\r\n   &lt;twitter:geo\/&gt;\r\n   &lt;twitter:metadata&gt;\r\n       &lt;twitter:result_type&gt;recent&lt;\/twitter:result_type&gt;\r\n   &lt;\/twitter:metadata&gt;\r\n   &lt;twitter:source&gt;&amp;lt;a href=\"http:\/\/twitter.com\/download\/iphone\"&amp;gt;Twitter for\r\n       iPhone&amp;lt;\/a&amp;gt;&lt;\/twitter:source&gt;\r\n   &lt;twitter:lang&gt;en&lt;\/twitter:lang&gt;\r\n   &lt;author&gt;\r\n       
&lt;name&gt;NewcastleCinema (NCC)&lt;\/name&gt;\r\n       &lt;uri&gt;http:\/\/twitter.com\/NewcastleCinema&lt;\/uri&gt;\r\n   &lt;\/author&gt;\r\n&lt;\/entry&gt;<\/pre>\n<p>In addition, some of the tweets reference pictures from the weekend, and it would be nice to include them in the stream, rather than have to follow the links, so those required a bit of hand-editing of the data to insert them, using a <tt>@rel=\"enclosure\"<\/tt> to distinguish them in the data:<\/p>\n<pre>&lt;link rel=\"enclosure\" href=\"https:\/\/pbs.twimg.com\/media\/A3i_RJVCYAE-TuT.jpg\"\/&gt;<\/pre>\n<p>Now, there\u2019s lots of stuff in there that we don\u2019t really need, and it would be easier to read them all if they were in chronological order, for instance, so I then sorted them and tidied up the rendering to make a single HTML page using XSLT (see example below). In addition, the default date\/time format that you get back from Twitter is the ISO 8601 standard (\u201c<tt>2012-09-24T11:12:30Z<\/tt>\u201d), which isn\u2019t very readable; that can also be modified in the XSLT, for which I used a slightly modified version of code on an Austrian blog<a href=\"#n4\"><sup>4<\/sup><\/a>:<\/p>\n<pre>&lt;xsl:stylesheet version=\"2.0\" xmlns:xsl=\"http:\/\/www.w3.org\/1999\/XSL\/Transform\" xmlns:xs=\"http:\/\/www.w3.org\/2001\/XMLSchema\" xmlns:fn=\"http:\/\/www.w3.org\/2005\/xpath-functions\" xmlns:google=\"http:\/\/base.google.com\/ns\/1.0\" xml:lang=\"en-US\" xmlns:openSearch=\"http:\/\/a9.com\/-\/spec\/opensearch\/1.1\/\" xmlns:atom=\"http:\/\/www.w3.org\/2005\/Atom\" xmlns:twitter=\"http:\/\/api.twitter.com\/\" xmlns:georss=\"http:\/\/www.georss.org\/georss\" exclude-result-prefixes=\"google openSearch twitter georss xs fn atom\" xpath-default-namespace=\"http:\/\/www.w3.org\/2005\/Atom\"&gt;\r\n  &lt;xsl:output method=\"xml\" version=\"1.0\" encoding=\"UTF-8\" indent=\"yes\"\/&gt;\r\n  &lt;xsl:template match=\"@*|node()\"&gt;\r\n    &lt;xsl:apply-templates 
select=\"@*|node()\"\/&gt;\r\n  &lt;\/xsl:template&gt;\r\n  &lt;xsl:template match=\"\/\"&gt;\r\n    &lt;html&gt;\r\n      &lt;head&gt;\r\n        &lt;title&gt;#BFFSnatconf | #FSOY2012 : A conference in tweets&lt;\/title&gt;\r\n      &lt;\/head&gt;\r\n      &lt;body&gt;\r\n        &lt;h1&gt;#BFFSnatconf | #FSOY2012&lt;\/h1&gt;\r\n        &lt;h2&gt;A conference in tweets&lt;\/h2&gt;\r\n        &lt;h3&gt;BFFS National Conference and Film Society of the Year awards, 2012&lt;\/h3&gt;\r\n        &lt;xsl:for-each select=\"atom:feed\/atom:entry\"&gt;\r\n&lt;!-- Sort the items in chronological order --&gt;\r\n          &lt;xsl:sort select=\"atom:published\" order=\"ascending\"\/&gt;\r\n          &lt;div&gt;\r\n            &lt;p class=\"dateline\"&gt;On &lt;a&gt;\r\n                &lt;xsl:attribute name=\"href\"&gt;&lt;xsl:value-of select=\"atom:link[@rel='alternate']\/@href\"\/&gt;&lt;\/xsl:attribute&gt;\r\n                &lt;span class=\"date\"&gt;\r\n                  &lt;xsl:call-template name=\"FormatDate\"&gt;\r\n                    &lt;xsl:with-param name=\"DateTime\" select=\"atom:published\"\/&gt;\r\n                  &lt;\/xsl:call-template&gt;\r\n                &lt;\/span&gt;&lt;\/a&gt;, &lt;a&gt;\r\n                &lt;xsl:attribute name=\"href\"&gt;&lt;xsl:value-of select=\"atom:author\/atom:uri\"\/&gt;&lt;\/xsl:attribute&gt;\r\n                &lt;span class=\"author\"&gt;&lt;xsl:value-of select=\"atom:author\/atom:name\"\/&gt;&lt;\/span&gt;\r\n              &lt;\/a&gt; wrote:&lt;\/p&gt;\r\n            &lt;p class=\"content\"&gt;\r\n              &lt;xsl:value-of select=\"atom:content\" disable-output-escaping=\"yes\"\/&gt;\r\n              &lt;xsl:if test=\"atom:link[@rel='enclosure']\"&gt;&lt;xsl:apply-templates select=\"atom:link[@rel='enclosure']\" \/&gt;&lt;\/xsl:if&gt;\r\n            &lt;\/p&gt;\r\n          &lt;\/div&gt;\r\n        &lt;\/xsl:for-each&gt;\r\n      &lt;\/body&gt;\r\n    &lt;\/html&gt;\r\n  &lt;\/xsl:template&gt;\r\n    &lt;xsl:template 
name=\"FormatDate\"&gt;\r\n        &lt;xsl:param name=\"DateTime\" \/&gt;\r\n        &lt;xsl:variable name=\"mo\"&gt;\r\n           &lt;xsl:value-of select=\"substring($DateTime,6,2)\" \/&gt;\r\n        &lt;\/xsl:variable&gt;\r\n        &lt;xsl:variable name=\"day\"&gt;\r\n           &lt;xsl:value-of select=\"substring($DateTime,9,2)\" \/&gt;\r\n        &lt;\/xsl:variable&gt;\r\n        &lt;xsl:variable name=\"year\"&gt;\r\n           &lt;xsl:value-of select=\"substring($DateTime,1,4)\" \/&gt;\r\n        &lt;\/xsl:variable&gt;\r\n        &lt;xsl:variable name=\"time\"&gt;\r\n           &lt;xsl:value-of select=\"substring($DateTime,12,8)\" \/&gt;\r\n        &lt;\/xsl:variable&gt;\r\n        &lt;xsl:variable name=\"hh\"&gt;\r\n           &lt;xsl:value-of select=\"substring($time,1,2)\" \/&gt;\r\n        &lt;\/xsl:variable&gt;\r\n        &lt;xsl:variable name=\"mm\"&gt;\r\n           &lt;xsl:value-of select=\"substring($time,4,2)\" \/&gt;\r\n        &lt;\/xsl:variable&gt;\r\n        &lt;xsl:variable name=\"ss\"&gt;\r\n           &lt;xsl:value-of select=\"substring($time,7,2)\" \/&gt;\r\n        &lt;\/xsl:variable&gt;\r\n           &lt;xsl:value-of select=\"$day\"\/&gt;&lt;xsl:text&gt; &lt;\/xsl:text&gt;\r\n                &lt;xsl:choose&gt;\r\n          &lt;xsl:when test=\"$mo=1\"&gt;January&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=2\"&gt;February&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=3\"&gt;March&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=4\"&gt;April&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=5\"&gt;May&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=6\"&gt;June&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=7\"&gt;July&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=8\"&gt;August&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=9\"&gt;September&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=10\"&gt;October&lt;\/xsl:when&gt;\r\n          &lt;xsl:when 
test=\"$mo=11\"&gt;November&lt;\/xsl:when&gt;\r\n          &lt;xsl:when test=\"$mo=12\"&gt;December&lt;\/xsl:when&gt;\r\n        &lt;\/xsl:choose&gt;&lt;xsl:text&gt; &lt;\/xsl:text&gt;\r\n                &lt;xsl:value-of select=\"$year\"\/&gt;&lt;xsl:text&gt;, &lt;\/xsl:text&gt;\r\n                &lt;xsl:value-of select=\"$hh\"\/&gt;&lt;xsl:text&gt;.&lt;\/xsl:text&gt;\r\n                &lt;xsl:value-of select=\"$mm\"\/&gt;&lt;xsl:text&gt;:&lt;\/xsl:text&gt;\r\n                &lt;xsl:value-of select=\"$ss\"\/&gt;\r\n    &lt;\/xsl:template&gt;\r\n&lt;!-- Handle the manually-added pictures --&gt;\r\n    &lt;xsl:template match=\"atom:link[@rel='enclosure']\"&gt;\r\n    &lt;div class=\"image\"&gt;&lt;img alt=\"Image referenced in the post\" &gt;&lt;xsl:attribute name=\"src\"&gt;&lt;xsl:value-of select=\"@href\"\/&gt;&lt;\/xsl:attribute&gt;&lt;\/img&gt;&lt;\/div&gt;\r\n    &lt;\/xsl:template&gt;\r\n&lt;\/xsl:stylesheet&gt;<\/pre>\n<p>So, creating the Twitter stream for a single event was actually quite quick, though it did require some knowledge of XML and XSLT. What else can we do with it? Well, it wouldn&#8217;t be hard now to create the book of the conference using a tool like <a href=\"http:\/\/www.blurb.com\">Blurb<\/a> or <a href=\"http:\/\/www.lulu.com\">Lulu<\/a>, though you&#8217;d obviously lose all the benefit of the links. That can be something for the future&#8230;<\/p>\n<h4>Notes<\/h4>\n<p id=\"n1\">1. Carey, Brian M. \u2018Using the Twitter Search API: Create automated tweet searches.\u2019 4 Aug 2009. <em>IBM developerWorks<\/em>. 24 Sep 2012 &lt;<a href=\"http:\/\/www.ibm.com\/developerworks\/xml\/library\/x-twitsrchapi\/\">http:\/\/www.ibm.com\/developerworks\/xml\/library\/x-twitsrchapi\/<\/a>&gt;.<\/p>\n<p id=\"n2\">2. \u2018REST API v.1.1 Resources.\u2019 <em>Twitter Developers<\/em>. 24 Sep 2012 &lt;<a href=\"https:\/\/dev.twitter.com\/docs\/api\/1.1\">https:\/\/dev.twitter.com\/docs\/api\/1.1<\/a>&gt;.<\/p>\n<p id=\"n3\">3. Nottingham, M. 
and Sayre, R. \u2018The Atom Syndication Format.\u2019 RFC 4287. Dec 2005. 24 Sep 2012 &lt;<a href=\"http:\/\/tools.ietf.org\/html\/rfc4287\">http:\/\/tools.ietf.org\/html\/rfc4287<\/a>&gt;.<\/p>\n<p id=\"n4\">4. Beer, Florian. \u2018XSLT: convert ISO 8601 DateTime format.\u2019 10 Apr 2008. <em>Blog.No-Panic.At<\/em>. 24 Sep 2012 &lt;<a href=\"http:\/\/blog.no-panic.at\/2008\/04\/10\/xslt-convert-iso-8601-datetime-format\/\">http:\/\/blog.no-panic.at\/2008\/04\/10\/xslt-convert-iso-8601-datetime-format\/<\/a>&gt;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is an unusually geeky post for this blog, but several people asked how I&#8217;d created the combined page of all the tweets for the&#8230;<\/p>\n","protected":false},"author":1,"featured_media":992,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[23],"tags":[],"_links":{"self":[{"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/posts\/975"}],"collection":[{"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/comments?post=975"}],"version-history":[{"count":23,"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/posts\/975\/revisions"}],"predecessor-version":[{"id":1634,"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/posts\/975\/revisions\/1634"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/media\/992"}],"wp:attachment":[{"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/media?parent=975"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/forestrowfilmsociety.org\/news\/
wp-json\/wp\/v2\/categories?post=975"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/forestrowfilmsociety.org\/news\/wp-json\/wp\/v2\/tags?post=975"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}