Hansard Reports
Debates (Commons), Debates (Lords), Westminster Hall
XML files containing debates in the main chambers and in Westminster Hall from the start of the 2001 parliament (Commons) or the 1999 reform (Lords). Each speech and its speaker are labelled with unique identifiers, as are divisions and how each MP or Lord voted.
Written Answers (Commons), Written Answers (Lords)
XML files containing Written Answers to questions MPs and Lords have asked ministers. The data runs from the start of the 2001 parliament (Commons) or the 1999 reform (Lords). Questions and replies are clearly distinguished, and speakers are labelled with their unique identifiers.
Written Ministerial Statements (Commons), Written Ministerial Statements (Lords)
XML files containing statements which ministers made to the houses in writing. These are a bit like press releases, but in parliamentary language.
Getting the Data
By Browsing
You can browse the list of available files and download them individually at:
http://www.theyworkforyou.com/pwdata/scrapedxml/
By git
Warning: There is a lot of data, so downloading it all may take a while.
This is currently not available; we hope to have it back soon.
By rsync
Warning: There is a lot of data, so downloading it all may take a while.
The easiest way to get hold of the data currently stored for Hansard is via rsync to data.theyworkforyou.com::parldata. This is especially useful if, for example, you want to update every day to have the latest files but only want to download the new or changed ones. You can see what’s available by running:
rsync data.theyworkforyou.com::parldata
You can then use rsync to retrieve content as necessary; the parsed XML files are in the scrapedxml directory. Check man rsync for more information on the available options.
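Before a large transfer, it can help to preview what would be fetched. The following is a minimal sketch rather than an official tool: it uses rsync's -n (--dry-run) option to list the contents of one directory without downloading anything. The function name preview_dir and the RSYNC override variable are hypothetical, introduced here only for illustration.

```shell
# Hypothetical helper: preview one directory of the parldata module.
# -a = archive mode, -v = list each file, -n = dry run (nothing is downloaded).
# $1 = directory under the module, e.g. scrapedxml/debates/
# RSYNC is an assumed override so the command can be inspected or stubbed out.
preview_dir() {
    ${RSYNC:-rsync} -avn "data.theyworkforyou.com::parldata/$1" .
}
```

For example, preview_dir scrapedxml/debates/ would list the debate files that a real run would fetch into the current directory; removing the n from -avn would perform the actual download.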
We strongly recommend that, where possible, you use the --exclude '.svn' --exclude 'tmp/' switches, as these directories are used for processing and versioning and aren’t relevant to the data. To download all Commons main chamber debates from October 2012, you would use something like:
rsync -az --progress --exclude '.svn' --exclude 'tmp/' --relative data.theyworkforyou.com::parldata/scrapedxml/debates/debates2012-10-* .
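The same command can be adapted to other months. The following is a minimal sketch, assuming that debate files for other months follow the same debatesYYYY-MM-* naming as the October 2012 example. The function names month_pattern and fetch_month, and the RSYNC override variable, are hypothetical. The pattern is quoted so that the local shell does not try to expand the wildcard itself.

```shell
# Hypothetical helper: print the rsync source pattern for one month of
# Commons main chamber debates.
# $1 = four-digit year (e.g. 2012), $2 = two-digit month (e.g. 10 or 03)
month_pattern() {
    printf '%s\n' "data.theyworkforyou.com::parldata/scrapedxml/debates/debates$1-$2-*"
}

# Hypothetical helper: fetch that month into the current directory using the
# same options as the example above. Only new or changed files are transferred.
# RSYNC is an assumed override so the command can be inspected or stubbed out.
fetch_month() {
    ${RSYNC:-rsync} -az --progress --exclude '.svn' --exclude 'tmp/' \
        --relative "$(month_pattern "$1" "$2")" .
}
```

Calling fetch_month 2012 10 should be equivalent to the command above; running it again later would pick up only files that have been added or changed since.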