BBC Media Dataset for Research
A data resource enabling University of Manchester researchers to carry out qualitative, quantitative and computational media analyses over a large collection of BBC TV programmes.
Overview
The BBC-AVS 10K Dataset was recently shared with The University of Manchester (UoM) through the BBC Data Science Research Partnership (DSRP).
It consists of more than 10,000 TV programmes broadcast by the BBC between 2007 and 2017.
For each programme, audio, video, subtitle data and associated metadata are included.
Any student or researcher affiliated with UoM can access and use the dataset, subject to approval. Each request is reviewed by a representative of the BBC, and the process is coordinated by Dr Riza Batista-Navarro, the UoM BBC DSRP Lead and Data Manager.
What's Included in the Dataset
The dataset consists of a total of 10,160 TV programmes broadcast by the BBC between 2007 and 2017 in the UK. The contents of these programmes were originally recorded between 1962 and 2017.
Due to compliance policies and editorial restrictions, some programmes were excluded from the dataset. For more information about the specific programmes, a spreadsheet is available to UoM students and staff.
For information on the metadata about the TV programmes included in the dataset, as well as file formats, please read the detailed description prepared by the BBC.
To illustrate how the dataset is structured, the BBC have provided some sample data that can be downloaded directly (accessible only to UoM students and staff).
A document summarising the coverage of the BBC-AVS 10K Dataset is available for UoM students and staff.
Accessing the Dataset
To request access to the dataset, the following two documents need to be prepared:
- A completed Terms of Use Agreement, signed by the UoM student/researcher
- A one-page outline of the proposed project. While no specific format is required, we suggest that you include the following information in the outline:
  - Project Title
  - Background and Rationale
  - Research Questions/Objectives
  - Research Design/Methodology (including which genre of TV programmes you intend to use)
  - Expected Contributions
The above documents should be submitted to the UoM Data Manager using this form (accessible only to UoM students and staff). The outline will then be reviewed by a representative of the BBC.
If a researcher wishes to publish work that makes use of the dataset, they must share a draft of the publication with the BBC at least 30 days before submission.
The BBC will check the draft for compliance and confirm that no confidential information has been included.
The UoM Data Manager can help facilitate this process.
FAQs
No. We encourage anyone who finds the TV programmes relevant to their research, regardless of academic background or expertise, to make use of the dataset.
The BBC have told us that recent changes to the DSRP programme mean they are unable to approve any research that requires input from their legal or editorial policy teams.
This means that research that would be complex for the BBC to sign off is unlikely to be approved.
This includes areas such as:
- Research that involves identifying specific individuals, which raises issues under data protection rules. For example, research on general voice segmentation is likely to be fine, but research on voice identification, which involves training on or identifying individual voices, is unlikely to be approved.
- Research that requires additional sign-off by BBC News, Editorial Policy or Strategy. For example, research on bias, representation or politically contentious issues usually requires additional effort that the DSRP no longer have the resources to offer. Such research topics are therefore unlikely to be approved.
Please refer to the "Technical Details" section of the detailed description provided by the BBC.
We advise you to download and explore the sample data provided by the BBC. Please see the "What's Included in the Dataset" section above.
Please allow 1-2 weeks for the BBC representative to review your proposed project and countersign the Terms of Use Agreement, and for the UoM Data Manager to send you specific instructions for downloading the data.
Yes. We advise you to submit a separate data access request for each intended project, so that you can tailor each project outline to a specific topic. You can, however, supply the same signed Terms of Use Agreement.
Any academic researcher in UoM can request access to the dataset. Within the context of the BBC DSRP data sharing agreement between UoM and the BBC, "academic researcher" is defined as "employees, consultants, fellows, visiting researchers, post-graduate researchers, doctoral or masters students and undergraduate students".
No. You cannot share the dataset with your colleagues without them first signing a Terms of Use Agreement with the BBC. Hence, they will also have to submit a data access request according to the steps explained in the "Accessing the Dataset" section above.
Please contact the UoM Data Manager, Dr Riza Batista-Navarro, who will then arrange a convenient and secure way for you to share your paper draft with the BBC.
Contact Information
If you have any further questions, please contact the UoM BBC DSRP Lead and Data Manager, Dr Riza Batista-Navarro.