Government Digital Service Podcast

Government Digital Service Podcast #23: The Data Standards Authority

September 30, 2020

Alison Pritchard:

Hello and welcome to this month's episode of the Government Digital Service Podcast. I'm Alison Pritchard, the Director General at GDS - before taking up appointment at the ONS [Office for National Statistics] as its Deputy National Statistician and Director General for Data Capability. 

 

So I'm delighted that, although I'm moving, I'll still be part of the wider digital and data transformation agenda through ONS’s digital and data services, and our work on data governance boards. 

 

GDS is responsible for the digital transformation of government. As part of that, we’ve set a vision for digital government to be joined up, trusted and responsive to user needs. We're focussing on 5 pillars to get that done, one of which is data - the focus of this podcast. 

 

Government holds considerable volumes of data in a myriad of places. But often this data is inconsistent, incomplete or just unusable. If the government is going to realise the benefits data can bring, we'll need to fix the foundations. And one way of doing this is by focussing on data standards. 

 

GDS is leading a new authority, the Data Standards Authority (DSA), that focuses on making data shareable and accessible across government services. The metadata standards and guidance we published in August were our first deliverable. They cover what information should be recorded when sharing data across government - for example in spreadsheets - to assure it's standardised and easy to use. It's a step in quality assuring how government data is shared. Our focus on standards is one part of the bigger picture around better managing data to assure better policy outcomes and deliver more joined-up services to citizens. 

 

That's all from me. I'll now hand over to Vanessa Schneider, the podcast host, who will be speaking to technical leads from GDS and ONS about how we take this work forward. Enjoy the discussion. 

 

Vanessa Schneider: 

Thank you Alison. As Alison said, I’m Vanessa Schneider, Senior Channels and Community Manager at GDS and your host today. Joining me are Rosalie Marshall and Tomas Sanchez. Rosalie, let's start with you. Can you please introduce yourself and what you do?

 

Rosalie Marshall:

I'm Rosalie. I'm the Technical Lead for the Government Data Standards Authority. That involves a lot of recruitment, getting workstreams off the ground relating to data standards, and just looking at the data standards landscape in detail.

 

Vanessa Schneider: 

Thank you, Rosalie. Tomas, could you please introduce yourself? 

 

Tomas Sanchez:

Yes. So I'm Tomas. I'm the Chief Data Architect for ONS [Office for National Statistics]. And I'm responsible for a bunch of things related to data architecture and data management. So one of those things is the ONS Data Strategy. And amongst the various things that my division in ONS does is best practices around data. 

 

One of the things that we work on is data standardisation. So apart from that, I'm also quite keen on, and responsible for, talking to various departments across government about all the things that we do with the aim of, you know, being on the same page on best practices and so on. And this is how we got in touch with the Data Standards Authority and other streams in central government.

 

Vanessa Schneider: 

You mentioned that your area covers data standards in government. What does that entail?

 

Tomas Sanchez:

So basically, the whole point of standardisation is to make sure that everybody uses the same things, particularly related to data. And it is good that ONS is trying to do this. But we cannot do this by ourselves. Doing this in a coordinated way through a, sort of, central authority like the DSA is very helpful. 

 

While ONS has its own standards, to do what we need to do in ONS, we need to agree amongst the different departments on what it is that we are trying to standardise. And the scoping of this, and what things we’re doing first and what we are doing second and so on, is part of what the DSA is about.

 

Vanessa Schneider:

Rosalie, so you work as part of the DSA. How do you work together with Tomas on this issue?

 

Rosalie Marshall:

So, yes. So this is actually a joint endeavour between the Government Digital Service and ONS. So we're actually partnering up on the Data Standards Authority. So while we are at the central point in GDS, we are working very closely with ONS and actually a number of our team members will sit within ONS. 

 

The good thing about being virtual is that we've really been able to work very tightly together and department lines haven't played much of a part.

 

Vanessa Schneider:

So, as Rosalie mentioned, the Data Standards Authority is very new. Would you mind sharing with the listeners how it came about? What kicked it all off? 

 

Rosalie Marshall: 

So the idea for the Data Standards Authority was kicked off probably just over a year ago now. That was done by DCMS, the Department for Digital, Culture, Media and Sport, who at that point looked after data policy for government, and they worked with a number of departments on this bid, mainly GDS and ONS. So we've been working together now for a while on what this should look like. And since March, it's become a reality.

 

Tomas Sanchez:

So when I joined ONS in 2017, apart from looking internally at the office to see what we should do internally for better practices in terms of data management, we also thought that it was very important to look across government and see what other people are doing so we can learn from others and hopefully maybe others can learn from us eventually. 

 

One of the things that we did is setting up the Cross-government Data Architecture Community, which was just a community of practitioners around data architecture and data management, which of course included data standardisation, amongst other things. Apart from this community, we also got involved in a number of forums in central government, looking at data and data usage and data infrastructure and other things, such as, for example, the Data Leaders Network. And it was within these conversations within central government that we got in touch with DCMS and GDS, who were also thinking about how to work on data foundations and data infrastructure for government to enhance data sharing, data interoperability, and just how to use data better in government. 

 

And that is how the idea came about of creating a central authority in charge of fixing one of the fundamental problems of data, which data standardisation tends to be. So as Rosalie mentioned, we worked with them for quite a long time, for various reasons. Listeners might remember that there was supposed to be a spending review in 2019, which never happened. So that gave us a lot of time to think about how to do this. And eventually we did put in a bid for the Budget earlier this year. 

 

And then that's how the Data Standards Authority got funded and the rest is history obviously.

 

Vanessa Schneider: 

So looking to the future of the DSA, what are your immediate next goals? I know that you've put out pieces of guidance, for instance.

 

Rosalie Marshall:

So the big ones are: we've got an API catalogue. It's not a workstream that is actually setting a standard in data, but it's helping us with our journey on standards because we need transparency of where data exchange is taking place. 

 

I think it's important that we mention that, you know, we are looking at data flow as a priority. There's a lot that you can do within departments in terms of governance. But really, we're looking at that boundary and the data exchange that is happening between departments and how we can improve that.

 

So as a first off, you know, we are getting the API catalogue into a service or product that is really worthwhile for departments to use. We want to make sure that there's a lot more uptake of that catalogue on there to increase transparency of development taking place, but also so we can understand the standards that are being used by APIs. So that's one workstream.

 

So one of the big workstreams that we got off the ground relates to metadata standards. And that was a very entry-level standard, in some ways. We're recommending that we follow schema.org and Dublin Core and also CSV on the Web. So that's a recommendation, and we are now working with departments further along on their metadata journeys. We've got a workshop coming up on the 2nd October that we'd like as many people to join as possible to understand where everyone's at.
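As a rough illustration of what a metadata record along these lines might look like, the sketch below describes a CSV file using the CSV on the Web metadata layout with Dublin Core terms. The file name, titles, column names and the choice of "required" fields are all invented for the example; they are not taken from the DSA guidance.

```python
import json

# Hypothetical description of a shared CSV file, laid out in the style of
# CSV on the Web metadata with Dublin Core terms (the "dc:" prefix).
# Every value here is made up for illustration.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "example-dataset.csv",
    "dc:title": "Example dataset",
    "dc:creator": "Example Department",
    "dc:issued": "2020-09-30",
    "tableSchema": {
        "columns": [
            {"name": "area", "titles": "Area", "datatype": "string"},
            {"name": "recorded_on", "titles": "Recorded on", "datatype": "date"},
            {"name": "count", "titles": "Count", "datatype": "integer"},
        ]
    },
}

# Assumed minimum set of descriptive fields, chosen only for this example.
REQUIRED_FIELDS = ("dc:title", "dc:creator", "dc:issued")


def missing_fields(record, required=REQUIRED_FIELDS):
    """Return the required descriptive fields that are absent or empty."""
    return [field for field in required if not record.get(field)]


if __name__ == "__main__":
    print(json.dumps(metadata, indent=2))
    print("Missing fields:", missing_fields(metadata))
```

A check like `missing_fields` is one simple way a team could confirm a file is described consistently before sharing it.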

 

We're also looking at standards in relation to file formats and doing some work there. And then I think there's 2 areas which probably Tomas is best placed to talk about and that’s around what we're thinking about at least. So it's probably too early days, but at least we can share some of the thinking that we're doing around some of the identifiers and also data types as well.

 

Tomas Sanchez:

So Rosalie mentioned identifiers. I think the overall concept is something that we call reference data, which people might know by different names, like master data or code lists or typologies etc. So there are multiple names for those.

 

But essentially the idea is that there are lists of items or entities that people refer to all the time. So think about datasets: for example, many datasets contain address information. So the idea is that there is only one valid list of all the addresses in the country. So if we have a reference set of addresses that everybody can refer to, then it will be easier to link datasets amongst themselves that are talking about those addresses, right?

 

And you can make the same case for other types of things, like the standard classifications or lists of businesses or things like that, which government departments refer to all the time to do their work, but that there is not one version of the truth for the whole government just because we didn't get to do that yet together. 

 

And I think that is basically the foundation for making sure that we can link datasets across government more easily. And of course, part of that, as Rosalie was mentioning, is that you need to have a unique identifier for each one of these addresses or these entities. Right. So this is definitely something that we need to look at as part of standardising data, but reference data as a whole is, as I said, a key piece of the puzzle to standardise data across government.
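The linking idea described here can be sketched in a few lines. In this minimal example, two datasets share an identifier drawn from one reference list of addresses; the identifiers, addresses and field names are all hypothetical.

```python
# Hypothetical reference list: one agreed identifier per address.
reference_addresses = {
    "ADDR-0001": "1 Example Street, Exampletown",
    "ADDR-0002": "2 Example Street, Exampletown",
}

# Two made-up datasets held by different organisations, both using the
# shared identifier rather than free-text addresses.
dataset_a = [
    {"address_id": "ADDR-0001", "council_tax_band": "B"},
    {"address_id": "ADDR-0002", "council_tax_band": "C"},
]
dataset_b = [
    {"address_id": "ADDR-0001", "energy_rating": "C"},
    {"address_id": "ADDR-0003", "energy_rating": "D"},
]


def link_by_identifier(left, right, key):
    """Join rows from two datasets that share the same identifier value."""
    index = {row[key]: row for row in right}
    linked = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            linked.append({**row, **match})
    return linked


def unknown_identifiers(rows, reference, key):
    """List identifier values that do not appear in the reference list."""
    return [row[key] for row in rows if row[key] not in reference]


if __name__ == "__main__":
    print(link_by_identifier(dataset_a, dataset_b, "address_id"))
    print(unknown_identifiers(dataset_b, reference_addresses, "address_id"))
```

Because both datasets point at the same reference list, the join needs no fuzzy matching of address text, and identifiers that are not in the list are easy to spot.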

 

The other thing that Rosalie mentioned there is data types. So obviously if we are sharing data across departments which is of a specific type, for example a date, and we maintain different standards for dates, so we record dates in different ways, then when we get data from other departments we have to transform that into a format that we can use internally. And that transformation, maybe for dates it doesn't sound very complex, but when you have to do this for more complex types of data, it becomes quite time consuming. 

 

So if we manage to standardise data types and departments are able to adopt this, again, we are not only helping them with the work they have to do for themselves, so they don't have to think about what to use, because we provide guidance on what data type standards they can use. But also, when we get to share data, we already have the same format that we are using internally. So it's much easier to process.
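To make the cost of differing date formats concrete, here is a small sketch of the transformation step a receiving department might need when senders record dates differently. The list of incoming formats is an assumption for the example; ISO 8601 (YYYY-MM-DD) is used as the common target.

```python
from datetime import datetime

# Formats assumed for illustration; a real data feed could use others.
KNOWN_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%Y%m%d")


def to_iso_date(text):
    """Convert a date string in one of the known formats to ISO 8601."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next format
    raise ValueError(f"Unrecognised date format: {text!r}")


if __name__ == "__main__":
    for raw in ("2020-09-30", "30/09/2020", "20200930"):
        print(raw, "->", to_iso_date(raw))
```

If every department already exchanged dates in the one agreed format, this conversion layer, and the guesswork about which formats to expect, would not be needed.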

 

Vanessa Schneider: 

The term metadata has cropped up a few times now. Can you explain what kind of data that is please?

 

Tomas Sanchez: 

So when people ask me what metadata is, I always think about, you know, everybody knows libraries. People have used libraries. You go to the library, you have a lot of books on a lot of shelves, and you have to find the book that you are looking for. So the books themselves are the content, are the data. Right. But we need to find a way of finding things efficiently. If we had every book indexed in a different way and we stored a different type of information for each book, it would be very difficult to do it.

 

But as we all have been in libraries, we know that you have a catalogue where you go and then you have the title of the book and the author of the book. You can search by either, or you can search by date or by other things. So that information that we are storing about the book, which is the content, that is what metadata is: it’s information about the data itself. Right. So although datasets are not books, it is exactly the same thing. We have to find a consistent way of describing the data so that we can catalogue it better.
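The library analogy can be sketched as a tiny catalogue. Each entry below holds the same descriptive fields (title, creator, date), which is what makes a single search function work across all of them; the entries themselves are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Metadata about an item: information describing it, not its content."""
    title: str
    creator: str
    date: str  # ISO 8601, e.g. "2020-09-30"


# Made-up catalogue entries, all described with the same fields.
catalogue = [
    CatalogueEntry("Example survey results", "Example Department", "2019-04-01"),
    CatalogueEntry("Example address list", "Example Agency", "2020-01-15"),
    CatalogueEntry("Example survey methodology", "Example Department", "2020-01-15"),
]


def search(entries, **criteria):
    """Return entries whose metadata fields equal every given criterion."""
    return [
        entry
        for entry in entries
        if all(getattr(entry, field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    print(search(catalogue, creator="Example Department"))
    print(search(catalogue, date="2020-01-15", creator="Example Agency"))
```

If each entry stored different kinds of information, `search` would have nothing consistent to match on, which is the point being made about indexing books in different ways.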

 

Vanessa Schneider: 

Rosalie, would you mind explaining to the listeners what an API is? I hear that's a challenging question. 

 

Rosalie Marshall: 

It is a challenging question just because everyone has a different answer. So an API is just another one of our lovely acronyms that we have in government. It stands for application programming interface, so, and that kinda tells you what it is, it’s the interface for your application. APIs come up in talking about data exchange. 

 

The way I guess you can kind of start to understand it, I think I started to understand it when someone talked to me about an API being like a restaurant menu. It tells you what’s on, what you can have, from an application. So, you know, an API will tell you about all the different features within an application that you need to be aware of in order to interact with it. 
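One loose way to picture the menu analogy in code: an application keeps its internals private and publishes a fixed set of operations that callers can ask for. The class, its data and the `menu` helper below are hypothetical, intended only to show the idea of a public interface.

```python
class DatasetService:
    """A hypothetical application; its public methods are its 'menu'."""

    def __init__(self):
        # Internal storage: the "kitchen", not something callers touch directly.
        self._datasets = {"example-1": {"title": "Example dataset"}}

    def list_datasets(self):
        """Return the identifiers of the datasets on offer."""
        return sorted(self._datasets)

    def get_dataset(self, dataset_id):
        """Return the record for one dataset, or None if it is not on offer."""
        return self._datasets.get(dataset_id)


def menu(service):
    """List the public operations a caller can use, like reading a menu."""
    return sorted(
        name
        for name in dir(service)
        if not name.startswith("_") and callable(getattr(service, name))
    )


if __name__ == "__main__":
    service = DatasetService()
    print(menu(service))
    print(service.get_dataset("example-1"))
```

A caller only needs to know what is on the menu and how to ask for it, not how the application stores or prepares the data behind it.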

 

Vanessa Schneider:

I understand that you're also expecting to set standards for memorandums of understanding, also known as MoUs. Can you please explain a bit more about what that means? 

 

Rosalie Marshall:

So in terms of the MoUs, you know, those and data-sharing agreements are formed within the public sector when data is being passed from one entity to another. And the difficulty with the landscape at the moment is that the MoUs and data-sharing agreements take lots of different forms, cover lots of different areas. 

 

And it's quite a big undertaking when forming these because legal teams often need to be involved. And there's obviously a lot to think about when we're working on a data-sharing agreement.

 

So it's just really bringing standards to this area so that we can improve efficiency in data-sharing and make it easier for those who want to consume data, particularly local authorities I think. You know, local authorities are not big API developers at the moment, but they consume a huge amount of government data from all over government and loads of departments. So for them, it's a big undertaking when it comes to MoUs. So actually simplifying the process and all conforming to a certain standard and template is a good way forward. So that's something that we're starting to look at.

 

Vanessa Schneider: 

So, you've touched on a couple of topics, such as the identifiers and transparency, and it seems like ethics are quite an important component of that. I know that in 2018 there was a Data Ethics Framework that was published. 

 

Rosalie Marshall: 

The Data Ethics Framework is not a piece that’s happening in the Data Standards Authority. But it's obviously something that we need to be aware of and tapped into.

 

We are updating a number of different pieces of guidance, for example, at the moment, we're redrafting Point 10 of the Technology Code of Practice, which relates to data. And, you know, we're also updating the government API standards. And so we're working on new guidance and standards as part of the DSA. 

 

And obviously something that we need to be aware of when doing that is the Data Ethics Framework, which is a framework that sets out principles for how data should be shared in the public sector and really builds on the Civil Service Code in some ways, so it builds on the idea of managing data with integrity, honesty, objectivity and impartiality. So it's just, I mean, there's probably other people who can give you better summaries. But, yeah, it's important to be aware of when writing any guidance on data.

 

Vanessa Schneider:

I was just wondering, Rosalie, if you knew of a success that government has had where we've started standardising data. 

 

Rosalie Marshall:

Yes. So there are a number of different successes that we could point to. I think the API standards were one area that has been very successful in terms of setting central government standards and having other departments follow them. So the API standards were launched in 2018 and have been iterated with the API and data exchange community. And we know that a lot of departments are following these standards and are building their API strategies around them. 

 

The reason why it's important to follow the API standards is for consistency in terms of API development, but also in terms of better data flow because of following the data standards that exist. You know, we refer to the ISO date standard, for example, in the API standards. 

 

It also ensures that APIs are developed securely, that transfer can happen in the right way. And that versioning again, is clear. So, as I said, there are benefits.

 

There's also benefits in terms of findability for following these standards, in terms of people moving around development teams and having the right skills, knowing what skills you need for API development. 

 

So that's one example of where we've been successful in setting government standards relating to data centrally. 

 

There's also examples of government using data where it's a positive experience. And I think that's really around moving to the delivery of whole services. So rather than a citizen having to interact with one department for a particular service, they can just think about interacting with the service and you know, the numbers of departments that help support that service isn't something they need to know about. 

 

So, for example, one of the services that has been on the transformation towards being a whole service is that of the Blue Badge scheme, which is managed by the Department for Transport [DfT] and is a scheme that gives those with disabilities access to restricted parking areas. 

 

So, you know, previously local authorities had to kind of manage the eligibility for this scheme. And, you know, they would have many applications, some that wouldn't be successful. I think they received around 2,500 applications a month that they had to deal with. And then there were obviously lots of different data exchanges that happened between the Department for Transport and local councils before a Blue Badge could be given to the applicant.

 

But now a Blue Badge user goes to GOV.UK to have their eligibility confirmed. And then an API seamlessly links the customer back to the local council’s case management system for the application process. Once approved, another API links back to the central system, to store the record, and then at this point, the Blue Badge is produced and sent to the customer centrally by DfT. So it's a much smoother system. 

 

And I guess what's next is integration through APIs with some of the other departments that are involved in Blue Badges, like DWP [Department for Work and Pensions], which has to produce the letter of eligibility that a citizen needs to upload onto GOV.UK, and like the Passport Office, where you need to provide a picture of you and proof of your identity. So, you know, there's still a way to go on a service like that, but it shows the direction in which government services are heading. 

 

Vanessa Schneider: 

Thank you, Rosalie. Tomas, I was wondering what kind of challenges do you foresee in establishing data standards across government? I assume with the ONS you interface with a lot of departments providing data. Do you have any idea?

 

Tomas Sanchez: 

So indeed, we do interface with a lot of departments. Obviously doing this at ONS’s scale, and doing this at a government scale is quite a different thing.

 

But I think definitely the area that's probably going to be a challenge is the governance, in the sense that we put out guidelines on how other departments can approach standardisation, but making sure that departments actually follow the system, that is a different thing. Right?

 

So obviously, how to approach this is a delicate thing. Obviously, departments want to continue doing their job without having interference in terms of how they have to do their job. 

 

But we in central government believe that doing this, following certain standards is in the end more beneficial for the government as a whole. And we need to try to put something in there to make sure that these guidelines are adopted. So how exactly to do that? How to incentivise departments to actually do this? I think it's going to be quite a tough challenge. 

 

Vanessa Schneider: 

Would that kind of enforcement lie with the DSA or is it something that can be incentivised in another way do you think?

 

Tomas Sanchez:

I wouldn't like to call it enforcement; incentivising is a better word. I think there are different ways of doing it. If you think about it, GDS is already doing this with IT and digital in different ways. Probably the best way of approaching it is to use the existing mechanisms and include data standardisation within those. So hopefully we can reuse existing things without having to add new layers of complexity to how certain things are incentivised. 

 

Vanessa Schneider: 

I can tell that you're both very passionate about data and making sure that government has usable data and is able to share data with each other to make its services better for citizens. I was wondering, where does that come from?

 

Rosalie Marshall:

Personally, like, you know, you'd have lots of circumstances in your life. And I guess some people have more interactions with the state than others. You know, depending on your health, you know, whether you have kids.

 

And I guess, like, I've probably had a fair amount. So it comes from just understanding that frustration of another organisation having data about me that might not be accurate or that, or them not having it at all. And I'm wondering why that is. You know, I've had 2 kids on the NHS system. It was frustrating to me, for example, that the hospital didn't have any of the records that I’d had my first child with. And there was no way to get those records. So I then started creating my own records and holding all the data myself. 

 

And there's so many examples that I've gone through. And I'm sure you know, there's so many people in this boat and it's just wanting to fix things and wanting to make data work for the end user. 

 

But also, as a civil servant, I see silos and it's sometimes frustrating when you realise that, you know, through no fault of an individual, because this is just as we know, this is the system that needs improving, it's not one organisation or individual. We just need to fix this. So, so if we can create standards that everyone can use and that's why we're focusing on international, on open standards, because those are the ones that can cross boundaries and that, you know, it's not just going to be working in one department, but it will help join up both central and local and the wider public sector. 

 

Vanessa Schneider: 

Thank you. Tomas, is there maybe a service that you hope you are able to change through the standardisation of data?

 

Tomas Sanchez: 

So I think data is so much at the core of everything that not only government, but every organisation in this country does, that having a right way of standardising the data and making the data clear so everybody can understand it better will basically benefit not just the organisations themselves that are providing the services, but also the users of those services.

 

And if we think about government as an organisation which provides services to users based on data that government actually collects from the users themselves, then there has to be some opportunity to enhance that service. And that's exactly what we want to do.

 

Vanessa Schneider: 

Rosalie, any thoughts? 

 

Rosalie Marshall:

I think, you know, yeah, I would agree with Tomas. I think there’s a lot of priority areas that need improvement, for example, social care. You know, I talked about delivering whole services for users and things like the Blue Badge scheme, which I see as very important. 

 

But there's also, you know, bringing, you know, the social care, those departments that are involved there and allowing them to share data to help those who are vulnerable. There's also a lot, you know, in terms of the environment where, you know, sharing data between the energy sector and you know Ofgem and some of the big energy companies, there's a huge amount there that we could do with improved data standards as well. 

 

So I think there's so many things that we can make better in public life with data if it's done right. And so, yeah, just I mean, I can’t pick one area really. 

 

Vanessa Schneider: 

That was unfair of me. I'll give you that. You mentioned it earlier, there was a way for listeners to get involved, if I'm not mistaken. Could you please remind us what that opportunity was? 

 

Rosalie Marshall:

There’s quite a lot of ways people can get in touch. There's a number of workshops that are coming up that we'd really like cross-government engagement on and attendance. 

 

So we've got an API catalogue workshop for the API community. We also have a metadata workshop coming up on the 2nd October for those who are working in metadata and we're planning to blog a lot more about the work that we're doing. So we invite people to comment on those blogs and get in touch if they want to talk to us. We're also looking at having an open repo on GitHub to help share some of our work and invite feedback on that as well. So, yeah, we're hoping to make it really easy to contact us. And we do have an email address as well that people can write to, which is data-standards-authority@digital.cabinet-office.gov.uk. So that’s also open to everyone to use.

 

Vanessa Schneider: 

Thanks, Rosalie. It's not the easiest one to spell out, but we'll make sure to include it in our show notes. 

 

I really appreciate you giving me your time so that we could record this episode. Thank you so much to all of our guests for coming on today. You can listen to all the episodes of the Government Digital Service Podcast on Apple Music, Spotify and all other major podcast platforms. The transcripts are available on Podbean.

 

Goodbye. 

 

Tomas Sanchez: 

Thank you, bye. 

 

Rosalie Marshall:

Thanks so much for having us. Bye.

 
