Never mind who I am, ask me about my credentials
Many (most) identity systems make a fundamental assumption that is built into their very architecture. This assumption creates three significant problems: privacy erosion, toxic data stores, and poor security.
The problem stems from the way traditional systems assume a model of identification, authentication, and authorisation. In one way or another, these systems first demand that you prove you hold an identifier they recognise, and only then look up their list of access rights to decide what, if anything, you can access and do.
Self-Sovereign Identity (SSI) models are built around the World Wide Web Consortium (W3C) standards of Verifiable Credentials and Decentralised Identifiers. In the SSI model, the unit of exchange is verifiable credentials (in whole, in part, or through zero knowledge proof), not identifiers.
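To make the unit of exchange concrete, the sketch below shows the general shape of a verifiable credential in Python. The field names follow the W3C Verifiable Credentials data model, but the issuer, subject, and proof values are made-up placeholders; a real verifier checks the cryptographic proof rather than merely reading the fields.

```python
# Illustrative shape of a W3C Verifiable Credential. Field names follow the
# VC data model; the issuer, subject, and proof values are hypothetical.
credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential", "AgeCredential"],
    "issuer": "did:example:registry",            # a Decentralised Identifier
    "credentialSubject": {
        "id": "did:example:holder-123",
        "ageOver18": True,                       # the relevant attribute
    },
    "proof": {                                   # placeholder; a real proof is
        "type": "Ed25519Signature2020",          # a verifiable signature
        "proofValue": "...",
    },
}

def claims(vc: dict) -> dict:
    """Return only the attested attributes, not the holder's identity."""
    return {k: v for k, v in vc["credentialSubject"].items() if k != "id"}

print(claims(credential))  # {'ageOver18': True}
```

Note that what the holder presents (in whole, in part, or via zero-knowledge proof) is the attested attribute, not a correlatable identifier.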
This paper explores the reasons that the traditional, identifier based, digital identity model leads to unintended consequences for all parties involved.
Why identifier models aren’t the answer we need
Whilst understandable at a human social level, sharing who you are is often irrelevant in digital transactions and should never be demanded as the default. The real question is “can you prove that you have the credentials for the service you request?” Or, to use Malcolm Crompton’s five-word definition of digital identity, do you have the necessary “verifiable credentials about relevant attributes”? In this context, the attributes are relevant to the access or activity being negotiated, and the credentials prove the attributes to the extent required.
Most current identity systems and frameworks are a form of “identifier service provider”. Each time a new person (or entity) is onboarded, they are issued a unique identifier. The onboarding process can be onerous or simple, depending on the service, commercial, and/or legal requirements the issuer must meet. The identifier is used in subsequent transactions, and the knowledge of who the identifier relates to is held by the issuing authority. Think Apple ID, “login with Google” or Facebook, or in Australia AusPost Digital iD, myGovID, and the DTA’s TDIF.
You prove you “own” (or rather, have been temporarily allocated) the identifier by presenting other factors bound to it, such as a password and (preferably) additional factors. The security model that underpins these architectures has four key elements:
- identity creation (onboarding and binding a person to a created identifier);
- identification – matching the person to the identifier;
- authentication – proving that the person offering the identifier controls or owns the identifier; and
- authorisation – do they have the right to access the resource they have requested?
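In code, the model above reduces to “look up the identifier, check a bound secret, then consult an access list”. This minimal Python sketch (all names and records are invented for illustration) makes the architecture’s dependence on central identifier and rights stores explicit:

```python
import hashlib

# Central stores the issuer/receiver must maintain (illustrative data only).
IDENTIFIERS = {"user-4711": hashlib.sha256(b"hunter2").hexdigest()}  # onboarding
ACCESS_LIST = {"user-4711": {"read:statements"}}                     # authorisation

def authenticate(identifier: str, password: str) -> bool:
    """Prove control of the identifier via a bound factor (here, a password)."""
    stored = IDENTIFIERS.get(identifier)
    return stored == hashlib.sha256(password.encode()).hexdigest()

def authorise(identifier: str, right: str) -> bool:
    """Check the access control list for this identifier."""
    return right in ACCESS_LIST.get(identifier, set())

def access(identifier: str, password: str, right: str) -> bool:
    # Identification + authentication + authorisation, all keyed on the identifier.
    return authenticate(identifier, password) and authorise(identifier, right)
```

Notice that every step, and every log line a real system would generate around it, is keyed to the same correlatable identifier.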
This architecture is flawed for three reasons: privacy erosion, toxic data stores, and, perhaps most surprisingly, poor security. Let’s look at each of these in turn.
Privacy Erosion
Every time you use an identifier provided to you by an issuing authority, from accessing your bank, to paying for a coffee, to opening your social media account, it is linked by the issuing authority to you as an individual. The issuing authority knows who you are (through the onboarding process) and knows what you’re doing (through the receiving party’s verification of your identifier). This is not a correlation risk; it is a correlation fact. Even if the receiving party doesn’t check the identifier it is offered, the recorded identifier is bound to you by the issuing authority and can be linked later through data analysis. Over time, the usage pattern, the data “exhaust” you generate, becomes as revealing a disclosure as, and often more revealing than, the original personal material you provided to obtain the identifier.
Some of these systems (the better ones) implement a form of digital blinding by the issuing and verification service provider: digitally covering their eyes and ears to try to be true to their promise of not watching what you’re doing. But the architecture means they have to work hard not to know and, if they do know, not to remember; and often regulations and operating practices mean logs are kept anyway.
Some of course make no such effort. They are commercially interested in remembering and analysing your activities. Others may have other interests, or perhaps have other interests imposed upon them by other parties. It is common practice for online services to share data about you back to third parties, typically including Google and Facebook directly or indirectly.
Basically, using an identifier that is strongly linked to you as a person is like showing your passport every time you want to buy a coffee: massive overkill, and fraught with risk.
Toxic Data Stores
An identifier-based architecture demands that every organisation using (receiving) the identifier remember which identifiers have access to which resources (the authorisation step). Often described in terms of access control lists, in human terms this means each organisation needs to remember the access and data rights linked to every identifier it recognises. Often, of course, they remember a whole bunch of other stuff too: your name, address, date of birth, and so on. This data, along with site-specific content, is replicated at every site and has three toxic impacts:
- Honeypot for hackers. Each receiver becomes a honeypot for hackers because of the data it holds: by breaching the central system, an attacker can harvest all of the access rights and information at once.
- Vulnerable to coercion. It opens each receiver up to coercion by agencies or actors that can force them to divulge what each identifier has been doing.
- Increased costs. More complex personal data stores massively increase the cost to receivers of responding to a freedom of information or GDPR-style request: “tell me all that you know about me and what decisions you’ve made based on that data”.
When GDPR came into force, one university in the UK estimated that a single GDPR data request would cost it £20,000 to service. In Berlin, a rental agency was fined €14.5 million in 2019 for violating the GDPR.
It gets worse when you have more data about more people.
Poor Security
One of the tenets of a robust security architecture is the concept of small “domains”: rather than building monolithic “one key for all locks” structures, we have many keys for many locks, and each key unlocks only the minimum resources.
This works both ways. A common military security approach is to share information on a “need to know” basis: I share only the minimum information required to satisfy a challenge response, rather than giving away everything about me every time I do something. This is good for me, and good for the receiver: unburdened by knowledge that might cost them dearly in the future, they can honestly state “I don’t know”, and even under duress cannot divulge information they shouldn’t have.
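The “need to know” idea can be sketched as answering a challenge with a single attested attribute instead of a full identity record. The sketch below uses an HMAC as a simplified stand-in for a real issuer signature; the issuer key and attribute names are invented for illustration. In a real deployment the issuer signs with a private key and receivers verify with the public key, so the receiver never holds issuing material.

```python
import hmac
import hashlib

ISSUER_KEY = b"issuer-secret"  # hypothetical; real issuers use asymmetric keys

def attest(attribute: str, value: str) -> tuple[str, str, str]:
    """Issuer signs a single attribute, not a whole identity record."""
    tag = hmac.new(ISSUER_KEY, f"{attribute}={value}".encode(),
                   hashlib.sha256).hexdigest()
    return attribute, value, tag

def verify(attribute: str, value: str, tag: str) -> bool:
    """Receiver checks the attestation; it learns nothing else about the holder."""
    expected = hmac.new(ISSUER_KEY, f"{attribute}={value}".encode(),
                        hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, tag)

# Challenge: "are you over 18?"  Response: one attested attribute, nothing more.
attr, val, tag = attest("ageOver18", "true")
print(verify(attr, val, tag))  # True
```

The receiver can answer its challenge, yet even under duress it has nothing more to divulge: no name, no date of birth, no identifier.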
Of course, if the law demands a proof of the identity of the person behind an interaction (such as a Know Your Customer check for a bank), then so be it. The receiver needs to find out enough to satisfy the law. The law is then defining what are considered “relevant attributes”. However, for the vast majority of our daily transactions, we don’t need to achieve a 100 point check.
Traditional federated identifier models provide poor security for several reasons; let’s focus on two. First, they encourage the use of a single digital ID for everything, which is like using the same key for every lock. Second, the access control list structure they demand is considered inherently weaker than one based on credentials (or “capabilities”). For those who would like to learn more about these topics, the papers Capability Myths Demolished and ACLs Don’t provide a deeper exploration.
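The contrast between the two models can be sketched in a few lines: an ACL asks “who are you, and are you on the list?”, while a capability is an unguessable token whose possession is itself the authority. All names below are invented for illustration.

```python
import secrets

# ACL model: authority lives in a central list keyed by identifier.
acl = {"report.pdf": {"user-4711"}}

def acl_read(identifier: str, resource: str) -> bool:
    # The receiver must know who you are and keep a rights list about you.
    return identifier in acl.get(resource, set())

# Capability model: authority lives in the token itself; no identifier needed.
capabilities: dict[str, str] = {}

def grant(resource: str) -> str:
    # An unguessable token, hence unforgeable in practice.
    token = secrets.token_urlsafe(16)
    capabilities[token] = resource
    return token

def cap_read(token: str, resource: str) -> bool:
    # The receiver checks the token, not the person behind it.
    return capabilities.get(token) == resource

cap = grant("report.pdf")
print(acl_read("user-4711", "report.pdf"), cap_read(cap, "report.pdf"))  # True True
```

The ACL path forces the receiver to hold the very identifier-keyed records that create the toxic data stores described above; the capability path does not.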
So there you have it, the difference between SSI and other digital identity models is that SSI is based on the ownership and use of verifiable credentials, and other identity models are based on the allocation and use of identifiers.
What do you think?