Most sensitive data is stored in some database. The first layer of protection is encryption: the files of the database are encrypted, just like every transmission back and forth. This means that, even if an attacker were able to “listen” to the data transfers, or to get access to the physical files of the database, the data would be incomprehensible. For example, a message such as:
> Come on over for hot dogs and soda!
would be encrypted into a sequence of seemingly random characters that reveals nothing about the original text.
Moreover, access to the database is protected with a combination of username and password known only to the database itself and the application talking to it. This way, only the application is able to query the database.
Authentication of a user
Authentication is based on standard protocols. The server validates a user's credentials and sends back a session identifier, which denotes that this specific user is logged into the server. The session identifier is then saved as a cookie or on the filesystem (for native and desktop applications) in order to provide persistent logins.
Passwords are hashed and salted in the database, meaning that no passwords are ever stored as plain text!
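The session mechanism described above can be sketched in a few lines. This is only an illustration, not our production code: the in-memory `SESSIONS` store and the function names `log_in` and `current_user` are hypothetical, and credential validation is assumed to have already happened.

```python
import secrets
from typing import Dict, Optional

# Hypothetical in-memory session store: session identifier -> username.
# A real server would use a database or cache, with expiry.
SESSIONS: Dict[str, str] = {}


def log_in(username: str) -> str:
    """Issue an unguessable session identifier for an already-validated user."""
    session_id = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoded
    SESSIONS[session_id] = username
    return session_id


def current_user(session_id: str) -> Optional[str]:
    """Return the user this session belongs to, or None if it is unknown."""
    return SESSIONS.get(session_id)
```

The client would store the returned identifier (for example in a cookie) and send it along with every request, so that the server can look up who is talking to it.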
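As a minimal sketch of what hashing and salting means in practice, the snippet below uses PBKDF2 from Python's standard library. The choice of algorithm, the iteration count, and the function names are assumptions made for illustration; dedicated schemes such as bcrypt or Argon2 are equally common choices.

```python
import hashlib
import hmac
import os
from typing import Tuple

# Assumed work factor for illustration; tune it for your own hardware.
ITERATIONS = 200_000


def hash_password(password: str) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values are stored, never the password."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the digest with the stored salt and compare in constant time."""
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(digest, expected)
```

Because every password gets its own salt, two users with the same password end up with different digests, which defeats precomputed lookup tables.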
**Table-** and **row-** level security in multi-tenant systems
Most systems we build are meant to be used by multiple, different types of users. A set of permission rules determines which of the available data each of these users may see.
The permission models that we usually build are based on two fundamental models: role-based permissions and relation-based permissions.
These models are so important that we have built tooling in order to automatically create the critical code for permission management. This means that mistakes are prevented, top quality is guaranteed, and a lot of time is saved. Talk about a win-win!
By sorting users into different roles, we can achieve the simplest level of permission management. For example, we could have roles `admin`, `editor`, and `user`. `admin` may do anything, including creating other users. `editor` may only create content, whereas `user` may only view content. We could also define different types of content, such as `products` and `product-pages`, and decide that a new role, `product-editor`, may only create a `product`, but not create a `product-page` (because publication of new pages is the sole responsibility of an `editor`).
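The role example above could be expressed roughly as follows. The `PERMISSIONS` table and the `is_allowed` helper are hypothetical names, and the exact grants are our reading of the example rather than a fixed rule set.

```python
# Role -> set of (action, content type) pairs that the role may perform.
# "admin" is handled separately because it may do anything.
PERMISSIONS = {
    "editor": {("create", "product"), ("create", "product-page")},
    "product-editor": {("create", "product")},
    "user": {("view", "product"), ("view", "product-page")},
}


def is_allowed(role: str, action: str, content_type: str) -> bool:
    """Check a single action against the role table; unknown roles get nothing."""
    if role == "admin":
        return True
    return (action, content_type) in PERMISSIONS.get(role, set())
```

With this table, `is_allowed("product-editor", "create", "product")` is `True`, while `is_allowed("product-editor", "create", "product-page")` is `False`.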
Role-based authorization schemes are effective, but coarse. They usually focus on locking up large blocks of data at the table/entity level. Sometimes, saying that an `editor` can either edit all `products` or none at all is far too coarse, namely in those scenarios where we want to pick which individual entries may be edited and which may not.
Consider a hierarchical data model for a webshop, in which a `supplier` offers a series of `categories`, each containing a series of `products`. The entities, and the links between them, are all stored in the database.
An `editor` is an employee of the supplier, who must be able to edit data about products and categories. The permission scheme based on role alone would be insufficient here: an editor might be able to change information about the products and categories of other (even competing!) suppliers, which is clearly not acceptable.
However, the `editor` also has a link to their employing `supplier`. This means that we could state that an `editor` may only edit those `products` which are **reachable** via the links in the database along the path:
editor -> supplier -> category -> product
With this scheme, we can protect data at a very fine level, and even easily grant or remove permissions from users by simply adjusting the link between them and the data they may manipulate.
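To make the reachability idea concrete, here is a small sketch that follows the path `editor -> supplier -> category -> product` over in-memory link tables. All names and sample data are invented for illustration; in a real system the links live in the database and the check would typically be a join.

```python
from typing import List

# Hypothetical link tables, standing in for foreign keys in the database.
EDITOR_SUPPLIER = {"alice": "acme", "bob": "globex"}
CATEGORY_SUPPLIER = {"snacks": "acme", "drinks": "acme", "tools": "globex"}
PRODUCT_CATEGORY = {"hot-dog": "snacks", "soda": "drinks", "hammer": "tools"}


def may_edit_product(editor: str, product: str) -> bool:
    """True only if the product is reachable from the editor via their supplier."""
    category = PRODUCT_CATEGORY.get(product)
    supplier = CATEGORY_SUPPLIER.get(category)
    return supplier is not None and EDITOR_SUPPLIER.get(editor) == supplier


def editable_products(editor: str) -> List[str]:
    """All products the editor can reach, in alphabetical order."""
    return sorted(p for p in PRODUCT_CATEGORY if may_edit_product(editor, p))
```

Removing or changing a single entry in `EDITOR_SUPPLIER` is enough to revoke or move an editor's permissions, without touching any product.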
Such a permission model based on relations is also particularly compatible with modern enterprise API protocols such as OData and GraphQL, and their concept of inline expansion of linked data. This means that building safety into our application does not imply any undesirable trade-offs with respect to other features.
In some critical cases, we need to add yet another layer of security to our applications. Consider for example the association between personal data such as name, social security number, address, etc. and medical data such as examination outcomes. It is clear that **such data must never be leaked**, because what is at stake is nothing less than the trust that customers place in the platform!
The preferred security measure to minimise this risk is _pseudonymisation_. Instead of storing all data in a single database, which might be leaked (backups, lost or stolen credentials, etc.), we store it in multiple databases. This makes the life of a potential attacker much more difficult: gaining access to a single system is already a daunting enterprise, and gaining access to multiple systems at the same time is far harder still.
Thus, instead of storing personal data, medical data, and links between the two, we store the personal data in a separate database, where each user is given a pseudo-random identifier. The medical data in the original database is then linked to the pseudo-random identifiers. An attacker who gains access to the medical data would then be able to discover that person `XXASD34234SDF2345OPI4RT59U` is unfortunately very sick, but would have no way of finding out who the identifier really belongs to!
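The split can be sketched as two separate stores keyed by the same random identifier. The two dictionaries below merely stand in for two independent databases, and the function names are hypothetical.

```python
import secrets
from typing import Dict, List

# Two dictionaries standing in for two independent databases.
personal_db: Dict[str, dict] = {}      # pseudonym -> name, social security number
medical_db: Dict[str, List[str]] = {}  # pseudonym -> examination outcomes


def register_patient(name: str, ssn: str) -> str:
    """Store the personal data separately and return the random pseudonym."""
    pseudonym = secrets.token_hex(16)  # unguessable, carries no personal information
    personal_db[pseudonym] = {"name": name, "ssn": ssn}
    medical_db[pseudonym] = []
    return pseudonym


def add_examination(pseudonym: str, outcome: str) -> None:
    """Medical records only ever reference the pseudonym."""
    medical_db[pseudonym].append(outcome)


def full_record(pseudonym: str) -> dict:
    """Re-joining the data requires access to BOTH stores."""
    return {**personal_db[pseudonym], "examinations": list(medical_db[pseudonym])}
```

A dump of `medical_db` alone contains identifiers and outcomes, but no names or social security numbers.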
This technique can be applied to all sensitive data, but with a word of caution: queries that need to combine data coming from different databases will be more complex to build, adding to the maintenance costs of the project in the long term. More fun for our engineers!
We believe in our security so much that we are not afraid of being audited by third parties. We will facilitate third-party auditors, penetration tests, white-hat hackers, etc. in all possible ways.
We implement basic security measures along the lines of the OWASP Top 10, and always use tools such as C#’s LINQ and type-safe serialization in order to make some security flaws *impossible* to accidentally build into our software.
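LINQ itself is specific to C#, but the underlying idea of never gluing user input into a query string can be shown in any language. The sketch below uses Python's built-in `sqlite3` module with a parameterized query; the `users` table and the `find_user` helper are made up for this example.

```python
import sqlite3
from typing import List, Tuple


def find_user(conn: sqlite3.Connection, username: str) -> List[Tuple[int, str]]:
    """Look up a user; the driver passes `username` as data, never as SQL."""
    cursor = conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    )
    return cursor.fetchall()


# Small in-memory database for demonstration purposes only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT)")
conn.executemany("INSERT INTO users (username) VALUES (?)", [("alice",), ("bob",)])
```

A classic injection attempt such as `find_user(conn, "' OR '1'='1")` simply finds no user, because the whole string is compared against the `username` column instead of being interpreted as SQL.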
Type-safety means that each variable must conform to a strict specification of its content. This makes errors such as “the code expected a Person, but instead got an Admin” (a typical privilege escalation attack) impossible at runtime, because the compiler itself rejects such ambiguous and unvalidated code before it can run. Bugs that stem from a mismatch between the data a program actually receives when it runs and the data it expects either become impossible in a type-safe language or become much harder to introduce, dramatically improving correctness and reducing the need for constant (and regression!) testing. As an added bonus, type-safe programming languages can skip many runtime checks on the structure of the data they process, given the correctness guarantees already established at compile time. Needing fewer runtime checks makes it possible for the compiler to create a faster executable, leading to the rule of thumb that type-safe often also means faster.
Moreover, we proactively keep our whole technology stack up to date, so that new insights and improved libraries are quickly integrated into our projects.
In 2018, it is hard to justify using anything but the most advanced and reliable tools available.
Bring it on!!!