Schrödinger’s Data Types

On Values that May Be Simultaneously Required and Not Available

Monday, July 20, 2020

I only recently started experimenting with nullable reference types in C#. (Yes, I’m a laggard.) I decided to convert an ASP.NET Core (API) project to use the feature. So far, I have succeeded in getting rid of all the warnings that swamp you the first time you enable the setting, and, overall, my experience has been very positive. In this post, however, I’d like to make two observations:

Some values may be required and still missing (i.e., null) – specifically, I find it hard to avoid this problem when dealing with external systems, such as HTTP clients (or even databases); as a result, I caught myself possibly overusing the “overrides” that allow you to effectively suppress the compiler’s well-intentioned cautionary voice
When converting my application, I realized that I (we) often, perhaps sub/unconsciously, fall back on default values in value types, and I started questioning the practice

I love the new feature. But I also wonder if it could benefit from a few accompanying runtime checks. At the risk of possibly (i.e., without a doubt) embarrassing myself, I’m going to show one way this could be accomplished by introducing a new C# construct that, if it existed, would (at least partially) address my concerns – and maybe others.

Demo application

To illustrate my points, in a distilled form, I built a simple demo client/server API application. You can find the source code on GitHub. It’s a solution with two console app projects (server and client). Although the server is an actual HTTP server (and the client an actual HTTP client), I decided to implement everything manually, using HttpListener and HttpClient, as opposed to relying on a framework (e.g., ASP.NET Core). The “product” also contains a rudimentary in-memory repository as a stand-in for an actual persistence mechanism. I made these decisions for two reasons:

I wanted to keep things simple with fewer dependencies (web server, database, etc.)
Some of the experiments I present below wouldn’t work with the built-in ASP.NET Core JSON serializer, validation engine, Entity Framework Core, etc.

The server exposes a primitive CRUD API for working with departments and employees. Don’t worry about the code too much – I admit that I used up my stockpile of duct tape cobbling it together – I will show the important parts as they become relevant 😊

The client provides a text-based interface for calling the server – it takes input from the console, and it shows the HTTP request it makes, including the verb, the address, and the (JSON) content of the request. It also prints the status code and the response body received. When prompted for any value (e.g., employee first name), you can just hit enter to enter null. I’m not going to offend you by describing the app in detail – if you want to follow along, I’m sure you’ll find the interface straightforward 😊 Please note that the client exists merely as a convenience – you could achieve the same goals with, say, Postman.

All the code snippets below come from this demo app.

Those bloody external systems

Let’s take a step back. The “official” documentation on reference-type “nullability” states that you shouldn’t “use this feature to remove all null values from your code. Rather, you should aim to declare your intent to the compiler and other developers…” Further, the idea is for you to “write code where the compiler enforces those design decisions.”

In other words, conceptually, some fields, variables, or properties must always have values, while in others, not applicable or not available represent perfectly valid states. Reference-type nullability should allow you to better express these concepts in code – it should bring your code closer to the conceptual/domain model.

So far so good – I love it! The problem I encountered in my Web API project, though, is that sometimes a value is conceptually required and yet still not available. Consider the following model class representing the request to create/update an employee:

public class EmployeeRequest
{
    public int DepartmentId { get; set; }

    public string FirstName { get; set; }

    public string LastName { get; set; }

    public DateTime DateOfBirth { get; set; }

    public DateTime? DateOfDeath { get; set; }
}

Both FirstName and LastName are required, non-nullable, and when you compile this snippet, you’ll get two instances of warning CS8618: Non-nullable property ‘***’ is uninitialized. Consider declaring the property as nullable.

What do I do?

I don’t want to follow the compiler’s recommendation (make them nullable), as that would go against the principle that I should declare my intent to the compiler and other developers. Besides, as I use the values to populate my POCOs for “database” storage and they are required (i.e., non-nullable) there too, I’d merely be kicking the can down the road.

I could add a constructor to enforce their population, but the class represents the payload of a POST request, and the JSON serializer needs a parameterless constructor. Further, although the properties do indeed represent required values, as we’re dealing with an external system – in this case, the client – their absence is a genuine possibility: The client may simply send a crappy payload 😊 Further still, if/when this happens, I can’t just have the serializer throw an exception and ignore the request – I need my validation engine to process it and formulate an appropriate response.

So how do you square this circle?

This is what I did (for the first pass):

public class EmployeeRequest
{
    ...

    public string FirstName { get; set; } = null!;

    public string LastName { get; set; } = null!;

    ...
}

The ‘null!’ syntax assigns null to the property, while the null-forgiving operator (!) suppresses the compiler warning – use at your own risk. In this case, I accept that hazard and rely on the validation engine to catch any missing values – once validated, the class instance enters a “cleanroom” environment where I assume they’re never null.
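To make the hazard concrete, here is a minimal sketch. It assumes System.Text.Json as the serializer (the demo app may well handle JSON differently), and the names EmployeeRequestSketch and NullBangDemo are invented for illustration only.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Trimmed-down request model (hypothetical name) using the 'null!' initializers.
public class EmployeeRequestSketch
{
    public string FirstName { get; set; } = null!;

    public string LastName { get; set; } = null!;
}

public static class NullBangDemo
{
    // Returns true when the payload leaves LastName null,
    // even though the property's declared type is non-nullable.
    public static bool LastNameIsMissing(string json)
    {
        EmployeeRequestSketch request =
            JsonSerializer.Deserialize<EmployeeRequestSketch>(json)
            ?? throw new JsonException("The payload deserialized to null.");

        // The compiler assumes LastName is never null; at run time it can be,
        // because the serializer simply skips properties absent from the JSON.
        return request.LastName is null;
    }
}
```

Calling NullBangDemo.LastNameIsMissing with a payload that contains only FirstName should return true. Nothing complains until some later code dereferences the value, which is exactly where the NullReferenceException would come from.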

In the demo application, I intentionally introduced a validation bug to illustrate the consequences of getting this wrong: The engine allows the value of LastName to be null, and this results in the dreaded NullReferenceException, the very one we’ve striven so hard to avoid, somewhere deep in my code 😊 If you want to see this problem “in action,” pick option 9 (create employee) in the client app, and specify all values except last name.

You can find the corresponding code in the master branch of the repository.

Please note that this represents the approach I adopted for my real-world application.

Note: Dealing with bad data in external systems is a broader topic. When enforcing business rules, for example, refusing to save a new object that violates them is a no-brainer. But what if you, say, end up with offending rows in a database table? You can’t just ignore them. Do you force the user to fix them the next time they try to edit the records? Possibly, but that may be dangerous – they may end up making up missing values, for instance. I’ve never found an entirely satisfactory, universally applicable solution to this dilemma.

Have we been cheating?

Let’s consider another solution to my payload class problem:

public class EmployeeRequest
{
    ...

    public string FirstName { get; set; } = string.Empty;

    public string LastName { get; set; } = string.Empty;

    ...
}

Instead of forcing null references, I assign default values to the properties. From a purely syntactic perspective, it’s great – it satisfies the requirements and prevents the possibility of a NullReferenceException. The problem is that it feels wrong: An empty string isn’t a valid, meaningful default value for first or last name.

But hold on a second! Is 0 a valid, meaningful default value for Department ID? Is Jan 1, 0001 a valid, meaningful default value for Date of Birth? Would 0 (degrees centigrade) represent a valid, meaningful default value for Temperature (if you had such a property)? Hardly. Yet – that’s exactly what you get with value types!

Unlike reference types, value types have natural default values (all bytes set to zero); the fact that they’re natural doesn’t make using them a good idea, in my opinion. In other words, have we been getting away with sloppiness whenever we didn’t properly initialize them? 😊
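A small sketch of what those natural defaults look like in practice, contrasted with nullable value types. The class names (WithDefaults, WithNullables, ValueTypeDefaultsDemo) are made up for this illustration and don’t come from the demo app.

```csharp
#nullable enable
using System;

// Non-nullable value types silently start out at their zeroed defaults.
public class WithDefaults
{
    public int DepartmentId { get; set; }

    public DateTime DateOfBirth { get; set; }

    public double Temperature { get; set; }
}

// Nullable value types start out as null, which can stand for "not provided".
public class WithNullables
{
    public int? DepartmentId { get; set; }

    public DateTime? DateOfBirth { get; set; }

    public double? Temperature { get; set; }
}

public static class ValueTypeDefaultsDemo
{
    // True when an untouched instance is indistinguishable from one
    // that was deliberately populated with zeros.
    public static bool DefaultsLookLikeRealData()
    {
        var model = new WithDefaults();
        return model.DepartmentId == 0
            && model.DateOfBirth == DateTime.MinValue // January 1, 0001
            && model.Temperature == 0.0;
    }

    // True when an untouched instance visibly carries no values at all.
    public static bool NullablesStartEmpty()
    {
        var model = new WithNullables();
        return model.DepartmentId is null
            && model.DateOfBirth is null
            && model.Temperature is null;
    }
}
```

Both methods should return true. The first shows why a missing value can masquerade as a real one; the second shows that nullable value types avoid the ambiguity, at the cost of declaring a required value as optional.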

EF Core to the rescue!

Interestingly enough, EF Core applications face a problem whose manifestations resemble those I encountered with JSON-serialized request payloads, even though the underlying causes are very different. Specifically, while the framework (unlike the JSON serializer) can use constructors with parameters to populate individual discrete values (e.g., first and last name – values in the SQL sense, not necessarily represented by value types), navigation properties present difficulties.

In my demo application, for example, each employee has a mandatory department – the Employee class exposes Department as a navigation property. To faithfully express the design intent, the property doesn’t use a nullable type. (It is Department, not Department?.) However, the fact that each employee works for a department doesn’t mean the property will always be populated. By default, it won’t, in fact – not unless you explicitly request it using the Include clause:

context.Employees.Include(employee => employee.Department)

If you don’t request it, the required Department property will be null. Any bells ringing? 😊

(Obviously, this is hypothetical – the demo app doesn’t use EF Core, so it doesn’t directly apply.)

The EF Core tutorial on nullable reference types suggests a solution that I find clever, and I decided to adopt a variant of it for my request models. Here’s a snippet of my modified EmployeeRequest class (see the ManualPropertyImplementation branch of the repo):

public class EmployeeRequest
{
    ...

    private string? lastName = null;

    private DateTime? dateOfBirth = null;

    ...

    public string LastName
    {
        get => this.lastName ??
            throw new InvalidOperationException(
                $"The value of {nameof(this.LastName)} is not set.");
        set => this.lastName = value;
    }

    [JsonIgnore]
    public bool IsLastNameSet => this.lastName != null;

    public DateTime DateOfBirth
    {
        get => this.dateOfBirth ??
            throw new InvalidOperationException(
                $"The value of {nameof(this.DateOfBirth)} is not set.");
        set => this.dateOfBirth = value;
    }

    [JsonIgnore]
    public bool IsDateOfBirthSet => this.dateOfBirth.HasValue;

    ...
}

Each property uses a nullable backing field whose value initially gets set to null, but the getter will never return null. Instead, if you try to get the property value when it isn’t available, an exception gets thrown at that point. The downside is that you can’t test the property value for null – you’ll get the exception if you try and it is null. Because the validation engine (my improvised implementation, that is) needs the ability to perform this test, I added the Is***Set properties instead.

The pattern also works with value types. As a result, I was able to introduce additional validation logic to enforce the presence of both DepartmentId and DateOfBirth.
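As a rough sketch of how a validator might lean on the Is***Set properties: the model below repeats the pattern for two properties so the example is self-contained, while EmployeeRequestModel, EmployeeRequestValidator, and the error messages are hypothetical and not taken from the demo app’s validation engine.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Cut-down request model (hypothetical name) following the backing-field pattern.
public class EmployeeRequestModel
{
    private string? lastName = null;

    private DateTime? dateOfBirth = null;

    public string LastName
    {
        get => this.lastName ??
            throw new InvalidOperationException(
                $"The value of {nameof(this.LastName)} is not set.");
        set => this.lastName = value;
    }

    public bool IsLastNameSet => this.lastName != null;

    public DateTime DateOfBirth
    {
        get => this.dateOfBirth ??
            throw new InvalidOperationException(
                $"The value of {nameof(this.DateOfBirth)} is not set.");
        set => this.dateOfBirth = value;
    }

    public bool IsDateOfBirthSet => this.dateOfBirth.HasValue;
}

public static class EmployeeRequestValidator
{
    // Collects one error per missing required value without ever
    // touching the throwing getters.
    public static IReadOnlyList<string> Validate(EmployeeRequestModel request)
    {
        var errors = new List<string>();

        if (!request.IsLastNameSet)
        {
            errors.Add("LastName is required.");
        }

        if (!request.IsDateOfBirthSet)
        {
            errors.Add("DateOfBirth is required.");
        }

        return errors;
    }
}
```

The validator only reads the Is***Set flags, so it can report every missing value in one pass. Any code that skips validation and reads LastName directly gets the InvalidOperationException instead.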

Of particular interest are the implications this has for the validator bug, the missing validation of last name:

It still leads to an exception (obviously) – a bug is a bug! – but…
It fails much earlier – at the point where you first try to read the missing value (here, when copying it from the request model to the POCO), as opposed to somewhere deep down, in a seemingly random place
The exception has a useful, informative message (“The value of LastName is not set.”)

Note: The fact that the exception message is informative is, in my view, significant. I believe that the complete and utter uselessness of NullReferenceException (and its message) is a big part of the reason why we hate null references so much.

Don't repeat yourself

I’m sure you can appreciate that the approach shown will involve a lot of repetition. I hate that, which is why I’ve introduced two structs: Required (for reference types) and RequiredValue (for value types). Here’s what our beloved class looks like now (see the RequiredNullableStructs branch of the repo):

public class EmployeeRequest
{
    public RequiredValue<int> DepartmentId { get; set; }

    public Required<string> FirstName { get; set; }

    public Required<string> LastName { get; set; }

    public RequiredValue<DateTime> DateOfBirth { get; set; }

    public DateTime? DateOfDeath { get; set; }
}

Both structs support implicit conversion to and from the underlying type, so this will work (without compiler warnings):

Required<string> requiredFirstName = "Charles";
string firstName = requiredFirstName; // firstName == "Charles"

This, too, will compile, but it will throw an exception:

Required<string> requiredLastName = null;
string lastName = requiredLastName; // InvalidOperationException thrown

The structs also expose the HasValue property, which the modified validation engine uses.
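For readers who don’t want to open the repo, here is one way such structs could be written. Treat it as a sketch under my own assumptions rather than the actual implementation: only the type names, the implicit conversions, HasValue, and the InvalidOperationException come from the description above; the Value property, the constructors, and the message text are guesses.

```csharp
#nullable enable
using System;

// Wrapper for a required reference-type value that may nevertheless be missing.
public readonly struct Required<T> where T : class
{
    private readonly T? value;

    public Required(T? value) => this.value = value;

    public bool HasValue => this.value != null;

    // Fails at the point of access instead of handing out null.
    public T Value => this.value ??
        throw new InvalidOperationException("The required value is not set.");

    public static implicit operator Required<T>(T? value) => new Required<T>(value);

    public static implicit operator T(Required<T> required) => required.Value;
}

// The same idea for value types, built on top of Nullable<T>.
public readonly struct RequiredValue<T> where T : struct
{
    private readonly T? value;

    public RequiredValue(T? value) => this.value = value;

    public bool HasValue => this.value.HasValue;

    public T Value => this.value ??
        throw new InvalidOperationException("The required value is not set.");

    public static implicit operator RequiredValue<T>(T? value) => new RequiredValue<T>(value);

    public static implicit operator T(RequiredValue<T> required) => required.Value;
}
```

Because both types are structs, a property that nobody assigns (for example, one the JSON payload omitted) holds the default instance, whose HasValue is false. That is what lets a validator detect the gap before any conversion to the underlying type throws.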

The one limitation of this approach, in comparison with the previous one, is that, unfortunately, I can’t think of a straightforward way to preserve the informative nature of the exception message – it is now completely generic. However, it does still fail early – and that alone is valuable. Furthermore, the fact that I can’t achieve something easily for a demo app doesn’t mean it can’t be done – read on!

A little thought experiment

I believe you could integrate something like this into the C# language itself. This integration would, in my opinion, somewhat resemble that of nullable value types (available since C# 2) – just as ‘int?’ is merely “syntactic sugar” for ‘Nullable<int>,’ why couldn’t, say, ‘string!’ become a shortcut for ‘Required<string>’?

Note: Mads Torgersen mentions this notation in the post in which he introduces nullable reference types. He talks about it as a candidate notation for non-nullable values that he and his team considered and rejected. Consequently, it doesn’t currently exist, and I’m floating the possibility of using it for a different purpose.

Here’s how I envision it might work…

string! requiredName = "Johnny Cash";           // shortcut for Required<string>
string name = requiredName;                     // implicit cast to non-nullable string
string? optionalName = (string?)requiredName;   // explicit cast to nullable string
bool isNull = requiredName is null;             // isNull == false

…contrast with:

string! requiredName = null;                    // shortcut for Required<string>
string name = requiredName;                     // exception thrown
string? optionalName = (string?)requiredName;   // optionalName is null
bool isNull = requiredName is null;             // isNull == true

(It would work the exact same way with value types.)

Now, this may be the most crucial point about this hypothetical construct:

You may assign a null reference to a required-nullable value, but a required-nullable value will never evaluate as null.

(The one exception would be the explicit cast to a nullable type – see above.)

Consequently, you could safely (i.e., without a compiler warning) assign required-nullable values to non-nullable variables, fields, properties, and parameters. Should the required-nullable value be null, it would at least fail (with an exception) immediately – at the point where you attempted to dereference it. I think this might – in some circumstances – represent a slightly better, safer alternative to constructs such as ‘null!’ and ‘!.’.

In summary, here’s the intent I think this would help me express to the compiler and other developers:

I require this value, but I know it may be null. I accept the responsibility to validate it before using it. If I don’t do my homework, please, slap me with an exception at the earliest opportunity.

Hopefully, it would be possible to make the exception and its message informative. (What I mean by ‘informative’ mostly boils down to one thing: It would help if it indicated the name of the offending variable, field, or property.)

Note: If this did get integrated into the language, many limitations of my structs-only approach could be eliminated. For example, in my implementation, you can’t use the ‘is null’ operator (or ‘== null’) – you have to rely on the HasValue property. Also, standard components, such as the built-in JSON serializer or EF Core, could support it natively.

Although this feature doesn’t exist, I still show it in code (which doesn’t compile) – take a look at the HypotheticalLanguageConstruct branch of the repo. Here’s what the EmployeeRequest class looks like:

public class EmployeeRequest
{
    public int! DepartmentId { get; set; }

    public string! FirstName { get; set; }

    public string! LastName { get; set; }

    public DateTime! DateOfBirth { get; set; }

    public DateTime? DateOfDeath { get; set; }
}

Potential benefits

I believe the feature might increase the expressiveness of the C# language.

Depending on the specifics of its (imaginary) implementation, it might exhibit identical or almost identical behavior in value and reference types, which would further bridge the gap between them.

I don’t think it would lead to a backward incompatibility, nor do I see a need for a change to the runtime. As far as I can tell, this could be one of those compiler-only, “syntactic sugar” changes.

Crucially, it would provide additional protection against inadvertent null references. I could be wrong, but I suspect that most real-world C# codebases today will, by necessity, rely on the various overrides (null!, !., etc.) or questionable default values – at least to some extent. I think a feature like this could reduce or eliminate that need.

I know I’m speculating wildly now, but I believe this could also enable the introduction of stricter compiler settings where the use of these overrides would be banned. (And maybe you could even elevate the current warnings to errors.)

Conclusion

I love nullable reference types, and I think that a handful of accompanying runtime checks, such as those that the hypothetical constructs described in this post would enable, might tie the few remaining loose ends and make the functionality even more powerful.

JON'S MUSINGS