My first exception caused solely by .NET Native

November 6, 2015

In a current project we’re developing a UWP application which, as you may or may not know, *requires* you to compile with .NET Native in order to submit to the Store. No biggie, that hasn’t been a problem.

Until now.

Yesterday a tester who had installed the app from the Store came to us reporting that it seized up and crashed on their very first click/tap.

“INCONCEIVABLE!” we said.

No matter how hard we tried (Debug mode, Release mode, x86, x64, mobile, tablet), we couldn’t reproduce it.

Until we turned on the .NET Native tool chain in the project’s build options:

[Screenshot: the .NET Native tool chain option enabled in the project’s build settings]

The second we did this, boom.

It left us scratching our heads but there was one thing we *had* added to the app recently that seemed like it could be the culprit:

SignalR.

We have a backend service that uses SignalR to stream data to our app. The SignalR libs we’re using are the ones from Microsoft: AspNet.SignalR (available on NuGet).

When we dug into where and why the crash was happening, it was in this code:

if (_connection != null)
{
    _connection.Stop();

    _connection.Dispose();
    _connection = null;
}

 

Turns out that, only under .NET Native, the call to Stop() threw a NullReferenceException from the bowels of the SignalR libs and, since Stop() is synchronous, it seized the app while the exception bubbled and percolated up before finally crashing the process.

How did we fix it?

The fix is somewhat of a hack but hey, it works, at least until the SignalR folks on the AspNet team figure out how to play nice with .NET Native:

//Found that, when compiled with .NET Native, tearing down the connection can take 30 seconds and result in huge exceptions and failures.
//So we run the teardown on a background task.
Task.Run(() =>
{
    if (_connection != null)
    {
        try
        {
            //Give the connection 3 seconds to stop. If it doesn't, then just eat all exceptions and move on with life.
            _connection.Stop(TimeSpan.FromSeconds(3)); 
            //_connection.Dispose(); -- Sounds like Dispose is the same as _connection.Stop, no need to double up on the calls.
            //^ Left this here as a tombstone for future people. If there are bugs, feel free to use Dispose, but then you don't get the
            //nice timeout parameter that Stop provides.
        }
        catch
        {
            //When we compile Release / .NET Native the connection object behaves erratically and can throw exceptions.
        }
        _connection = null;
    }
    _hubProxy = null;
});

 

The comments basically explain the approach, but to spell it out:

  1. Stop your connection on a background thread/task so the UI isn’t locked while Stop() executes (remember, it’s synchronous).
  2. Eat any exceptions that might occur during the stop; we, at least, were going to throw the connection away anyway.
  3. Provide a timeout for the Stop() call so the teardown can’t drag on (we saw it take up to 30 seconds) and gives up after a few seconds instead.

After doing this, our app behaved exactly as it needed to with respect to our usage of our SignalR backend and its connection creation & disposal points.

UPDATED 11.9.2015:

After fixing this we noticed that, while the app no longer crashed, it also wasn’t getting the SignalR messages from our backend that it should’ve been, but again only in .NET Native builds. We dug in and found the code was hanging on

await connection.Start();

so I started some spelunking. I came across a GitHub issue that described this exact scenario and another that seemed to point at the PCL targets as being the problem. Pulling on this thread, I forked the repo and created a UWP-targeted library that uses all the guts from the WinRT project already in the repo. Turns out that when you pull the NuGet package, you get the PCL version, NOT the WinRT version. The SignalR folks could likely fix this by changing the targets of the NuGet package that comes out of the WinRT build, but ain’t nobody got time for that.
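If you’re chasing the same symptom, it can help to confirm that Start() really never completes rather than failing quietly. Here’s a minimal sketch of how that could be checked. The StartDiagnostics class and CompletesWithinAsync method are hypothetical names of mine, not part of SignalR, and the timeout you pick is entirely up to you:

```csharp
using System;
using System.Threading.Tasks;

public static class StartDiagnostics
{
    // Hypothetical helper (not a SignalR API): waits for `operation`, but stops
    // waiting after `timeout`. Returns true if the operation finished in time,
    // false if it was still pending when the timeout elapsed.
    public static async Task<bool> CompletesWithinAsync(Task operation, TimeSpan timeout)
    {
        Task finished = await Task.WhenAny(operation, Task.Delay(timeout));
        if (finished != operation)
        {
            // Still pending: this is the "hang" case. The operation itself is
            // NOT cancelled; we have only stopped waiting for it.
            return false;
        }

        // The operation finished; await it so any exception it threw surfaces here.
        await operation;
        return true;
    }
}
```

You would call it as something like `bool started = await StartDiagnostics.CompletesWithinAsync(connection.Start(), TimeSpan.FromSeconds(10));` and log when it comes back false. The 10 seconds is an arbitrary number for illustration. Note this only tells you that the call is stuck; it doesn’t fix anything.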

After creating a UWP lib, everything worked! I submitted this fix as a PR to the project. YOU can get it today by simply targeting my fork containing the fix as a subrepo of your existing solution, then pulling in the UWP project and referencing it from yours. Good luck!