Tuesday, February 21, 2012

Json.NET deserialization and initialization in constructors

I've recently run into a quite interesting problem when using Json.NET library. It shows up with a static lookup collection being modified during the deserialization of some objects. Although the behavior I've encountered is documented, but for me it is breaking a principle of least astonishment a bit, so I've decided to share.

I have a class, TestClass, that I'm going to serialize and deserialize using Json.NET. It contains a simple collection of string values, that is initialized in the constructor to contain some predefined values - it's is default state. I have these values defined somewhere in the separate class in a collection marked as readonly, treated like a constant, not supposed to be modified.

Here are the tests (written in Machine.Specifications) that illustrates the issue. I'm setting TestClass state in the constructor, but I expect it to be overwritten during the deserialization, as my JSON string contains different data. In fact, deserialized values are appended to the existing collection, which occurs to be exactly the same collection as my "constants".

public static class Constants
public static readonly IList<string> NotSupposedToBeModified = new List<string>()
"the first",
"the last"

public class TestClass
public IEnumerable<string> TheCollection { get; set; }

public TestClass()
TheCollection = Constants.NotSupposedToBeModified;

public class DeserializingTest
Because of = () =>
result = JsonConvert.DeserializeObject<TestClass>(@"{""TheCollection"":[""other""]}");

It should_deserialize_the_collection_correctly = () =>

It should_not_modify_the_constant_collection = () =>
Constants.NotSupposedToBeModified.ShouldContainOnly("the first", "the last");

static TestClass result;

What is the result? Both tests failed:

should deserialize the collection correctly : Failed
Should contain only: { "other" }
entire list: { "the first", "the last", "other" }

should not modify the constant collection : Failed
Should contain only: { "the first", "the last" }
entire list: { "the first", "the last", "other" }

There are three separate issues that showed up together and resulted that apparently surprising behavior:

The first one is about the constant collection defined in Constants class, being not really constant. The readonly keyword guarantees that one can not replace the collection instance, but that already-created collection instance itself still can be modified normally. It's pretty clear, but can be missed at the first sight.

The second one is even more obvious - the assignment in TestClass's constructor doesn't initialize the local collection with values from the Constants class - it just assigns the reference to exactly the same collection. So, as the assigned collection can be modified and we've just assigned it to our TestsClass instance, we already have the doors open to modify the "constant" collection by mistake.

And finally, what the Json.NET deserializer is doing here? The documentation states: "By default Json.NET will attempt to set JSON values onto existing objects and add JSON values to existing collections during deserialization.". It means that when the collection instance for TheCollection property was already created by the constructor (well, actually not created but "borrowed" from Constants class), Json.NET doesn't create a new one and just appends deserialized values to the existing collection, modifying our NotSupposedToBeModified collection.

Well, the first two pitfalls are pretty easy, but I wouldn't expect the third one. Fortunately, Json.NET provides an easy way to customize its behavior in this matter using ObjectCreationHandling option. One simple addition in DeserializeObject method and we have two green tests (even if the first two issues are still there):

result = JsonConvert.DeserializeObject<TestClass>(
new JsonSerializerSettings() { ObjectCreationHandling = ObjectCreationHandling.Replace });

1 comment:

  1. Nice article, confirmed what I was looking for. I totally aggree with you, it makes no sense that the values are added, maybe it is for avoiding non-settable properties by default. Who knows.

    Takes me a while to figure it out as I was using the serializer in a massively distributed system where the passed array was used as an idem-potency marker used in rare cases when transient errors occures... Was an hell of ride to tackle this simple case down :).

    As always, don't go forward until an issue is well understood and solved or you will definitely regret it in production. :)