Announcing yaml v2 for Go

There were a few long standing issues in the yaml.v1 package which were being postponed so that they could be done at once in a single incompatible change, and the time has come: yaml.v2 is now available.

Besides these incompatible changes, other compatible fixes and improvements were performed in that push, and those were also applied to the existing yaml.v1 package so that dependent applications benefit immediately and without modifications.

The subtopics below outline exactly what changed, and how to adapt existent code when necessary.

Type errors

With yaml.v1, decoding a YAML value that is not compatible with the Go value being unmarshaled into will silently drop the offending value without notice. In many cases continuing with degraded behavior by ignoring the problem is intended, but this was the one and only option.

In yaml.v2, these problems will cause a *yaml.TypeError to be returned, containing helpful information about what happened. For example:

yaml: unmarshal errors:
  line 3: cannot unmarshal !!str `foo` into int

On such errors the decoding process still continues until the end of the YAML document, so ignoring the TypeError will produce logic equivalent to the old yaml.v1 behavior.

New Marshaler and Unmarshaler interfaces

The way that yaml.v1 allowed custom types to implement marshaling and unmarshaling of YAML content was slightly confusing and troublesome. For example, considering a CustomType with a keys field:

type CustomType struct {
        keys map[string]int
}

and supposing the goal is to unmarshal this YAML map into it:

custom:
    a: 1
    b: 2
    c: 3

With yaml.v1, one would need to implement logic similar to the following for that:

func (v *CustomType) SetYAML(tag string, value interface{}) bool {
        if tag == "!!map" {
                m := value.(map[interface{}]interface{})
                // ... iterate/validate/convert key/value pairs 
        }
        return goodValue
}

This is too much trouble when the package can easily do those conversions internally already. To fix that, in yaml.v2 the Getter and Setter interfaces are both gone and were replaced by the Marshaler and Unmarshaler interfaces.

Using the new mechanism, the example above would be implemented as follows:

func (v *CustomType) UnmarshalYAML(unmarshal func(interface{}) error) error {
        return unmarshal(&v.keys)
}

Custom-ordered maps

By default both yaml.v1 and yaml.v2 will marshal keys in a stable order which is increasing within the same type and arbitrarily defined across types. So marshaling is already performed in a sensible order, but it cannot be customized in yaml.v1, and there’s also no way to tell which order the map was originally in, as some applications require.

To fix that, there is a new pair of types that support preserving the order of map keys both when marshaling and unmarshaling: MapSlice and MapItem.

Such an ordered map literal would look like:

m := yaml.MapSlice{{"c", 3}, {"b", 2}, {"a", 1}}

The MapSlice type may be used for variables going in and out of the yaml package, or in struct fields, map values, or anywhere else sensible.

Binary values

Strings in YAML must be valid UTF-8 or UTF-16 (with a byte order mark for the latter), and for binary data the specification defines a standard !!binary tag which represents the raw data encoded (encrypted?) as base64. This is now supported both in yaml.v1 and yaml.v2, transparently. That is, any string value that is not valid UTF-8 will be base64-encoded and appropriately tagged so that it roundtrips as the same string. Short strings are inlined, while long ones are automatically broken into several lines and represented in a proper style. For example:

one: !!binary gICA
two: !!binary |
  gICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgI
  CAgICAgICAgICAgICAgICAgICAgICAgICAgICA

Multi-line strings

Any string that contains new-line characters (‘\n’) will now be encoded using the literal style. For example, a value that would before be encoded as:

key: "line 1\nline 2\nline 3\n"

is now encoded by both yaml.v1 and yaml.v2 as:

key: |
  line 1
  line 2
  line 3

Other improvements

Besides these major changes, some assorted minor improvements were also performed:

  • Better handling of top-level “null”s (issue #35)
  • Marshal base 60 floats quoted for YAML 1.1 compatibility (issue #34)
  • Better error on invalid map keys (issue #25)
  • Allow non-ASCII characters in plain strings (issue #11).
  • Do not catch unrelated panics by mistake (commit a6dc653f)

For obtaining the yaml.v1 improvements:

go get -u gopkg.in/yaml.v1

For updating to yaml.v2, adapt the code as necessary given the points above, replace the import path, and run:

go get -u gopkg.in/yaml.v2

This entry was posted in Architecture, Design, Go, Project, Snippet. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *