Infrastructure as code, automation, and particularly Terraform are the hot topics at the moment, and rightfully so. There have been some fantastic articles by people far more adept and intelligent than me on how to use Terraform in conjunction with VMware Cloud on AWS and NSX-T networking. However, I thought I’d come at this from a different angle: not using a brand-new or greenfield SDDC, but one already in use and ‘worn in’. In other words, one with VMs and NSX-T rules, services and groups already in situ.

Back in March, I posted an article on how to get started with Terraform on VMC on AWS. My intention was to spend a lot more time with it and learn my first automation language. Covid hit, work got super busy, and my then 3-month-old child for some reason didn’t want Daddy on the computer. Selfish, huh? So six months down the line I’ve had the chance to revisit it. Hopefully the child will be more understanding this time…

Before I get started with my findings, I’d like to thank a few people who have helped me with a lot of this content. Firstly, as always, Nico Vibert, who I swear is at my beck and call for any random question, and secondly Chris Noon, whose hugely useful article and support have helped me to find some of these answers. Thanks, fellas.

So, there are plenty of articles out there (Chris and Nico have both posted some relatively recently) around using Terraform to build SDDCs, configure rules on the Management Gateway and Compute Gateway, and of course build VMs. However, I found things weren’t exactly clear cut when I tried this against an SDDC that wasn’t brand new. This, for me, is a real-world example, and is just a way for me to help anyone who may set off down the same path as me and run into some similar problems.

Gotcha #1: Existing rules on MGW & CGW

When using the API to connect to NSX-T, there are some gotchas. Firstly, you need to ensure that the default rules for both the CGW AND MGW are present in your Terraform code.

resource "nsxt_policy_gateway_policy" "cgw_policy" {
  category     = "LocalGatewayRules"
  display_name = "default"
  domain       = "cgw"

  # Default CGW Rules
  rule {
    action                = "DROP"
    destination_groups    = []
    destinations_excluded = false
    direction             = "IN_OUT"
    disabled              = false
    display_name          = "Default VTI Rule"
    logged                = false
    profiles              = []
    scope                 = ["/infra/labels/cgw-vpn",]
    services              = []
    source_groups         = []
    sources_excluded      = false
  }
}

The above bit of code should be the basis for your CGW Terraform code. Or so I thought. I soon realised that (at the time of writing) that wasn’t enough! You need to ensure that you have this snippet above your rule definitions:

  lifecycle {
    prevent_destroy = true
  }

Essentially, all this does is prevent these rules from being destroyed when you tear down your changes using a terraform destroy. The same code is required for your MGW too:

resource "nsxt_policy_gateway_policy" "mgw_policy" {
  lifecycle {
    prevent_destroy = true
  }
  category     = "LocalGatewayRules"
  display_name = "default"
  domain       = "mgw"

  # Default MGW Rules
  rule {
    action                = "ALLOW"
    destination_groups    = []
    destinations_excluded = false
    direction             = "IN_OUT"
    disabled              = false
    display_name          = "vCenter Outbound Rule"
    logged                = false
    profiles              = []
    scope                 = ["/infra/labels/mgw",]
    services              = []
    source_groups         = ["/infra/domains/mgw/groups/VCENTER",]
    sources_excluded      = false
  }

  rule {
    action                = "ALLOW"
    destination_groups    = []
    destinations_excluded = false
    direction             = "IN_OUT"
    disabled              = false
    display_name          = "ESXi Outbound Rule"
    logged                = false
    profiles              = []
    scope                 = ["/infra/labels/mgw",]
    services              = []
    source_groups         = ["/infra/domains/mgw/groups/ESXI",]
    sources_excluded      = false
  }
}

Gotcha #2: NSX-T items added in the GUI need to be referenced by SID, not by name

When you’re writing a new rule for either the CGW or MGW that doesn’t already exist in your NSX-T config, you will typically write something along these lines. The below would allow HTTPS outbound:

  rule {
    action                = "ALLOW"
    destination_groups    = []
    destinations_excluded = false
    direction             = "IN_OUT"
    disabled              = false
    display_name          = "Outbound HTTP / HTTPS"
    logged                = false
    profiles              = []
    scope                 = ["/infra/labels/cgw-public",]
    services              = ["/infra/services/HTTPS",]
    source_groups         = ["/infra/domains/cgw/groups/HTTPS_VMs",]
    sources_excluded      = false
  }

The key bit here is that you can reference a source_group by its display name when that group is defined somewhere else in your code. However, if that group already exists in your NSX-T configuration, you can only reference it by its SID, so it’ll look a bit more like this:

  rule {
    action                = "ALLOW"
    destination_groups    = []
    destinations_excluded = false
    direction             = "IN_OUT"
    disabled              = false
    display_name          = "Outbound HTTP / HTTPS"
    logged                = false
    profiles              = []
    scope                 = ["/infra/labels/cgw-public",]
    services              = ["/infra/services/HTTPS",]
    source_groups         = ["/infra/domains/cgw/groups/a26gb260-f6hy-11ea-b156-57099d268bz7",]
    sources_excluded      = false
  }

To get hold of that unique ID, VMware have helped you out, though. Within the GUI, next to each item, whether that be a service, a group, a network segment or an individual rule, you can click the three dots, and there is an option for ‘Copy path to clipboard’. Saves you a lot of time!
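Alternatively, rather than pasting IDs into your rules, the NSX-T provider also ships a data source that can look an existing object up by its display name. This is just a sketch, assuming the provider’s nsxt_policy_group data source and a pre-existing GUI-created group called HTTPS_VMs (the group name is purely illustrative):

```hcl
# Sketch: look up a group that was created in the GUI by its display name.
# "HTTPS_VMs" is an assumed name; swap in one of your own groups.
data "nsxt_policy_group" "https_vms" {
  domain       = "cgw"
  display_name = "HTTPS_VMs"
}

# A rule can then use the exported path instead of a hard-coded SID:
#   source_groups = [data.nsxt_policy_group.https_vms.path]
```

This keeps the opaque ID out of your code entirely, at the cost of an extra lookup against the live environment on every plan.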

Gotcha #3: Hidden Attributes

This one plagued me for a good while, and although I have a workaround in place, it’s not perfect. Let’s say you have a number of rules in place on your CGW already; you know you need to replicate those rules so that you don’t lose them when you run a terraform apply. You write your rule code for HTTPS outbound VMs as above, but when you run a terraform plan to check what changes are going to be made, you are told about an attribute change within your rule. This attribute is ip_version.

As you can see in the image above, Terraform is seeing a change to this rule; the ~ tells you that. The attribute that is being added is ip_version, with the value of “IPV4_IPV6”.

Within the GUI of NSX-T there is no option to select ip_version, or the equivalent, and checking the Terraform NSX-T provider page, it shows the attribute as optional. Even to this day, I’m not sure why this happens; however, the workaround (thanks again to Nico for this) is to modify the block at the top of your code relating to either the CGW or MGW to include “ignore_changes”. Sadly, I had to declare each rule individually because I couldn’t find a way to blanket all of them to ignore the ip_version attribute.

resource "nsxt_policy_gateway_policy" "cgw_policy" {
  lifecycle {
    prevent_destroy = true
    ignore_changes = [rule[0].ip_version,
                      rule[1].ip_version,
                      rule[2].ip_version,
                      rule[3].ip_version,
                      rule[4].ip_version,
                      rule[5].ip_version,
                      rule[6].ip_version,
                      rule[7].ip_version,
                      rule[8].ip_version,
                      rule[9].ip_version,
                      rule[10].ip_version,
                      rule[11].ip_version]
  }
  category     = "LocalGatewayRules"
  display_name = "default"
  domain       = "cgw"

  rule {
    ....

If anyone has a way around this, then more than happy to be shown a better way!
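One coarser option, if you don’t need Terraform to manage drift inside the rule blocks at all, is to ignore the whole rule attribute rather than each index. A hedged sketch, with the big caveat that this suppresses all in-place rule changes, not just ip_version:

```hcl
resource "nsxt_policy_gateway_policy" "cgw_policy" {
  lifecycle {
    prevent_destroy = true
    # Ignores drift on every rule block, including genuine edits made in the
    # GUI, so use with care: terraform plan will no longer flag rule changes.
    ignore_changes = [rule]
  }
  category     = "LocalGatewayRules"
  display_name = "default"
  domain       = "cgw"
}
```

For most people the per-index list above is safer, since it only mutes the one noisy attribute.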

OK, so those are some of the genuine problems I’ve had. Not many, but there is still work to do to improve the automation experience within VMC. What would I like to see in future releases?

  • No longer requiring a terraform import to get the existing configuration from your NSX-T environment
  • The API key that VMware provide working not just for NSX-T, but for vCenter too. Although I don’t mind using two different providers, I don’t want to have to use different connectivity methods for a single platform.
  • Sorting out some of these frustrating bugs around rules that already exist within your NSX-T environment.
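Until that first wish comes true, the existing default policies can be pulled into state with terraform import. A hedged sketch only: the resource names match the examples above, and the domain/policy-id format with the “default” policy ID is what I’d expect for a VMC SDDC, but verify both against the NSX-T provider docs and your own environment before running it:

```shell
# Assumed import ID format: <domain>/<policy-id>. Check the provider docs
# and your SDDC before running these against real state.
terraform import nsxt_policy_gateway_policy.cgw_policy cgw/default
terraform import nsxt_policy_gateway_policy.mgw_policy mgw/default
```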