It’s no secret to those who know me that I really dislike Terraform. It’s not the technology or the concept, though, but rather the execution at the project level. For a funded company, HashiCorp does a piss-poor job of keeping it up to date – the project has clearly gotten “too big for its britches”.
Disclaimer: In this series you’ll find I am very critical of Terraform as a project. All opinions are my own and do not reflect those of my employer.
To that end, this post starts a series where I’ll share the various workarounds I’ve had to develop for the cases where TF simply doesn’t work with Azure (as of the writing of these posts). If you’re only deploying to one cloud, I suggest using that cloud’s native deployment mechanism; otherwise, you’re locking yourself into Terraform’s support for your cloud’s latest features. In the case of Azure, I recommend using Bicep instead of TF, for several reasons:
- Better support from your cloud provider if/when problems arise
- No dependency on a 3p product in your CD pipeline
- Always capable of using the latest features in your cloud
The last one is paramount for me. Time and time again, I run into situations where I am trying to use something new(ish) on Azure and it’s either not supported or requires some janky workaround (read: provisioning via ARM template or direct API) because TF doesn’t have a first-class provider for it. One look at the TF Issues for Azure (2,716 open), AWS (3,648 open), and Google (1,611 open) and you’ll see they are doing an absolutely abysmal job of keeping up with things.
My latest struggle was how to create data sources, skill sets, indexes and indexers on a new AI Search instance.
While you would think this would be pretty straightforward and basic – especially when you consider Azure Search has existed since 2013 – with Terraform, it’s not.
So, let’s get started, shall we?
AI Search component deployment
Relevant GH Issues:
- https://github.com/hashicorp/terraform-provider-azurerm/issues/26144
- https://github.com/hashicorp/terraform-provider-azurerm/issues/26145
- https://github.com/hashicorp/terraform-provider-azurerm/issues/26146
- https://github.com/hashicorp/terraform-provider-azurerm/issues/26243
These will all follow the same basic template and approach. It’s best (cleanest) implemented as a Terraform module, which is what I’ll show here.
Concept
Each of these component types has two basic parts:
- A creation & deletion REST API on the Search instance itself.
- A JSON schema defining them.
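To make that concrete, here’s a rough sketch of what those two data-plane calls look like from a shell. The service name, key, and `my-index.json` file below are placeholders of my own, not values from the actual deployment:

```shell
# Minimal sketch of the Search data-plane create/delete calls.
# SEARCH_SERVICE, API_KEY, and my-index.json are placeholders.
SEARCH_SERVICE="my-search-instance"
API_KEY="<admin-api-key>"
API_VERSION="2024-05-01-Preview"

create_url="https://${SEARCH_SERVICE}.search.windows.net/indexes?api-version=${API_VERSION}"
delete_url="https://${SEARCH_SERVICE}.search.windows.net/indexes/my-index?api-version=${API_VERSION}"

# Guard: the sketch is a no-op until the placeholder key is replaced.
if [ "$API_KEY" != "<admin-api-key>" ]; then
  # Create (or update) the component by POSTing its JSON definition:
  curl -s -X POST "$create_url" \
    -H "Content-Type: application/json" \
    -H "api-key: ${API_KEY}" \
    -d @my-index.json

  # Tear it down again by name:
  curl -s -X DELETE "$delete_url" -H "api-key: ${API_KEY}"
fi
```

The same pattern holds for `datasources`, `skillsets`, and `indexers` – only the endpoint segment of the URL changes.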
So, the approach I took was to do the following:
- Create a `templates` directory for each type in my repo (e.g. `templates/indexes`) with `.json` files in them whose names match the name of the component to create. For example, if I am creating a skill set named `foo`, I would have a file `templates/skillsets/foo.json` in my repo.
- These templates use Terraform’s template language to enable replacement of tokens within them, enabling me to reference other Terraform resources I’ve created (e.g. model names, deployment names, etc.)
- During deployment, I use `template_dir` to transform all files in the relevant type directory to a corresponding `compiled/` directory that is `.gitignore`’d.
- Then, `fileset()` enumeration is used on the `compiled/` folder to get all the files that were transformed, and `for_each` iterates over them within a `terraform_data` resource that uses `local-exec` `provisioner` blocks to execute `curl` commands to `POST` their content to the corresponding endpoint on the Search resource. Additionally, it uses `DELETE` requests to delete them on `terraform destroy` – an added benefit of the approach.
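Put together, the layout for one module instance ends up looking something like this (a sketch based on the conventions above; `foo.json` and `my-index.json` are hypothetical component names):

```
modules/search_data_plane/
├── main.tf
└── templates/
    ├── skillsets/
    │   ├── foo.json           # source template, checked in
    │   └── compiled/          # rendered output, .gitignore'd
    │       └── foo.json
    └── indexes/
        ├── my-index.json
        └── compiled/
            └── my-index.json
```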
Here’s what the TF code ends up looking like:
Module definition
```hcl
variable "search_resource" {
  type = object({
    name        = string
    primary_key = string
  })
}

variable "endpoint" {
  type = string
}

variable "template_vars" {
  type        = map(string)
  description = "Inputs to use when transforming template files"
  default     = {}
}

locals {
  template_dir    = "${abspath(path.module)}/templates/${var.endpoint}"
  destination_dir = "${local.template_dir}/compiled"
}

resource "template_dir" "templates" {
  source_dir      = local.template_dir
  destination_dir = local.destination_dir
  vars            = var.template_vars
}

resource "terraform_data" "apply_resource" {
  for_each = fileset(local.destination_dir, "*.json")

  triggers_replace = [template_dir.templates, var.template_vars, var.search_resource.name]

  input = {
    filename       = "${local.destination_dir}/${each.key}"
    endpoint       = var.endpoint
    search_service = var.search_resource.name
    api_key        = var.search_resource.primary_key
    # file name (without extension) is the name of the resource being created
    resource_name  = element(split(".", basename(each.key)), 0)
  }

  provisioner "local-exec" {
    when        = create
    interpreter = ["bash", "-c"]
    command     = "curl -q -X POST https://${self.input.search_service}.search.windows.net/${self.input.endpoint}?api-version=2024-05-01-Preview -H \"Content-Type: application/json\" -H \"api-key: ${self.input.api_key}\" -d @${self.input.filename} > /dev/null"
    quiet       = true
  }

  provisioner "local-exec" {
    when        = destroy
    interpreter = ["bash", "-c"]
    command     = "curl -q -X DELETE https://${self.input.search_service}.search.windows.net/${self.input.endpoint}/${self.input.resource_name}?api-version=2024-05-01-Preview -H \"api-key: ${self.input.api_key}\" > /dev/null"
    quiet       = true
  }
}
```
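One caveat worth adding (my addition, not part of the module above): `curl` exits 0 even when the server returns an HTTP error, so as written a rejected POST won’t fail the `terraform apply`. If your curl is 7.76 or newer, `--fail-with-body` (or plain `-f` on older versions) turns HTTP 4xx/5xx responses into a non-zero exit code, which `local-exec` then surfaces as a failure. A sketch of the create provisioner with that change:

```hcl
# Variant of the create provisioner that fails the apply on HTTP errors.
# --fail-with-body requires curl >= 7.76; use -f on older versions.
provisioner "local-exec" {
  when        = create
  interpreter = ["bash", "-c"]
  command     = "curl --fail-with-body -s -X POST https://${self.input.search_service}.search.windows.net/${self.input.endpoint}?api-version=2024-05-01-Preview -H \"Content-Type: application/json\" -H \"api-key: ${self.input.api_key}\" -d @${self.input.filename}"
  quiet       = true
}
```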
Module usage
Datasource deployment
```hcl
module "search_datasources" {
  source     = "./modules/search_data_plane"
  depends_on = [azurerm_storage_container.items]

  endpoint        = "datasources"
  search_resource = azurerm_search_service.ai_search
  template_vars = {
    storage_account_id = azurerm_storage_account.storage_account.id
  }
}
```
So, you can see where I create a datasource for each of the containers that I deployed in another Terraform resource (`azurerm_storage_container.items`) and pass in the storage account’s ID to the template for the datasource. In this case, the template looks like this for one of the datasources:
my-datasource.json
```json
{
  "@odata.context": "https://my-search-instance.search.windows.net/$metadata#datasources/$entity",
  "@odata.etag": "*",
  "name": "my-datasource",
  "type": "azureblob",
  "credentials": {
    "connectionString": "ResourceId=${storage_account_id};"
  },
  "container": {
    "name": "knowledge-base-pdfs",
    "query": "PDF"
  }
}
```
This is where you can see the template syntax `${storage_account_id}` corresponding to the key in the `template_vars` map I’m passing to the module.
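As an aside, for a single file the same rendering can be reproduced with Terraform’s built-in `templatefile()` function – both it and `template_dir` use the same `${var}` interpolation syntax – which is a handy way to sanity-check a template in `terraform console` before wiring it through the module. A sketch using the paths and resource names from the example above:

```hcl
# Renders the datasource template with the same variable map the module
# receives, producing the JSON that ultimately gets POSTed to the service.
output "rendered_datasource" {
  value = templatefile(
    "${path.module}/templates/datasources/my-datasource.json",
    {
      storage_account_id = azurerm_storage_account.storage_account.id
    }
  )
}
```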
This level of definition in the module allows the rest of the components to simply reuse the module with different inputs:
```hcl
module "search_indexes" {
  source = "./modules/search_data_plane"

  endpoint        = "indexes"
  search_resource = azurerm_search_service.ai_search
  template_vars = {
    openai_name          = azurerm_cognitive_account.openai.name
    embedding_deployment = azurerm_cognitive_deployment.embedding.name
    embedding_model      = azurerm_cognitive_deployment.embedding.model[0].name
  }
}
```
```hcl
module "search_skillsets" {
  source     = "./modules/search_data_plane"
  depends_on = [module.search_indexes, azurerm_cognitive_deployment.embedding]

  endpoint        = "skillsets"
  search_resource = azurerm_search_service.ai_search
  template_vars = {
    openai_name          = azurerm_cognitive_account.openai.name
    embedding_deployment = azurerm_cognitive_deployment.embedding.name
    embedding_model      = azurerm_cognitive_deployment.embedding.model[0].name
  }
}
```
```hcl
module "search_indexers" {
  source     = "./modules/search_data_plane"
  depends_on = [module.search_datasources, module.search_indexes, module.search_skillsets]

  endpoint        = "indexers"
  search_resource = azurerm_search_service.ai_search
}
```
As you can see, this also affords dependencies between these components. A deployment of an indexer, for example, will fail if the data sources, indexes, and skillsets in its definition don’t already exist on the search instance, so we’re able to set up that ordering using Terraform’s `depends_on` syntax.
I’m sure these won’t be the last workarounds I post about, so if you’re using Terraform with Azure, stay tuned 😜