It’s no secret to those who know me that I really dislike Terraform. It’s not because of the technology or the concept, though, but rather because of its execution at a project level. For a product backed by a funded company, it does a piss-poor job of keeping up to date – it’s clearly gotten “too big for its britches”.
Disclaimer: In this series you will find I am very critical of Terraform as a project. All opinions are my own and do not reflect those of my employer.
To that end, this starts a series where I will post all the various workarounds I’ve had to develop to handle the cases where TF simply doesn’t work with Azure (as of the writing of the posts). If you’re only deploying to one cloud, I suggest using that cloud’s deployment mechanism or else you are locking yourself into Terraform’s support of your cloud’s latest features. In the case of Azure, I recommend using Bicep instead of TF. Reasons are multiple:
- Better support from your cloud provider if/when problems arise
- No dependency on a third-party product in your CD pipeline
- Always capable of using the latest features in your cloud
The last one is paramount for me. Time and time again, I run into situations where I am trying to use something new(ish) on Azure and it’s either not supported or requires some janky workaround (read: provisioning via ARM template or direct API) because TF doesn’t have a first-class provider for it. One look at the TF Issues for Azure (2,716 open), AWS (3,648 open), and Google (1,611 open) and you’ll see they are doing an absolutely abysmal job of keeping up with things.
My latest struggle was how to create data sources, skill sets, indexes and indexers on a new AI Search instance.
While you would think this would be pretty straightforward and basic – especially when you consider Azure Search has existed since 2013 – with Terraform, it’s not.
So, let’s get started, shall we?
AI Search component deployment
Relevant GH Issues:
- https://github.com/hashicorp/terraform-provider-azurerm/issues/26144
- https://github.com/hashicorp/terraform-provider-azurerm/issues/26145
- https://github.com/hashicorp/terraform-provider-azurerm/issues/26146
- https://github.com/hashicorp/terraform-provider-azurerm/issues/26243
These will all follow the same basic template and approach. It’s best (cleanest) implemented as a Terraform module, which is what I’ll show here.
Concept
Each of these component types has two basic parts:
- A creation & deletion REST API on the Search instance itself.
- A JSON schema defining them.
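To make the first part concrete, all four component types share the same data-plane URL shape: a `POST` of the JSON definition to create, and a `DELETE` by name to remove. Here’s a quick sketch (the service name `my-search` and component name `foo` are placeholders; the API version is the one used in the module below):

```shell
# Sketch of the Search data-plane call shapes. "my-search" and "foo" are
# placeholder names; the API version matches the one used later in this post.
SERVICE="my-search"
VER="2024-05-01-Preview"
for EP in datasources skillsets indexes indexers; do
  echo "create: POST   https://${SERVICE}.search.windows.net/${EP}?api-version=${VER}"
  echo "delete: DELETE https://${SERVICE}.search.windows.net/${EP}/foo?api-version=${VER}"
done
```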
So, the approach I took was to do the following:
- Create a `templates` directory for each type in my repo (e.g. `templates/indexes`) with `.json` files in them whose names match the name of the component to create. For example, if I am creating a skill set named `foo`, I would have a file `templates/skillsets/foo.json` in my repo.
- These templates use Terraform’s template language to enable replacement of tokens within them, enabling me to reference other Terraform resources I’ve created (e.g. model names, deployment names, etc.)
- During deployment, I use `template_dir` to transform all files in the relevant type directory into a corresponding `compiled/` directory that is `.gitignore`’d.
- Then, `fileset()` enumeration is used in the `compiled/` folder to get all the files that were transformed, and `for_each` iterates over them within a `terraform_data` resource that uses `local-exec` provisioner blocks to execute `curl` commands to `POST` their content to the corresponding endpoint on the Search resource. Additionally, it uses `DELETE` requests to delete them on `tf destroy` – an added benefit of the approach.
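Put together, the repo layout for one component type ends up like this (a sketch; `foo` is the skill set name from the example above, and the `compiled/` path mirrors what `template_dir` produces):

```shell
# Minimal sketch of the directory convention. "foo" is the example skill set
# name from above; everything is created in a throwaway temp directory.
repo=$(mktemp -d)
mkdir -p "$repo/templates/skillsets/compiled"
echo '{}' > "$repo/templates/skillsets/foo.json"    # source template
echo 'templates/*/compiled/' >> "$repo/.gitignore"  # compiled output stays untracked
find "$repo/templates" -name '*.json'
```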
Here’s what the TF code ends up looking like:
Module definition
variable "search_resource" {
type = object({
name = string,
primary_key = string,
})
}
variable "endpoint" {
type = string
}
variable "template_vars" {
type = map(string)
description = "Inputs to use when transforming template files"
default = {}
}
locals {
template_dir = "${abspath(path.module)}/templates/${var.endpoint}"
destination_dir = "${local.template_dir}/compiled"
}
resource "template_dir" "templates" {
source_dir = local.template_dir
destination_dir = local.destination_dir
vars = var.template_vars
}
resource "terraform_data" "apply_resource" {
for_each = fileset(local.destination_dir, "*.json")
triggers_replace = [template_dir.templates, var.template_vars, var.search_resource.name]
input = {
filename = "${local.destination_dir}/${each.key}"
endpoint = var.endpoint
search_service = var.search_resource.name
api_key = var.search_resource.primary_key
# file name (without extension) is the name of the resource being created
resource_name = element(split(".", basename(each.key)), 0)
}
provisioner "local-exec" {
when = create
interpreter = ["bash", "-c"]
command = "curl -sSf -X POST \"https://${self.input.search_service}.search.windows.net/${self.input.endpoint}?api-version=2024-05-01-Preview\" -H \"Content-Type: application/json\" -H \"api-key: ${self.input.api_key}\" -d @\"${self.input.filename}\" > /dev/null"
quiet = true
}
provisioner "local-exec" {
when = destroy
interpreter = ["bash", "-c"]
command = "curl -sSf -X DELETE \"https://${self.input.search_service}.search.windows.net/${self.input.endpoint}/${self.input.resource_name}?api-version=2024-05-01-Preview\" -H \"api-key: ${self.input.api_key}\" > /dev/null"
quiet = true
}
}
Module usage
Datasource deployment
module "search_datasources" {
source = "./modules/search_data_plane"
depends_on = [azurerm_storage_container.items]
endpoint = "datasources"
search_resource = azurerm_search_service.ai_search
template_vars = {
storage_account_id = azurerm_storage_account.storage_account.id
}
}
So, you can see where I create a datasource for each of the containers that I deployed in another Terraform resource (azurerm_storage_container.items) and pass in the storage account’s ID to the template for the datasource. In this case, the template looks like this for one of the datasources:
my-datasource.json
{
"@odata.context": "https://my-search-instance.search.windows.net/$metadata#datasources/$entity",
"@odata.etag": "*",
"name": "my-datasource",
"type": "azureblob",
"credentials": {
"connectionString": "ResourceId=${storage_account_id};"
},
"container": {
"name": "knowledge-base-pdfs",
"query": "PDF"
}
}
This is where you can see the template syntax `${storage_account_id}` in use, corresponding to the key in the `template_vars` map I’m passing to the module.
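If it helps to see the transformation outside of Terraform, the effect of `template_dir` on a single `${token}` is roughly equivalent to the following (a sketch only; the storage account ID is made up, and Terraform’s template language supports far more than plain token substitution):

```shell
# Rough shell equivalent of what template_dir does for one ${token}.
# The storage account ID below is illustrative, not a real resource.
dir=$(mktemp -d)
mkdir -p "$dir/compiled"
cat > "$dir/my-datasource.json" <<'EOF'
{ "credentials": { "connectionString": "ResourceId=${storage_account_id};" } }
EOF
storage_account_id="/subscriptions/0000/resourceGroups/rg/providers/Microsoft.Storage/storageAccounts/mystorage"
sed "s|\${storage_account_id}|${storage_account_id}|g" \
  "$dir/my-datasource.json" > "$dir/compiled/my-datasource.json"
cat "$dir/compiled/my-datasource.json"
```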
This level of definition in the module allows for the rest of the components to simply reuse the module syntax with different inputs:
module "search_indexes" {
source = "./modules/search_data_plane"
endpoint = "indexes"
search_resource = azurerm_search_service.ai_search
template_vars = {
openai_name = azurerm_cognitive_account.openai.name
embedding_deployment = azurerm_cognitive_deployment.embedding.name
embedding_model = azurerm_cognitive_deployment.embedding.model[0].name
}
}
module "search_skillsets" {
source = "./modules/search_data_plane"
depends_on = [module.search_indexes, azurerm_cognitive_deployment.embedding]
endpoint = "skillsets"
search_resource = azurerm_search_service.ai_search
template_vars = {
openai_name = azurerm_cognitive_account.openai.name
embedding_deployment = azurerm_cognitive_deployment.embedding.name
embedding_model = azurerm_cognitive_deployment.embedding.model[0].name
}
}
module "search_indexers" {
source = "./modules/search_data_plane"
depends_on = [module.search_datasources, module.search_indexes, module.search_skillsets]
endpoint = "indexers"
search_resource = azurerm_search_service.ai_search
}
And as you can see, this also affords dependencies between these components. A deployment of an indexer, for example, will fail if the data sources, indexes, and skill sets in its definition don’t already exist on the search instance. So, we’re able to set up those dependencies using Terraform’s depends_on syntax.
I’m sure these won’t be the last that I’ll post about, so if you’re using Terraform with Azure, stay tuned 😜
