What Happens When a Patient Says "Hey Siri, Find a Med Spa Near Me" — And Your Practice Has No Speakable Schema?
Voice search is the patient discovery surface that almost no independent medical practice has addressed — which makes it simultaneously the most overlooked gap and the most accessible first-mover opportunity. Here is exactly what happens when a patient uses Siri, Google Assistant, or Alexa to find a med spa, what data sources those voice assistants query, and why the absence of one specific schema property — Speakable — makes a practice systematically absent from voice-delivered responses regardless of everything else it has built.
What Siri Actually Does With "Hey Siri, Find a Med Spa Near Me"
When a patient says "Hey Siri, find a med spa near me for Botox," Siri does not search the web the way Google does. It does not retrieve a list of links and rank them by keyword relevance. It queries a combination of three data sources and synthesises a spoken response from them — all within two to three seconds.
The first data source is business listing data. For Google Assistant that is Google Business Profile (GBP), the same data that powers Google Maps. Siri draws its local listings primarily from Apple Maps, which practices manage through Apple Business Connect, so both profiles need to be complete and consistent with each other. The assistant uses listing categories, service entries, hours, and location data to identify candidate practices near the patient's current location. A practice with "Medical Spa" as its GBP primary category and "Botox Provider" as a secondary category will surface as a candidate. A practice categorised as "Day Spa" or "Beauty Salon" will often miss this initial filter entirely, regardless of what services it actually offers.
The second data source is LocalBusiness schema on the practice's website, specifically the geo-coordinates, address, phone number, and service information structured in machine-readable format. Voice assistants can cross-reference listing data against website schema to verify entity accuracy. A practice whose listing address matches its LocalBusiness schema address is treated as a more reliable entity than one where the two differ. This is the NAP (name, address, phone) consistency problem expressed in voice search terms.
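A consistency check of this kind is easy to automate. The sketch below is a minimal illustration, not a description of how any assistant actually matches entities: it normalises a listing record and a LocalBusiness schema record and reports which NAP fields disagree. The street address, phone numbers, abbreviation map, and the function names are all hypothetical, and US phone formats are assumed.

```python
import re

# Hypothetical abbreviation map; extend it for your own address formats.
ABBREVIATIONS = {
    "street": "st", "avenue": "ave", "suite": "ste",
    "boulevard": "blvd", "road": "rd", "drive": "dr",
}

def normalize_text(value: str) -> str:
    """Lowercase, drop punctuation, and collapse common address abbreviations."""
    words = re.sub(r"[^\w\s]", " ", value.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)

def normalize_phone(value: str) -> str:
    """Keep digits only and compare on the last 10 (US numbers assumed)."""
    return re.sub(r"\D", "", value)[-10:]

def nap_mismatches(listing: dict, schema: dict) -> list[str]:
    """Return the NAP fields that differ between the listing and the website schema."""
    mismatches = []
    if normalize_text(listing["name"]) != normalize_text(schema["name"]):
        mismatches.append("name")
    if normalize_text(listing["address"]) != normalize_text(schema["address"]):
        mismatches.append("address")
    if normalize_phone(listing["phone"]) != normalize_phone(schema["phone"]):
        mismatches.append("phone")
    return mismatches

# Illustrative records only; the address and phone numbers are made up.
listing = {
    "name": "Radiance Med Spa",
    "address": "100 Example Street, Suite 200, Austin, TX 78701",
    "phone": "(512) 555-0100",
}
schema = {
    "name": "Radiance Med Spa",
    "address": "100 Example St. Ste 200, Austin, TX 78701",
    "phone": "+1-512-555-0100",
}

print(nap_mismatches(listing, schema))
```

Normalising before comparing matters because "Street" versus "St." is a formatting difference, while a different suite number or phone number is a real inconsistency worth fixing at the source.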
The third data source, and the one almost no practice has addressed, is Speakable-tagged content. When Siri identifies a candidate practice and needs to deliver a spoken description of it, it needs to know which content on the practice's website is appropriate to read aloud. Speakable schema provides that signal: it identifies specific sections of a page by CSS selector (most simply a class name) or XPath and tells the voice assistant's text-to-speech system "this is the content you should read in response to a query about this practice." Without that schema, Siri has to guess which content to read, and it often defaults to the meta description, which is a roughly 150-character marketing summary rather than the direct, helpful answer a patient needs to make a booking decision.
The Three-Part Response and Where It Breaks
A complete voice search response for a local med spa query ideally delivers three things: the practice name and location, a spoken description of what the practice offers and what makes it worth visiting, and a call to action (phone number or booking URL). Each of these three parts maps to a specific data source — and the absence of any one source degrades the response.
Name and location comes from GBP. A practice with a correctly configured GBP, accurate address, and consistent NAP will have this part covered. Most practices have adequate GBP setup for basic identity delivery.
Spoken description comes from Speakable-tagged content. This is where almost every practice fails. Without Speakable schema tagging specific content blocks — the procedure descriptions, the practice's value proposition, the FAQ answers about services and pricing — the voice assistant cannot deliver a coherent spoken description. At best it reads the meta description aloud. At worst it delivers the practice name, address, and phone number with no contextual information — a response that does not answer the patient's actual question ("find me a med spa for Botox") and does not give them a reason to call.
Call to action comes from the GBP phone number and booking URL. This part is typically present if the GBP is complete. But it is only delivered after a coherent description — a voice assistant that cannot describe what the practice does is unlikely to deliver a confident call-to-action recommendation.
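The three-part structure can be made concrete with a toy model. The sketch below is an assumption-laden illustration of the degradation described above, not the logic of any real voice assistant: the PracticeData fields and the assemble_response function are hypothetical, and the fallback order (Speakable text, then meta description, then nothing) simply mirrors the behaviour this section describes.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PracticeData:
    """Hypothetical container for the inputs a spoken response draws on."""
    name: str
    address: str
    phone: Optional[str] = None
    speakable_text: Optional[str] = None    # from Speakable-tagged content
    meta_description: Optional[str] = None  # fallback when nothing is tagged

def assemble_response(practice: PracticeData) -> str:
    """Build a spoken response from whichever parts are available."""
    # Part 1: name and location, from the business listing.
    parts = [f"{practice.name}, located at {practice.address}."]

    # Part 2: spoken description. Prefer Speakable text, fall back to the
    # meta description, and omit the description if neither exists.
    description = practice.speakable_text or practice.meta_description
    if description:
        parts.append(description)

    # Part 3: call to action, from the listing phone number.
    if practice.phone:
        parts.append(f"You can call them at {practice.phone}.")
    return " ".join(parts)

complete = PracticeData(
    name="Radiance Med Spa",
    address="Austin, Texas",
    phone="512-555-0100",  # made-up number
    speakable_text="They offer Botox, dermal fillers, and laser hair removal.",
)
bare = PracticeData(name="Radiance Med Spa", address="Austin, Texas", phone="512-555-0100")

print(assemble_response(complete))
print(assemble_response(bare))
```

The second response contains only identity and a phone number, which is the "no contextual information" case: it names the practice but never answers the Botox question the patient asked.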
What Speakable Schema Actually Is (And How to Implement It)
Speakable is a schema.org property, usually expressed in JSON-LD structured data, that identifies specific sections of a page as suitable for text-to-speech delivery. It works by referencing HTML elements through CSS selectors, most simply class names (an xpath property is the alternative). The implementation is a two-step process.
Step one: add CSS classes to the HTML elements you want voice assistants to read aloud. On a med spa homepage, this might mean adding class="hero-headline" to your H1 element, class="hero-subtext" to your opening value statement, and class="value-proposition" to your core practice description paragraph. On a service page, it means tagging the procedure description and a key FAQ answer. The specific class names can be anything you choose — they just need to match what you declare in the schema.
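Step two: add a Speakable specification to the page's JSON-LD schema block, on the WebPage or Article entity, referencing those CSS class names: "speakable": { "@type": "SpeakableSpecification", "cssSelector": [".hero-headline", ".hero-subtext", ".value-proposition"] }. A voice assistant that supports the property reads the content inside those elements and delivers it as the spoken portion of the voice response. Google's own documentation describes Speakable as a beta feature aimed at news content for English-language users in the US, so verify current support on each assistant rather than assuming it.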
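The most common implementation mistake is a selector in the schema that matches nothing in the HTML. The sketch below, using only the Python standard library, builds the JSON-LD block for a page and lists any declared class selectors with no matching element. The helper names, the example.com URL, and the sample HTML are illustrative assumptions, and the check only handles simple ".class-name" selectors.

```python
import json
from html.parser import HTMLParser

class ClassCollector(HTMLParser):
    """Collects every class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.classes = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                self.classes.update(value.split())

def build_speakable_jsonld(page_name: str, page_url: str, selectors: list[str]) -> dict:
    """Return a WebPage entity carrying a SpeakableSpecification."""
    return {
        "@context": "https://schema.org",
        "@type": "WebPage",
        "name": page_name,
        "url": page_url,
        "speakable": {
            "@type": "SpeakableSpecification",
            "cssSelector": selectors,
        },
    }

def missing_selectors(html: str, selectors: list[str]) -> list[str]:
    """Return simple class selectors (".name") that match no element in the HTML."""
    collector = ClassCollector()
    collector.feed(html)
    return [s for s in selectors if s.lstrip(".") not in collector.classes]

# Sample page: the third class from step one has not been added yet.
html = """
<h1 class="hero-headline">Radiance Med Spa in Austin, Texas</h1>
<p class="hero-subtext">Botox, dermal fillers, and laser hair removal.</p>
<p>Untagged paragraph.</p>
"""
selectors = [".hero-headline", ".hero-subtext", ".value-proposition"]

jsonld = build_speakable_jsonld("Radiance Med Spa", "https://example.com/", selectors)
print(json.dumps(jsonld, indent=2))       # paste inside <script type="application/ld+json">
print(missing_selectors(html, selectors))  # selectors that need a matching class in the HTML
```

Running the check as part of a deploy step catches the case where a template change renames a class and silently disconnects the schema from the content.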
Google recommends that Speakable content sections contain 20 to 30 seconds of readable content each — roughly two to three direct sentences. The content inside Speakable tags should be written in the inverted pyramid format: the most important information first, in direct language that answers a patient's question rather than describing the practice in marketing terms. "Radiance Med Spa offers Botox, dermal fillers, laser hair removal, and body contouring treatments in Austin, Texas, with medically supervised care and same-week availability" is a good Speakable sentence. "At Radiance, we believe beauty is a journey..." is not.
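The length guidance can be checked mechanically before content goes live. The sketch below estimates the spoken duration of a candidate Speakable section from its word count. The 150 words-per-minute rate is an assumption (text-to-speech rates vary by assistant and voice), and the function name and sentence-splitting rule are illustrative.

```python
import re

WORDS_PER_MINUTE = 150  # assumed speaking rate; real text-to-speech rates vary

def speakable_report(text: str) -> dict:
    """Estimate sentence count, word count, and spoken duration for a section."""
    # Naive sentence split on ., !, or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    words = len(text.split())
    seconds = round(words / WORDS_PER_MINUTE * 60, 1)
    return {
        "sentences": len(sentences),
        "words": words,
        "seconds": seconds,
        "in_range": 20 <= seconds <= 30,  # the 20 to 30 second guidance
    }

section = (
    "Radiance Med Spa offers Botox, dermal fillers, laser hair removal, and "
    "body contouring treatments in Austin, Texas, with medically supervised "
    "care and same-week availability."
)
print(speakable_report(section))
```

Under this assumed rate the single example sentence runs well short of 20 seconds, which suggests a full Speakable section would pair it with one or two further direct sentences, for instance on pricing or booking.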
Voice Search Is Growing Faster in Healthcare Than Almost Any Other Category
Voice search adoption for healthcare queries has accelerated significantly since 2024, driven by two parallel trends: the proliferation of AI-powered voice assistants (Siri's upgrade to LLM-backed responses, Google Assistant's integration with Gemini, Alexa's healthcare-specific capabilities), and the generational shift in healthcare consumer behaviour — younger patients in particular default to voice interfaces for proximity-based queries. "Find a med spa near me" is a proximity query. "What treatments does [practice] offer?" is a service query. Both are now handled by AI-backed voice assistants that synthesise spoken responses from structured data rather than reading websites aloud verbatim.
The competitive reality: fewer than 2% of independent med spas in any given US market have implemented Speakable schema. Most have adequate GBP setup. Almost none have the Speakable layer that completes voice search visibility. This means the first practice in any local market to implement Speakable schema correctly, combined with a complete GBP and consistent LocalBusiness schema, becomes the voice search incumbent for their specialty in that geography. The competition for that position is currently close to zero.
Speakable schema is a standard component of every Iris by AdChoreo implementation — it is included in the schema stack for every page type, with CSS class requirements documented per page in the developer implementation guide. If you want to see whether your current website has Speakable markup (almost certainly not) and where that ranks in your overall AI visibility gaps, the free agentic readiness audit covers it as part of the schema markup dimension score.