Journal, May 13, 2026

The "Global Spanish" problem and why Mexican copy needs to sound Mexican.

Two weeks ago, we asked ChatGPT to recommend aesthetic medicine clinics near Tijuana for a Californian patient. The response came back in Spanish. It started fine. Three paragraphs in, it suggested the patient verify the clinic was certified by SECPRE, mentioned prices in euros with comma decimals, and closed by advising her to consult her public-system primary-care physician. SECPRE is the Spanish Society of Plastic Surgery. In Madrid. The patient lives in Chula Vista and wants to cross to Tijuana, not to Barajas airport. The model had written a competent paragraph for the wrong market, in the right language.

That's the problem this piece explains. And, toward the end, why it's the best opportunity a Mexican clinic will see in the next two years.

"Neutral" Spanish doesn't exist. The model invents it.

Spanish has about 600 million speakers, according to the Instituto Cervantes yearbook El español en el mundo 2024. Mexico alone contributes more than 120 million. Spain, 47 million. Latin America together approaches 480 million. The language lives, demographically, west of the Atlantic.

The corpora that train the models don't reflect that. They reflect something else: how much institutional, academic, and editorial digital presence each country has. And there Spain weighs much more than its population suggests. Spanish Wikipedia is largely edited from Madrid and Barcelona. The RAE publishes from Madrid. The big Spanish-language academic publishers are in Spain. The result: the model, when answering "in Spanish," answers by default in Peninsular variant, with Peninsular vocabulary, Spanish legal norms, and references to Spanish institutions.

It's not a hypothesis. It was measured.

In February 2026, Yoshifumi Kawasaki published Digital Linguistic Bias in Spanish: Evidence from Lexical Variation in LLMs (arXiv:2602.09346). He examined 900+ lexical items across the 21 national varieties of Spanish using an expert-curated database. Central finding: the varieties best recognized by models are Peninsular Spanish and, second, Mexican-Central American. Chilean was the worst recognized. A more interesting side finding: the differences in digital-resource volume country by country don't fully explain the pattern. There's a structural bias beyond "there's more text from Spain."

A group from the Universidad Politécnica de Madrid arrived at something similar by a different route. Martínez, Mayor-Rocher, Pozo Huertas, Melero, Grandury, and Reviriego published in September 2025 Spanish is not just one: A dataset of Spanish dialect recognition for LLMs in Data in Brief (vol. 63, art. 112088). They built 30 multiple-choice questions validated by three linguists to detect which variety of Spanish a model uses by default. One of the questions, verbatim: "Which sounds more natural? A. Llegas tarde, vístete y corre. B. Llegas tarde, vístete y córrele." Option A is Peninsular and Chilean. B is Mexican. Most models answered A without role-play instructions. That small enclitic "le" that Mexicans add without thinking — córrele, ándale, mírale — sounds wrong to the model by default.

There's a third work worth citing: Muñoz-Basols, Palomares Marín, and Moreno Fernández coined the term Sesgo Lingüístico Digital (Digital Linguistic Bias) in Lengua y Sociedad (the UNMSM journal, Peru). Their thesis: the uneven distribution of Spanish varieties in training corpora produces responses that ignore dialectal varieties and specific sociocultural contexts. Structural, not anecdotal.

What it looks like in a real query

A prospective patient asks, in Spanish, "what do I need to know about liposuction?" A real version of the model's response (not invented; seen in tests we ran three weeks ago with a client surgeon) contained four sentences, each one a message of "this text was not written for you." A Mexican patient reading that response realizes, consciously or not, that the model is talking about a different market. An American crossing to Tijuana gets even more confused.

And the worst part: if your Tijuana clinic's site is one of the sources the model could have cited but isn't written in explicitly Mexican Spanish, the model passes it over and prefers the Madrid site that is. Not because the Madrid site is better. Because its text sounds more "Spanish" to the model than yours does.

The short catalog of vocabulary that matters

It's not just "vosotros versus ustedes." It's operational vocabulary in the sectors we work in: piso where a Mexican reader expects departamento, móvil instead of celular, coste instead of costo. A short, non-exhaustive sample, but every word on it is one where the model defaults to Spain, and every default costs a Mexican clinic or real-estate operator money.

And the pronouns. Ustedes, never vosotros. Usted or tú depending on register, never Argentine voseo. Obvious to any Mexican. Not obvious to a model filling the gap with whatever it has on hand.
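The vocabulary discipline can even be mechanized. A minimal sketch of the kind of check we mean, in Python: the word pairs below are the Peninsular/Mexican contrasts mentioned in this piece plus one common extra (ordenador), not a standard resource, and the function name is ours.

```python
# Flag Peninsular vocabulary in draft copy aimed at the Mexican market.
# The pair list is a small illustrative sample, not a linguistic standard.
import re

PENINSULAR_TO_MEXICAN = {
    "piso": "departamento",
    "móvil": "celular",
    "coste": "costo",
    "ordenador": "computadora",  # extra pair, our assumption, not from the piece
    "vosotros": "ustedes",
}

def flag_peninsular(text: str) -> list[tuple[str, str]]:
    """Return (found_word, suggested_replacement) pairs for each hit."""
    hits = []
    for es_word, mx_word in PENINSULAR_TO_MEXICAN.items():
        # \b word boundaries so e.g. "pisotear" does not match "piso"
        if re.search(rf"\b{re.escape(es_word)}\b", text, flags=re.IGNORECASE):
            hits.append((es_word, mx_word))
    return hits

draft = "Reserve su piso cerca de la clínica y llame desde su móvil."
for found, suggested in flag_peninsular(draft):
    print(f"{found} -> use {suggested}")
```

A writer doesn't need this, but an editor reviewing forty service pages does: run it over every draft and anything it flags goes back for a regional pass.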

The regulatory part, where it becomes dangerous

In aesthetic medicine and medical tourism, a model that gets the country wrong invents institutions that don't apply: SECPRE certifications, Spanish public-system referrals, the whole Madrid apparatus from the opening anecdote. We've seen it in three separate audits.

In Mexican real estate, the consequences are even more concrete. An American buyer in Rosarito needs to understand the bank fideicomiso (because the coastal restricted zone is regulated by Article 27 of the Constitution and the Foreign Investment Law, which requires a fideicomiso for foreigners in the 50-km coastal strip). If the model improvises a Peninsular response, it tells them about the Spanish NIE, the IBI, the Catalan regime, the Spanish Mortgage Law. All correct in Madrid. All useless — and potentially confusing enough to kill a sale — in Rosarito.

The Search Engine Land team named it the "Global Spanish problem" in March 2026. Good name. Better than "geographic lexical bias." We're adopting it.

What didn't work for us

When we started auditing content for Mexican clients seven months ago, the first thing we tried was adding a disclaimer to the site header: "This site is optimized for the Mexican market." We thought the model would read it and adjust. Wasted time. The model doesn't work like a human reading a sign; it grabs page fragments, weighs them by density and lexical authority, and composes a response. A header disclaimer didn't move the needle. We confirmed it by measuring citations before and after on five test queries.
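The "measuring citations before and after" step is nothing exotic. A sketch of what we mean, assuming you have saved the model's responses to your test queries as plain text; the function, the domains, and the sample responses here are illustrative stand-ins, not a real tool.

```python
# Measure how often a domain is cited across saved model responses.
# The domains and responses below are fabricated examples for illustration.
def citation_rate(responses: list[str], domain: str) -> float:
    """Fraction of responses that mention the domain at least once."""
    if not responses:
        return 0.0
    cited = sum(1 for r in responses if domain.lower() in r.lower())
    return cited / len(responses)

before = [
    "Para liposucción, consulte clinicamadrid.es y verifique SECPRE.",
    "Opciones certificadas: clinicamadrid.es.",
]
after = [
    "En Tijuana, clinicaejemplo.com.mx ofrece valoración previa.",
    "Opciones cercanas a la frontera: clinicaejemplo.com.mx.",
]

print(citation_rate(before, "clinicaejemplo.com.mx"))  # 0.0 before the rewrite
print(citation_rate(after, "clinicaejemplo.com.mx"))   # 1.0 after
```

Run the same five test queries on a schedule, store the responses, and compare the rate across rewrites. That is the entire measurement, and it's how we confirmed the header disclaimer did nothing.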

Second thing we tried: translating content into Madrid Spanish to "see if it positioned better in Peninsular responses" and also capture that market. Terrible idea. Mexican clinics that wrote in Peninsular Spanish — with piso, móvil, coste, vosotros — ended up disappearing from responses to explicitly Mexican queries, without gaining anything in Spanish ones because their .com.mx domains and geographic signals still pointed to Mexico. They landed in no man's land. Three months to reverse it.

The lesson, written big: the model reads text and geographic signal together. If one contradicts the other, the model ignores you. Coherence or silence.

The opportunity: be aggressively Mexican

Here's the inversion. If the model defaults to Peninsular when content is neutral, then writing explicitly Mexican content wins visibility by default on any query with Mexican intent. And most Mexican professional-services sites we see are written in a neutral, washed-out variety, produced by copywriters who learned to "avoid regionalisms" to "reach Latin markets." That was prudent in 2018 for Google. In 2026, for ChatGPT, it's shooting yourself in the foot.

What we recommend as a minimum spec for a Mexican service page that wants to be cited when the query is Mexican: Mexican lexicon throughout (departamento, celular, costo; never piso, móvil, coste); ustedes and usted or tú, never vosotros; prices in pesos, not euros; Mexican institutions and law (the bank fideicomiso, Article 27, the Foreign Investment Law) instead of Spanish ones (SECPRE, NIE, IBI); and geographic signals, domain included, that agree with the text instead of contradicting it.

None of this is expensive. It's an editorial decision. It costs what it costs to sit down and rewrite the service pages with regional discipline. An afternoon per service if the writer knows what they're doing. A week if they're learning.

Why the window closes fast

The models are improving. GPT-4o already distinguishes Spanish varieties better than GPT-3.5 did, per recent work in the NLP variation-and-dialects community. In two years, probably, models will infer geography precisely enough that neutral content from a Mexican site will be enough. Not today. Today there's a window where writing explicitly Mexican wins citations by a wide margin, because very little Mexican professional content does it deliberately.

Any Mexican clinic or real-estate firm that gets serious about this between 2026 and 2027 enters the gap before it closes. Anyone who waits until it's obvious arrives when we're already competing against everyone else who also figured it out. It's the same curve as 2010 SEO with long-tail keywords, except this time the lever isn't ranking, it's the citation.

None of this promises anything in thirty days. These changes show up in citation metrics in six to twelve weeks, like the rest of the GEO tactics. What's remarkable here is that the ceiling, the percentage of Mexican responses your site can win against generic content, is high, because the real competition is very low. Still.

The model isn't going to learn on its own that your clinic is in Tijuana and not in Toledo. You have to write it. In words only used on your side of the Atlantic.