Journal, May 13, 2026

The "Global Spanish" problem and why Mexican copy needs to sound Mexican.

Two weeks ago, we asked ChatGPT to recommend aesthetic medicine clinics near Tijuana for a Californian patient. The response came back in Spanish. It started fine. Three paragraphs in, it suggested the patient verify the clinic was certified by SECPRE, mentioned prices in euros with comma decimals, and closed by advising her to consult her public-system primary-care physician. SECPRE is the Spanish Society of Plastic Surgery. In Madrid. The patient lives in Chula Vista and wants to cross to Tijuana, not to Barajas airport. The model had written a competent paragraph for the wrong market, in the right language.

That's the problem this piece explains. And, toward the end, why it's the best opportunity a Mexican clinic will see in the next two years.

"Neutral" Spanish doesn't exist. The model invents it.

Spanish has about 600 million speakers, according to the Instituto Cervantes yearbook El español en el mundo 2024. Mexico alone contributes more than 120 million. Spain, 47 million. Latin America together approaches 480 million. The language lives, demographically, west of the Atlantic.

The corpora that train the models don't reflect that. They reflect something else: how much institutional, academic, and editorial digital presence each country has. And there Spain weighs much more than its population suggests. Spanish Wikipedia is largely edited from Madrid and Barcelona. The RAE publishes from Madrid. The big Spanish-language academic publishers are in Spain. The result: the model, when answering "in Spanish," answers by default in Peninsular variant, with Peninsular vocabulary, Spanish legal norms, and references to Spanish institutions.

It's not a hypothesis. It was measured.

In February 2026, Yoshifumi Kawasaki published Digital Linguistic Bias in Spanish: Evidence from Lexical Variation in LLMs (arXiv:2602.09346). He examined 900+ lexical items across the 21 national varieties of Spanish using an expert-curated database. Central finding: the varieties best recognized by models are Peninsular Spanish and, second, Mexican-Central American. Chilean was the worst recognized. A more interesting side finding: the differences in digital-resource volume country by country don't fully explain the pattern. There's a structural bias beyond "there's more text from Spain."

A group from the Universidad Politécnica de Madrid arrived at something similar by a different route. Martínez, Mayor-Rocher, Pozo Huertas, Melero, Grandury, and Reviriego published in September 2025 Spanish is not just one: A dataset of Spanish dialect recognition for LLMs in Data in Brief (vol. 63, art. 112088). They built 30 multiple-choice questions validated by three linguists to detect which variety of Spanish a model uses by default. One of the questions, verbatim: "Which sounds more natural? A. Llegas tarde, vístete y corre. B. Llegas tarde, vístete y córrele." Option A is Peninsular and Chilean. B is Mexican. Most models answered A without role-play instructions. That small enclitic "le" that Mexicans add without thinking — córrele, ándale, mírale — sounds wrong to the model by default.

There's a third work worth citing: Muñoz-Basols, Palomares Marín, and Moreno Fernández coined the term Sesgo Lingüístico Digital (Digital Linguistic Bias) in Lengua y Sociedad (the UNMSM journal, Peru). Their thesis: the uneven distribution of Spanish varieties in training corpora produces responses that ignore dialectal varieties and specific sociocultural contexts. Structural, not anecdotal.

What it looks like in a real query

A prospective patient asks, in Spanish, "what do I need to know about liposuction?" A real version of the model's response (not invented; seen in tests we ran three weeks ago with a client surgeon) contained four sentences, each one a message of "this text was not written for you." A Mexican patient reading that response realizes, consciously or not, that the model is talking about a different market. An American crossing to Tijuana gets even more confused.

And the worst part: if your Tijuana clinic's site is one of the sources the model could have cited but isn't written in explicitly Mexican Spanish, the model passes it over and prefers the Madrid site that is. Not because the Madrid site is better. Because its text sounds more "Spanish" to the model than yours does.

The short catalog of vocabulary that matters

It's not just "vosotros versus ustedes." It's operational vocabulary in the sectors we work in: piso where a Mexican reader expects departamento, móvil instead of celular, coste instead of costo. A short, non-exhaustive sample, but every word on it is one where the model defaults to Spain, and every default costs a Mexican clinic or real-estate operator money.

And the pronouns. Ustedes, never vosotros. Usted or tú depending on register, never Argentine voseo. Obvious to any Mexican. Not obvious to a model filling the gap with whatever it has on hand.
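The vocabulary discipline can even be mechanized. A minimal sketch of the kind of check we mean, in Python: the word pairs below are the Peninsular/Mexican contrasts mentioned in this piece plus one common extra (ordenador), not a standard resource, and the function name is ours.

```python
# Flag Peninsular vocabulary in draft copy aimed at the Mexican market.
# The pair list is a small illustrative sample, not a linguistic standard.
import re

PENINSULAR_TO_MEXICAN = {
    "piso": "departamento",
    "móvil": "celular",
    "coste": "costo",
    "ordenador": "computadora",  # extra pair, our assumption, not from the piece
    "vosotros": "ustedes",
}

def flag_peninsular(text: str) -> list[tuple[str, str]]:
    """Return (found_word, suggested_replacement) pairs for each hit."""
    hits = []
    for es_word, mx_word in PENINSULAR_TO_MEXICAN.items():
        # \b word boundaries so e.g. "pisotear" does not match "piso"
        if re.search(rf"\b{re.escape(es_word)}\b", text, flags=re.IGNORECASE):
            hits.append((es_word, mx_word))
    return hits

draft = "Reserve su piso cerca de la clínica y llame desde su móvil."
for found, suggested in flag_peninsular(draft):
    print(f"{found} -> use {suggested}")
```

A writer doesn't need this, but an editor reviewing forty service pages does: run it over every draft and anything it flags goes back for a regional pass.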

The regulatory part, where it becomes dangerous

In aesthetic medicine and medical tourism, a model that gets the country wrong invents institutions that don't apply: SECPRE certifications, Spanish public-system referrals, the whole Madrid apparatus from the opening anecdote. We've seen it in three separate audits.

In Mexican real estate, the consequences are even more concrete. An American buyer in Rosarito needs to understand the bank fideicomiso (because the coastal restricted zone is regulated by Article 27 of the Constitution and the Foreign Investment Law, which requires a fideicomiso for foreigners in the 50-km coastal strip). If the model improvises a Peninsular response, it tells them about the Spanish NIE, the IBI, the Catalan regime, the Spanish Mortgage Law. All correct in Madrid. All useless — and potentially confusing enough to kill a sale — in Rosarito.

The Search Engine Land team named it the "Global Spanish problem" in March 2026. Good name. Better than "geographic lexical bias." We're adopting it.

What didn't work for us

When we started auditing content for Mexican clients seven months ago, the first thing we tried was adding a disclaimer to the site header: "This site is optimized for the Mexican market." We thought the model would read it and adjust. Wasted time. The model doesn't work like a human reading a sign; it grabs page fragments, weighs them by density and lexical authority, and composes a response. A header disclaimer didn't move the needle. We confirmed it by measuring citations before and after on five test queries.
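The "measuring citations before and after" step is nothing exotic. A sketch of what we mean, assuming you have saved the model's responses to your test queries as plain text; the function, the domains, and the sample responses here are illustrative stand-ins, not a real tool.

```python
# Measure how often a domain is cited across saved model responses.
# The domains and responses below are fabricated examples for illustration.
def citation_rate(responses: list[str], domain: str) -> float:
    """Fraction of responses that mention the domain at least once."""
    if not responses:
        return 0.0
    cited = sum(1 for r in responses if domain.lower() in r.lower())
    return cited / len(responses)

before = [
    "Para liposucción, consulte clinicamadrid.es y verifique SECPRE.",
    "Opciones certificadas: clinicamadrid.es.",
]
after = [
    "En Tijuana, clinicaejemplo.com.mx ofrece valoración previa.",
    "Opciones cercanas a la frontera: clinicaejemplo.com.mx.",
]

print(citation_rate(before, "clinicaejemplo.com.mx"))  # 0.0 before the rewrite
print(citation_rate(after, "clinicaejemplo.com.mx"))   # 1.0 after
```

Run the same five test queries on a schedule, store the responses, and compare the rate across rewrites. That is the entire measurement, and it's how we confirmed the header disclaimer did nothing.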

Second thing we tried: translating content into Madrid Spanish to "see if it positioned better in Peninsular responses" and also capture that market. Terrible idea. Mexican clinics that wrote in Peninsular Spanish — with piso, móvil, coste, vosotros — ended up disappearing from responses to explicitly Mexican queries, without gaining anything in Spanish ones because their .com.mx domains and geographic signals still pointed to Mexico. They landed in no man's land. Three months to reverse it.

The lesson, written big: the model reads text and geographic signal together. If one contradicts the other, the model ignores you. Coherence or silence.

The opportunity: be aggressively Mexican

Here's the inversion. If the model defaults to Peninsular when content is neutral, then writing explicitly Mexican content wins visibility by default on any query with Mexican intent. And most Mexican professional-services sites we see are written in a neutral, washed-out variety, produced by copywriters who learned to "avoid regionalisms" to "reach Latin markets." That was prudent in 2018 for Google. In 2026, for ChatGPT, it's shooting yourself in the foot.

What we recommend as a minimum spec for a Mexican service page that wants to be cited when the query is Mexican: Mexican lexicon throughout (departamento, celular, costo; never piso, móvil, coste); ustedes and usted or tú, never vosotros; prices in pesos, not euros; Mexican institutions and law (the bank fideicomiso, Article 27, the Foreign Investment Law) instead of Spanish ones (SECPRE, NIE, IBI); and geographic signals, domain included, that agree with the text instead of contradicting it.

None of this is expensive. It's an editorial decision. It costs what it costs to sit down and rewrite the service pages with regional discipline. An afternoon per service if the writer knows what they're doing. A week if they're learning.

Why the window closes fast

The models are improving. GPT-4o already distinguishes Spanish varieties better than GPT-3.5 did, per recent work in the NLP variation-and-dialects community. In two years, probably, models will infer geography precisely enough that neutral content from a Mexican site will be enough. Not today. Today there's a window where writing explicitly Mexican wins citations by a wide margin, because very little Mexican professional content does it deliberately.

Any Mexican clinic or real-estate firm that gets serious about this between 2026 and 2027 enters the gap before it closes. Anyone who waits until it's obvious arrives when we're already competing against everyone else who also figured it out. It's the same curve as 2010 SEO with long-tail keywords, except this time the lever isn't ranking, it's the citation.

None of this promises anything in thirty days. These changes show up in citation metrics in six to twelve weeks, like the rest of the GEO tactics. What's remarkable here is that the ceiling, the percentage of Mexican responses your site can win against generic content, is high, because the real competition is very low. Still.

The model isn't going to learn on its own that your clinic is in Tijuana and not in Toledo. You have to write it. In words only used on your side of the Atlantic.