If you've ever built a scraper, a dropshipping importer, or a PIM (Product Information Management) system, you know the absolute nightmare of dealing with unstructured product data.
You scrape a supplier's website expecting a clean table with sizes and colors, but instead, you get this raw text string:
"Nike Air Max mens sneakers size 42 blue synthetic material"
Or even worse, it's in a foreign language:
"Zapatillas de running Nike Air Max uomo blu taglia 42"
The old way: The Regex Nightmare ❌
Historically, we had to write dozens of regular expressions to catch variations of "Size", "SZ", "Taglia", or map 50 different color names to a standard English list. One typo from the supplier, and the script breaks. Your Shopify catalog ends up with weird tags like Color: blu scuro impermeabile.
The new way: Structured AI Outputs ✅
I got tired of fixing broken parsers, so I built a dedicated backend using Node.js, Express, and GPT-4o-mini with strict JSON schemas.
Instead of searching for keywords, the LLM reads the context, translates everything to standard English, and maps it to specific e-commerce attributes.
If you send the messy text from above, the API returns this exact JSON structure:
json
{
"success": true,
"data": {
"brand": "Nike",
"model": "Air Max",
"category": "sneakers",
"gender": "men",
"size": "42",
"color": "blue",
"material": "synthetic",
"pack_size": null,
"normalized_title": "Nike Air Max sneakers men blue size 42"
}
}
I wrapped it into a public API
Since building the prompt logic, handling LLM latency, and hosting the infrastructure takes a lot of time, I wrapped the whole logic into a plug-and-play API.
If you are building an automated Shopify importer, doing local SEO catalogs, or just formatting messy supplier CSVs with Python or Zapier, you can use it right now.
👉 Check out E-commerce Product Normalizer (AI) on RapidAPI
There is a free tier available (50 calls/month) so you can test it directly in the RapidAPI playground without any commitment.
I'd love to hear your feedback! How do you guys currently handle messy product feeds from clients or suppliers?




