E-commerce Market: Data and Content Market
We are used to measuring e-commerce in orders and revenue. But behind every order lies an infrastructure: specifications, descriptions, images, formats, translations, and endless approvals. This page captures the market scale in figures and processes.
How to read this page
This article is intentionally detailed. We do not condense the material into 'a few talking points' because the scale of the problem is only revealed through the combination of figures, data chains, and operational effects.
We are not detailing how solutions and technologies work—that's covered elsewhere. This section outlines the fundamentals: how the data market is structured today and why its current form is failing to scale.
- Not an 'opinion,' but the observable mechanics of the market.
- Where time is spent and where data is lost.
- Why this is a systemic challenge, not a localized one.
The Scale of the Data Market
How Many Participants Are Involved
There are approximately 28 million e-commerce stores operating worldwide. Of these, about 14 million are in the US, and several million are in Europe (e.g., UK ≈ 1.1 million, Germany ≈ 0.7 million, France ≈ 0.6 million).
In addition to retail, millions of manufacturers and suppliers are involved in e-commerce chains. In Europe alone, there are about 2.3 million manufacturing companies potentially supplying product data.
Even when the physical product is the same, it multiplies digitally across sales channels, storefront requirements, languages, and formats.
| Player type | Typical catalog size |
|---|---|
| Small businesses | 100–1,000 SKUs |
| Mid-sized retailers | 10,000–100,000 SKUs |
| Large retailers | 100,000–500,000 SKUs |
| Marketplaces | Hundreds of millions of SKUs |
How many information units exist
When accounting for languages, formats, and channels (website, marketplaces, advertising, feeds), the number of unique product information units (SKU × language × format × channel) reaches hundreds of billions of data fragments.
For reference: open catalogs like Icecat contain 25+ million datasheets in 77 languages—this illustrates the scale of multilingual versions.
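The multiplication behind this estimate can be sketched in a few lines. This is a minimal illustration, not a measurement: the function name `content_units` and every input figure below are assumptions chosen for the example, and the product is an upper bound, since not every SKU is published in every language, format, and channel.

```python
def content_units(skus: int, languages: int, formats: int, channels: int) -> int:
    """Upper bound on distinct content units: SKU x language x format x channel."""
    return skus * languages * formats * channels


# Hypothetical mid-sized retailer; all four inputs are illustrative assumptions.
units = content_units(skus=50_000, languages=5, formats=3, channels=4)
print(f"{units:,} content units to create and keep in sync")
```

Under these assumed inputs, a catalog of 50,000 SKUs already turns into 3 million separate content units. Applied across millions of stores, the same multiplication is what pushes the market-wide total into the hundreds of billions.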
Content Flow Chain
Where Data Gets Lost
Product information rarely goes directly from the manufacturer to the buyer. The typical chain is: manufacturer → distributor → supplier → store → CMS → marketing.
At every stage, data is transformed, formats change, some information is dropped, and some is manually rewritten. If a manufacturer provides about 20 attributes, often only 10–15 reach the storefront.
- Manufacturer: creates the source data (attributes, SKUs, images), often in a single language version and according to internal standards.
- Distributor: maps data to its own templates, adds fields (stock/codes), loses marketing details, alters the format.
- Store: imports data into its own structure, adds SEO and categories. Manual work introduces typos, omissions, and inconsistencies.
- Sales channels (marketplaces, advertising): require separate feeds and impose format restrictions. Any mismatch leads to product rejection or errors in the channel.
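The attribute loss described above can be modeled as a series of filters: each stage keeps only the fields its own template has room for. The sketch below is a simplification under stated assumptions. The stage templates, the `attr_NN` field names, and the helpers `pass_through` and `run_chain` are all hypothetical, and the model ignores fields that a stage adds (such as stock codes) as well as manual rewriting.

```python
from typing import Dict, List, Set, Tuple


def pass_through(record: Dict[str, str], template: Set[str]) -> Dict[str, str]:
    """Keep only the attributes the receiving stage has a field for."""
    return {name: value for name, value in record.items() if name in template}


def run_chain(
    record: Dict[str, str], stages: List[Tuple[str, Set[str]]]
) -> List[Tuple[str, int]]:
    """Pass a record through each stage in order and count surviving attributes."""
    counts = [("manufacturer", len(record))]
    for stage_name, template in stages:
        record = pass_through(record, template)
        counts.append((stage_name, len(record)))
    return counts


# Hypothetical data: 20 source attributes, templates that accept 16 and 12 of them.
source = {f"attr_{i:02d}": "value" for i in range(1, 21)}
distributor_template = {f"attr_{i:02d}" for i in range(1, 17)}
store_template = {f"attr_{i:02d}" for i in range(1, 13)}

counts = run_chain(
    source,
    [("distributor", distributor_template), ("store", store_template)],
)
for stage_name, attribute_count in counts:
    print(f"{stage_name}: {attribute_count} attributes")
```

With these assumed templates, the record shrinks from 20 attributes to 16 and then to 12, which falls inside the 10–15 range cited above. Each stage only ever removes what it cannot store, so no later stage can restore what an earlier one dropped.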
The Cost of Content as a Process
Manual Labor
In mass e-commerce, processing a single product card without automation typically takes 5–20 minutes, at a direct cost of $1 to $5. Some categories are more complex, but this range is typical for standard workflows.
Poor data quality leads to measurable losses: an estimated 15–25% of revenue is lost to poor product content, and up to 25% of returns are attributed to products that did not match the expectations set by the listing.
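To see what the per-card ranges above mean for a whole catalog, they can be multiplied out. The following is a rough sketch: the function `manual_processing_estimate` and the 10,000-SKU catalog are assumptions for illustration, and the result covers a single pass over the catalog without any later updates.

```python
from typing import Dict, Tuple


def manual_processing_estimate(
    sku_count: int,
    minutes_per_card: Tuple[float, float] = (5, 20),  # range cited in the text
    cost_per_card: Tuple[float, float] = (1.0, 5.0),  # USD, range cited in the text
) -> Dict[str, Tuple[float, float]]:
    """Return (low, high) estimates of labor hours and direct cost for one pass."""
    low_minutes, high_minutes = minutes_per_card
    low_cost, high_cost = cost_per_card
    return {
        "hours": (sku_count * low_minutes / 60, sku_count * high_minutes / 60),
        "cost_usd": (sku_count * low_cost, sku_count * high_cost),
    }


# Hypothetical catalog at the lower bound of the mid-sized retailer band.
estimate = manual_processing_estimate(10_000)
low_hours, high_hours = estimate["hours"]
low_cost, high_cost = estimate["cost_usd"]
print(f"Labor: {low_hours:,.0f} to {high_hours:,.0f} hours")
print(f"Direct cost: ${low_cost:,.0f} to ${high_cost:,.0f}")
```

For the assumed 10,000-SKU catalog, this comes to roughly 830 to 3,300 hours of work and $10,000 to $50,000 in direct cost for one pass, before translations, channel-specific feeds, or subsequent updates are counted.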
Small Business vs. Industry Giants
One Problem, Different Capabilities
Large enterprises can dictate formats to suppliers and invest heavily in infrastructure. Small and medium businesses are often forced to adapt to incoming data and limit their assortment due to the inability to process content effectively.
Large enterprises:
- dictate terms to suppliers
- invest in PIM and integrations
- maintain dedicated data quality teams
- absorb errors through sheer scale

Small and medium businesses:
- handle whatever formats suppliers send
- lack budget for infrastructure
- reduce assortment due to content limitations
- publish incomplete listings 'as is'
Suppliers as a systemic bottleneck
The supplier is the content starting point
Suppliers and manufacturers are the primary source of product information: specifications, SKUs, images, packaging, certificates, and technical descriptions. However, the mere existence of data does not mean it is ready for market use: data is rarely structured from the outset to seamlessly traverse the entire chain up to the storefront.
Logistics has established standards and roles (carrier, warehouse, fulfillment). Product data often lacks both standards and a 'network operator': the supplier is forced to act as both the data producer and its integrator—all without the necessary infrastructure.
Diversity of formats and no single source of truth
For the same product range, a supplier often maintains several parallel sources: some data resides in the ERP, some in spreadsheets, some in PDFs, and some in emails and approvals. For retail, this translates into constant 'data gathering,' validation, and manual corrections.
- ERP: SKUs, stock levels, packaging, partial attributes.
- Spreadsheets: client-specific templates, manual adjustments.
- PDFs: marketing descriptions and specifications.
- Emails: clarifications, missing photos, exceptions.
Non-standard attributes
Suppliers usually do not align their data with a unified, market-wide attribute dictionary. They provide what they have: their own field names, different units of measurement, and varying levels of detail. Consequently, 'standardization' is effectively performed on the retailer's or marketplace's side.
| Canonical attribute | As it appears in the data | What the retailer does |
|---|---|---|
| Color | Color / Colour / Col / ЦВЕТ / Shade | Maps, normalizes |
| Size | Size / Dimensions | Standardizes units and format |
| Material | Material / Composition | Creates value dictionaries |
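The mapping step in the table can be sketched as an alias dictionary applied to each incoming supplier row. This is a minimal illustration under stated assumptions: the alias lists are taken from the table, while the names `ATTRIBUTE_ALIASES` and `normalize_record` and the sample row are hypothetical. Unit and value normalization are deliberately left out.

```python
from typing import Dict, Tuple

# Canonical attribute -> field names seen in supplier data (aliases from the table).
ATTRIBUTE_ALIASES = {
    "color": ["Color", "Colour", "Col", "ЦВЕТ", "Shade"],
    "size": ["Size", "Dimensions"],
    "material": ["Material", "Composition"],
}

# Reverse lookup, case-insensitive: "colour" -> "color", "цвет" -> "color".
ALIAS_TO_CANONICAL = {
    alias.casefold(): canonical
    for canonical, aliases in ATTRIBUTE_ALIASES.items()
    for alias in aliases
}


def normalize_record(record: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Rename known fields to canonical names; return unknown fields separately.

    If two aliases of the same attribute appear in one row, the later one wins.
    """
    normalized: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for field, value in record.items():
        canonical = ALIAS_TO_CANONICAL.get(field.strip().casefold())
        if canonical is None:
            unmapped[field] = value  # needs a human decision or a new alias
        else:
            normalized[canonical] = value
    return normalized, unmapped


# Hypothetical supplier row with non-standard field names.
supplier_row = {"Colour": "Navy", "Dimensions": "12 cm", "Composition": "Cotton", "Pack Qty": "6"}
normalized, unmapped = normalize_record(supplier_row)
print(normalized)
print(unmapped)
```

Returning the unmapped fields instead of silently dropping them is a deliberate choice in this sketch: it makes visible exactly the attributes that would otherwise be lost on the way to the storefront. In practice, every retailer maintains its own version of such a dictionary, which is the duplication this section describes.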
One supplier, 5–10 formats
In practice, a supplier working with numerous partners is forced to maintain **5–10 different templates** and attribute systems. Beyond that point, maintenance costs start growing faster than the benefits, and the supplier either lets quality slip or turns to intermediaries and loses control of its data.
- Manual support and infrequent updates.
- Risk of errors and desynchronization increases.
- Intermediaries emerge, leading to loss of control.
Why a supplier cannot 'adapt to everyone'
The reasons are usually not about 'unwillingness,' but about process economics: supporting multiple formats becomes a separate product in itself. Below are typical limitations.
Data is scattered across various sources
ERPs, price lists, files, catalogs, and communications are rarely consolidated into a single structure, so a single 'source of truth' is missing.
Too many category-specific exceptions
Different categories require varying levels of detail and different attributes; a universal 'one-size-fits-all' template doesn't exist.
Limited IT resources and competing priorities
For many suppliers, 'content' is a secondary process relative to manufacturing and sales, meaning it rarely receives systematic investment.
Growing client base multiplies the cost of updates
Every update turns into a chain of emails and manual edits across numerous templates; data desynchronization becomes the norm.
From 'raw material' to storefront—through loss and duplication
Section Summary
Supplier content is the "raw material" of e-commerce. The market treats it as a finished product, even though a layer of transformation lies between the raw material and the storefront: dictionaries, normalization, localization, quality control, and updates.
As long as this layer does not exist as shared infrastructure, every market participant builds it independently, which is why the problem cannot be solved locally.
- Neutral Data Transformation Layer
- Unified Attribute and Unit Dictionaries
- Automated Updates and Quality Control
What These Figures Show
1) Content is logistics, but without the industry backbone
In e-commerce, payments and delivery have long been industrialized. Product data, by contrast, is still moved manually, in fragments, and with losses, across dozens of versions and formats.
2) The problem is systemic—and therefore cannot be solved 'within a single company'
Every player is forced to build their own data transformation layer: attribute mapping, normalization, quality control, localization, feeds. But this doesn't scale at the market level—the work is duplicated across thousands of companies simultaneously.
3) The gap between large and small players is widening
Large players can impose formats on suppliers and invest in infrastructure. Smaller ones are often forced to publish 'as is,' reduce assortment, and lose efficiency due to the inability to process the content flow.
This is why siloed solutions fail to achieve scale, whereas an ecosystem-level approach succeeds.
All figures on this page represent order-of-magnitude estimates and typical ranges based on the international market. While they may vary by country and category, the underlying market mechanics (duplication, losses, manual effort) remain constant.