Market in Numbers • E-commerce as a Data Market


We are used to measuring e-commerce in orders and revenue. But behind every order lies an infrastructure: specifications, descriptions, images, formats, translations, and endless approvals. This page captures the market scale in figures and processes.

Key Metrics
order of magnitude
28 million+
e-commerce stores worldwide
14 million
of which are in the USA
350–600 million
SKUs managed by market leaders
15–25%
losses due to data issues
A single SKU turns into dozens of data versions across sales channels, languages, formats, and platform requirements.
The figures are order-of-magnitude estimates. They may vary across markets, but the overall picture remains consistent.

How to read this page

This article is intentionally detailed. We do not condense the material into 'a few talking points' because the scale of the problem is only revealed through the combination of figures, data chains, and operational effects.

We are not detailing how solutions and technologies work—that's covered elsewhere. This section outlines the fundamentals: how the data market is structured today and why its current form is failing to scale.

Approach
Data + Processes

Not an 'opinion,' but the observable mechanics of the market.

Focus
Content Logistics

Where time is spent and where data is lost.

Result
Understanding the Scale

Why this is a systemic challenge, not a localized one.

In the coming years, the cost of publishing a product online will equal, and then surpass, the cost of its physical delivery. Logistics has been optimized for decades, yet content logistics remains manual and fragmented.
Section 1

The Scale of the Data Market

How Many Participants Are Involved

There are approximately 28 million e-commerce stores operating worldwide. Of these, about 14 million are in the US, and several million are in Europe (e.g., UK ≈ 1.1 million, Germany ≈ 0.7 million, France ≈ 0.6 million).

In addition to retail, millions of manufacturers and suppliers are involved in e-commerce chains. In Europe alone, there are about 2.3 million manufacturing companies potentially supplying product data.

World
≈ 28 million
e-commerce sites
USA
≈ 14 million
about 50% of the world
Europe
millions
UK 1.1 / DE 0.7 / FR 0.6
Infographic: Stores by region (approximate)
scale is illustrative
USA
≈ 14 million
Europe
≈ 5–7 million
Rest of the world
≈ 7–9 million
*Europe and the rest of the world represent a rough estimate of distribution, as different sources aggregate websites differently.
The Key Effect of Scale
1 SKU → dozens of data versions

Even if the physical product is the same, digitally it multiplies: different sales channels, different storefront requirements, different languages, and formats.

Infographic: The Formula
1 SKU × 5–10 Channels × 5–20 Languages
= 25–200+ Versions
The same product in digital form is generated dozens of times. A single SKU is rewritten and reformatted at every step of the chain instead of becoming a single source of truth once.
How many SKUs and listings
Typical catalog size by player type:
Small business: 100–1,000 SKUs
Mid-sized retailer: 10,000–100,000 SKUs
Large retailer: 100,000–500,000 SKUs
Marketplaces: hundreds of millions of SKUs
The largest platforms operate on a different scale: market leaders handle 350–600 million SKUs.

How many information units exist

When accounting for languages, formats, and channels (website, marketplaces, advertising, feeds), the number of unique product information units (SKU × language × format × channel) reaches hundreds of billions of data fragments.

For reference: open catalogs like Icecat contain 25+ million datasheets in 77 languages—this illustrates the scale of multilingual versions.

Infographic: Complexity Multipliers
SKU base
Languages 5–20
Sales Channels 5–10
Formats / Requirements many
Result: 25–200+ data versions per product (order of magnitude).
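To make the multiplication tangible, here is a minimal sketch that enumerates the version space for one SKU. The channel and language lists are hypothetical examples taken from the lower end of the ranges above.

```python
# Minimal sketch: how one SKU fans out into data versions.
# Channel and language lists are hypothetical; the counts follow the
# 5-10 channels and 5-20 languages cited above (lower bound shown).
from itertools import product

channels = ["own_store", "amazon", "google_shopping", "ebay", "price_feed"]  # 5 channels
languages = ["en", "de", "fr", "es", "it"]                                   # 5 languages

versions = [
    {"sku": "SKU-001", "channel": channel, "language": language}
    for channel, language in product(channels, languages)
]

print(len(versions))  # 5 x 5 = 25 versions, the lower bound of 25-200+
```

Adding formats and per-channel requirements as a third axis pushes the count toward the upper end of the 25–200+ range.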
Up to 80% of e-commerce product content is still transferred via Excel and similar formats. In the age of APIs, cloud computing, and artificial intelligence, the market relies on 1990s practices: manual copying, errors, and data loss at every step.
Section 2

Content Flow Chain

Where Data Gets Lost

Product information rarely goes directly from the manufacturer to the buyer. The typical chain is: manufacturer → distributor → supplier → store → CMS → marketing.

At every stage, data is transformed, formats change, some information is dropped, and some is manually rewritten. If a manufacturer provided about 20 attributes, often only 10–15 make it to the storefront.

Infographic: Losses along the chain
Manufacturer
100%
Distributor
80–90%
Supplier
70–80%
CMS
50–70%
Marketing
40–60%
The result: the same product is often manually rewritten 3–5 times by different participants in the chain.
What happens at each stage
Manufacturer

Creates the source data: attributes, SKUs, images. Often only a single language version, following internal standards.

Distributor / Supplier

Maps data to its own templates, adds fields (stock/codes), loses marketing details, alters the format.

Store / CMS

Imports into its structure, adds SEO and categories. Manual work introduces typos, omissions, and inconsistencies.

Channels / Marketing

Require separate feeds and format restrictions. Any mismatch leads to product rejection or errors in the channel.

The market has long automated payments and logistics, but has yet to automate data. Payments and warehousing operate like an industry, while product content remains a craft.
Infographic: where meaning is 'lost'
Format incompatibility: high
Manual edits: high
Channel losses: medium
Qualitative assessment: where errors and omissions occur most frequently.
Section 3

The Cost of Content as a Process

Manual Labor

In mass e-commerce, processing a single product card without automation typically takes 5–20 minutes, at a direct cost of $1–$5 per SKU. Some categories are more complex, but this range is typical for standard workflows.

Scenario A
5 min
$1 / SKU
Scenario B
10 min
$3 / SKU
Scenario C
20 min
$5 / SKU
Infographic: What 1,000 SKUs Mean
A: 5 min
≈ 83 hours
≈ $1,000
B: 10 min
≈ 167 hours
≈ $3,000
C: 20 min
≈ 333 hours
≈ $5,000
And this is just for initial processing. Updates repeat these costs over and over again.
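For readers who want to rerun the arithmetic behind the cards above, here is a minimal sketch; the minutes-per-SKU and cost-per-SKU pairs are the scenario values quoted earlier, and the function name is ours.

```python
# Recomputes the "What 1,000 SKUs Mean" figures for an arbitrary catalog size.
# Scenario values (minutes and dollars per SKU) are taken from the cards above.
scenarios = {
    "A": {"minutes_per_sku": 5,  "cost_per_sku": 1.0},
    "B": {"minutes_per_sku": 10, "cost_per_sku": 3.0},
    "C": {"minutes_per_sku": 20, "cost_per_sku": 5.0},
}

def manual_processing_cost(sku_count: int) -> None:
    for name, s in scenarios.items():
        hours = sku_count * s["minutes_per_sku"] / 60
        cost = sku_count * s["cost_per_sku"]
        print(f"Scenario {name}: ~{hours:.0f} hours, ~${cost:,.0f}")

manual_processing_cost(1_000)
# Scenario A: ~83 hours, ~$1,000
# Scenario B: ~167 hours, ~$3,000
# Scenario C: ~333 hours, ~$5,000
```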
Another scale: manual reconciliation
For small businesses, 8–12 hours of manual work per week for reconciliation and edits is typical. This equates to $10,000–$18,000 in direct time costs annually.
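The annual figure is plain arithmetic. The sketch below reproduces it under the assumption of a fully loaded hourly rate of roughly $25–$35, which is not stated above and is purely an illustrative input.

```python
# Reconciliation time -> annual cost. The 8-12 hours/week figure is quoted above;
# the hourly rate is a hypothetical assumption used only for illustration.
def annual_reconciliation_cost(hours_per_week: float, hourly_rate: float, weeks_per_year: int = 52) -> float:
    return hours_per_week * weeks_per_year * hourly_rate

print(f"${annual_reconciliation_cost(8, 25):,.0f}")   # $10,400 -- low end
print(f"${annual_reconciliation_cost(12, 29):,.0f}")  # $18,096 -- high end
```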
Overhead costs for product publishing become comparable to logistics. Where product logistics are optimized, content logistics often remain manual and uncontrolled.
The Cost of Errors

Poor data quality leads to measurable losses: 15–25% of revenue is lost due to content; up to 25% of returns are related to unmet expectations.

Poor Searchability: 8–12%
Returns / Misalignments: 5–8%
Data and Availability Errors: 5–7%
Bars visualize each share within total losses (illustrative).
What is consuming time
Data Collection
files / emails
Normalization
formats / units
Quality
checks / corrections
Channels
feeds / rules
Every market participant today is both their own 'power plant' and 'power grid'. In mature industries, there are separate infrastructure providers. In e-commerce content, this layer is almost non-existent—which is why everyone does everything themselves.
Section 4

Small Business vs. Industry Giants

One Problem, Different Capabilities

Large enterprises can dictate formats to suppliers and invest heavily in infrastructure. Small and medium businesses are often forced to adapt to incoming data and limit their assortment due to the inability to process content effectively.

Industry Giants
  • dictate terms to suppliers
  • invest in PIM and integrations
  • maintain dedicated data quality teams
  • absorb errors through sheer scale
Small / Medium Business
  • adapts to incoming supplier formats
  • lacks budget for infrastructure
  • reduces assortment due to content limitations
  • publishes incomplete listings 'as is'
Content becomes a hidden barrier to growth: insufficient resources mean the catalog doesn't expand.
Infographic: Where Format Power Lies
Large players
demand standardization
Small players
adapt to incoming data
In the digital economy, the scale of a business is increasingly defined by the scale of its data. If your catalog cannot be published quickly and accurately, growth is limited not by demand, but by operations.
Section 5

Suppliers as a systemic bottleneck

The supplier is the content starting point

Suppliers and manufacturers are the primary source of product information: specifications, SKUs, images, packaging, certificates, and technical descriptions. However, the mere existence of data does not mean it is ready for market use: data is rarely structured from the outset to seamlessly traverse the entire chain up to the storefront.

Logistics has established standards and roles (carrier, warehouse, fulfillment). Product data often lacks both standards and a 'network operator': the supplier is forced to act as both the data producer and its integrator—all without the necessary infrastructure.

The key reason

Format diversity and no single source of truth

For the same product range, a supplier often maintains several parallel sources: some data resides in the ERP, some in spreadsheets, some in PDFs, and some in emails and approvals. For retail, this translates into constant 'data gathering,' validation, and manual corrections.

Infographic: Where Supplier Data Resides
Systems
ERP / Warehouse / Price Lists

SKUs, stock levels, packaging, partial attributes.

Files
Excel / CSV

Client-specific templates, manual adjustments.

Documents
PDFs / Catalogs

Marketing descriptions and specifications.

Communications
Emails / Messengers

Clarifications, missing photos, exceptions.

Non-standard attributes

Suppliers usually do not align their data to a unified market characteristic dictionary. They provide what they have: their own field names, different units of measurement, and varying levels of detail. Consequently, 'standardization' is effectively performed on the retailer's or marketplace's side.

How the same attribute arrives in the data, and what retail does with it:
Color: arrives as Color / Colour / Col / ЦВЕТ / Shade; retail maps and normalizes the values
Size: arrives as Size / Dimensions; retail standardizes units and format
Material: arrives as Material / Composition; retail builds value dictionaries
Repetitiveness: Thousands of companies perform the same normalization in parallel—and pay for it repeatedly.
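As an illustration of what this repeated normalization looks like, here is a minimal sketch. The synonym dictionary, field names, and sample record are hypothetical; every retailer ends up maintaining some version of this mapping on its own.

```python
# Minimal sketch of attribute-name normalization on the retail side.
# The synonym dictionary and the incoming record are hypothetical examples.
ATTRIBUTE_SYNONYMS = {
    "color": {"color", "colour", "col", "цвет", "shade"},
    "size": {"size", "dimensions"},
    "material": {"material", "composition"},
}

def normalize_attributes(raw: dict) -> dict:
    """Map supplier field names onto a canonical schema; keep unknowns aside."""
    normalized, unmapped = {}, {}
    for field, value in raw.items():
        key = field.strip().lower()
        target = next((canon for canon, names in ATTRIBUTE_SYNONYMS.items() if key in names), None)
        if target is not None:
            normalized[target] = value
        else:
            unmapped[field] = value  # left for manual mapping: the recurring cost
    return {"normalized": normalized, "unmapped": unmapped}

print(normalize_attributes({"Colour": "Midnight Blue", "Dimensions": "30 x 20 x 10 cm", "EAN": "4006381333931"}))
```

The 'unmapped' bucket is where the manual work, and therefore the recurring cost, accumulates.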

One supplier — up to 5–10 formats

In practice, a supplier working with numerous partners is forced to maintain up to 5–10 different templates and attribute systems. Beyond roughly that range, maintenance costs grow faster than the benefits, and the supplier either reduces quality or turns to intermediaries and loses control.

≈ Up to 5
Still functional

Manual support and infrequent updates.

≈ 5–10
Overload zone

Risk of errors and desynchronization increases.

≈ 10+
Scaling breakdown

Intermediaries emerge, leading to loss of control.

Why a supplier cannot 'adapt to everyone'

The reasons are usually not about 'unwillingness,' but about process economics: supporting multiple formats becomes a separate product in itself. Below are typical limitations.

Data is scattered across various sources

ERPs, price lists, files, catalogs, and communications are rarely consolidated into a single structure—a single 'source of truth' is missing.

Too many category-specific exceptions

Different categories require varying levels of detail and different attributes; a universal 'one-size-fits-all' template doesn't exist.

Limited IT resources and competing priorities

For many suppliers, 'content' is a secondary process relative to manufacturing and sales, meaning it rarely receives systematic investment.

Growing client base multiplies the cost of updates

Every update turns into a chain of emails and manual edits across numerous templates; data desynchronization becomes the norm.

Infographic: Data degradation along the path

From 'raw material' to storefront—through loss and duplication

Supplier provides data 'as is'
Proprietary fields, units, versions, often incomplete structure.
Retailer reprocesses
Attribute mapping, normalization, quality control.
Channels demand their own formats
Feeds and restrictions create additional data versions.
The result: the market pays multiple times for the same thing
Meaning is lost, work is duplicated, updates restart the cycle.
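A minimal sketch of the last step in this path, under the assumption of two hypothetical channel formats: the same canonical record is reshaped and truncated for each channel, which is where additional versions and losses appear.

```python
# Minimal sketch of the "channels demand their own formats" step.
# The canonical record and both channel formats are hypothetical;
# real channels impose their own field names, limits, and required fields.
canonical = {
    "sku": "SKU-001",
    "title": "Cordless Drill 18V, 2 Ah battery, quick-change chuck",
    "color": "blue",
    "images": ["https://example.com/img/1.jpg", "https://example.com/img/2.jpg"],
}

def to_marketplace_feed(item: dict) -> dict:
    # Hypothetical marketplace: 50-character title limit, single image only.
    return {
        "offer_id": item["sku"],
        "name": item["title"][:50],      # truncation: information is lost here
        "picture": item["images"][0],
        "colour": item["color"],
    }

def to_ad_feed(item: dict) -> dict:
    # Hypothetical advertising feed: different field names, no color field at all.
    return {"id": item["sku"], "headline": item["title"][:30], "image_link": item["images"][0]}

print(to_marketplace_feed(canonical))
print(to_ad_feed(canonical))
```

Each additional channel adds one more version of the same record, and every update has to be propagated to all of them.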

Section Summary

Supplier content is the "raw material" of e-commerce. The market treats it as a finished product, even though a layer of transformation lies between the raw material and the storefront: dictionaries, normalization, localization, quality control, and updates.

As long as this layer is missing as infrastructure, every market participant builds it independently—which is why the problem is not solved locally.

The supplier is not to 'blame' for the chaos. They have no incentive to become the integrator for the entire market. Yet, the market systematically demands exactly that from them.
Who controls the format
Large retailers demand a standard
Small stores accept what comes in
Format asymmetry widens the data quality gap between segments.
In hard metrics
Versions per SKU: 25–200+
Manual rewrites along the chain: 3–5
Data loss: 15–25%
Market Requirements
  • Neutral Data Transformation Layer
  • Unified Attribute and Unit Dictionaries
  • Automated Updates and Quality Control
Conclusion

What These Figures Show

1) Content is logistics, but without the industry backbone

In e-commerce, money and delivery have long been industrialized. Yet, product data is still moved manually, fragmented, and with losses—across dozens of versions and formats.

2) The problem is systemic—and therefore cannot be solved 'within a single company'

Every player is forced to build their own data transformation layer: attribute mapping, normalization, quality control, localization, feeds. But this doesn't scale at the market level—the work is duplicated across thousands of companies simultaneously.

3) The gap between large and small players is widening

Large players can impose formats on suppliers and invest in infrastructure. Smaller ones are often forced to publish 'as is,' reduce assortment, and lose efficiency due to the inability to process the content flow.

Why NotPIM Emerged

We view this problem as infrastructural: the market needs a neutral layer that reduces duplication, minimizes data loss, and automates content logistics without trying to 'force the market to conform.' Not 'just another storefront,' not 'just another format,' but a way to connect market participants at the data level.
The Core Statement
The market needs data infrastructure just as much as it needs logistics infrastructure.

This is why siloed solutions fail to achieve scale, whereas an ecosystem-level approach succeeds.

Transparency

All figures on this page represent order-of-magnitude estimates and typical ranges based on the international market. While they may vary by country and category, the underlying market mechanics (duplication, losses, manual effort) remain constant.

Data and estimates on this page are current as of December 2025.