Datasets
The foundation, in detail.
Pauhu®'s corpus is sourced row by row across fourteen domains in twenty-four European languages and aligns structurally with the Common European Data Spaces framework. Pauhu® has been an approved provider to the European Language Data Space since 8 July 2025, registry ID 66. Live DCAT3 catalog. Helsinki, EU jurisdiction. VAT FI07681718.
| Credential | Verify at |
|---|---|
| European Language Data Space | Approved participant, registry ID 66, since 8 July 2025: language-data-space.eu/catalogue/list-of-participants |
| Common European Data Spaces framework | European Commission strategy: digital-strategy.ec.europa.eu/en/policies/data-spaces |
| Pauhu® DCAT3 catalog | Live: api.pauhu.eu/v1/lds/_catalog |
The foundation
One corpus. Fourteen domains. More than eleven million sourced rows. Twenty-four European languages.
Pauhu®'s corpus is one cited foundation, structurally aligned with the Common European Data Spaces framework. Each access path is designed around the same catalog; per-domain row counts show what is live and what is still filling.
The corpus spans the fourteen sectoral domains named in the European Commission's 2022 strategy.
Coverage today
| Domain | Coverage area | Rows (approx., at launch) |
|---|---|---|
| Agriculture | Common European Agriculture Data Space (CEADS) | 90,500 |
| Cultural Heritage | Common European Cultural Heritage Data Space | 129,500 |
| Energy | Common European Energy Data Space (CEEDS) | 367,000 |
| Finance | Common European Finance Data Space | 778,000 |
| Green Deal | Common European Green Deal Data Space (GDDS) | 456,000 |
| Health | European Health Data Space (EHDS), Regulation (EU) 2025/327, two-axis filter applied | 10,000 |
| Language | Common European Language Data Space | 7,082,000 |
| Manufacturing | Common European Manufacturing Data Space | 71,000 |
| Media | Common European Media Data Space | 1,700 |
| Mobility | Common European Mobility Data Space | 551,000 |
| Public Administration | Common European Public Administration Data Space | 1,771,000 |
| Research and Innovation | Common European Research and Innovation Data Space, EOSC-adjacent | 260,000 |
| Skills | Common European Skills Data Space | 91,000 |
| Tourism | Common European Tourism Data Space | 2,400 |
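As a quick cross-check, the launch-time figures in the table above can be totalled. The sketch below only copies the approximate row counts from the table. The 100,000-row threshold used to separate denser domains from those still filling is an illustrative assumption, not a Pauhu® definition.

```python
# Approximate launch-time row counts, copied from the coverage table.
LAUNCH_ROWS = {
    "Agriculture": 90_500,
    "Cultural Heritage": 129_500,
    "Energy": 367_000,
    "Finance": 778_000,
    "Green Deal": 456_000,
    "Health": 10_000,
    "Language": 7_082_000,
    "Manufacturing": 71_000,
    "Media": 1_700,
    "Mobility": 551_000,
    "Public Administration": 1_771_000,
    "Research and Innovation": 260_000,
    "Skills": 91_000,
    "Tourism": 2_400,
}

# Hypothetical cut-off, for illustration only; the source does not define "dense".
DENSE_THRESHOLD = 100_000

total_rows = sum(LAUNCH_ROWS.values())
dense = sorted(d for d, n in LAUNCH_ROWS.items() if n >= DENSE_THRESHOLD)
filling = sorted(d for d, n in LAUNCH_ROWS.items() if n < DENSE_THRESHOLD)

print(f"{len(LAUNCH_ROWS)} domains, {total_rows:,} rows in total")
print("Denser:", ", ".join(dense))
print("Still filling:", ", ".join(filling))
```

The total comes to 11,661,100 rows across the fourteen domains, which is consistent with the "more than eleven million" figure stated earlier.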
Where the foundation is dense, Pauhu® returns sourced answers row by row, each with a source URL, a paragraph-precise identifier, and a timestamp. Where it is still filling, Pauhu® returns an honest gap that names what would close it. Both behaviours are the product. The row counts above are launch-time figures; live counts resolve at each domain's API URL inside the catalog.
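The two behaviours can be pictured as two result shapes. The following is a minimal sketch only: the class names, field names, and the `describe` helper are hypothetical, chosen to mirror the three attributes named above (source URL, paragraph-precise identifier, timestamp). They are not Pauhu®'s actual response schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Union


@dataclass(frozen=True)
class SourcedRow:
    """Hypothetical answer row that carries its own provenance."""
    text: str
    source_url: str
    paragraph_id: str       # paragraph-precise identifier (format assumed)
    retrieved_at: datetime  # timestamp attached to the row


@dataclass(frozen=True)
class Gap:
    """Hypothetical 'honest gap': no row yet, plus what would close it."""
    domain: str
    would_close: str


Result = Union[SourcedRow, Gap]


def describe(result: Result) -> str:
    """Render either outcome as a single line of text."""
    if isinstance(result, SourcedRow):
        stamp = result.retrieved_at.isoformat()
        return f"{result.text} [{result.source_url}#{result.paragraph_id}, {stamp}]"
    return f"No sourced row in {result.domain} yet; would close with: {result.would_close}"


# Placeholder values, invented for illustration.
row = SourcedRow(
    text="Example statement",
    source_url="https://example.org/doc",
    paragraph_id="p12",
    retrieved_at=datetime(2025, 7, 8, tzinfo=timezone.utc),
)
gap = Gap(domain="Tourism", would_close="an additional sourced dataset")

print(describe(row))
print(describe(gap))
```

Modelling the gap as its own type, rather than as an empty row, keeps a caller from mistaking "nothing sourced yet" for an answer.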
Where Pauhu® fits
The European Commission's strategy names the Common European Data Spaces as the framework for sharing sector-specific data across Europe. The European Language Data Space was the first sectoral pilot. Pauhu® is listed in its registry, ID 66.
The Commission states the shared mission in its own words: "empower the Multilingual Digital Single Market while preserving Europe's language diversity through digital means" and "advance Europe's digital autonomy and technical sovereignty" (DG CNECT). Pauhu® has been building the foundation in production since 2023. Today Pauhu® is an approved European Language Data Space (LDS) provider, listed in the registry, and the foundation is live.
Verify
Verify the foundation.
| Claim | Verify at |
|---|---|
| Approved participant in the European Language Data Space, LDS Governance Board, 8 July 2025, registry ID 66 | language-data-space.eu/catalogue/list-of-participants |
| Common European Data Spaces framework, European Commission strategy | digital-strategy.ec.europa.eu/en/policies/data-spaces |
| EHDS regulatory framework, Health domain | Regulation (EU) 2025/327, Official Journal of the European Union |
| Public DCAT3 catalog, open discovery across all fourteen domains | api.pauhu.eu/v1/lds/_catalog |
| Per-domain data envelope, example path (access required), Energy | api.pauhu.eu/v1/lds/pauhu-energy |
| Per-domain rows, example path (access required), Energy | api.pauhu.eu/v1/lds/pauhu-energy/rows?limit=5 |
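The API paths in the table follow one pattern, which can be sketched as below. The `https` scheme, the helper names, and the treatment of `limit` as a general query parameter (generalising from the `?limit=5` example) are assumptions. The per-domain endpoints are marked as requiring access and no authentication mechanism is described here, so the sketch only builds URLs and offers an optional fetch helper for the public catalog. `pauhu-energy` is the only dataset identifier taken from the table.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Host and path come from the table; the https scheme is an assumption.
BASE = "https://api.pauhu.eu/v1/lds"


def catalog_url() -> str:
    """URL of the public catalog listed in the table."""
    return f"{BASE}/_catalog"


def domain_url(dataset: str) -> str:
    """URL of a per-domain data envelope (access required)."""
    return f"{BASE}/{dataset}"


def rows_url(dataset: str, limit: int = 5) -> str:
    """URL of a per-domain rows listing (access required)."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return f"{domain_url(dataset)}/rows?{urlencode({'limit': limit})}"


def fetch_catalog(timeout: float = 10.0) -> str:
    """Fetch the public catalog as raw text; no response format is assumed."""
    with urlopen(catalog_url(), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


print(catalog_url())
print(domain_url("pauhu-energy"))
print(rows_url("pauhu-energy", limit=5))
```

Calling `fetch_catalog()` performs a live network request, so it is left out of the printed example.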
Licensing
Pick the data spaces you need, per seat. Every plan delivers the same foundation through MCP, REST, and the structured stream.