Dataset Descriptor
Purpose
A dataset descriptor is a compact metadata structure returned in the dataset search results from the Dateno API. It provides summary information about a dataset but omits the full content available in a dataset card.
Descriptors are primarily used in search responses to support filtering, display dataset titles, and provide basic data attributes. Unlike dataset cards, descriptors do not include resource descriptors or full-length descriptions.
Relation to Dataset Card
A dataset descriptor and a dataset card refer to the same dataset but serve different purposes and have different structures.
Aspect | Dataset Descriptor | Dataset Card |
---|---|---|
Usage | Included in search results | Retrieved via direct dataset request |
Content | Summary metadata only | Full metadata including resource descriptors |
Resources | Not included | Listed in the resources property |
Long Description | Not included | Available in the description field |
Use in Filtering | Yes, via structured fields | Not used for filtering |
Structure
Each dataset descriptor is a JSON object located in the _source field of a hits.hits array element. It contains the following top-level parts:
Property | Description |
---|---|
id | Unique identifier of the dataset in the Dateno registry |
source | Catalog metadata |
dataset | Dataset metadata assigned in the registry |
scores | Numeric scores for dataset ranking in search results |
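The layout described above can be sketched in Python as follows. This is a minimal illustration, assuming the search response has already been parsed into a dictionary; the extract_descriptors helper and the sample response are hypothetical and not part of the Dateno API.

```python
from typing import Any


def extract_descriptors(response: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect the _source object of every element in hits.hits."""
    hits = response.get("hits", {}).get("hits", [])
    # Skip elements without a _source field instead of raising KeyError.
    return [hit["_source"] for hit in hits if "_source" in hit]


# Hypothetical, heavily trimmed response used only for illustration.
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "id": "example-descriptor-id",
                    "source": {"name": "Government of Alberta open datasets"},
                    "dataset": {"title": "Rabbit and rodent management in Alberta"},
                    "scores": {"feature_score": 95},
                }
            }
        ]
    }
}

descriptors = extract_descriptors(sample_response)
titles = [d["dataset"]["title"] for d in descriptors]
```

Here titles holds one entry per descriptor, which is typically all that a result list needs to display.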
source
Describes the catalog and organization that published the dataset.
Property | Description |
---|---|
schema | Plain string indicating the source schema (e.g., ckan, arcgishub) |
uid | Unique identifier of the catalog in Dateno |
owner_name | Name of the publisher |
software | Directory entry describing the maintenance software |
catalog_type | Directory entry identifying the catalog type |
name | Name of the catalog |
macroregions | Array of directory entries for macroregions (see dataset attributes) |
countries | Array of directory entries for countries |
subregions | Array of directory entries for subregions |
langs | Array of directory entries for languages |
owner_type | Plain string describing the type of organization (e.g., "Central government", "Academy") |
url | URL of the original catalog |
TIP
Dataset descriptors and dataset cards include only basic metadata about the associated catalog. To retrieve full catalog metadata, pass the uid value to the Fetch Single Catalog request in the Dateno API.
dataset
Describes dataset-level metadata assigned during catalog indexing.
Property | Description |
---|---|
title | Title of the dataset |
short_text | Brief description for previews |
formats | List of data formats (e.g., .csv, .json) |
tags | Array of user-supplied tags |
topics | Normalized topics assigned by the indexer |
topics_original | Original topics from the source catalog |
geotopics | Geospatial topics if assigned |
url | Link to the original dataset page |
num_resources | Number of resources in the dataset |
date_created | Dataset creation date |
date_changed | Date the dataset was last updated |
datatypes | Data types describing the nature of data |
responsible | Array of responsible parties (e.g., publishers or creators) |
id | Internal identifier of the dataset |
scores
Contains scores used by the search engine to rank search results.
Property | Description |
---|---|
feature_score | Score used to order and prioritize results |
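If descriptors from several responses are merged on the client side, they could be re-ordered by this score. The sketch below assumes that a higher feature_score means a higher rank, which the text above does not state explicitly; the sort_by_feature_score helper and the sample data are hypothetical.

```python
from typing import Any


def sort_by_feature_score(descriptors: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return descriptors ordered from highest to lowest scores.feature_score."""
    return sorted(
        descriptors,
        # Assumption: a missing score is treated as 0 so such items sort last.
        key=lambda d: d.get("scores", {}).get("feature_score", 0),
        reverse=True,
    )


# Hypothetical descriptors, trimmed to the fields used here.
merged = [
    {"id": "a", "scores": {"feature_score": 40}},
    {"id": "b", "scores": {"feature_score": 95}},
    {"id": "c"},
]

ranked_ids = [d["id"] for d in sort_by_feature_score(merged)]
```

The function returns a new list and leaves the input untouched, so the original response order remains available.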
Filtering
Most fields in the source and dataset objects are used for filtering search results. These include:
source.countries.name
source.langs.id
dataset.formats
dataset.topics
source.catalog_type
For detailed filter syntax, see Using Filters in Requests.
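The dotted names above identify positions inside a descriptor. As a rough illustration of how such a path maps onto the JSON structure, the sketch below resolves a path against an already-parsed descriptor, descending into arrays of directory entries. It does not reproduce the API's filter syntax; the values_at_path helper is hypothetical, and the sample descriptor is trimmed from the example in this document.

```python
from typing import Any


def values_at_path(obj: Any, path: str) -> list[Any]:
    """Collect every value reachable by a dotted path, descending into arrays."""
    values = [obj]
    for key in path.split("."):
        next_values = []
        for value in values:
            # Treat an array as a set of entries that are each inspected for the key.
            items = value if isinstance(value, list) else [value]
            for item in items:
                if isinstance(item, dict) and key in item:
                    next_values.append(item[key])
        values = next_values
    # Flatten a final array value (such as dataset.formats) into single items.
    result = []
    for value in values:
        if isinstance(value, list):
            result.extend(value)
        else:
            result.append(value)
    return result


# Trimmed descriptor used only for illustration.
descriptor = {
    "source": {
        "catalog_type": "Open data portal",
        "countries": [{"name": "Canada", "id": "CA"}],
        "langs": [{"name": "English", "id": "EN"}],
    },
    "dataset": {"formats": [".pdf"], "topics": []},
}

country_names = values_at_path(descriptor, "source.countries.name")
formats = values_at_path(descriptor, "dataset.formats")
```

A path that does not exist in the descriptor yields an empty list rather than an error, which keeps client-side checks simple.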
Example
{
"int_id": "4dc106c6-0027-478e-af07-1c67226a90b0",
"source": {
"schema": "ckan",
"uid": "cdi00000310",
"owner_name": "Government of Alberta",
"software": {
"name": "CKAN",
"id": "ckan"
},
"catalog_type": "Open data portal",
"name": "Government of Alberta open datasets",
"macroregions": [
{
"name": "Northern America",
"id": "021"
}
],
"langs": [
{
"name": "English",
"id": "EN"
}
],
"countries": [
{
"name": "Canada",
"id": "CA"
}
],
"subregions": [
{
"name": "Alberta",
"id": "CA-AB"
}
],
"owner_type": "Regional government",
"url": "https://open.alberta.ca"
},
"id": "8dbfbe735938a118b2a69a6fc8e21c4839561007687ea350c4947ea6f53dbbc5",
"scores": {
"feature_score": 95
},
"dataset": {
"topics_original": [],
"geotopics": [],
"formats": [
".pdf"
],
"topics": [],
"date_created": "2018-07-24T19:01:14.899150",
"short_text": "Summarizes information about rabbit and rodent management in Alberta,",
"num_resources": 1,
"description": "Summarizes information about rabbit and rodent management
in Alberta, and the role of Alberta's Wildlife Act in regulating how
they can be harvested or controlled in the province.",
"title": "Rabbit and rodent management in Alberta",
"date_changed": "2023-08-31T17:33:25.714755",
"url": "https://open.alberta.ca/dataset/rabbit-and-rodent-management-in-alberta",
"datatypes": [
"documents"
],
"has_archive": false,
"tags": [
"rabbits",
"rodents",
"wild species"
],
"responsible": [
{
"role": "Publisher",
"id": "environment1971-1992--1999-2011",
"title": "Environment (1971-1992, 1999-2011)"
}
],
"id": "4dc106c6-0027-478e-af07-1c67226a90b0"
}
}