Dataset Descriptor

Purpose

A dataset descriptor is a compact metadata structure returned in the dataset search results from the Dateno API. It provides summary information about a dataset but omits the full content available in a dataset card.

Descriptors are primarily used in search responses to support filtering, display dataset titles, and provide basic data attributes. They are returned inline as part of search hits together with ranking scores. Dataset cards expose the same core metadata plus a consistent resource list via a dedicated endpoint.

Relation to Dataset Card

A dataset descriptor and a dataset card refer to the same dataset but serve different purposes and have different structures.

| Aspect | Dataset Descriptor | Dataset Card |
|---|---|---|
| Usage | Included in search results | Retrieved via direct dataset request |
| Content | Core metadata subset optimised for search results | Full metadata including resource descriptors |
| Resources | May include a list of resources | Listed in the resources property |
| Long Description | Available in the dataset.description field | Same field, typically used when viewing a single dataset |
| Use in Filtering | Yes, via structured fields | Not used directly for filtering |

Structure

Each dataset descriptor is a JSON object located in the _source field of a hits.hits array element. It contains the following top-level parts:

| Property | Description |
|---|---|
| id | Unique identifier of the dataset in the Dateno registry |
| int_id | Internal identifier coming from the source catalog |
| source | Catalog metadata |
| dataset | Dataset metadata assigned in the registry |
| resources | List of resource descriptors, if available |
| sources | Array of catalog entries that reference this dataset (the primary one is source) |
| scores | Numeric scores for dataset ranking and quality assessment in search results |
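
As a minimal sketch of how descriptors can be pulled out of a search response, the helper below walks the hits.hits array and yields each _source object. The function name iter_descriptors and the trimmed sample_response are illustrative assumptions, not part of the Dateno API; the field values are taken from the example at the end of this page.

```python
from typing import Any, Dict, Iterator


def iter_descriptors(response: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield each dataset descriptor (_source) found in a search response."""
    # Tolerate a missing or null "hits" object at either level.
    hits = (response.get("hits") or {}).get("hits") or []
    for hit in hits:
        descriptor = hit.get("_source")
        if isinstance(descriptor, dict):
            yield descriptor


# Hypothetical, heavily trimmed response used only for illustration.
sample_response = {
    "hits": {
        "hits": [
            {
                "_source": {
                    "id": "8dbfbe735938a118b2a69a6fc8e21c4839561007687ea350c4947ea6f53dbbc5",
                    "source": {"uid": "cdi00000310"},
                    "dataset": {"title": "Rabbit and rodent management in Alberta"},
                }
            }
        ]
    }
}

titles = [d["dataset"]["title"] for d in iter_descriptors(sample_response)]
print(titles)
```

A generator is used here so that large result pages can be processed one descriptor at a time.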

source

Describes the catalog and organization that published the dataset.

| Property | Description |
|---|---|
| schema | Plain string indicating the source schema (e.g., ckan, arcgishub) |
| uid | Unique identifier of the catalog in Dateno |
| owner_name | Name of the publisher |
| software | Directory entry describing the maintenance software |
| catalog_type | Directory entry identifying the catalog type |
| name | Name of the catalog |
| macroregions | Array of directory entries for macroregions (see dataset attributes) |
| countries | Array of directory entries for countries |
| subregions | Array of directory entries for subregions |
| langs | Array of directory entries for languages |
| owner_type | Plain string describing the type of organization (e.g., "Central government", "Academy") |
| url | URL of the original catalog |

TIP
Dataset descriptors and dataset cards include only basic metadata about the associated catalog. To retrieve full catalog metadata, pass the uid value to the Fetch Single Catalog request in the Dateno API.
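
When a page of search results contains many datasets from the same catalog, it can help to collect the distinct uid values first so that each catalog is looked up only once. The sketch below shows one way to do that; catalog_uids is a hypothetical helper, and the actual Fetch Single Catalog call is deliberately left out because its request format is not described on this page.

```python
from typing import Any, Dict, Iterable, List


def catalog_uids(descriptors: Iterable[Dict[str, Any]]) -> List[str]:
    """Return distinct source.uid values in first-seen order."""
    seen: Dict[str, None] = {}
    for descriptor in descriptors:
        uid = (descriptor.get("source") or {}).get("uid")
        if uid:
            # dict preserves insertion order, so duplicates are dropped
            # without reordering the catalogs.
            seen.setdefault(uid, None)
    return list(seen)


# Illustrative descriptors reduced to the fields this helper reads.
example_descriptors = [
    {"source": {"uid": "cdi00000310"}},
    {"source": {"uid": "cdi00000310"}},
    {"source": {}},
]

uids = catalog_uids(example_descriptors)
print(uids)
```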

dataset

Describes dataset-level metadata assigned during catalog indexing.

| Property | Description |
|---|---|
| title | Title of the dataset |
| short_text | Brief description used in previews and search results |
| description | Full-length description of the dataset’s content and purpose |
| formats | List of data formats (e.g., .csv, .json) |
| tags | Array of user-supplied tags |
| topics | Normalized topics assigned by indexer |
| topics_original | Original topics from the source catalog |
| geotopics | Geospatial topics if assigned |
| url | Link to the original dataset page |
| num_resources | Number of resources in the dataset |
| date_created | Dataset creation date |
| date_changed | Dataset last updated date |
| datatypes | Data types describing the nature of data |
| has_archive | Boolean flag indicating whether the dataset is an archive file |
| responsible | Array of responsible parties (e.g., publishers or creators) |
| id | Internal identifier of the dataset |

scores

Contains scores used by the search engine to rank and assess datasets in search results. All fields are optional and may be null if a specific score is not available.

| Property | Description |
|---|---|
| completeness | Dataset completeness score |
| freshness | Dataset freshness / recency score |
| accessibility | Dataset accessibility and usability score |
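
Because every score is optional and may be null, client code should not assume that all three values are present. The sketch below averages whichever scores are available and returns None when there are none. The averaging is an illustrative choice made for this example, not the way Dateno combines scores for ranking, and mean_score is a hypothetical helper name.

```python
from typing import Any, Dict, Optional

SCORE_FIELDS = ("completeness", "freshness", "accessibility")


def mean_score(descriptor: Dict[str, Any]) -> Optional[float]:
    """Average the scores that are present; return None if none are."""
    # The scores object itself may be missing or null.
    scores = descriptor.get("scores") or {}
    values = [scores[name] for name in SCORE_FIELDS if scores.get(name) is not None]
    if not values:
        return None
    return sum(values) / len(values)


# Illustrative descriptor with one score unavailable (null in JSON).
partial = {"scores": {"completeness": 0.9, "freshness": None, "accessibility": 0.85}}
print(mean_score(partial))
print(mean_score({"scores": None}))
```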

Filtering

Most fields in the source and dataset objects can be used to filter search results. These include:

  • source.countries.name
  • source.langs.id
  • dataset.formats
  • dataset.topics
  • source.catalog_type

For detailed filter syntax, see Using Filters in Requests.
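
The dotted field names above mirror the nesting of the descriptor itself. As a local illustration of that mapping (not of the API's filter syntax, which is covered in Using Filters in Requests), the sketch below resolves a dotted path against a descriptor and fans out over arrays such as source.countries. The helper name field_values is an assumption made for this example.

```python
from typing import Any, Dict, List


def field_values(descriptor: Dict[str, Any], path: str) -> List[Any]:
    """Collect all values reachable at a dotted path, fanning out over lists."""
    current: List[Any] = [descriptor]
    for key in path.split("."):
        next_level: List[Any] = []
        for node in current:
            if isinstance(node, dict) and key in node:
                value = node[key]
                if isinstance(value, list):
                    # Arrays (e.g. countries, formats) contribute each element.
                    next_level.extend(value)
                else:
                    next_level.append(value)
        current = next_level
    return current


# Descriptor fragment reduced to the fields used below.
fragment = {
    "source": {
        "catalog_type": "Open data portal",
        "countries": [{"name": "Canada", "id": "CA"}],
        "langs": [{"name": "English", "id": "EN"}],
    },
    "dataset": {"formats": [".pdf"], "topics": []},
}

print(field_values(fragment, "source.countries.name"))
print(field_values(fragment, "dataset.formats"))
```

A missing key simply produces an empty list, which keeps the helper safe to use on descriptors where optional fields are absent.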

Example

{
  "int_id": "4dc106c6-0027-478e-af07-1c67226a90b0",
  "source": {
    "schema": "ckan",
    "uid": "cdi00000310",
    "owner_name": "Government of Alberta",
    "software": {
      "name": "CKAN",
      "id": "ckan"
    },
    "catalog_type": "Open data portal",
    "name": "Government of Alberta open datasets",
    "macroregions": [
      {
        "name": "Northern America",
        "id": "021"
      }
    ],
    "langs": [
      {
        "name": "English",
        "id": "EN"
      }
    ],
    "countries": [
      {
        "name": "Canada",
        "id": "CA"
      }
    ],
    "subregions": [
      {
        "name": "Alberta",
        "id": "CA-AB"
      }
    ],
    "owner_type": "Regional government",
    "url": "https://open.alberta.ca"
  },
  "id": "8dbfbe735938a118b2a69a6fc8e21c4839561007687ea350c4947ea6f53dbbc5",
  "scores": {
    "completeness": 0.9,
    "freshness": 0.8,
    "accessibility": 0.85
  },
  "dataset": {
    "topics_original": [],
    "geotopics": [],
    "formats": [
      ".pdf"
    ],
    "topics": [],
    "date_created": "2018-07-24T19:01:14.899150",
    "short_text": "Summarizes information about rabbit and rodent management in Alberta,",
    "num_resources": 1,
    "description": "Summarizes information about rabbit and rodent management in Alberta, and the role of Alberta's Wildlife Act in regulating how they can be harvested or controlled in the province.",
    "title": "Rabbit and rodent management in Alberta",
    "date_changed": "2023-08-31T17:33:25.714755",
    "url": "https://open.alberta.ca/dataset/rabbit-and-rodent-management-in-alberta",
    "datatypes": [
      "documents"
    ],
    "has_archive": false,
    "tags": [
      "rabbits",
      "rodents",
      "wild species"
    ],
    "responsible": [
      {
        "role": "Publisher",
        "id": "environment1971-1992--1999-2011",
        "title": "Environment (1971-1992, 1999-2011)"
      }
    ],
    "id": "4dc106c6-0027-478e-af07-1c67226a90b0"
  }
}