Dataset Resources

Dataset resources are the information items associated with a dataset. Each dataset has a collection of resources.

Dataset Resource Types

The following dataset resource types are recognized:

  • Data files
  • Link to a data source
  • Metadata

Data Files

A data file contains the dataset payload. Data files commonly belong to one of the following categories:

  • Parsable data files
  • Human-readable data files
  • Archives

A parsable data file represents data in a known machine-readable format such as CSV, XML, or JSON. The particular data structure depends on the dataset. A dataset publisher is expected to provide documentation if the structure of a data file is not obvious.
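As a minimal sketch of what "parsable" means in practice, the snippet below reads the same small table from CSV and from JSON using only the Python standard library. The sample payloads and the column names `city` and `population` are made up for illustration; a real data file follows whatever structure its publisher chose.

```python
import csv
import io
import json

# Hypothetical sample payloads; real data files come from the dataset publisher.
CSV_TEXT = "city,population\nOslo,700000\nBergen,285000\n"
JSON_TEXT = (
    '[{"city": "Oslo", "population": 700000},'
    ' {"city": "Bergen", "population": 285000}]'
)


def load_csv_rows(text: str) -> list[dict[str, str]]:
    """Parse CSV text into row dictionaries keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def load_json_rows(text: str) -> list[dict]:
    """Parse a JSON array of objects into a list of dictionaries."""
    return json.loads(text)


csv_rows = load_csv_rows(CSV_TEXT)
json_rows = load_json_rows(JSON_TEXT)

# CSV carries no type information, so every value arrives as a string,
# while JSON preserves the distinction between numbers and strings.
print(csv_rows[0]["population"], type(csv_rows[0]["population"]).__name__)
print(json_rows[0]["population"], type(json_rows[0]["population"]).__name__)
```

The difference in value types is one reason publisher documentation matters even for parsable formats: the file format alone does not tell you how each column should be interpreted.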

A human-readable file works well when people review it directly. Automated extraction of data from human-readable files might require sophisticated converters or AI-enabled tools. Word processor files, PDF documents, and HTML documents are typically human-readable. Most image files also fall into this category unless you have tools for processing them automatically.

An archive is a container file that encloses a set of payload files, either parsable or human-readable. Documentation and metadata files can also be provided in an archive.
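The following sketch shows one way to look inside an archive and separate parsable members from everything else. It assumes a ZIP archive and decides by file extension only; the extension set and the sample file names are illustrative assumptions, and other archive formats would need a different standard library module (for example `tarfile`).

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumption: these extensions are treated as parsable for this example.
PARSABLE_EXTENSIONS = {".csv", ".json", ".xml"}


def classify_archive_members(archive_bytes: bytes) -> dict[str, list[str]]:
    """Group the files in a ZIP archive into 'parsable' and 'other' by extension."""
    groups: dict[str, list[str]] = {"parsable": [], "other": []}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            suffix = PurePosixPath(name).suffix.lower()
            key = "parsable" if suffix in PARSABLE_EXTENSIONS else "other"
            groups[key].append(name)
    return groups


def build_sample_archive() -> bytes:
    """Create a small in-memory ZIP archive with hypothetical contents."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("data/cities.csv", "city,population\nOslo,700000\n")
        archive.writestr("README.pdf", b"%PDF-1.4 placeholder")
    return buffer.getvalue()


groups = classify_archive_members(build_sample_archive())
print(groups)
```

Classifying by extension is a heuristic: a member named `.csv` can still be malformed, so a real pipeline would attempt to parse each file before relying on it.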

NOTE
It is common for datasets to contain a few data files exposing the same information in different formats.

NOTE
Data files are the usual outcome of a successful dataset search. However, a dataset can lack data files if the publisher has not provided resources of this type. If a dataset you have found does not have links to data files, try another dataset or check this dataset again later.

Link to a Data Source

A data source is a resource on the Internet that allows a user or application to access data. The most common type of data source is a REST API endpoint. Data sources of other types are also possible.
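To illustrate how an application might use a REST API data source, here is a minimal sketch built on the Python standard library. The endpoint `https://example.org/api/records` and the `limit` and `format` parameters are placeholders, not part of any real service; an actual data source defines its own URL, parameters, and authentication.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_request_url(endpoint: str, params: dict[str, str]) -> str:
    """Append URL-encoded query parameters to an endpoint URL."""
    return f"{endpoint}?{urlencode(params)}" if params else endpoint


def fetch_json(url: str, timeout: float = 10.0):
    """Send a GET request and decode the JSON response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical endpoint and parameters, used only to show the URL shape.
url = build_request_url(
    "https://example.org/api/records",
    {"limit": "10", "format": "json"},
)
print(url)

# A real call would look like this once the URL points at an actual endpoint:
# records = fetch_json(url)
```

Keeping URL construction separate from the network call makes the request logic easy to check without contacting the data source.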

Dataset Metadata

Metadata comprises the information that describes a dataset. It allows Dateno to build a helpful index over the available dataset descriptions and select datasets relevant to users' requests.

Dateno web application users do not need to deal with metadata directly. They see the metadata of a dataset when they review its dataset card.

Developers who implement dataset retrieval in their applications process dataset metadata represented as JSON data transfer objects.
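The sketch below shows how such a JSON object could be mapped onto typed Python structures. The field names (`id`, `title`, `resources`, `name`, `url`, `format`) and the sample payload are assumptions made for illustration and do not describe the actual Dateno schema; consult the API documentation for the real field list.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ResourceDTO:
    """One dataset resource (hypothetical fields)."""
    name: str
    url: str
    format: str


@dataclass
class DatasetDTO:
    """Dataset metadata with its collection of resources (hypothetical fields)."""
    id: str
    title: str
    resources: list[ResourceDTO] = field(default_factory=list)


def parse_dataset(payload: str) -> DatasetDTO:
    """Convert a JSON string into a DatasetDTO, tolerating missing optional fields."""
    raw = json.loads(payload)
    resources = [
        ResourceDTO(
            name=item.get("name", ""),
            url=item.get("url", ""),
            format=item.get("format", "").lower(),
        )
        for item in raw.get("resources", [])
    ]
    return DatasetDTO(id=raw["id"], title=raw.get("title", ""), resources=resources)


# Hypothetical payload, not a real API response.
SAMPLE = """
{
  "id": "example-001",
  "title": "City population",
  "resources": [
    {"name": "Table", "url": "https://example.org/cities.csv", "format": "CSV"},
    {"name": "Report", "url": "https://example.org/cities.pdf", "format": "PDF"}
  ]
}
"""

dataset = parse_dataset(SAMPLE)
csv_urls = [r.url for r in dataset.resources if r.format == "csv"]
print(dataset.title, csv_urls)
```

Normalizing the format label to lowercase while parsing keeps later filtering simple, since publishers do not always spell format names consistently.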