How to Write Helpful Queries?
Base Your Query on Dataset Descriptions
Searching for datasets is different from searching for texts available on websites. Google or another search service captures every word that has appeared in a text. Conversely, the Dateno search engine works with dataset metadata rather than the data itself. Therefore, the more subject-specific a query is, the less likely it is to match dataset descriptions. On the other hand, an overly general subject may match too few datasets.
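The difference between searching metadata and searching the data itself can be sketched with a toy catalogue. Everything below (the records, the field names `title`, `description`, and `rows`, and both search functions) is invented for illustration and is not how Dateno is implemented; it only shows why a word that occurs inside the data may not match when only descriptions are searched.

```python
# Toy catalogue: each record has searchable metadata and unsearched contents.
# All records and field names are invented for illustration.
DATASETS = [
    {
        "title": "Companion animals registry",
        "description": "Registered pets by municipality and year",
        "rows": ["rabbit, 2021, 340", "dog, 2021, 5120"],
    },
    {
        "title": "Forest cover by region",
        "description": "Annual forest area estimates",
        "rows": ["north, 2020, 81.2"],
    },
]


def metadata_search(term, datasets):
    """Return titles of datasets whose title or description contains the term."""
    term = term.lower()
    return [
        d["title"]
        for d in datasets
        if term in d["title"].lower() or term in d["description"].lower()
    ]


def full_text_search(term, datasets):
    """Return titles of datasets where the term appears anywhere, rows included."""
    term = term.lower()
    hits = []
    for d in datasets:
        haystack = " ".join([d["title"], d["description"], *d["rows"]]).lower()
        if term in haystack:
            hits.append(d["title"])
    return hits


print(metadata_search("rabbit", DATASETS))             # []
print(full_text_search("rabbit", DATASETS))            # ['Companion animals registry']
print(metadata_search("companion animals", DATASETS))  # ['Companion animals registry']
```

In this sketch the word rabbit appears only in the rows, so a metadata-only search misses it, while the broader phrase used in the title finds the dataset.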
Use Terms of Intermediate Generality
How General Is Your Query?
No one creates datasets that encompass absolutely everything at once, nor do publishers often create datasets about extremely specific topics. Instead, most datasets are focused on subjects of intermediate generality. For example, a dataset might cover wildlife populations in North America rather than all animals worldwide or habitat details of a single species of bird.
It is important to adjust your search to align with this reality:
- For wide topics, expect to find multiple datasets, each covering a specific part of the broader subject. For instance, a query about "nature" might lead you to datasets on forests, oceans, or wildlife populations.
- For narrow topics, consider that your specific subject might be embedded within a dataset addressing a broader category. For example, data about rabbits might be part of a dataset about domestic animals or companion animals.
By aligning your query with intermediate generality, you increase the chances of locating relevant datasets while avoiding overly broad or overly narrow searches.
Specifying a Query That Is Too General
If your query is too broad, refine it by breaking the general term into several specific subcategories. For example:
- A query for nature might be refined into more specific searches for forests, oceans, or wild animals.
- Use multiple specific terms in a single search query. For example, searching for forests and oceans together doesn’t require both terms to appear in a single dataset. Instead, the search engine will rank results by relevance, prioritizing datasets that include either or both terms. This approach helps you explore related datasets without overly narrowing your results.
Combining specific terms can provide a more focused search without excluding important information.
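The idea that several terms in one query rank results rather than filter them can be illustrated with a minimal sketch. The scoring rule here (count how many query terms occur in a description) is an assumption chosen for simplicity, not the actual relevance ranking of the search engine, and the descriptions are made up.

```python
def rank_by_matched_terms(query_terms, descriptions):
    """Rank descriptions by how many query terms they contain (OR semantics)."""
    scored = []
    for text in descriptions:
        lowered = text.lower()
        score = sum(1 for term in query_terms if term.lower() in lowered)
        if score > 0:  # one matching term is enough to be included
            scored.append((score, text))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]


# Invented dataset descriptions.
descriptions = [
    "Forests of Europe: tree cover statistics",
    "Sea surface temperature of the oceans",
    "Coastal forests and oceans: a joint ecosystem survey",
    "Urban traffic counts",
]

for hit in rank_by_matched_terms(["forests", "oceans"], descriptions):
    print(hit)
```

The description that mentions both terms comes first, the ones that mention a single term follow, and only the unrelated description is left out.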
Generalizing a Query That Is Too Specific
If your query is too narrow, broaden it by generalizing the terms:
- A query for rabbits might be expanded to include broader categories such as wild animals, domestic animals, or companion animals.
- Consider using terms like pets or wildlife, which may align better with dataset descriptions, as dataset titles and metadata often avoid extremely specific terms.
By finding the right level of generality, you’ll be better equipped to locate datasets that include your desired information.
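Broadening a narrow term can be pictured as walking up a small hierarchy of categories. The mapping `BROADER` and the helper `broaden` below are hypothetical and written by hand for this example; in practice you would choose the broader terms yourself.

```python
# Hypothetical narrower -> broader term map, written by hand for illustration.
BROADER = {
    "rabbits": ["companion animals", "domestic animals", "wild animals"],
    "companion animals": ["pets", "animals"],
    "wild animals": ["wildlife", "animals"],
}


def broaden(term, levels=1):
    """Return the term followed by broader candidates, up to `levels` steps up."""
    candidates = [term]
    frontier = [term]
    for _ in range(levels):
        next_frontier = []
        for current in frontier:
            for parent in BROADER.get(current, []):
                if parent not in candidates:  # skip terms already collected
                    candidates.append(parent)
                    next_frontier.append(parent)
        frontier = next_frontier
    return candidates


print(broaden("rabbits"))
print(broaden("rabbits", levels=2))
```

Each candidate in the returned list can then be tried as a separate query, starting with the ones closest to the original term.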
Example
Why not move from theory to practice?
Let us find datasets where rabbits are mentioned. The phrase companion animals seems to be a good suggestion for a search term.
- Try the query companion animals. After submitting it, we get a set of links.
- Examine the first link to see a set of PDF resources.
- Finally, review the first PDF.
Rabbits are with us now. We did it!