Datasheet Page

Guidance for writing dataset pages, organizing frontmatter, and presenting metadata and statistics consistently.

Each dataset is represented by a .mdx file in the src/datasets directory. This page should list all details about the dataset, including its description, usage, and statistics. Think of this page as a datasheet as outlined in the Datasheets for Datasets article.

Computed Data

In general, computed (inferred) data should be stored in the frontmatter of the document, where different components of the project can easily read and write it. Within the page body, the frontmatter is accessible via the frontmatter variable.

Frontmatter

Required fields in the frontmatter:

  • title: The title of the dataset.
  • network-type: List of suggested network interpretations for the dataset.
  • source: A link to the original source of the dataset. Where possible, link to the dataset description page rather than deep-linking to the dataset file.

Optional fields that are automatically used when present:

  • license: The license of the dataset. This can be either a string or an object with spdx and link fields. Prefer the object form for standard licenses, since it is easier to present.
  • citation: List of DOIs or BibTeX entries that should be cited when using the dataset. DOIs are automatically resolved.
  • tags: List of tags that describe the dataset. These are used for filtering datasets in the UI.
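The fields above can be summarized as a type. The following is a minimal sketch, not the project's actual schema: the names DatasetFrontmatter, License, and describeLicense are hypothetical, the element types of the lists are assumed to be strings, and the example values are placeholders.

```typescript
// Hypothetical types: field names follow the lists above, everything else is assumed.
type License = string | { spdx: string; link: string };

interface DatasetFrontmatter {
  title: string;
  "network-type": string[];
  source: string;
  license?: License;
  citation?: string[]; // DOIs or BibTeX entries
  tags?: string[];
}

// Reduce both license forms to a display label and an optional link.
function describeLicense(license: License): { label: string; link?: string } {
  return typeof license === "string"
    ? { label: license }
    : { label: license.spdx, link: license.link };
}

// Placeholder values for illustration only.
const exampleFrontmatter: DatasetFrontmatter = {
  title: "Example Dataset",
  "network-type": ["example-interpretation"],
  source: "https://example.org/datasets/example",
  license: { spdx: "CC-BY-4.0", link: "https://creativecommons.org/licenses/by/4.0/" },
  tags: ["example"],
};
```

The object form carries the link alongside the SPDX identifier, so a component can render a linked license without a lookup table.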

To improve consistency, we enforce rules on the frontmatter, covering required fields, field types, and field ordering. The provided datasheet_linter tool checks these rules and reports how to fix any issues it finds.
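To illustrate the kinds of rules involved, here is a minimal sketch of such checks. It is not the datasheet_linter implementation: the function lintFrontmatter, the canonical field order, and the message wording are all assumptions made for this example.

```typescript
// Hypothetical rule set; the real datasheet_linter may check different things.
const REQUIRED_FIELDS = ["title", "network-type", "source"];
// Assumed canonical order: required fields first, then the optional ones.
const FIELD_ORDER = REQUIRED_FIELDS.concat(["license", "citation", "tags"]);

function lintFrontmatter(frontmatter: Record<string, unknown>): string[] {
  const problems: string[] = [];

  // Rule 1: required fields must be present.
  for (const field of REQUIRED_FIELDS) {
    if (!(field in frontmatter)) {
      problems.push(`missing required field "${field}"`);
    }
  }

  // Rule 2: field types (only two fields shown here).
  if ("title" in frontmatter && typeof frontmatter["title"] !== "string") {
    problems.push(`"title" must be a string`);
  }
  if ("network-type" in frontmatter && !Array.isArray(frontmatter["network-type"])) {
    problems.push(`"network-type" must be a list`);
  }

  // Rule 3: known fields must appear in the assumed canonical order.
  const known = Object.keys(frontmatter).filter((key) => FIELD_ORDER.indexOf(key) !== -1);
  const sorted = known.slice().sort((a, b) => FIELD_ORDER.indexOf(a) - FIELD_ORDER.indexOf(b));
  if (known.join(",") !== sorted.join(",")) {
    problems.push(`fields out of order; expected: ${sorted.join(", ")}`);
  }

  return problems;
}
```

Returning a list of messages rather than throwing on the first failure lets a contributor fix every issue in one pass.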

Attributes

  • All attributes appearing as metadata in the dataset must be listed on the dataset page.
  • Non-standardized attributes must be described in the text.
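The first rule can be checked mechanically once both attribute lists are available. The sketch below assumes they have already been extracted as plain string arrays; the function name undocumentedAttributes and the way the lists are obtained are hypothetical.

```typescript
// Return the attributes that occur in the dataset metadata
// but are not listed on the dataset page.
function undocumentedAttributes(
  datasetAttributes: string[],
  documentedAttributes: string[],
): string[] {
  const documented = new Set(documentedAttributes);
  return datasetAttributes.filter((name) => !documented.has(name));
}
```

An empty result means every attribute in the dataset is listed on the page. Whether each non-standardized attribute is also described in the text still needs a manual review.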

Statistics

Charts

Label Distribution

For datasets with labeled nodes, we provide a chart that shows the distribution of labels.

import LabelDistributionChart from "@/components/chart/label-distribution-chart";

<LabelDistributionChart title="Node Label Distribution" labels={frontmatter["label-count"]} />

Figure: Label Distribution
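The label-count value passed to the chart is computed data that lives in the frontmatter. This page does not specify its format, so the sketch below assumes a plain mapping from label to number of nodes; the function countLabels is hypothetical.

```typescript
// Assumed format: { [label]: numberOfNodes }. The real "label-count" may differ.
function countLabels(nodeLabels: string[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const label of nodeLabels) {
    counts[label] = (counts[label] ?? 0) + 1;
  }
  return counts;
}
```

The result would be written to the frontmatter once, so the page does not have to recompute it on every render.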

Shape

For time-evolving simplicial complexes, we provide a chart that shows the number of simplices over time. The chart is interactive and allows aggregating the shape by different time intervals.

import TemporalShapeChart from "@/components/chart/temporal-shape";

<TemporalShapeChart shape={frontmatter.shape} minUnit="hour" maxUnit="week" />

Figure: Dataset Shape Over Time
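Aggregating by time interval amounts to bucketing timestamps. The following sketch shows the idea only and is not the TemporalShapeChart implementation: it assumes one Unix timestamp in milliseconds per simplex, and the names TimeUnit and aggregateByUnit are hypothetical.

```typescript
type TimeUnit = "hour" | "day" | "week";

const UNIT_MS: Record<TimeUnit, number> = {
  hour: 60 * 60 * 1000,
  day: 24 * 60 * 60 * 1000,
  week: 7 * 24 * 60 * 60 * 1000,
};

// Count simplices per bucket; keys are bucket start times in epoch milliseconds.
// Buckets are aligned to the Unix epoch, so "week" buckets start on Thursdays (UTC).
function aggregateByUnit(timestamps: number[], unit: TimeUnit): Map<number, number> {
  const size = UNIT_MS[unit];
  const buckets = new Map<number, number>();
  for (const t of timestamps) {
    const start = Math.floor(t / size) * size;
    buckets.set(start, (buckets.get(start) ?? 0) + 1);
  }
  return buckets;
}
```

Coarser units merge neighboring buckets, which is why the same data can be viewed at any interval between minUnit and maxUnit.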