ahorn-loader
ahorn-loader is both a command-line tool and a Python package designed to interact with the AHORN dataset repository.
It lets users easily download datasets and load them into Python for analysis and experimentation.
It is also home to our validation script, which checks that a dataset is correct before it is published.
Command-line Usage
To run ahorn-loader from the command line without a permanent installation, you can use uvx:
uvx ahorn-loader [command] [args]
Commands include:
- ls: List available datasets in AHORN.
- download: Download a dataset from AHORN.
- validate: Validate a specific dataset file (e.g., before adding it to AHORN).
For a full list of available commands and options, run ahorn-loader --help (or uvx ahorn-loader --help when running through uvx).
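If you want to call the command-line tool from a Python script (for example, in a data-preparation pipeline), a minimal sketch could build the invocation as an argument list and hand it to subprocess. The helper names build_command and run_ahorn_loader are hypothetical, and the sketch assumes either uvx is available or the ahorn-loader executable is on your PATH (e.g., after pip install ahorn-loader). Only the ls command and --help flag from this README are used below; arguments for other commands are not assumed.

```python
import subprocess
from typing import List


def build_command(command: str, *args: str, use_uvx: bool = True) -> List[str]:
    """Build an ahorn-loader invocation as an argument list (hypothetical helper).

    With use_uvx=True the tool is launched through uvx, so no permanent install
    is needed; otherwise the ahorn-loader executable is assumed to be on PATH.
    """
    prefix = ["uvx", "ahorn-loader"] if use_uvx else ["ahorn-loader"]
    return prefix + [command, *args]


def run_ahorn_loader(command: str, *args: str, use_uvx: bool = True) -> str:
    """Run the CLI and return its standard output (hypothetical helper).

    Raises subprocess.CalledProcessError if the tool exits with a non-zero status.
    """
    result = subprocess.run(
        build_command(command, *args, use_uvx=use_uvx),
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,            # decode output as str rather than bytes
        check=True,           # raise on a non-zero exit code
    )
    return result.stdout
```

For example, run_ahorn_loader("ls") would return the dataset listing as a string, and run_ahorn_loader("--help", use_uvx=False) would show the help text of a pip-installed copy. Passing an argument list rather than a single shell string avoids shell quoting issues.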
Python Package Usage
You can install ahorn-loader as a Python package from PyPI via pip (or another package manager of your choice):
pip install ahorn-loader
Then, you can use it in your Python scripts:
import ahorn_loader
# Download a dataset:
ahorn_loader.download_dataset("dataset_name", "target_path")
# Download and read a dataset:
# The dataset will be stored in your system's cache. For a more permanent storage
# location, use `ahorn_loader.download_dataset` instead.
with ahorn_loader.read_dataset("dataset_name") as dataset:
    for line in dataset:
        ...  # process each line of the dataset here
# Validate a specific dataset (e.g., before adding it to AHORN):
ahorn_loader.validate("path_to_dataset_file")
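When exploring a new dataset, it can help to peek at just its first few lines before processing the whole file. The sketch below is a hypothetical helper, head, that works on any iterable of text lines. It assumes (not confirmed by this README) that iterating over the object returned by ahorn_loader.read_dataset yields str lines; if it yields bytes instead, the rstrip call would need adjusting. The in-memory sample is a made-up stand-in, not real AHORN data.

```python
import io
from itertools import islice
from typing import Iterable, List


def head(lines: Iterable[str], n: int = 5) -> List[str]:
    """Return at most the first n lines, with trailing newlines removed (hypothetical helper)."""
    # islice stops after n items, so the remainder of the dataset is never read.
    return [line.rstrip("\n") for line in islice(lines, n)]


# Demo with an in-memory stand-in for a dataset (placeholder contents).
sample = io.StringIO("first line\nsecond line\nthird line\n")
print(head(sample, 2))  # ['first line', 'second line']
```

Under the assumption above, the same function could be called inside the read_dataset block, e.g. print(head(dataset, 3)), to inspect a dataset's format without looping over every line.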