Dataset

DBLP Co-Authorship

Network Type

The DBLP co-authorship network is a hypergraph built from the DBLP computer science bibliography database. Nodes are authors, and each hyperedge is a paper, that is, the set of authors who collaborated on that publication. Each author is labeled with a primary research area, taken to be the most frequent research area among their publications. The research areas include Database, Data Mining, AI, Information Retrieval, Computer Vision, and Machine Learning. This dataset is commonly used as a benchmark for hypergraph learning tasks such as author classification and link prediction.
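The labeling rule described above can be sketched on toy data. This is a minimal illustration, not the dataset's actual preprocessing: the author names, paper ids, the `majority_labels` helper, and the alphabetical tie-breaking rule are all assumptions made for the example.

```python
from collections import Counter

# Toy hypergraph: each hyperedge is one paper, stored as the set of its authors.
# All names and area assignments below are invented for illustration.
papers = {
    "p1": {"alice", "bob"},
    "p2": {"alice", "carol", "dave"},
    "p3": {"bob", "carol"},
}
paper_area = {"p1": "Database", "p2": "Data Mining", "p3": "Database"}


def majority_labels(papers, paper_area):
    """Label each author with the most common area among their papers."""
    areas_by_author = {}
    for paper_id, authors in papers.items():
        for author in authors:
            counts = areas_by_author.setdefault(author, Counter())
            counts[paper_area[paper_id]] += 1
    # Assumption: ties are broken alphabetically by area name.
    # The rule used to build the real dataset is not stated here.
    return {
        author: min(counts, key=lambda area: (-counts[area], area))
        for author, counts in areas_by_author.items()
    }


labels = majority_labels(papers, paper_area)
```

In this toy example `bob` appears only on Database papers, so the label is unambiguous, while `alice` has one paper in each of two areas and is resolved by the assumed tie-break.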

Dataset Statistics

Nodes

Nodes

22,363

Node Type

Author

Node Label

Research Area

Node Degree

Min: 1
Q1: 2
Median: 3
Q3: 4
Max: 197

Node Label Distribution

Hyperedges

Hyperedges

32,304

Hyperedge Type

Publication

Hyperedge Degree

Min: 2
Q1: 2
Median: 2
Q3: 3
Max: 18
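The node degree and hyperedge degree summaries on this page can be computed from a plain list of hyperedges. The sketch below uses a toy hypergraph and a hypothetical `five_number_summary` helper. It assumes that node degree means the number of papers an author appears on and that hyperedge degree means the number of authors on a paper. The quartile convention (`method="inclusive"`) is also an assumption, and the dataset's own statistics may use a different one.

```python
from collections import Counter
from statistics import quantiles

# Toy hypergraph: each hyperedge is the set of authors on one paper.
hyperedges = [
    {"a", "b"},
    {"a", "c", "d"},
    {"b", "c"},
    {"a", "b", "c", "e"},
]

# Hyperedge degree: number of authors on the paper.
edge_degrees = [len(edge) for edge in hyperedges]

# Node degree: number of papers the author appears on.
node_degrees = Counter(author for edge in hyperedges for author in edge)


def five_number_summary(values):
    """Return min, quartiles, and max of a non-empty sequence of numbers."""
    values = list(values)
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


edge_summary = five_number_summary(edge_degrees)
node_summary = five_number_summary(node_degrees.values())
```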

Changelog

Revision 2
  • Update to format version 0.3.
  • Drop papers with only one author.
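
The single-author filter from Revision 2 can be illustrated as follows. This is a sketch only: the `drop_single_author_papers` helper and the toy input are hypothetical, and treating a repeated author name as one author is an assumption. Whether authors left without any remaining paper are also removed from the node set is not stated in the changelog, so the sketch does not address it.

```python
def drop_single_author_papers(hyperedges):
    """Keep only hyperedges with at least two distinct authors."""
    kept = []
    for edge in hyperedges:
        authors = set(edge)  # collapse repeated author entries
        if len(authors) >= 2:
            kept.append(authors)
    return kept


# Toy input: the first paper has one author, the third lists one author twice.
raw = [["a"], ["a", "b"], ["c", "c"], ["b", "c", "d"]]
filtered = drop_single_author_papers(raw)
```

After filtering, every remaining hyperedge has at least two authors, which matches the minimum hyperedge degree of 2 reported in the statistics.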