Dataset
Semantic Scholar Co-Authorship Sample
Network Type
Constructed from a subsample of the Semantic Scholar Open Research Corpus, this dataset is a simplicial complex in which nodes are authors and simplices are sets of co-authors on papers. The value on each simplex is the total citation count of all papers whose author lists include every author in that simplex, possibly alongside other authors.
Attributes
This dataset has the following simplex attributes:
citations (int): The total citation count of all papers whose author lists include every author in the simplex.
Example
| Paper | Authors | Citations |
|---|---|---|
| Paper 1 | A, B, C | 100 |
| Paper 2 | A, B | 50 |
| Paper 3 | A, D | 10 |
| Paper 4 | C, D | 4 |
Note that edge (A, B) has value 150 because both Paper 1 (100 citations) and Paper 2 (50 citations) include A and B among their authors. Similarly, node C has value 104 because C co-authored Paper 1 (100 citations) and Paper 4 (4 citations).
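The aggregation rule can be checked with a short sketch. The code below is only an illustration built on the toy table above, not the pipeline used to construct the dataset; the names `papers` and `simplex_citations` are hypothetical. It adds each paper's citation count to every non-empty subset of that paper's author set.

```python
from collections import defaultdict
from itertools import combinations

# Toy papers from the example table: (author set, citation count).
papers = [
    ({"A", "B", "C"}, 100),  # Paper 1
    ({"A", "B"}, 50),        # Paper 2
    ({"A", "D"}, 10),        # Paper 3
    ({"C", "D"}, 4),         # Paper 4
]


def simplex_citations(papers):
    """Map each simplex (frozenset of authors) to its summed citations.

    Every non-empty subset of a paper's author set is a simplex, and the
    paper's citations count toward each of those simplices.
    """
    totals = defaultdict(int)
    for authors, citations in papers:
        members = sorted(authors)
        for size in range(1, len(members) + 1):
            for subset in combinations(members, size):
                totals[frozenset(subset)] += citations
    return dict(totals)


values = simplex_citations(papers)
print(values[frozenset({"A", "B"})])  # 150 (Paper 1 + Paper 2)
print(values[frozenset({"C"})])       # 104 (Paper 1 + Paper 4)
```

Enumerating all subsets grows exponentially with the number of authors on a paper, so this approach is only practical for small author lists such as those in the toy example.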
Dataset Statistics
Nodes

| Statistic | Value |
|---|---|
| Nodes | 352 |
| Node Type | Author |

Node Degree

| Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|
| 1 | 63 | 127 | 511 | 3,787 |
Hyperedges

| Statistic | Value |
|---|---|
| Simplices | 24,200 |
| Simplex Type | Co-Authorship |

Maximal Simplex Size

| Min | Q1 | Median | Q3 | Max |
|---|---|---|---|---|
| 2 | 6 | 7 | 9 | 11 |