The algorithmAuthor connectivityWe can characterize the existence and strength of the co-authorship between two authors ${i,j}$ by introducing the matrix of connectivity $C_{ij}$. The simplest definition of connectivity is to have $C_{ij}=1$ if authors $i$ and $j$ have at least one paper in common, and $C_{ij}=0$ otherwise. Weighted connectivities can be used in order to better reflect certain properties of co-authorships:
PropagationHaving introduced the connectivity between pairs of authors, we can now consider the graph of co-authorship and characterize the properties of random walks starting from a given author. First, we assume that, starting from author $i$, the probability $P_{ij}$ to reach author $j$ within one step is proportional to their connectivity $C_{ij}$. We then have $$ \forall\, j,~ \mathrm{P}_{ij}=\frac{\mathrm{C}_{ij}}{\sum_{j} \mathrm{C}_{ij}} $$ where the normalisation ensures that the sum of the probabilities of moving from $i$ to all its neighbors $j$ is unity, i.e. $\sum_j \rm P_{ij}=1$. Note that this normalisation introduces a directionality in the graph, as $\rm P_{ij}\ne P_{ji}$. We now introduce the vector $\Delta$ representing a probability density in the space of authors. If we first consider a unique author $k$, we have $\Delta_i=1$ with $i=k$ and $\Delta_i=0$ otherwise. If we consider a list of authors $\Delta$ needs to be normalized such as $\sum \Delta_i =1$. If $\Delta^{(t=0)}$ is the vector representing the initial author density, the probability to be on any author after one propagation in the network of co-authorship is simply given by $ \rm {\Delta^{(t=1)}}= \rm P\times \rm \Delta^{(t=0)}\,. $ This process can be reiterated in order to further propagate the random walks within the graph and reach authors who are not directly connected with those defined by the initial author density: $$ \mathrm{\Delta}^{(t=n)}= \mathrm{P_{}}\times \rm \mathrm{\Delta}^{(t=n-1)}\,. $$ The convergence properties of $\Delta^{(t)}$ provide us with useful information on the proximity between authors. First, when $n\rightarrow \infty$, the distribution $\Delta^{(t=n)}$ of probabilities among the authors no longer depends on the starting point $\Delta^{(t=0)}$. Considering an initial author state $\Delta^{(t=0)}$, and a given author $i$, two types of convergence can be observed as $n$ increases from 0 to $\infty$:
2. $\Delta_i^{(t)}$ presents a peak after a few steps and then decreases monotically to reach its limiting value. In this case, author $i$ belongs to the close neighborhood of $\Delta^{(0)}$. Paper proximityWe can now use the above author proximity to compute the proximity of a given paper. Motivated by our definition of connectivity, we propose to use $$ \mathrm{Paper~Proximity}= \sum_i \frac{\mathrm{Prox}(\Delta^{(0)} \rightarrow i)}{r_i+1} $$ where the sum is over all co-authors of the considered paper and $r_i$ is the rank (starting from 0) of author $i$ in the list of co-authors. The corresponding algorithm has been implemented and is available at http://arxivsorter.org.RemarksOn arxiv.org each author is represented by an identifier of the form name_f where f is the first letter of the first name. Therefore two authors sharing these parameters will be considered as a single entity and therefore one node of the co-authorship network. As a result their connectivity $\mathrm{C}_{ij}$ becomes the sum of their individual connectivities. In practice, this affects mostly Chinese names.The connectivity matrix $\mathrm{C}_{ij}$ encodes the properties of the co-authorship network. Considering papers on astro-ph, it turns out that the vast majority of authors are interconnected via other authors. They form the so-called giant cluster, for which it is possible to go from any author $i$ to $j$ using the random walks described above. However, about two percents of the community is not connected to the giant cluster and the proximity of these authors to a member of the giant cluster is by definition zero. Note that a future version of Arxivsorter will alleviate this limitation. Jean-Philippe Magué (jean.philippe.mague [at] gmail.com) Brice Ménard (menard [at] cita.utoronto.ca) Special thanks to Mubdi Rahman for the web support. |