Hayafumi Watanabe, Yukie Sano, Hideki Takayasu, Misako Takayasu

In analyses of social media data, one of the most important basic objects is the time series representing how often given keywords appear. We aim to describe the fluctuations of such time series precisely, whereas the majority of previous research has focused on “trends” in the time series (i.e., the nonrandom parts of the time series) for practical reasons.

To elucidate the nontrivial empirical statistical properties of the fluctuations of a typical nonsteady time series representing the appearance of words in blogs, we investigated approximately 3 billion Japanese blog articles over a period of six years and analysed some corresponding mathematical models.

First, we introduce a solvable nonsteady extension of the random diffusion model, which can be deduced by modeling the behavior of heterogeneous random bloggers. Next, we deduce theoretical expressions for both the temporal and ensemble fluctuation scalings of this model, and demonstrate that these expressions can reproduce all empirical scalings over eight orders of magnitude. Furthermore, we show that the model can reproduce other statistical properties of time series representing the appearance of words in blogs, such as functional forms of the probability density and correlations in the total number of blogs. As an application, we quantify the abnormality of special nationwide events by measuring the fluctuation scalings of 1771 basic adjectives.
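To make the idea of a temporal fluctuation scaling concrete, the following is a minimal sketch and not the model of the paper. It assumes a simplified, steady stand-in: the count at each time step is Poisson distributed with a rate c times a random factor of mean 1 and standard deviation delta (drawn here from a gamma distribution, which is an arbitrary choice). Under that assumption the law of total variance gives Var = c + (delta c)^2, so the standard deviation grows like c^(1/2) for rare words and like c for frequent ones. The names `simulate`, `predicted_variance`, `c` and `delta` are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Draw one Poisson(lam) sample using only the standard library."""
    if lam > 50.0:
        # Normal approximation for large rates (an approximation, not exact).
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    # Knuth's multiplication method for small rates.
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate(c, delta, steps, rng):
    """Counts per time step: Poisson with rate c * g, where g has mean 1, std delta."""
    shape, scale = 1.0 / delta**2, delta**2  # gamma with mean 1, variance delta^2
    return [poisson(rng, c * rng.gammavariate(shape, scale)) for _ in range(steps)]


def mean_var(xs):
    """Sample mean and (population) variance of a list of numbers."""
    m = sum(xs) / len(xs)
    return m, sum((x - m) ** 2 for x in xs) / len(xs)


def predicted_variance(c, delta):
    """Law of total variance for a Poisson count with a random rate."""
    return c + (delta * c) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    delta = 0.3  # assumed relative fluctuation of the rate
    for c in (0.1, 1.0, 10.0, 100.0, 1000.0):
        m, v = mean_var(simulate(c, delta, 20000, rng))
        print(f"c={c:8.1f}  mean={m:10.3f}  var={v:12.3f}  "
              f"predicted var={predicted_variance(c, delta):12.3f}")
```

Plotting the standard deviation against the mean on log-log axes for many values of `c` would show the crossover between the two regimes; the nonsteady and ensemble cases treated in the paper are not covered by this sketch.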

[1] Phys. Rev. E 94, 052317 (2016)