The methodology is divided into several phases. The first phase involved data collection for six digital assets: Bitcoin, Bitcoin Cash, Ethereum, Ethereum Classic, Litecoin and Ripple, which together accounted for 71 percent of the market capitalization of 100 digital assets. Digital asset price data were obtained from coinmarketcap.com. This study also sourced user tweets about these digital assets through the Twitter Search Application Programming Interface (API). The Twitter data include usernames, hashtags, tweets and some retweets. We used RapidMiner software to collect 160 tweets per day for each digital asset using the keywords ‘Bitcoin’, ‘Bitcoin Cash’, ‘Ethereum’, ‘Ethereum Classic’, ‘Litecoin’ and ‘Ripple’. In addition, data on web searches, news searches and YouTube searches for these keywords were obtained from Google Trends using RStudio software. These data take values from 0 to 100, where 100 indicates the highest popularity for the search term and 0 the lowest. Daily data were collected between 1 September 2019 and 31 January 2020.
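As an illustration of the Google Trends step, the sketch below retrieves the 0–100 web, news and YouTube search series for the six keywords over the sample period. It uses the Python pytrends package as a stand-in for the RStudio workflow described above, so the package choice and the variable names are assumptions rather than the study's actual code.

```python
from pytrends.request import TrendReq

# Illustrative only: the study used RStudio for Google Trends data;
# pytrends is a Python stand-in with the same keywords and date range.
keywords = ['Bitcoin', 'Bitcoin Cash', 'Ethereum',
            'Ethereum Classic', 'Litecoin', 'Ripple']
pytrends = TrendReq(hl='en-US')
trend_series = {}
for kw in keywords:
    # gprop='' is web search; 'news' and 'youtube' give news and YouTube searches.
    for gprop in ('', 'news', 'youtube'):
        pytrends.build_payload([kw], timeframe='2019-09-01 2020-01-31', gprop=gprop)
        interest = pytrends.interest_over_time()   # daily values scaled 0-100
        trend_series[(kw, gprop or 'web')] = interest[kw]
```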
The second phase involved calculating the volatility of the aforementioned digital assets and cleaning the collected tweets. Starting with the market data, we used the price P_{i,t} of Bitcoin, Bitcoin Cash, Ethereum, Ethereum Classic, Litecoin and Ripple to calculate the return, as in Eq. (1):
$$Return_{i,t} = \frac{P_{i,t} - P_{i,t-1}}{P_{i,t-1}}$$
(1)
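A minimal pandas sketch of Eq. (1), assuming the prices obtained from coinmarketcap.com are held in a DataFrame with one column per asset (the frame and column layout are assumptions):

```python
import pandas as pd

def simple_returns(prices: pd.DataFrame) -> pd.DataFrame:
    """Eq. (1): Return_t = (P_t - P_{t-1}) / P_{t-1}, computed column-wise."""
    return prices.pct_change().dropna()
```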
The variance of digital asset returns was then estimated using the generalized autoregressive conditional heteroskedasticity (GARCH) approach in this study. The following are the conditional mean and variance specifications:
$$Return_{i,t} = \beta_0 + \beta_1 Return_{i,t-1} + \varepsilon_{i,t}, \qquad \varepsilon_{i,t} = \eta_{i,t}\sqrt{h_{i,t}}, \quad \eta_{i,t} \sim N(0,1)$$
(2)
$$\sigma _{i,t}^2 = x + \alpha \varepsilon _{i,t – 1}^2 + \beta \sigma _{i,t – 1}^2$$
(3)
where Return_{i,t} is the current rate of return of digital asset i (i = 1, 2, 3, 4, 5, 6) at time t, Return_{i,t-1} is the lagged rate of return for Bitcoin, Bitcoin Cash, Ethereum, Ethereum Classic, Litecoin and Ripple, and ε_{i,t} is the error term. The parameters satisfy x > 0, α ≥ 0 and β ≥ 0, while η_{i,t} is an independent and identically distributed random variable with zero mean and unit variance, and h_{i,t} (equivalently σ²_{i,t}) is the conditional variance. The error terms are assumed to be normally distributed, and the parameters are estimated by maximum likelihood. The collected tweets were then processed to remove noise components. In this study, user sentiments were analyzed using the Valence Aware Dictionary for Sentiment Reasoning (VADER) (Hutto and Gilbert, 2014). This dictionary is able to interpret certain punctuation, symbols and numbers in tweets. Our study used the VADER dictionary to clean the data, as demonstrated by Öztürk and Bilgiç (2021). Tweets were stripped of all punctuation except #, $, @, ', !, ", ?, and ., and web page links were removed. In addition, all uppercase letters were converted to lowercase.
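Returning to the volatility step, the AR(1)–GARCH(1,1) specification of Eqs. (2)–(3) could be estimated by maximum likelihood as sketched below. The paper does not name the estimation software, so the Python arch package used here is an assumption:

```python
from arch import arch_model

def conditional_variance(returns, asset):
    """Fit Eqs. (2)-(3): AR(1) conditional mean with a GARCH(1,1) conditional
    variance, estimated by maximum likelihood under normal errors."""
    # Returns are rescaled (x100) only to help the optimiser converge.
    model = arch_model(returns[asset] * 100, mean='AR', lags=1,
                       vol='GARCH', p=1, q=1, dist='normal')
    result = model.fit(disp='off')
    return result.conditional_volatility ** 2   # h_t, the conditional variance
```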
The third phase of this study involved sentiment analysis of the cleaned tweets. VADER is a lexicon- and rule-based sentiment analysis approach that is specifically tuned to, and well suited for, sentiments expressed on Twitter (Elbagir and Jing, 2019; Kraaijeveld and De Smedt, 2020). Valencia et al. (2019) noted that, compared with machine learning techniques, VADER has several additional advantages and is particularly useful for analyzing tweet content and extracting sentiment values from emotions, emojis, punctuation, grammar, slang and acronyms. VADER classifies text into three sentiment categories, namely positive, neutral and negative, and it was used here to estimate the compound (composite) score, which ranges from −1 (most negative) to +1 (most positive). Following Hutto and Gilbert (2014), tweets with a compound score of ≥0.05 were classified as positive, scores between −0.05 and 0.05 as neutral, and scores of ≤−0.05 as negative. This range of scores was also used in earlier studies employing the VADER dictionary (Kraaijeveld and De Smedt, 2020; Öztürk and Bilgiç, 2021; Suardi et al. 2022). After the sentiment analysis was completed, the numbers of positive, neutral and negative tweets were counted and compiled into daily datasets. Python was used for the cleaning process and the sentiment analysis.
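A sketch of the cleaning and scoring steps described above, using the vaderSentiment package; the regular expressions and function names are our own reading of the description, not the authors' code, and the ±0.05 compound-score cut-offs follow Hutto and Gilbert (2014):

```python
import re
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

KEEP = r"#\$@'!\"\?\."   # punctuation retained because VADER can interpret it
analyzer = SentimentIntensityAnalyzer()

def clean_tweet(text: str) -> str:
    """Lower-case a tweet, strip web links, and drop punctuation not in KEEP."""
    text = text.lower()
    text = re.sub(r'https?://\S+|www\.\S+', '', text)   # remove web page links
    text = re.sub(rf"[^\w\s{KEEP}]", '', text)           # drop other punctuation
    return re.sub(r'\s+', ' ', text).strip()

def classify_tweet(tweet: str) -> str:
    """Label a cleaned tweet: compound >= 0.05 positive, <= -0.05 negative,
    otherwise neutral (Hutto and Gilbert, 2014)."""
    compound = analyzer.polarity_scores(clean_tweet(tweet))['compound']
    if compound >= 0.05:
        return 'positive'
    if compound <= -0.05:
        return 'negative'
    return 'neutral'
```

Daily counts of positive, neutral and negative tweets can then be obtained by grouping the labelled tweets by calendar date.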
In the fourth step, the datasets for the digital assets were organized separately. The variables had to be standardized because the sentiment and Google Trends data were far more volatile than the other variables. The Z-transformation was used to standardize all time series: Z_t = (X_t − μ_x)/σ_x, where μ_x and σ_x are the mean and standard deviation of each time series, respectively. With all series on the same scale and with equal variance, the effects of changes can be quantified directly in the numerical analysis (Garcia et al. 2015). Before proceeding with the VAR analysis, this study applied the Augmented Dickey–Fuller (ADF) test to assess the stationarity of each time series (Fuller, 2009). The null hypothesis of the ADF t-test is H0: θ = 0 (i.e., the series contains a unit root and must be differenced to become stationary), against the alternative H1: θ < 0 (i.e., the series is stationary and does not need to be differenced). All variables were tested with the ADF test; when the null hypothesis is rejected, the series is considered stationary in levels, i.e., I(0).
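The standardization and stationarity checks could be carried out as follows, assuming the daily series are collected in a pandas DataFrame (the use of statsmodels' adfuller is our assumption):

```python
import pandas as pd
from statsmodels.tsa.stattools import adfuller

def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Z-transformation: Z_t = (X_t - mean) / std for every column."""
    return (df - df.mean()) / df.std()

def adf_pvalues(df: pd.DataFrame) -> pd.Series:
    """ADF test per series; a p-value below 0.05 rejects the unit-root null,
    so the series is treated as stationary in levels, I(0)."""
    return pd.Series({col: adfuller(df[col].dropna())[1] for col in df.columns})
```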
Finally, to investigate the effect of Google searches on the volatility of digital assets, a VAR model of the following form was used:
$$Y_{i,t} = a + \sum_{l = 1}^{p} A_l Y_{i,t - l} + \sum_{j = 1}^{k} \beta_j X_{i,t - j} + \varepsilon_{i,t}$$
(4)
where a is a vector of constants and ε_{i,t} is a vector of independent white-noise innovations. Y_{i,t} is the vector of volatility for Bitcoin, Bitcoin Cash, Ethereum, Ethereum Classic, Litecoin and Ripple, and X_{i,t−j} is the vector of explanatory variables: web search, news search, YouTube search, positive sentiment, neutral sentiment and negative sentiment. The lag length was selected using the Schwarz Criterion (SC), the Akaike Information Criterion (AIC) and the Hannan–Quinn (HQ) criterion. However, the lags suggested by these criteria left residual autocorrelation for Bitcoin and Ripple; to address this, lag 3 was chosen for Bitcoin and lag 2 for Ripple. Lag 1 was selected for Bitcoin Cash, Ethereum, Ethereum Classic and Litecoin based on the SC, AIC and HQ criteria. Using this VAR model, the study then conducted a linear Granger causality test (Granger, 1969), written for a linear system as follows:
$$\Delta Y_{i,t} = \beta_0 + \sum_{j = 1}^{n} \beta_{1j} \Delta Y_{i,t - j} + \sum_{j = 1}^{m} \beta_{2j} \Delta X_{i,t - j} + \varepsilon_{i,t}$$
(5)
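A compact sketch of the VAR estimation, lag selection and Granger causality test for one asset, using statsmodels (the library choice and the column layout of `data` are assumptions):

```python
from statsmodels.tsa.api import VAR

def fit_var_and_granger(data, volatility_col, lags):
    """Estimate the VAR of Eq. (4) for one asset and test whether the search and
    sentiment variables jointly Granger-cause its volatility (Eq. (5))."""
    model = VAR(data)                         # data: standardized, stationary series
    print(model.select_order(10).summary())   # reports AIC, SC (BIC) and HQ
    results = model.fit(lags)                 # e.g. lags=3 for Bitcoin, 2 for Ripple
    causing = [c for c in data.columns if c != volatility_col]
    granger = results.test_causality(volatility_col, causing, kind='f')
    return results, granger
```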
This research also performed Impulse Response Function (IRF) analysis, a fundamental tool in VAR modeling (Dizaji, 2019; Siriopoulos et al. 2021). The IRF shows how the volatility of each digital asset responds to a shock in web search, news search, YouTube search, positive sentiment, negative sentiment, neutral sentiment, or volatility itself. In the IRF plots, the vertical axis represents the magnitude of the response to a shock and the horizontal axis the number of periods after the initial shock. The dashed lines represent 95% confidence intervals, while the solid lines depict the impulse response. When the confidence bands contain the zero line, the impulse response is not statistically significant. The ordering of the variables may influence the IRF findings; Dizaji (2019) suggested that the ordering should conform to economic theory, running from the most exogenous to the most endogenous variables. The Google Trends and sentiment variables were therefore placed first and second, as the most exogenous variables in our model, and the volatility variables came next in the Cholesky ordering, making volatility the most endogenous variable in the VAR system. Finally, diagnostic tests of each estimated VAR model were performed using the inverse roots of the AR characteristic polynomial (VAR stability) and the VAR residual serial correlation Lagrange Multiplier (LM) test for each digital asset dataset.
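Continuing the statsmodels sketch, the IRFs and the diagnostic checks could be produced as follows; note that statsmodels ships a Portmanteau whiteness test rather than the LM test used in the paper, so this only approximates the reported diagnostics:

```python
# `results` is the fitted VARResults object from the previous sketch.
irf = results.irf(10)           # impulse responses over ten periods
irf.plot(orth=True)             # Cholesky-orthogonalized IRFs with 95% bands

print(results.is_stable(verbose=True))              # inverse AR roots check
print(results.test_whiteness(nlags=12).summary())   # residual autocorrelation
```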