Climate change is considered by many to be the single most serious threat to human life today. In fact, a 2018 report from the UN stated that “rapid, far-reaching and unprecedented changes in all aspects of society” are needed to avoid reaching an irreversible tipping point – and we don’t have much time to make them.
The problem is, we still don’t have enough data to accurately pinpoint the causes – or even the effects – of many of our problems.
But how can that be, in a world where more than 2.5 quintillion bytes of data are created every day? Surely we must have enough records by now to establish exactly where and when we’ve been going wrong?
Unfortunately, it’s not that simple. Yes, we have a lot of data on topics and events that fall under the umbrella of climate change concerns – but data alone is not going to help us.
Why big data hasn’t been much help so far
In a report titled ‘A Big Data Guide to Understanding Climate Change,’ authors James H. Faghmous and Vipin Kumar admit that “the slow progress [towards understanding climate change] has been vexing given that climate science has become one of the most data-rich domains in terms of data volume, velocity, and variety.”
According to them, there are three main factors that have hindered progress in the field:
“First, the data that climate science uses violate many of the assumptions and practices held in traditional data science … Second, the field of data science has historically focused on certain tasks and evaluation metrics that are not applicable to some of climate science’s biggest needs. Finally, and this is only a matter of time, climate science, its data, and challenges have not been exposed to the broader data science community until recently.”
Put simply: climate change data is unconventional. It does not fit the patterns and trends that we are used to, and has not been around long enough for us to understand every anomaly and nuance.
What’s more, even the data we do have cannot be considered wholly accurate or reliable.
“Changes in instruments and data processing algorithms put into question the applicability of such data to study long-term climate,” explain Faghmous and Kumar. They give the example of tropical cyclones or hurricanes, which “have been routinely observed since the mid-1940s.” The data we have appears to show a spike in such events since the 1970s, but this is misleading.
“Before satellite monitoring became routine in the late 1970s, tropical cyclones were prone to be missed if they were not observed through landfall, a ship, or airplane reconnaissance,” the study says. “Thus, there is an upward trend in the total number of tropical cyclones in the Atlantic, but it is unclear if it is due to changes in the observational system or due to climate change.”
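The effect the authors describe can be illustrated with a toy simulation. The numbers below are invented for illustration, not real cyclone statistics: assume the true storm rate never changes, and only the fraction of storms we manage to detect improves once satellites arrive.

```python
import random

random.seed(0)

# Synthetic illustration (not real cyclone data): the true number of
# storms per year is held constant, but detection was incomplete before
# routine satellite coverage (~1979). All rates here are assumptions.
TRUE_STORMS_PER_YEAR = 12
PRE_SATELLITE_DETECTION = 0.6    # assumed fraction seen via landfall/ships/aircraft
POST_SATELLITE_DETECTION = 0.98  # assumed near-complete satellite detection

def observed_count(year: int) -> int:
    """Number of storms actually recorded in a given year."""
    p = PRE_SATELLITE_DETECTION if year < 1979 else POST_SATELLITE_DETECTION
    return sum(1 for _ in range(TRUE_STORMS_PER_YEAR) if random.random() < p)

early = [observed_count(y) for y in range(1945, 1979)]
late = [observed_count(y) for y in range(1979, 2013)]

print(f"mean observed 1945-1978: {sum(early) / len(early):.1f}")
print(f"mean observed 1979-2012: {sum(late) / len(late):.1f}")
```

The recorded counts show a clear upward step even though the underlying storm rate never changed – exactly the ambiguity the study points to: an apparent trend that could be climate change, or could just be a better observational system.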
The authors note that we have an “abundance of climate data spanning the same period,” again with evident changes in patterns. Unfortunately, because of changes in the way this data was collected, it’s difficult to accurately comment on how climate change has affected weather systems in the long term.
This isn’t to say we don’t know that weather systems are being affected – we just can’t be certain of the extent.
In a similar vein, because we have only recently begun collecting data specifically for the purposes of monitoring climate change, it is difficult to retroactively apply similar datasets to the same purpose.
“With large datasets where one measures anything and everything, it can be difficult to understand how that data were collected and for what purpose,” the study says.
Then there are the problems of data availability and heterogeneity. Many datasets only cover a short period of time (quite often a decade or less), and even those that do span a longer period involve such a huge set of variables that it’s difficult to interpret links and causations.
And then there’s the glaring issue with how we process big data.
“Traditional data science, and machine learning specifically, has relied on attribute-value data as input to most learning models. However, numerous climate phenomena cannot be represented in attribute-value form,” the study says. “For example, a hurricane is an evolving pattern over a significant spatiotemporal span. Thus, one cannot represent a hurricane with a binary value as it does not simply appear and then disappear.”
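The representational gap the authors describe can be made concrete with a short sketch. The field names and values below are illustrative assumptions, not a standard meteorological schema: a flat attribute-value record can only say “hurricane: yes/no”, while the phenomenon itself is an ordered sequence of states in space and time.

```python
from dataclasses import dataclass

@dataclass
class TrackPoint:
    """One observation of a storm's evolving state (illustrative fields)."""
    time: str       # observation timestamp
    lat: float      # latitude in degrees
    lon: float      # longitude in degrees
    wind_kt: float  # sustained wind speed in knots

# Attribute-value view: a single binary flag, all structure lost.
flat_record = {"year": 2005, "hurricane": True}

# Spatiotemporal view: the storm as an ordered sequence of states.
track = [
    TrackPoint("2005-08-23T18:00", 23.1, -75.1, 30.0),
    TrackPoint("2005-08-28T12:00", 25.7, -87.7, 150.0),
    TrackPoint("2005-08-29T12:00", 29.5, -89.6, 110.0),
]

# Questions like "when did it peak?" only make sense on the track view.
peak = max(p.wind_kt for p in track)
print(f"peak intensity: {peak} kt across {len(track)} observations")
```

Learning models built for the flat view simply have nothing to work with here, which is why the authors argue climate phenomena demand different methods.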
How we can adapt our approach
Though big data technology might not have been designed with climate change analysis in mind, it can still be utilised to further our understanding of the problem.
“While the need to study our planet will most certainly spur numerous data science innovations,” Faghmous and Kumar say, “we believe that the highest impact change will occur when we remove the emphasis on differentiating between data-driven and hypothesis-driven or theory-driven research.”
Rather than rely solely on artificial intelligence and conventional big data analytics approaches, then, we need to adapt our methods. Faghmous and Kumar suggest “theory-guided data science methods” to blend our current analytics approach with a higher degree of caution and human-driven scientific theory.
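The paper names the approach but the article doesn’t spell out an algorithm, so here is one common interpretation as a toy sketch: instead of letting a model fit noisy data freely, add a penalty term whenever it violates a known physical constraint. The data, the constraint (“the trend cannot be negative”), and the weights are all invented for illustration.

```python
# Synthetic observations with an apparent decline (which, per the article,
# could just be an observational artifact rather than a real trend).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.1, 1.8, 1.9, 1.4, 1.2]

def fit_slope(physics_weight: float) -> float:
    """Grid-search the slope of y = 2.0 + slope * x under a penalised loss."""
    def loss(b: float) -> float:
        data_term = sum((2.0 + b * x - y) ** 2 for x, y in zip(xs, ys))
        # Hypothetical theory constraint: the true trend is non-negative,
        # so negative slopes are penalised in proportion to the violation.
        theory_term = physics_weight * max(0.0, -b) ** 2
        return data_term + theory_term
    candidates = [i / 1000 for i in range(-500, 501)]
    return min(candidates, key=loss)

unconstrained = fit_slope(physics_weight=0.0)   # pure data-driven fit
guided = fit_slope(physics_weight=50.0)         # theory-guided fit

print(f"unconstrained slope: {unconstrained:.3f}")
print(f"theory-guided slope: {guided:.3f}")
```

The data-driven fit chases the noise into a steep negative slope, while the theory-guided fit is pulled back toward what the (assumed) physics allows – a small illustration of blending data with human-driven scientific theory rather than trusting either alone.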
Ultimately, big data has the potential to have a huge impact on how we understand and deal with climate change – we just need to take a more proactive approach.