What’s the Damage?

This week I’m shifting to the damage caused by each tornado. The dataset I’ve been working with records the total injuries, fatalities, and property loss of each tornado. It also assigns each tornado a magnitude from 0 to 5. I expect the amount of damage to increase as the magnitude increases.

Starting with injuries, I first extracted the mag and inj columns from the original dataset. I then used ListLogPlot to visualize the data, and repeated the process for fatalities and property loss. Below are the plots for injuries, fatalities, and property loss.

The property loss data comes in two formats. Before 1996, a category number was used to represent an estimated range of loss. Below are the conversions for those numbers, as stated in the dataset documentation.

I needed to convert all of those category entries to amounts in millions of dollars, because the entries from 1996 onward are recorded in millions of dollars. For each category, I picked the upper bound of its dollar range and divided it by 1 million. Keeping every loss value in millions also keeps the property loss from looking skewed next to the injury and fatality counts. Below is how this is done.
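The conversion can be sketched in Python. This is an illustration, not the Mathematica code used in this post. It assumes that category k (1 to 9) has an upper bound of 5 × 10^k dollars ($50, $500, and so on up to $5,000,000,000), which matches my reading of the NOAA category table; check it against the conversion table in the dataset documentation before reusing it. The function names and the `year`/`loss` inputs are hypothetical.

```python
# Sketch: put pre-1996 loss categories and 1996+ values on one scale (millions of dollars).
# ASSUMPTION: category k (1-9) has an upper bound of 5 * 10**k dollars
# ($50, $500, ..., $5,000,000,000). Verify against the dataset documentation.

def category_upper_bound_dollars(code: int) -> float:
    """Upper bound in dollars for a pre-1996 loss category; 0 means unknown/no loss."""
    if code <= 0:
        return 0.0
    return 5.0 * 10 ** code


def loss_in_millions(year: int, loss: float) -> float:
    """Return property loss in millions of dollars for one tornado record."""
    if year < 1996:
        # Category code: take the upper bound of its dollar range, then scale to millions.
        return category_upper_bound_dollars(int(loss)) / 1_000_000
    # ASSUMPTION (as stated in this post): values from 1996 on are already in millions.
    return float(loss)
```

For example, `loss_in_millions(1975, 4)` gives 0.05 (category 4, up to $50,000), while `loss_in_millions(2003, 1.5)` passes the value through unchanged. Taking the upper bound overstates the pre-1996 losses slightly; a midpoint would be an equally defensible choice.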

The plots do suggest that damage trends upward with magnitude. To get a better sense of this, I added the three damage columns (injuries, fatalities, and property loss) together for each magnitude. This gave me the total amount of damage for each tornado magnitude.
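The per-magnitude totals can be sketched in Python. The record keys `mag`, `inj`, `fat`, and `loss_m` are hypothetical: `mag` and `inj` follow the column names mentioned above, and `loss_m` stands for property loss already converted to millions. Like the approach described here, the sketch simply adds the three damage measures together.

```python
from collections import defaultdict


def total_damage_by_magnitude(records):
    """Sum injuries + fatalities + property loss (millions) for each magnitude.

    Each record is assumed to be a dict with the hypothetical keys
    'mag', 'inj', 'fat', and 'loss_m' (loss already in millions of dollars).
    """
    totals = defaultdict(float)
    for r in records:
        # Combine the three damage columns for this tornado, then add to its magnitude.
        totals[r["mag"]] += r["inj"] + r["fat"] + r["loss_m"]
    # Sort by magnitude so the result is ready for a line plot.
    return dict(sorted(totals.items()))
```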

Putting these values in a line plot clearly shows an upward slope from magnitude 0 to 3.

Magnitudes 4 and 5, however, do not continue the steep increase, and magnitude 5 actually shows less total damage than magnitude 4. I believe this is because there are far fewer magnitude 5 tornadoes than magnitude 3 or 4 tornadoes, so their totals are smaller. A tally of the magnitudes supports this.

As seen, there are only 88 magnitude 5 tornadoes, compared with 2,703 magnitude 3 and 714 magnitude 4 tornadoes.
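A tally like this, plus one way of accounting for the different counts (dividing each magnitude's total damage by its number of tornadoes), can be sketched in Python. The function names are hypothetical, and the average is an illustrative extra step rather than something computed in this post.

```python
from collections import Counter


def magnitude_tally(magnitudes):
    """Count how many tornadoes fall into each magnitude class, sorted by magnitude."""
    return dict(sorted(Counter(magnitudes).items()))


def average_damage(totals, tally):
    """Divide each magnitude's total damage by its tornado count.

    Magnitudes with no tornadoes in the tally are skipped to avoid dividing by zero.
    """
    return {m: totals[m] / tally[m] for m in totals if tally.get(m)}
```

With per-tornado averages, a rare class such as magnitude 5 is no longer penalized simply for having fewer events.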

Angle Distributions

As noted in earlier posts, most tornado paths seem to lean toward the Northeast. To quantify this trend, I looked at the distribution of each tornado path’s direction. Mathematica’s GeoDirection function is very helpful here: it gives the direction from each tornado’s starting point to its ending point.

This function outputs an angle in degrees called the azimuth. Below is an azimuth circle that shows which direction each angle corresponds to: 0 degrees points North, and the angle increases clockwise around the circle (90 degrees is East, 180 is South, 270 is West). If the Northeast prediction holds, angles should cluster around the 40-60 degree mark.
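To show how such an azimuth is computed, here is a Python sketch of the standard great-circle initial-bearing formula. It assumes a spherical Earth; Mathematica’s GeoDirection may use an ellipsoidal model, so its results can differ slightly. The function name is hypothetical.

```python
import math


def initial_bearing(lat1, lon1, lat2, lon2):
    """Azimuth in degrees (0 = North, increasing clockwise) from point 1 to point 2.

    ASSUMPTION: spherical great-circle formula, not GeoDirection's exact model.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlon)
    # atan2 returns -180..180 degrees; shift into 0..360 to match the azimuth circle.
    return math.degrees(math.atan2(x, y)) % 360
```

For example, a path from (35, -97) to (36, -96) in Oklahoma comes out between 0 and 90 degrees, that is, toward the Northeast.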

Oklahoma

At first, the histogram for Oklahoma showed a roughly normal distribution near this mark; however, there was also a large number of tornadoes near the 200 degree mark.

To find out why, I checked the angle histogram of each slope cluster I had created in the clustering post. As shown below, the undefined/zero slope cluster is the culprit.

So I removed the undefined/zero slope cluster from the original histogram to get a cleaner distribution. This distribution follows the Northeast trend as I expected, with most tornadoes between 40 and 60 degrees.
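Binning the azimuths and dropping the paths with a zero or undefined slope can be sketched in Python. This is an illustration, not the Mathematica code behind the histograms: the 10 degree bin width, the function names, and the (x1, y1, x2, y2) track layout are assumptions.

```python
from collections import Counter


def angle_histogram(angles, bin_width=10):
    """Count azimuths (degrees in [0, 360)) into fixed-width bins keyed by bin start.

    ASSUMPTION: 10 degree bins by default; any width works.
    """
    counts = Counter(int(a // bin_width) * bin_width for a in angles)
    return dict(sorted(counts.items()))


def drop_flat_paths(tracks):
    """Keep only tracks whose slope is defined and nonzero.

    Each track is an (x1, y1, x2, y2) tuple (a hypothetical layout). A track is
    dropped when x2 == x1 (undefined slope) or y2 == y1 (zero slope).
    """
    return [t for t in tracks if t[2] != t[0] and t[3] != t[1]]
```

The azimuths of the remaining tracks can then be passed to `angle_histogram` to reproduce the filtered distribution.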

Florida

I expected Florida to follow the same pattern with fewer tornadoes overall, and it does, with a distribution that looks even closer to normal.

Louisiana

Louisiana also follows this trend.

For the Future

Next, I want to move on from the tornado paths to the other questions I listed earlier.

Length Distributions

Oklahoma

This time I looked at the distributions of tornado path lengths for Oklahoma, Florida, and Louisiana. The GeoDistance function makes it easy to find the distance between two geographic points.

Here I use the starting and ending latitude and longitude to calculate the lengths. I also use QuantityMagnitude to strip the miles unit from each entry, which makes calculations on the set easier. Next, I plotted a histogram of the lengths to see whether they follow a log-normal distribution.

This histogram roughly resembles a log-normal distribution. To examine this more closely, I used the QuantilePlot and Kurtosis functions.
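The length calculation can be approximated in Python with the haversine formula. This is a sketch under stated assumptions: a spherical Earth with a mean radius of 3958.8 miles (GeoDistance may use an ellipsoidal model, so values can differ slightly), and hypothetical function names. Returning plain floats plays the same role as QuantityMagnitude.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # ASSUMPTION: mean Earth radius, spherical model


def path_length_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles via the haversine formula (plain float, no units)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def log_lengths(tracks):
    """Natural log of each positive path length; zero-length paths are skipped."""
    lengths = (path_length_miles(*t) for t in tracks)
    return [math.log(d) for d in lengths if d > 0]
```

If the lengths are log-normal, a histogram of `log_lengths(...)` should look roughly bell-shaped.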

These show that the distribution is relatively close to a normal distribution. A normal distribution has a kurtosis of 3, so the value of 4.3 here points to somewhat heavier tails than normal, but it is close enough to treat the distribution as approximately normal. The quantile plot agrees: only the tails of the data deviate from the normal distribution.
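For readers without Mathematica, both checks can be approximated in Python. This is a sketch under assumptions: `kurtosis` uses the plain moment ratio m4 / m2² (the non-excess form, where a normal distribution scores 3, as above), and `normal_qq_points` uses (i + 0.5) / n plotting positions, which may not match QuantilePlot’s defaults exactly. If the goal is to test the log-normal fit, pass in the logs of the lengths (an assumption about how the check is set up).

```python
import math
from statistics import NormalDist, fmean


def kurtosis(values):
    """Moment-based kurtosis m4 / m2**2 (non-excess form: a normal distribution gives ~3)."""
    mu = fmean(values)
    m2 = fmean((v - mu) ** 2 for v in values)
    m4 = fmean((v - mu) ** 4 for v in values)
    return m4 / m2 ** 2


def normal_qq_points(values):
    """Pairs (theoretical normal quantile, observed value) for a Q-Q style check.

    The reference normal uses the sample mean and population standard deviation.
    ASSUMPTION: (i + 0.5) / n plotting positions with 0-based i.
    """
    data = sorted(values)
    n = len(data)
    mu = fmean(data)
    sigma = math.sqrt(fmean((v - mu) ** 2 for v in data))
    dist = NormalDist(mu, sigma)
    return [(dist.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(data)]
```

Points that fall along the line y = x indicate a good normal fit; points that peel away only at the ends match the tail behavior described above.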

Florida

Florida has about half as many tornadoes as Oklahoma. Plotting each tornado path using the same three slope clusters shows the same Northeast pattern.

The length distribution is again roughly log-normal.

This distribution differs a bit from Oklahoma’s at the ends. The quantile plot and kurtosis show this in more detail: the kurtosis comes back higher, at 4.6, indicating somewhat heavier tails.

Louisiana

Finally, I plotted the paths of tornadoes in Louisiana and found the distribution of their lengths. Once again, the lengths follow a fairly log-normal distribution.

The quantile plot and kurtosis tell the same story as before, with a kurtosis value of 4.8.

For the Future

Next, I want to plot the distribution of path angles to further support the Northeast direction hypothesis. I expect this distribution to peak near 40-50 degrees.

Clustering – Oklahoma

Oklahoma

This post is mainly about finding a way to plot the starting and ending latitude and longitude points in the dataset. To simplify this first analysis, I am only going to work with the data points associated with Oklahoma. I chose Oklahoma because it lies in one of the main hotspots of the overall tornado dataset. Below are the starting and ending datasets derived from the main dataset.

I then used the GeoHistogram function on these sets to better visualize tornado density and, hopefully, direction.

Note that the ending dataset has fewer points than the starting one. This is because the ending latitude and longitude were not recorded for some tornadoes (they show a 0). Since this is the case, I am going to disregard those tornadoes during analysis.
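Selecting one state’s tornadoes and skipping unrecorded coordinates can be sketched in Python. The column names `st`, `slat`, `slon`, `elat`, and `elon` are assumptions about the dataset’s headers, and the function name is hypothetical; any coordinate equal to 0 is treated as unrecorded.

```python
def state_tracks(rows, state="OK"):
    """Return (slat, slon, elat, elon) tuples for one state, skipping unrecorded points.

    Rows are dicts keyed by column header. ASSUMPTION: the headers are
    'st', 'slat', 'slon', 'elat', 'elon'; check them against the actual file.
    """
    tracks = []
    for row in rows:
        if row["st"] != state:
            continue
        coords = tuple(float(row[k]) for k in ("slat", "slon", "elat", "elon"))
        # A 0 in any coordinate means that point was not recorded, so skip the tornado.
        if 0 in coords:
            continue
        tracks.append(coords)
    return tracks
```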

Clustering

The histograms above don’t clearly show where each tornado starts and ends, so I needed a way to visualize each tornado path. I first selected the four coordinate values (starting and ending latitude and longitude) for the Oklahoma tornadoes whose coordinates are nonzero. I then created a separate list for each of the four values.

For my clustering, I figured that drawing a line for each tornado path would give a good picture of tornado activity. Using the GeoPath and Table functions, I plotted each tornado on the Oklahoma map.

From this visualization I noticed a pattern: most of the tornadoes seem to travel in a Northeast direction. I then separated the data into three clusters: tornadoes with a positive slope, a negative slope, and a zero/undefined slope.

To figure out the slope of each tornado path, I created a function:

This function takes the starting and ending coordinates and calculates the slope between them. If x2 - x1 = 0, the slope is undefined, since calculating it would mean dividing by zero. From this I created the three clusters.

Finally, I used these clusters with the GeoPath function to check whether the Northeast prediction was correct.
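The original function isn’t reproduced here, but the same idea can be sketched in Python. This sketch assumes x is longitude and y is latitude, so a path heading Northeast has a positive slope; the function and cluster names are hypothetical.

```python
def slope(x1, y1, x2, y2):
    """Slope from (x1, y1) to (x2, y2); None when x2 - x1 == 0 (undefined slope).

    ASSUMPTION: x is longitude and y is latitude.
    """
    if x2 - x1 == 0:
        return None
    return (y2 - y1) / (x2 - x1)


def cluster_by_slope(tracks):
    """Split (x1, y1, x2, y2) tracks into positive, negative, and zero/undefined clusters."""
    clusters = {"positive": [], "negative": [], "zero_or_undefined": []}
    for t in tracks:
        m = slope(*t)
        if m is None or m == 0:
            clusters["zero_or_undefined"].append(t)
        elif m > 0:
            clusters["positive"].append(t)
        else:
            clusters["negative"].append(t)
    return clusters
```

Grouping zero and undefined slopes together mirrors the three clusters described above; note that a positive slope covers both Northeast and Southwest travel, since the slope alone ignores direction.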

From here I can safely say that this Northeast trend exists.

For the Future

I want to plot the lengths of the tornado paths to see whether they follow any particular distribution. I also want to repeat this analysis for two other states to see if the Northeast theory holds up.

Initial Data Organization

The Data

I imported the dataset into Mathematica as shown below. I used the “Dataset” import element so that Mathematica loads the file neatly as a Dataset, and the HeaderLines option to indicate that the first row holds the column headers. This will make it easy to index each column by name in the future.

I first extracted the starting and ending coordinate pairs of each tornado, which is easy to do by indexing the specific headers.

Since these come out as Datasets that still carry their headers, I needed to convert each one to a matrix of coordinate pairs. Applying [All, Values] // Normal drops the headers and returns plain lists.

Using the GeoPosition function, I can now interpret each coordinate pair as a geographic location. This gives 64,825 points for analysis.
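Outside Mathematica, the same header-based import and coordinate extraction can be sketched with Python’s csv module, where DictReader plays the role of HeaderLines by using the first row as keys. The column names `slat`, `slon`, `elat`, and `elon` are assumptions about the file’s headers, and the function name is hypothetical.

```python
import csv
import io


def read_coordinate_pairs(csv_text):
    """Read CSV text (first row = headers) and return start and end (lat, lon) pairs.

    ASSUMPTION: the coordinate columns are named 'slat', 'slon', 'elat', 'elon'.
    """
    reader = csv.DictReader(io.StringIO(csv_text))  # header row becomes the dict keys
    starts, ends = [], []
    for row in reader:
        starts.append((float(row["slat"]), float(row["slon"])))
        ends.append((float(row["elat"]), float(row["elon"])))
    return starts, ends
```

To use it on a file, read the file’s text first, for example `read_coordinate_pairs(open(path, newline="").read())`.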

Finally, I used the GeoHistogram function, along with the United States entity class from Wolfram, to plot the bins. I used a bin size of 100 to help clearly show where tornadoes are located; I should use a smaller bin size for more accurate analysis in the future.

As shown, the maximum number of tornadoes in a single bin is around 175. I was surprised to see that Florida has a significant number of tornadoes. The central United States is clearly the region most affected by tornadoes. However, tornadoes occurred almost everywhere else in the United States at some point from 1950 to 2018.
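To show what a geographic histogram is counting, here is a simplified Python sketch that bins points into square latitude/longitude cells. It is not how GeoHistogram chooses its bins (GeoHistogram has its own binning options); `cell_deg` is a hypothetical parameter, and no map projection is applied.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=1.0):
    """Count (lat, lon) points per square cell of side cell_deg degrees.

    Cells are keyed by their south-west corner. ASSUMPTION: plain degree grid,
    unlike GeoHistogram's own binning.
    """
    counts = Counter()
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg) * cell_deg, math.floor(lon / cell_deg) * cell_deg)
        counts[cell] += 1
    return counts
```

`max(grid_counts(points).values())` gives the busiest cell, the analogue of the peak of about 175 read off the map. A smaller `cell_deg` gives finer cells, which is the trade-off behind using a smaller bin size.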

Looking Forward

I want to start focusing on specific states for further analysis. I can also now use Dataset methods to organize the data as needed, such as finding the maximum damage caused by any one tornado. I have also added another question to my list:

  • How has tornado frequency changed as time has gone on?

Chosen Dataset – Tornado Tracks

I have finally decided on a dataset that I will analyze for most of this semester.

The Dataset

  • This dataset contains 61k+ tornadoes from 1950 to 2018.
  • Each entry has a starting point and an ending point.
  • Entries also include other details, like magnitude, property loss, injuries, width, and the states each tornado went through.

Questions

Originally I wanted to know what causes a tornado in the first place, but this dataset doesn’t seem able to provide that information.
So instead, I’m going to look at questions like:

  1. What amount of damage does a tornado cause on average?
  2. What states are mainly targeted by tornados?
  3. How long does a tornado last on average?

I’m hoping to answer some, if not all, of these questions by the end of the semester.

Dataset: https://oasishub.co/dataset/usa-tornado-historical-tracks-noaa