Length Distributions

Oklahoma

This time I looked at the distribution of tornado path lengths for Oklahoma, Florida, and Louisiana. The GeoDistance function makes it easy to find the distance between two geographic points.

Here I take the starting and ending latitude and longitude of each tornado to calculate its path length. I also use QuantityMagnitude to strip the mile units from each entry, which makes calculations on the set easier. Next I plotted a histogram of the lengths to check for a log-normal distribution.
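As a rough illustration of the length calculation, here is a minimal Python sketch rather than the Mathematica code used in this post. It assumes a spherical Earth and the haversine formula, so its results may differ slightly from GeoDistance. The function name, the radius constant, and the sample coordinates are all hypothetical.

```python
import math

# Mean Earth radius in miles; an assumption of this sketch.
EARTH_RADIUS_MILES = 3958.8


def path_length_miles(start_lat, start_lon, end_lat, end_lon):
    """Great-circle (haversine) distance in miles between two points given in degrees."""
    phi1 = math.radians(start_lat)
    phi2 = math.radians(end_lat)
    dphi = phi2 - phi1
    dlam = math.radians(end_lon - start_lon)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against tiny floating-point overshoot.
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(min(1.0, a)))


# Hypothetical (start_lat, start_lon, end_lat, end_lon) rows, not real records.
paths = [(35.0, -97.5, 35.1, -97.3), (36.2, -98.0, 36.2, -98.0)]
lengths = [path_length_miles(*p) for p in paths]
```

Because the function returns plain numbers in miles, there is no unit to strip afterwards, which is the role QuantityMagnitude plays in the Mathematica version.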

This histogram roughly resembles a log-normal distribution. To check this more precisely, I used the QuantilePlot and Kurtosis functions.

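This shows that the distribution is relatively close to a normal distribution (for log-normal data, it is the logarithm of the lengths that would be normally distributed). A normal distribution has a kurtosis of 3. The value here is 4.3, which points to somewhat heavier tails, but I consider it close enough to treat the distribution as approximately normal. The graph also shows that only the tails of the data deviate from the reference distribution.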
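To make the kurtosis comparison concrete, here is a small Python sketch of the statistic in the convention where a normal distribution scores 3 (the fourth central moment divided by the squared variance, not the "excess" version). Applying it to the logarithm of the lengths is my assumption about how a log-normal check would be set up, and the sample values are hypothetical.

```python
import math


def kurtosis(values):
    """Fourth standardized moment (not excess kurtosis); 3 for a normal distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2


# Hypothetical path lengths in miles, not real records.
lengths = [0.1, 0.5, 1.2, 2.0, 3.5, 7.8]

# Zero-length paths have no logarithm, so they are skipped here.
log_lengths = [math.log(x) for x in lengths if x > 0]
k = kurtosis(log_lengths)
```

Values above 3 mean more weight in the tails than a normal distribution, which is consistent with the quantile plot deviating only at the ends.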

Florida

Florida has about half as many tornadoes as Oklahoma. Plotting the path of each tornado, split into the same three clusters, shows the same northeast pattern.

Analyzing the length distribution shows the same log-normal shape.

The ends of this distribution are a bit softer than Oklahoma's. The quantile plot and kurtosis show this in more detail: the kurtosis comes back higher, at 4.6.

Louisiana

Finally, I plotted the paths of tornadoes in Louisiana and found the distribution of those lengths. Once again, the distribution comes out fairly close to log-normal.

The quantile plot and kurtosis tell the same story as before, with a kurtosis value of 4.8.

For the Future

Next I want to plot the distribution of path angles to further support the northeast direction hypothesis. I expect this distribution to peak near 40-50 degrees.

Clustering – Oklahoma

Oklahoma

This post is mainly about finding a way to plot the beginning and ending latitude and longitude points of the dataset. To simplify this first analysis, I am only going to look at the data points associated with Oklahoma. I chose Oklahoma because the state sits in one of the main hotspots of the overall tornado dataset. Below are the starting and ending datasets derived from the main dataset.

From these sets I then used the GeoHistogram function to better visualize the tornado density and, hopefully, the direction.

Note that the ending dataset has fewer points than the starting one. This is because the ending latitude and longitude were not recorded for some tornadoes (they show as 0). I am going to disregard those cases during analysis.

Clustering

The above histograms don't give a good depiction of where each tornado starts and ends, so I need a way to visualize each tornado path effectively. I first selected the rows from Oklahoma in which none of the four coordinates (starting and ending latitude and longitude) equals zero. I then created four lists, one for each of those coordinates.
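The selection step can be sketched in Python as follows. This is only an illustration of the filtering logic, not the Mathematica code used here. The field names ("st", "slat", "slon", "elat", "elon") and the sample rows are assumptions made for the example.

```python
# Hypothetical records; the field names and values are assumptions, not real data.
records = [
    {"st": "OK", "slat": 35.0, "slon": -97.5, "elat": 35.1, "elon": -97.3},
    {"st": "OK", "slat": 36.2, "slon": -98.0, "elat": 0.0, "elon": 0.0},
    {"st": "TX", "slat": 32.8, "slon": -96.8, "elat": 32.9, "elon": -96.6},
]

COORD_FIELDS = ("slat", "slon", "elat", "elon")


def complete_paths(rows, state):
    """Keep rows from one state whose four coordinates are all nonzero."""
    return [
        r for r in rows
        if r["st"] == state and all(r[f] != 0 for f in COORD_FIELDS)
    ]


ok_paths = complete_paths(records, "OK")

# One list per coordinate, mirroring the four lists described above.
start_lats = [r["slat"] for r in ok_paths]
start_lons = [r["slon"] for r in ok_paths]
end_lats = [r["elat"] for r in ok_paths]
end_lons = [r["elon"] for r in ok_paths]
```

Treating 0 as "not recorded" works here because no point in Oklahoma has a latitude or longitude of exactly zero.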

For my clustering, I knew that drawing a line for each tornado path would be a good depiction of the tornado's activity. Using the GeoPath and Table functions, I was able to plot each tornado on the Oklahoma map.

From this visualization I noticed a pattern in the majority of the tornadoes: most seem to travel in a northeast direction. I then separated the data into three clusters: tornadoes with a positive slope, a negative slope, and no or undefined slope.

To figure out the slopes of each tornado path, I created a function:

This function takes the starting and ending coordinates of a path and calculates its slope. If x2-x1 = 0, the slope is undefined, since computing it would divide by zero. From these slopes I created the three clusters.
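A minimal Python sketch of this idea is shown below. It is an illustration, not the original Mathematica function. It assumes x is longitude and y is latitude, and the function names, cluster labels, and sample paths are hypothetical.

```python
def slope(x1, y1, x2, y2):
    """Slope of the segment from (x1, y1) to (x2, y2); None when x2 - x1 = 0."""
    if x2 - x1 == 0:
        return None  # undefined: computing it would divide by zero
    return (y2 - y1) / (x2 - x1)


def cluster_by_slope(paths):
    """Split (x1, y1, x2, y2) paths into positive, negative, and zero/undefined slope."""
    clusters = {"positive": [], "negative": [], "zero_or_undefined": []}
    for path in paths:
        m = slope(*path)
        if m is None or m == 0:
            clusters["zero_or_undefined"].append(path)
        elif m > 0:
            clusters["positive"].append(path)
        else:
            clusters["negative"].append(path)
    return clusters


# Hypothetical (start_lon, start_lat, end_lon, end_lat) paths, not real records.
paths = [
    (-97.5, 35.0, -97.3, 35.1),  # east and north: positive slope
    (-97.5, 35.0, -97.3, 34.9),  # east and south: negative slope
    (-97.5, 35.0, -97.5, 35.2),  # due north: undefined slope
]
clusters = cluster_by_slope(paths)
```

One caveat with this approach: a positive slope covers both northeast and southwest travel, so the order of the start and end points still matters when reading the direction off the map.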

Finally, I plotted these clusters with the GeoPath function to check whether the northeast prediction above was correct.

From here I can safely say that this Northeast trend exists.

For the Future

I want to plot the lengths of the tornado paths on a graph to see whether they follow any specific distribution. I also want to run the same analysis on two other states to see if the northeast theory holds up.

Initial Data Organization

The Data

I have imported the dataset into Mathematica; below is how I imported it. I chose the “Dataset” option so that Mathematica imports the data neatly as a dataset it can interpret. I also used the HeaderLines option to indicate that the first row contains the headers. This will make it easy to index each column when storing specific values in the future.

I first chose to extract the starting and ending coordinate pairs of each tornado in the dataset. I can easily do this by indexing the specific headers.

Since these are extracted as datasets that still carry their headers, I needed to convert each one to a matrix of coordinate pairs. Using [All, Values]//Normal gets the data to display properly.
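For readers who do not use Mathematica, the import and extraction steps above correspond roughly to the Python sketch below. It reads a CSV whose first row is a header and pulls out the starting and ending coordinate pairs as plain lists. The column names and the two sample rows are assumptions for illustration and are not taken from the actual file.

```python
import csv
import io

# Hypothetical stand-in for the tornado CSV; column names and values are assumptions.
SAMPLE_CSV = """yr,st,slat,slon,elat,elon
1950,OK,35.00,-97.50,35.10,-97.30
1950,FL,28.10,-81.60,0.0,0.0
"""


def read_coordinate_pairs(text):
    """Return (starts, ends) as lists of (latitude, longitude) tuples."""
    # DictReader treats the first row as headers, so columns can be indexed by name.
    reader = csv.DictReader(io.StringIO(text))
    starts, ends = [], []
    for row in reader:
        starts.append((float(row["slat"]), float(row["slon"])))
        ends.append((float(row["elat"]), float(row["elon"])))
    return starts, ends


starts, ends = read_coordinate_pairs(SAMPLE_CSV)
```

With a real file, `io.StringIO(text)` would be replaced by an open file handle. The result is already a plain list of pairs, so no conversion step like [All, Values]//Normal is needed.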

Using the GeoPosition function, I can now interpret each coordinate pair as a geo location. This function returns 64,825 points for analysis.

Finally, I used the GeoHistogram function, together with the United States entity class from Wolfram, to plot the bins. I used a bin size of 100 to help clearly show where the tornadoes are located. I should make the bin size smaller for more accurate analysis in the future.
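The idea behind the binning can be sketched in a few lines of Python. This sketch counts points in rectangular latitude/longitude cells, which is a simplifying assumption and not necessarily how GeoHistogram lays out its bins. The cell size in degrees and the sample points are likewise hypothetical.

```python
import math
from collections import Counter


def bin_points(points, cell_degrees):
    """Count (lat, lon) points per grid cell of the given size in degrees."""
    counts = Counter()
    for lat, lon in points:
        # Each cell is identified by the integer index of its lower-left corner.
        cell = (math.floor(lat / cell_degrees), math.floor(lon / cell_degrees))
        counts[cell] += 1
    return counts


# Hypothetical (lat, lon) starting points, not real records.
points = [(35.2, -97.4), (35.7, -97.9), (28.1, -81.6)]
counts = bin_points(points, 1.0)
busiest_cell, busiest_count = counts.most_common(1)[0]
```

Shrinking `cell_degrees` gives a finer picture of where tornadoes concentrate, at the cost of fewer points per bin.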

As shown, the maximum number of tornadoes in a single bin is around 175. I was surprised to see that Florida has a significant number of tornadoes. It is clear that the central United States is the region most affected by tornadoes. However, tornadoes have occurred almost everywhere else in the United States at some point between 1950 and 2018.

Looking Forward

I want to start focusing on specific states for further analysis. I can also now use the dataset methods to organize the data to my liking, such as finding the maximum amount of damage from any one tornado. I have also added another question to my list:

  • How has tornado frequency changed as time has gone on?