
Tides are special. Almost every aspect of the weather is the outcome of complex interactions: cloud formation, winds, rain patterns. For any of these, advanced numerical modeling is required to produce reliable forecasts, which are often limited to a window of four days. These models rely on large datasets, churning through enormous amounts of information that must be constantly updated.
In contrast, tide levels can be reliably predicted for the upcoming year with very little information and with an accuracy within inches. That is what makes tides so special. While some weather events are so complex that they may appear random, tides are as regular as a clock. Relatively speaking, they are easy to predict.
How they are predicted has very much to do with their causes, that is, the gravitational forces of the moon, the sun and the planets, each pulling the water (and the earth) towards them. Depending on its position on earth, water is either closer to or farther from the celestial body, creating a difference in the attraction force. The net effect is a pair of bulges on opposite sides of the earth: high water. Local factors also have an impact: shallow waters, the position on earth, natural resonance induced by topography, and conflicting currents may all affect the height, speed and shape of tides. An excellent summary of the theory of tides can be found in this video.
When the sun and the moon are aligned, gravitational forces add up and the tides are higher (spring tides). When they are perpendicular to each other (with respect to earth), the gravitational forces are acting in different directions and the tides are weaker, generating neap tides.
But as good as the video is, it remains at a high level. It does not actually produce tide tables. For forecasting purposes, it turns out that 37 numbers, water level data and a desktop computer can yield pretty good estimates. The only significant decision left is how far into the future the forecast needs to reach.
Why Do Tides Matter to Sailors?
The two canonical questions for sailors are finding the height of the tide at a given time and, conversely, finding the time at which a given tidal height occurs. All navigational questions related to anchoring, tidal gates and passages, as well as basic water level and current requirements, can be translated into these two questions. A modern way to obtain answers is with a navigational application, but the self-sufficiency mentality calls for methods that a sailor can apply « on his own ».
The old school methods for answering the canonical questions rely on tide tables. For a given tide cycle, a manual interpolation is produced, which in turn answers the questions. For that very reason, carrying tide tables onboard was the standard approach if one ever had to adapt a passage plan.
The interpolation techniques vary with sailing schools. I previously wrote about the RYA technique, which relies on a graphical (non-parametric) representation of a typical tidal curve. The RYA technique works very well when tides are structurally constant (e.g. as in the Solent), since it incorporates all historical average deviations from a sine curve. However, when tides are shape-shifting, as is the case for mixed tides, this approach loses its appeal, as several graphs must be used to adapt to circumstances.
In a previous blog post, I outlined that the Canadian interpolation techniques taught to sailors are pretty much identical despite their apparent differences. The famous « rule of twelfths » and the Fisheries and Oceans Canada tables 5 and 5A are both approximations of a sine curve. They yield the same interpolation to within at most five centimeters, and both perform poorly if the tide differs significantly from a sine curve. I concluded by wondering whether carrying a tide forecasting model could be a viable form of redundancy rather than several pages of documents.
The exercise I am about to present convinced me to keep 37 numbers onboard, as they are the sole required input to predict tides anywhere in the world… provided that you can get your hands on tidal data. Moreover, if the forecast can be made as little as a few hours in advance, a combined forecasting technique can be used to predict tidal heights with even greater accuracy. Finally, the forecasting technique works at any time of the day, so there is no need for interpolation.
As attractive as the method may be, it did not convince me to abandon standard methods. First, it is technically demanding: I do not see myself writing Python code every time I need a tide forecast. Standard applications are much easier to use. And for a quick and dirty estimate of where the tide is going, I still find the rule of twelfths pretty hard to beat.
I learned about the modeling technique from four references: the 2007 NOAA document on tidal analysis and predictions (Parker, 2007), a paper by Foreman, Cherniawsky and Ballantyne (2009) on tidal predictions, a reference handbook chapter on tidal predictions (Foreman, Crawford and Marsden, 1995) and the Wikipedia page on least-squares spectral analysis (Wikipedia, n.d.). In particular, chapter 3 of Parker (2007) is quite helpful in bridging the gap between statistical methods and physics.
There Will Be Math
I make no claim that this method is for everyone. It requires undergraduate or graduate mathematics, specifically some understanding of Fourier series and linear regressions. Fourier series are mathematical tools for modeling periodic phenomena (such as tides), while linear regressions are all about finding the mathematical model that best fits empirical (tidal) data. So you may read what follows for fun and leave the forecasts to math people, or you may embrace the approach and perform the forecasts yourself. All of what follows can be done with free software. The choice is yours, but you have been warned: the math switch is now turned on.
Tidal Modeling 101
Periodic phenomena can be modeled as a function of time t through a sum of cosines, called a Fourier series. A canonical representation would be:
h(t) = \beta_0 + \sum_{i=1}^n H_i\cos(\alpha_i t+k_i),
where h(t) is the phenomenon of interest (e.g. the height of tide) and \beta_0, H_i, \alpha_i and k_i, i = 1 \dots n, are unknowns. Choosing those unknowns properly is key to reproducing the phenomenon. In some cases, this cosine decomposition is merely a mathematical approximation: increasing the number of cosines increases the modelling accuracy (see the video below), but the decomposition may not have any deep connection with the phenomenon.
In the case of tides, there is a deep causal connection: each cosine represents some of the orbital effects of the moon and the sun on the water. If the orbits of the moon and the earth were perfectly circular and in the same plane, only two cosines would be needed to model the tides perfectly (n would be equal to two). But as these orbits are elliptical and unaligned, additional cosines are required to account for the non-circularity of the movements. As the title suggests, 37 cosines do a pretty good job for modelling purposes (n = 37), although there can be many more.
Historically, forecasting tides was about finding the unknowns H_i, \alpha_i and k_i. The \alpha_i are now known constants related to the frequencies of the orbits: they measure the speed at which a phenomenon repeats itself and are expressed in degrees per hour. The other unknowns are trickier to find and depend on local geography. Originally, they were decomposed into separate components, added up, and adjusted manually to improve accuracy.
There is however a clever trick that avoids finding those values through physical models. With the use of a trigonometric identity, the expression for h(t) can be transformed into an expression that is linear in the unknowns. That linear expression can then be estimated on local tide data, from which the unknowns can be inferred. So instead of deriving the coefficients from first principles, they are deduced with statistical methods. This method is still in use today and is called the least-squares harmonic model.
How to Estimate Tidal Coefficients
Recall that the cosine of a sum of two angles can be expanded into products of the individual angles through the identity:
\cos(a+b) = \cos(a)\cos(b)-\sin(a)\sin(b).
Applying this to the cosine expression in the height of tides h(t) gives:
\cos(\alpha_i t +k_i) = \cos(\alpha_i t)\cos(k_i) - \sin(\alpha_i t)\sin(k_i).
Replacing this in the model gives:
h(t) = \beta_0 + \sum_{i=1}^n\left(\cos(\alpha_i t)\underbrace{H_i\cos(k_i)}_{\equiv\beta_{1i}}+ \sin(\alpha_i t)\underbrace{\left(-H_i\sin(k_i)\right)}_{\equiv\beta_{2i}}\right).
Recall that both H_i and k_i are unknowns, so each product H_i\cos(k_i) and -H_i\sin(k_i) can be thought of as an unknown in its own right. Relabeling them \beta yields:
h(t) = \beta_0 + \sum_{i=1}^n\left(\cos(\alpha_i t)\beta_{1i} + \sin(\alpha_i t)\beta_{2i}\right).
If the speeds \alpha_i are known, then the cosines and sines are known quantities. The function h(t) is therefore linear in the unknowns \beta, which can be found through linear regression.
For large datasets, finding an estimation method that is « fast enough » is a practical problem. Some estimation methods may work in theory or on small samples, but take so much time on large datasets that they become practically irrelevant. A practical upper bound is that the time required to compute a forecast must be shorter than the forecasting window. In other words, a numerical method is pointless if it forecasts the next three days of weather… after four days of calculations.
In that respect, being able to convert a sum of cosines into a linear expression is a pretty big deal. Linear regressions are fast, robust, and tolerant of data gaps. On a standard computer, a linear regression takes a few milliseconds to produce yearly forecasts; that would not be the case with other (non-linear) estimation methods. This linear conversion is possible because we understand the orbital motions of the moon and the earth (which give us the \alpha_i). As is the case for most practical applications, combining theoretical knowledge with empirical methods provides a functional approach.
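To make the idea concrete, here is a minimal sketch (not part of the estimation code used later) that generates a fake tide from a single constituent of known speed, fits the linearized model by ordinary least squares, and recovers the amplitude and phase from the estimated \beta coefficients. The speed, the amplitude of 2 m and the phase of 0.7 rad are arbitrary choices for the illustration.

#Minimal sketch (illustration only): one constituent, fake data, OLS fit.
import numpy as np
import statsmodels.api as sm

alpha = 28.9841042                          #constituent speed in degrees per hour
t = np.arange(0, 24*30, dtype=float)        #one month of hourly time stamps
rad = np.deg2rad(alpha*t)                   #alpha*t converted to radians
h = 1.5 + 2.0*np.cos(rad + 0.7) + np.random.normal(0, 0.05, t.size)   #fake water levels

X = np.column_stack([np.ones(t.size), np.cos(rad), np.sin(rad)])      #beta_0, beta_1, beta_2
b0, b1, b2 = sm.OLS(h, X).fit().params

H = np.hypot(b1, b2)                        #recovered amplitude, close to 2.0
k = np.arctan2(-b2, b1)                     #recovered phase, close to 0.7
print(H, k)

With the full set of 37 constituents, the design matrix simply gains one cosine and one sine column per constituent.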
What Are the 37 Numbers?
Taken from Parker (2007, pp. 88-89), reading down each column:
\alpha_1\dots\alpha_9 | \alpha_{10}\dots\alpha_{18} | \alpha_{19}\dots\alpha_{27} | \alpha_{28}\dots\alpha_{36} |
28.9841042 | 44.0251729 | 13.3986609 | 1.0980331 |
57.9682084 | 12.8542862 | 15.5854433 | 28.5125831 |
86.9523127 | 30 | 57.4238337 | 29.4556253 |
115.9364169 | 31.0158958 | 29.5284789 | 27.8953548 |
15.0410686 | 58.9841042 | 27.9682084 | 13.4715145 |
90 | 60 | 0.5443747 | 0.0410686 |
13.9430356 | 14.4920521 | 14.9589314 | 0.0821373 |
16.1391017 | 43.4761563 | 30.0821373 | 15 |
42.9271398 | 28.4397295 | 1.0158958 | 30.0410667 |
\alpha_{37} | 29.9588333 |
These numbers are related to the orbital periods of the moon and the earth. The two most important are the speeds associated with the moon and the sun, called M2 and S2 in the literature. Those two are explained in the Waterlust video above. The other constituents are beyond the scope of this text; one can look at Parker (2007) or Foreman et al. (1995) for a deeper explanation. From a forecasting perspective, the 37 numbers are all that is needed.
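As a worked illustration, the first number in the table is the speed of M2 and the value of 30 is the speed of S2; dividing a full circle by each speed gives the period of the constituent:

T_{M2} = \frac{360}{28.9841042} \approx 12.42 \text{ hours}, \qquad T_{S2} = \frac{360}{30} = 12 \text{ hours},

which is why, in semidiurnal regimes, successive high waters are roughly 12 hours and 25 minutes apart.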
Two Examples
I use the least-squares harmonic model to forecast tides in two places: Rimouski (Québec) and Point Atkinson (British Columbia). Manual calculation of tides in Rimouski is a classic Sail Canada exam question. Point Atkinson’s data was provided to me by Fisheries and Oceans Canada as training data (thanks!).
Fisheries and Oceans Canada shares its tidal data publicly, although only a subset of tide gauges is available and over a limited time range. It offers a public API (here) from which data can be downloaded. I downloaded five years of hourly data for Rimouski; although this dataset is smaller, it is recent as of the day of writing. The dataset provided to me for Point Atkinson contains 25 years of hourly data (with gaps), but stops in 2019. The datasets are subject to the disclaimers in Appendix 1, and the Python code for data retrieval can be found in Appendix 2. A summary description of the datasets is given in the table below.
Information | Rimouski (QC) | Point Atkinson (BC) |
Sampling frequency | Hourly | Hourly |
Observations | 35 134 | 214 216 |
Timezone for time | GMT | GMT |
Water level unit | m | m |
Rimouski, QC
The last five hundred hours of data are shown in the Figure below for the sake of illustration.

The gist of the computer code for model estimation is displayed below. Line 35 defines the 37 numbers discussed above, lines 39 to 45 define the cosine and sine expressions discussed earlier, lines 47 and 48 perform the linear regression, and line 49 displays the estimation results.
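For reference, the same steps, condensed from the full Appendix 2 code, are:

#Condensed from Appendix 2: the 37 speeds, the cosine/sine regressors and the regression.
period_numbers = [28.9841042, 57.9682084, 86.9523127, 115.9364169, 15.0410686, 90, 13.9430356, 16.1391017, 42.9271398, 44.0251729, 12.8542862, 30, 31.0158958, 58.9841042, 60, 14.4920521, 43.4761563, 28.4397295, 13.3986609, 15.5854433, 57.4238337, 29.5284789, 27.9682084, 0.5443747, 14.9589314, 30.0821373, 1.0158958, 1.0980331, 28.5125831, 29.4556253, 27.8953548, 13.4715145, 0.0410686, 0.0821373, 15, 30.0410667, 29.9588333]

data['b0'] = 1                                   #intercept beta_0
variables = ['b0']
for i in range(0, len(period_numbers)):
    cf = lambda t, j=i : math.cos(2*math.pi/360*t*period_numbers[j])   #cos(alpha_i*t)
    sf = lambda t, j=i : math.sin(2*math.pi/360*t*period_numbers[j])   #sin(alpha_i*t)
    data['b1' + str(i)] = data['hours'].apply(cf)
    data['b2' + str(i)] = data['hours'].apply(sf)
    variables.append('b1' + str(i))
    variables.append('b2' + str(i))

model = sm.OLS(data['value'][0:original_size], data[variables][0:original_size])
results = model.fit()
print(results.summary())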

The resulting estimated model is (partially) displayed in the image below. The image highlights an R-squared of 0.97 (upper-right red square), meaning that the model reproduces 97% of the variance contained in the data. The bottom table shows the estimated coefficients for each of the sines and cosines. Some of them show a p-value higher than five percent (e.g. the centered red square), suggesting that they bring noise to the model (the t-statistics and confidence intervals convey the same information, presented differently). Removing the underlying sine/cosine terms from the model thus improves the forecast by reducing the margin of error.

Which coefficients should be jointly removed could be determined by computing the joint F-statistic of the candidate coefficients, but linear regressions are estimated so quickly that an equivalent approach is to remove one coefficient at a time, re-estimate the model, and re-evaluate which coefficients remain statistically non-significant. This iterative process is repeated until all remaining coefficients can be statistically distinguished from zero. The resulting linear regression (not shown) still has an R-squared of 0.97. The mean squared error is 0.027, that is, roughly 3 centimeters.
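A sketch of that pruning loop (not the one used for the article, where the non-significant terms are removed by hand), assuming the data frame, the variables list and original_size defined in the Appendix 2 code:

#Sketch of iterative backward elimination on the p-values (assumption: 5% threshold).
while True:
    results = sm.OLS(data['value'][0:original_size], data[variables][0:original_size]).fit()
    pvalues = results.pvalues.drop('b0')         #keep the intercept no matter what
    worst = pvalues.idxmax()                     #least significant regressor
    if pvalues[worst] <= 0.05:                   #everything is significant: stop
        break
    variables.remove(worst)                      #drop it and re-estimate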
The forecast is shown below for the last 400 hours of data, together with the next 100 hours, which had yet to be realized at the time of writing.

The forecast shows that the model has more difficulty estimating neap tides (the blue and yellow lines differ). The forecasts are definitely close to the Fisheries and Oceans forecasts, but a quick look at tides.gc.ca shows that their forecasts also perform worse during neap tides (more on this below).
Date | Model Forecast (m) | Fisheries and Oceans Canada Forecast (m) | Realized data (m) | Model Error (m) |
2023-11-28 0000 GMT | 1.24 | 0.8 | 1.36 | 0.12 |
2023-11-29 0000 GMT | 1.82 | 1.33 | 1.22 | 0.60 |
2023-11-30 0000 GMT | 2.27 | 1.98 | 2.04 | 0.23 |
2023-12-01 0000 GMT | 2.53 | 2.46 | 2.80 | 0.27 |
Point Atkinson, BC
Point Atkinson has mixed tide patterns. The last 500 hours of the dataset are shown below, together with the forecast from a model built with the same methodology (code in the Appendix below). The brevity of this section perhaps reveals the generality of the harmonic estimation: it can be applied anywhere, with few changes in methodology.


Is There Room for Improvements?
A single mathematical equation estimated through the simplest inference method can provide tidal predictions for arbitrarily long periods of time. This is an incredible feat of scientific inquiry and the result of cumulative work over centuries. The approach is incredibly helpful and gives a good idea of how tides are projected several months in advance.
Yet, an examination of the differences between the realized and forecasted values in the figures above sometimes shows important gaps, especially during neap tides. In some cases, the difference at the extremum reaches 20 centimeters (8 inches). How can this be improved?
An easy improvement is to exploit the information contained in the recent forecast errors of the tidal model above, which helps predict the upcoming errors. Recall that tides are the result of astronomical factors, but also of local conditions. The rationale for including lagged water levels is thus rooted in the missing local conditions that a harmonic model cannot account for. Mathematically, this can be done by introducing lags in the model:
h(t) = \beta_0 + \sum_{i=1}^n\left(\cos(\alpha_i t)\beta_{1i} + \sin(\alpha_i t)\beta_{2i} \right) + \sum_{l=1}^L\rho_l h(t-l).
This can significantly improve the forecasting accuracy, at the cost of dramatically shrinking the forecasting window. For instance, if the data from the previous week is used to forecast the error, then one can only forecast one week ahead. The typical trade-off in these models is to weigh how far ahead to forecast against how much precision is needed.
The starting point is to look at which past values could significantly improve the current forecasts. This is done by examining the partial autocorrelation function (or PACF) of the forecasting errors, which can be done with the code snippet shown below (post-estimation, on the Rimouski values):

Line 63 computes the forecast errors, while line 64 computes the partial autocorrelation graph, which is shown below. It might not be apparent, but the 95% margin of error for rejecting the null hypothesis is also in the figure, very close to the horizontal axis.
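A self-contained equivalent of that snippet can be sketched with statsmodels (a sketch, not the exact code from the gist), using the in-sample errors of the fitted Rimouski model:

#Sketch: partial autocorrelation of the in-sample forecast errors.
from statsmodels.graphics.tsaplots import plot_pacf
import matplotlib.pyplot as plt

errors = (data['forecast'] - data['value'])[0:original_size].dropna()
plot_pacf(errors, lags=48)          #the 95% confidence band is drawn by default
plt.show()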
This means that there is a tremendous amount of information hidden in the past forecasting errors that can be used to improve the forecasting accuracy. What jumps out is the forecasting power of the last three hours, shown as the high spikes on the left of the graph. Those would definitely be the best predictors, but they would also condemn the model to forecasting only one hour ahead, as the three previous hours would be required.

The inclusion of past lags transforms the harmonic model into an ARIMAX model. These models are usually estimated with maximum-likelihood techniques, but they can be consistently approximated with least squares if the dataset is large enough.
I do so by adding the first three lagged values of the water level, as shown in the code snippet below. Lines 69 to 74 create the lagged variables, while lines 77 to 80 re-estimate the model.
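Condensed from the Appendix 2 code, the corresponding steps are:

#Condensed from Appendix 2: add three lagged water levels and re-estimate the model.
data['value_L1'] = data['value'].shift(1)
data['value_L2'] = data['value'].shift(2)
data['value_L3'] = data['value'].shift(3)
variables.append('value_L1')
variables.append('value_L2')
variables.append('value_L3')

model = sm.OLS(data['value'][28:original_size], data[variables][28:original_size])
results = model.fit()
data['forecast2'] = results.predict(data[variables])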

The resulting forecast is shown below. There is now almost no difference between the forecast and the realized data. If the goal is only to reproduce the data, or to forecast an hour ahead, this model is pretty hard to beat. It is however of very little help for long forecasts.

Conclusion
The fact that those 37 numbers can be used to build a decent long-term model of tides is fairly impressive. Given how central the method is in the NOAA references, it gives a fairly good idea of how tide tables are built. I would not be surprised if it were built into navigation applications.
Maybe it is a quirk of the two datasets I experimented with, but the model seems to underperform during neap tides. A forecasting complement would be to include lags of water levels, as in the last section, to increase the forecasting accuracy.
There is one last aspect that may make me revisit tide predictions: the short-run predictions published by Fisheries and Oceans Canada cover a full week and rely on data sampled every minute. They certainly present a harmonic model, but they also present an improved short-run model, very much in the spirit of the previous section, yet over a whole week. I have yet to figure out how they do it.
Can a harmonic model be a substitute for tide tables onboard? Maybe so. It can pack much more information than the tide tables… if the data is available. The fact that it relies solely on 37 numbers also makes it quite elegant. It may be a good tool when local data is available but no official forecast exists. For short-run adjustments, I would however rely on existing tidal applications with a known track record; they are just easier to use.
Acknowledgements
I am thankful to Mike Foreman for helping me find references on tidal prediction techniques. All errors remain my own.
References
3Blue1Brown (2018). But What is a Fourier Transform? From Heat Flow to Drawing With Circles, YouTube video retrieved online in November 2023 at this address.
Admiralty Maritime Data Solutions (n.d.). Calshot Castle, England [Tide Data], dynamic webpage retrieved in November 2023 at this address.
Fisheries and Oceans Canada (n.d.). Rimouski (02985) [Tidal Station Information], dynamic webpage retrieved in November 2023 at this address.
_________________________ (n.d.). API for the Integrated Water Level System, webpage retrieved online in November 2023 at this address.
Foreman, M. G. G., Cherniawsky, J. Y. and V. A. Ballantyne (2009). Versatile Harmonic Tidal Analysis: Improvements and Applications, Journal of Atmospheric and Oceanic Technology V(26), pp 806-817.
Foreman, M. G. G., Crawford, W. R. and R. F. Marsden (1995). De-Tiding: Theory and Practice, Chapter 10 of Coastal and Estuarine Studies V(47), pp 203-239.
Jeandusud.com (2023). RYA YachtMaster and Sail Canada Cruising Programs: 11 Differences, text available at this address.
_____________ (2023). The Rules of Twelfths and Twentieths, text available at this address.
Parker, B. B. (2007). Tidal Analysis and Prediction, NOAA Special Publication NOS CO-OPS 3, retrieved online in November 2023 at this address.
Waterlust (2022). How the Tides REALLY Work, YouTube video retrieved in November 2023 at this address.
Wikipedia (n.d.). Fourier Series, webpage retrieved in November 2023 at this address.
________ (n.d.). Least-Squares Spectral Analysis, webpage retrieved in November 2023 at this address.
________ (n.d.). Linear Regressions, webpage retrieved in November 2023 at this address.
________ (n.d.). List of Trigonometric Identities, webpage retrieved in November 2023 at this address.
________ (n.d.). Partial Auto-correlation Function, webpage retrieved in November 2023 at this address.
Appendix 1: Government of Canada Disclaimer
This product is not to be used for navigation.
This product was made by or for Jeandusud.com and contains intellectual property (Data) of the Canadian Hydrographic Service of the Department of Fisheries and Oceans.
The copyrights in the Data are and remain the property of His Majesty the King in Right of Canada and shall not be sold, licensed, leased, assigned or given to a third party.
The incorporation of the Data in this product does not constitute an endorsement or an approval of this product by the Canadian Hydrographic Service, the Department of Fisheries and Oceans or His Majesty the King in Right of Canada.
Appendix 2: Python Code
Data Retrieval
from urllib.request import urlopen
import json, time
import pandas as pd
#Rimouski
api_url_1 = 'https://api-iwls.dfo-mpo.gc.ca/api/v1/stations/5cebf1e03d0f4a073c4bbd92/data?time-series-code=wlo'
api_url_2 = '&resolution=SIXTY_MINUTES'
for some_year in range(2019, 2024):
    if some_year == 2019:
        month_range = range(8, 13)
    elif some_year == 2023:
        month_range = range(1, 12)
    else:
        month_range = range(1, 12)
    for some_month in month_range:
        if some_year < 2023 or (some_year == 2023 and some_month <= 10):
            #Each request covers the 27th of one month to the 27th of the next
            if some_month != 12:
                complete_url = api_url_1 + "&from=" + str(some_year) + "-" + str("%02d" % (some_month,)) + "-" + "27T00%3A00%3A00Z" + "&to=" + str(some_year) + "-" + str("%02d" % (some_month+1,)) + "-" + "27T00%3A00%3A00Z" + api_url_2
            else:
                complete_url = api_url_1 + "&from=" + str(some_year) + "-" + "12" + "-" + "27T00%3A00%3A00Z" + "&to=" + str(some_year+1) + "-" + "01" + "-" + "27T00%3A00%3A00Z" + api_url_2
            response = urlopen(complete_url)
            tidal_data = json.loads(response.read())
            if some_year == 2019 and some_month == 8:
                data = pd.DataFrame(tidal_data)
            else:
                data_temp = pd.DataFrame(tidal_data)
                data = pd.concat([data, data_temp], ignore_index=True)
            time.sleep(1)  #be gentle with the API
data['eventDate'] = pd.to_datetime(data['eventDate'])
data.to_csv("rimouski.csv")
Rimouski Analysis
import pandas as pd
import numpy as np
import math
from scipy.signal import argrelextrema
import scipy.signal  #needed for scipy.signal.lombscargle below
from datetime import datetime
from datetime import timedelta
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import statsmodels.api as sm
data = pd.read_csv('rimouski.csv')
data['eventDate'] = pd.to_datetime(data['eventDate'])
data['t'] = data['eventDate'] - data['eventDate'][0]
data['seconds'] = data['t'].apply(timedelta.total_seconds)
data['hours'] = data['seconds']/3600
original_size = data.shape[0]
fig, axs = plt.subplots(figsize=(20, 4))
axs.plot(data.loc[data.index > data.shape[0]-501, 'eventDate'], data.loc[data.index > data.shape[0]-501, 'value'])
plt.xlabel("Date")
plt.ylabel("Level (m)")
plt.show()
#Expanding dataset with 100 future hours for the forecast
exp = pd.DataFrame(pd.date_range(data['eventDate'][data.shape[0]-1], periods=101, freq='h'))
exp.columns = ['eventDate']
exp = exp[1:].reset_index(drop=True)            #drop the duplicated last observed hour
exp["hours"] = 1
exp.loc[0, "hours"] = data['hours'][data.shape[0]-1]+1
exp["hours"] = exp["hours"].cumsum()            #hour counter continues after the sample
exp["value"] = np.nan
data = pd.concat([data, exp], ignore_index=True)
period_numbers = [28.9841042, 57.9682084, 86.9523127, 115.9364169, 15.0410686, 90, 13.9430356, 16.1391017, 42.9271398, 44.0251729, 12.8542862, 30, 31.0158958, 58.9841042, 60, 14.4920521, 43.4761563, 28.4397295, 13.3986609, 15.5854433, 57.4238337, 29.5284789, 27.9682084, 0.5443747, 14.9589314, 30.0821373, 1.0158958, 1.0980331, 28.5125831, 29.4556253, 27.8953548, 13.4715145, 0.0410686, 0.0821373, 15, 30.0410667, 29.9588333]
data['b0'] = 1
variables = ['b0']
for i in range(0, len(period_numbers)):
    #cos(alpha_i*t) and sin(alpha_i*t) regressors, t in hours
    cf = lambda t, j=i : math.cos(2*math.pi/360*t*period_numbers[j])
    sf = lambda t, j=i : math.sin(2*math.pi/360*t*period_numbers[j])
    data['b1' + str(i)] = data['hours'].apply(cf)
    data['b2' + str(i)] = data['hours'].apply(sf)
    variables.append('b1'+str(i))
    variables.append('b2'+str(i))
#Remove the regressors found to be statistically non-significant
for some_var in ['b13','b23', 'b15', 'b25', 'b18', 'b19' , 'b210', 'b214', 'b112', 'b116', 'b216', 'b221', 'b134','b235']:
    variables.remove(some_var)
model = sm.OLS(data['value'][0:original_size], data[variables][0:original_size])
results = model.fit()
print(results.summary())
data['forecast'] = results.predict(data[variables])
fig, axs = plt.subplots(figsize=(20, 4))
axs.plot(data.loc[data.index > data.shape[0]-501, 'eventDate'], data.loc[data.index > data.shape[0]-501, 'value'])
axs.plot(data.loc[data.index > data.shape[0]-501, 'eventDate'], data.loc[data.index > data.shape[0]-501, 'forecast'])
plt.xlabel("Date")
plt.ylabel("Level (m)")
plt.legend(['Observed', 'Predicted'])
plt.show()
data['errors'] = data['forecast'] - data['value']
w = np.linspace(0.01, 10, 10000)
f = scipy.signal.lombscargle(data['hours'][0:original_size], data['errors'][0:original_size], w)
plt.plot(w, f)
data['value_L1'] = data['value'].shift(1)
data['value_L2'] = data['value'].shift(2)
data['value_L3'] = data['value'].shift(3)
variables.append('value_L1')
variables.append('value_L2')
variables.append('value_L3')
model = sm.OLS(data['value'][28:original_size], data[variables][28:original_size])
results = model.fit()
print(results.summary())
data['forecast2'] = results.predict(data[variables])
fig, axs = plt.subplots(figsize=(20, 4))
axs.plot(data.loc[data.index > data.shape[0]-501, 'eventDate'], data.loc[data.index > data.shape[0]-501, 'value'])
axs.plot(data.loc[data.index > data.shape[0]-501, 'eventDate'], data.loc[data.index > data.shape[0]-501, 'forecast2'])
plt.xlabel("Date")
plt.ylabel("Level (m)")
plt.legend(['Observed', 'Predicted'])
plt.show()
Point Atkinson Analysis
import pandas as pd
import numpy as np
import math
from scipy.signal import argrelextrema
from datetime import datetime
from datetime import timedelta
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import statsmodels.api as sm
data = pd.read_csv('7795-1995-2019_hrly-slev.csv', sep=";")
columns = data.columns.tolist()
columns[2] = "day"
columns[3] = "hour"
columns[4] = "minute"
data.columns = columns
data["t"] = pd.to_datetime(data[columns[0:4]])
data['t'] = pd.to_datetime(data['t'])
data['delta'] = data['t'] - data['t'][0]
data['seconds'] = data['delta'].apply(timedelta.total_seconds)
data['hours'] = data['seconds']/3600
original_size = data.shape[0]
fig, axs = plt.subplots(figsize=(20, 4))
axs.plot(data.loc[data.index > data.shape[0]-501, 't'], data.loc[data.index > data.shape[0]-501, 'level'])
plt.xlabel("Date")
plt.ylabel("Level (ft)")
plt.show()
#Expanding dataset with 100 future hours for the forecast
exp = pd.DataFrame(pd.date_range(data['t'][data.shape[0]-1], periods=101, freq='h'))
exp.columns = ['t']
exp = exp[1:].reset_index(drop=True)            #drop the duplicated last observed hour
exp["hours"] = 1
exp.loc[0, "hours"] = data['hours'][data.shape[0]-1]+1
exp["hours"] = exp["hours"].cumsum()            #hour counter continues after the sample
exp["level"] = np.nan
data = pd.concat([data, exp], ignore_index=True)
period_numbers = [28.9841042, 57.9682084, 86.9523127, 115.9364169, 15.0410686, 90, 13.9430356, 16.1391017, 42.9271398, 44.0251729, 12.8542862, 30, 31.0158958, 58.9841042, 60, 14.4920521, 43.4761563, 28.4397295, 13.3986609, 15.5854433, 57.4238337, 29.5284789, 27.9682084, 0.5443747, 14.9589314, 30.0821373, 1.0158958, 1.0980331, 28.5125831, 29.4556253, 27.8953548, 13.4715145, 0.0410686, 0.0821373, 15, 30.0410667, 29.9588333]
data['b0'] = 1
variables = ['b0']
for i in range(0, len(period_numbers)):
    cf = lambda t, j=i : math.cos(2*math.pi/360*t*period_numbers[j])
    sf = lambda t, j=i : math.sin(2*math.pi/360*t*period_numbers[j])
    data['b1' + str(i)] = data['hours'].apply(cf)
    data['b2' + str(i)] = data['hours'].apply(sf)
    variables.append('b1'+str(i))
    variables.append('b2'+str(i))
model = sm.OLS(data['level'][0:original_size], data[variables][0:original_size])
results = model.fit()
print(results.summary())
#Remove the non-significant regressors and re-estimate before forecasting
for some_var in ['b13','b23', 'b15', 'b25', 'b18', 'b29' , 'b114', 'b214', 'b116', 'b216', 'b131', 'b135','b235']:
    variables.remove(some_var)
model = sm.OLS(data['level'][0:original_size], data[variables][0:original_size])
results = model.fit()
data['forecast'] = results.predict(data[variables])
fig, axs = plt.subplots(figsize=(20, 4))
axs.plot(data.loc[data.index > data.shape[0]-501, 't'], data.loc[data.index > data.shape[0]-501, 'level'])
axs.plot(data.loc[data.index > data.shape[0]-501, 't'], data.loc[data.index > data.shape[0]-501, 'forecast'])
plt.xlabel("Date")
plt.ylabel("Level (m)")
plt.legend(['Observed', 'Predicted'])
plt.show()