Download images from ALL stations
from July 2023 - June 2024, take the
1st day of every month at 9AM
2nd day of every month at 10AM
...
7th day of every month at 3PM
from https://secondary.mesonet.k-state.edu/hub/stationcam/
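The sampling schedule above can be enumerated in a few lines. This is a minimal sketch, assuming the elided days follow the same pattern as the listed ones (day d of the month at hour 8 + d, so the 1st at 9AM through the 7th at 3PM). The function name `sample_times` is hypothetical, and no download URL pattern is assumed.

```python
from datetime import datetime

def sample_times(start_year=2023, start_month=7, n_months=12):
    """Return one timestamp per (month, day) pair: day d at hour 8 + d."""
    times = []
    for k in range(n_months):
        # Advance k months from the start month, rolling over the year.
        year = start_year + (start_month - 1 + k) // 12
        month = (start_month - 1 + k) % 12 + 1
        for day in range(1, 8):  # 1st through 7th of the month
            times.append(datetime(year, month, day, 8 + day))
    return times

schedule = sample_times()
print(len(schedule), schedule[0], schedule[-1])
```

Under this assumption the schedule gives 84 timestamps per station, running from July 1, 2023 at 9AM to June 7, 2024 at 3PM.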
Sort ~3000+ images
into 'sunny' and 'cloudy' folders
Train model
use an SVM classifier from scikit-learn Python library
SVM = support vector machine, a supervised machine learning algorithm
classifier used: support vector classification
used 80% of images for training
used 20% for testing
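The training step can be sketched with scikit-learn as follows. This is an illustration only: the feature vectors are synthetic stand-ins for flattened images, and the kernel, parameters, random seed, and stratified split are assumptions, since the source does not state them.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for flattened images: 100 "sunny" (bright) and
# 100 "cloudy" (dim) feature vectors. Real images would be loaded,
# resized, and flattened instead.
sunny = rng.normal(0.7, 0.05, size=(100, 16))
cloudy = rng.normal(0.3, 0.05, size=(100, 16))
X = np.vstack([sunny, cloudy])
y = np.array([1] * 100 + [0] * 100)  # 1 = sunny, 0 = cloudy

# 80% of samples for training, 20% held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = SVC()  # default settings; the source does not specify them
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```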
Output
model that predicts if an image is sunny or cloudy
Preprocessing
find the date images began being collected at each station
create an empty CSV with dates from the start date until the current date
create a function that inputs an image and runs it through the model
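The empty per-station date table could be built like this. A minimal sketch: the column names `date` and `sunny` and the helper names are hypothetical, and an in-memory buffer stands in for the CSV file.

```python
import csv
import io
from datetime import date, timedelta

def date_rows(start, end):
    """Yield one ISO date string per day from start to end, inclusive."""
    day = start
    while day <= end:
        yield day.isoformat()
        day += timedelta(days=1)

def write_empty_table(handle, start, end):
    """Write a header plus one row per date with a blank 'sunny' column."""
    writer = csv.writer(handle)
    writer.writerow(["date", "sunny"])
    for d in date_rows(start, end):
        writer.writerow([d, ""])

buffer = io.StringIO()  # stands in for an open file
write_empty_table(buffer, date(2024, 2, 27), date(2024, 3, 1))
print(buffer.getvalue())
```

In practice `end` would be `date.today()` and `start` the station's first archived image date.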
Sunny day criteria
for each day, assess images at:
if an image is unavailable or cloudy, skip to the next day
if an image is sunny at all three times, mark that day as sunny
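The criteria above reduce to a single check per day. A minimal sketch, assuming each day's model outputs are collected as three labels, with `None` marking an unavailable image; the function name and label strings are hypothetical.

```python
def is_sunny_day(labels):
    """Return True only if all three labels are present and 'sunny'.

    `labels` holds one entry per assessed time: 'sunny', 'cloudy',
    or None when the image is unavailable.
    """
    return len(labels) == 3 and all(label == "sunny" for label in labels)

print(is_sunny_day(["sunny", "sunny", "sunny"]))
print(is_sunny_day(["sunny", None, "sunny"]))
print(is_sunny_day(["sunny", "cloudy", "sunny"]))
```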
Find the sunny days
for each station, find the sunny days from the start date until the current date using the above criteria
Output
CSV for each station containing dates and whether each day was sunny
Collect hourly data for each sunny day
find the sunrise and sunset times using the suntime Python library
download 5-minute observations at each hour between sunrise and sunset
solar radiation (W/m²)
2-meter temperature (°C)
2-meter relative humidity (%)
use hours between sunrise and sunset since solar radiation is 0 at night
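Selecting the hourly timestamps between sunrise and sunset might look like the sketch below. The sunrise and sunset values are passed in directly (in the workflow above they would come from the sunrise/sunset lookup), and rounding sunrise up to the next whole hour is an assumption; the source does not say how partial hours are handled.

```python
from datetime import datetime, timedelta

def daylight_hours(sunrise, sunset):
    """Return the whole-hour timestamps t with sunrise <= t <= sunset."""
    # Round sunrise up to the next whole hour (keep it if already whole).
    first = sunrise.replace(minute=0, second=0, microsecond=0)
    if first < sunrise:
        first += timedelta(hours=1)
    hours = []
    t = first
    while t <= sunset:
        hours.append(t)
        t += timedelta(hours=1)
    return hours

# Illustrative times only, not taken from any station.
hours = daylight_hours(datetime(2024, 6, 1, 6, 12), datetime(2024, 6, 1, 20, 47))
print([h.hour for h in hours])
```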
Calculate the expected clear sky solar radiation
calculated hourly between sunrise and sunset
latitude & longitude
day of year & time of day
hourly temperature
hourly relative humidity
equation used can be found here
Daily root mean squared error
used the following formula for each sunny day:
\( \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2} \)
N = number of hours
i = ith hour
x = measured solar radiation
x̂ = expected clear sky solar radiation
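The RMSE formula translates directly into code. A minimal sketch with made-up hourly values; the function name `rmse` is hypothetical.

```python
import math

def rmse(measured, expected):
    """Root mean squared error between paired hourly values."""
    if len(measured) != len(expected) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    squared_error = sum((x - x_hat) ** 2 for x, x_hat in zip(measured, expected))
    return math.sqrt(squared_error / n)

measured = [100.0, 400.0, 700.0]  # illustrative W/m² values
expected = [110.0, 420.0, 690.0]
daily_rmse = rmse(measured, expected)
print(round(daily_rmse, 2))
```

Here the squared errors are 100, 400, and 100, so the result is the square root of 200, about 14.14 W/m².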
Output
CSV for each station containing sunny dates, RMSE, solar radiation (SR), max SR, temperature, & relative humidity (RH)
Time series plot
plot each sunny day's RMSE, measured solar radiation, and expected solar radiation
Boxplots
group RMSE by season
spring: March 1 to May 31
summer: June 1 to August 31
fall: September 1 to November 30
winter: December 1 to February 28/29
label if a new sensor was added to show if changes in trends took place
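The seasonal grouping depends only on the month, since every season above starts on the 1st. A minimal sketch; the function name `season` is hypothetical.

```python
from datetime import date

def season(d):
    """Map a date to its meteorological season as defined above."""
    if 3 <= d.month <= 5:
        return "spring"
    if 6 <= d.month <= 8:
        return "summer"
    if 9 <= d.month <= 11:
        return "fall"
    return "winter"  # December, January, February

print(season(date(2024, 2, 29)), season(date(2024, 3, 1)))
```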
Calculate the correlation between RMSE/temperature and RMSE/dew point temperature
correlation (R) strength - uses the Pearson correlation coefficient
strong: R > 0.7 or R < -0.7
moderate: 0.4 < R < 0.7 or -0.7 < R < -0.4
weak: -0.4 < R < 0.4
correlation confidence (p-value) - uses Wald test with t-distribution
high: p < 0.05
low: p > 0.05
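The strength and confidence labels can be assigned as sketched below. The Pearson coefficient is computed by hand for illustration; the p-value is taken as an input, since it would normally come from a statistics library. Treating values exactly at 0.4, 0.7, or 0.05 as the lower category is an assumption, as the source leaves those boundaries open. All function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Label the correlation by the magnitude of R."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "strong"
    if magnitude > 0.4:
        return "moderate"
    return "weak"

def confidence(p):
    """Label the correlation confidence from its p-value."""
    return "high" if p < 0.05 else "low"

r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # illustrative data
print(round(r, 3), strength(r), confidence(0.01))
```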
Output
time series plot with RMSE, mean measured solar radiation, and mean expected solar radiation of sunny days
box plots of RMSE grouped by season
scatter plots showing the y=mx+b line of best fit, strength, and confidence of the correlation between measured/expected SR, RMSE/temp, and RMSE/dew point
Load images into Python
starting from each station's first archived image, load the 1PM image from each day
from https://secondary.mesonet.k-state.edu/hub/stationcam/
Find green pixels
convert image from RGB to HSV
define green range based on the Pillow library's HSV scaling (each channel ranges from 0 to 255)
upper bound = [92,255,255]
lower bound = [29,16,40]
Calculation
relative percent greenness
count the number of pixels within that green range
divide by the total number of pixels in that image
multiply by 100 to get % greenness
to get relative % greenness, divide every day's value by the day with the maximum % greenness, multiply by 100
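The greenness calculation can be sketched on plain HSV tuples. With Pillow, such tuples could be obtained from an image converted to HSV mode; here they are hard-coded so the example stays self-contained. Treating the bounds as inclusive and returning zeros when no day has any green are assumptions, and the function names are hypothetical.

```python
LOWER = (29, 16, 40)    # lower HSV bound from the text
UPPER = (92, 255, 255)  # upper HSV bound from the text

def percent_green(hsv_pixels):
    """Percentage of pixels whose H, S, and V all fall within the bounds."""
    green = sum(
        1
        for pixel in hsv_pixels
        if all(lo <= c <= hi for c, lo, hi in zip(pixel, LOWER, UPPER))
    )
    return 100.0 * green / len(hsv_pixels)

def relative_greenness(daily_percent):
    """Scale each day's % greenness by the maximum day, times 100."""
    peak = max(daily_percent)
    if peak == 0:
        return [0.0 for _ in daily_percent]
    return [100.0 * value / peak for value in daily_percent]

pixels = [(60, 200, 150), (10, 200, 150), (60, 10, 150), (90, 255, 255)]
print(percent_green(pixels))
print(relative_greenness([10.0, 25.0, 50.0]))
```

In the four-pixel example, the second pixel fails on hue and the third on saturation, so half of the pixels count as green.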
Output
CSV for each station containing the relative % greenness at 1PM every day
Download data
from the date images began being collected at each station to the present
from https://mesonet.k-state.edu/rest/wimsstationdata...
get the daily 1PM 'HERB_GSI' value
Output
CSV for each station containing daily 1PM growing season index
Time series plot
plot the daily 1PM GSI and relative % greenness over time
Calculate the correlation between GSI and relative % greenness
correlation (R) strength - uses the Pearson correlation coefficient
strong: R > 0.7 or R < -0.7
moderate: 0.4 < R < 0.7 or -0.7 < R < -0.4
weak: -0.4 < R < 0.4
correlation confidence (p-value) - uses Wald test with t-distribution
high: p < 0.05
low: p > 0.05
Output
scatter plot showing the y=mx+b line of best fit, strength, and confidence of the correlation
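The y=mx+b line of best fit on the scatter plot can be obtained by ordinary least squares. A minimal sketch with made-up values; which variable goes on which axis is an assumption (GSI as x, relative % greenness as y), and the function name `best_fit` is hypothetical.

```python
def best_fit(xs, ys):
    """Return slope m and intercept b of the least-squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

gsi = [0.0, 0.25, 0.5, 0.75, 1.0]          # illustrative GSI values
greenness = [10.0, 30.0, 50.0, 70.0, 90.0]  # illustrative relative % greenness
m, b = best_fit(gsi, greenness)
print(f"y = {m:.1f}x + {b:.1f}")
```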