Data Workflow

Download images from ALL stations

from July 2023 to June 2024, take the image from the

1st day of every month at 9AM

2nd day of every month at 10AM

...

7th day of every month at 3PM

from https://secondary.mesonet.k-state.edu/hub/stationcam/
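The sampling schedule above (day 1 at 9AM through day 7 at 3PM, for each month from July 2023 to June 2024) can be sketched as a list of timestamps. This is a minimal illustration only: the function name and parameters are hypothetical, and it assumes the elided days follow the same one-hour step per day.

```python
from datetime import datetime

def training_sample_times(start_year=2023, start_month=7, n_months=12):
    """Return one timestamp per (month, day): day 1 at 9AM ... day 7 at 3PM."""
    times = []
    for offset in range(n_months):
        # Roll the month counter over into the next year after December.
        year = start_year + (start_month - 1 + offset) // 12
        month = (start_month - 1 + offset) % 12 + 1
        for day in range(1, 8):
            hour = 8 + day  # day 1 -> 09:00, day 7 -> 15:00
            times.append(datetime(year, month, day, hour))
    return times

sample_times = training_sample_times()
```

Each station then contributes up to 12 x 7 = 84 images, fewer if some are missing.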

Sort the 3000+ images

into 'sunny' and 'cloudy' folders

Train model

use an SVM classifier from scikit-learn Python library

SVM (support vector machine) = supervised machine learning algorithm

classifier used: support vector classification

used 80% of images for training

used 20% for testing
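A minimal sketch of the training step with scikit-learn's SVC and an 80/20 split. The features here are synthetic stand-ins (three numbers per image), since the outline does not say how images were turned into feature vectors. The default SVC settings, stratified split, and random seed are likewise assumptions.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Hypothetical features: one row per image, e.g. mean R, G, B values.
# Real features would come from the sorted 'sunny' and 'cloudy' folders.
sunny = rng.normal(loc=200.0, scale=10.0, size=(100, 3))
cloudy = rng.normal(loc=100.0, scale=10.0, size=(100, 3))
X = np.vstack([sunny, cloudy])
y = np.array(["sunny"] * 100 + ["cloudy"] * 100)

# Hold out 20% of the images for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = SVC()  # support vector classification with default settings
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of test images labeled correctly
```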

Output

model that predicts if an image is sunny or cloudy

Preprocessing

find the date images began being collected at each station

create an empty CSV with dates from the start date until the current date

create a function that inputs an image and runs it through the model
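The empty per-station CSV could be built along these lines. This is only a sketch: the column names and the in-memory buffer are assumptions, and a real run would write to a file and end at the current date.

```python
import csv
import io
from datetime import date, timedelta

def write_date_rows(handle, start, end):
    """Write a header plus one row per day from start to end (inclusive)."""
    writer = csv.writer(handle)
    writer.writerow(["date", "sunny"])  # column names are assumptions
    n_days = (end - start).days + 1
    for i in range(n_days):
        # Leave the 'sunny' column blank; it is filled in by a later step.
        writer.writerow([(start + timedelta(days=i)).isoformat(), ""])
    return n_days

buffer = io.StringIO()  # stands in for open("station.csv", "w", newline="")
rows_written = write_date_rows(buffer, date(2023, 7, 1), date(2023, 7, 5))
```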

Sunny day criteria

for each day, assess images at:

9AM

12PM

3PM

if an image is unavailable or cloudy, skip to the next day

if an image is sunny at all three times, mark that day as sunny
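The sunny-day rule above reduces to a single check. In this sketch the dictionary of per-hour labels is a hypothetical structure, and a missing hour stands for an unavailable image.

```python
REQUIRED_HOURS = (9, 12, 15)  # 9AM, 12PM, 3PM

def is_sunny_day(labels_by_hour):
    """labels_by_hour maps hour -> 'sunny' or 'cloudy'; a missing hour means no image."""
    # .get() returns None for a missing hour, which fails the comparison.
    return all(labels_by_hour.get(hour) == "sunny" for hour in REQUIRED_HOURS)
```

For example, `is_sunny_day({9: "sunny", 12: "sunny"})` is `False` because the 3PM image is unavailable.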

Find the sunny days

for each station, find the sunny days from the start date until the current date using the above criteria

Output

CSV for each station containing dates and whether each day was sunny

Collect hourly data for each sunny day

find the sunrise and sunset times using the suntime Python library

download 5-minute observations at each hour between sunrise and sunset

solar radiation (W/m²)

2-meter temperature (°C)

2-meter relative humidity (%)

use hours between sunrise and sunset since solar radiation is 0 at night
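Given a sunrise and sunset time, the hourly timestamps to request can be listed as below. The sunrise and sunset values are made-up placeholders (in the workflow they come from the suntime library). Treating both endpoints as inclusive is an assumption.

```python
from datetime import datetime, timedelta

def daylight_hours(sunrise, sunset):
    """Return each top-of-hour timestamp t with sunrise <= t <= sunset."""
    t = sunrise.replace(minute=0, second=0, microsecond=0)
    if t < sunrise:
        t += timedelta(hours=1)  # first full hour at or after sunrise
    hours = []
    while t <= sunset:
        hours.append(t)
        t += timedelta(hours=1)
    return hours

# Placeholder times for illustration only.
hours = daylight_hours(datetime(2024, 6, 1, 6, 5), datetime(2024, 6, 1, 20, 50))
```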

Calculate the expected clear sky solar radiation

calculated hourly between sunrise and sunset

latitude & longitude

day of year & time of day

hourly temperature

hourly relative humidity

equation used can be found here

Daily root mean squared error

used the following formula for each sunny day:

\( \text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \hat{x}_i)^2} \)

N = number of hours

i = ith hour

x = measured solar radiation

x̂ = expected clear sky solar radiation
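The RMSE formula translates directly into code. This sketch assumes the measured and expected values are already paired hour by hour; the sample numbers are made up.

```python
import math

def daily_rmse(measured, expected):
    """Root mean squared error between hourly measured and expected values."""
    if len(measured) != len(expected) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(x - x_hat) ** 2 for x, x_hat in zip(measured, expected)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up hourly solar radiation values in W/m².
rmse = daily_rmse([100.0, 500.0, 300.0], [110.0, 480.0, 300.0])
```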

Output

CSV for each station containing sunny dates, RMSE, solar radiation (SR), max SR, temperature, and relative humidity (RH)

Time series plot

plot each sunny day's RMSE, measured solar radiation, and expected solar radiation

Boxplots

group RMSE by season

spring: March 1 to May 31

summer: June 1 to August 31

fall: September 1 to November 30

winter: December 1 to February 28/29

label if a new sensor was added to show if changes in trends took place
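The season boundaries listed above map onto calendar months, so grouping by season can be sketched with a small helper (the function name is hypothetical):

```python
from datetime import date

def season_of(day):
    """Map a date to its season using the month boundaries in this outline."""
    if 3 <= day.month <= 5:
        return "spring"
    if 6 <= day.month <= 8:
        return "summer"
    if 9 <= day.month <= 11:
        return "fall"
    return "winter"  # December, January, February
```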

Calculate the correlation between RMSE and temperature, and between RMSE and dew point temperature

correlation (R) strength - uses the Pearson correlation coefficient

strong: R > 0.7 or R < -0.7

moderate: 0.4 < R < 0.7 or -0.7 < R < -0.4

weak: -0.4 < R < 0.4

correlation confidence (p-value) - uses Wald test with t-distribution

high: p < 0.05

low: p ≥ 0.05
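The strength and confidence labels can be sketched as two small functions. The outline does not say where values of exactly 0.4, 0.7, or 0.05 belong, so placing them in the weaker or lower category is an assumption.

```python
def correlation_strength(r):
    """Label a Pearson R as 'strong', 'moderate', or 'weak' by its magnitude."""
    magnitude = abs(r)
    if magnitude > 0.7:
        return "strong"
    if magnitude > 0.4:
        return "moderate"
    return "weak"  # includes the boundary |R| == 0.4 (assumption)

def correlation_confidence(p, alpha=0.05):
    """Label a p-value as 'high' or 'low' confidence."""
    return "high" if p < alpha else "low"
```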

Output

time series plot with RMSE, mean measured solar radiation, and mean expected solar radiation of sunny days

box plots of RMSE grouped by season

scatter plots showing the y=mx+b line of best fit, strength, and confidence of the correlation between measured/expected SR, RMSE/temp, and RMSE/dew point

Load images into Python

starting from the date images began being archived, load the 1PM image from each day

from https://secondary.mesonet.k-state.edu/hub/stationcam/

Find green pixels

convert image from RGB to HSV

define the green range on the Pillow library's HSV scale (hue, saturation, and value each run from 0 to 255)

upper bound = [92,255,255]

lower bound = [29,16,40]

Calculation

relative percent greenness

count the number of pixels within that green range

divide by the total number of pixels in that image

multiply by 100 to get % greenness

to get relative % greenness, divide every day's value by the day with the maximum % greenness, multiply by 100
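The greenness calculation can be sketched on plain (H, S, V) tuples in Pillow's 0-255 scaling. With Pillow, such tuples could come from `Image.open(path).convert("HSV").getdata()`. Treating both bounds as inclusive, and the helper names themselves, are assumptions.

```python
LOWER = (29, 16, 40)     # lower bound from the outline
UPPER = (92, 255, 255)   # upper bound from the outline

def percent_green(hsv_pixels):
    """Percent of pixels whose H, S, and V all fall inside the green range."""
    pixels = list(hsv_pixels)
    green = sum(
        1 for px in pixels
        if all(lo <= c <= hi for c, lo, hi in zip(px, LOWER, UPPER))
    )
    return 100.0 * green / len(pixels)

def relative_greenness(daily_percent):
    """Scale each day's % greenness by the greenest day, so the peak is 100."""
    peak = max(daily_percent.values())
    if peak == 0:
        return {day: 0.0 for day in daily_percent}  # no green pixels on any day
    return {day: 100.0 * value / peak for day, value in daily_percent.items()}
```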

Output

CSV for each station containing the relative % greenness at 1PM every day

Download data

from the date images began being collected at each station to the present

from https://mesonet.k-state.edu/rest/wimsstationdata...

get the daily 1PM 'HERB_GSI' value

Output

CSV for each station containing daily 1PM growing season index

Time series plot

plot the daily 1PM GSI and relative % greenness over time

Calculate the correlation between GSI and relative % greenness

correlation (R) strength - uses the Pearson correlation coefficient

strong: R > 0.7 or R < -0.7

moderate: 0.4 < R < 0.7 or -0.7 < R < -0.4

weak: -0.4 < R < 0.4

correlation confidence (p-value) - uses Wald test with t-distribution

high: p < 0.05

low: p ≥ 0.05

Output

scatter plot showing the y=mx+b line of best fit, strength, and confidence of the correlation