-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] add Rasters.jl benchmarks #18
Open
asinghvi17
wants to merge
22
commits into
kadyb:main
Choose a base branch
from
asinghvi17:rasters-jl
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
22 commits
Select commit
Hold shift + click to select a range
9f29bac
Add initial files for Rasters.jl
asinghvi17 97c4a36
Update gitignore
asinghvi17 d888a6a
get Rasters.jl benchmarks to working order
asinghvi17 258b1c7
get Rasters.jl kinda working
asinghvi17 c00040b
move from Rasters-jl to Rasters_jl
asinghvi17 c990831
Use Rasters.extract instead of indexing
asinghvi17 963b54e
Add Pixi and Julia projects to lock versions + make this reproducible
asinghvi17 e93077e
pixi still not working
asinghvi17 9c922b0
brief updates to files
asinghvi17 bbb3437
add Julia local project.toml
asinghvi17 cfde288
tune bencmarks
rafaqz ea7c536
Update pixi files to have separate envs per package
asinghvi17 5f5a2bc
Make benchmarks work, add plotting
asinghvi17 152a328
Plotting should never load unless not in benchmarking
asinghvi17 a8abb00
Make `load` eager
asinghvi17 f320a93
Set `crop` to the minimum value of the rest of the array
asinghvi17 8b590bf
Clean up, write tif not nc
asinghvi17 0df68ad
allow for potential rasterize test
asinghvi17 11b383f
add Rasters entry to readme
rafaqz a4c5231
clean up zonal
rafaqz 5a95463
comments
rafaqz 00806cd
bugfix nvdi
rafaqz File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
# GitHub syntax highlighting | ||
pixi.lock linguist-language=YAML linguist-generated=true |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,13 @@ | ||
# R files | ||
.Rproj.user | ||
.Rhistory | ||
.Rdata | ||
# pixi environments | ||
.pixi | ||
*.egg-info | ||
# Julia environment | ||
rasters_jl/Manifest.toml | ||
# data | ||
data/LC08_L1TP_190024_20200418_20200822_02_T1/* | ||
data/Mammals_Terrestrial/* | ||
data/stack.nc |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Large diffs are not rendered by default.
Oops, something went wrong.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,74 @@ | ||
[project] | ||
name = "raster-benchmark" | ||
authors = ["Anshul Singhvi <asinghvi17@simons-rock.edu>"] | ||
channels = ["conda-forge"] | ||
description = "Benchmarking raster processing libraries" | ||
platforms = ["osx-64"] | ||
version = "0.1.0" | ||
|
||
[tasks.benchmarks] | ||
|
||
cmd = "echo 'Ran benchmarks'" | ||
depends-on = ["r-raster.benchmark", "r-stars.benchmark", "r-terra.benchmark", "r-exactextractr.benchmark", "py-rasterio.benchmark", "py-rasterstats.benchmark", "py-rioxarray.benchmark", "julia-rastersjl.benchmark"] | ||
|
||
|
||
[environments] | ||
r-raster = ["r", "r-sf", "r-raster"] | ||
r-stars = ["r", "r-sf", "r-stars"] | ||
r-terra = ["r", "r-sf", "r-terra"] | ||
r-exactextractr = ["r", "r-exactextractr"] | ||
|
||
py-rasterio = ["python", "py-rasterio"] | ||
py-rasterstats = ["python", "py-rasterstats"] | ||
py-rioxarray = ["python", "py-rioxarray"] | ||
|
||
julia-rastersjl = ["julia-rastersjl"] | ||
|
||
|
||
[feature.r.dependencies] | ||
r = ">=4.3,<5" | ||
r-tibble = ">=3.2.1,<4" | ||
|
||
[feature.r-sf.dependencies] | ||
r = ">=4.3.3,<5" | ||
r-sf = ">=1.0.15,<2" | ||
r-s2 = ">=1.1.4,<2" | ||
|
||
[feature.python.dependencies] | ||
python = ">=3.11.6,<4" | ||
xarray = ">=2024.9.0,<2025" | ||
fiona = ">=1.9.4,<2" | ||
pandas = ">=2.2.3,<3" | ||
geopandas = ">=0.14.4,<0.15" | ||
|
||
[feature.r-raster] | ||
dependencies = {r-raster = ">=3.6_26,<4"} | ||
tasks = {benchmark = "cd raster; for path in \"${i}\"/*; do; echo \"$path\"; Rscript \"$path\"; done"} | ||
|
||
[feature.r-stars] | ||
dependencies = {r-stars = ">=0.6_6,<0.7", r-sf = ">=1.0_15,<2", r-s2 = ">=1.1.4,<2"} | ||
tasks = {benchmark = "cd stars; for path in \"${i}\"/*; do; echo \"$path\"; Rscript \"$path\"; done"} | ||
|
||
[feature.r-terra] | ||
dependencies = {r-terra = ">=1.7_39,<2"} | ||
tasks = {benchmark = "cd terra; for path in \"${i}\"/*; do; echo \"$path\"; Rscript \"$path\"; done"} | ||
|
||
[feature.r-exactextractr] | ||
dependencies = {r-exactextractr = ">=0.9.1,<0.10"} | ||
tasks = {benchmark = "cd exactextractr; for path in \"${i}\"/*; do; echo \"$path\"; Rscript \"$path\"; done"} | ||
|
||
[feature.py-rasterio] | ||
dependencies = {rasterio = ">=1.3.7,<2"} | ||
tasks = {benchmark = "cd rasterio; for path in \"${i}\"/*; do; echo \"$path\"; python3 \"$path\"; done"} | ||
|
||
[feature.py-rasterstats] | ||
dependencies = {rasterstats = ">=0.19.0,<0.20"} | ||
tasks = {benchmark = "cd rasterstats; for path in \"${i}\"/*; do; echo \"$path\"; python3 \"$path\"; done"} | ||
|
||
[feature.py-rioxarray] | ||
dependencies = {rioxarray = ">=0.17.0,<0.18"} | ||
tasks = {benchmark = "cd rioxarray; for path in \"${i}\"/*; do; echo \"$path\"; python3 \"$path\"; done"} | ||
|
||
[feature.julia-rastersjl] | ||
dependencies = {julia = ">=1.9.3,<2"} | ||
tasks = {benchmark = "cd rasters_jl; for path in \"${i}\"/*; do; echo \"$path\"; BENCHMARKING=true julia \"$path\"; done"} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,14 @@ | ||
[deps] | ||
ArchGDAL = "c9ce4bd3-c3d5-55b8-8973-c0e20141b8c3" | ||
Chairmarks = "0ca39b1e-fe0b-4e98-acfc-b1656634c4de" | ||
DelimitedFiles = "8bb1440f-4735-579b-a4ab-409b98df4dab" | ||
DimensionalData = "0703355e-b756-11e9-17c0-8b28908087d0" | ||
Extents = "411431e0-e8b7-467b-b5e0-f676ba4f2910" | ||
GeoDataFrames = "62cb38b5-d8d2-4862-a48e-6a340996859f" | ||
GeoInterface = "cf35fbd7-0cd7-5166-be24-54bfbe79505f" | ||
GeometryOps = "3251bfac-6a57-4b6d-aa61-ac1fef2975ab" | ||
NCDatasets = "85f8d34a-cbdd-5861-8df4-14fed0d494ab" | ||
Proj = "c94c279d-25a6-4763-9509-64d165bea63e" | ||
Rasters = "a3a2b9e3-a471-40c9-b274-f788e487c689" | ||
Shapefile = "8e980c4a-a4fe-5da2-b3a7-4b4b0353a2f4" | ||
Statistics = "10745b16-79ce-11e8-11f9-7d13ad32a3b2" |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
using Rasters, ArchGDAL, GeoDataFrames, Extents | ||
using Chairmarks | ||
|
||
include("utils.jl") | ||
|
||
# Load the data | ||
data_dir = joinpath(dirname(@__DIR__), "data") | ||
points_df = GeoDataFrames.read(joinpath(data_dir, "vector", "points.gpkg")) | ||
|
||
raster_dir = joinpath(data_dir, "LC08_L1TP_190024_20200418_20200822_02_T1") | ||
raster_files = filter(endswith(".TIF"), readdir(raster_dir; join=true)) | ||
band_names = (:B1, :B10, :B11, :B2, :B3, :B4, :B5, :B6, :B7, :B9) | ||
rast = Raster(RasterStack(raster_files; name=band_names, lazy=true)) | ||
|
||
ext = Extent(X=(598500, 727500), Y=(5682100, 5781000)) | ||
benchmark = @be crop($rast; to=$ext) seconds=5 | ||
|
||
write_benchmark_as_csv(benchmark; task = "crop", digits = 20) # this is necessary since `crop` takes O(200 ns) on some machines. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
using Rasters, ArchGDAL, NCDatasets, GeoDataFrames | ||
using Chairmarks | ||
|
||
include("utils.jl") | ||
|
||
# Load the data | ||
data_dir = joinpath(dirname(@__DIR__), "data") | ||
points_df = GeoDataFrames.read(joinpath(data_dir, "vector", "points.gpkg")) | ||
|
||
raster_dir = joinpath(data_dir, "LC08_L1TP_190024_20200418_20200822_02_T1") | ||
raster_files = filter(endswith(".TIF"), readdir(raster_dir; join = true)) | ||
band_names = (:B1, :B10, :B11, :B2, :B3, :B4, :B5, :B6, :B7, :B9) | ||
|
||
Rasters.checkmem!(false) # Avoid memory check bug on some machines | ||
raster = Raster(RasterStack(raster_files; name=band_names)) | ||
# Downsample from 30m to 90m (1/3 of the original resolution) | ||
benchmark = @be Rasters.resample($raster; res=90) seconds=60 | ||
|
||
write_benchmark_as_csv(benchmark; task = "downsample") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
using Rasters, ArchGDAL, GeoDataFrames | ||
using Chairmarks | ||
|
||
include("utils.jl") | ||
|
||
# Load the data | ||
data_dir = joinpath(dirname(@__DIR__), "data") | ||
points_df = GeoDataFrames.read(joinpath(data_dir, "vector", "points.gpkg")) | ||
|
||
raster_dir = joinpath(data_dir, "LC08_L1TP_190024_20200418_20200822_02_T1") | ||
raster_files = filter(endswith(".TIF"), readdir(raster_dir; join = true)) | ||
band_names = (:B1, :B10, :B11, :B2, :B3, :B4, :B5, :B6, :B7, :B9) | ||
# Extraction makes more sense from a stack than a raster, | ||
# as you get separate layers by name in the result | ||
rstack = RasterStack(raster_files; name=band_names) | ||
|
||
benchmark = @be extract($rstack, $points_df) seconds=60 | ||
|
||
write_benchmark_as_csv(benchmark; task = "extract-points") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,17 @@ | ||
using Rasters, ArchGDAL, GeoDataFrames | ||
using Chairmarks | ||
|
||
include("utils.jl") | ||
|
||
# Load the data | ||
data_dir = joinpath(dirname(@__DIR__), "data") | ||
points_df = GeoDataFrames.read(joinpath(data_dir, "vector", "points.gpkg")) | ||
|
||
raster_dir = joinpath(data_dir, "LC08_L1TP_190024_20200418_20200822_02_T1") | ||
raster_files = filter(endswith(".TIF"), readdir(raster_dir; join=true)) | ||
|
||
Rasters.checkmem!(false) # make sure that Rasters does not error because it thinks there isn't enough memory | ||
# do something | ||
benchmark = @be RasterStack($raster_files) seconds=10 | ||
|
||
write_benchmark_as_csv(benchmark; task="load") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,21 @@ | ||
using Rasters, ArchGDAL, GeoDataFrames | ||
using Chairmarks | ||
|
||
include("utils.jl") | ||
|
||
# Load the data | ||
data_dir = joinpath(dirname(@__DIR__), "data") | ||
points_df = GeoDataFrames.read(joinpath(data_dir, "vector", "points.gpkg")) | ||
|
||
raster_dir = joinpath(data_dir, "LC08_L1TP_190024_20200418_20200822_02_T1") | ||
raster_files = filter(endswith(".TIF"), readdir(raster_dir; join=true)) | ||
# TODO: replace_missing will not be needed in the next Rasters version | ||
red = replace_missing(Raster(raster_files[5])) | ||
nir = replace_missing(Raster(raster_files[6])) | ||
|
||
# do something | ||
get_ndvi(red, nir) = (nir .- red) ./ (nir .+ red) | ||
|
||
benchmark = @be get_ndvi($red, $nir) seconds=15 | ||
|
||
write_benchmark_as_csv(benchmark; task = "ndvi") |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,156 @@ | ||
if ENV["BENCHMARKING"] == "true" | ||
exit(0) | ||
end | ||
|
||
using CairoMakie, DelimitedFiles, Statistics | ||
|
||
results_path = joinpath(dirname(@__DIR__), "results") | ||
results_files = readdir(results_path; join = false) | ||
regex = r"([\w-]+)-(?!-)(\w+).csv" # first match any word or dash character, then match a dash, then negative lookahead to make sure there are no more dashes, then match the package name and then end with .csv | ||
regex_matches = match.(regex, results_files) | ||
results = map(filter(!isnothing, regex_matches)) do match | ||
task, package = match.captures | ||
data = DelimitedFiles.readdlm(joinpath(results_path, "$task-$package.csv"), ';', header = true) | ||
nums = try | ||
parse.(Float64, replace.(data[1][:, end], ("," => ".",))) | ||
catch e | ||
println(match) | ||
rethrow(e) | ||
end | ||
(task, package, nums) | ||
end | ||
|
||
|
||
f = Figure() | ||
|
||
for (idx, task) in enumerate(unique(first.(results))) | ||
records = filter(r -> r[1] == task, results) | ||
colors = fill(Makie.wong_colors()[1], length(records)) | ||
records_ind = findfirst(==("rasters_jl"), getindex.(records, 2)) | ||
if !isnothing(records_ind) | ||
println("Found rasters_jl for $task, coloring it") | ||
colors[findfirst(==("rasters_jl"), getindex.(records, 2))] = Makie.wong_colors()[3] | ||
end | ||
a, p = barplot(f[idx, 1], | ||
1:length(records), | ||
Statistics.median.(last.(records)); | ||
color = colors, | ||
direction = :x, | ||
axis = (; | ||
title = task, | ||
yticks = (1:length(records), getindex.(records, 2)), | ||
ylabel = "Package", | ||
xlabel = "Median time (s)", | ||
) | ||
) | ||
end | ||
|
||
resize!(f, 800, 1500) | ||
f | ||
|
||
# summary plot | ||
|
||
|
||
using MakieTeX # to render SVG | ||
# get language logo SVGs | ||
language_logo_url(lang::String) = "https://cdn.jsdelivr.net/gh/devicons/devicon@latest/icons/$(lowercase(lang))/$(lowercase(lang))-plain.svg" | ||
language_marker_dict = Dict( | ||
[ | ||
key => MakieTeX.SVGDocument(read(download(language_logo_url(key)), String)) | ||
for key in ("C", "Go", "javascript", "julia", "python", "r") | ||
] | ||
) | ||
language_marker_dict["r"] = MakieTeX.SVGDocument(read(download("https://raw.githubusercontent.com/file-icons/icons/master/svg/R.svg"), String)); | ||
language_marker_dict["python"] = MakieTeX.SVGDocument(read(download("https://raw.githubusercontent.com/file-icons/MFixx/master/svg/python.svg"), String)); | ||
|
||
# create a map from package name to marker | ||
marker_map = Dict( | ||
# R packages | ||
"raster" => language_marker_dict["r"], | ||
"exactextractr" => language_marker_dict["r"], | ||
"terra" => language_marker_dict["r"], | ||
"stars" => language_marker_dict["r"], | ||
# Python package | ||
"rasterio" => language_marker_dict["python"], | ||
"rasterstats" => language_marker_dict["python"], | ||
"rasterio" => language_marker_dict["python"], | ||
"rioxarray" => language_marker_dict["python"], | ||
# Julia package | ||
"rasters_jl" => language_marker_dict["julia"], | ||
) | ||
# package name to color | ||
r_colors = Makie.to_colormap(:PuRd_5) |> reverse | ||
py_colors = Makie.to_colormap(:YlGnBu_4) |> reverse | ||
|
||
color_map = Dict( | ||
# R packages | ||
"raster" => r_colors[1], | ||
"exactextractr" => r_colors[2], | ||
"terra" => r_colors[3], | ||
"stars" => r_colors[4], | ||
# Python package | ||
"rasterio" => py_colors[1], | ||
"rasterstats" => py_colors[2], | ||
"rioxarray" => py_colors[3], | ||
# Julia package | ||
"rasters_jl" => Makie.Colors.JULIA_LOGO_COLORS.green, | ||
) | ||
|
||
result_tasks = getindex.(results, 1) .|> string | ||
result_pkgs = getindex.(results, 2) .|> string | ||
result_times = Statistics.median.(getindex.(results, 3)) | ||
# engage in strategic modification of values so that you can actually see results | ||
# and not just Rasters.jl slapping everything | ||
result_times[findfirst(map((task, package) -> task == "crop" && package == "rasters_jl", result_tasks, result_pkgs))] = minimum(result_times) | ||
|
||
using SwarmMakie # for beeswarm plots and dodging | ||
|
||
|
||
using CategoricalArrays | ||
using Makie: RGBA | ||
|
||
ca = CategoricalArray(result_tasks) | ||
|
||
f, a, p = beeswarm( | ||
ca.refs, result_times; | ||
marker = getindex.((marker_map,), result_pkgs), | ||
color = getindex.((color_map,), result_pkgs), | ||
markersize = 10, | ||
axis = (; | ||
xticks = (1:length(ca.pool.levels), ca.pool.levels), | ||
xlabel = "Task", | ||
ylabel = "Median time (s)", | ||
yscale = log10, | ||
title = "Benchmark vector operations", | ||
xgridvisible = false, | ||
xminorgridvisible = true, | ||
yminorgridvisible = true, | ||
yminorticks = IntervalsBetween(5), | ||
ygridcolor = RGBA{Float32}(0.0f0,0.0f0,0.0f0,0.05f0), | ||
), | ||
figure = (; #=backgroundcolor = RGBAf(1, 1, 1, 0)=#) | ||
) | ||
|
||
r_packages = filter(x -> marker_map[x] == language_marker_dict["r"], keys(marker_map)) |> collect | ||
py_packages = filter(x -> marker_map[x] == language_marker_dict["python"], keys(marker_map)) |> collect | ||
julia_packages = filter(x -> marker_map[x] == language_marker_dict["julia"], keys(marker_map)) |> collect | ||
|
||
r_markers = [MarkerElement(; color = color_map[package], marker = marker_map[package], markersize = 12) for package in r_packages] | ||
py_markers = [MarkerElement(; color = color_map[package], marker = marker_map[package], markersize = 12) for package in py_packages] | ||
julia_markers = [MarkerElement(; color = color_map[package], marker = marker_map[package], markersize = 12) for package in julia_packages] | ||
|
||
leg = Legend( | ||
f[1, 2], | ||
[r_markers, py_markers, julia_markers], | ||
[r_packages, py_packages, julia_packages], | ||
["R", "Python", "Julia"], | ||
tellheight = false, | ||
tellwidth = true, | ||
gridshalign = :left, | ||
) | ||
resize!(f, 650, 450) | ||
a.backgroundcolor[] = RGBAf(1, 1, 1, 0) | ||
leg.backgroundcolor[] = RGBAf(1, 1, 1, 0) | ||
a.xticklabelrotation[] = pi/4 | ||
p.markersize[] = 13 | ||
f |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be possible to use Rasters.aggregate here instead? That would work if we are resampling to exactly 1/3 of the resolution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think this was just copied from python code and we left it as resample. But for the actual problem I'm not sure why they would use
resample
when you can justaggregate
X/Y by 3.