<- readLAScatalog("data/ENGINE/catalog/")
ctg plot(ctg)
15 LAScatalog processing engine (2/2)
Chapter 14 showed how to use the LAScatalog
processing engine to apply lidR
functions on a collection of files. This included how to process acquisitions in user defined chunk sizes, loading buffers on-the-fly, saving the output to disk, and parallelizing tasks. These examples all used existing functions provided with the lidR
package. The power of the engine goes beyond limited used cases, allowing developers to design their own applications.
Let’s imagine we want to normalize a data set and colorize it using RGB aerial data. According to the previous sections we could do something like this:
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_normalized") # specify output normalized las disk save location
<- normalize_height(ctg, tin()) # normalize
ctg_norm
# loop through catalog files and apply rgb colour
<- terra::rast("rgb.tif")
rgb for (file in ctg$filename)
{<- tools::file_path_sans_ext(basename(file))
newname <- paste0("folder/", newname, "_colored.laz")
newname <- readLAS(file)
las <- merge_spatial(las, rgb)
las writeLAS(las, newname)
}
This method is however far from optimal. First, the whole point cloud is read two times:
- To normalize
- To colorize
This takes a significant amount of time. It also implies the need to create copies of the acquisition after each processing step, taking up a lot of disk storage. Then, to finish, a custom for
loop is used because merge_spatial()
cannot be applied on a LAScatalog
natively.
What if we were able to do that in a single run?
15.1 catalog_apply()
In this section we outline the catalog_apply()
function, which is used subliminally in every function seen in the Chapter 14. Here we show how to leverage its versatility to design tools that do not yet exist, like the normalization + colorization example from above, but also more innovative processes.
Again, the LAScatalog
class and the LAScatalog
engine are deeply documented in a two dedicated vignettes available here and here. The purpose of these examples is to provide alternative, hands-on documentation with more applied examples and real use cases.
In this section we will use the same 9 files used in Chapter 14 and an RGB raster that encompasses the collection of files.
The RGB image is available for download here.
<- terra::rast("data/ENGINE/catalog/rgb.tif")
rgbmap ::plotRGB(rgbmap) terra
15.2 Create user-defined function
First we need to create a function that combines the normalization + colorization steps. Let’s call it norm_color()
. It’s simple and uses only 2 functions.
<- function(las, rgbmap) # create user-defined function
norm_color
{<- normalize_height(las, tin()) # normalize
nlas <- merge_spatial(nlas, rgbmap) # colorize
colorized return(colorized) # output
}
Let’s try it on a sample plot.
<- clip_circle(ctg, 338800, 5238500, 40)
las <- norm_color(las, rgbmap) # apply user defined function
nlasrgb
plot(nlasrgb, color = "RGB", bg = "white", size = 6) # some plotting
It works! We have a workable function that performs on a point cloud. Remember that norm_color()
is just an example. You could instead choose to make an algorithm for individual tree detection or a method to sample point of interest, let your imagination run wild!
So far it works only when using a LAS
object. Now let’s make it working with a LAScatalog
.
15.3 Create an intermediate function for catalog_apply()
The simple way is:
<- catalog_map(ctg, norm_color, rgbmap = rgbmap) new_ctg
However, here we will learn the hard way in order to understand better how it works internally and understand the full potential of catalog_apply()
for which catalog_map()
is only a simplified version.
The core engine is the function catalog_apply()
but this function does not work with any user-defined function. User-defined functions must respect a specific template (take a look at the documentation in R) and is expected to perform some specific tasks. First, the primary variable must be a chunk from the catalog (see chunk
below), the function must read the chunk, check that it is not empty, then perform the computation.
A valid function is the following:
<- function(chunk, rgbmap) # user defined function
norm_color_chunk
{<- readLAS(chunk) # read the chunk
las if (is.empty(las)) return(NULL) # check if it actually contain points
<- norm_color(las, rgbmap) # apply computation of interest
nlasrgb return(nlasrgb) # output
}
This introduces an additional level of complexity that is crucial. First, catalog chunks will be sequentially fed into the user’s function. Each chunk is read inside the user-defined function with all processing options being automatically respected (filter
, select
, chunk_size
, chunk_buffer
etc.). The las
variable in the code snippet above is a point cloud extracted from the catalog that contains the chunks + a buffer. It’s important to check that the loaded portion of the collection is not empty or subsequent code will likely fail. This may happen depending on the size of the chunk and the filter
options chosen.
This function is now catalog_apply()
compatible and can be applied over the entire ALS acquisition. The output will be a point cloud, so we need to pay attention mitigate memory issues and save to disk. At this stage, we haven’t mitigated that problem yet so we add it below.
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb") # write to disk
<- catalog_apply(ctg, norm_color_chunk, rgbmap = rgbmap) # implement user-defined function using catalog_apply
output str(output)
#> List of 9
#> $ : chr "/tmp/RtmpTQwSlm/tiles_338000_5238000_1_norm_rgb.las"
#> $ : chr "/tmp/RtmpTQwSlm/tiles_338000_5238500_1_norm_rgb.las"
#> $ : chr "/tmp/RtmpTQwSlm/tiles_338000_5239000_1_norm_rgb.las"
#> $ : chr "/tmp/RtmpTQwSlm/tiles_338500_5238000_1_norm_rgb.las"
#> $ : chr "/tmp/RtmpTQwSlm/tiles_338500_5238500_1_norm_rgb.las"
#> $ : chr "/tmp/RtmpTQwSlm/tiles_338500_5239000_1_norm_rgb.las"
#> $ : chr "/tmp/RtmpTQwSlm/tiles_339000_5238000_1_norm_rgb.las"
#> $ : chr "/tmp/RtmpTQwSlm/tiles_339000_5238500_1_norm_rgb.las"
#> $ : chr "/tmp/RtmpTQwSlm/tiles_339000_5239000_1_norm_rgb.las"
We can see that the output is a list
with the name of each file written to disk. This is the default behavior of the engine. It returns a list
with one element per chunk. This isn’t really convenient. We have seen in Chapter 14 that in the case of LAS
files we can return a LAScatalog
, which is far more convenient.
We can use the option automerge = TRUE
so catalog_apply()
will automatically merge the list into something more user-friendly.
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb")
<- list(automerge = TRUE) # merge all the outputs
options <- catalog_apply(ctg, norm_color_chunk, rgbmap = rgbmap, .options = options)
output
output#> class : LAScatalog (v1.0 format 3)
#> extent : 338000, 339500, 5238000, 5239500 (xmin, xmax, ymin, ymax)
#> coord. ref. : WGS 84 / UTM zone 19N
#> area : 2.62 km²
#> points : 1.07 million points
#> density : 0.4 points/m²
#> density : 0.2 pulses/m²
#> num. files : 9
We now have a LAscatalog
. It is much better but:
plot(output)
We still have a problem here. The files are overlapping because we read each file with a buffer and then the outputs have been written with their buffer. This is bad practice. It’s important to always remove the buffer after the computation. In this specific case, the output is a LAS
file. When readLAS()
reads a chunk from the catalog the points in the buffer are flagged so they can be easily manipulated. Since they are flagged, we can easily remove these points by adding this step to our user-defined function. It is always the role of the user to remove the buffer of the output.
<- function(chunk, rgbmap)
norm_color_chunk
{<- readLAS(chunk)
las if (is.empty(las)) return(NULL)
<- norm_color(las, rgbmap)
nlasrgb <- filter_poi(nlasrgb, buffer == 0) # remove buffer
nlasrgb return(nlasrgb)
}
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb")
<- list(automerge = TRUE)
options <- catalog_apply(ctg, norm_color_chunk, rgbmap = rgbmap, .options = options)
output plot(output)
When the outputs are spatial vectors or spatial rasters they can be cropped using the bounding box of the chunk accessible via extent(chunk)
or bbox(chunk)
or st_bbox(chunk)
or ext(chunk)
that match respectively with raster
, sp
, sf/stars
and terra
.
Nice! We created a custom function – norm_color()
– and we upscale its definition to capably and efficiently process an entire ALS acquisition made up of hundreds of files. Using options described in Chapter 14, we can also introduce parallelization, chunk size control, and buffer size control.
library(future)
plan(multisession)
opt_filter(ctg) <- "-keep_class 2"
opt_chunk_size(ctg) <- 300
opt_chunk_buffer(ctg) <- 40
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb")
<- list(automerge = TRUE)
options <- catalog_apply(ctg, norm_color_chunk, rgbmap = rgbmap, .options = options) output
We can check that it worked by loading a sample from somewhere in our new catalog. According to the options put above, we are expecting to get a normalized (we normalized), colored (we merged RGB image), ground points (we used -keep_class 2
).
opt_output_files(output) <- ""
<- clip_circle(output, 338800, 5238500, 60)
las plot(las, color = "RGB", size = 6, bg = "white")
15.4 Make a user-friendly function for third party users
When designing tools that will be used by third parties, we would like to hide catalog_apply()
and intermediate functions inside a more high level and user-friendly function similarly to lidR
functions that work the same both with a LAS
or a LAScatalog
. Moreover we would like to make the function foolproof for users. To do this we can create wrap our codes if if-else
statements to apply to correct code to the correct object.
# Create a generic function
<- function(las, rgbmap)
norm_color
{if (is(las, "LAScatalog")) {
<- list(automerge = TRUE)
options <- catalog_apply(ctg, norm_color, rgbmap = rgbmap, .options = options)
output return(output)
else if (is(las, "LAScluster")) {
} <- readLAS(las)
x if (is.empty(x)) return(NULL)
<- norm_color(x, rgbmap)
nlasrgb <- filter_poi(nlasrgb, buffer == 0)
nlasrgb return(nlasrgb)
else if (is(las, "LAS")) {
} <- normalize_height(las, tin())
nlas <- merge_spatial(nlas, rgbmap)
colorized return(colorized)
else {
} stop("Not supported input")
} }
We now have a single function that can be used seamlessly on a LAS
or a LAScatalog
object.
<- norm_color(las, rgbmap)
las_norm_colored
opt_output_files(ctg) <- paste0(tempdir(), "/{*}_norm_rgb")
<- norm_color(ctg, rgbmap) ctg_norm_colored
The different steps to apply to a LAScluster
(i.e. readLAS
+ remove buffer of the output) are, in this case, very common. It is possible to use catalog_map()
instead, that is capable of doing this tasks for you. But in some cases the output may be not supported, because the cropping step is more complex. In those cases, catalog_apply()
must be used because it allows for a more accurate control.
# Create a generic function
<- function(las, rgbmap)
norm_color
{if (is(las, "LAScatalog")) {
<- catalog_map(ctg, norm_color, rgbmap = rgbmap, .options = options)
output return(output)
else if (is(las, "LAS")) {
} <- normalize_height(las, tin())
nlas <- merge_spatial(nlas, rgbmap)
colorized return(colorized)
else {
} stop("Not supported input")
} }
15.5 Make a safe function for third party users
At this stage our function is almost finished. However it is not foolproof. What if the user of this new function does not provide any output path template? The point cloud in each chunk will be loaded in memory and will be retained in memory until it eventually becomes full and R crashes. We must prevent the use of the function if no path to the disk is given.
Also, the computation of a DTM requires a buffer. If the user sets a 0 m buffer the output of this function will be incorrect. We must prevent the use of the function if a buffer of 0 is given. These two cases are covered by the engine with the options need_output_file = TRUE
and need_buffer = TRUE
. To finish we would like to disable the ability to tune the select
option to ensure no attribute is lost.
<- function(las, rgbmap)
norm_color
{if (is(las, "LAScatalog")) {
opt_select(las) <- "*" # disable select tuning
<- list(automerge = TRUE, need_output_file = TRUE, need_buffer = TRUE) # require output path & buffer size
options <- catalog_map(ctg, norm_color, rgbmap = rgbmap, .options = options)
output return(output)
else if (is(las, "LAS")) {
} <- normalize_height(las, tin())
nlas <- merge_spatial(nlas, rgbmap)
colorized return(colorized)
else {
} stop("Not supported input")
} }
We can test what happens if we use this function naively.
opt_output_files(ctg) <- ""
<- norm_color(ctg, rgbmap)
ctg_norm_colored #> Error: This function requires that the LAScatalog provides an output file template.
It failed with an informative message!
We are done. This is exactly how the lidR
package works and we have presented all the tools needed to extend it with new applications.
More examples can be found further along in this book. For example, Section 17.1.1 presents how to compute a raster of the average distance between first and last returns, and Section 17.1.2 presents how to compute a rumple index of the canopy from the point cloud and, of course, the documentation of the package itself is more comprehensive than this book.