Discrete-return ALS sensors record various types of data. Primarily, they capture the position of each point in three dimensions (X, Y, Z), along with additional information such as its intensity, its position in the return sequence, and the beam incidence angle. Reading, writing, and efficiently storing these data are critical steps before any subsequent analysis.
ALS data are most commonly distributed in the LAS format, which is specifically designed to store ALS data in a standardized way. The format is officially documented and maintained by the ASPRS (American Society for Photogrammetry and Remote Sensing). However, LAS files require a large amount of storage space because they are not compressed. LAZ, a lossless compressed version of LAS, has become the standard compression scheme because it is free and open-source.
The widespread use, standardization, and open-source nature of the LAS and LAZ formats motivated the development of the lidR package. This package is designed to read and write LAS and LAZ files, leveraging the LASlib and LASzip C++ libraries via the rlas package.
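To make the storage difference concrete, the sketch below writes the same point cloud once as LAS and once as LAZ and compares the file sizes. It assumes lidR is installed and uses the Megaplot.laz sample file shipped with the package; writeLAS() chooses compression from the file extension.

```r
library(lidR)

# Sample file shipped with lidR (assumption: any LAS/LAZ file would do)
file <- system.file("extdata", "Megaplot.laz", package = "lidR")
las  <- readLAS(file)

# Write the same points uncompressed (.las) and compressed (.laz);
# the output format is inferred from the file extension
las_file <- tempfile(fileext = ".las")
laz_file <- tempfile(fileext = ".laz")
writeLAS(las, las_file)
writeLAS(las, laz_file)

# Compare on-disk sizes in megabytes
sizes <- c(LAS = file.size(las_file), LAZ = file.size(laz_file)) / 1024^2
print(round(sizes, 2))
```

Because LAZ is lossless, reading the .laz file back yields the same points as the original.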
The function readLAS() reads a LAS or LAZ file and returns an object of class LAS. The LAS formal class is documented in detail in a dedicated vignette. To briefly summarize, a LAS file consists of two parts:
The header, which stores summary information about its content, including the bounding box of the file, coordinate reference system, and point format.
The payload, i.e., the point cloud itself.
The function readLAS() reads a file and creates an object that contains both the header and the payload.
las <- readLAS("file.las")
When printed, it displays a summary of its content.
print(las)
class : LAS (v1.2 format 1)
memory : 4.4 Mb
extent : 684766.4, 684993.3, 5017773, 5018007 (xmin, xmax, ymin, ymax)
coord. ref. : NAD83 / UTM zone 17N
area : 51572 m²
points : 81.6 thousand points
type : airborne
density : 1.58 points/m²
density : 1.08 pulses/m²
For a more detailed printout of the data, use the function summary() instead of print().
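The header and the payload can also be inspected directly. A minimal sketch, assuming lidR's bundled Megaplot.laz sample file; it accesses the @header and @data slots of the LAS object, where the payload is stored as a data.table.

```r
library(lidR)

file <- system.file("extdata", "Megaplot.laz", package = "lidR")
las  <- readLAS(file)

summary(las)            # detailed printout, including header content

# Header: summary information from the public header block
phb <- las@header@PHB
phb[["Point Data Format ID"]]
phb[["Number of point records"]]

# Payload: the point cloud itself, one row per point
head(las@data)
nrow(las@data)
```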
Parameter select
A LAS file stores the X, Y, Z coordinates of each point as well as many other data such as intensity, incidence angle, and return sequence position. These data are called attributes. In practice, many attributes are not needed for a given analysis but are loaded by default. This can consume a lot of memory because R does not allow the user to choose data storage modes: for example, R has no 8-bit or 16-bit integer types, so an attribute stored on 1 byte in the file occupies 4 bytes in memory (see this vignette for more details).
To save memory, readLAS() can take an optional parameter select, which enables the user to selectively load the attributes of interest. For example, one can choose to load only the X, Y, Z attributes.
las <- readLAS("file.las", select = "xyz")  # load XYZ only
las <- readLAS("file.las", select = "xyzi") # load XYZ and intensity only
Examples of other attribute abbreviations are: t - gpstime, a - scan angle, n - number of returns, r - return number, c - classification, s - synthetic flag, k - keypoint flag, w - withheld flag, o - overlap flag (format 6+), u - user data, p - point source ID, e - edge of flight line flag, d - direction of scan flag
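The effect of select can be checked by comparing which columns are loaded and how much memory the payload uses. A sketch assuming the Megaplot.laz sample file shipped with lidR; object.size() gives only an approximate in-memory size.

```r
library(lidR)

file <- system.file("extdata", "Megaplot.laz", package = "lidR")

las_all  <- readLAS(file)                   # every attribute
las_xyzi <- readLAS(file, select = "xyzi")  # X, Y, Z and Intensity only

names(las_all@data)
names(las_xyzi@data)

# Approximate memory used by each payload
format(object.size(las_all@data), units = "Kb")
format(object.size(las_xyzi@data), units = "Kb")
```

Both objects contain the same number of points; only the number of columns, and therefore the memory footprint, differs.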
Parameter filter
While select enables the user to choose “columns” (or attributes) while reading files, filter allows selection of “rows” (or points) during the reading process. Removing superfluous data at read time saves memory and increases computation speed. For example, it’s common practice in forestry to process only the first returns.
las <- readLAS("file.las", filter = "-keep_first") # read only first returns
It is important to understand that the filter option in readLAS() keeps or discards points at read time, i.e., while the file is being read at the C++ level, without involving any R code. The R function filter_poi() can return the same output, but only after every point has been loaded into R.
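The two approaches can be compared side by side. A minimal sketch, assuming lidR's Megaplot.laz sample file: the first call filters while reading, the second loads every point and then filters in R with filter_poi().

```r
library(lidR)

file <- system.file("extdata", "Megaplot.laz", package = "lidR")

# (1) Keep first returns while reading (C++ level, via LASlib)
las1 <- readLAS(file, filter = "-keep_first")

# (2) Read every point, then keep first returns in R
las2 <- readLAS(file)
las2 <- filter_poi(las2, ReturnNumber == 1L)

# Both contain the first returns, at very different memory costs
nrow(las1@data)
nrow(las2@data)
```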
Consider (1) reading only the first returns with filter = "-keep_first" versus (2) reading all the points and then keeping the first returns in R with filter_poi(las, ReturnNumber == 1L). Both outputs are strictly identical, but the first method is faster and more memory-efficient because it does not load the entire file into R and avoids using extra processing memory. It should always be preferred when possible. Multiple filter commands can be combined, for example, to read only the first returns between 5 and 50 meters:
las <- readLAS("file.las", filter = "-keep_first -drop_z_below 5 -drop_z_above 50")
The full list of available commands can be obtained by using readLAS(filter = "-help"). Users of LAStools may recognize these commands, as both LAStools and lidR use the same libraries (LASlib and LASzip) to read and write LAS and LAZ files.
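Because the filter is applied while reading, its effect can be verified in R afterwards. A sketch assuming the height-normalized MixedConifer.laz sample file shipped with lidR, so that Z values are heights above ground rather than elevations.

```r
library(lidR)

file <- system.file("extdata", "MixedConifer.laz", package = "lidR")

# First returns between 5 and 50 m, selected while reading
las <- readLAS(file, filter = "-keep_first -drop_z_below 5 -drop_z_above 50")

range(las@data$Z)              # should lie within [5, 50]
table(las@data$ReturnNumber)   # first returns only

# readLAS(filter = "-help") prints every available filter command
```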
2.2 Validating LiDAR Data
An important first step in ALS data processing is ensuring that your data is complete and valid according to the ASPRS LAS specifications. Users commonly report bugs arising from invalid data. This is why we introduced the las_check() function to perform a thorough inspection of LAS objects. This function checks whether a LAS object meets the ASPRS LAS specifications and whether it is valid for processing, providing warnings if it does not.
A common issue is that a LAS file contains duplicate points, which can lead to problems such as trees being detected twice, invalid metrics, or errors in DTM generation. Other frequent issues include invalid return numbers, inconsistent ReturnNumber and NumberOfReturns attributes, and invalid coordinate reference systems. Always run the las_check() function before delving deeply into your data.
las_check(las)
#> Checking the data
#>  - Checking coordinates... ✓
#>  - Checking coordinates type... ✓
#>  - Checking coordinates range... ✓
#>  - Checking coordinates quantization... ✓
#>  - Checking attributes type... ✓
#>  - Checking ReturnNumber validity...
#>    ⚠ Invalid data: 1 points with a return number equal to 0 found.
#> [...]
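las_check() can also be used non-interactively, for instance inside a processing script. A sketch assuming the Megaplot.laz sample file; with print = FALSE the function returns a list with warnings and errors elements instead of printing the report. The helper las_is_valid() is hypothetical and only illustrates one possible policy (errors are fatal, warnings are reported).

```r
library(lidR)

file <- system.file("extdata", "Megaplot.laz", package = "lidR")
las  <- readLAS(file)

# Run the full check silently and keep the report
report <- las_check(las, print = FALSE)
report$warnings   # character vector of warnings (possibly empty)
report$errors     # character vector of errors (possibly empty)

# Hypothetical helper: TRUE when las_check() reports no errors
las_is_valid <- function(las) {
  rep <- las_check(las, print = FALSE)
  if (length(rep$warnings) > 0) {
    message(paste(rep$warnings, collapse = "\n"))
  }
  length(rep$errors) == 0
}

las_is_valid(las)
```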
readLAS() also performs a check at read time, but for computation time reasons it is less thorough than las_check(). For example, duplicated points are not checked at read time.
las <- readLAS("data/chap1/corrupted.laz")
Warning: Invalid data: 174638 points with a 'return number' greater than the 'number of returns'.
2.3 Reading COPC and Remote COPC Files
COPC (Cloud Optimized Point Cloud) files are standard LAZ files with special metadata and a specific point ordering that enables efficient partial reads over a network. They usually have the extension .copc.laz.
With rlas (v >= 1.9.0), lidR can read any LAS or LAZ file over a network, but it is preferable to work with COPC files. Everything in this book applies to remote files, but COPC files offer functionality that regular LAS and LAZ files do not. In a COPC file, it is possible to perform depth queries, i.e., to read only a subset of the point cloud at a given resolution.
For example, reading “depth 0” of a COPC file via a URL provides the coarsest resolution. Even when the original file is large, this can be virtually instantaneous because only around 60,000 points out of millions are downloaded.
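A sketch of such a depth query. The URL is a hypothetical placeholder, not a real file, and copc_depth_filter() is a hypothetical helper that only builds the -max_depth filter string passed to readLAS(). Reading a remote file requires rlas >= 1.9.0 and network access, so the actual read is left commented out.

```r
library(lidR)

# Hypothetical helper: build the filter string for a COPC depth query
copc_depth_filter <- function(depth) {
  stopifnot(depth >= 0, depth == round(depth))
  sprintf("-max_depth %d", as.integer(depth))
}

# Hypothetical placeholder URL: replace with the address of a real .copc.laz file
url <- "https://example.com/path/to/file.copc.laz"

# Depth 0 is the coarsest level; larger depths download more points
# las <- readLAS(url, filter = copc_depth_filter(0))
```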
We can load more points using, for example, filter = "-max_depth 1", but download time increases with the number of points to be read. You can find more about the COPC format at https://copc.io/.
Since COPC is a natively spatially indexed point cloud format, spatial queries with functions like clip_roi() (not presented in this book) are fast both locally and remotely. It is also possible to use readLAScatalog() on remote files (see Chapter 15).