
Exercise 1

Part C: Prepare your data – extracting what you need

Step-by-step instructions for this part of the exercise can be found here.

For the purposes of our exercise, we are interested, at a minimum, in the foundations of the buildings, as we wish to understand what it would be like to navigate the built space. We have also decided to work with meshes rather than point clouds: while point clouds are aesthetically pleasing and technically the most accurate part of a 3D dataset, they are not the traditional basic unit of construction used in 3D modelling. Like the SfM photogrammetry models you viewed in the previous step, a mesh is formed when triangular polygons are generated between the points of a point cloud until a ‘solid’ surface is created; the points of the cloud become the vertices of the mesh. To progress further, then, we need to translate our point cloud into a mesh, and conceptually that means the cloud should contain only the points we want forming the meshed surface. Meshing is essentially the software playing connect-the-dots to draw a picture: if you include dots (laser scan points) that you don’t want in the picture, the software will connect them anyway and the result will look rather strange.

On the left, a series of 'points' - on the right, the dots have been connected to form a meaningful 'mesh'.
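The figure above shows the idea visually. If it helps to see the same thing in data terms, here is a minimal sketch in Python using the open-source Open3D library (not CloudCompare itself, and with made-up coordinates): the scan points become the vertex list of the mesh, and each triangle is simply three indices into that list – the ‘connected dots’.

```python
import numpy as np
import open3d as o3d  # open-source 3D library, used here for illustration only

# Four 'laser scan points': these will become the vertices of the mesh
vertices = np.array([[0.0, 0.0, 0.0],
                     [1.0, 0.0, 0.0],
                     [1.0, 1.0, 0.0],
                     [0.0, 1.0, 0.1]])

# Each triangle is just three indices into the vertex list,
# i.e. the dots the software has chosen to connect
triangles = np.array([[0, 1, 2],
                      [0, 2, 3]])

mesh = o3d.geometry.TriangleMesh()
mesh.vertices = o3d.utility.Vector3dVector(vertices)
mesh.triangles = o3d.utility.Vector3iVector(triangles)
mesh.compute_vertex_normals()

o3d.visualization.draw_geometries([mesh])  # opens an interactive viewer
```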

For most projects, you won’t need or want everything in the point cloud, as we noted above. Begin to prepare your data by getting rid of things you don’t need and keeping those that you do need. When working with large datasets, we recommend running through these preparation steps with a small sample area the first time through to make sure the results are satisfactory. Once you’re happy with the process, you would go back and run it on the whole dataset. We’ll follow this ‘test, assess, deploy’ strategy throughout this exercise.

Step 1: Segment your test region

A screenshot of slice 1 of the Malthi dataset visualised with EDL shader and height values as a scalar field. This will be the target area for sampling, as it has grass, trees, and structures in a small area.

A screenshot of Slice 1 of the Malthi dataset. A faint green box generated by the segment tool is visible around the data in the middle of the screen.

Selecting 'Segment in' (the button surrounded by a large red square in this image) generates a preview of your selected test dataset. Clicking the green tick button (three buttons to the left of the 'Segment in' button) approves this division and puts the rest of the data on a different layer.

Step 2: Set your Level of Detail

What spatial resolution is needed for your project? What is manageable? This may be a compromise! Reduce the resolution to the level of detail you need. Keep in mind that the total Malthi dataset exceeds 650 million points, Profile Slice 1 contains over 121 million points, and the segment generated in the image above contains about 39 million points. The CloudCompare meshing algorithm will generate roughly twice as many mesh faces as there are points, and meshes of 10 million faces are often difficult to handle on mid-range computers, so this segment is still quite a substantial dataset! Our next option for thinning the point cloud is to subsample it, which decreases the number of points according to one of several available algorithms.
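If you prefer to script this kind of thinning, the sketch below shows the same idea outside CloudCompare, using Python with the open-source Open3D library; the file name and voxel size are placeholders you would adjust for your own data and chosen level of detail.

```python
import open3d as o3d

# Load one segment of the point cloud (hypothetical file name)
pcd = o3d.io.read_point_cloud("malthi_slice1_segment.ply")
print(f"Original points:   {len(pcd.points):,}")

# Spatial subsampling: keep roughly one point per 2 cm voxel.
# The voxel size effectively sets your level of detail.
thinned = pcd.voxel_down_sample(voxel_size=0.02)
print(f"Subsampled points: {len(thinned.points):,}")

# Alternative: keep a random 10% of points regardless of their spacing
random_thinned = pcd.random_down_sample(sampling_ratio=0.1)
```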

Step 3: Prepare your data – remove noise and unwanted data points

The next step is to clean your point cloud to remove the data representing objects you don’t want in your reconstruction. There is likely a great deal of data included that you do not want incorporated into the mesh – in this case, the patches of grass and the trees growing on the site have been preserved in the dataset. Refer back to what we said about meshes and connecting the dots. To visualise creating a mesh, imagine that you have a blanket and that there is a wall in front of you, clear of vegetation on either side. In this case, it would be easy to drape the blanket over the wall and still identify the shape of the wall. If, on the other hand, there is grass and moss growing on top of the wall, a tree growing close by, and tall grass on either side, draping your blanket across the wall may produce a lumpy ‘mesh’ and make it difficult to identify the detail of the wall.

A wall clear of most vegetation is easy to cover with a blanket, simulating the accurate creation of a 3D mesh of the wall.

A wall with significantly more vegetation surrounding it, which does not allow the blanket to lie flat, indicating that this would affect the 3D mesh of such a structure.

Think and Respond: Why is it essential to remove the vegetation from the point cloud, in this case?

Now let’s think about the character of what you are trying to remove. How different is the shape of a tuft of grass compared to a smooth stone wall? How does their difference in shape affect the organisation of the points comprising these features in the point cloud – how are the points situated in relation to one another for each feature? If the point cloud for a tuft of grass is organised quite differently from the point cloud of a stone wall, you may be able to ‘teach’ the software to automatically tell the difference between these sets of points. If they are quite similar, you will have to take a more manual approach (see Brodu and Lague 2012 and the CloudCompare documentation on the CANUPO plugin).

Think and Respond: Can you think of other features that you may want to separate in other hypothetical laser scan datasets? Are they quite similar to each other in terms of shape, or quite different? How would you describe these similarities/differences? What would the benefits be in separating these features?

Lucky for us, vegetation and walls usually look fairly different in laser scanned data, so we can attempt some automated cleaning.

There are a few automated tools to remove these unwanted features in CloudCompare.
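Before turning to those tools, it may help to see numerically why this separation is possible. The sketch below is written in Python with NumPy and Open3D and is not the actual CANUPO implementation (see Brodu and Lague 2012 for that); it computes a simple ‘planarity’ score from the eigenvalues of each point’s neighbourhood at two scales. A flat wall face scores close to 1, scattered grass scores much lower, and a classifier works by drawing a boundary through such scores.

```python
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("malthi_samples.ply")  # hypothetical sample file
pts = np.asarray(pcd.points)
tree = o3d.geometry.KDTreeFlann(pcd)

def planarity(i, radius):
    """How plane-like the neighbourhood of point i is at a given scale."""
    _, idx, _ = tree.search_radius_vector_3d(pts[i], radius)
    nbrs = pts[idx]
    if len(nbrs) < 4:
        return 0.0
    # Eigenvalues of the covariance matrix describe the spread of the points
    l3, l2, l1 = np.linalg.eigvalsh(np.cov((nbrs - nbrs.mean(axis=0)).T))
    return (l2 - l3) / max(l1, 1e-12)  # ~1 for a wall face, lower for grass

# Evaluate every point at a fine and a coarse scale, as a multi-scale
# classifier such as CANUPO would (slow in pure Python; fine for a small sample)
features = np.array([[planarity(i, 0.05), planarity(i, 0.25)]
                     for i in range(len(pts))])
```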

Image of the subsampled Malthi dataset with multiple samples chosen to represent grass and stone.

A single pebble; when considering it alone, it appears relatively smooth.

A pebble beach, which appears quite rough because of the scale at which we are viewing it.

Try it yourself! When selecting segments for your classifiers, was it easier to isolate samples before or after calculating roughness? Can you foresee any issues that may arise when applying your classifier to the rest of the dataset?

A screenshot of the Grass and Stone samples isolated from the rest of the mesh. The 'merge multiple clouds' button is highlighted with a red box.

The dialog box displaying the CANUPO classifier - the purple line represents the statistical difference between the grass and wall point clouds.

Try it yourself! How successful was your classifier at separating the point cloud into vegetation vs non-vegetation geometry?

A visualisation of how CANUPO has classified our Malthi dataset.

As you can see above, sometimes the CANUPO plugin will not catch all of the vegetation – you can also use ‘Roughness’ to remove additional vegetation.

The 'Display Params' histogram has been used to isolate the vegetation from the Malthi data.
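For those who like to see what the ‘Roughness’ scalar field is actually measuring, here is a hedged sketch in Python (NumPy and Open3D, not CloudCompare’s own code): for each point it fits a plane to the neighbours within a kernel radius and records the point’s distance from that plane, then keeps only the points below a threshold. The file names, kernel size, and threshold are placeholder values to tune for your data.

```python
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("malthi_after_canupo.ply")  # hypothetical file name
pts = np.asarray(pcd.points)
tree = o3d.geometry.KDTreeFlann(pcd)

roughness = np.zeros(len(pts))
for i, p in enumerate(pts):
    # Neighbours within the kernel radius define the local best-fit plane
    _, idx, _ = tree.search_radius_vector_3d(p, 0.10)
    nbrs = pts[idx]
    if len(nbrs) < 4:
        continue
    centroid = nbrs.mean(axis=0)
    # The plane normal is the direction of least spread (smallest singular vector)
    _, _, vt = np.linalg.svd(nbrs - centroid, full_matrices=False)
    roughness[i] = abs(np.dot(p - centroid, vt[-1]))

# Smooth masonry stays close to its local plane; grass and moss do not
cleaned = pcd.select_by_index(np.where(roughness < 0.02)[0])
o3d.io.write_point_cloud("malthi_roughness_cleaned.ply", cleaned)
```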

Below is an example of the potential results after removing vegetation via CANUPO and using Roughness to remove more vegetation.

The Malthi dataset after using roughness to identify CANUPO samples, creating a CANUPO classifier and removing the vegetation, then applying roughness again to remove further vegetation.

Once you identify your preferred workflow, take special note of this, not only for recording in your paradata, but also because you will need to apply a similar process to the rest of the Malthi dataset to create a full reconstruction of the site.

As you apply the CANUPO classifier from your first dataset to other subsections of the laser scan data, you may find that it wrongly classifies certain features. In the case illustrated below, the CANUPO classifier was defined using the northern edge of the site, which has more open areas and fewer walls than this new subsection from the centre of Malthi, where structural elements are more densely concentrated.

An area of wall from the centre of Malthi (surrounded by a yellow rectangle) has been misidentified as 'vegetation'.

In the image above, part of a wall has been misidentified as vegetation (surrounded by a yellow box; the red mesh represents geometry classified as non-vegetation features, while the blue mesh represents geometry classified as vegetation). We can be certain of this by comparing this subsampled point cloud to the layer containing the full resolution of the original laser scan data for your selection (see image below). If this is the case, you may either want to redefine your CANUPO classifier, or, if it only affects a small proportion of the site, simply remove the incorrectly classified data and append it to the non-vegetation point cloud.

The original laser scan data imposed over the sub-sampled and classified data, showing that the points classified as 'vegetation' in the previous image were in fact part of the wall.

After this stage, there will still be some stray points that need to be removed. Some of this may be ‘noise’: isolated points that appear in scan data but do not correspond to a physical feature in the scene. The vegetation we removed in previous steps is not the same as noise and would not be removed by the ‘SOR Filter’ or the ‘Noise Filter’. Both of these filters use nearest-neighbour statistics to identify whether a point sits a significant distance from the rest of the dataset, based on its distance from its nearest three (or more) points. The Noise Filter uses this method to calculate a local plane of best fit and removes points that deviate too far from that plane, while the SOR Filter rejects points that lie too far from their neighbours, based on a number of standard deviations set by the user.
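Both kinds of filter have scripted equivalents outside CloudCompare. The sketch below uses Open3D in Python; its statistical outlier removal behaves like the SOR Filter described above, and its radius-based filter is a rough stand-in for (though not identical to) the Noise Filter. The neighbour counts, thresholds, and file name are placeholders to tune.

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("malthi_vegetation_removed.ply")  # hypothetical name

# SOR-style filter: inspect each point's 20 nearest neighbours and reject the
# point if its mean neighbour distance is more than 2 standard deviations
# above the average for the whole cloud
sor_cleaned, kept = pcd.remove_statistical_outlier(nb_neighbors=20, std_ratio=2.0)

# Radius-based filter: reject points with fewer than 6 neighbours within 5 cm
final_cleaned, kept2 = sor_cleaned.remove_radius_outlier(nb_points=6, radius=0.05)
```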

As mentioned above, this will not remove all of the points we may want to discard. The vegetation from the trees may still need to be removed – either by using the CANUPO system again to create a new classifier, or by removing it manually.

The final cleaned subsection of the Malthi dataset.

Once you have finished cleaning the point cloud, you should be able to see the outline of the structures much more clearly. Now we need to create a 3D mesh from the point cloud data.

Note that in a real project, you would now save this and go back to treat the rest of your data: you would reuse the pre-trained classifier to remove vegetation from the other tiles and reuse your chosen resolution and noise-removal settings. For this exercise, we will continue to the next step.

Step 4: Prepare your data – calculate normals

Before we can generate a mesh from the laser scan data, we need to calculate the normals of the points. Normals indicate the directional trend of the surface of your data: not only are they key to building a mesh, as the software needs to know which side of the point cloud is the exterior-facing ‘surface’, but they also define how ‘light’ interacts with the mesh and affects shading. Because a single point is not a surface, the software generates a temporary TIN (triangulated irregular network) mesh from each point and its closest neighbours; the direction perpendicular to the surface of this TIN is recorded as the ‘normal’ of the point.
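If you are scripting the pipeline rather than using CloudCompare’s dialog, the same step can be sketched with Open3D in Python (a sketch only; the radius, neighbour count, and file name are assumptions): each point’s normal is taken perpendicular to a plane fitted to its neighbours, and the normals are then re-oriented so that neighbouring points agree on which side is ‘outside’.

```python
import open3d as o3d

pcd = o3d.io.read_point_cloud("malthi_cleaned_tile.ply")  # hypothetical file name

# Fit a local plane to up to 30 neighbours within 10 cm of each point and
# record the perpendicular direction as that point's normal
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.10, max_nn=30))

# Flip normals so they point consistently to one side of the surface;
# without this the mesh can come out 'inside-out' and shade incorrectly
pcd.orient_normals_consistent_tangent_plane(30)
```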

Prepare your data – create a mesh

While there are a number of software packages available that can produce a mesh from point cloud data – MeshLab among them – we will achieve the best result here by working in CloudCompare.

Step 5: Plan your Meshes

Think and Respond: To keep our datasets manageable, we will need to subdivide the Malthi dataset into smaller chunks for processing, but we will reassemble the site in the reconstruction later. Decide what you will want to have available as a discrete object. Do you want to have individual walls? Walls and surrounding ground? How will you separate buildings that share walls? Whole buildings or sections of the site? Use the published plans of the site layout to plan how these features can be divided. Crop your data appropriately using the segment tool. (Do note that your meshes will maintain their local spatial coordinates, even when exported into different software, so the spatial relationship between two walls that have been saved as different meshes will remain).

Now, make a mesh for each element. Following our ‘test, assess, deploy’ method, pick one or two to work with initially. There is a certain amount of experimentation that is involved in identifying the ideal settings to use when meshing a laser scan, as some will depend on the processing capabilities of your computer.

What is a water-tight mesh? It is a mesh that has no ‘holes’ and so consists of a single closed surface. There are a number of reasons a mesh might need to be water-tight: to avoid complications in 3D printing, to allow for certain analyses (such as calculating the volume of an object), or to avoid graphical glitches. This is a point of concern in many disciplines that work with 3D objects, so it is not surprising that CloudCompare’s meshing algorithm aims to produce a water-tight mesh.
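CloudCompare’s meshing dialog handles this for you, but as an illustration of the same family of techniques, here is a hedged sketch using Open3D’s Poisson surface reconstruction in Python, which also aims at a closed, water-tight surface. The depth parameter controls the level of detail (and the face count), and the density-based trimming removes faces the algorithm invented in areas with little supporting data. The file names and values are assumptions.

```python
import numpy as np
import open3d as o3d

pcd = o3d.io.read_point_cloud("malthi_wall_cluster.ply")  # hypothetical file name
pcd.estimate_normals()  # Poisson reconstruction requires normals

# Higher depth = finer detail, but far more faces and memory
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)

# Poisson always closes the surface, even where there were no points;
# trim the vertices (and their faces) with the least supporting data
densities = np.asarray(densities)
mesh.remove_vertices_by_mask(densities < np.quantile(densities, 0.05))

print(f"{len(mesh.triangles):,} faces")
o3d.io.write_triangle_mesh("malthi_wall_cluster_mesh.ply", mesh)
```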

The cluster of structures from our original point cloud has been meshed in this image.

Despite removing the global coordinates, the meshes produced from the Malthi laser scan data will still retain their spatial relationships to each other when exported (as long as you have ensured that the same ‘global shift’ values were applied when importing each dataset). Subsetting or tiling your dataset is a standard approach to landscape-scale or site-scale 3D datasets. The process does require planning, however, so ensure that your tiled datasets are clearly labelled and correlate to either a digital or physical plan of the site; this will help you catch any errors in orientation or coordinates that might occur in later steps. Below is an example of how the dataset can be divided into tiles (labelled A1–F3). Be sure that these are named in a planned, sensible way to make the site easier to piece together in later exercises.

An example of how you can subdivide the Malthi site dataset - ensure that you name these tiles in sensible ways to make piecing it together easy in later exercises!

n.b. In a real project, you would now go back and deploy this routine over all the point clouds you want to convert to meshes.

Think and Respond: By the end of this process, you should have a mesh of all or part of the site of Malthi. Take a closer look at the mesh – is this a sufficient level of detail for what you will need to reconstruct the site? What essential parts are present, if the aim of the reconstruction is to get a sense of how the site would be navigated? Which essential parts are missing? Are there any stray bits of mesh that need to be cleaned up/removed?

Step 6: Export your data

Ensure that, at a minimum, you have saved your completed mesh(es) and their associated point cloud as a backup .bin file in case you ever wish to make adjustments. As noted above, you may want to use the map from the preliminary Malthi report to record the names you gave subsets of the data.

To export your data, all you need to do is highlight the mesh you wish to export in the DB Tree and hit the ‘Save’ icon. The dialog box will offer a number of file format options, but we need to consider which will be accepted by the software we will use in later steps. Most 3D modelling software packages will work with .obj, .ply, .stl, and .fbx file types. If we had colour or texture data to export, this might have affected our decision (.stl files will not save colour or textures, .ply files can only maintain a single texture file, .obj files can occasionally become disassociated from their accompanying texture and material files, as we saw with the photogrammetric models before, and .fbx files will usually save colour, normals, and textures because they are often used in animation projects). However, our biggest concern at the moment is the file size of each object, as we will be assembling a substantial number of models in a single file.

Try saving one of your meshes in the .obj, .ply, .stl, and .fbx file types and take note of the size of each file in the file explorer window. Which is the smallest? However, we also need to take archival standards into account: digital archives generally prefer the .obj or .stl formats because they are simple and well established, so almost all 3D software will accommodate them. Though we will likely be working with .fbx files in later exercises, we’ll need .obj files to do some additional processing before we can piece our reconstruction together. So, be sure to export your meshes as .obj files twice – once so that we have a pristine copy for the later archival stages of your project, and once to work with in other software through the next steps. Consistency is key, as different file types can import into software in different ways (.obj files will import into Blender rotated by 90˚, for instance, while .ply files will not be rotated).

The files generated by exporting one of each file type from tile B2. The FBX file is 127 MB, the OBJ is 692 MB, the PLY is 222 MB, and the STL is 412MB.
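If you would rather compare the formats programmatically than in the file explorer, this small sketch (Python with Open3D; the file names are placeholders) writes the same mesh to each format and reports the size of each file. Note that Open3D’s writer may not support every format – .fbx export, for example, generally requires other tools – so the sketch checks whether each write succeeded.

```python
import os
import open3d as o3d

mesh = o3d.io.read_triangle_mesh("tile_B2.ply")  # hypothetical input mesh
mesh.compute_triangle_normals()  # .stl export needs face normals

for ext in ("obj", "ply", "stl", "fbx"):
    out = f"tile_B2_export.{ext}"
    if o3d.io.write_triangle_mesh(out, mesh):
        print(f"{ext.upper():4s}: {os.path.getsize(out) / 1e6:.1f} MB")
    else:
        print(f"{ext.upper():4s}: not supported by this writer")
```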

Once you have saved your files with sensible names in formats suited to their intended use, be sure that your data is organised in a clear way. If you have files that you intend to use in the reconstruction, save them in a folder that indicates that purpose. If you have files that have been saved for the archive, put these in a separate folder. Often, moving files into new folders after they have been used in other software (like Blender) will cause directory issues, so it’s best to keep them in one place for the course of the project.

Well done! You now have the shapes of the walls, and they are georeferenced. This is what you would need for a reconstruction that helps you to think through movement around the site.

In addition to the information from the archive, there are a number of other resources with which you may want to familiarise yourself.

What other information is available?

Think and Respond: At this point, you should be able to fill out a good deal of your Design Document, including the Project Brief, Overview, Project Aims, many of the Resources, and some of the Constraints. Review your previous ‘Read and Respond’ answers for ideas!

Resources for Exercise 1:

Directly cited:

Brodu, N. and Lague, D. (2012) ‘3D terrestrial lidar data classification of complex natural scenes using a multi-scale dimensionality criterion: Applications in geomorphology’, ISPRS Journal of Photogrammetry and Remote Sensing 68, pp. 121–134.

Further reading:

Back to Exercise 1 Part B To Exercise 2