Understanding Big Data and the Opportunities for Major Projects

Posted in: Blog | Author: Hassan Emam

The first of a series of blogs for the MPA looking at the world of major projects through the extraordinary lens of Big Data.

In recent years, there has been much emphasis on utilising ‘big data’ to enhance construction projects’ productivity, predictability and management. This article looks at what that really means and what opportunities it can present for the future, including the use of new Machine Learning techniques.

What is Big Data?

The term ‘big data’ is often used in the industry without a full understanding of what it means, so we start with the most cited definition:

“Big data is data that contains greater variety arriving in increasing volumes and with ever-higher velocity. This is known as the three Vs.” (Gartner, 2001)

These ‘three Vs’ are:

  • Variety: the different types of data being used, including audio, video, text or numerical
  • Volume: the sheer amount of data available and the rate at which this amount is growing
  • Velocity: the speed at which data arrives and at which it must be processed and accessed

There has also been debate as to whether the ‘three Vs’ model sufficiently defines big data. Consequently, three additional properties have been cited:

  • Variability: the number of data inconsistencies resulting from different data sources and types, affecting the homogeneity of data
  • Value: whether the data is relevant to business use cases or not
  • Veracity: the levels of data quality and accuracy; and the avoidance of errors such as noise and biases in the data

Why is Big Data Important?

As the definition shows, big data covers a wide variety of data formats. Using new Machine Learning techniques like Natural Language Processing (NLP) and Computer Vision (CV), unstructured data, such as images, videos, and text documents, can be transformed into a structured format. Structured data is easy to use for further modelling, including predictive and prescriptive modelling. Automating this transformation can also:

  • Reduce effort in collecting data
  • Allow more human time for analysis instead of reporting
  • Increase the accuracy of the data, since it is less exposed to human error during manual collection
  • Establish consistency of the collated data, thereby reducing the need for data cleansing
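
As a rough illustration of what ‘transforming unstructured data into a structured format’ means, the sketch below turns a free-text site diary line into a structured record. It is deliberately not an NLP model: it is a simple rule-based stand-in, and the diary format, the `DiaryRecord` fields and the function name are all assumptions made for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class DiaryRecord:
    """Structured form of one diary entry (hypothetical fields)."""
    date: str
    activity: str
    quantity: float
    unit: str


# Assumed diary format: "<YYYY-MM-DD>: <activity> - <quantity> <unit>"
PATTERN = re.compile(
    r"(?P<date>\d{4}-\d{2}-\d{2}):\s*(?P<activity>.+?)\s*-\s*"
    r"(?P<quantity>\d+(?:\.\d+)?)\s*(?P<unit>[A-Za-z0-9]+)"
)


def parse_diary_line(line: str) -> Optional[DiaryRecord]:
    """Return a structured record, or None if the line does not fit the format."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return DiaryRecord(
        date=match["date"],
        activity=match["activity"],
        quantity=float(match["quantity"]),
        unit=match["unit"],
    )


record = parse_diary_line("2024-03-01: Poured slab level 2 - 45.5 m3")
```

Real diary text is far less regular than this, which is where trained NLP models come in; the point here is only the shape of the structured output that later modelling would consume.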

What Can Be Achieved with the Current State of Knowledge?

One example of employing CV to extract structured information from unstructured data is object detection for health and safety. It is often used to check compliance with personal protective equipment (PPE) safety standards, such as wearing high visibility vests, hard hats and safety boots.
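
To make the PPE example concrete, here is a minimal sketch of the step that follows object detection: deciding, for each detected person, which required items appear to be worn. It assumes a detector has already produced labelled bounding boxes; the label names, the `Detection` type and the ‘centre of the item lies inside the person box’ rule are illustrative assumptions, not a description of any particular product.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


@dataclass
class Detection:
    label: str  # e.g. "person", "hard_hat" (hypothetical label names)
    box: Box


# Assumed set of PPE items the site requires
REQUIRED_PPE = ("hard_hat", "hi_vis_vest")


def centre_inside(inner: Box, outer: Box) -> bool:
    """True if the centre point of `inner` lies within `outer`."""
    cx = (inner[0] + inner[2]) / 2
    cy = (inner[1] + inner[3]) / 2
    return outer[0] <= cx <= outer[2] and outer[1] <= cy <= outer[3]


def check_ppe(detections: List[Detection]) -> List[Dict[str, bool]]:
    """For each detected person, report which required PPE items were found."""
    people = [d for d in detections if d.label == "person"]
    items = [d for d in detections if d.label in REQUIRED_PPE]
    results = []
    for person in people:
        worn = {i.label for i in items if centre_inside(i.box, person.box)}
        results.append({ppe: ppe in worn for ppe in REQUIRED_PPE})
    return results


frame = [
    Detection("person", (0, 0, 10, 30)),
    Detection("hard_hat", (3, 0, 7, 4)),
    Detection("hi_vis_vest", (2, 8, 8, 18)),
    Detection("person", (20, 0, 30, 30)),
    Detection("hard_hat", (23, 0, 27, 4)),
]
compliance = check_ppe(frame)
```

In this made-up frame the first person has both items and the second is missing a vest. A production system would need to handle overlapping people, occlusion and detector confidence, none of which this sketch attempts.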

Video analysis can be used to recognise human activity, which in turn allows productivity analysis of the workforce and equipment. Images from on-site surveillance cameras are transmitted to a server running a model trained to recognise types of activity, which reports working and idle time for each resource. Human Activity Recognition (HAR) can be combined with a measure of achievement, using video footage from multiple cameras to calculate the works accomplished. Point clouds (collections of data points defined by coordinates) produced by photogrammetry can then be correlated with BIM models to calculate progress based on the quantity of works completed. Combining the productive time of resources with the quantity of works achieved produces structured data for production control.
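
The structured data described above can be pictured with a small sketch. It assumes an activity recognition model has already labelled stretches of footage per resource, and that the installed quantity has already been measured (for example from a point cloud compared with the BIM model). The label names, the tuple layout and both function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# (resource_id, activity_label, duration_in_seconds): assumed HAR model output
Observation = Tuple[str, str, float]

# Assumed labels that count as non-productive time
IDLE_LABELS = {"idle", "waiting"}


def utilisation(observations: Iterable[Observation]) -> Dict[str, Dict[str, float]]:
    """Sum working and idle seconds per resource and derive a utilisation ratio."""
    totals = defaultdict(lambda: {"working_s": 0.0, "idle_s": 0.0})
    for resource, label, seconds in observations:
        key = "idle_s" if label in IDLE_LABELS else "working_s"
        totals[resource][key] += seconds
    summary = {}
    for resource, t in totals.items():
        observed = t["working_s"] + t["idle_s"]
        ratio = t["working_s"] / observed if observed else 0.0
        summary[resource] = {**t, "utilisation": ratio}
    return summary


def percent_complete(installed_qty: float, planned_qty: float) -> float:
    """Progress as installed quantity over planned quantity, in percent."""
    if planned_qty <= 0:
        raise ValueError("planned_qty must be positive")
    return 100.0 * installed_qty / planned_qty


observations = [
    ("excavator-1", "digging", 2700.0),
    ("excavator-1", "idle", 900.0),
    ("crew-A", "waiting", 600.0),
    ("crew-A", "fixing_rebar", 1800.0),
]
summary = utilisation(observations)
progress = percent_complete(installed_qty=30.0, planned_qty=120.0)
```

With these made-up figures both resources were productive for three quarters of the observed time and the activity is a quarter complete; pairing the two numbers per activity is the kind of record production control can work with.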

What are the Challenges?

The construction industry has fragmented data generated from various systems and produced in different structures that do not correlate to each other. In project controls, a well-known issue is correlating programme activities with cost estimates. Programme data follows a Work Breakdown Structure (WBS) that is centred on pragmatic delivery methods, whereas cost estimating uses Cost Breakdown Structures (CBS) with a focus on cost centres for benchmarking and budgeting purposes. The relationship between WBS and CBS can be very complex, depending on the type of project. However, mapping programme to cost is crucial for production control.
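
One simple way to picture a WBS to CBS mapping is as a table of shares: each cost centre's budget is split across the programme activities that consume it. The sketch below assumes such a table exists; the codes, budgets and shares are invented for illustration, and real mappings are usually far less tidy.

```python
import math
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical budgets per CBS cost centre
cbs_budget: Dict[str, float] = {
    "CBS-100 Concrete": 200_000.0,
    "CBS-200 Steel": 100_000.0,
}

# Hypothetical share of each cost centre attributed to each WBS activity
allocation: Dict[Tuple[str, str], float] = {
    ("CBS-100 Concrete", "WBS-1.1 Foundations"): 0.6,
    ("CBS-100 Concrete", "WBS-1.2 Superstructure"): 0.4,
    ("CBS-200 Steel", "WBS-1.2 Superstructure"): 1.0,
}


def cost_by_wbs(
    budget: Dict[str, float], shares: Dict[Tuple[str, str], float]
) -> Dict[str, float]:
    """Spread each CBS budget over WBS activities according to its shares."""
    # Every cost centre must be fully allocated, otherwise cost is lost or doubled.
    share_totals: Dict[str, float] = defaultdict(float)
    for (cbs, _wbs), share in shares.items():
        share_totals[cbs] += share
    for cbs in budget:
        if not math.isclose(share_totals.get(cbs, 0.0), 1.0):
            raise ValueError(f"shares for {cbs} do not sum to 1")

    wbs_cost: Dict[str, float] = defaultdict(float)
    for (cbs, wbs), share in shares.items():
        wbs_cost[wbs] += budget[cbs] * share
    return dict(wbs_cost)


wbs_cost = cost_by_wbs(cbs_budget, allocation)
```

The validation step matters more than the arithmetic: a mapping in which a cost centre is only partly allocated would silently understate the cost of the programme.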

There are further challenges in applying new Machine Learning to big data in major projects, such as:

  • The diversity of skills needed for using AI (statistics, mathematics, probability, computer programming, and domain knowledge).
  • Unstructured data formats that are not usable in their original format.
  • Computers’ limited ability to interpret natural language.
  • Computers’ limited ability to understand images and videos; current algorithms require more maturity to analyse these.


Big data is currently underutilised in the construction industry, yet there is considerable potential to harness it to improve productivity. Industry surveys have reported the predictability of construction projects in terms of time and cost to be between 60% and 70% (Glenigan, 2018). The industry needs to work on removing the barriers to implementing big data analytics, the main ones being leadership, availability of data and the upskilling of practitioners. The first adopters of big data analytics will gain a competitive edge through greater confidence and optimised methods that reduce cost and time.

If you’d like to better harness big data on your project, get in touch!

You can call on +44 (0)20 7404 4826 or email us at info@logikalprojects.com


About Hassan Emam

Hassan is an Associate Director and the head of the Research and Development Group at LogiKal.
