14 February 2013

My needle is still the same size but my haystack just keeps getting bigger…

Can you organise and search these in the next couple of hours, Ms Smith?

Posted by Steven England of Litigation Futures sponsor K2 Legal Support

There is much discussion on these and other pages about the problems caused by electronic data and how to manage reviewing it. In fact, software developers have already solved many of these problems, but many people are still unaware of this, or are unwilling to suggest modern solutions at the case management conference for fear of being embarrassed by getting it wrong.

Recently I received a call from a frustrated secretary working in the dispute resolution department of a national firm. She had been given the unenviable task of indexing around 16,000 electronic documents provided in no particular order on a portable hard drive. The documents were named by number only and, though an index was supplied, it wasn’t possible to cross-reference it with the document names, rendering it largely useless.

The proposal put to the secretary was that she open each document individually and create a meaningful index by transcribing the information contained within on to a spreadsheet. If we assume optimistically that this process would have taken three minutes per document and, even more optimistically, that the secretary worked on it full time without interruption, she would have completed the task in around 800 hours (16,000 documents at three minutes each) – or 20 weeks at 40 hours a week. At what cost to the client or the sanity of the secretary?

In this instance, as is invariably the case, there was a simple solution. The correct application of technology meant that for less than £500 I was able to provide an index using information extracted directly from the electronic files themselves.
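To give a flavour of what extracting an index from the files themselves can look like, here is a minimal Python sketch. It is not the tool used in this case. It assumes the documents sit in an ordinary folder, and it records only properties that the file system already holds (name, type, size, last-modified date) plus a SHA-256 fingerprint, so that files with identical content can be spotted. The function names and the column layout are illustrative assumptions.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Column layout is an assumption made for this sketch.
FIELDS = ["file", "type", "size_bytes", "modified_utc", "sha256"]


def build_index(root: Path) -> list[dict]:
    """Return one index row per file found anywhere under `root`."""
    rows = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        stat = path.stat()
        modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
        rows.append({
            "file": path.relative_to(root).as_posix(),
            "type": path.suffix.lower(),
            "size_bytes": stat.st_size,
            "modified_utc": modified.isoformat(timespec="seconds"),
            # Identical content gives an identical fingerprint,
            # whatever the file happens to be called.
            "sha256": hashlib.sha256(path.read_bytes()).hexdigest(),
        })
    return rows


def write_index(rows: list[dict], out_file: Path) -> None:
    """Write the rows to a CSV file that opens in any spreadsheet program."""
    with out_file.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_index(build_index(Path("disclosure")), Path("index.csv"))`, where `disclosure` is a hypothetical folder name, would produce the spreadsheet in one pass. Bear in mind that the last-modified date held by the file system can change when files are copied, so a real exercise would normally rely on the metadata embedded in each document.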

Once the secretary has had time to review the index I supplied, she might start thinking about the next step. Some 16,000 documents is a relatively small data set but it still amounts to around 140,000 pages of material to be reviewed.

If everything were printed, this would amount to approximately 300 lever-arch files of paper; however, once again the secretary and her colleagues would be faced with the prospect of having to read all of this material. It could be made as easy as possible – by printing everything in chronological order, for example, and by labelling the files so that they are easily identifiable – but this is only an aid. It does nothing to avoid the laborious and repetitive process of manually trawling through this coarse, unfiltered material. Nor would it remove the likelihood of inadvertently reviewing the same document twice.

The alternative is to review the documents in their native electronic format, but what are the advantages of doing so?

When a document is printed:

  1. All of the embedded information that it contains is lost. This information is called metadata (data about data): information within a document about that document, such as when it was created, who created it and whether any other documents are embedded within it (e-mail attachments, for example), along with a whole host of other information.
  2. The ability to search the text of the document to find specific words is lost.

Reviewing the document in its native format means that you keep all of this and can use it to assist your review.
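As an illustration of both points, the short Python sketch below reads the metadata of an e-mail and then searches its text for a keyword, using only Python's standard `email` package. It assumes the message is available as raw text in the usual internet mail format. The function names `describe_email` and `contains_keyword` are invented for this example.

```python
from email import message_from_string, policy


def describe_email(raw: str) -> dict:
    """Pull the metadata and plain-text body out of one raw e-mail."""
    msg = message_from_string(raw, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))

    def header(name: str) -> str:
        # A missing header comes back as None; index it as an empty string.
        value = msg[name]
        return "" if value is None else str(value)

    return {
        "from": header("From"),
        "to": header("To"),
        "date": header("Date"),
        "subject": header("Subject"),
        # Names of attached files: documents embedded within the e-mail.
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
        "text": body.get_content() if body is not None else "",
    }


def contains_keyword(record: dict, keyword: str) -> bool:
    """Case-insensitive search of the subject line and body text."""
    haystack = f"{record['subject']}\n{record['text']}".lower()
    return keyword.lower() in haystack
```

None of this is available from a printout: the sender, date and attachment names would have to be read and typed in by hand, and the keyword search would become a read-through.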

The ability to ‘search’ for keywords is only the start – with larger volumes of material it is possible to use assisted review techniques where the computer develops an ‘understanding’ of the content of documents by looking at the words in their overall context.

The idea is to look for relationships between words and groups of words within the wider context of a document and, in so doing, to produce groups of documents that are contextually similar, even though they may not contain exactly the same words. The documents are then ‘ranked’ according to their relevance as determined by the software. Depending on the tool you choose, the ranking will be either binary (Yes/No) or a sliding scale (% relevance). Documents can then be reviewed and disclosed as appropriate, based on a smaller sample of the data.
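The ranking step can be illustrated with a deliberately simple stand-in. The Python sketch below scores each document against a ‘seed’ text that a reviewer has marked as relevant, using TF-IDF word weighting and cosine similarity. This is a much cruder technique than commercial assisted review tools use: it only rewards shared words, weighted by how distinctive they are, so it cannot connect documents that use different vocabulary. It does show both output styles, a percentage score and a Yes/No flag. The function names and the 15% threshold are assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """Weight each word by its frequency in a text and its rarity overall."""
    counts = [Counter(tokenize(t)) for t in texts]
    doc_freq = Counter(term for c in counts for term in c)
    n = len(texts)
    idf = {t: math.log((1 + n) / (1 + k)) + 1.0 for t, k in doc_freq.items()}
    return [{t: tf * idf[t] for t, tf in c.items()} for c in counts]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse word-weight vectors (0.0 to 1.0)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_against_seed(seed: str, docs: list[str], threshold: float = 0.15):
    """Return (document number, % relevance, Yes/No flag), best match first."""
    vectors = tfidf_vectors([seed] + docs)
    seed_vec, doc_vecs = vectors[0], vectors[1:]
    scored = []
    for number, vec in enumerate(doc_vecs):
        score = cosine(seed_vec, vec)
        scored.append((number, round(score * 100, 1), score >= threshold))
    return sorted(scored, key=lambda row: row[1], reverse=True)
```

Given a seed such as "supplier breach of contract for late delivery of goods", a note about the late delivery of goods by a supplier would be expected to rank well above a staff canteen menu, and the reviewer could then start at the top of the list rather than at document one.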

Application of the correct technology (not just technology for technology’s sake) will not only avoid the time-consuming and unnecessary task of indexing electronic documents, but it will also enable you to become (even!) more efficient and therefore provide a better service to your clients.

Often, when faced with a choice between reviewing documents electronically or printing them, lawyers will predictably favour the option which is more expensive, less efficient and less reliable – paper.

Next time you are asked, or are asking someone, to undertake any manual process involving electronic documents, please ask yourself whether technology could achieve the same result faster and more cheaply by using metadata that already exists within the documents.