Operations Automation Using Spreadsheets
As we automate company operations, we often encounter the need to process MS Word and Excel files. This is quite surprising, since this way of exchanging information has been around for many years and something new is expected to come to replace it. But still, it is one of the most popular methods to convey information to the other party. Of course, there are still such alternative formats as csv and pdf, which are just as suitable for this, and maybe even better. Recall that these documents (DOC, XLS, PPT) currently follow the Office Open XML format. But it was not always so. Initially, Microsoft used the proprietary binary format of Microsoft Office 97-2003.
But somewhere at the beginning of the century in many products of the software giant there was a movement to free standards. This was probably a far-reaching plan to capture all areas of interest. We also see this on the initiative of standardization and transition to open source of .Net technologies. In 2008, this standard was finally agreed and adopted jointly with ECMA. It was not an easy process that took several years of negotiations, approvals and appeals.
At the moment, this is a very common format for documents. It works with both free and proprietary software. The standard can be used by other vendors without restrictions. Popular editors such as Google Docs, Apache OpenOffice, OpenOffice.org, Apple QuickLook, LibreOffice, and others can work with this format. From the developer’s point of view, it is a zip archive with a fixed structure. Inside it are a set of xml documents, graphics and other elements.
Our developers were faced with the task of providing the ability to create different document templates for each counterparty. But all of them somehow had to accept the same type and structure of the input data. Therefore, it was decided to develop a separate tool for creating document templates. Moreover, it has a special syntax, now everyone who is familiar with it, for example, business analysts, can now use it.
To work with these formats in the .net platform, the Open XML SDK library can be used. If we talk about the nuances of use from the point of view of an engineer, then this is a rather complex format with a large number of nested structures. Not surprising, because the complete documentation for the format is over 7,000 pages long. Therefore, it takes quite a lot of time to master it. Of course, there are many guides and articles on the Internet.
Let’s take an example from practice. Let’s say we need to add a new element to the style collection. After performing this operation, you must remember to update the element counter in the collection. It’s very strange, isn’t it? It was also a discovery that it doesn’t directly support pasting HTML text. Of course, there are workarounds in the form of using the RichText format, and converting HTML to it. But there are also difficulties with this, since there are not many such libraries and they contain a lot of bugs. This was an example of one of the tasks that our systems support. But since it is already used in several projects, this module has been moved to a separate system. If necessary, it can be easily converted into a micro-service.
Let’s summarize. At the moment, the Office Open XML format is supported by many editors for reading and exporting, but few develop software based on it. Other developers prefer open alternatives like OpenDocument because they don’t want to depend on Microsoft in the future.