AI message parsing

Since the Industrial Revolution, European nations have created, or been forced to share, resources and infrastructures that transcend national boundaries. This led to common rules for maintenance and use to counter risks and challenges [1]. It also brought international dependence and difficulties around shared infrastructure.
One of the most common problems in the aviation industry is flight delays [2]. In the US, about one third of delays are caused by airline-internal problems and impose large costs on society. Research recommends solving the delay problem by adding capacity [3]. However, not all airports have the budget or space to make this a reality. Another idea, concerning airport organization, comes from comparing European airports with those in the US. That solution would require a more extensive hub-and-spoke system and slot coordination, which might not be politically feasible [2].
This paper focuses on a communication solution to make common rules in aviation more efficient and help prevent flight delays. In the aviation industry, airports need to inform one another about the load of passengers and cargo on flights. This communication takes place through IATA Type B messages, which are read by automatic parsers [4]. However, a large number of these messages are not written according to the standard format, which causes problems for the current parsers and creates the need for a new solution. The underlying cause is that IATA Type B requires a strong understanding of its code language, or of how to interpret it according to the standard format. Since there are no other apparent solutions, this paper explores machine learning models as a possible way to make automatic processing a reality.
The communication system currently used by airports, the French Telex messaging system, is outdated [5], yet this old message standard is still in use today. A lot has changed in the airport industry, but the standard messaging format has not. Employees therefore have to find new ways to convey information through an old standard, resulting in dialects that cannot be interpreted automatically. Messages then need to be corrected manually, which takes a lot of time and effort. Besides being outdated, Telex is used by many airports around the world. Altering or updating the system is therefore not an option at this time, since it would require all airports to go out of service simultaneously.
IATA (International Air Transport Association) assigned the name Type B to the message format used within the aviation ecosystem. Type B is primarily communicated over private networks operated by ARINC and SITA (creator of the aviation software Sitatex) [6]. It is used to communicate the LDM (Load Distribution Message) between the airports involved. LDMs are written in a code language according to a standard format described in a detailed manual [4]. This paper aims to use data parsing to make automated processing of non-compliant LDM IATA Type B messages possible.
Within the aviation industry, the airport of departure must inform the airport of destination about the cargo of the arriving plane through LDMs. These messages are communicated through the Sitatex software in the IATA Type B code language. An interview with airport staff revealed that most airports do not comply with the standard LDM format, which causes syntax errors in Sitatex's automatic parsing function. All resulting errors have to be fixed manually by Sitatex users.
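To illustrate why a dialect breaks a strict parser, here is a minimal Python sketch. The line format, field names and sample values are invented for illustration only; they are not the IATA LDM grammar and not the Sitatex parser. The point is that a parser written against one exact layout rejects a message that carries the same information with different separators.

```python
import re

# Hypothetical, simplified line layout: "<flight>/<day>.<registration>.<config>".
# This toy pattern is an assumption for illustration; it is NOT the real LDM format.
STRICT_LINE = re.compile(
    r"^(?P<flight>[A-Z]{2}\d{3,4})/(?P<day>\d{2})"
    r"\.(?P<registration>[A-Z0-9]{5,6})"
    r"\.(?P<config>[A-Z0-9]+)$"
)


def parse_strict(line):
    """Return the fields of a compliant line, or None when the line deviates."""
    match = STRICT_LINE.match(line)
    return match.groupdict() if match else None


compliant = "KL1234/05.PHBXA.Y180"
dialect = "KL 1234 / 05 PHBXA Y180"  # same information, spaces instead of separators

print(parse_strict(compliant))  # a dict with the four named fields
print(parse_strict(dialect))    # None: the strict pattern does not match
```

A rejected line like the second one is what would end up in a manual correction queue in this simplified picture.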
If an automated processing function were able to interpret the various dialects and errors in LDMs, airport employees who use Sitatex daily could save time and effort for other tasks. To work towards such a solution, this research took a closer look at IATA Type B, its context and current processing functions, data parsing, and machine learning.

Findings

To understand the problem, the types of issues found within the IATA Type B messaging system, Telex and Sitatex were explored.

Five machine learning methods were researched in relation to IATA Type B messaging, from which the following can be concluded. K-nearest neighbors (KNN) can be used for anomaly detection, which is useful because many anomalies emerge in IATA Type B messages. However, because these messages contain a large number of parameters, KNN could be ineffective.
The second method, K-means, can also detect anomalies in IATA Type B messages, by recognizing anomalies within a data cluster. This will be difficult to achieve because the anomalies are unpredictable: new dialects emerge all the time. Both KNN and K-means can recognize the common dialects in the code language. Even when the data is inconsistent, these models can work with it to make predictions about the kind of error that is occurring.
The third method is conditional random fields (CRF), which can be effective at labeling inconsistent data. However, because the inconsistencies leave the messages without a solid grammar, a CRF implementation would be difficult to make operational. The fourth method, the hidden Markov model (HMM), requires a deep understanding of the LDM grammar. Because the anomalies in the grammar are not persistent, every inconsistency must be carefully selected beforehand, which could result in a large model.
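The KNN idea for anomaly detection can be sketched as follows: score a message by its average distance to its k nearest neighbours among known, well-formed messages. This is only a minimal sketch under stated assumptions. The character-class features, the sample messages and the function names are hypothetical choices made for this example, not the models or data evaluated in the research.

```python
import math
from collections import Counter


def shape(message):
    """Describe a message by the share of letters, digits, spaces and separators."""
    counts = Counter()
    for ch in message:
        if ch.isalpha():
            counts["alpha"] += 1
        elif ch.isdigit():
            counts["digit"] += 1
        elif ch.isspace():
            counts["space"] += 1
        else:
            counts["sep"] += 1
    total = max(len(message), 1)
    return [counts[key] / total for key in ("alpha", "digit", "space", "sep")]


def knn_anomaly_score(candidate, reference, k=2):
    """Mean Euclidean distance from the candidate to its k nearest reference messages."""
    if not reference:
        raise ValueError("reference must contain at least one message")
    target = shape(candidate)
    distances = sorted(math.dist(target, shape(msg)) for msg in reference)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Invented sample lines in a toy layout (not real LDM content).
reference = ["KL1234/05.PHBXA.Y180", "LH0456/12.DAIBC.Y150", "AF0078/30.FGKXA.Y174"]
score_ok = knn_anomaly_score("BA0321/07.GEUPJ.Y168", reference)
score_odd = knn_anomaly_score("BA 321 07 GEUPJ pax 168 approx", reference)
print(score_ok < score_odd)  # the free-form line scores as more anomalous
```

With only four features this stays manageable. The concern raised above is that real messages have many more parameters, and distances tend to become less informative as the number of features grows.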
CRF and HMM, which are association models, have often been used for natural language processing (NLP) and part-of-speech (POS) tagging. Association models therefore carry potential for automating part of the processing task.
Lastly, the random forest model (RFM) could be useful for interpreting inconsistencies in IATA Type B messages, which could result in an overview of error messages in a recognized format. The RFM can be used for both regression and classification problems. For IATA Type B messages, regression is limited to automatic calculations on the given values, and even this remains a challenge because related parameters are indicated inconsistently. As a classifier, the RFM can relate non-compliant data to the parameters and values of the standard format. This step is key for a bottom-up automated processing solution.
To test this, a proof of concept in Python was used to organize IATA Type B messages. In this proof of concept, 66.67% could be parsed. Although the code is clear in its operation, problems continue to emerge with the anomalies. Overall, the problem with the messaging system is that the input is complex. Without an implementation, it is difficult to select a fitting automated processing method. As shown in the Python proof of concept, all automated processing models in this research are capable of parsing non-compliant IATA Type B messages in their own way. However, an implementation is needed to verify this.
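Before any classifier can relate non-compliant data to standard parameters, each fragment of a message has to be turned into features. The sketch below shows one possible, grammar-free way to do that in Python. The tokenization rule, the feature names and the sample line are assumptions made for this example, and the classifier itself (for instance a random forest) is deliberately left out.

```python
import re


def token_features(token, position):
    """Describe one token with simple features (all hypothetical choices)."""
    return {
        "position": position,
        "length": len(token),
        "is_numeric": token.isdigit(),
        "is_alpha": token.isalpha(),
        "has_digit": any(ch.isdigit() for ch in token),
    }


def featurize(message):
    """Split on any run of non-alphanumeric characters, then describe each token."""
    tokens = [tok for tok in re.split(r"[^A-Za-z0-9]+", message) if tok]
    return [(tok, token_features(tok, i)) for i, tok in enumerate(tokens)]


# Invented dialect line in a toy layout (not real LDM content).
for token, features in featurize("KL 1234 / 05 PHBXA Y180"):
    print(token, features)
```

Because the split ignores which separator was used, differently punctuated variants of a line yield comparable tokens. Each feature row could then be paired with a label naming the standard field it belongs to and used as training data.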

  1. Disco, N., & Kranakis, E. (2013). Cosmopolitan Commons. Amsterdam University Press.
  2. Santos, G., & Robin, M. (2010). Determinants of delays at European airports. Transportation Research Part B: Methodological, 44(3), 392–403. https://orca.cardiff.ac.uk/10689/1/Santos%20and%20Robin%202010%20(3).pdf
  3. Ball, M., Barnhart, C., Dresner, M., Hansen, M., Neels, K., Odoni, A., Peterson, E., Sherry, L., Trani, A., & Zou, B. (2010). Total Delay Impact Study. Amsterdam University Press.
  4. (2015). LDM Specification. Retrieved from The IATA LDM Specification Manual.
  5. Carré, P. A. (1993). From the telegraph to the telex: a history of technology, early networks and issues in France in the 19th and 20th centuries. Flux, 9(11), 17–31. https://doi.org/10.3406/flux.1993.939
Hani
Researcher
Hani Al-Ers is a researcher in the field of human-machine interaction. He completed his PhD at the Delft University of Technology, in the Interactive Intelligence group of the Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS). Philips Research in Eindhoven sponsored his project, which was aimed at improving the user experience of Philips TV sets. He completed two postdocs at the Delft University of Technology, during which he managed international consortia on topics such as improving quality of life for the elderly. He currently conducts research in the field of health and education and leads the Research Education activities at the Dutch Innovation Factory.