Most people have heard of terms such as ‘Data Mining’ and ‘Machine Learning’. However, few people have an idea of the power such techniques hold and the impact they have on our daily lives. Some might know that email providers use machine learning to filter spam, or that applications such as Siri, Alexa, and Cortana are built on machine learning. Yet machine learning can do much more than these two examples suggest. To better understand what technologies such as Non-Obvious Relationship Awareness (NORA) can do, an understanding of data mining and machine learning is useful. Put simply, data mining is carried out by a person who is looking for a pattern in a specific dataset in a specific situation, whereas machine learning uses a computer to build an algorithm from a dataset and then predict individual outcomes with it. [1] In other words, data mining explains patterns, while machine learning makes predictions with models. For further information on the basics of machine learning, the following (Dutch) article is available on IThappens: http://www.ithappens.nu/toepassingen-van-machine-learning/. To further illustrate the capabilities of such technologies, this article explores NORA.

What is Non-Obvious Relationship Awareness?

According to Merriam-Webster, ‘non-obvious’ can be defined as “not easily discovered, seen, or understood”. [2] In the context of NORA, this essentially means discovering relationships across disparate data types and data locations. Technologies such as NORA are data mining technologies that use real-time analysis and distributed data mining to uncover these ‘non-obvious’ relationships. For example, NORA could tell you whether your childhood neighbour’s boss frequents the same bar as a notorious criminal. Moreover, NORA repeats this process whenever new data becomes available.
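The neighbour-to-criminal example boils down to finding a chain of links between two entities. This article does not describe NORA's internal matching logic, so the following is only a minimal sketch under stated assumptions: a hypothetical, hand-made relationship graph in which every edge represents two entities sharing a record (an address, an employer, a visit), searched breadth-first for the shortest chain up to a maximum number of hops. The entity names and the `find_relationship` helper are illustrative and not part of NORA.

```python
from collections import deque

# Hypothetical relationship data: each edge links two entities that appear
# together in some record (a shared address, an employer, a visit log).
EDGES = [
    ("you", "childhood_neighbour", "lived next door"),
    ("childhood_neighbour", "neighbours_boss", "works for"),
    ("neighbours_boss", "harbour_bar", "frequents"),
    ("known_criminal", "harbour_bar", "frequents"),
]


def build_graph(edges):
    """Build an undirected adjacency list: entity -> list of (linked entity, relation)."""
    graph = {}
    for a, b, relation in edges:
        graph.setdefault(a, []).append((b, relation))
        graph.setdefault(b, []).append((a, relation))
    return graph


def find_relationship(graph, start, target, max_hops=4):
    """Breadth-first search for the shortest chain of links from start to target.

    Returns a list of (entity, relation, entity) steps, or None if no chain of
    at most max_hops links exists.
    """
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == target:
            return path
        if len(path) == max_hops:
            continue  # do not look further than max_hops links away
        for linked, relation in graph.get(node, []):
            if linked not in visited:
                visited.add(linked)
                queue.append((linked, path + [(node, relation, linked)]))
    return None


graph = build_graph(EDGES)
chain = find_relationship(graph, "you", "known_criminal")
for a, relation, b in chain:
    print(f"{a} --[{relation}]-- {b}")
```

Breadth-first search returns the shortest chain first, which matches the intuition that closer links are usually more relevant; the `max_hops` cap keeps the search from wandering through the entire graph.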

Though the basis of NORA was first developed by Jeff Jonas in 1983 to identify credit fraud, it rose to fame in the early 1990s, when casinos used it to avoid unknowingly doing business with criminals. Building on the capabilities NORA had shown in the gaming industry, the CIA invested in its further development, as NORA could help improve national security by uncovering terrorist threats. In 2005 it was acquired by IBM. [3]

How does NORA work?

Figure 1: conceptual architecture of the NORA system [4] [5]

Figure 1 provides a conceptual representation of how NORA works. In order for technologies such as NORA to work effectively, a certain degree of data integrity is imperative. This consists of:

  • Name standardisation – transposing nicknames to their root name (such as Rich, Richie, and Rick to Richard);
  • Address hygiene – correcting typos and abbreviations (Rd. to Road or NYC to New York City);
  • Date quality – applying formatting standards and validating values;
  • Data enhancement – adding additional data to what is already available; and
  • Entity resolution – determining whether two or more records, which may or may not look alike, in fact refer to one and the same individual.
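The first three clean-up steps and a naive form of entity resolution can be sketched as follows. This is a hypothetical illustration, not NORA's implementation: the nickname and abbreviation tables are tiny made-up samples, day-first date formats are assumed, and two records are treated as the same entity only if they match exactly after normalisation. Data enhancement is left out.

```python
from datetime import date, datetime

# Hypothetical lookup tables; a real system would rely on much larger curated lists.
NICKNAMES = {"rich": "richard", "richie": "richard", "rick": "richard",
             "bill": "william", "liz": "elizabeth"}
ADDRESS_ABBREVIATIONS = {"rd.": "road", "rd": "road", "st.": "street",
                         "ave.": "avenue", "nyc": "new york city"}
# Assumed input formats (day-first where ambiguous).
DATE_FORMATS = ("%Y-%m-%d", "%d-%m-%Y", "%d/%m/%Y")


def standardise_name(first_name):
    """Name standardisation: map a nickname to its root name (lower-cased)."""
    key = first_name.strip().lower()
    return NICKNAMES.get(key, key)


def clean_address(address):
    """Address hygiene: drop commas and expand known abbreviations token by token."""
    tokens = address.lower().replace(",", " ").split()
    return " ".join(ADDRESS_ABBREVIATIONS.get(token, token) for token in tokens)


def parse_date(text):
    """Date quality: accept known formats only and reject impossible or future dates."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text.strip(), fmt).date()
        except ValueError:  # wrong format, or an invalid value such as 31/02
            continue
        return parsed if parsed <= date.today() else None
    return None


def normalise(record):
    """Apply all clean-up steps to one raw record."""
    return {
        "first": standardise_name(record["first"]),
        "last": record["last"].strip().lower(),
        "address": clean_address(record["address"]),
        "dob": parse_date(record["dob"]),
    }


def same_entity(a, b):
    """Naive entity resolution: identical after normalisation, with a valid date of birth."""
    na, nb = normalise(a), normalise(b)
    return na["dob"] is not None and na == nb


r1 = {"first": "Rick", "last": "Jones", "address": "12 Harbour Rd., NYC", "dob": "1970-04-03"}
r2 = {"first": "Richard", "last": "Jones", "address": "12 Harbour Road NYC", "dob": "03/04/1970"}
print(same_entity(r1, r2))  # True
```

Exact matching after normalisation is deliberately simple; practical entity resolution usually scores partial matches instead, which is what makes it possible to catch cases such as transposed dates.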

Making use of historical data is important for NORA’s effectiveness. This includes retaining every name, phone number, driver’s license, credit card, frequent flyer number, address, etc. that a person has ever used. Relationships are recorded and explored as well. For example, suppose two ‘John Smiths’ are registered at the same address. Based on their dates of birth, NORA might initially conclude that these are two distinct individuals (a junior and a senior). However, through entity resolution NORA discovers that the two dates of birth differ only because the month and day have been transposed; the two ‘John Smiths’ are therefore one and the same. [3]
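The transposed date-of-birth case can be expressed as a small matching rule. The sketch below is an assumption about how such a rule might look, not NORA's actual logic: two records with the same normalised name and address are treated as the same person if their dates of birth are equal or differ only by a swapped day and month. The record fields and helper names are hypothetical.

```python
from datetime import date


def transposed_day_month(a, b):
    """True if two different dates in the same year differ only by swapped day and month."""
    if a == b or a.year != b.year:
        return False
    return a.month == b.day and a.day == b.month


def likely_same_person(r1, r2):
    """Hypothetical rule: same name and address, and equal or day/month-transposed birth dates."""
    if r1["name"] != r2["name"] or r1["address"] != r2["address"]:
        return False
    return r1["dob"] == r2["dob"] or transposed_day_month(r1["dob"], r2["dob"])


records = [
    {"id": 1, "name": "john smith", "address": "1 main street", "dob": date(1965, 3, 12)},
    {"id": 2, "name": "john smith", "address": "1 main street", "dob": date(1965, 12, 3)},
]

# Comparing birth dates alone suggests two different people...
print(records[0]["dob"] == records[1]["dob"])      # False
# ...but allowing for a day/month transposition links the two records.
print(likely_same_person(records[0], records[1]))  # True
```

A rule this loose could also merge genuinely different people, so it would sensibly be treated as one signal among several rather than a final verdict.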

One of the defining features of technologies such as NORA is the continuous assessment of new information. Whereas standard queries run at intervals, NORA continuously scans and assesses new information as it arrives. This is one of its greatest strengths, as terrorist attacks are time-sensitive events. [5]
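The difference between periodic queries and continuous assessment can be illustrated with an event-driven matcher that checks every record at the moment it is ingested. This is a hypothetical sketch, not NORA's design: records are reduced to a person identifier plus attributes such as a phone number or address, an index maps each attribute value to the people who have used it, and an alert is raised as soon as a new record shares a value with someone on an assumed watch list.

```python
from collections import defaultdict


class ContinuousMatcher:
    """Hypothetical event-driven matcher: every record is assessed as it arrives."""

    def __init__(self, watch_list):
        self.watch_list = set(watch_list)  # person ids of concern
        self.index = defaultdict(set)      # (attribute, value) -> person ids seen with it
        self.alerts = []                   # all alerts raised so far

    def ingest(self, person_id, attributes):
        """Index one new record and return any alerts it triggers immediately."""
        new_alerts = []
        for attribute, value in attributes.items():
            seen_with = self.index[(attribute, value)]
            for other in sorted(seen_with):
                involves_watch_list = other in self.watch_list or person_id in self.watch_list
                if other != person_id and involves_watch_list:
                    new_alerts.append((person_id, other, attribute, value))
            seen_with.add(person_id)
        self.alerts.extend(new_alerts)
        return new_alerts


matcher = ContinuousMatcher(watch_list={"suspect_17"})
matcher.ingest("suspect_17", {"phone": "555-0101", "address": "4 dock lane"})
matcher.ingest("guest_42", {"address": "9 hill road"})
alerts = matcher.ingest("employee_8", {"phone": "555-0101"})
print(alerts)  # [('employee_8', 'suspect_17', 'phone', '555-0101')]
```

A batch query run nightly over the same data would also find the shared phone number, but only at its next run; here the alert is produced by the very call that ingests the record.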

Human vs. NORA

As explained earlier, the key strength of technologies such as NORA lies in their capacity to continuously process new data and draw non-obvious conclusions. There is a limit to how much information a human can process; for machines, given enough resources, that limit is far higher, if it exists at all. Furthermore, identifying non-obvious relationships is difficult for humans even in small datasets. As the datasets grow, a human would be unable to do so even in a perfect world with perfectly matched databases. In a hyper-connected world with ever-increasing amounts of data, technologies such as NORA are therefore irreplaceable.

Real-time Outbreak and Disease Surveillance system (RODS)

Another example of a data mining application that improves national security is RODS, which was developed by the Department of Biomedical Informatics at the University of Pittsburgh. It has been described as “a prototype developed at the University of Pittsburgh where real-time clinical data from emergency departments within a geographic region can be integrated to provide an instantaneous picture of symptom patterns and early detection of epidemic events.” [6]

Put simply, RODS is a system that uses data mining techniques to detect both natural epidemics and man-made ones (bioterrorism).
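The source does not describe the detection algorithms RODS uses, so the following is only a generic illustration of the idea, swapped in as an assumption: count emergency-department visits per day for one symptom group and flag any day whose count rises well above a baseline built from the preceding days. The counts, window length, and threshold below are made up.

```python
from statistics import mean, stdev


def detect_spikes(daily_counts, window=7, threshold=3.0):
    """Flag day indices whose count exceeds the mean of the previous `window` days
    by more than `threshold` standard deviations (a simple, generic baseline)."""
    alerts = []
    for day in range(window, len(daily_counts)):
        baseline = daily_counts[day - window:day]
        mu = mean(baseline)
        sigma = stdev(baseline) or 1.0  # fall back to 1 when the baseline is perfectly flat
        if daily_counts[day] > mu + threshold * sigma:
            alerts.append(day)
    return alerts


# Hypothetical daily counts of respiratory complaints in one region's emergency departments.
counts = [12, 15, 11, 14, 13, 12, 16, 14, 13, 41, 15]
print(detect_spikes(counts))  # [9]
```

Simple baselines like this tend to be sensitive to weekly patterns and holidays, which is one reason real surveillance systems use more elaborate statistical models.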

To conclude

An increasing number of data mining and machine learning technologies are currently being applied in the real world. Besides spam filtering and virtual assistants, data mining and machine learning are also used to improve national security. NORA and RODS are just two examples: NORA draws ‘non-obvious’ conclusions from a variety of databases, while RODS detects epidemics from real-time clinical data. Whether such applications are a good thing depends on who you ask.

 

[1] Shmueli, G., Bruce, P. C., Yahav, I., Patel, N. R., & Lichtendahl Jr, K. C. (2017). Data Mining for Business Analytics: Concepts, Techniques, and Applications in R. John Wiley & Sons.
[2] Nonobvious. (n.d.). Retrieved from Merriam-Webster: https://www.merriam-webster.com/dictionary/nonobvious
[3] Jonas, J. (2002, 08 01). Black Hat USA 2002 – Non-Obvious Relationship Awareness (NORA) Technology. Retrieved from YouTube: https://youtu.be/BT02lMMjer0
[4] Bourgeois, D. (2014). Non-Obvious Relationship Awareness. In D. Bourgeois, Information Systems for Business and Beyond. USA: Saylor Academy.
[5] Anthes, G. (2002, 04 15). The Search is On. Retrieved from Computerworld: https://www.computerworld.com/article/2587381/business-intelligence/the-search-is-on.html
[6] ASPE. (2002, 06 01). Public Health Related Activities. Retrieved from Assistant Secretary for Planning and Evaluation: https://aspe.hhs.gov/public-health-related-activities-nhii

 

 

Article by Yannick de Jong