Edith on GT: A BI Solution for Advanced Data Mining


About the Author: Edith Ohri heads a pioneering data-mining company in Israel dedicated to the application of GT, a new DM solution for unsupervised and complex data. Her background is in Industrial & Management Engineering (MSc). She started researching data mining in the early 1980s and has continued ever since. She created a new model (GT) that enables the analysis of larger and more complex data. In 2002 she began developing the GT software at SMU Singapore. She is involved in several areas of implementation, such as BI, Quality Control, Bio-med and Research. She manages a DM forum with the Israel Engineering Association and a DM forum with the Data Warehouse site (Israel). She is a member of and active participant in a number of DM forums, gives presentations, and writes articles.

December 31, 2008

GT data mining of NYSE companies – example

This is an example of data mining with GT, based on free web data from http://www.ics.uci.edu.

The purpose is to demonstrate the ability to create a coherent explanation for complex, partial, incomplete, non-representative and unsupervised data. In this case the data is also restricted to a single point in time and excludes information about shares, which makes it particularly difficult to analyze.

Given: two sets of 1,000 records each about companies on the New York Stock Exchange in 2000 (just before the dotcom bubble burst). The records include 22 attributes describing each company's field, its state of investments, assets, liabilities, expenses, R&D, sales, profits, dividends and other major items from the Public Report Statement, except information regarding shares.

The method:

1. Define clusters based on just half of the data, find their characteristics and drivers, and draw conclusions about the phenomena they may represent.

2. Validate the results by projecting them onto the other half of the data. Once the stability of the conclusions is reaffirmed, the last part of the analysis follows.

3. Interpretation takes place. It is usually done in collaboration with the client; in this example it is shown only in outline, to give a sense of it.
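The text does not disclose how GT builds its clusters, so the following is only a sketch of the split-half idea in steps 1 and 2, under stated assumptions: plain k-means is swapped in for GT's own clustering, records are assumed to be numeric vectors (for example, scaled financial ratios), and stability is scored with the Rand index, i.e. the share of record pairs that are classified the same way (same cluster or different clusters) by the projected first-half clusters and by clusters found independently on the second half. All function names are hypothetical.

```python
import random
from itertools import combinations

Point = tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two records."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point: Point, centroids: list[Point]) -> int:
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: dist2(point, centroids[i]))


def kmeans(points: list[Point], k: int, iters: int = 50, seed: int = 0) -> list[Point]:
    """Plain Lloyd's k-means (a stand-in, not GT's clustering); returns the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        groups: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Recompute each centroid as its group mean; keep the old one if the group is empty.
        new = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if new == centroids:
            break
        centroids = new
    return centroids


def rand_index(a: list[int], b: list[int]) -> float:
    """Share of record pairs on which two labelings agree (same vs. different cluster)."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        raise ValueError("need at least two records")
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def split_half_stability(records: list[Point], k: int, seed: int = 0) -> float:
    """Cluster one half, project the other half, and score agreement (1.0 = fully stable)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    first, second = shuffled[:half], shuffled[half:]
    # Step 1: define clusters on the first half only.
    first_centroids = kmeans(first, k, seed=seed)
    # Step 2: project the second half onto those clusters ...
    projected = [nearest(p, first_centroids) for p in second]
    # ... and compare with clusters found independently on the second half.
    second_centroids = kmeans(second, k, seed=seed)
    own = [nearest(p, second_centroids) for p in second]
    return rand_index(projected, own)
```

A score close to 1.0 suggests that the clusters found on the first half carry over to the second half; a clearly lower score suggests the conclusions are not yet stable enough to move on to interpretation.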

General observation

The "heart" of the analytics is the automatic clustering. Here the pattern splits in two, with an exceptions subgroup between them:

1. Financially intense industries, such as Banking, Financial Services, Energy and Real Estate, with an exception subgroup of financial companies, some of which have an extremely high sales profit margin (see the discussion of Fig. 4).

2. The rest of the industries: Business Services, Transportation, Communication, Technology at large, Raw Materials, and Health Care. See the cluster map in Fig. 1.


Fig. 1 Cluster map: strong relations among record clusters are marked in red-purple, and the absence of relations in light green. The map shows polarized patterns: the financial one (in the low top) and the rest. Next to the Financial pattern there is a small exception subgroup, titled in red. Note that the Technology pattern is much more diverse.

Conclusions and explanations

After the data is clustered, the patterns and characteristics become easier to spot, and their typical behavior is more noticeable. The following describes the clusters found with GT and interprets their typical behavior.

1. False profitability – a warning sign

GT finds that some Technology companies "behave" like financial companies rather than like their own industry. This may be explained by the ease of raising money on the "heated" stock exchange of 2000 and the practical option this opened for companies to use excess funds for financial activities. In such a case, the high profits reported by these companies may be a symptom of a dangerously inflated market rather than a sign of sound companies; and while the profitability graphs encourage investors to continue the practice, they are racing toward a dead end: the DotCom crisis.
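One simple way to spot companies that "behave" like another industry, sketched here as an assumption rather than as GT's method, is to compute an average profile per reported industry and flag companies whose profile lies closer to another industry's average than to their own. The row layout `(name, industry, profile)` and the notion of a profile as a tuple of numeric ratios are hypothetical.

```python
from collections import defaultdict
from math import dist

# Hypothetical row layout: (company name, reported industry, numeric profile such as scaled ratios).
Row = tuple[str, str, tuple[float, ...]]


def industry_centroids(rows: list[Row]) -> dict[str, tuple[float, ...]]:
    """Average profile of each reported industry."""
    groups: dict[str, list[tuple[float, ...]]] = defaultdict(list)
    for _, industry, profile in rows:
        groups[industry].append(profile)
    return {
        industry: tuple(sum(col) / len(profiles) for col in zip(*profiles))
        for industry, profiles in groups.items()
    }


def industry_misfits(rows: list[Row]) -> list[tuple[str, str, str]]:
    """(name, own industry, closest industry) for companies nearer another industry's average."""
    centroids = industry_centroids(rows)
    misfits = []
    for name, industry, profile in rows:
        # Euclidean distance from this company's profile to every industry average.
        closest = min(centroids, key=lambda ind: dist(profile, centroids[ind]))
        if closest != industry:
            misfits.append((name, industry, closest))
    return misfits
```

Because each company contributes to its own industry's average, a strongly atypical company pulls that average toward itself; computing each average without the company being tested (leave-one-out) would be a stricter variant.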

2. Investments in losing companies – high risk

In the Technology cluster there are companies that have substantial losses yet manage to attract massive investments. Their characteristics are low levels of long-term liabilities and long-term assets, and a high level of preferred stock. In these companies an unlikely negative relation (instead of a positive one) is found between Total Assets and Net Income. See Fig. 2.


Fig. 2 Technology companies: special behavior. The red part shows an irrational phenomenon, in which losing companies seem to attract investments.
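The "unlikely negative relation" can be checked directly once a subgroup has been singled out: compute the correlation between Total Assets and Net Income inside the subgroup and outside it, and compare the signs. The sketch below assumes Pearson correlation and records stored as dictionaries with hypothetical keys `total_assets` and `net_income`; the membership test is supplied by the caller.

```python
from math import sqrt
from typing import Callable


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def assets_income_correlation(
    records: list[dict], in_group: Callable[[dict], bool]
) -> tuple[float, float]:
    """Correlation of total_assets vs. net_income inside and outside a subgroup."""
    inside = [r for r in records if in_group(r)]
    outside = [r for r in records if not in_group(r)]

    def corr(rows: list[dict]) -> float:
        return pearson([r["total_assets"] for r in rows], [r["net_income"] for r in rows])

    return corr(inside), corr(outside)
```

A negative value inside the subgroup next to a positive value outside it is the kind of contrast the text describes for the loss-making Technology companies.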

3. Conglomerates with "Banks" traits – need to be looked into

At the margins of the Financial and Technology clusters, GT identifies a number of conglomerates, all of which show an exceptional "behavior". Although these are mainly industrial companies, their patterns resemble those of the Energy and Financial companies.

Remark: knowing the characteristics of the exceptional behavior enables the analyst to "comb" the entire database with a straightforward query and find additional companies that might show a similar irregularity, for close-up study.
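As an illustration of such a straightforward query, the sketch below uses Python's built-in `sqlite3` module against a hypothetical `companies` table and encodes the profile of the loss-making Technology companies described in finding 2 (net loss, low long-term liabilities and long-term assets, high preferred stock). The table, column names and threshold parameters are assumptions; in practice the thresholds would be taken from the characteristics of the exception cluster.

```python
import sqlite3

# Hypothetical table: one row per company, with a few Public Report Statement items.
SCHEMA = """
CREATE TABLE companies (
    name TEXT PRIMARY KEY,
    net_income REAL,
    long_term_liabilities REAL,
    long_term_assets REAL,
    preferred_stock REAL
)
"""

# Profile of the exception group; the :named thresholds are placeholders set by the analyst.
SIMILAR_QUERY = """
SELECT name
FROM companies
WHERE net_income < 0
  AND long_term_liabilities <= :max_lt_liabilities
  AND long_term_assets <= :max_lt_assets
  AND preferred_stock >= :min_preferred_stock
ORDER BY name
"""


def find_similar(
    conn: sqlite3.Connection,
    max_lt_liabilities: float,
    max_lt_assets: float,
    min_preferred_stock: float,
) -> list[str]:
    """Names of companies matching the exception profile, for close-up study."""
    params = {
        "max_lt_liabilities": max_lt_liabilities,
        "max_lt_assets": max_lt_assets,
        "min_preferred_stock": min_preferred_stock,
    }
    # Each result row is a 1-tuple (name,).
    return [name for (name,) in conn.execute(SIMILAR_QUERY, params)]
```

Keeping the profile in a single parameterized query makes it easy to rerun the same "comb" on the full database, or on a later year's data, with adjusted thresholds.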

Fig. 3 In a conventional statistical view, the special pattern of Technology does not show; it is indistinguishable from the general pattern of behavior.

GT Second edition note

The upgraded version of GT sheds more light on the 2000 phenomenon and reveals, among other things, an interesting exceptional behavior of a few financial organizations, which apparently found a different way to make money… Their profit seems to enjoy a much larger net value than that of other companies. See the chart below. The organizations are HSBC Holdings PLC, Chase Manhattan Corp., and Societe Generale Group. This may be part of the kind of practices that led, 8 years later in 2008, to the "credit crunch".


Fig. 4 An exceptionally high sales profit ratio is observed in a subgroup of financial organizations, companies such as HSBC Holdings, Chase Manhattan, and Societe Generale.
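The ratio in Fig. 4 can be illustrated with a small sketch, assuming the sales profit margin is net income divided by sales. Exceptions are flagged here with a simple robust rule (a margin more than `k` median absolute deviations above the median margin); this rule, the default `k = 3` and the `{name: (net_income, sales)}` input layout are assumptions for illustration, not the way GT detects exceptions.

```python
from statistics import median


def profit_margin(net_income: float, sales: float) -> float:
    """Sales profit margin, assumed here to be net income divided by sales."""
    if sales == 0:
        raise ValueError("sales must be non-zero")
    return net_income / sales


def flag_high_margins(companies: dict[str, tuple[float, float]], k: float = 3.0) -> list[str]:
    """Names whose margin exceeds median + k * MAD (median absolute deviation), sorted."""
    margins = {name: profit_margin(ni, sales) for name, (ni, sales) in companies.items()}
    med = median(margins.values())
    mad = median(abs(m - med) for m in margins.values())
    # If MAD is zero, the threshold equals the median and anything above it is flagged.
    threshold = med + k * mad
    return sorted(name for name, m in margins.items() if m > threshold)
```

The median and MAD are used instead of the mean and standard deviation so that the few extreme margins being sought do not inflate the threshold that is supposed to expose them.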

Final words

GT produces a fresh view of complex unsupervised data. It can track down even minute and rare phenomena (3 out of 1,000 companies) and give financial managers and analysts early signals about things to come: their patterns, spread, drivers and key indicators. This example belongs to a series of applications in which GT has consistently turned ordinary data into new revelations about "what makes it tick".


© Edith Ohri

Procedureware Ltd. POB 16558 Tel-Aviv 61165

Tel: 972-3-5232164 edit@actcom.co.il

Author: Ajay Ohri

