Interview Rob J Hyndman Forecasting Expert #rstats

Here is an interview with Prof Rob J Hyndman who has created many time series forecasting methods and authored books as well as R packages on the same.

Ajay- Describe your journey from being a student of science to a Professor. What were some key turning points along that journey?
 
Rob- I started a science honours degree at the University of Melbourne in 1985. By the end of 1985 I found myself simultaneously working as a statistical consultant (having completed all of one year of statistics courses!). For the next three years I studied mathematics, statistics and computer science at university, and tried to learn whatever I needed to in order to help my growing group of clients. Often we would cover things in classes that I’d already taught myself through my consulting work. That really set the trend for the rest of my career. I’ve always been an academic on the one hand, and a statistical consultant on the other. The consulting work has led me to learn a lot of things that I would not otherwise have come across, and has also encouraged me to focus on research problems that are of direct relevance to the clients I work with.
I never set out to be an academic. In fact, I thought that I would get a job in the business world as soon as I finished my degree. But once I completed the degree, I was offered a position as a statistical consultant within the University of Melbourne, helping researchers in various disciplines and doing some commercial work. After a year, I was getting bored doing only consulting, and I thought it would be interesting to do a PhD. I was lucky enough to be offered a generous scholarship which meant I was paid more to study than to continue working.
Again, I thought that I would probably go and get a job in the business world after I finished my PhD. But I finished it early and my scholarship was going to be cut off once I submitted my thesis. So instead, I offered to teach classes for free at the university and delayed submitting my thesis until the scholarship period ran out. That turned out to be a smart move because the university saw that I was a good teacher, and offered me a lecturing position starting immediately I submitted my thesis. So I sort of fell into an academic career.
I’ve kept up the consulting work part-time because it is interesting, and it gives me a little extra money. But I’ve also stayed an academic because I love the freedom to be able to work on anything that takes my fancy.
Ajay- Describe your upcoming book on Forecasting.
 
Rob- My first textbook on forecasting (with Makridakis and Wheelwright) was written a few years after I finished my PhD. It has been very popular, but it costs a lot of money (about $140 on Amazon). I estimate that I get about $1 for every book sold. The rest goes to the publisher (Wiley) and all they do is print, market and distribute it. I even typeset the whole thing myself and they print directly from the files I provided. It is now about 15 years since the book was written and it badly needs updating. I had a choice of writing a new edition with Wiley or doing something completely new. I decided to do a new one, largely because I didn’t want a publisher to make a lot of money out of students using my hard work.
It seems to me that students try to avoid buying textbooks and will search around looking for suitable online material instead. Often the online material is of very low quality and contains many errors.
As I wasn’t making much money on my textbook, and the facilities now exist to make online publishing very easy, I decided to try a publishing experiment. So my new textbook will be online and completely free. So far it is about 2/3 completed and is available at http://otexts.com/fpp/. I am hoping that my co-author (George Athanasopoulos) and I will finish it off before the end of 2012.
The book is intended to provide a comprehensive introduction to forecasting methods. We don’t attempt to discuss the theory much, but provide enough information for people to use the methods in practice. It is tied to the forecast package in R, and we provide code to show how to use the various forecasting methods.
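As a minimal sketch of the kind of thing the book's code shows with the forecast package (this example is mine, not an excerpt from the book):

library(forecast)                  # Rob Hyndman's forecasting package
fit <- auto.arima(AirPassengers)   # automatically select an ARIMA model for a monthly series
fc  <- forecast(fit, h = 24)       # forecast 24 months ahead
plot(fc)                           # point forecasts with prediction intervals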
The idea of online textbooks makes a lot of sense. They are continuously updated so if we find a mistake we fix it immediately. Also, we can add new sections, or update parts of the book, as required rather than waiting for a new edition to come out. We can also add richer content including video, dynamic graphics, etc.
For readers that want a print edition, we will be aiming to produce a print version of the book every year (available via Amazon).
I like the idea so much I’m trying to set up a new publishing platform (otexts.com) to enable other authors to do the same sort of thing. It is taking longer than I would like to make that happen, but probably next year we should have something ready for other authors to use.
Ajay- How can we make textbooks cheaper for students as well as compensate authors fairly?
 
Rob- Well, free is definitely cheaper, and there are a few businesses trying to make free online textbooks a reality. Apart from my own efforts, http://www.flatworldknowledge.com/ is producing a lot of free textbooks. And textbookrevolution.org is another great resource.
With otexts.com, we will compensate authors in two ways. First, the print versions of a book will be sold (although at a vastly cheaper rate than other commercial publishers). The royalties on print sales will be split 50/50 with the authors. Second, we plan to have some features of each book available for subscription only (e.g., solutions to exercises, some multimedia content, etc.). Again, the subscription fees will be split 50/50 with the authors.
Ajay- Suppose a person who used to use forecasting software from another company decides to switch to R. How easy and lucid do you think the current documentation on the R website is for business analytics practitioners such as these in the corporate world?
 
Rob- The documentation on the R website is not very good for newcomers, but there are a lot of other R resources now available. One of the best introductions is Matloff’s “The Art of R Programming”. Provided someone has done some programming before (e.g., VBA, python or java), learning R is a breeze. The people who have trouble are those who have only ever used menu interfaces such as Excel. Then they are not only learning R, but learning to think about computing in a different way from what they are used to, and that can be tricky. However, it is well worth it. Once you know how to code, you can do so much more.  I wish some basic programming was part of every business and statistics degree.
If you are working in a particular area, then it is often best to find a book that uses R in that discipline. For example, if you want to do forecasting, you can use my book (otexts.com/fpp/). Or if you are using R for data visualization, get hold of Hadley Wickham’s ggplot2 book.
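(For a one-line taste of what ggplot2 does, assuming the package is installed, the following plots US unemployment over time from the package's built-in economics data:)

library(ggplot2)
ggplot(economics, aes(x = date, y = unemploy)) + geom_line()   # unemployed (in thousands) by date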
Ajay- In a long and storied career, what is the best forecast you ever made? And the worst?
 
Rob- Actually, my best work is not so much in making forecasts as in developing new forecasting methodology. I’m very proud of my forecasting models for electricity demand which are now used for all long-term planning of electricity capacity in Australia (see http://robjhyndman.com/papers/peak-electricity-demand/ for the details). Also, my methods for population forecasting (http://robjhyndman.com/papers/stochastic-population-forecasts/) are pretty good (in my opinion!). These methods are now used by some national governments (but not Australia!) for their official population forecasts.
Of course, I’ve made some bad forecasts, but usually when I’ve tried to do more than is reasonable given the available data. One of my earliest consulting jobs involved forecasting the sales for a large car manufacturer. They wanted forecasts for the next fifteen years using less than ten years of historical data. I should have refused as it is unreasonable to forecast that far ahead using so little data. But I was young and naive and wanted the work. So I did the forecasts, and they were clearly outside the company’s (reasonable) expectations, and they then refused to pay me. Lesson learned. It’s better to refuse work than do it poorly.

Probably the biggest impact I’ve had is in helping the Australian government forecast the national health budget. In 2001 and 2002, they had underestimated health expenditure by nearly $1 billion in each year, which is a lot of money to have to find, even for a national government. I was invited to assist them in developing a new forecasting method, which I did. The new method has forecast errors of the order of plus or minus $50 million, which is much more manageable. The method I developed for them was the basis of the ETS models discussed in my 2008 book on exponential smoothing (www.exponentialsmoothing.net). And now anyone can use the method with the ets() function in the forecast package for R.
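A minimal sketch of that, using R's built-in USAccDeaths series (the health budget data itself is not public):

library(forecast)
fit <- ets(USAccDeaths)        # automatic exponential smoothing (ETS) model selection
summary(fit)                   # the chosen model and its estimated parameters
plot(forecast(fit, h = 12))    # forecast the next 12 months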
About-
Rob J Hyndman is Professor of Statistics in the Department of Econometrics and Business Statistics at Monash University and Director of the Monash University Business & Economic Forecasting Unit. He is also Editor-in-Chief of the International Journal of Forecasting and a Director of the International Institute of Forecasters. Rob is the author of over 100 research papers in statistical science. In 2007, he received the Moran medal from the Australian Academy of Science for his contributions to statistical research, especially in the area of statistical forecasting. For 25 years, Rob has maintained an active consulting practice, assisting hundreds of companies and organizations. His recent consulting work has involved forecasting electricity demand, tourism demand, the Australian government health budget and case volume at a US call centre.

The economics of software piracy

Software piracy exists because-

1) Lack of appropriate technological controls, whether on DVDs, on BitTorrent (an innovation over centralized servers like Napster), on streaming, etc.

Technology to share content has evolved at a much higher pace than technology to restrict content from being shared or limited to purchasers.

2) Huge difference in purchasing power across the globe.

An iTunes song at 99 cents might be an okay buy in the USA, but in Asia it is very expensive. Maybe if content creators used Purchasing Power Parity to price their goods, it might make a dent (see the back-of-the-envelope sketch after this list).

3) State-sponsored intellectual theft as another form of economic warfare- this has been going on since the West stole gunpowder and silk from the Chinese, and Intel decided to win back the IP rights to the microprocessor (from its Japanese client).

4) Lack of consensus among policy makers across the globe on who gets hurt by IP theft, but complete consensus among young people across the globe that they are doing the right thing by downloading stuff for free.

5) There is no such thing as a free lunch. Sometimes software (and movie and song) piracy helps create demand across ignored markets – I always think the NFL could be huge in India if they marketed it. Sometimes it drives artists to suicide because they give up on the life of a starving musician.

Mostly, piracy has helped break the profits of intermediaries between the actual creator and the actual consumer.
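As a back-of-the-envelope sketch of PPP pricing in R (the price-level factor below is made up purely for illustration):

us_price    <- 0.99                    # iTunes song price in USD
ppp_factor  <- 0.30                    # hypothetical price level of a market relative to the US
local_price <- us_price * ppp_factor   # PPP-adjusted price, roughly 30 cents
round(local_price, 2)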

So how do we solve software piracy, assuming it is something that can be solved?

I don’t know, but I do care.

I give away most of my writings under CC-BY-SA, and that includes my poems. People (friends and family) sometimes pay me not to sing.

Pirates have existed and will exist as long as civilized men romanticize the notion of piracy and bicker between themselves for narrow gains.

  1. Ephesians 4:28 Let the thief no longer steal, but rather let him labor, doing honest work with his own hands, so that he may have something to share with anyone in need.
  2. A clean confession, combined with a promise never to commit the sin again, when offered before one who has the right to receive it, is the purest type of repentance.-Gandhi
  3. If you steal, I will wash your mouth with soap- Anonymous Mother.
  4. You shall not steal- Moses
  5. Steal may refer to: Theft, the illegal taking of another person’s property without that person’s freely-given consent; The gaining of a stolen base in baseball;

 

 

Interview Prof Benjamin Alamar, Sports Analytics

Here is an interview with Prof Benjamin Alamar, founding editor of the Journal of Quantitative Analysis in Sports, a professor of sports management at Menlo College and the Director of Basketball Analytics and Research for the Oklahoma City Thunder of the NBA.

Ajay- The movie Moneyball recently sparked mainstream interest in analytics in sports. Describe the role of analytics in sports management.

Benjamin- Analytics is impacting sports organizations on both the sport and business side.
On the sport side, teams are using analytics, including advanced data management, predictive analytics, and information systems, to gain a competitive edge. The use of analytics results in more accurate player valuations and projections, as well as in determining effective strategies against specific opponents.
On the business side, teams are using the tools of analytics to increase revenue in a variety of ways, including dynamic ticket pricing and optimizing the placement of concession stands.
Ajay-  What are the ways analytics is used in specific sports that you have been part of?

Benjamin- A very typical first step for a team is to utilize the tools of predictive analytics to help inform their draft decisions.

Ajay- What are some of the tools, techniques and software that analytics in sports uses?
Benjamin- The tools of sports analytics do not differ much from the tools of business analytics. Regression analysis is fairly common, as are other forms of data mining. In terms of software, R is a popular tool, as are Excel and many of the other standard analysis tools.
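As a minimal sketch of what such a regression looks like in R (the data here are simulated for illustration, not from any team):

set.seed(1)
mins <- runif(100, 10, 38)                           # minutes played per game
usg  <- runif(100, 0.10, 0.35)                       # share of team possessions used
pts  <- 0.8 * mins + 30 * usg + rnorm(100, sd = 3)   # simulated points per game
fit  <- lm(pts ~ mins + usg)                         # regress scoring on playing time and usage
summary(fit)                                         # coefficients and fit statistics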
Ajay- Describe your career journey and how you became involved in sports management. What are some of the tips you want to tell young students who wish to enter this field?

Benjamin- I got involved in sports through a company called Protrade Sports. Protrade initially was a fantasy sports company that was looking to develop a fantasy game based on advanced sports statistics and utilize a stock market concept instead of traditional drafting. I was hired due to my background in economics to develop the market aspect of the game.

There I met Roland Beech (who now works for the Mavericks) and Aaron Schatz (owner of footballoutsiders.com) and learned about the developing field of sports statistics. I then changed my research focus from economics to sports statistics and founded the Journal of Quantitative Analysis in Sports. Through the journal and my published research, I was able to establish a reputation of doing quality, useable work.

For students, I recommend developing very strong data management skills (sql and the like) and thinking carefully about what sort of questions a general manager or coach would care about. Being able to demonstrate analytic skills around actionable research will generally attract the attention of pro teams.

About-

Benjamin Alamar, Professor of Sport Management, Menlo College


Professor Benjamin Alamar is the founding editor of the Journal of Quantitative Analysis in Sports, a professor of sports management at Menlo College and the Director of Basketball Analytics and Research for the Oklahoma City Thunder of the NBA. He has published academic research in football, basketball and baseball, and has presented at numerous conferences on sports analytics. He is also a co-creator of ESPN’s Total Quarterback Rating and a regular contributor to the Wall Street Journal. He has consulted for teams in the NBA and NFL, provided statistical analysis for author Michael Lewis for his recent book The Blind Side, and worked with numerous startup companies in the field of sports analytics. Professor Alamar is also an award-winning economist who has worked academically and professionally in intellectual property valuation, public finance and public health. He received his PhD in economics from the University of California at Santa Barbara in 2001.

Prof Alamar is a speaker at Predictive Analytics World, San Francisco, and is doing a workshop there

http://www.predictiveanalyticsworld.com/sanfrancisco/2012/agenda.php#day2-17

2:55-3:15pm

All level tracks Track 1: Sports Analytics
Case Study: NFL, MLB, & NBA
Competing & Winning with Sports Analytics

The field of sports analytics ties together the tools of data management, predictive modeling and information systems to provide sports organizations a competitive advantage. The field is rapidly developing based on new and expanded data sources, greater recognition of its value, and the past success of a variety of sports organizations. Teams in the NFL, MLB, and NBA, as well as other organizations, have found a competitive edge with the application of sports analytics. The future of sports analytics can be seen by drawing on these past successes and the development of new tools.

You can know more about Prof Alamar at his blog http://analyticfootball.blogspot.in/ or journal at http://www.degruyter.com/view/j/jqas. His detailed background can be seen at http://menlo.academia.edu/BenjaminAlamar/CurriculumVitae

Cyber Cold War

I try to write on cyber conflict without getting into the politics of why someone is hacking someone else. I always get beaten by someone in the comments thread when I write on politics.

But recent events have forced me to update my usual “how-to” of cyber conflict with the “why” of cyber conflict. This is because of a terrorist attack in my hometown, Delhi.

Update-

http://www.nytimes.com/2012/02/14/world/middleeast/israeli-embassy-officials-attacked-in-india-and-georgia.html?_r=1&hp

Iran allegedly tried (as per Israel) to assassinate the wife of the Israeli Defence Attaché in Delhi, India, using a magnetic bomb as she went to pick up her kids from school; somebody else put a grenade in an Israeli embassy car in Georgia, which was found in time.

Based on reports, initial work suggests the bomb was much more sophisticated than local terrorists could have built, but the terrorists seemed to have had some local reconnaissance work done.

India has no history of antisemitism, but this is the second time Israelis have been targeted since the 26/11 Mumbai attacks. India buys 12% of its oil annually from Iran (and refuses to join the oil embargo called for by the US and Europe).

Cyber conflict is less painful than physical conflict, which is inevitable as long as mankind exists. Also, the Western hemisphere needs a moon shot (cyber conflict could be its Sputnik-like moment), and with declining and aging populations but better technology, Western hemisphere governments need cyber conflict as they are running out of humans to fight their wars. Eastern governments are even more obnoxious in using children for conflict propaganda, and in corruption.

Last week the CIA.gov website went down.

This week the Iranian government is allegedly blocking https traffic on the eve of the annual Revolution Day (what a coincidence!).

 

Some resources to help Internet users in Iran (or maybe this could be a dummy test for the big one – hacking the Great Firewall of China)

News from Hacker News-

http://news.ycombinator.com/item?id=3575029

 

I’m writing this to report the serious troubles we have regarding accessing the Internet in Iran at the moment. Since Thursday, the Iranian government has shut down the https protocol, which has caused almost all Google services (Gmail, and google.com itself) to become inaccessible. Almost all websites that rely on Google APIs (like Wolfram Alpha) won’t work. Access to any website that relies on https is affected (just imagine how many websites use this protocol, from the Arch Wiki to bank websites). Access to many proxies is also impossible. There are almost no official reports on this, and with many websites and my email accounts restricted, I can only confirm this based on my own and friends’ experience. I have just found one report here:

Iran Shut Down Gmail, Google, Yahoo and sites using “Https” Protocol

The reason for this horrible shutdown is that the Iranian regime celebrates the 1979 Islamic revolution tomorrow.

I just wanted to let you guys know about this. If you have any solution regarding bypassing this restriction please help!

 

The boys at Tor think they can help-

but it’s not so elegant; I prefer creating a batch file to explaining code to newbies.

though this is still evolving toward better and easier interfaces-

https://www.torproject.org/projects/obfsproxy-instructions.html.en

Obfsproxy Instructions


Step 1: Install dependencies, obfsproxy, and Tor

 

You will need a C compiler (gcc), the autoconf and autotools build system, the git revision control system, pkg-config and libtool, libevent-2 and its headers, and the development headers of OpenSSL.

On Debian testing or Ubuntu oneiric, you could do:
# apt-get install autoconf autotools-dev gcc git pkg-config libtool libevent-2.0-5 libevent-dev libevent-openssl-2.0-5 libssl-dev

If you’re on a more stable Linux, you can either try our experimental backport libevent2 debs or build libevent2 from source.

Clone obfsproxy from its git repository:
$ git clone https://git.torproject.org/obfsproxy.git
The above command should create and populate a directory named ‘obfsproxy’ in your current directory.

Compile obfsproxy:
$ cd obfsproxy
$ ./autogen.sh && ./configure && make

Optionally, as root install obfsproxy in your system:
# make install

If you prefer not to install obfsproxy as root, you can instead just modify the Transport lines in your torrc file (explained below) to point to your obfsproxy binary.

You will need Tor 0.2.3.11-alpha or later.


Step 2a: If you’re the client…

 

First, you need to learn the address of a bridge that supports obfsproxy. If you don’t know any, try asking a friend to set one up for you. Then add the appropriate lines to your tor configuration file:

UseBridges 1
Bridge obfs2 128.31.0.34:1051
ClientTransportPlugin obfs2 exec /usr/local/bin/obfsproxy --managed

Don’t forget to replace 128.31.0.34:1051 with the IP address and port that the bridge’s obfsproxy is listening on.
 Congratulations! Your traffic should now be obfuscated by obfsproxy. You are done! You can now start using Tor.

For old-fashioned tunnel creation under the seas of the English Channel-

http://dag.wieers.com/howto/ssh-http-tunneling/

Tunneling SSH over HTTP(S)
This document explains how to set up an Apache server and SSH client to allow tunneling SSH over HTTP(S). This can be useful on restricted networks that either firewall everything except HTTP traffic (tcp/80,tcp/443) or require users to use a local (HTTP) proxy.
A lot of people have asked why do it like this when you can just make sshd listen on port 443. Well, that might work if your environment is not hardened like I have seen at several companies, but this setup has a few advantages (a client-side sketch follows the list below).

  • You can proxy to anywhere (see the Proxy directive in Apache) based on names
  • You can proxy to any port you like (see the AllowCONNECT directive in Apache)
  • It works even when there is a layer-7 protocol firewall
  • If you enable proxytunnel ssl support, it is indistinguishable from real SSL traffic
  • You can come up with nice hostnames like ‘downloads.yourdomain.com’ and ‘pictures.yourdomain.com’ and for normal users these will look like normal websites when visited.
  • There are many possibilities for doing authentication further along the path
  • You can do proxy-bouncing to the n-th degree to mask where you’re coming from or going to (however this requires more changes to proxytunnel, currently I only added support for one remote proxy)
  • You do not have to dedicate an IP-address for sshd, you can still run an HTTPS site
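As a sketch of the client side under that setup (the hostnames and ports are placeholders, and proxytunnel must be installed; see the linked document for the matching Apache configuration):

# connect to sshd behind the Apache HTTPS proxy, going out through the local corporate proxy
$ ssh -o ProxyCommand="proxytunnel -p localproxy.example.com:8080 -d downloads.yourdomain.com:443" user@downloads.yourdomain.com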

Related-

http://opensourceandhackystuff.blogspot.in/2012/02/captive-portal-security-part-1.html

and some crypto for young people

http://users.telenet.be/d.rijmenants/en/onetimepad.htm

 

Me- What am I doing about it? I am just writing poems on hacking at http://poemsforkush.com

What are you thankful for?

I am thankful for-

1) God and Jesus Christ and Rama and Allah and Buddha and their priests and intermediaries and everybody taking care of me, or at least trying hard to take care of me.

2) Earth, my planet, for nourishing me despite me polluting her, her fresh air and her delicious fruit and marvellous wine and hearty bread.

3) Fellow Human Beings for being nice to me when they feel curt, for displaying civilized manners, and working together in a vast invisible web of commerce, trade and exchange to meet our needs.

4) Scientists and Engineers who create wonderful technology by spending hours, months, years of their lives and giving it up for free on the Internet.

5) Powerful people who take time to mentor unknown wild cards, and young people who rejuvenate them with new exciting ideas.

6) People who appreciate my poetry and people who appreciate my technology, and people who criticize only with the intention that I strive to create something better.


Interview Eberhard Miethke and Dr. Mamdouh Refaat, Angoss Software

Here is an interview with Eberhard Miethke and Dr. Mamdouh Refaat, of Angoss Software. Angoss is a global leader in delivering business intelligence software and predictive analytics solutions that help businesses capitalize on their data by uncovering new opportunities to increase sales and profitability and to reduce risk.

Ajay- Describe your personal journey in software. How can we guide young students to pursue more useful software development than just gaming applications?

Mamdouh- I started using computers a long time ago, when they were programmed using punched cards! First in Fortran, then C, later C++, and then the rest. Computers and software were viewed as technical/engineering tools, and that’s why we can still see the heavy technical orientation of command languages such as Unix shells and even the Windows command shell. However, with the introduction of database systems and Microsoft Office apps, it was clear that business would be the primary user and field of application for software. My personal journey in software started with scientific applications, then business and database systems, and finally statistical software – which you can think of as returning to the more scientific orientation. However, with the wide acceptance by businesses of the application of statistical methods in different fields such as marketing and risk management, it is a fast-growing field that is in need of a lot of innovation.

Ajay- Angoss makes multiple data mining and analytics products. Could you please introduce us to your product portfolio and the specific data analytics needs they serve?

a- Attached please find our main product flyers for KnowledgeSTUDIO and KnowledgeSEEKER. We have a 3rd product called “strategy builder” which is an add-on to the decision tree modules. This is also described in the flyer.

(see- Angoss Knowledge Studio Product Guide April2011  and http://www.scribd.com/doc/63176430/Angoss-Knowledge-Seeker-Product-Guide-April2011  )

Ajay- The trend in analytics is for big data and cloud computing- with hadoop enabling processing of massive data sets on scalable infrastructure. What are your plans for cloud computing, tablet-based as well as mobile-based computing?

a- This is an area where the plan is still being figured out in all organizations. The current explosion of data collected from mobile phones, text messages, and social websites will need radically new applications that can utilize the data from these sources. Current applications are based on the relational database paradigm designed in the 70’s through the 90’s of the 20th century.

But data sources are generating data in volumes and formats that challenge this paradigm and will need a set of new tools and possibly programming languages to fit these needs. Cloud computing and tablet-based and mobile computing (which are the same thing in my opinion, just different sizes of device) are also two technologies that have not been explored in analytics yet.

The approach taken so far by most companies, including Angoss, is to rely on new XML-based standards to represent data structures for the particular models. In this case, it is the PMML (Predictive Model Markup Language) standard, which allows interoperability between analytics applications. Standardizing the representation of models is viewed as the first step towards allowing the implementation of these models on emerging platforms, be that the cloud, mobile, or social networking websites.
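For a rough idea of what that looks like in practice, here is a minimal sketch using the open-source pmml package in R (not Angoss’s own tooling): fit a model, then write it out as a PMML document that any PMML-aware scoring engine could consume.

library(pmml)    # converts fitted R models into PMML (XML) documents
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)   # any supported model type
XML::saveXML(pmml(fit), file = "model.pmml")    # a portable model description, independent of R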

The second challenge cited above is the rapidly increasing size of the data to be analyzed. Angoss has already identified this challenge early on and is currently offering in-database analytics drivers for several database engines: Netezza, Teradata and SQL Server.

These drivers allow our analytics products to translate their routines into efficient SQL-based scripts that run in the database engine to exploit its performance as well as the powerful hardware on which it runs. Thus, instead of copying the data to a staging format for analytics, these drivers allow the data to be analyzed “in-place” within the database without moving it.

This offers performance, security and integrity. The performance is improved by the use of well-tuned database engines running on powerful hardware.

Extra security is achieved by not copying the data to other platforms, which could be less secure. And finally, the integrity of the results is vastly improved by making sure that the results are always obtained by analyzing the up-to-date data residing in the database rather than an older copy of the data, which could be obsolete by the time the analysis is concluded.

Ajay- What are the principal competing products to your offerings, and what makes your products special or differentiated in value to them (for each customer segment).

a- There are two major players in today’s market that we usually encounter as competitors, they are: SAS and IBM.

SAS offers a data mining workbench in the form of SAS Enterprise Miner, which is closely tied to the SAS data mining methodology known as SEMMA.

On the other hand, IBM has recently acquired SPSS, which offered its Clementine data mining software. IBM has now rebranded Clementine as IBM SPSS Modeller.

In comparison to these products, our KnowledgeSTUDIO and KnowledgeSEEKER offer three main advantages: ease of use; affordability; and ease of integration into existing BI environments.

Angoss products were designed to look and feel like popular Microsoft Office applications. This makes the learning curve very gentle. Typically, an intermediate-level analyst needs only 2-3 days of training to become proficient in the use of the software with all its advanced features.

Another important feature of Angoss software products is their integration with the SAS Base product and SQL-based database engines. All predictive models generated by Angoss can be automatically translated into SAS and SQL scripts. This allows the generation of scoring code for these common platforms. While the software interface simplifies all the tasks to allow business users to take advantage of the value added by predictive models, the software includes advanced options to allow experienced statisticians to fine-tune their models by adjusting all model parameters as needed.

In addition, Angoss offers a unique product called StrategyBuilder, which allows the analyst to add key performance indicators (KPI’s) to predictive models. KPI’s such as profitability, market share, and loyalty are usually required to be calculated in conjunction with any sales and marketing campaign. Therefore, StrategyBuilder was designed to integrate such KPI’s with the results of a predictive model in order to render the appropriate treatment for each customer segment. These results are all integrated into a deployment strategy that can also be translated into an execution code in SQL or SAS.

The above competitive features offered by the software products of Angoss are behind its success in serving over 4000 users from over 500 clients worldwide.

Ajay- Describe a major case study where using Angoss software helped save a big amount of revenue/costs by innovative data mining.

a- Rogers Telecommunications Inc. is one of the largest Canadian telecommunications providers, serving over 8.5 million customers with revenue of 11.1 billion Canadian dollars (2009). In 2008, Rogers engaged Angoss to help with the problem of accounts receivable that had ballooned over a period of 18 months.

The problem was approached by improving the efficiency of the call centre serving the collections process through a set of predictive models. The first set of models was designed to find accounts likely to default ahead of time, in order to take preventative measures. A second set of models was designed to optimize the call centre resources to focus on delinquent accounts likely to pay back most of the outstanding balance. Accounts that were identified as not likely to pay back quickly were good candidates for “Early-out” treatment, by forwarding them directly to collection agencies. Angoss hosted Rogers’ data and provided at regular intervals the lists of accounts for each treatment to be deployed by the call centre dialler. As a result of this, Rogers estimated an improvement of 10% in the collected sums.
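The flavour of those models can be sketched in R with a logistic regression on simulated data (Angoss’s actual models, variables and data are proprietary):

set.seed(2)
n         <- 1000
balance   <- runif(n, 100, 5000)                              # outstanding balance
days_late <- rpois(n, 20)                                     # days past due
p_pay     <- plogis(1 - 0.03 * days_late - 2e-04 * balance)
paid      <- rbinom(n, 1, p_pay)                              # 1 = paid back, 0 = did not
fit   <- glm(paid ~ balance + days_late, family = binomial)   # model likelihood of repayment
score <- predict(fit, type = "response")
head(order(score))   # accounts least likely to pay: candidates for "Early-out" treatment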

Biography-

Mamdouh has been active in consulting, research, and training in various areas of information technology and software development for the last 20 years. He has worked on numerous projects with major organizations in North America and Europe in the areas of data mining, business analytics, business analysis, and engineering analysis. He has held several consulting positions for solution providers, including Predict AG in Basel, Switzerland, and ANGOSS Corp. Mamdouh is the Director of Professional Services for the EMEA region of ANGOSS Software. Mamdouh received his PhD in engineering from the University of Toronto and his MBA from the University of Leeds, UK.

Mamdouh is the author of:

"Credit Risk Scorecards: Development and Implmentation using SAS"
 "Data Preparation for Data Mining Using SAS",
 (The Morgan Kaufmann Series in Data Management Systems) (Paperback)
 and co-author of
 "Data Mining: Know it all",Morgan Kaufmann



Eberhard Miethke works as a senior sales executive for Angoss.

 

About Angoss-

Angoss is a global leader in delivering business intelligence software and predictive analytics to businesses looking to improve performance across sales, marketing and risk. With a suite of desktop, client-server and in-database software products and Software-as-a-Service solutions, Angoss delivers powerful approaches to turn information into actionable business decisions and competitive advantage.

Angoss software products and solutions are user-friendly and agile, making predictive analytics accessible and easy to use.

Interview Rapid-I – Ingo Mierswa and Simon Fischer

Here is an interview with Dr Ingo Mierswa, CEO of Rapid-I, and Dr Simon Fischer, Head of R&D. Rapid-I makes the very popular software RapidMiner – perhaps one of the earliest leading open source software packages in business analytics and business intelligence. It is quite easy to use and deploy, and with its extensions and innovations (including compatibility with R) it has continued to grow tremendously through the years.

In an extensive interview, Ingo and Simon talk about the algorithms marketplace, extensions, big data analytics, hadoop, mobile computing and the use of the graphical user interface in analytics.

Special thanks to Nadja from the Rapid-I communication team for helping coordinate this interview. (Statutory Blogging Disclosure- Rapid-I is a marketing partner with Decisionstats as per the terms in https://decisionstats.com/privacy-3/)

Ajay- Describe your background in science. What are the key lessons that you have learnt as a scientific researcher, and what advice would you give to new students today?

Ingo: My time as researcher really was a great experience which has influenced me a lot. I have worked at the AI lab of Prof. Dr. Katharina Morik, one of the persons who brought machine learning and data mining to Europe. Katharina always believed in what we are doing, encouraged us and gave us the space for trying out new things. Funnily enough, I never managed to use my own scientific results in any real-life project so far but I consider this as a quite common gap between science and the “real world”. At Rapid-I, however, we are still heavily connected to the scientific world and try to combine the best of both worlds: solving existing problems with leading-edge technologies.

Simon: In fact, during my academic career I have not worked in the field of data mining at all. I worked on a field some of my colleagues would probably even consider boring, and that is theoretical computer science. To be precise, my research was in the intersection of game theory and network theory. During that time, I have learnt a lot of exciting things, none of which had any business use. Still, I consider that a very valuable experience. When we at Rapid-I hire people coming to us right after graduating, I don’t care whether they know the latest technology with a fancy three-letter acronym – that will be forgotten more quickly than it came. What matters is the way you approach new problems and challenges. And that is also my recommendation to new students: work on whatever you like, as long as you are passionate about it and it brings you forward.

Ajay- How is the RapidMiner Extensions marketplace moving along? Do you think there is scope for people to, say, create algorithms on a platform like R, and then offer each algorithm as an app for sale just like iTunes or Android apps?

 Simon: Well, of course it is not going to be exactly like iTunes or Android apps are, because of the more business-orientated character. But in fact there is a scope for that, yes. We have talked to several developers, e.g., at our user conference RCOMM, and several people would be interested in such an opportunity. Companies using data mining software need supported software packages, not just something they downloaded from some anonymous server, and that is only possible through a platform like the new Marketplace. Besides that, the marketplace will not only host commercial extensions. It is also meant to be a platform for all the developers that want to publish their extensions to a broader community and make them accessible in a comfortable way. Of course they could just place them on their personal Web pages, but who would find them there? From the Marketplace, they are installable with a single click.

Ingo: What I like most about the new Rapid-I Marketplace is the fact that people can now get something back for their efforts. Developing a new algorithm is a lot of work, in some cases even more than developing a nice app for your mobile phone. It is completely accepted that people buy apps from a store for a couple of dollars, and I foresee the same for sharing and selling algorithms instead of apps. Right now, people can already share algorithms and extensions for free; one of the next versions will also support selling of those contributions. Let’s see what’s happening next, maybe we will add the option to sell complete RapidMiner workflows or even some data pools…

Ajay- What are the recent features in RapidMiner that support cloud computing, mobile computing and tablets? How do you think the landscape for Big Data (over 1 TB) is changing, and how is RapidMiner adapting to it?

Simon: These are areas we are very active in. For instance, we have an In-Database-Mining Extension that allows the user to run their modelling algorithms directly inside the database, without ever loading the data into memory. Using analytic databases like Vectorwise or Infobright, this technology can really boost performance. Our data mining server, RapidAnalytics, already offers functionality to send analysis processes into the cloud. In addition to that, we are currently preparing a research project dealing with data mining in the cloud. A second project is targeted towards the other aspect you mention: the use of mobile devices. This is certainly a growing market, of course not for designing and running analyses, but for inspecting reports and results. But even that is tricky: When you have a large screen you can display fancy and comprehensive interactive dashboards with drill downs and the like. On a mobile device, that does not work, so you must bring your reports and visualizations very much to the point. And this is precisely what data mining can do – and what is hard to do for classical BI.

Ingo: Then there is Radoop, which you may have heard of. It uses the Apache Hadoop framework for large-scale distributed computing to execute RapidMiner processes in the cloud. Radoop has been presented at this year’s RCOMM and people are really excited about the combination of RapidMiner with Hadoop and the scalability this brings.

 Ajay- Describe the Rapid Miner analytics certification program and what steps are you taking to partner with academic universities.

Ingo: The Rapid-I Certification Program was created to recognize professional users of RapidMiner or RapidAnalytics. The idea is that certified users have demonstrated a deep understanding of the data analysis software solutions provided by Rapid-I and how they are used in data analysis projects. Taking part in the Rapid-I Certification Program offers a lot of benefits for IT professionals as well as for employers: professionals can demonstrate their skills, and employers can make sure that they hire qualified professionals. We started our certification program only about 6 months ago, and so far about 100 professionals have been certified.

Simon: During our annual user conference, the RCOMM, we have plenty of opportunities to talk to people from academia. We’re also present at other conferences, e.g. at ECML/PKDD, and we are sponsoring data mining challenges and grants. We maintain strong ties with several universities all over Europe and the world, which is something that I would not want to miss. We are also cooperating with institutes like the ITB in Dublin during their training programmes, e.g. by giving lectures, etc. Also, we are leading or participating in several national or EU-funded research projects, so we are still close to academia. And we offer an academic discount on all our products 🙂

Ajay- Describe the global efforts in making RapidMiner a truly international software product, including the spread of developers, clients and employees.

Simon: Our clients already are very international. We have a partner network in America, Asia, and Australia, and, while I am responding to these questions, we have a training course in the US. Developers working on the core of RapidMiner and RapidAnalytics, however, are likely to stay in Germany for the foreseeable future. We need specialists for that, and it would be pointless to spread the development team over the globe. That is also owed to the agile philosophy that we are following.

Ingo: Simon is right, Rapid-I is already acting on an international level. Rapid-I now has more than 300 customers from 39 countries in the world, which is a great result for a young company like ours. We are of course very strong in Germany and also the rest of Europe, but we also concentrate on more countries by means of our very successful partner network. Rapid-I continues to build this partner network and to recruit dynamic and knowledgeable partners now and in the future. However, extending and acting globally is definitely part of our strategic roadmap.

Biography

Dr. Ingo Mierswa is working as Chief Executive Officer (CEO) of Rapid-I. He has several years of experience in project management, human resources management, consulting, and leadership, including eight years of coordinating and leading the multi-national RapidMiner developer team with about 30 developers and contributors world-wide. He wrote his PhD thesis, titled “Non-Convex and Multi-Objective Optimization for Numerical Feature Engineering and Data Mining”, at the University of Dortmund under the supervision of Prof. Morik.

Dr. Simon Fischer heads research & development at Rapid-I. His interests include game theory and networks, the theory of evolutionary algorithms (e.g. on the Ising model), and theoretical and practical aspects of data mining. He wrote his PhD thesis in Aachen, where he worked in the project “Design and Analysis of Self-Regulating Protocols for Spectrum Assignment” within the excellence cluster UMIC. Before that, he was working on the vtraffic project within the DFG Programme 1126 “Algorithms for large and complex networks”.

http://rapid-i.com/content/view/181/190/ tells you more on the various types of Rapid Miner licensing for enterprise, individual and developer versions.

(Note from Ajay- to receive an early edition invite to Radoop, click here http://radoop.eu/z1sxe)