Analytics and Journals

Some good journals for reading on analytics-

1) JSS

http://www.jstatsoft.org/

The Journal of Statistical Software (JSS) presents research that demonstrates the joint evolution of computational and statistical methods and techniques. Implementations can use languages such as C, C++, S, Fortran, Java, PHP, Python and Ruby, or environments such as Mathematica, MATLAB, R, S-PLUS, SAS, Stata, and XLISP-STAT.

There are currently 370 articles, 23 code snippets, 86 book reviews, 4 software reviews, and 7 special volumes in the archives.

2) R Journal

http://journal.r-project.org/

The R Journal is the refereed, open-access journal of the R project for statistical computing, publishing articles of interest to R users and developers.

3) Pharma Programming

http://maney.co.uk/index.php/journals/pha/

Pharmaceutical Programming is the official journal of the Pharmaceutical Users Software Exchange (PhUSE), a non-profit membership society with the objective of educating programmers and their managers working in the pharmaceutical industry. Available both in print and online, Pharmaceutical Programming is an international journal with focus on programming in the regulated environment of the pharmaceutical and life sciences industry.

4) SAS Papers – User Groups

http://www.lexjansen.com/

4569 SAS papers presented at SGF/SUGI 1996-2010.
1343 SAS papers presented at PharmaSUG 2000-2010.
1810 SAS papers presented at NESUG 1997-2009.
1191 SAS papers presented at SESUG 1999-2009.
463 SAS papers presented at PhUSE 2005-2009.
787 SAS papers presented at WUSS 2003-2009.
337 SAS papers presented at MWSUG 2001, 2004-2009.
188 SAS papers presented at PNWSUG 2004-2009.
246 SAS papers presented at SCSUG 2003-2007, 2009.
221 SAS papers related to CDISC.
Easy access to the CDISC Forum.

5) Analytics Magazine

http://analyticsmagazine.com/

A magazine published by INFORMS (http://www.informs.org/).

6) Data Mining Journals

Academic journals relevant to data mining.

Oracle Open World / RODM package

From the press release, here comes Oracle Open World. (They put on an excellent rock concert as part of it, too.)

.NET and Windows @ Oracle Develop and Oracle OpenWorld 2010

Oracle Develop will again feature a .NET track for Oracle developers. Oracle Develop is suited for all levels of .NET developers, from beginner to advanced. It covers introductory Oracle .NET material, new features, deep dive application tuning, and includes three hours of hands-on labs to apply what you have learned from the sessions.

To register, go to the Oracle Develop registration site.

Oracle OpenWorld will include several sessions on using the Oracle Database on Windows and .NET.

Session schedules and locations for Windows and .NET sessions at Oracle Develop and OpenWorld are now available.

Download: 32-bit ODAC 11.2.0.1.2 for Visual Studio 2010 and .NET Framework 4

With ODAC 11.2.0.1.2, developers can connect to Oracle Database versions 9.2 and higher from Visual Studio 2010 and .NET Framework 4. ODAC components support the full framework, as well as the new .NET Framework Client Profile.

Statement of Direction: Oracle Database and Microsoft Entity Framework

Learn about Oracle’s beta and production plans to support Microsoft Entity Framework with Oracle Database.

Also see http://www.oracle.com/technetwork/articles/datawarehouse/saternos-r-161569.html for

Data Mining Using the RODM Package

By Casimir Saternos

Some excerpts-

Open R and enter the following command.

> library(RODM)

This command loads the RODM library as well as the dependent RODBC package. The next step is to make a database connection.

> DB <- RODM_open_dbms_connection(dsn="orcl", uid="dm", pwd="dm")

Subsequent commands use the DB object (an instance of the RODBC class) to connect to the database. The DSN specified in the command is the name you used earlier for the Data Source Name during the ODBC connection configuration. You can view the actual R code being executed by the command by simply typing the function name (without parentheses).

> RODM_open_dbms_connection

And, for example, building a model in Oracle from R.
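The excerpt assumes a data frame named orange_data already exists; presumably it is built from R's built-in Orange data set (tree growth data with Tree, age and circumference columns), along these lines:

> orange_data <- data.frame(Orange)   # assumption: this step is not shown in the excerpt

With that in place, the excerpt adds a CASE_ID column: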

> orange_data.rows <- length(orange_data[,1])
> orange_data.id <- matrix(seq(1, orange_data.rows), nrow=orange_data.rows, ncol=1, dimnames=list(NULL, c("CASE_ID")))
> orange_data <- cbind(orange_data.id, orange_data)

This adjustment to the data frame then needs to be propagated to the database. You can confirm the change using the sqlColumns function, as listed earlier.

> RODM_create_dbms_table(DB, "orange_data")
> sqlColumns(DB, 'orange_data')$COLUMN_NAME

> glm <- RODM_create_glm_model(
database = DB,
data_table_name = "orange_data",
case_id_column_name = "CASE_ID",
target_column_name = "circumference",
model_name = "GLM_MODEL",
mining_function = "regression")

Information about this model can then be obtained by analyzing the value returned by the function, stored here in the variable named glm.

> glm$model.model_settings
> glm$glm.globals
> glm$glm.coefficients

Once you have a model, you can apply the model to a new set of data. To begin, create or retrieve sample data in the same format as the training data.

> query <- ('select 999 case_id, 1 tree, 120 age, 32 circumference from dual')

> orange_test<-sqlQuery(DB, query)
> RODM_create_dbms_table(DB, "orange_test")

Finally, the model can be applied to the new data set and the results analyzed.

> results <- RODM_apply_model(database = DB,
data_table_name = "orange_test",
model_name = "GLM_MODEL",
supplemental_cols = "circumference")

When your session is complete, you can clean up objects that were created (if you like) and you should close the database connection:

> RODM_drop_model(database=DB,'GLM_MODEL')
> RODM_drop_dbms_table(DB, "orange_test")
> RODM_drop_dbms_table(DB, "orange_data")
> RODM_close_dbms_connection(DB)

See the full article at http://www.oracle.com/technetwork/articles/datawarehouse/saternos-r-161569.html

KXEN Update

Update from a very good data mining software company, KXEN –

  1. Longtime Chairman and founder Roger Haddad is retiring but will remain a board member. See his interview with DecisionStats here: https://decisionstats.wordpress.com/2009/01/05/interview-roger-haddad-founder-of-kxen-automated-modeling-software/ (note: images were hidden due to the migration from .com to .wordpress.com).
  2. New members of the leadership team are:

John Ball, Chief Executive Officer

John Ball brings 20 years of experience in enterprise software, deep expertise in business intelligence and CRM applications, and a proven track record of success driving rapid growth at highly innovative companies.

Prior to joining KXEN, Mr. Ball served in several executive roles at salesforce.com, the leading provider of SaaS applications. Most recently, John served as VP & General Manager, Analytics and Reporting Products, where he spearheaded salesforce.com's foray into CRM analytics and business intelligence. John also served as VP & General Manager, Service and Support Applications at salesforce.com, where he successfully grew the business to become the second largest and fastest growing product line at salesforce.com. Before salesforce.com, Ball was founder and CEO of Netonomy, the leading provider of customer self-service solutions for the telecommunications industry. Ball also held a number of executive roles at Business Objects, including General Manager, Web Products, where he delivered to market the first 3 versions of WebIntelligence. Ball has a master's degree in electrical engineering from Georgia Tech.

I hope John at least helps build a KXEN Force.com application; there are only 2 data mining apps on the App Exchange. Also on the wish list: more social media presence, a Web SaaS/Amazon API for KXEN, greater presence at American and Asian conferences, and a solution for SMEs (which cannot afford the premium pricing of the flagship solution). An alliance with bigger BI vendors like Oracle, SAP or IBM for selling the great social network analysis would also help.

Bill Russell as Non-executive Chairman

Bill Russell joins as Non-executive Chairman of the Board, effective July 16, 2010. Russell has 30 years of operational experience in enterprise software, with a special focus on business intelligence, analytics, and databases. Russell held a number of senior-level positions in his more than 20 years at Hewlett-Packard, including Vice President and General Manager of the multi-billion dollar Enterprise Systems Group. He has served as Non-executive Chairman of the Board for Sylantro Systems Corporation, webMethods Inc., and Network Physics, Inc. and has served as a board director for Cognos Inc. In addition to KXEN, Russell currently serves on the boards of Saba, PROS Holdings Inc., Global 360, ParAccel Inc., and B.T. Mancini Company.

Xavier Haffreingue as senior vice president, worldwide professional services and solutions.
He has almost 20 years of international enterprise software experience gained in the CRM, BI, Web and database sectors. Haffreingue joins KXEN from software provider Axway where he was VP global support operations. Prior to Axway, he held various leadership roles in the software industry, including VP self service solutions at Comverse Technologies and VP professional services and support at Netonomy, where he successfully delivered multi-million dollar projects across Europe, Asia-Pacific and Africa. Before that he was with Business Objects and Sybase, where he ran support and services in southern Europe managing over 2,500 customers in more than 20 countries.

David Guercio as senior vice president, Americas field operations. Guercio brings to the role more than 25 years of experience building and managing high-achieving sales teams in the data mining, business intelligence and CRM markets. Guercio comes to KXEN from product lifecycle management vendor Centric Software, where he was EVP sales and client services. Prior to Centric, he was SVP worldwide sales and client services at Inxight Software, where he was also Chairman and CEO of the company's Federal Systems Group, a subsidiary of Inxight that saw success in the US Federal Government intelligence market. The success in sales growth and penetration into the federal government led to the acquisition of Inxight by Business Objects in 2007, where Guercio then led the Inxight sales organization until Business Objects was acquired by SAP. Guercio was also a key member of the management team and a co-founder at Neovista, an early pioneer in data mining and predictive analytics. Additionally, he held the positions of director of sales and VP of professional services at Metaphor Computer Systems, one of the first data extraction solutions companies, which was acquired by IBM. During his career, Guercio also held executive positions at Resonate and SiGen.

3. Venture capital funding to fund expansion:

KXEN has closed $8 million in Series D funding to further accelerate its growth and international expansion. The round was led by NextStage and included participation from existing investors XAnge Capital, Sofinnova Ventures, Saints Capital and Motorola Ventures.

This was done after John Ball had joined as CEO.

4. Continued kudos from analysts and customers for its technical excellence.

KXEN was named a leader in predictive analytics and data mining by Forrester Research (1) and was rated highest for commercial deployments of social network analytics by Frost & Sullivan (2).

It also became an alliance partner of Accenture, which is a prominent SAS partner as well.

In Database Optimization-

In KXEN V5.1, a new data manipulation module (ADM) is provided in conjunction with scoring to optimize database workloads and provide full in-database model deployment. Some leading data mining vendors are only now beginning to offer this kind of functionality, and then with only one or two selected databases, giving KXEN a more than five-year head start. Other vendors offer only generic SQL generation, not optimized for each database, and do not provide the wealth of possible outputs for their scoring equations. For example, real operational applications require not only scores but also decision probabilities, error bars, individual input contributions (used to derive the reasons behind a decision) and more, all of which are available in KXEN's in-database scoring modules.

Since 2005, KXEN has leveraged databases as the data manipulation engine for analytical dataset generation. In 2008, the ADM (Analytical Data Management) module delivered a major enhancement by providing a very easy to use data manipulation environment with unmatched productivity and efficiency. ADM works as a generator of optimized database-specific SQL code and comes with an integrated layer for the management of meta-data for analytics.
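To make the SQL-generation idea concrete, here is a toy sketch in R (not KXEN code, and far simpler than what ADM emits): it turns a fitted linear model's scoring equation into a SQL expression that a database could evaluate in place.

# Toy illustration only, not KXEN's generator: export a linear model's
# scoring equation as SQL so rows can be scored inside the database.
fit <- lm(circumference ~ age, data = Orange)   # Orange is a built-in R data set
co  <- coef(fit)
sql <- sprintf("SELECT case_id, %.4f + %.4f * age AS score FROM orange_data",
               co["(Intercept)"], co["age"])
cat(sql)
# A production generator would also emit probabilities, error bars and
# per-input contributions, tuned to each target database's SQL dialect.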

KXEN Modeling Factory- (similar to SAS’s recent product Rapid Predictive Modeler http://www.sas.com/resources/product-brief/rapid-predictive-modeler-brief.pdf and http://jtonedm.com/2010/09/02/first-look-rapid-predictive-modeler/)

KXEN Modeling Factory (KMF) has been designed to automate the development and maintenance of predictive analytics-intensive systems, especially systems that include large numbers of models, vast amounts of data or require frequent model refreshes. Information about each project and model is monitored and disseminated to ensure complete management and oversight and to facilitate continual improvement in business performance.

Main Functions

Schedule: creation of the Analytic Data Set (ADS), setup of how and when to score, setup of when and how to perform model retraining and refreshes, and more.

Report: monitor model execution over time, track changes in model quality over time, and see how useful a variable is by considering its multiple instances across models.

Notification: rather than having to wade through pages of event logs, KMF allows users to manage by exception through notifications.

Other products from KXEN have been covered here before https://decisionstats.wordpress.com/tag/kxen/ , including Structural Risk Minimization- https://decisionstats.wordpress.com/2009/04/27/kxen-automated-regression-modeling/

That's all for the KXEN update. All the best to the new management team, and a splendid job done by Roger Haddad in creating what is France's and Europe's best-known data mining company.

Note- Source – http://www.kxen.com


Google App Inventor – Android and Business Intelligence

Here is a great new tool for techies to start creating Android apps right away, even if you have no knowledge of the platform. Of course there are already a great number of apps, including my favorite Android data mining app in R, called AnalyticDroid: http://analyticdroid.togaware.com/

Basically it calls the Rattle (R Analytical Tool To Learn Easily) data mining GUI, enabling data mining from an Android mobile using remote computing.
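For reference, the server side of that setup is just Rattle running in a remote R session; a minimal sketch, assuming R, the rattle package and its GUI dependencies are installed on the remote machine:

> install.packages("rattle")   # one-time setup on the server
> library(rattle)              # load the Rattle GUI package
> rattle()                     # launch the Rattle data mining GUI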

I don't know if any other statistical application is available on Android mobiles, though SAS did have a presentation on using SAS on the iPhone:

http://www.wuss.org/proceedings09/09WUSSProceedings/papers/dpr/DPR-Truong.pdf



SAS Mobile - iPhone App

All you need to do is go to http://appinventor.googlelabs.com/about/index.html and request access (yes, there is a 2-week approval waiting list).

Because App Inventor provides access to a GPS-location sensor, you can build apps that know where you are. You can build an app to help you remember where you parked your car, an app that shows the location of your friends or colleagues at a concert or conference, or your own custom tour app of your school, workplace, or a museum.
You can write apps that use the phone features of an Android phone. You can write an app that periodically texts “missing you” to your loved ones, or an app “No Text While Driving” that responds to all texts automatically with “sorry, I’m driving and will contact you later”. You can even have the app read the incoming texts aloud to you (though this might lure you into responding).
App Inventor provides a way for you to communicate with the web. If you know how to write web apps, you can use App Inventor to write Android apps that talk to your favorite web sites, such as Amazon and Twitter.

Here is a not-so-statistical Android app I am trying to create, called Hang-Out. It uses the current GPS location of your phone to find the nearest pub, movie or diner, and to catch a bus or train based on your location city, the GPS fix, the time of request and the schedule of that city's public transport. Very much a work in progress.
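A minimal sketch of the core lookup, with a hypothetical venue table and GPS fix (none of these names are real app code):

# Hypothetical sketch of Hang-Out's core lookup: nearest venue to the phone's GPS fix.
# haversine() computes great-circle distance in kilometres.
haversine <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  6371 * 2 * asin(sqrt(a))
}

# Hypothetical venue list for one city.
venues <- data.frame(
  name = c("Pub A", "Diner B", "Cinema C"),
  lat  = c(28.61, 28.63, 28.59),
  lon  = c(77.20, 77.22, 77.19))

here <- c(lat = 28.62, lon = 77.21)   # hypothetical current GPS fix

venues$km <- haversine(here["lat"], here["lon"], venues$lat, venues$lon)
venues[which.min(venues$km), ]        # nearest venue to the user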

Data Mining 2010: SAS Conference in Vegas

An interesting conference, which I attended last year; this year one of the main guests is an ex-professor of mine at UTenn. I am India-bound this year, though, for family reasons.

http://www.sas.com/events/dmconf/over.html

Latest News

Early Bird Special
Register for M2010 before Sept. 17 and save $200 on conference fees!

Additional Data Mining Resources
Find additional data mining resources, including links to whitepapers, webinars, audio seminars, videos, blogs and online communities.

Location
Caesars Palace
Las Vegas, NV

Conference: October 25-26
Pre-conference workshops: October 24
Post-conference training: October 27-29

The M2010 Data Mining Conference is an international educational conference and exhibition for data mining practitioners, including analysts, statisticians, programmers, consultants and anyone involved with data management within their organization. Hosted by SAS, M2010 is now in its 13th year and has become the world's largest data mining conference, attracting over 600 people from various industries including Financial Services, Retail, Insurance, Technology, Education, Healthcare, Pharmaceutical, Government and more.

This conference is the top choice for serious education and career networking. Conference highlights include:

  • 6 keynotes
  • 36 sessions
  • 6 session tracks
  • exhibit hall
  • poster session
  • SAS software training
  • educational workshops
  • special events
  • networking opportunities
  • predictive modeling certification testing event.

Session Topics

  • Business applications
  • Data augmentation
  • Perspectives from the financial services industry
  • Fraud detection
  • Perspectives from the healthcare industry
  • New and emerging technologies
  • Perspectives from the retail industry
  • Data mining in marketing
  • Retention and life cycle analysis
  • Text mining
  • And more! (View session abstracts.)

Event: Predictive analytics with R, PMML and ADAPA

From http://www.meetup.com/R-Users/calendar/14405407/

The September meeting is at the Oracle campus. (This is next door to the Oracle towers, so there is plenty of free parking.) The featured talk is from Alex Guazzelli (Vice President – Analytics, Zementis Inc.) who will talk about “Predictive analytics with R, PMML and ADAPA”.

Agenda:
* 6:15 – 7:00 Networking and Pizza (with thanks to Revolution Analytics)
* 7:00 – 8:00 Talk: Predictive analytics with R, PMML and ADAPA
* 8:00 – 8:30 General discussion

Talk overview:

The rule in the past was that whenever a model was built in a particular development environment, it remained in that environment forever, unless it was manually recoded to work somewhere else. This rule has been shattered with the advent of PMML (Predictive Model Markup Language). By providing a uniform standard to represent predictive models, PMML allows for the exchange of predictive solutions between different applications and various vendors.

Once exported as PMML files, models are readily available for deployment into an execution engine for scoring or classification. ADAPA is one example of such an engine. It takes in models expressed in PMML and transforms them into web-services. Models can be executed either remotely by using web-services calls, or via a web console. Users can also use an Excel add-in to score data from inside Excel using models built in R.

R models have been exported into PMML and uploaded in ADAPA for many different purposes. Use cases where clients have used the flexibility of R to develop and the PMML standard combined with ADAPA to deploy range from financial applications (e.g., risk, compliance, fraud) to energy applications for the smart grid. The ability to easily transition solutions developed in R to the operational IT production environment helps eliminate the traditional limitations of R, e.g. performance for high volume or real-time transactional systems and memory constraints associated with large data sets.
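As a concrete example, exporting a model from R to PMML takes only a couple of lines with the pmml package (shown here for a simple linear model on a built-in data set; the resulting file is the kind of artifact that gets uploaded to a PMML engine such as ADAPA):

# Convert a fitted R model to PMML and write it to disk.
library(pmml)   # model-to-PMML converters (on CRAN)
library(XML)    # saveXML() writes the XML document out
fit <- lm(Sepal.Length ~ Sepal.Width + Petal.Length, data = iris)
saveXML(pmml(fit), file = "iris_lm.pmml")   # PMML file ready for deployment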

Speaker Bio:

Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language which is the de facto standard used to represent predictive models. The book, entitled PMML in Action: Unleashing the Power of Open Standards for Data Mining and Predictive Analytics, is available on Amazon.com. As the Vice President of Analytics at Zementis, Inc., Dr. Guazzelli is responsible for developing core technology and analytical solutions under ADAPA, a PMML-based predictive decisioning platform that combines predictive analytics and business rules. ADAPA is the first system of its kind to be offered as a service on the cloud.
Prior to joining Zementis, Dr. Guazzelli was involved in not only building but also deploying predictive solutions for large financial and telecommunication institutions around the globe. In academia, Dr. Guazzelli worked with data mining, neural networks, expert systems and brain theory. His work in brain theory and computational neuroscience has appeared in many peer reviewed publications. At Zementis, Dr. Guazzelli and his team have been involved in a myriad of modeling projects for financial, health-care, gaming, chemical, and manufacturing industries.

Dr. Guazzelli holds a Ph.D. in Computer Science from the University of Southern California and an M.S. and a B.S. in Computer Science from the Federal University of Rio Grande do Sul, Brazil.

R Oracle Data Mining

Here is a new package called RODM, an interface for doing data mining on Oracle tables through R. You can read more here http://www.oracle.com/technetwork/database/options/odm/odm-r-integration-089013.html and here http://cran.fhcrc.org/web/packages/RODM/RODM.pdf . Also there is a contest for creative use of R and ODM.
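Getting started is the usual CRAN routine, assuming an Oracle client and an ODBC data source are already configured on the machine:

> install.packages("RODM")   # also pulls in the RODBC dependency
> library(RODM)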

R Interface to Oracle Data Mining

The R Interface to Oracle Data Mining (R-ODM) allows R users to access the power of Oracle Data Mining's in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies.

R-ODM is especially useful for:

  • Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application
  • Scripting of “production” data mining methodologies
  • Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection)

The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS enterprise edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrates the use of Oracle Data Mining, for example, how to create Data Mining models, pass arguments, retrieve results etc.

R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment's Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org.

and

Present and win an Apple iPod Touch!
The BI, Warehousing and Analytics (BIWA) SIG is giving an Apple iPod Touch to the best new presenter. Be part of the TechCast series and get a chance to win!

Consider highlighting a creative use of R and ODM.

BIWA invites all Oracle professionals (experts, end users, managers, DBAs, developers, data analysts, ISVs, partners, etc.) to submit abstracts for 45 minute technical webcasts to our Oracle BIWA (IOUG SIG) Community in our Wednesday TechCast series. Note that the contest is limited to new presenters to encourage fresh participation by the BIWA community.

Also an interview with Oracle Data Mining head, Charlie Berger https://decisionstats.wordpress.com/2009/09/02/oracle/