RCOMM 2012 goes live in August

An awesome conference by an awesome software Rapid Miner remains one of the leading enterprise grade open source software , that can help you do a lot of things including flow driven data modeling ,web mining ,web crawling etc which even other software cant.

Presentations include:

  • Mining Machine 2 Machine Data (Katharina Morik, TU Dortmund University)
  • Handling Big Data (Andras Benczur, MTA SZTAKI)
  • Introduction of RapidAnalytics at Telenor (Telenor and United Consult)
  • and more

Here is a list of complete program

 

Program

 

Time
Slot
Tuesday
Training / Workshop 1
Wednesday
Conference 1
Thursday
Conference 2
Friday
Training / Workshop 2
09:00 – 10:30
Introductory Speech
Ingo Mierswa (Rapid-I)Resource-aware Data Mining or M2M Mining (Invited Talk)

Katharina Morik (TU Dortmund University)

More information

 

Data Analysis

 

NeurophRM: Integration of the Neuroph framework into RapidMiner
Miloš Jovanović, Jelena Stojanović, Milan Vukićević, Vera Stojanović, Boris Delibašić (University of Belgrade)

To be announced (Invited Talk)
Andras Benczur 

Recommender Systems

 

Extending RapidMiner with Recommender Systems Algorithms
Matej Mihelčić, Nino Antulov-Fantulin, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute)

Implementation of User Based Collaborative Filtering in RapidMiner
Sérgio Morais, Carlos Soares (Universidade do Porto)

Parallel Training / Workshop Session

Advanced Data Mining and Data Transformations

or

Development Workshop Part 2

10:30 – 11:00
Coffee Break
Coffee Break
Coffee Break
11:00 – 12:30
Data Analysis

Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner
Mennatallah Amer, Markus Goldstein (DFKI)

Customers’ LifeStyle Targeting on Big Data using Rapid Miner
Maksim Drobyshev (LifeStyle Marketing Ltd)

Robust GPGPU Plugin Development for RapidMiner
Andor Kovács, Zoltán Prekopcsák (Budapest University of Technology and Economics)

Extensions

 

Optimization Plugin For RapidMiner
Venkatesh Umaashankar, Sangkyun Lee (TU Dortmund University; presented by Hendrik Blom)

 

Image Mining Extension – Year After
Radim Burget, Václav Uher, Jan Mašek (Brno University of Technology)

Incorporating R Plots into RapidMiner Reports
Peter Jeszenszky (University of Debrecen)

12:30 – 13:30
Lunch
Lunch
Lunch
13:30 – 15:30
Parallel Training / Workshop Session

Basic Data Mining and Data Transformations

or

Development Workshop Part 1

Applications

 

Introduction of RapidAnalyticy Enterprise Edition at Telenor Hungary
t.b.a. (Telenor Hungary and United Consult)

 

Application of RapidMiner in Steel Industry Research and Development
Bengt-Henning Maas, Hakan Koc, Martin Bretschneider (Salzgitter Mannesmann Forschung)

A Comparison of Data-driven Models for Forecast River Flow
Milan Cisty, Juraj Bezak (Slovak University of Technology)

Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner
Gábor Nagy, Tamás Henk, Gergő Barta (Budapest University of Technology and Economics)

Extensions

 

An Octave Extension for RapidMiner
Sylvain Marié (Schneider Electric)

 

Unstructured Data

 

Processing Data Streams with the RapidMiner Streams-Plugin
Christian Bockermann, Hendrik Blom (TU Dortmund)

Automated Creation of Corpuses for the Needs of Sentiment Analysis
Peter Koncz, Jan Paralic (Technical University of Kosice)

 

Demonstration: News from the Rapid-I Labs
Simon Fischer; Rapid-I

This short session demonstrates the latest developments from the Rapid-I lab and will let you how you can build powerful analysis processes and routines by using those RapidMiner tools.

Certification Exam
15:30 – 16:00
Coffee Break
Coffee Break
Coffee Break
16:00 – 18:00
Book Presentation and Game Show

Data Mining for the Masses: A New Textbook on Data Mining for Everyone
Matthew North (Washington & Jefferson College)

Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems.

 

Game Show
Did you miss last years’ game show “Who wants to be a data miner?”? Use RapidMiner for problems it was never created for and beat the time and other contestants!

User Support

Get some Coffee for free – Writing Operators with RapidMiner Beans
Christian Bockermann, Hendrik Blom (TU Dortmund)

Meta-Modeling Execution Times of RapidMiner operators
Matija Piškorec, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute)

Conference day ends at ca. 17:00.

19:30
Social Event (Conference Dinner)
Social Event (Visit of Bar District)

 

and you should have a look at https://rapid-i.com/rcomm2012f/index.php?option=com_content&view=article&id=65

Conference is in Budapest, Hungary,Europe.

( Disclaimer- Rapid Miner is an advertising sponsor of Decisionstats.com in case you didnot notice the two banner sized ads.)

 

Interview James G Kobielus IBM Big Data

Here is an interview with  James G Kobielus, who is the Senior Program Director, Product Marketing, Big Data Analytics Solutions at IBM. Special thanks to Payal Patel Cudia of IBM’s communication team,for helping with the logistics for this.

Ajay -What are the specific parts of the IBM Platform that deal with the three layers of Big Data -variety, velocity and volume

James-Well first of all, let’s talk about the IBM Information Management portfolio. Our big data platform addresses the three layers of big data to varying degrees either together in a product , or two out of the three or even one of the three aspects. We don’t have separate products for the variety, velocity and volume separately.

Let us define these three layers-Volume refers to the hundreds of terabytes and petabytes of stored data inside organizations today. Velocity refers to the whole continuum from batch to real time continuous and streaming data.

Variety refers to multi-structure data from structured to unstructured files, managed and stored in a common platform analyzed through common tooling.

For Volume-IBM has a highly scalable Big Data platform. This includes Netezza and Infosphere groups of products, and Watson-like technologies that can support petabytes volume of data for analytics. But really the support of volume ranges across IBM’s Information Management portfolio both on the database side and the advanced analytics side.

For real time Velocity, we have real time data acquisition. We have a product called IBM Infosphere, part of our Big Data platform, that is specifically built for streaming real time data acquisition and delivery through complex event processing. We have a very rich range of offerings that help clients build a Hadoop environment that can scale.

Our Hadoop platform is the most real time capable of all in the industry. We are differentiated by our sheer breadth, sophistication and functional depth and tooling integrated in our Hadoop platform. We are differentiated by our streaming offering integrated into the Hadoop platform. We also offer a great range of modeling and analysis tools, pretty much more than any other offering in the Big Data space.

Attached- Jim’s slides from Hadoop World

Ajay- Any plans for Mahout for Hadoop

Jim- I cant speak about product plans. We have plans but I cant tell you anything more. We do have a feature in Big Insights called System ML, a library for machine learning.

Ajay- How integral are acquisitions for IBM in the Big Data space (Netezza,Cognos,SPSS etc). Is it true that everything that you have in Big Data is acquired or is the famous IBM R and D contributing here . (see a partial list of IBM acquisitions at at http://www.ibm.com/investor/strategy/acquisitions.wss )

Jim- We have developed a lot on our own. We have the deepest R and D of anybody in the industry in all things Big Data.

For example – Watson has Big Insights Hadoop at its core. Apache Hadoop is the heart and soul of Big Data (see http://www-01.ibm.com/software/data/infosphere/hadoop/ ). A great deal that makes Big Insights so differentiated is that not everything that has been built has been built by the Hadoop community.

We have built additions out of the necessity for security, modeling, monitoring, and governance capabilities into BigInsights to make it truly enterprise ready. That is one example of where we have leveraged open source and we have built our own tools and technologies and layered them on top of the open source code.

Yes of course we have done many strategic acquisitions over the last several years related to Big Data Management and we continue to do so. This quarter we have done 3 acquisitions with strong relevance to Big Data. One of them is Vivisimo (http://www-03.ibm.com/press/us/en/pressrelease/37491.wss ).

Vivisimo provides federated Big Data discovery, search and profiling capabilities to help you figure out what data is out there,what is relevance of that data to your data science project- to help you answer the question which data should you bring in your Hadoop Cluster.

 We also did Varicent , which is more performance management and we did TeaLeaf , which is a customer experience solution provider where customer experience management and optimization is one of the hot killer apps for Hadoop in the cloud. We have done great many acquisitions that have a clear relevance to Big Data.

Netezza already had a massively parallel analytics database product with an embedded library of models called Netezza Analytics, and in-database capabilties to massively parallelize Map Reduce and other analytics management functions inside the database. In many ways, Netezza provided capabilities similar to that IBM had provided for many years under the Smart Analytics Platform (http://www-01.ibm.com/software/data/infosphere/what-is-advanced-analytics/ ) .

There is a differential between Netezza and ISAS.

ISAS was built predominantly in-house over several years . If you go back a decade ago IBM acquired Ascential Software , a product portfolio that was the heart and soul of IBM InfoSphere Information Manager that is core to our big Data platform. In addition to Netezza, IBM bought SPSS two years back. We already had data mining tools and predictive modeling in the InfoSphere portfolio, but we realized we needed to have the best of breed, SPSS provided that and so IBM acquired them.

 Cognos– We had some BI reporting capabilities in the InfoSphere portfolio that we had built ourselves and also acquired for various degrees from prior acquisitions. But clearly Cognos was one of the best BI vendors , and we were lacking such a rich tool set in our product in visualization and cubing and so for that reason we acquired Cognos.

There is also Unica – which is a marketing campaign optimization which in many ways is a killer app for Hadoop. Projects like that are driving many enterprises.

Ajay- How would you rank order these acquisitions in terms of strategic importance rather than data of acquisition or price paid.

Jim-Think of Big Data as an ecosystem that has components that are fitted to particular functions for data analytics and data management. Is the database the core, or the modeling tool the core, or the governance tools the core, or is the hardware platform the core. Everything is critically important. We would love to hear from you what you think have been most important. Each acquisition has helped play a critical role to build the deepest and broadest solution offering in Big Data. We offer the hardware, software, professional services, the hosting service. I don’t think there is any validity to a rank order system.

Ajay-What are the initiatives regarding open source that Big Data group have done or are planning?

Jim- What we are doing now- We are very much involved with the Apache Hadoop community. We continue to evolve the open source code that everyone leverages.. We have built BigInsights on Apache Hadoop. We have the closest, most up to date in terms of version number to Apache Hadoop ( Hbase,HDFS, Pig etc) of all commercial distributions with our BigInsights 1.4 .

We have an R library integrated with BigInsights . We have a R library integrated with Netezza Analytics. There is support for R Models within the SPSS portfolio. We already have a fair amount of support for R across the portfolio.

Ajay- What are some of the concerns (privacy,security,regulation) that you think can dampen the promise of Big Data.

Jim- There are no showstoppers, there is really a strong momentum. Some of the concerns within the Hadoop space are immaturity of the technology, the immaturity of some of the commercial offerings out there that implement Hadoop, the lack of standardization for formal sense for Hadoop.

There is no Open Standards Body that declares, ratifies the latest version of Mahout, Map Reduce, HDFS etc. There is no industry consensus reference framework for layering these different sub projects. There are no open APIs. There are no certifications or interoperability standards or organizations to certify different vendors interoperability around a common API or framework.

The lack of standardization is troubling in this whole market. That creates risks for users because users are adopting multiple Hadoop products. There are lots of Hadoop deployments in the corporate world built around Apache Hadoop (purely open source). There may be no assurance that these multiple platforms will interoperate seamlessly. That’s a huge issue in terms of just magnifying the risk. And it increases the need for the end user to develop their own custom integrated code if they want to move data between platforms, or move map-reduce jobs between multiple distributions.

Also governance is a consideration. Right now Hadoop is used for high volume ETL on multi structured and unstructured data sources, or Hadoop is used for exploratory sand boxes for data scientists. These are important applications that are a majority of the Hadoop deployments . Some Hadoop deployments are stand alone unstructured data marts for specific applications like sentiment analysis like.

Hadoop is not yet ready for data warehousing. We don’t see a lot of Hadoop being used as an alternative to data warehouses for managing the single version of truth of system or record data. That day will come but there needs to be out there in the marketplace a broader range of data governance mechanisms , master data management, data profiling products that are mature that enterprises can use to make sure their data inside their Hadoop clusters is clean and is the single version of truth. That day has not arrived yet.

One of the great things about IBM’s acquisition of Vivisimo is that a piece of that overall governance picture is discovery and profiling for unstructured data , and that is done very well by Vivisimo for several years.

What we will see is vendors such as IBM will continue to evolve security features inside of our Hadoop platform. We will beef up our data governance capabilities for this new world of Hadoop as the core of Big Data, and we will continue to build up our ability to integrate multiple databases in our Hadoop platform so that customers can use data from a bit of Hadoop,some data from a bit of traditional relational data warehouse, maybe some noSQL technology for different roles within a very complex Big Data environment.

That latter hybrid deployment model is becoming standard across many enterprises for Big Data. A cause for concern is when your Big Data deployment has a bit of Hadoop, bit of noSQL, bit of EDW, bit of in-memory , there are no open standards or frameworks for putting it all together for a unified framework not just for interoperability but also for deployment.

There needs to be a virtualization or abstraction layer for unified access to all these different Big Data platforms by the users/developers writing the queries, by administrators so they can manage data and resources and jobs across all these disparate platforms in a seamless unified way with visual tooling. That grand scenario, the virtualization layer is not there yet in any standard way across the big data market. It will evolve, it may take 5-10 years to evolve but it will evolve.

So, that’s the concern that can dampen some of the enthusiasm for Big Data Analytics.

About-

You can read more about Jim at http://www.linkedin.com/pub/james-kobielus/6/ab2/8b0 or

follow him on Twitter at http://twitter.com/jameskobielus

You can read more about IBM Big Data at http://www-01.ibm.com/software/data/bigdata/

Rapid Miner User Conference 2012

One of those cool conferences that is on my bucket list- this time in Hungary (That’s a nice place)

But I am especially interested in seeing how far Radoop has come along !

Disclaimer- Rapid Miner has been a Decisionstats.com sponsor  for many years. It is also a very cool software but I like the R Extension facility even more!

—————————————————————

and not very expensive too compared to other User Conferences in Europe!-

http://rcomm2012.org/index.php/registration/prices

Information about Registration

  • Early Bird registration until July 20th, 2012.
  • Normal registration from July 21st, 2012 until August 13th, 2012.
  • Latest registration from August 14th, 2012 until August 24th, 2012.
  • Students have to provide a valid Student ID during registration.
  • The Dinner is included in the All Days and in the Conference packages.
  • All prices below are net prices. Value added tax (VAT) has to be added if applicable.

Prices for Regular Visitors

Days and Event
Early Bird Rate
Normal Rate
Latest Registration
Tuesday

(Training / Development 1)

190 Euro 230 Euro 280 Euro
Wednesday + Thursday

(Conference)

290 Euro 350 Euro 420 Euro
Friday

(Training / Development 2 and Exam)

190 Euro 230 Euro 280 Euro
All Days

(Full Package)

610 Euro 740 Euro 900 Euro

Prices for Authors and Students

In case of students, please note that you will have to provide a valid student ID during registration.

Days and Event
Early Bird Rate
Normal Rate
Latest Registration
Tuesday

(Training / Development 1)

90 Euro 110 Euro 140 Euro
Wednesday + Thursday

(Conference)

140 Euro 170 Euro 210 Euro
Friday

(Training / Development 2 and Exam)

90 Euro 110 Euro 140 Euro
All Days

(Full Package)

290 Euro 350 Euro 450 Euro
Time
Slot
Tuesday
Training / Workshop 1
Wednesday
Conference 1
Thursday
Conference 2
Friday
Training / Workshop 2
09:00 – 10:30
Introductory Speech
Ingo Mierswa; Rapid-I 

Data Analysis

 

NeurophRM: Integration of the Neuroph framework into RapidMiner
Miloš Jovanović, Jelena Stojanović, Milan Vukićević, Vera Stojanović, Boris Delibašić (University of Belgrade)

To be announced (Invited Talk)
To be announced

 

Recommender Systems

 

Extending RapidMiner with Recommender Systems Algorithms
Matej Mihelčić, Nino Antulov-Fantulin, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute)

Implementation of User Based Collaborative Filtering in RapidMiner
Sérgio Morais, Carlos Soares (Universidade do Porto)

Parallel Training / Workshop Session

Advanced Data Mining and Data Transformations

or

Development Workshop Part 2

10:30 – 12:30
Data Analysis

Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner
Mennatallah Amer, Markus Goldstein (DFKI)

Customers’ LifeStyle Targeting on Big Data using Rapid Miner
Maksim Drobyshev (LifeStyle Marketing Ltd)

Robust GPGPU Plugin Development for RapidMiner
Andor Kovács, Zoltán Prekopcsák (Budapest University of Technology and Economics)

Extensions

Image Mining Extension – Year After
Radim Burget, Václav Uher, Jan Mašek (Brno University of Technology)

Incorporating R Plots into RapidMiner Reports
Peter Jeszenszky (University of Debrecen)

An Octave Extension for RapidMiner
Sylvain Marié (Schneider Electric)

12:30 – 13:30
Lunch
Lunch
Lunch
13:30 – 15:00
Parallel Training / Workshop Session

Basic Data Mining and Data Transformations

or

Development Workshop Part 1

Applications

Application of RapidMiner in Steel Industry Research and Development
Bengt-Henning Maas, Hakan Koc, Martin Bretschneider (Salzgitter Mannesmann Forschung)

A Comparison of Data-driven Models for Forecast River Flow
Milan Cisty, Juraj Bezak (Slovak University of Technology)

Portfolio Optimization Using Local Linear Regression Ensembles in Rapid Miner
Gábor Nagy, Tamás Henk, Gergő Barta (Budapest University of Technology and Economics)

Unstructured Data


Processing Data Streams with the RapidMiner Streams-Plugin
Christian Bockermann, Hendrik Blom (TU Dortmund)

Automated Creation of Corpuses for the Needs of Sentiment Analysis
Peter Koncz, Jan Paralic (Technical University of Kosice)

 

Demonstration

 

News from the Rapid-I Labs
Simon Fischer; Rapid-I

This short session demonstrates the latest developments from the Rapid-I lab and will let you how you can build powerful analysis processes and routines by using those RapidMiner tools.

Certification Exam
15:00 – 17:00
Book Presentation and Game Show

Data Mining for the Masses: A New Textbook on Data Mining for Everyone
Matthew North (Washington & Jefferson College)

Matthew North presents his new book “Data Mining for the Masses” introducing data mining to a broader audience and making use of RapidMiner for practical data mining problems.

 

Game Show
Did you miss last years’ game show “Who wants to be a data miner?”? Use RapidMiner for problems it was never created for and beat the time and other contestants!

User Support

Get some Coffee for free – Writing Operators with RapidMiner Beans
Christian Bockermann, Hendrik Blom (TU Dortmund)

Meta-Modeling Execution Times of RapidMiner operators
Matija Piškorec, Matko Bošnjak, Tomislav Šmuc (Ruđer Bošković Institute) 

19:00
Social Event (Conference Dinner)
Social Event (Visit of Bar District)

 

Training: Basic Data Mining and Data Transformations

This is a short introductory training course for users who are not yet familiar with RapidMiner or only have a few experiences with RapidMiner so far. The topics of this training session include

  • Basic Usage
    • User Interface
    • Creating and handling RapidMiner repositories
    • Starting a new RapidMiner project
    • Operators and processes
    • Loading data from flat files
    • Storing data, processes, and results
  • Predictive Models
    • Linear Regression
    • Naïve Bayes
    • Decision Trees
  • Basic Data Transformations
    • Changing names and roles
    • Handling missing values
    • Changing value types by discretization and dichotimization
    • Normalization and standardization
    • Filtering examples and attributes
  • Scoring and Model Evaluation
    • Applying models
    • Splitting data
    • Evaluation methods
    • Performance criteria
    • Visualizing Model Performance

 

Training: Advanced Data Mining and Data Transformations

This is a short introductory training course for users who already know some basic concepts of RapidMiner and data mining and have already used the software before, for example in the first training on Tuesday. The topics of this training session include

  • Advanced Data Handling
    • Sampling
    • Balancing data
    • Joins and Aggregations
    • Detection and removal of outliers
    • Dimensionality reduction
  • Control process execution
    • Remember process results
    • Recall process results
    • Loops
    • Using branches and conditions
    • Exception handling
    • Definition of macros
    • Usage of macros
    • Definition of log values
    • Clearing log tables
    • Transforming log tables to data

 

Development Workshop Part 1 and Part 2

Want to exchange ideas with the developers of RapidMiner? Or learn more tricks for developing own operators and extensions? During our development workshops on Tuesday and Friday, we will build small groups of developers each working on a small development project around RapidMiner. Beginners will get a comprehensive overview of the architecture of RapidMiner before making the first steps and learn how to write own operators. Advanced developers will form groups with our experienced developers, identify shortcomings of RapidMiner and develop a new extension which might be presented during the conference already. Unfinished work can be continued in the second workshop on Friday before results might be published on the Marketplace or can be taken home as a starting point for new custom operators.

Predictive Models Ain’t Easy to Deploy

 

This is a guest blog post by Carole Ann Matignon of Sparkling Logic. You can see more on Sparkling Logic at http://my.sparklinglogic.com/

Decision Management is about combining predictive models and business rules to automate decisions for your business. Insurance underwriting, loan origination or workout, claims processing are all very good use cases for that discipline… But there is a hiccup… It ain’t as easy you would expect…

What’s easy?

If you have a neat model, then most tools would allow you to export it as a PMML model – PMML stands for Predictive Model Markup Language and is a standard XML representation for predictive model formulas. Many model development tools let you export it without much effort. Many BRMS – Business rules Management Systems – let you import it. Tada… The model is ready for deployment.

What’s hard?

The problem that we keep seeing over and over in the industry is the issue around variables.

Those neat predictive models are formulas based on variables that may or may not exist as is in your object model. When the variable is itself a formula based on the object model, like the min, max or sum of Dollar amount spent in Groceries in the past 3 months, and the object model comes with transaction details, such that you can compute it by iterating through those transactions, then the problem is not “that” big. PMML 4 introduced some support for those variables.

The issue that is not easy to fix, and yet quite frequent, is when the model development data model does not resemble the operational one. Your Data Warehouse very likely flattened the object model, and pre-computed some aggregations that make the mapping very hard to restore.

It is clearly not an impossible project as many organizations do that today. It comes with a significant overhead though that forces modelers to involve IT resources to extract the right data for the model to be operationalized. It is a heavy process that is well justified for heavy-duty models that were developed over a period of time, with a significant ROI.

This is a show-stopper though for other initiatives which do not have the same ROI, or would require too frequent model refresh to be viable. Here, I refer to “real” model refresh that involves a model reengineering, not just a re-weighting of the same variables.

For those initiatives where time is of the essence, the challenge will be to bring closer those two worlds, the modelers and the business rules experts, in order to streamline the development AND deployment of analytics beyond the model formula. The great opportunity I see is the potential for a better and coordinated tuning of the cut-off rules in the context of the model refinement. In other words: the opportunity to refine the strategy as a whole. Very ambitious? I don’t think so.

About Carole Ann Matignon

http://my.sparklinglogic.com/index.php/company/management-team

Carole-Ann Matignon Print E-mail

Carole-Ann MatignonCarole-Ann Matignon – Co-Founder, President & Chief Executive Officer

She is a renowned guru in the Decision Management space. She created the vision for Decision Management that is widely adopted now in the industry.  Her claim to fame is managing the strategy and direction of Blaze Advisor, the leading BRMS product, while she also managed all the Decision Management tools at FICO (business rules, predictive analytics and optimization). She has a vision for Decision Management both as a technology and a discipline that can revolutionize the way corporations do business, and will never get tired of painting that vision for her audience.  She speaks often at Industry conferences and has conducted university classes in France and Washington DC.

She started her career building advanced systems using all kinds of technologies — expert systems, rules, optimization, dashboarding and cubes, web search, and beta version of database replication. At Cleversys (acquired by Kurt Salmon & Associates), she also conducted strategic consulting gigs around change management.

While playing with advanced software components, she found a passion for technology and joined ILOG (acquired by IBM). She developed a growing interest in Optimization as well as Business Rules. At ILOG, she coined the term BRMS while brainstorming with her Sales counterpart. She led the Presales organization for Telecom in the Americas up until 2000 when she joined Blaze Software (acquired by Brokat Technologies, HNC Software and finally FICO).

Her 360-degree experience allowed her to gain appreciation for all aspects of a software company, giving her a unique perspective on the business. Her technical background kept her very much in touch with technology as she advanced.

Going off Search Radar for 2012 Q1

I just used the really handy tools at

https://www.google.com/webmasters/tools/crawl-access

, clicked Remove URL

https://www.google.com/webmasters/tools/crawl-access?hl=en&siteUrl=https://decisionstats.com/&tid=removal-list

and submitted http://www.decisionstats.com

and I also modified my robots.txt file to

User-agent: *
Disallow: /

Just to make sure- I added the meta tag to each right margin of my blog

“<meta name=”robots” content=”noindex”>”

Now for last six months of 2011 as per Analytics, search engines were really generous to me- Giving almost 170 K page views,

Source                            Visits          Pages/Visit
1. google                       58,788                       2.14
2. (direct)                     10,832                       2.24
3. linkedin.com            2,038                       2.50
4. google.com                1,823                       2.15
5. bing                              1,007                      2.04
6. reddit.com                    749                       1.93
7. yahoo                              740                      2.25
8. google.co.in                  576                       2.13
9. search                             572                       2.07

 

I do like to experiment though, and I wonder if search engines just –

1) Make people lazy to bookmark or type the whole website name in Chrome/Opera  toolbars

2) Help disguise sources of traffic by encrypted search terms

3) Help disguise corporate traffic watchers and aggregators

So I am giving all spiders a leave for Q1 2012. I am interested in seeing impact of this on my traffic , and I suspect that the curves would not be as linear as I think.

Is search engine optimization over rated? Let the data decide…. 🙂

I am also interested in seeing how social sharing can impact traffic in the absence of search engine interaction effects- and whether it is possible to retain a bigger chunk of traffic by reducing SEO efforts and increasing social efforts!

 

Quantitative Modeling for Arbitrage Positions in Ad KeyWords Internet Marketing

Assume you treat an ad keyword as an equity stock. There are slight differences in the cost for advertising for that keyword across various locations (Zurich vs Delhi) and various channels (Facebook vs Google) . You get revenue if your website ranks naturally in organic search for the keyword, and you have to pay costs for getting traffic to your website for that keyword.
An arbitrage position is defined as a riskless profit when cost of keyword is less than revenue from keyword. We take examples of Adsense  and Adwords primarily.
There are primarily two types of economic curves on the foundation of which commerce of the  internet  resides-
1) Cost Curve- Cost of Advertising to drive traffic into the website  (Google Adwords, Twitter Ads, Facebook , LinkedIn ads)
2) Revenue Curve – Revenue from ads clicked by the incoming traffic on website (like Adsense, LinkAds, Banner Ads, Ad Sharing Programs , In Game Ads)
The cost and revenue curves are primarily dependent on two things
1) Type of KeyWord-Also subdependent on
a) Location of Prospective Customer, and
b) Net Present Value of Good and Service to be eventually purchased
For example , keyword for targeting sales of enterprise “business intelligence software” should ideally be costing say X times as much as keywords for “flower shop for birthdays” where X is the multiple of the expected payoffs from sales of business intelligence software divided by expected payoff from sales of flowers (say in Location, Daytona Beach ,Florida or Austin, Texas)
2) Traffic Volume – Also sub-dependent on Time Series and
a) Seasonality -Annual Shoppping Cycle
b) Cyclicality– Macro economic shifts in time series
The cost and revenue curves are not linear and ideally should be continuous in a definitive exponential or polynomial manner, but in actual reality they may have sharp inflections , due to location, time, as well as web traffic volume thresholds
Type of Keyword – For example ,keywords for targeting sales for Eminem Albums may shoot up in a non linear manner after the musician dies.
The third and not so publicly known component of both the cost and revenue curves is factoring in internet industry dynamics , including relative market share of internet advertising platforms, as well as percentage splits between content creator and ad providing platforms.
For example, based on internet advertising spend, people belive that the internet advertising is currently heading for a duo-poly with Google and Facebook are the top two players, while Microsoft/Skype/Yahoo and LinkedIn/Twitter offer niche options, but primarily depend on price setting from Google/Bing/Facebook.
It is difficut to quantify  the elasticity and efficiency of market curves as most literature and research on this is by in-house corporate teams , or advisors or mentors or consultants to the primary leaders in a kind of incesteous fraternal hold on public academic research on this.
It is recommended that-
1) a balance be found in the need for corporate secrecy to protest shareholder value /stakeholder value maximization versus the need for data liberation for innovation and grow the internet ad pie faster-
2) Cost and Revenue Curves between different keywords, time,location, service providers, be studied by quants for hedging inetrent ad inventory or /and choose arbitrage positions This kind of analysis is done for groups of stocks and commodities in the financial world, but as commerce grows on the internet this may need more specific and independent quants.
3) attention be made to how cost and revenue curves mature as per level of sophistication of underlying economy like Brazil, Russia, China, Korea, US, Sweden may be in different stages of internet ad market evolution.
For example-
A study in cost and revenue curves for certain keywords across domains across various ad providers across various locations from 2003-2008 can help academia and research (much more than top ten lists of popular terms like non quantitative reports) as well as ensure that current algorithmic wightings are not inadvertently given away.
Part 2- of this series will explore the ways to create third party re-sellers of keywords and measuring impacts of search and ad engine optimization based on keywords.

Automatically creating tags for big blogs with WordPress

I use the simple-tags plugin in WordPress for automatically creating and posting tags. I am hoping this makes the site better to navigate. Given the fact that I had not been a very efficient tagger before, this plugin can really be useful for someone in creating tags for more than 100 (or 1000 posts) especially WordPress based blog aggregators.

 

 

The plugin is available here –

Simple Tags is the successor of Simple Tagging Plugin This is THE perfect tool to manage perfectly your WP terms for any taxonomy

It was written with this philosophy : best performances, more secured and brings a lot of new functions

This plugin is developped on WordPress 3.3, with the constant WP_DEBUG to TRUE.

  • Administration
  • Tags suggestion from Yahoo! Term Extraction API, OpenCalais, Alchemy, Zemanta, Tag The Net, Local DB with AJAX request
    • Compatible with TinyMCE, FCKeditor, WYMeditor and QuickTags
  • tags management (rename, delete, merge, search and add tags, edit tags ID)
  • Edit mass tags (more than 50 posts once)
  • Auto link tags in post content
  • Auto tags !
  • Type-ahead input tags / Autocompletion Ajax
  • Click tags
  • Possibility to tag pages (not only posts) and include them inside the tags results
  • Easy configuration ! (in WP admin)

The above plugin can be combined with the RSS Aggregator plugin for Search Engine Optimization purposes

Ajay-You can also combine this plugin with RSS auto post blog aggregator (read instructions here) and create SEO optimized Blog Aggregation or Curation

Related –http://www.decisionstats.com/creating-a-blog-aggregator-for-free/