Automatically creating tags for big blogs with WordPress

I use the simple-tags plugin in WordPress for automatically creating and posting tags. I am hoping this makes the site better to navigate. Given the fact that I had not been a very efficient tagger before, this plugin can really be useful for someone in creating tags for more than 100 (or 1000 posts) especially WordPress based blog aggregators.

 

 

The plugin is available here –

Simple Tags is the successor of Simple Tagging Plugin This is THE perfect tool to manage perfectly your WP terms for any taxonomy

It was written with this philosophy : best performances, more secured and brings a lot of new functions

This plugin is developped on WordPress 3.3, with the constant WP_DEBUG to TRUE.

  • Administration
  • Tags suggestion from Yahoo! Term Extraction API, OpenCalais, Alchemy, Zemanta, Tag The Net, Local DB with AJAX request
    • Compatible with TinyMCE, FCKeditor, WYMeditor and QuickTags
  • tags management (rename, delete, merge, search and add tags, edit tags ID)
  • Edit mass tags (more than 50 posts once)
  • Auto link tags in post content
  • Auto tags !
  • Type-ahead input tags / Autocompletion Ajax
  • Click tags
  • Possibility to tag pages (not only posts) and include them inside the tags results
  • Easy configuration ! (in WP admin)

The above plugin can be combined with the RSS Aggregator plugin for Search Engine Optimization purposes

Ajay-You can also combine this plugin with RSS auto post blog aggregator (read instructions here) and create SEO optimized Blog Aggregation or Curation

Related –http://www.decisionstats.com/creating-a-blog-aggregator-for-free/

Announcing Jaspersoft 4.5

Message from  Jaspersoft

————————–

Announcing Jaspersoft 4.5:
Powerful Analytics for All Your Data

This new release provides a single, easy-to-use environment designed with the non-technical user in mind — delivering insight to data stored in relational, OLAP, and Big Data environments.New in Jaspersoft 4.5

Broad and Deep Big Data Connectivity
Intuitive drag and drop web UI for performing reporting and analysis against Hadoop, MongoDB, Cassandra, and many more.

Improved Ad Hoc Reporting and Analysis
Non-technical users can perform their own investigation.

Supercharged Analytic Performance
Enhanced push-down query processing and In-Memory Analysis engine improves response times for aggregation and summary queries.

Join us for an in-depth review and demo, showcasing the new features for self-service BI across any data source.

For more information on Jaspersoft 4.5, or any Jaspersoft solution, contact at sales@jaspersoft.com, or            415-348-2380      .

 

Download Jaspersoft 4.5 Today.

. Download your free 30 day evaluation trial now.

download now button

 

 

Augustus- a PMML model producer and consumer. Scoring engine.

A Bold GNU Head
Image via Wikipedia

I just checked out this new software for making PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/) , which is headed by Robert Grossman, who was the first proponent of using R on Amazon Ec2.

Probably someone like Zementis ( http://adapasupport.zementis.com/ ) can use this to further test , enhance or benchmark on the Ec2. They did have a joint webinar with Revolution Analytics recently.

https://code.google.com/p/augustus/

Recent News

  • Augustus v 0.4.3.1 has been released
  • Added a guide (pdf) for including Augustus in the Windows System Properties.
  • Updated the install documentation.
  • Augustus 2010.II (Summer) release is available. This is v 0.4.2.0. More information is here.
  • Added performance discussion concerning the optional cyclic garbage collection.

See Recent News for more details and all recent news.

Augustus

Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.

There is also a version for use with PMML 3 models. It is able to produce and consume models with 10,000s of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.

Augustus is written in Python and is freely available under the GNU General Public License, version 2.

See the page Which version is right for me for more details regarding the different versions.

PMML

Predictive Model Markup Language (PMML) is an XML mark up language to describe statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily inter-operate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).

Change Detection using Augustus

For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.

Open Data

Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.

It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.


Overview

The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.

Typical Use

Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models, Baseline (Change Detection) ModelsTree ModelsRegression Models and Naive Bayes Models. The typical model development and use cycle with Augustus is as follows:

  1. Identify suitable data with which to construct a new model.
  2. Provide a model schema which proscribes the requirements for the model.
  3. Run the Augustus producer to obtain a new model.
  4. Run the Augustus consumer on new data to effect scoring.

Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models and for Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specification of the configuration files and model schema are detailed below. The consumers provide for some configurability of the output but users will often provide additional post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data but user’s may need to provide their own preprocessing to accommodate their particular data source.

In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.

Post Processing

Augustus can accommodate a post-processing step. While not necessary, it is often useful to

  • Re-normalize the scoring results or performing an additional transformation.
  • Supplements the results with global meta-data such as timestamps.
  • Formatting of the results.
  • Select certain interesting values from the results.
  • Restructure the data for use with other applications.

Complex Event Processing- SASE Language

Logo of the anti-RFID campaign by German priva...
Image via Wikipedia

Complex Event Processing (CEP- not to be confused by Circular Probability Error) is defined processing many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.

Software supporting CEP are-

Oracle http://www.oracle.com/us/technologies/soa/service-oriented-architecture-066455.html

Oracle CEP is a Java application server for the development and deployment of high-performance event driven applications. It can detect patterns in the flow of events and message payloads, often based on filtering, correlation, and aggregation across event sources, and includes industry leading temporal and ordering capabilities. It supports ultra-high throughput (1 million/sec++) and microsecond latency.

Tibco is also trying to get into this market (it claims to have a 40 % market share in the public CEP market 😉 though probably they have not measured the DoE and DoD as worthy of market share yet

– see webcast by TIBCO ‘s head here http://www.tibco.com/products/business-optimization/complex-event-processing/default.jsp

and product info here-http://www.tibco.com/products/business-optimization/complex-event-processing/businessevents/default.jsp

TIBCO is the undisputed leader in complex event processing (CEP) software with over 40 percent market share, according to a recent IDC Study.

A good explanation of how social media itself can be used as an analogy for CEP is given in this SAS Global Paper

http://support.sas.com/resources/papers/proceedings10/040-2010.pdf

You can see a report on Predictive Analytics and Data Mining  in q1 2010 also from SAS’s website  at –http://www.sas.com/news/analysts/forresterwave-predictive-analytics-dm-104388-0210.pdf

A very good explanation on architecture involved is given by SAS CTO Keith Collins here on SAS’s Knowledge Exchange site,

http://www.sas.com/knowledge-exchange/risk/four-ways-divide-conquer.html

What it is: Methods 1 through 3 look at historical data and traditional architectures with information stored in the warehouse. In this environment, it often takes months of data cleansing and preparation to get the data ready to analyze. Now, what if you want to make a decision or determine the effect of an action in real time, as a sale is made, for instance, or at a specific step in the manufacturing process. With streaming data architectures, you can look at data in the present and make immediate decisions. The larger flood of data coming from smart phones, online transactions and smart-grid houses will continue to increase the amount of data that you might want to analyze but not keep. Real-time streaming, complex event processing (CEP) and analytics will all come together here to let you decide on the fly which data is worth keeping and which data to analyze in real time and then discard.

When you use it: Radio-frequency identification (RFID) offers a good user case for this type of architecture. RFID tags provide a lot of information, but unless the state of the item changes, you don’t need to keep warehousing the data about that object every day. You only keep data when it moves through the door and out of the warehouse.

The same concept applies to a customer who does the same thing over and over. You don’t need to keep storing data for analysis on a regular pattern, but if they change that pattern, you might want to start paying attention.

Figure  4: Traditional architecture vs. streaming architecture

Figure 4: Traditional architecture vs. streaming architecture

 

In academia  here is something called SASE Language

  • A rich declarative event language
  • Formal semantics of the event language
  • Theorectical underpinnings of CEP
  • An efficient automata-based implementation

http://sase.cs.umass.edu/

and

http://avid.cs.umass.edu/sase/index.php?page=navleft_1col

Financial Services

The query below retrieves the total trading volume of Google stocks in the 4 hour period after some bad news occurred.

PATTERN SEQ(News a, Stock+ b[ ])WHERE   [symbol]    AND	a.type = 'bad'    AND	b[i].symbol = 'GOOG' WITHIN  4 hoursHAVING  b[b.LEN].volume < 80%*b[1].volumeRETURN  sum(b[ ].volume)

The next query reports a one-hour period in which the price of a stock increased from 10 to 20 and its trading volume stayed relatively stable.

PATTERN	SEQ(Stock+ a[])WHERE 	 [symbol]   AND	  a[1].price = 10   AND	  a[i].price > a[i-1].price   AND	  a[a.LEN].price = 20            WITHIN  1 hourHAVING	avg(a[].volume) ≥ a[1].volumeRETURN	a[1].symbol, a[].price

The third query detects a more complex trend: in an hour, the volume of a stock started high, but after a period of price increasing or staying relatively stable, the volume plummeted.

PATTERN SEQ(Stock+ a[], Stock b)WHERE 	 [symbol]   AND	  a[1].volume > 1000   AND	  a[i].price > avg(a[…i-1].price))   AND	  b.volume < 80% * a[a.LEN].volume           WITHIN  1 hourRETURN	a[1].symbol, a[].(price,volume), b.(price,volume)

(note from Ajay-

 

I was not really happy about the depth of resources on CEP available online- there seem to be missing bits and pieces in both open source, academic and corporate information- one reason for this is the obvious military dual use of this technology- like feeds from Satellite, Audio Scans, etc)

Creating a Blog Aggregator for free

I discovered an increasing trend of Blog Aggregators ( Blog Lists have been around for a long time). Several sites come in this category- http://bigdatanews.com/ (which is a GreenPlum /AsterData sponsored site on Big Data) , http://mapreduce.org/ (which is a site on MapReduce that has an inbuilt blog aggregator but is more of a community website – I will explain the difference below) , http://www.r-bloggers.com/ (which is an excellent aggregation of 69 R Bloggers sites that is built by Tal and is currently not sponsored/independent) and http://smartdatacollective.com/ which is sponsored by Teradata and uses http://wordframe.com which is a paid software) and some others like http://www.biblogs.com/ (which is Adsense supported 🙂 ) and http://javablogs.com/Welcome.jspa (an aggregator on Java) (or even http://feministblogs.org/).

CMS Based Blog Aggregator

CMS Based Blog Aggregator/ Community Site

CMS Based Community Site with an inbuilt Blog Aggregator feed

I am noting blog aggregator as a distinct website that pulls in automated content from RSS feeds , may or maynot be moderated, and usually revolves around a certain domain or topic. It is slightly different from community websites which have Lists of Blogs as part of many other features, and boutique collection of blogs like http://www.b-eye-network.com/blogs/index.php

and Intelligent Enterprise ( http://intelligent-enterprise.informationweek.com/blog/index.jhtml) as they have selected authors and have more than Blogs as their featured content including News etc. Since community is a buzz word- many websites claim to be community websites while retaining the look and feel of a CMS- Blog Aggregator.

WordPress Enabled Blog Aggregator

Anyways, if you have a WordPress Installation- you can create a Blog Aggregator for free. Basically there is a wordpress-plugin called FeedWordPress http://wordpress.org/extend/plugins/feedwordpress/

Doing so you can simply addin as many RSS feeds as you like –

(see a screenshot below).

Of course – you can use Twitterfeed to create a Twitter Aggregator/ Fire Hose that simply pulls in Post Titles, and can link them using Facebook-LinkedIn-Twitter apps to your RSS feed of the aggregated website. 🙂

Building a website /content aggregator is just a few clicks away and free for anyone with a website and some passion for a topic. It is really free and painless 🙂

Automated Content Aggregation

Here is a way to create Content Aggregator-

1) Create a Google Alert for keywords (like say MapReduce and Hadoop)

at http://google.com/alerts

2) Create a WordPress.com free blog and enable the Auto Post by Email Feature (using manage my blogs in Settings- Reading in Left Bottom Margin). This creates an email with random numbers that can enable you for sending emails and thus auto posting.

3) Create an Email Filter in your Gmail Account to auto forward this to the post by Email Account

Note you can thus create an auto aggregator and pick and choose for creating a Keyword specific or domain specific newsletter. It is all free and takes 15 minutes to set up, and is then automated.