I use the Simple Tags plugin in WordPress to automatically create and post tags. I hope this makes the site easier to navigate. Given that I had not been a very efficient tagger before, this plugin can be genuinely useful for creating tags across more than 100 (or 1,000) posts, especially for WordPress-based blog aggregators.
Announcing Jaspersoft 4.5: Powerful Analytics for All Your Data
This new release provides a single, easy-to-use environment designed with the non-technical user in mind, delivering insight into data stored in relational, OLAP, and Big Data environments.
New in Jaspersoft 4.5:
Broad and Deep Big Data Connectivity
Intuitive drag-and-drop web UI for performing reporting and analysis against Hadoop, MongoDB, Cassandra, and many more.
Improved Ad Hoc Reporting and Analysis
Non-technical users can perform their own investigations.
Supercharged Analytic Performance
Enhanced push-down query processing and an In-Memory Analysis engine improve response times for aggregation and summary queries.
I just checked out new software for building PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/), which is headed by Robert Grossman, an early proponent of using R on Amazon EC2.
Perhaps a company like Zementis (http://adapasupport.zementis.com/) could use this to further test, enhance, or benchmark it on EC2. They recently held a joint webinar with Revolution Analytics.
Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.
There is also a version for use with PMML 3 models. It can produce and consume models with tens of thousands of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree, and Naive Bayes models.
Augustus is written in Python and is freely available under the GNU General Public License, version 2.
Predictive Model Markup Language (PMML) is an XML markup language for describing statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily interoperate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).
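To make the producer/consumer idea concrete, here is a minimal sketch that parses a hand-written, heavily simplified PMML-style fragment with Python's standard library. The fragment and field names are illustrative assumptions, not output from any real PMML producer.

```python
import xml.etree.ElementTree as ET

# A hypothetical, heavily simplified PMML-style fragment (illustrative only;
# real PMML documents are exported by producer tools, not written by hand).
PMML_DOC = """
<PMML version="4.0">
  <DataDictionary>
    <DataField name="volume" optype="continuous" dataType="double"/>
    <DataField name="price" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <RegressionTable intercept="1.5">
      <NumericPredictor name="price" coefficient="0.8"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

def model_inputs(pmml_text):
    """Return the field names a PMML consumer would expect as inputs."""
    root = ET.fromstring(pmml_text)
    return [f.get("name") for f in root.iter("DataField")]

def score(pmml_text, row):
    """Evaluate the simple regression table: intercept + sum(coef * value)."""
    root = ET.fromstring(pmml_text)
    table = next(root.iter("RegressionTable"))
    result = float(table.get("intercept"))
    for pred in table.iter("NumericPredictor"):
        result += float(pred.get("coefficient")) * row[pred.get("name")]
    return result

print(model_inputs(PMML_DOC))          # ['volume', 'price']
print(score(PMML_DOC, {"price": 10}))  # 1.5 + 0.8*10 = 9.5
```

The point is interoperability: any consumer that understands the model elements can score data, regardless of which tool produced the file.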
Change Detection using Augustus
For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.
Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.
It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.
The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.
Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models: Baseline (Change Detection) models, Tree models, Regression models, and Naive Bayes models. The typical model development and use cycle with Augustus is as follows:
Identify suitable data with which to construct a new model.
Provide a model schema which prescribes the requirements for the model.
Run the Augustus producer to obtain a new model.
Run the Augustus consumer on new data to effect scoring.
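The produce-then-consume cycle above can be sketched in a few lines. Note this is NOT the actual Augustus API; it is only a toy illustration of the baseline change-detection idea, where the "producer" summarizes historical data and the "consumer" scores new observations against that summary.

```python
import math

def produce_baseline_model(training_data):
    """'Producer' step: summarize historical data into a model (mean, std)."""
    n = len(training_data)
    mean = sum(training_data) / n
    var = sum((x - mean) ** 2 for x in training_data) / n
    return {"mean": mean, "std": math.sqrt(var)}

def consume_and_score(model, new_data):
    """'Consumer' step: score new observations as deviations from baseline."""
    return [(x - model["mean"]) / model["std"] for x in new_data]

# Step 1-3: build a model from suitable historical data.
model = produce_baseline_model([10.0, 11.0, 9.0, 10.0])
# Step 4: score new data; a large score flags a change from the baseline.
scores = consume_and_score(model, [10.0, 14.0])
```

In real use the model would be serialized to PMML between the producer and consumer steps rather than passed as an in-memory dictionary.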
Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models, and Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specification of the configuration files and model schema are detailed below. The consumers provide some configurability of the output, but users will often add post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data, but users may need to provide their own preprocessing to accommodate their particular data source.
In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.
Augustus can accommodate a post-processing step. While not necessary, it is often useful to:
Re-normalize the scoring results or perform an additional transformation.
Supplement the results with global metadata such as timestamps.
Format the results.
Select certain interesting values from the results.
Restructure the data for use with other applications.
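A few of the post-processing tasks above can be sketched together. The field names (`segment`, `score`, `count`) are assumptions for illustration, not the actual Augustus output schema.

```python
import json
import time

# Hypothetical scoring output; field names are illustrative only.
raw_results = [
    {"segment": "GOOG", "score": 5.66, "count": 120},
    {"segment": "IBM", "score": 0.12, "count": 98},
]

def post_process(results, threshold=1.0):
    """Select interesting scores, attach global metadata, restructure output."""
    interesting = [r for r in results if abs(r["score"]) > threshold]
    return {
        "generated_at": int(time.time()),  # global metadata (timestamp)
        "alerts": interesting,             # only the interesting values
    }

report = post_process(raw_results)
print(json.dumps(report, indent=2))  # formatted for use by other applications
```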
Complex Event Processing (CEP, not to be confused with Circular Error Probable) is defined as processing the many events happening across all the layers of an organization, identifying the most meaningful events within the event cloud, analyzing their impact, and taking subsequent action in real time.
Oracle CEP is a Java application server for the development and deployment of high-performance event-driven applications. It can detect patterns in the flow of events and message payloads, often based on filtering, correlation, and aggregation across event sources, and includes industry-leading temporal and ordering capabilities. It supports ultra-high throughput (more than one million events per second) and microsecond latency.
Tibco is also trying to get into this market (it claims a 40% share of the public CEP market 😉 ), though it probably has not yet counted the DoE and DoD in that market-share estimate.
What it is: Methods 1 through 3 look at historical data and traditional architectures with information stored in the warehouse. In this environment, it often takes months of data cleansing and preparation to get the data ready to analyze. Now, what if you want to make a decision or determine the effect of an action in real time, as a sale is made, for instance, or at a specific step in the manufacturing process? With streaming data architectures, you can look at data in the present and make immediate decisions. The growing flood of data from smart phones, online transactions, and smart-grid houses will continue to increase the amount of data that you might want to analyze but not keep. Real-time streaming, complex event processing (CEP), and analytics will all come together here to let you decide on the fly which data is worth keeping and which data to analyze in real time and then discard.
When you use it: Radio-frequency identification (RFID) offers a good use case for this type of architecture. RFID tags provide a lot of information, but unless the state of the item changes, you don’t need to keep warehousing the data about that object every day. You only keep data when it moves through the door and out of the warehouse.
The same concept applies to a customer who does the same thing over and over. You don’t need to keep storing data for analysis on a regular pattern, but if they change that pattern, you might want to start paying attention.
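The "keep data only when the state changes" idea from the RFID example can be sketched as a simple streaming filter. The event format here, `(tag_id, location)` pairs, is a hypothetical simplification.

```python
def state_changes(events):
    """Yield only the events where a tag's location actually changed."""
    last_seen = {}
    for tag_id, location in events:
        if last_seen.get(tag_id) != location:
            last_seen[tag_id] = location
            yield (tag_id, location)  # keep: the state changed
        # otherwise discard: same state, no need to warehouse it again

stream = [("t1", "dock"), ("t1", "dock"), ("t1", "door"),
          ("t2", "shelf"), ("t2", "shelf")]
kept = list(state_changes(stream))
# kept == [('t1', 'dock'), ('t1', 'door'), ('t2', 'shelf')]
```

Duplicate readings are dropped on the fly, so only the state transitions reach storage.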
Figure 4: Traditional architecture vs. streaming architecture
In academia, there is something called the SASE language.
The query below retrieves the total trading volume of Google stock in the 4-hour period after some bad news occurred.
PATTERN SEQ(News a, Stock+ b[ ])
WHERE [symbol] AND a.type = 'bad' AND b[i].symbol = 'GOOG'
WITHIN 4 hours
HAVING b[b.LEN].volume < 80% * b.volume
RETURN sum(b[ ].volume)
The next query reports a one-hour period in which the price of a stock increased from 10 to 20 and its trading volume stayed relatively stable.
PATTERN SEQ(Stock+ a)
WHERE [symbol] AND a.price = 10 AND a[i].price > a[i-1].price AND a[a.LEN].price = 20
WITHIN 1 hour
HAVING avg(a.volume) ≥ a.volume
RETURN a.symbol, a.price
The third query detects a more complex trend: in an hour, the volume of a stock started high, but after a period of price increasing or staying relatively stable, the volume plummeted.
PATTERN SEQ(Stock+ a, Stock b)
WHERE [symbol] AND a.volume > 1000 AND a[i].price > avg(a[…i-1].price) AND b.volume < 80% * a[a.LEN].volume
WITHIN 1 hour
RETURN a.symbol, a.(price,volume), b.(price,volume)
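Lacking a SASE engine, the first query above can be roughly approximated imperatively. This sketch sums GOOG trading volume within four hours of a 'bad' news event; the event dictionaries are an assumed format, and the HAVING clause is omitted for simplicity.

```python
FOUR_HOURS = 4 * 3600

def volume_after_bad_news(events):
    """Total GOOG volume within 4 hours after the most recent bad-news event."""
    bad_news_time = None
    total = 0
    for e in sorted(events, key=lambda e: e["time"]):
        if e["kind"] == "news" and e["type"] == "bad":
            bad_news_time = e["time"]  # pattern start: a bad news event
        elif (e["kind"] == "stock" and e["symbol"] == "GOOG"
              and bad_news_time is not None
              and e["time"] <= bad_news_time + FOUR_HOURS):
            total += e["volume"]       # within the 4-hour window
    return total

events = [
    {"kind": "news", "type": "bad", "time": 0},
    {"kind": "stock", "symbol": "GOOG", "volume": 500, "time": 600},
    {"kind": "stock", "symbol": "GOOG", "volume": 300, "time": 5 * 3600},  # outside window
    {"kind": "stock", "symbol": "MSFT", "volume": 900, "time": 700},       # wrong symbol
]
```

A real CEP engine would evaluate such patterns incrementally over the stream rather than over a stored list.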
(Note from Ajay: I was not really happy with the depth of resources on CEP available online. There seem to be missing bits and pieces across open source, academic, and corporate information. One reason for this is the obvious military dual use of this technology, such as feeds from satellites, audio scans, etc.)
CMS Based Community Site with an inbuilt Blog Aggregator feed
I define a blog aggregator as a distinct website that pulls in automated content from RSS feeds, may or may not be moderated, and usually revolves around a certain domain or topic. It is slightly different from community websites, which have lists of blogs as one of many features, and from boutique collections of blogs like http://www.b-eye-network.com/blogs/index.php and Intelligent Enterprise (http://intelligent-enterprise.informationweek.com/blog/index.jhtml), as those have selected authors and feature more than blogs, including news. Since community is a buzzword, many websites claim to be community websites while retaining the look and feel of a CMS-based blog aggregator.
Doing so, you can simply add in as many RSS feeds as you like (see the screenshot below).
Of course, you can use Twitterfeed to create a Twitter aggregator/fire hose that simply pulls in post titles, and you can link them via Facebook, LinkedIn, and Twitter apps to the RSS feed of the aggregated website. 🙂
Building a website/content aggregator is just a few clicks away for anyone with a website and some passion for a topic. It is really free and painless. 🙂
2) Create a free WordPress.com blog and enable the Post by Email feature (via My Blogs under Settings, then Reading, in the left-bottom margin). This creates an email address containing random numbers that lets you auto-post by sending email to it.
3) Create an email filter in your Gmail account to auto-forward messages to the post-by-email address.
Note that you can thus create an auto-aggregator and pick and choose items to build a keyword-specific or domain-specific newsletter. It is all free, takes 15 minutes to set up, and is then automated.
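The aggregation idea underlying all of this, pulling post titles out of an RSS feed and filtering by keyword, can be sketched with Python's standard library. The feed content here is an inline example, not any real blog's feed.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed used purely for illustration.
RSS = """
<rss version="2.0"><channel>
  <title>Example Aggregator</title>
  <item><title>Post on R</title><link>http://example.com/r</link></item>
  <item><title>Post on PMML</title><link>http://example.com/pmml</link></item>
</channel></rss>
"""

def digest(rss_text, keyword=None):
    """Collect item titles, optionally filtered by keyword, for a newsletter."""
    root = ET.fromstring(rss_text)
    titles = [item.findtext("title") for item in root.iter("item")]
    if keyword:
        titles = [t for t in titles if keyword.lower() in t.lower()]
    return titles

print(digest(RSS))          # ['Post on R', 'Post on PMML']
print(digest(RSS, "pmml"))  # ['Post on PMML']
```

For a live feed you would fetch the XML over HTTP on a schedule before passing it to the same parsing step.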