DECISION STATS

Experimental Ad AuDio-Video

As an experiment, I will be putting random images and YouTube songs in the next 7 posts this week. These will be viewable only on my site and not in the RSS feed (now restored to full rather than summary).

Let me know if the server hangs (sigh!), if you find them distracting, or if you know a better song.

So what happened to S-Plus?

S-Plus, the commercial version of S (the predecessor of R), is still being marketed by TIBCO Corporation, and is again rumoured to be an acquisition target of (???):

SAS (who have desired R-like capabilities, especially in their IML product, to be released soon)

SAP (who lost out to IBM in the SPSS acquisition)

Oracle

Microsoft

Rogue Wave (acquirer of Visual Numerics)

etc.

Anyway, S-Plus is still alive and kicking:

“The S language and the S+ application have been critical to our ability to manage big data objects intrinsic to wind analytics and wind energy development,” said Brad Horn, Director of Wind Analytics at NextEra Energy. “We credit our long-term interface and Spotfire consulting with unlocking new ideas and sources of value. Joint dialogue on configuration alternatives and our recent efforts to restructure legacy code is allowing us to transition from simple interactive use of S+ to a customized S+ configuration with integrated batch processing, server load balancing, and parallel processing. S+ has a central role in supporting internal decisions and our group emphasis on scale, speed, and quality.”

http://spotfire.tibco.com/news/press-releases/2009/2_17_2009.aspx

Wavelets, Spatial Stats, EnvironmentalStats: Apply statistics for advanced analysis of signal and image data, spatially correlated data, and environmental data.

Resampling: Apply resampling techniques, such as bootstrap and permutation tests, to enable the use of standard statistics on smaller data sets.

Association Rules: Uncover relationships between variables in large data sets, most commonly to detect purchase patterns (Market Basket Analysis), or in many other areas like web site usage analysis.

Recode Values: Easily handle and prepare data from multiple sources by changing the values in a column to a new value.
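The Resampling item above names bootstrap and permutation tests. As a rough illustration of what those techniques do on a small data set, here is a minimal sketch in plain Python. It is not S+ code and does not use any S+ or Spotfire API; the function names `bootstrap_ci` and `permutation_p_value`, the default of 2000 resamples, and the percentile method for the interval are all assumptions chosen for the example.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.mean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample).

    Hypothetical helper for illustration only (not an S+ function).
    """
    rng = random.Random(seed)
    n = len(sample)
    # Recompute the statistic on many resamples drawn with replacement.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration only (not an S+ function).
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels by shuffling the pooled values.
        rng.shuffle(pooled)
        diff = abs(
            statistics.mean(pooled[:len(a)]) - statistics.mean(pooled[len(a):])
        )
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)
```

Both helpers take ordinary Python lists of numbers; for example, `bootstrap_ci([4.1, 5.0, 3.8, 4.4])` returns a (lower, upper) pair for the mean, and `permutation_p_value(group_a, group_b)` returns an estimated p-value. The fixed seed only makes the sketch repeatable.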

Deployment and Integration:

Spotfire Integration: Read and write Spotfire Text Data files, and leverage examples of using Spotfire Professional to visualize, explore and share model results.

Custom Java & C++ nodes: Extend Spotfire Miner by writing custom nodes in Java and C++.

Remote Script Execution: Execute S+ scripts remotely on S+ Server to offload and distribute intensive jobs.

Global Worksheet Parameters: Make workflows more flexible and reusable to interactive and batch applications.

FlexBayes: Create more realistic models, provide a natural way to address missing data, and take advantage of prior analysis.

Data Access and Preparation:

New Data File Types: Unlock more data sources by reading new formats including Spotfire Text Data, Microsoft Excel 2007, Microsoft Access 2007, and Matlab 7.

JDBC Access: Access new data sources for analysis with data import and export via the sjdbc library in Spotfire S+ 8.1.

Citation:

http://spotfire.tibco.com/Products/S-Plus-Overview.aspx

http://spotfire.tibco.com/Products/Whatsnew-Splus.aspx

Webfocus RStat: Pervasive BI using R

Here is a great reporting and BI tool from Information Builders that uses the Rattle R GUI (covered earlier here: http://www.decisionstats.com/2009/01/13/interview-dr-graham-williams/).

So if you are looking for a next-generation reporting solution, here is one called WebFOCUS RStat.

Citation:

http://www.informationbuilders.com/products/webfocus/predictivemodeling.html

Predict the Future and Make Effective Decisions Today

Traditional reporting solutions provide a clear picture of past occurrences, but have little power to shed light on the future. The ability to anticipate and prepare for upcoming events can greatly impact the decisions that need to be made today.

WebFOCUS RStat is the market’s first fully-integrated business intelligence and data mining environment, seamlessly bridging the gap between backward and forward-facing views of business operations. With WebFOCUS RStat, companies can easily and cost-effectively deploy predictive models as intuitive scoring applications. So business users at all levels can make decisions based on accurate, validated future predictions, instead of relying on gut instinct alone.

WebFOCUS RStat provides a single platform for BI, data modeling, and scoring. This eliminates the need to purchase and maintain multiple tools, and frees analysts and other statisticians from spending countless hours extracting and querying data. At the same time, it reduces costs, simplifies maintenance, and optimizes IT resources.

But, the greatest benefit WebFOCUS RStat offers is significantly increased accuracy. With the R engine – a powerful and flexible open source statistical programming language – as its underlying analysis tool, WebFOCUS RStat can deliver results that are consistent, complete, and correct – every time.

WebFOCUS RStat provides:

A single tool, fully integrated with Developer Studio and WebFOCUS Reporting Servers with access to over 300 data sources, for both BI developers and data miners

Comprehensive data exploration, descriptive statistics, and interactive graphs

In-depth data visualization and transformation

Hypothesis testing, clustering, and correlation analysis

Other key WebFOCUS RStat features include:

The ability to build and export models for prediction and classification

Comprehensive model evaluation

Rapid application creation through easy incorporation of scoring routines into WebFOCUS reports

Incidentally, the parent company, which is based in New York, has some interesting numbers:

http://www.informationbuilders.com/about_us/index.html

Company At A Glance

$300 million in revenue

Over 30 years of experience

More than 1,400 employees

Over 12,000 customers

Over 350 business partners

47 offices and 26 worldwide distributors

See Also-

http://www.informationbuilders.com/cgi-shell/press/intpr/f_intpr.pl?intpr_code=06_03_08_rstat

http://rattle.togaware.com/

Poem: The Fine Print

did you read the fine print
when you signed your life away
or did you believe them badly
when they said your life was good to give today

did all the drums, the ribbons and the music
tilt your head to emotion away from fact
and did the inherent absurdity of it all
was swallowed by you intact

for as the world spins tilted
around the bright unforgiving sun
words in a language built to deceive
mask the coming pain below the frosting of fun

deception is the game here
and an unwilling player you have to be
fool them or be fooled in turn
reality is spotless for you to see

what old promises were tokens of love
it is all cash and carry now
as willed in your destiny from above

and even though eyes grow misty
by potential of what could be
you keep one eye on the rolling ball
lest more surprises it brings to see

Interview Gregory Piatetsky KDNuggets.com

Here is an interview with Gregory Piatetsky, founder and editor of KDnuggets (www.KDnuggets.com), one of the oldest and biggest independent industry websites for data mining and analytics.


Ajay- Please describe your career in science, and the many challenges and rewards that came with it. Name any scientific research, degrees, teaching, etc.

Gregory- I was born in Moscow, Russia and went to a top math high school in Moscow. A unique challenge for me was that my father was one of the leading mathematicians in the Soviet Union. While I liked math (and still do), I quickly realized while still in high school that I would never be as good as my father, and that a math career was not for me.

Fortunately, I discovered computers and really liked the process of programming and solving applied problems. At that time (late 1970s) computers were not very popular and it was not clear that one could make a career in computers. However, I was very lucky that I was able to pursue what I liked and find demand for my skills.

I got my MS in 1979 and PhD in 1984 in Computer Science from New York University.
I was interested in AI (perhaps thanks to a lot of science fiction I read as a kid), but found a job in databases, so I was looking for ways to combine them.

In 1984 I joined GTE Labs, where I worked on research in databases and AI, and in 1989 started the first project on Knowledge Discovery in data. To help convince my management that there would be a demand for this thing called “data mining” (GTE management did not see much future for it), I also organized an AAAI workshop on the topic.

I thought “data mining” was not a sexy enough name, and so I called it “Knowledge Discovery in Data”, or KDD. Since 1989, I have been working on KDD and data mining in all aspects; more on my page www.kdnuggets.com/gps.html

Ajay- How would you encourage a young science entrepreneur in this recession?

Gregory- Many great companies were started or grew in a recession, e.g.
http://www.insidecrm.com/features/businesses-started-slump-111108/

A recession may be compared to a brush fire, which removes dead wood and allows new trees to grow.

Ajay- What prompted you to set up KDnuggets? Any reasons for the name (kNowledge Discovery Nuggets)? Describe some key milestones in this iconic website for data mining people.

Gregory- After the third KDD workshop in 1993, I started a newsletter to connect about 50 people who attended the workshop and possibly others who were interested in data mining and KDD. The idea was that it would have short items or “nuggets” of information. Also, at that time a popular metaphor for data miners was gold miners who were looking for gold “nuggets”. So, I wanted a newsletter with “nuggets”: short, valuable items about Knowledge Discovery. Thus, the name KDnuggets.

In 1994 I created a website on data mining at GTE and in 1997, after I left GTE, I moved it to the current domain name www.kdnuggets.com.

In 1999, I was working for a startup which provided data mining services to the financial industry. However, because of Y2K issues, all banks etc. froze their systems in the second half of 1999, and we had very little work (and our salaries were reduced as well). I decided that I would try to get some ads and was able to get companies like SPSS and Megaputer to advertise.

Since 2001, I have been an independent consultant and KDnuggets is only part of what I am doing. I also do data mining consulting, and actively participate in SIGKDD (Director 1998-2005, Chair 2005-2009).

Some people think that KDnuggets is a large company, with a publisher, webmaster, editor, ad salesperson, billing dept, etc. KDnuggets indeed has all these functions, but it is all me and my two cats.

Ajay- I am impressed by the fact that KDnuggets is almost a dictionary or encyclopedia for data mining. But apart from advertising you have not been totally commercial: many features of your newsletter remain ad-free, you still maintain a minimalistic look, and you do not take sponsorship aligned with one big vendor. What is your vision for KDnuggets for the years to come, to keep it truly independent?

Gregory- My vision for KDnuggets is to be a comprehensive resource for the data mining community, and I really enjoyed maintaining such a resource for the first 7-8 years completely non-commercially. However, when I became self-employed, I could not do KDnuggets without any income, so I selectively introduced ads, and only those which are relevant to data mining.

I like to think of KDnuggets as a Craigslist for the data mining community.

I certainly realize the importance of social media and Web 2.0 (interested people can follow my tweets at twitter.com/kdnuggets) and plan to add more social features to KDnuggets.

Still, just as Wikipedia and Facebook do not make the New York Times obsolete, I think there is room and need for an edited website, especially for such a nerdy and not very social group as data miners.

Ajay- What is the worst mistake or error in writing and publishing that you made? What is the biggest triumph or high moment in KDnuggets' history?

Gregory- My biggest mistake was probably in choosing the name kdnuggets – in retrospect, I could have used a shorter and easier-to-spell domain name, but in 1997 I never expected that I would still be publishing www.KDnuggets.com 12 years later.

Ajay- Who are your favourite data mining students (having known so many people)? What qualities do you think set a data mining person apart from those in other sciences?

Gregory- I was only an adjunct professor for a short time, so I did not really have data mining students, but I was privileged enough to know many current data mining leaders when they were students. Among more recent students, I am very impressed with Jure Leskovec, who just finished his PhD and got the best KDD dissertation award.

Ajay- What does Gregory Piatetsky do for fun when he is not informing the world on analytics and knowledge discovery?

Gregory- I enjoy travelling with my family, and in the summer I like biking and windsurfing. I also read a lot, and am currently in the middle of reading Proust (which I periodically dilute with other, lighter books).

Ajay- What are your favourite blogs and websites to read? Any plans to visit India?

Gregory- I visit many blogs listed on www.kdnuggets.com/websites/blogs.html

and I especially like
– Matthew Hurst blog: Data Mining: Text Mining, Visualization, and Social Media
– Occam’s Razor by Avinash Kaushik, examining web analytics.
– Juice Analytics, blogging about analytics and visualization
– Geeking with Greg, exploring the future of personalized information.

I also like your website decisionstats.com and plan to visit it more frequently.

I have visited many countries, but not yet India – waiting for the right occasion!

Biography–

(http://www.kdnuggets.com/gps.html)

Gregory Piatetsky-Shapiro, Ph.D. is the President of KDnuggets, which provides research and consulting services in the areas of data mining, web mining, and business analytics. Gregory is considered to be one of the founders of the data mining and knowledge discovery field. Gregory edited or co-edited many collections on data mining and knowledge discovery, including two best-selling books: Knowledge Discovery in Databases (AAAI/MIT Press, 1991) and Advances in Knowledge Discovery in Databases (AAAI/MIT Press, 1996), and has over 60 publications in the areas of data mining, artificial intelligence and database research.

Gregory is the founder of the Knowledge Discovery in Databases (KDD) conference series. He organized and chaired the first three Knowledge Discovery in Databases (KDD) workshops in 1989, 1991, and 1993. He then served as the Chair of the KDD Steering Committee and guided the conversion of the KDD workshops into leading international conferences on data mining. He also was the General Chair of the KDD-98 conference.

Interview Tasso Argyros CTO Aster Data Systems

Here is an interview with Tasso Argyros, the CTO and co-founder of Aster Data Systems (www.asterdata.com). Aster Data Systems makes one of the first DBMSs to tightly integrate SQL with MapReduce.


Ajay- Maths and science students the world over are facing a major decline. What would you recommend to young students to get careers in science?

[TA] –My father is a professor of Mathematics and I spent a lot of my college time studying advanced math. What I would say to new students is that Math is not a way to get a job, it’s a way to learn how to think. As such, a Math education can lead to success in any discipline that requires intellectual abilities. As long as they take the time to specialize at some point – via postgraduate education or a job where they can learn a new discipline from smart people – they won’t regret the investment.

Ajay- Describe your career in science, particularly your time at Stanford. What made you think of starting up Aster Data? How important is it for a team rather than an individual to begin a startup? Could you describe the startup moment when your team came together?

[TA]- While at Stanford I became very familiar with the world of startups through my advisor, David Cheriton (who was an angel investor in VMware and Google, and the founder of two successful companies). My research was about processing large amounts of data on large, low-cost computer farms. A year into my research it became obvious that this approach had huge processing power advantages and it was superior to anything else I could see in the marketplace. I then happened to meet my other two co-founders, Mayank Bawa and George Candea, who were looking at a similar technical problem from the database and reliability perspective, respectively.

I distinctly remember George walking into my office one day (I barely knew him back then) and saying “I want to talk to you about startups and the future” – the rest has become history.

Ajay- How would you describe your product Aster nCluster Cloud Edition to somebody who does not know anything beyond traditional server and data warehouse technologies? Could you rate it against some known vendors, and say at what level of usage the Total Cost of Ownership of Aster Data becomes cheaper than, say, an Oracle, SAP or Microsoft data warehousing solution?

[TA]- Aster allows businesses to reduce the data analytics TCO in two interesting ways. First, it has a much lower hardware cost than any traditional DW technology because of its use of commodity servers or cloud infrastructure like Amazon EC2. Secondly, Aster has implemented a lot of innovations that simplify the (previously tedious and expensive) management of the system, which includes scaling the system elastically up/down as needed – so they are not paying for capacity they don’t need at a given point in time.

But cutting costs is one side of the equation; what makes me even more excited is the ability to make a business more profitable, competitive and efficient through analyzing more data at greater depth. We have customers that have cut their costs and increased their customers and revenue by using Aster to analyze their valuable (and usually underutilized) data. If you have data – and you think you’re not taking full advantage of it – Aster can help.

Ajay- I have always had this one favourite question: when can I analyze 100 gigabytes of data using just a browser and some statistical software like R, or the advanced forecasting software that is available? Describe some of Aster Data’s work in enhancing the analytical capabilities of big data.

Can I run R (free, open source) on an on-demand basis with an Aster Data solution? How much would it cost me to crunch 100 GB of data and make segmentations and models with, say, 50 hours of processing time per month?

[TA]- One of Aster's big innovations is to allow analytical applications like R to be embedded in the database via our SQL/MapReduce framework. We actually have customers right now that are using R to do advanced analytics over terabytes of data. 100GB is actually on the lower end of what our software can enable and as such the cost would not be significant.

Ajay- What do people at Aster Data do when not making complex software?

[TA]- A lot of Asterites love to travel around the world – we are, after all, a very diverse company. We also love coffee, Indian food as well as international and US sports like soccer, cricket, cycling, and football!

Ajay- Name some competing products to Aster Data, and where Aster Data products are more suitable from a TCO viewpoint. Name specific areas where you would not recommend your own products.

[TA]- We go against products like Oracle Database, Teradata and IBM DB2. If you need to do analytics over 100s of GBs or terabytes of data, our price/performance ratio would be orders of magnitude better.

Ajay- How do you convince a well-known and experienced VC like Sequoia Capital to invest in a start-up? (E.g., I could do with some financing for server costs.)

[TA]- You need to convince Sequoia of three things. (a) that the market you’re going after is very large (in the billions of dollars, if you’re successful). (b) that your team is the best set of people that could ever come together to solve the particular problem you’re trying to solve. And (c) that the technology you’ve developed gives you an “unfair advantage” over incumbents or new market entrants. Most importantly, you have to smile a lot! :)

Biography

About Tasso:

Tasso (Tassos) Argyros is the CTO and co-founder of Aster Data Systems, where he is responsible for all product and engineering operations of the company. Tasso was recently recognized as one of BusinessWeek’s Best Young Tech Entrepreneurs for 2009 and was an SAP fellow at the Stanford Computer Science department. Prior to Aster, Tasso was pursuing a Ph.D. in the Stanford Distributed Systems Group with a focus on designing cluster architectures for fast, parallel data processing using large farms of commodity servers. He holds an MSc in Computer Science from Stanford University and a Diploma in Computer and Electrical Engineering from the Technical University of Athens.

About Aster:

Aster Data Systems is a proven leader in high-performance database systems for data warehousing and analytics – the first DBMS to tightly integrate SQL with MapReduce – providing deep insights on data analyzed on clusters of low-cost commodity hardware.

The Aster nCluster database cost-effectively powers frontline analytic applications for companies such as MySpace, aCerno (an Akamai company), and ShareThis. Running on low-cost off-the-shelf hardware, and providing ‘hands-free’ administration, Aster enables enterprises to meet their data warehousing needs within their budget.

Aster is headquartered in San Carlos, California and is backed by Sequoia Capital, JAFCO Ventures, IVP, Cambrian Ventures, and First-Round Capital, as well as industry visionaries including David Cheriton, Rajeev Motwani and Ron Conway.


