If you have a Dell PC, well just use PeerNet to configure and set up your own network around the neighbourhood. This is particularly applicable if you are in country that is both repressive and not so technologically advanced. Wont work in China or USA.
A peer network is a network in which one computer can connect directly to another computer. This capability is accomplished by enabling access point (AP) functionality on one of the computers. Other computers can then connect to this computer in the same way that they would connect to a physical AP. If Internet Connection Sharing is enabled on the computer that has the AP functionality, computers that connect to that computer have Internet connectivity as well.
A basic peer network, which requires no networking knowledge or experience to set up, should meet the needs of most home users and small businesses. By default, a basic peer network is configured with the strongest available security (see How do I set up a basic peer network?).
For users who are familiar with wireless networking technology, advanced configuration features are available to do the following:
IRC chat is less secure than you think it is thanks to BOT Trawlers- so I am hoping someone in the open source community updates Waste Again for encrypted chats http://wasteagain.sourceforge.net/
What is “WASTE again”?
“WASTE again” enables you to create a decentralized and secure private mesh network using an unsecure network, such as the internet. Once the public encryption keys are exchanged, sending messages, creating groupchats and transferring files is easy and secure.
Creating a mesh
To create a mesh you need at least two computers with “WASTE again” installed. During installation, a unique pair of public and private keys for each computer is being generated. Before the first connection can be established, you need to exchange these public keys. These keys enable “WASTE again” to authenticate every connection to other “WASTE again” clients.
After exchanging the keys, you simply type in the computers IP address to connect to. If that computer is located behind a firewall or a NAT-router, you have to create a portmap first to enable incoming connections.
At least one computer in your mesh has to be able to accept incoming connections, making it a “public node”. If no direct connection between two firewalled computers can be made, “WASTE again” automatically routes your traffic through one or more of the available public nodes.
Every new node simply has to exchange keys with one of the connected nodes and then connect to it. All the other nodes will exchange their keys automatically over the mesh.
I just checked out this new software for making PMML models. It is called Augustus and is created by the Open Data Group (http://opendatagroup.com/) , which is headed by Robert Grossman, who was the first proponent of using R on Amazon Ec2.
Probably someone like Zementis ( http://adapasupport.zementis.com/ ) can use this to further test , enhance or benchmark on the Ec2. They did have a joint webinar with Revolution Analytics recently.
Augustus is a PMML 4-compliant scoring engine that works with segmented models. Augustus is designed for use with statistical and data mining models. The new release provides Baseline, Tree and Naive-Bayes producers and consumers.
There is also a version for use with PMML 3 models. It is able to produce and consume models with 10,000s of segments and conforms to a PMML draft RFC for segmented models and ensembles of models. It supports Baseline, Regression, Tree and Naive-Bayes.
Augustus is written in Python and is freely available under the GNU General Public License, version 2.
Predictive Model Markup Language (PMML) is an XML mark up language to describe statistical and data mining models. PMML describes the inputs to data mining models, the transformations used to prepare data for data mining, and the parameters which define the models themselves. It is used for a wide variety of applications, including applications in finance, e-business, direct marketing, manufacturing, and defense. PMML is often used so that systems which create statistical and data mining models (“PMML Producers”) can easily inter-operate with systems which deploy PMML models for scoring or other operational purposes (“PMML Consumers”).
Change Detection using Augustus
For information regarding using Augustus with Change Detection and Health and Status Monitoring, please see change-detection.
Open Data Group provides management consulting services, outsourced analytical services, analytic staffing, and expert witnesses broadly related to data and analytics. It has experience with customer data, supplier data, financial and trading data, and data from internal business processes.
It has staff in Chicago and San Francisco and clients throughout the U.S. Open Data Group began operations in 2002.
The above example contains plots generated in R of scoring results from Augustus. Each point on the graph represents a use of the scoring engine and a chart is an aggregation of multiple Augustus runs. A Baseline (Change Detection) model was used to score data with multiple segments.
Augustus is typically used to construct models and score data with models. Augustus includes a dedicated application for creating, or producing, predictive models rendered as PMML-compliant files. Scoring is accomplished by consuming PMML-compliant files describing an appropriate model. Augustus provides a dedicated application for scoring data with four classes of models, Baseline (Change Detection) Models, Tree Models, Regression Models and Naive Bayes Models. The typical model development and use cycle with Augustus is as follows:
Identify suitable data with which to construct a new model.
Provide a model schema which proscribes the requirements for the model.
Run the Augustus producer to obtain a new model.
Run the Augustus consumer on new data to effect scoring.
Separate consumer and producer applications are supplied for Baseline (Change Detection) models, Tree models, Regression models and for Naive Bayes models. The producer and consumer applications require configuration with XML-formatted files. The specification of the configuration files and model schema are detailed below. The consumers provide for some configurability of the output but users will often provide additional post-processing to render the output according to their needs. A variety of mechanisms exist for transmitting data but user’s may need to provide their own preprocessing to accommodate their particular data source.
In addition to the producer and consumer applications, Augustus is conceptually structured and provided with libraries which are relevant to the development and use of Predictive Models. Broadly speaking, these consist of components that address the use of PMML and components that are specific to Augustus.
Augustus can accommodate a post-processing step. While not necessary, it is often useful to
Re-normalize the scoring results or performing an additional transformation.
Supplements the results with global meta-data such as timestamps.
Formatting of the results.
Select certain interesting values from the results.
Restructure the data for use with other applications.
Here is an interview with Prof Luis Torgo, author of the recent best seller “Data Mining with R-learning with case studies”.
Ajay- Describe your career in science. How do you think can more young people be made interested in science.
Luis- My interest in science only started after I’ve finished my degree. I’ve entered a research lab at the University of Porto and started working on Machine Learning, around 1990. Since then I’ve been involved generally in data analysis topics both from a research perspective as well as from a more applied point of view through interactions with industry partners on several projects. I’ve spent most of my career at the Faculty of Economics of the University of Porto, but since 2008 I’m at the department of Computer Science of the Faculty of Sciences of the same university. At the same time I’ve been a researcher at LIAAD / Inesc Porto LA (www.liaad.up.pt).
I like a lot what I do and like science and the “scientific way of thinking”, but I cannot say that I’ve always thought of this area as my “place”. Most of all I like solving challenging problems through data analysis. If that translates into some scientific outcome than I’m more satisfied but that is not my main goal, though I’m kind of “forced” to think about that because of the constraints of an academic career.
That does not mean I’m not passionate about science, I just think there are many more ways of “doing science” than what is reflected in the usual “scientific indicators” that most institutions seem to be more and more obsessed about.
Regards interesting young people in science that is a hard question that I’m not sure I’m qualified to answer. I do tend to think that young people are more sensible to concrete examples of problems they think are interesting and that science helps in solving, as a way of finding a motivation for facing the hard work they will encounter in a scientific career. I do believe in case studies as a nice way to learn and motivate, and thus my book ;-)
Ajay- Describe your new book “Data Mining with R, learning with case studies” Why did you choose a case study based approach? who is the target audience? What is your favorite case study from the book
Luis- This book is about learning how to use R for data mining. The book follows a “learn by doing it” approach to data mining instead of the more common theoretical description of the available techniques in this discipline. This is accomplished by presenting a series of illustrative case studies for which all necessary steps, code and data are provided to the reader. Moreover, the book has an associated web page (www.liaad.up.pt/~ltorgo/DataMiningWithR) where all code inside the book is given so that easy copy-paste is possible for the more lazy readers.
The language used in the book is very informal without many theoretical details on the used data mining techniques. For obtaining these theoretical insights there are already many good data mining books some of which are referred in “further readings” sections given throughout the book. The decision of following this writing style had to do with the intended target audience of the book.
In effect, the objective was to write a monograph that could be used as a supplemental book for practical classes on data mining that exist in several courses, but at the same time that could be attractive to professionals working on data mining in non-academic environments, and thus the choice of this more practically oriented approach.
Regards my favorite case study that is a hard question for an author… still I would probably choose the “Predicting Stock Market Returns” case study (Chapter 3). Not only because I like this challenging problem, but mainly because the case study addresses all aspects of knowledge discovery in a real world scenario and not only the construction of predictive models. It tackles data collection, data pre-processing, model construction, transforming predictions into actions using different trading policies, using business-related performance metrics, implementing a trading simulator for “real-world” evaluation, and laying out grounds for constructing an online trading system.
Obviously, for all these steps there are far too many options to be possible to describe/evaluate all of them in a chapter, still I do believe that for the reader it is important to see the overall picture, and read about the relevant questions on this problem and some possible paths that can be followed at these different steps.
In other words: do not expect to become rich with the solution I describe in the chapter !
Ajay- Apart from R, what other data mining software do you use or have used in the past. How would you compare their advantages and disadvantages with R
Luis- I’ve played around with Clementine, Weka, RapidMiner and Knime, but really only playing with teaching goals, and no serious use/evaluation in the context of data mining projects. For the latter I mainly use R or software developed by myself (either in R or other languages). In this context, I do not think it is fair to compare R with these or other tools as I lack serious experience with them. I can however, tell you about what I see as the main pros and cons of R. The main reason for using R is really not only the power of the tool that does not stop surprising me in terms of what already exists and keeps appearing as contributions of an ever growing community, but mainly the ability of rapidly transforming ideas into prototypes. Regards some of its drawbacks I would probably mention the lack of efficiency when compared to other alternatives and the problem of data set sizes being limited by main memory.
I know that there are several efforts around for solving this latter issue not only from the community (e.g. http://cran.at.r-project.org/web/views/HighPerformanceComputing.html), but also from the industry (e.g. Revolution Analytics), but I would prefer that at this stage this would be a standard feature of the language so the the “normal” user need not worry about it. But then this is a community effort and if I’m not happy with the current status instead of complaining I should do something about it!
Ajay- Describe your writing habit- How do you set about writing the book- did you write a fixed amount daily or do you write in bursts etc
Luis- Unfortunately, I write in bursts whenever I find some time for it. This is much more tiring and time consuming as I need to read back material far too often, but I cannot afford dedicating too much consecutive time to a single task. Actually, I frequently tease my PhD students when they “complain” about the lack of time for doing what they have to, that they should learn to appreciate the luxury of having a single task to complete because it will probably be the last time in their professional life!
Ajay- What do you do to relax or unwind when not working?
Luis- For me, the best way to relax from work is by playing sports. When I’m involved in some game I reset my mind and forget about all other things and this is very relaxing for me. A part from sports I enjoy a lot spending time with my family and friends. A good and long dinner with friends over a good bottle of wine can do miracles when I’m too stressed with work! Finally,I do love traveling around with my family.
Short Bio: Luis Torgo has a degree in Systems and Informatics Engineering and a PhD in Computer Science. He is an Associate Professor of the Department of Computer Science of the Faculty of Sciences of the University of Porto. He is also a researcher of the Laboratory of Artificial Intelligence and Data Analysis (LIAAD) belonging to INESC Porto LA. Luis Torgo has been an active researcher in Machine Learning and Data Mining for more than 20 years. He has lead several academic and industrial Data Mining research projects. Luis Torgo accompanies the R project almost since its beginning, using it on his research activities. He teaches R at different levels and has given several courses in different countries.
For reading “Data Mining with R” – you can visit this site, also to avail of a 20% discount the publishers have generously given (message below)-
For more information and to place an order, visit us at http://www.crcpress.com. Order online and apply 20% Off discount code 907HM at checkout. CRC is pleased to offer free standard shipping on all online orders!