The Tor software protects you by bouncing your communications around a distributed network of relays run by volunteers all around the world: it prevents somebody watching your Internet connection from learning what sites you visit, it prevents the sites you visit from learning your physical location, and it lets you access sites which are blocked.
The Tor Browser Bundle lets you use Tor on Windows, Mac OS X, or Linux without needing to install any software. It can run off a USB flash drive, comes with a pre-configured web browser, and is self-contained. The Tor IM Browser Bundle additionally allows instant messaging and chat over Tor. If you would prefer to use your existing web browser or install Tor permanently, or if you don’t use Windows, see the other ways to download Tor.
Freedom House has produced a video on how to find and use the Tor Browser Bundle. If you don’t see a video below, view it on YouTube. Know of a better video, or one translated into your language? Let us know!
And if you now want to test your own website against a Denial of Service attack, download this.
This is the software for which 32 Turkish teenagers got arrested after bringing down their government’s websites. Do NOT USE it for ILLEGAL purposes,
because 1) it is hosted on a Western website that, thanks to the Patriot Act, would be tracking downloads and most likely inserting some logging code into your computer (especially if you are still on Windows);
2) Turkey, being a NATO member, got rather immediate notice of this, which makes it very likely that this tool is compromised in the Western Hemisphere. You can probably use it in an Eastern Hemisphere country, excluding Israel, Turkey, China, India, Korea and Japan, because these countries also have sophisticated hackers working for their governments.
3) This is just a beginner’s tool for understanding how flooding a website with requests works.
Note that the Failed tab tells you how well or badly this method is working.
Note: it won’t work on my blogs hosted on wordpress.com, but then those blogs had a root-level breach some time back. It did work on both my Blogspot and my Tumblr blogs, and it completely shattered my son’s self-hosted WordPress blog (see below).
This is a movie that restores faith in the good old art of storytelling, with completely realistic but not in-your-face computer-generated effects.
Both Charles (as Prof X) and Erik (as Magneto) are awesome, but Erik steals the show, as Michael Fassbender plays the avenging Holocaust victim with complete and ruthless abandon. The Mad Men-like costumes and the flashbacks to history were awesome too, but the Russians were bad: the same old chaps we have seen playing Russians in dozens of movies, slurring over their Rs. The interpolation of JFK, the Cuban Missile Crisis and even the 1960s chauvinistic humor really adds to this movie.
Watch it: good for both family and friends. Kevin Bacon is a steal, and lots of talented actors now join the Kevin Bacon game.
This is a short list of several known as well as lesser-known R (#rstats) language codes, packages and tricks for building a business intelligence application. It will be slightly messy (and not Messi), but I hope to refine it someday when the cows come home.
It assumes that BI basically consists of a database, a document database, report-creation/dashboard software, as well as some R packages unique to business intelligence.
What is business intelligence?
Seamless dissemination of data in the organization. In short, let it flow: from raw transactional data to aggregate dashboards, to control and test experiments, to new and legacy data mining models. A business-intelligence-enabled organization allows information to flow easily AND captures insights and feedback for further action.
BI software has lately come to mean just reporting software, and business analytics has come to mean primarily predictive analytics. The terms are interchangeable in my opinion, as BI reports can also be called descriptive aggregated statistics or descriptive analytics, and predictive analytics is useless and incomplete unless you measure its effect in dashboards and summary reports.
Data mining is a bit more than predictive analytics: it includes pattern recognition as well as black-box machine learning algorithms. To further aggravate these divides, students mostly learn data mining in computer science departments, predictive analytics (if at all) in business and statistics departments, and no one teaches metrics, dashboards and reporting in mainstream academia, even though a large number of graduates will end up fiddling with spreadsheets or dashboards in their real careers.
Basically, dispensing with the relational setup, with primary and foreign keys, and with the additional overhead involved in maintaining transactional safety often gives you extreme increases in performance.
NoSQL is a kind of database that doesn’t have a fixed schema the way a traditional RDBMS does. With NoSQL databases the schema is defined by the developer at run time. Developers don’t write normal SQL statements against the database, but instead use an API to get the data they need.
Instead of relating data in one table to another, you store things as key-value pairs; there is no database schema, as that is handled in code instead.
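A minimal sketch of that key-value idea, using an R environment as a stand-in for a NoSQL store (the keys and records here are made up purely for illustration):

```r
# Schema-less key-value storage: values need not share a structure
kv <- new.env()
assign("user:1001", list(name = "Ajay",   blogs = c("blogspot", "tumblr")), envir = kv)
assign("user:1002", list(name = "Sandro", field = "data mining"),           envir = kv)

get("user:1001", envir = kv)$blogs                     # lookup by key; structure handled in code
exists("user:9999", envir = kv, inherits = FALSE)      # FALSE: no schema, no JOINs, just keys
```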
I believe any corporation with data-driven decision making would need to have at least one RDBMS and one NoSQL database for unstructured data (Ajay). This is a sweeping generic statement 😉 and an opinion on future technologies.
RevoConnectR for JasperReports Server is a Java library interface between JasperReports Server and Revolution R Enterprise’s RevoDeployR, a standardized collection of web services that integrates security, APIs, scripts and libraries for R into a single server. JasperReports Server dashboards can retrieve R charts and result sets from RevoDeployR.
Extending Pentaho with R analytics: “R” is a popular open source statistical and analytical language that academics and commercial organizations alike have used for years to get maximum insight out of information using advanced analytic techniques. In this twelve-minute video, David Reinke from Pentaho Certified Partner OpenBI provides an overview of R, as well as a demonstration of integration between R and Pentaho.
R and BI – Integrating R with Open Source Business Intelligence Platforms Pentaho and Jaspersoft
David Reinke, Steve Miller
Keywords: business intelligence
Increasingly, R is becoming the tool of choice for statistical analysis, optimization, machine learning and visualization in the business world. This trend will only escalate as more R analysts transition to business from academia. But whereas in academia R is often the central tool for analytics, in business R must coexist with and enhance mainstream business intelligence (BI) technologies. A modern BI portfolio already includes relational databases, data integration (extract, transform, load – ETL), query and reporting, online analytical processing (OLAP), dashboards, and advanced visualization. The opportunity to extend traditional BI with R analytics revolves around the introduction of advanced statistical modeling and visualizations native to R. The challenge is to seamlessly integrate R capabilities within the existing BI space. This presentation will explain and demo an initial approach to integrating R with two comprehensive open source BI (OSBI) platforms – Pentaho and Jaspersoft. Our efforts will be successful if we stimulate additional progress, transparency and innovation by combining the R and BI worlds.
The demonstration will show how we integrated the OSBI platforms with R through the use of RServe and its Java API. The BI platforms provide an end-user web application which includes application security, data provisioning and BI functionality. Our integration will demonstrate a process by which BI components can be created that prompt the user for parameters, acquire data from a relational database and pass it into RServe, invoke R commands for processing, and display the resulting R-generated statistics and/or graphs within the BI platform. Discussion will include concepts related to creating a reusable Java class library of commonly used processes to speed additional development.
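As a rough illustration of the Rserve hand-off described above, here is a minimal R-only sketch, assuming the Rserve and RSclient packages (RSclient stands in here for the Java client the BI platforms actually use):

```r
# Server side: expose R over TCP so a BI platform's Java API can call into it
library(Rserve)
Rserve(args = "--no-save")   # starts the Rserve daemon on the default port 6311

# Client side: connect, push a parameter, run R code remotely, pull the result
# (a BI platform would do this through Rserve's Java API instead)
library(RSclient)
conn <- RS.connect()                      # connect to localhost:6311
RS.assign(conn, "n", 100)                 # a parameter gathered from a BI prompt
res <- RS.eval(conn, summary(rnorm(n)))   # invoke R commands on the server
print(res)
RS.close(conn)
```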
A review and update on the predictions made in our 2007 article, focused on the current state of the commercial open source BI market. Also included is a brief analysis of potential options for commercial open source business models and our take on their applicability.
R has become the “lingua franca” for academic statistical analysis and modeling, and is now rapidly gaining exposure in the commercial world. Steve examines the R technology and community and its relevance to mainstream BI.
Boxplots and variants (e.g. the violin plot) are explored as an essential graphical technique for summarizing data distributions by categories and dimensions of other attributes.
Lattices and logarithmic data transformations are used to illuminate data density and distribution and find patterns otherwise missed using classic charting techniques.
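For instance, a minimal lattice sketch along these lines, using R’s built-in mtcars data (the variables chosen are just an illustration):

```r
library(lattice)

# Box-and-whisker summaries of a distribution across categories (mpg by cylinders)
bwplot(factor(cyl) ~ mpg, data = mtcars,
       xlab = "Miles per gallon", ylab = "Cylinders")

# The violin variant replaces the box with a density outline
bwplot(factor(cyl) ~ mpg, data = mtcars, panel = panel.violin,
       xlab = "Miles per gallon", ylab = "Cylinders")

# A logarithmic axis can reveal structure that a linear scale hides
bwplot(factor(cyl) ~ mpg, data = mtcars,
       scales = list(x = list(log = TRUE)),
       xlab = "Miles per gallon (log scale)", ylab = "Cylinders")
```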
How do you deal with highly skewed data distributions? Standard charting techniques on this “deviant” data often fail to illuminate relationships. This article explains techniques to re-express skewed data so that it is more understandable.
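A minimal base-R sketch of such a re-expression, using simulated lognormal data:

```r
# Right-skewed data: the raw histogram hides most of the structure
set.seed(42)
x <- rlnorm(1000)                 # simulated, heavily right-skewed values

op <- par(mfrow = c(1, 2))
hist(x,      main = "Raw (skewed)",     xlab = "x")
hist(log(x), main = "Log re-expressed", xlab = "log(x)")  # roughly symmetric
par(op)
```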
Steve uses the R open source stats package and Monte Carlo simulations to examine alternative investment portfolio returns…a good example of applied statistics using R.
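In the same spirit, a minimal sketch of a Monte Carlo portfolio simulation in R; the return and volatility figures are illustrative assumptions, not numbers from Steve’s article:

```r
# Simulate 10,000 one-year return paths from monthly draws
set.seed(123)
n_sims  <- 10000
mu_m    <- 0.006   # assumed mean monthly return (illustrative)
sigma_m <- 0.04    # assumed monthly volatility (illustrative)

annual <- replicate(n_sims, prod(1 + rnorm(12, mu_m, sigma_m)) - 1)

quantile(annual, c(0.05, 0.50, 0.95))   # downside, median and upside scenarios
hist(annual, breaks = 50, main = "Simulated annual returns", xlab = "Return")
```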
In August, Steve attended the 2007 International R User Conference (useR! 2007). This article details his experiences, including his meeting with long-time R community expert Frank Harrell.
The newly launched Dashboard Insight web site is focused on the most useful of BI tools: dashboards. With this article discussing the use of R and trellis graphics, OpenBI brings the realm of open source to this forum.
Utilizing Tufte’s philosophy of maximizing the data-to-ink ratio of graphics, Steve demonstrates the value of dot plot diagramming. The R open source statistical/analytics software is showcased.
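A minimal base-R sketch of the idea, with made-up scores purely for illustration:

```r
# Dot plots carry the same information as bar charts with far less "ink"
scores <- c(Pentaho = 42, Jaspersoft = 35, Oracle = 58, SAS = 61)  # illustrative
dotchart(sort(scores), pch = 19,
         xlab = "Score (illustrative)",
         main = "Dot plot: maximizing the data-to-ink ratio")
```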
I think that the report-generation package brew would also qualify as a BI package, but large-scale implementation in a commercial business environment remains to be seen.
brew: Creating Repetitive Reports
brew: Templating Framework for Report Generation
brew implements a templating framework for mixing text and R code for report generation. brew template syntax is similar to PHP, Ruby's erb module, Java Server Pages, and Python's psp module. http://bit.ly/jINmaI
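A minimal sketch of a brew template, mixing literal text with R code in those PHP/erb-style tags:

```r
library(brew)   # install.packages("brew")

x <- rnorm(100)

# <% ... %> runs R code silently; <%= ... %> inserts the value into the text
template <- "Report generated on <%= format(Sys.Date()) %>
Sample size: <%= length(x) %>, mean: <%= round(mean(x), 2) %>
<% for (i in 1:3) { %>Section <%= i %> placeholder
<% } %>"

brew(text = template)   # renders to stdout; pass output = to write a file
```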
If you do a Google search for “data mining blog”, one blog has come out on top for the past several years: data mining blog – Google Search http://bit.ly/kEdPlE
To honor 5 years of Sandro Saitta’s blog (yes, that’s 5 years!), we present an exclusive interview with him in which he reveals his unique sauce for cool techie blogging.
Ajay- Describe your journey as a scientist and data miner, from early experiences to schooling to your work/research/blogging.
Sandro- My first experience with data mining was my master’s project. I used decision trees to predict pollen concentration for the following week using input data such as wind, temperature and rain. The fact that an algorithm can make a computer learn from experience was really amazing to me. I found it so interesting that I started a PhD in data mining. This time, the field of application was civil engineering. Civil engineers put a lot of sensors on their structures in order to understand how they behave. With all these sensors they generate a lot of data. To interpret these data, I used data mining techniques such as feature selection and clustering. I started my blog, Data Mining Research, during my PhD, to share with other researchers.
I then started applying data mining in the stock market as my first job in industry. I realized the difference between image recognition, where a 99% correct classification rate is state of the art, and the stock market, where you’re happy with 55%. However, the company ambiance was not as good as I thought, so I moved to consulting. There, I applied data mining in behavioral targeting to increase click-through rates. When you compare the number of customers who click with the ones who don’t, then you really understand what class imbalance means. A few months ago, I accepted a very good opportunity at SICPA. I’m looking forward to tackling new challenges there.
Ajay- Your blog is the top-ranked blog for “data mining blog”. Could you share some tips on better blogging for analytics and technical people?
Sandro- It’s always difficult to start a blog, since at the beginning you have no readers. Writing for nobody may seem stupid, but it is not. By writing my first posts during my PhD I was reorganizing my ideas. I was expressing concepts which were not always clear to me. I thus learned a lot and also improved my English. Of course, it’s still not perfect, but I hope most people can understand me.
Next come the readers; a few dozen each week at first. To increase this number, I started to learn SEO (Search Engine Optimization) by reading books and blogs. I tested many techniques that increased Data Mining Research’s visibility in the blogosphere. I think SEO is interesting once you already have some content published (which means not at the very beginning of your blog). After a while, once your blog is nicely ranked, the main task is to work on its content. To be of interest, your content must be distinctive: original, informative or provocative, for example. I also had the chance to gain good visibility thanks to well-known people in the field like Kevin Hillstrom, Gregory Piatetsky-Shapiro, Will Dwinnell / Dean Abbott, Vincent Granville, Matthew Hurst and many others.
Ajay- What’s your favorite statistical software, and what are the various software packages you have worked with? Could you compare and contrast them as well?
Sandro- My favorite software at this point is SAS. I worked with it for two years. Once you know the language, you can perform ETL and data mining very easily. It’s also very fast compared to others. There are a lot of tools for data mining, but I cannot think of one that is as powerful as SAS and, at the same time, has a high-level programming language behind it.
I also worked with R and Matlab. R is very nice since you have all the up-to-date data mining algorithms implemented. However, working in memory is not always a good choice, especially for ETL. Matlab is an excellent tool for prototyping. It’s not as fast and certainly not made for ETL, but the price is low considering all its possibilities for data mining. In my opinion, SAS is the best choice for ETL and a good choice for data mining. Of course, there is the price.
Ajay- What are your favorite techniques and training resources for teaching the basics of data mining to, say, statisticians or business management graduates?
Sandro- I’m the kind of guy who likes to read books. I read data mining books one after the other. The fact that the same concepts are explained differently (and by different people) helps a lot in learning a topic like data mining. Of course, nothing replaces experience in the field. You can read hundreds of books, you will still not be a good practitioner until you really apply data mining in specific fields. My second choice after books is blogs. By reading data mining blogs, you will really see the issues and challenges in the field. It’s still not experience, but we are closer. Finally, web resources and networks such as KDnuggets of course, but also AnalyticBridge and LinkedIn.
Ajay- Describe your hobbies and how they help you, if at all, in your professional life.
Sandro- One of my hobbies is reading. I read a lot of books about data mining, SEO, Google as well as Sci-Fi and Fantasy. I’m a big fan of Asimov by the way. My other hobby is playing tennis. I think I simply use my hobbies as a way to find equilibrium in my life. I always try to find the best balance between work, family, friends and sport.
Ajay- What are your plans for your website for 2011-2012?
Sandro- I will continue to publish guest posts and interviews. I think it is important to let other people express themselves on data mining topics. I will not write about my current applications due to the policies of my current employer. But don’t worry, I still have a lot to write, whether technical or not. I will also put more emphasis on my experiences with data mining, advice for data miners, tips and tricks, and of course book reviews!
Standard blogging disclosure: Sandro awarded me his blog’s People’s Choice award for 2010 and carried out an interview with me. There is a lot of love between our respective WordPress blogs, but to reassure our puritan American readers, it is platonic and intellectual.
About Sandro Saitta-
Sandro Saitta is a Data Mining Research Engineer at SICPA Security Solutions. He is also a blogger at Data Mining Research (www.dataminingblog.com). His interests include data mining, machine learning, search engine optimization and website marketing.
Outstandingly attractive scholarships are available for students willing to travel to Yorkshire. That’s where the Wars of the Roses were fought by the British royal family.
The emphasis and spacing in the email above are mine.
Message from Dr Topi below:
It is not New York but very old York, in the North of England.
The scholarships carry a tax-free stipend, and financial assistance will be given for travel expenses to and from York. Accommodation for successful students is available on the University of York campus.
I am hoping to put this on my pre-order or Amazon wish list. It is the book for all the common people who wanted to do data mining but were unable to ask aloud because they didn’t know much. It is written by the seminal Australian authority on data mining, Dr Graham Williams, whom I interviewed here: https://decisionstats.com/2009/01/13/interview-dr-graham-williams/
Data Mining for the masses using an ergonomically designed Graphical User Interface.
Encourages the concept of programming with data: more than just pushing data through tools, learning to live and breathe the data
Accessible to many readers and not necessarily just those with strong backgrounds in computer science or statistics
Details some of the more popular algorithms for data mining, as well as covering model evaluation and model deployment
Data mining is the art and science of intelligent data analysis. By building knowledge from information, data mining adds considerable value to the ever increasing stores of electronic data that abound today. In performing data mining many decisions need to be made regarding the choice of methodology, the choice of data, the choice of tools, and the choice of algorithms.
Throughout this book the reader is introduced to the basic concepts and some of the more popular algorithms of data mining. With a focus on the hands-on end-to-end process for data mining, Williams guides the reader through various capabilities of the easy to use, free, and open source Rattle Data Mining Software built on the sophisticated R Statistical Software. The focus on doing data mining rather than just reading about data mining is refreshing.
The book covers data understanding, data preparation, data refinement, model building, model evaluation, and practical deployment. The reader will learn to rapidly deliver a data mining project using software easily installed for free from the Internet. Coupling Rattle with R delivers a very sophisticated data mining environment with all the power, and more, of the many commercial offerings.
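As a taste of that hands-on approach, a minimal sketch of getting started with Rattle; the weather example dataset ships with the package, and the formula here is just an illustration:

```r
install.packages("rattle")   # Rattle GUI; also pulls in its GUI toolkit dependencies
library(rattle)
rattle()                     # opens the point-and-click data mining interface

# The same work can be scripted directly; Rattle itself drives rpart for trees
library(rpart)
data(weather, package = "rattle")   # example dataset bundled with Rattle
fit <- rpart(RainTomorrow ~ MinTemp + MaxTemp + Humidity3pm, data = weather)
printcp(fit)                        # inspect the fitted decision tree
```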
XBRL is a member of the family of languages based on XML, or Extensible Markup Language, which is a standard for the electronic exchange of data between businesses and on the internet. Under XML, identifying tags are applied to items of data so that they can be processed efficiently by computer software.
XBRL is a powerful and flexible version of XML which has been defined specifically to meet the requirements of business and financial information. It enables unique identifying tags to be applied to items of financial data, such as ‘net profit’. However, these are more than simple identifiers. They provide a range of information about the item, such as whether it is a monetary item, percentage or fraction. XBRL allows labels in any language to be applied to items, as well as accounting references or other subsidiary information.
XBRL can show how items are related to one another. It can thus represent how they are calculated. It can also identify whether they fall into particular groupings for organisational or presentational purposes. Most importantly, XBRL is easily extensible, so companies and other organisations can adapt it to meet a variety of special requirements.
The rich and powerful structure of XBRL allows very efficient handling of business data by computer software. It supports all the standard tasks involved in compiling, storing and using business data. Such information can be converted into XBRL by suitable mapping processes or generated in XBRL by software. It can then be searched, selected, exchanged or analysed by computer, or published for ordinary viewing.
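As a rough illustration of what such tagging looks like in practice, here is a minimal R sketch that reads a small, hypothetical XBRL-style fragment with the xml2 package; the element names and namespace are illustrative, not a real taxonomy:

```r
library(xml2)

# A hypothetical instance fragment: one tagged item, 'net profit'
doc <- read_xml('
<xbrl xmlns:gaap="http://example.com/hypothetical-taxonomy">
  <gaap:NetProfit contextRef="FY2011" unitRef="USD" decimals="0">1500000</gaap:NetProfit>
</xbrl>')

item <- xml_find_first(doc, ".//gaap:NetProfit")
xml_attr(item, "unitRef")    # "USD": the tag carries unit information
as.numeric(xml_text(item))   # 1500000: the tagged value itself
```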
With more than 7,000 new U.S. companies facing extensible business reporting language (XBRL) filing mandates in 2011, Oracle has released a free XBRL extension on top of the latest release of Oracle Database.
Oracle’s XBRL extension leverages Oracle Database 11g Release 2 XML to manage the collection, validation, storage, and analysis of XBRL data. It enables organizations to create one or more back-end XBRL repositories based on Oracle Database, providing secure XBRL storage and query-ability with a set of XBRL-specific services.
In addition, the extension integrates easily with Oracle Business Intelligence Suite Enterprise Edition to provide analytics, plus interactive development environments (IDEs) and design tools for creating and editing XBRL taxonomies.
The Other Side of XBRL
“While the XBRL mandate continues to grow, the feedback we keep hearing from the ‘other side’ of XBRL—regulators, academics, financial analysts, and investors—is that they lack sufficient tools and historic data to leverage the full potential of XBRL,” says John O’Rourke, vice president of product marketing, Oracle.
However, O’Rourke says this is quickly changing as XBRL mandates enter their third year—and more and more companies have to comply. While the new extension should be attractive to organizations that produce XBRL filings, O’Rourke expects it will prove particularly valuable to regulators, stock exchanges, universities, and other organizations that need to collect, analyze, and disseminate XBRL-based filings.
Outsourcing, a Bolt-on Solution, or Integrated XBRL Tagging
Until recently, reporting organizations had to choose between expensive third-party outsourcing or manual, in-house tagging with bolt-on solutions, both of which introduce the possibility of error.
In response, Oracle launched Oracle Hyperion Disclosure Management, which provides an XBRL tagging solution that is integrated with the financial close and reporting process for fast and reliable XBRL report submission, without relying on third-party providers. The solution enables organizations to:
Author regulatory filings in Microsoft Office and “hot link” them directly to financial reporting systems so they can be easily updated
Graphically perform XBRL tagging at several levels—within Microsoft Office, within EPM system reports, or in the data source metadata
Modify or extend XBRL taxonomies before the mapping process, as well as set up multiple taxonomies
Create and validate final XBRL instance documents before submission