Home » Posts tagged 'databases' (Page 2)
Tag Archives: databases
Here is an interview with Kelci Miclaus, a researcher working with the JMP division of the SAS Institute, in which she demonstrates examples of how the R programming language is a great hit with JMP customers who like to be flexible.
Ajay- How has JMP been using integration with R? What has been the feedback from customers so far? Is there a single case study you can point out where the combination of JMP and R was better than any one of them alone?
Kelci- Feedback from customers has been very positive. Some customers are using JMP to foster collaboration between SAS and R modelers within their organizations. Many are using JMP’s interactive visualization to complement their use of R. Many SAS and JMP users are using JMP’s integration with R to experiment with more bleeding-edge methods not yet available in commercial software. It can be used simply to smooth the transition with regard to sending data between the two tools, or used to build complete custom applications that take advantage of both JMP and R.
One customer has been using JMP and R together for Bayesian analysis. He uses R to create MCMC chains and has found that JMP is a great tool for preparing the data for analysis, as well as displaying the results of the MCMC simulation. For example, the Control Chart platform and the Bubble Plot platform in JMP can be used to quickly verify convergence of the algorithm. The use of both tools together can increase productivity since the results of an analysis can be achieved faster than through scripting and static graphics alone.
I, along with a few other JMP developers, have written applications that use JMP scripting to call out to R packages and perform analyses like multidimensional scaling, bootstrapping, support vector machines, and modern variable selection methods. These really show the benefit of interactive visual analysis of coupled with modern statistical algorithms. We’ve packaged these scripts as JMP add-ins and made them freely available on our JMP User Community file exchange. Customers can download them and now employ these methods as they would a regular JMP platform. We hope that our customers familiar with scripting will also begin to contribute their own add-ins so a wider audience can take advantage of these new tools.
Ajay- Are there plans to extend JMP integration with other languages like Python?
Kelci- We do have plans to integrate with other languages and are considering integrating with more based on customer requests. Python has certainly come up and we are looking into possibilities there.
Ajay- How is R a complimentary fit to JMP’s technical capabilities?
Kelci- R has an incredible breadth of capabilities. JMP has extensive interactive, dynamic visualization intrinsic to its largely visual analysis paradigm, in addition to a strong core of statistical platforms. Since our brains are designed to visually process pictures and animated graphs more efficiently than numbers and text, this environment is all about supporting faster discovery. Of course, JMP also has a scripting language (JSL) allowing you to incorporate SAS code, R code, build analytical applications for others to leverage SAS, R and other applications for users who don’t code or who don’t want to code.
JSL is a powerful scripting language on its own. It can be used for dialog creation, automation of JMP statistical platforms, and custom graphic scripting. In other ways, JSL is very similar to the R language. It can also be used for data and matrix manipulation and to create new analysis functions. With the scripting capabilities of JMP, you can create custom applications that provide both a user interface and an interactive visual back-end to R functionality. Alternatively, you could create a dashboard using statistical and/or graphical platforms in JMP to explore the data and with the click of a button, send a portion of the data to R for further analysis.
Another JMP feature that complements R is the add-in architecture, which is similar to how R packages work. If you’ve written a cool script or analysis workflow, you can package it into a JMP add-in file and send it to your colleagues so they can easily use it.
Ajay- What is the official view on R from your organization? Do you think it is a threat, or a complimentary product or another statistical platform that coexists with your offerings?
Kelci- Most definitely, we view R as complimentary. R contributors are providing a tremendous service to practitioners, allowing them to try a wide variety of methods in the pursuit of more insight and better results. The R community as a whole is providing a valued role to the greater analytical community by focusing attention on newer methods that hold the most promise in so many application areas. Data analysts should be encouraged to use the tools available to them in order to drive discovery and JMP can help with that by providing an analytic hub that supports both SAS and R integration.
Ajay- While you do use R, are there any plans to give back something to the R community in terms of your involvement and participation (say at useR events) or sponsoring contests.
Kelci- We are certainly open to participating in useR groups. At Predictive Analytics World in NY last October, they didn’t have a local useR group, but they did have a Predictive Analytics Meet-up group comprised of many R users. We were happy to sponsor this. Some of us within the JMP division have joined local R user groups, myself included. Given that some local R user groups have entertained topics like Excel and R, Python and R, databases and R, we would be happy to participate more fully here. I also hope to attend the useR! annual meeting later this year to gain more insight on how we can continue to provide tools to help both the JMP and R communities with their work.
We are also exploring options to sponsor contests and would invite participants to use their favorite tools, languages, etc. in pursuit of the best model. Statistics is about learning from data and this is how we make the world a better place.
About- Kelci Miclaus
Kelci is a research statistician developer for JMP Life Sciences at SAS Institute. She has a PhD in Statistics from North Carolina State University and has been using SAS products and R for several years. In addition to research interests in statistical genetics, clinical trials analysis, and multivariate analysis/visualization methods, Kelci works extensively with JMP, SAS, and R integration.
Part 1 in this series is avaiable at http://www.decisionstats.com/analytics-for-cyber-conflict/
The next articles in this series will cover-
- the kind of algorithms that are currently or being proposed for cyber conflict, as well as or detection
Cyber Conflict requires some basic elements of the following broad disciplines within Computer and Information Science (besides the obvious disciplines of heterogeneous database types for different kinds of data) -
1) Cryptography – particularly a cryptographic hash function that maximizes cost and time of the enemy trying to break it.
The ideal cryptographic hash function has four main or significant properties:
- it is easy (but not necessarily quick) to compute the hash value for any given message
- it is infeasible to generate a message that has a given hash
- it is infeasible to modify a message without changing the hash
- it is infeasible to find two different messages with the same hash
A commercial spin off is to use this to anonymized all customer data stored in any database, such that no database (or data table) that is breached contains personally identifiable information. For example anonymizing the IP Addresses and DNS records with a mashup (embedded by default within all browsers) of Tor and MafiaaFire extensions can help create better information privacy on the internet.
This can also help in creating better encryption between Instant Messengers in Communication
2) Data Disaster Planning for Data Storage (but also simulations for breaches)- including using cloud computing, time sharing, or RAID for backing up data. Planning and creating an annual (?) exercise for a simulated cyber breach of confidential just like a cyber audit- similar to an annual accounting audit
3) Basic Data Reduction Algorithms for visualizing large amounts of information. This can include
- K Means Clustering, http://www.jstor.org/pss/2346830 , http://www.cs.ust.hk/~qyang/Teaching/537/Papers/huang98extensions.pdf , and http://stackoverflow.com/questions/6372397/k-means-with-really-large-matrix
- Topic Models (LDA) http://www.decisionstats.com/topic-models/,
- Social Network Analysis http://en.wikipedia.org/wiki/Social_network_analysis,
- Graph Analysis http://micans.org/mcl/ and http://www.ncbi.nlm.nih.gov/pubmed/19407357
- MapReduce and Parallelization algorithms for computational boosting http://www.slideshare.net/marin_dimitrov/large-scale-data-analysis-with-mapreduce-part-i
In the next article we will examine
- the role of non state agents as well as state agents competing and cooperating,
- and what precautions can knowledge discovery in databases practitioners employ to avoid breaches of security, ethics, and regulation.
This is a piece of science fiction. I wrote while reading Isaac Assimov’s advice to writers in GOLD, while on a beach in Anjuna.
1) Identify senators, lobbyists, senior executives of companies advocating for SOPA. Go for selective targeting of these people than massive Denial of Service Attacks.
This could also include election fund raising websites in the United States.
2) Create hacking tools with simple interfaces to probe commonly known software errors, to enable wider audience including the Occupy Movement students to participate in hacking. thus making hacking more democratic. What are the top 25 errors as per http://cwe.mitre.org/cwss/
Easy interface tools to check vulnerabilities would be the next generation to flooding tools like HOIC, LOIC – Massive DDOS atttacks make good press coverage but not so good technically
3) Disrupt digital payment mechanisms for selected targets (in step1) using tools developed in Step 2, and introduce random noise errors in payment transfers.
4) Help create a better secure internet by embedding Tor within Chromium with all tools for anonymity embedded for easy usage – a more secure peer to peer browser (like a mashup of Opera , tor and chromium).
or maybe embed bit torrents within a browser.
5) Disrupt media companies and cloud computing based companies like iTunes, Spotify or Google Music, just like virus, ant i viruses disrupted the desktop model of computing. After that offer solutions to the problems like companies of anti virus software did for decades.
6) Hacking websites is fine fun, but hacking internet databases and massively parallel data scrapers can help disrupt some of the status quo.
This applies to databases that offer data for sale, like credit bureaus etc. Making this kind of data public will eliminate data middlemen.
7) Use cross border, cross country regulatory arbitrage for better risk control of hacker attacks.
8) recruiting among universities using easy to use hacking tools to expand the pool of dedicated hacker armies.
9) using operations like those targeting child pornography to increase political acceptability of the hacker sub culture. Refrain from overtly negative and unimaginative bad Press Relations
10) If you cant convince them to pass SOPA, confuse them Use bots for random clicks on ads to confuse internet commerce.
What is Google Cloud SQL?
Google Cloud SQL is web service that allows you to create, configure, and use relational databases with your App Engine applications. It is a fully-managed service that maintains, manages, and administers your databases, allowing you to focus on your applications and services.
By offering the capabilities of a MySQL database, the service enables you to easily move your data, applications, and services into and out of the cloud. This allows for high data portability and helps in faster time-to-market because you can quickly leverage your existing database (using JDBC and/or DB-API) in your App Engine application.
Here is where you can get an invite to the beta only Google Cloud SQL
Sign up for Limited Preview
Google Cloud SQL is available to a limited number of users. To sign up for the service:
- Visit the Google APIs Console. The console opens the All services pane.
- Find the SQL Service line in the Services table and click Request access…
- Fill out the enrollment form.
- Our team will review your enrollment information and respond by email to the address associated with your Google Account.
- Follow the link in the email to view the Terms of Service. Please read these carefully before accepting.
- Sign up for the google-cloud-sql-announce group to receive important announcements and product news. (NOTE- Members: 384)