Personal: My Son the Blogger

My 20-something son has decided to blog at http://www.kushohri.com.

These kids!

This is archived content

Happy New Year

January 2nd, 2009

This is the second time I am seeing a Happy New Year. It gets difficult to see the New Year in Delhi when you are covered in 3 sweaters, warmers, a cap and Delhi fog.

Sorry, I didn’t blog for some time. Dad went to the US and brought me lots of clothes in October 2008. Chachu bought me a toy train from London, which Dad said is too big and too good, so I haven’t gotten it. Dad sometimes likes my toys more than I do. Boys will be boys.

In October, I learnt to walk, and by today I am happily running and getting into trouble. Also, Grandma decided no more creche, and I got a new didi, or babysitter, called S.

Dadi and Dadu went to Chachu’s place in Mumbai in December, so I missed them. They are back now.

Anyways, happy new blogging year.

Gotta run!

Posted in Uncategorized | No Comments »

Kush starts to walk

September 10th, 2008

How many miles must a baby crawl, before he decides to stand.

The answer my friend , is blowing in the wind. The answer is blowing in the wind.

I decide to walk. Mom and Dad went all uh ah. Dad shot a video and uploaded it on Facebook.

My chachu, Dad’s younger brother, is London-bound, so he couldn’t see the photos.

So he gets to see videos.

Walking means falling on bum lots of times. It also means being able to do more mischief, like climbing on tables and pulling Dad’s computer (note from Dad: grrrrr).

My grandma gets worried when I fall, though. She is very sweet; her skin is quite wrinkly.

Don’t know why, but babies enjoy old people’s company more. They are more relaxed.

Grown-ups are always rushing from here to there….

Pause, dear grown-up. Go home and watch your kids play.

Birthday Number 1

September 5th, 2008

Kush turned 1 on this day. It’s like his age was zero and now it’s one.

Grand Dad had big tent, lots of visitors who kind of hold on to me, till I shout or fake cries.

Dad forgot to put batteries in camera.

The morning party at day care centre was better.

The worst thing about turning 1: so much cake, and you don’t get even one bite to eat.

Grrrrr

Measles Day 3-6

September 1st, 2008

Measles is over now. Mom rubbed on a lot of cream, and I got to have lots of fun.

I lost some weight though, and that kind of worried Dad. On Day 5, Mom took me to the temple with Dad and Super Dad (or Grand Dad), and there we said a prayer and the priest sprinkled some milk drops on me.

Measles and milk drops. This happens. In India.

Super Dad is planning my 1st birthday next week. It’s apparently a big deal, with lots of tents, cooks, guests etc. coming in. I dunno. I love parties dedicated to me, like my Lohri party in January.

Also, my hair is kind of growing, so Mom made a ponytail. Everyone says I look nice in it.

I hate it.

Takes me 2 secs to rub it down.

We have a new maid called Sharmila didi. I play with her too when Dad is beating stuff on his bumputer and Mom is cooking. Apparently she is a Kushu-sitter (because I think Kushu is no baby).

Day 7- Off To Creche.

Vacations over.

Wah Wah Wah…

Measles Day 2.0

August 26th, 2008

Measles day 2 was ok dokey.

I got to sleep late. Mommy took the day off, so no creche for me. I stay at creche because Mommy’s gotta finish her med school (job and study).

Pappa’s pretty useless. Just wrings his hands. Says he is on part day off-on. Works from home anyways.

On his big box bumputer.

Extra care from Maama is worth the measles. But these spots on face are kind of itchy.

And the medicine… well, they should be tested on doctors… grrrrr

Measles

August 25th, 2008

Kush got measles.

Despite the booster he got at 9 months. From his mom, who’s a doctor.

Despite all the love, care and babysitting from the grown-up known as Dad.

So he is all cry cry. And turning pink.

Measles is tough.

But the grownups are suddenly loving me more.

He he. He.

Maybe I can get to eat Daddy’s mobile phone now.

But baby jokes apart.

This is just day 1,

Daddy will take two days off.

and I am off to Grandpa’s for extra tender loving care.

Please pray for me to get well.

Feeling sleepy again… these medicines… and that pointy pin jenction hurrrrt.

Grandpa’s is cool, and he was planning my Bday party next week. But let’s see.

Back to baby bed.

Papa ’s Big Idea

August 25th, 2008

’Cause these food stamps don’t buy diapers

Marshall Mathers

viral fever and rashes

August 23rd, 2008

for two weeks i had something called diaper rash. daddy calls it my red monkey bum. funny to him maybe.

it hurts when i get powdered, then it becomes white monkey bum.

and it has started raining. that means water from sky. water from my stomachs below i barely understand. water from above is even more strange. see my photo in post 1.

so i got fever, and so rush to doctor, and daddy mommy worried, and bitter medicine….

life can be tough for a baby. thats when kushu decided to sleep it off. like right now. too much typing ….

kushu s time to sleep…

good bye grown up world…here comes baby sleep world…..

Kush is in trouble with daddy

August 23rd, 2008

Kush is in some kind of trouble with daddy.

It all started when daddy started spending way too much time in front of white box with black thing and shiny ball. he calls it his computer. i call it daddy’s toy.

so one day i came crawling, stood straight and pushed at the shiny box below. it fell down with a thud. daddy started shouting my cpu my cpu.

this was too much for me to bear so i started crying.

anyway the cpu seems okay now and daddy is back to his B L O O G I N

so sometimes I knock on his door when inside

sometime i happen to stand near him when water comes from my tummy below (water always come suddenly from there)

he doesnt like it especially when i broke 2 keys of his keyboard

daddy play with kushu. not with computer.

ah these grown ups.

11 months ago

August 6th, 2008

I was born 11 months ago. Slowly my neck, my head, my legs, my hands started moving.

I can even stand now… though I need more work on my quadriceps. (I shake when I stand… crawling is kind of better.)

Being born means being taken out of a nice, warm place to a strange place of voices that go “ooo…ooo”, milk, milk… and a daily bath.

Babies don’t like daily baths. I don’t, but I have gotten used to it now.

Last week, I had a strange sound in my throat.

Big tall people, called grown-ups, started acting funny… taking me here, there.

They said I was sick.

I love my moma and popa though.

They are kinda fun, except when they stop me crawling, or make me drink or take a bath.

The love kind of softens when they force bitter liquids on me every three hours, saying “oooo…Oooo”.

Lots of really nasty liquids later, I am ok and not so sick.

The world in my eyes is of two types of people.

Big tall people who speak a lot, and tiny small people who can’t speak. But that is changing….

Life is beautiful

July 30th, 2008

I used to be here

but then came life …..

Hello World!

July 26th, 2008

Hello I am Kush.

11 Months old.

The World’s Youngest Blogger.


Watch this space.

The World’s Youngest Blogger

Coming Soon: www.kushohri.com (as narrated by the grown-up also known as DAD)

How NOT to ask Questions/ Comments

I got this great website from Joshua Reich of i2pi.

Basically, it tells newbies how to get more effective help online while learning new tech stuff, by lucidly explaining how community volunteers behave.

Citation: http://catb.org/~esr/faqs/smart-questions.html

Rodin_TheThinker

Hackers have a reputation for meeting simple questions with what looks like hostility or arrogance. It sometimes looks like we’re reflexively rude to newbies and the ignorant. But this isn’t really true. What we are, unapologetically, is hostile to people who seem to be unwilling to think or to do their own homework before asking questions. People like that are time sinks – they take without giving back, and they waste time we could have spent on another question more interesting and another person more worthy of an answer. We call people like this “losers” (and for historical reasons we sometimes spell it “lusers”).

We realize that there are many people who just want to use the software we write, and who have no interest in learning technical details. For most people, a computer is merely a tool, a means to an end; they have more important things to do and lives to live. We acknowledge that, and don’t expect everyone to take an interest in the technical matters that fascinate us. Nevertheless, our style of answering questions is tuned for people who do take such an interest and are willing to be active participants in problem-solving. That’s not going to change. Nor should it; if it did, we would become less effective at the things we do best.

We’re (largely) volunteers. We take time out of busy lives to answer questions, and at times we’re overwhelmed with them. So we filter ruthlessly. In particular, we throw away questions from people who appear to be losers in order to spend our question-answering time more efficiently, on winners.

Kind of explains why bloggers delete some comments on blogs as well 😉

Image Source: wikimedia.org


R and SAS in Twitter Land

A tale of two languages (set in Twitterland)

Every time I post to the R help list, if the email contains the three letters S-A-S, I get plenty of e-spanking from senior professors and distinguished Linux people. On the other hand, when I mentioned W-P-S, I got dunked by the Don of SAS Global himself. We geeks are so passionate.

Here is some new stuff on Twitter for the R/open source community.

1) I manually made a list of

  1. the best R blogs,
  2. R help lists (on Nabble, since Google Groups banned the R help archive),
  3. a Twitter search for #rstats (the general search word for R).

I then copied the RSS feeds of each of the above.

2) I then went to www.twitterfeed.com (which uses OpenID) and linked a new Twitter account to these RSS feeds.

Screenshot-twitterfeed.com : feed your blog to twitter - Mozilla Firefox

3) I then tweaked the layout and added #rstats before each post to the new R resource http://twitter.com/Rarchive

If you are a tweeter you can follow it here http://twitter.com/Rarchive and never miss any R news going forward.
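The feed-to-Twitter step above can be pictured with a small sketch. This is not how twitterfeed.com works internally; it only illustrates the idea of taking RSS items and putting #rstats in front of each one. The sample feed, the example.com links and the function name `tagged_posts` are made up for illustration, and the 140-character cap is Twitter's status limit at the time of writing.

```python
import xml.etree.ElementTree as ET

# A made-up two-item feed standing in for a real R blog's RSS.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example R blog</title>
    <item>
      <title>Plotting with lattice</title>
      <link>http://example.com/lattice</link>
    </item>
    <item>
      <title>Reading CSV files</title>
      <link>http://example.com/csv</link>
    </item>
  </channel>
</rss>"""


def tagged_posts(rss_xml, hashtag="#rstats", limit=140):
    """Return one status line per RSS item, each prefixed with the hashtag."""
    root = ET.fromstring(rss_xml)
    posts = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        link = (item.findtext("link") or "").strip()
        # Reserve room for the hashtag, the link and two separating spaces.
        room = limit - len(hashtag) - len(link) - 2
        if len(title) > room:
            title = title[: max(room - 3, 0)] + "..."
        posts.append(f"{hashtag} {title} {link}")
    return posts


if __name__ == "__main__":
    for line in tagged_posts(SAMPLE_RSS):
        print(line)
```

Putting the hashtag first means a long title gets shortened before the tag or the link is ever cut off.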

P.S. I also did the same for SAS at http://twitter.com/sascommunity

UPDATE

#rstats helps with SEO in Google, since Google uses Twitter search as well. The best existing R search engine is http://rseek.com
In any case it is too late to change now, since this is more like an automated firehose. You can now use #rstats as well as additional keywords to find more useful, searchable stuff.

NOTE

http://twitter.com/sas belongs to a guy who is wondering who is trying to hack his Twitter account; you can check the screenshot below.

Screenshot-Sky Sutton (SAS) on Twitter - Mozilla Firefox

Advertisements @ Decisionstats

Decision Stats will soon run advertisements to cover expenses.

It costs a flat fee of $XXX per month with a minimum 3-month contract, but I am looking at other advertisers.

Advertising will be on a first come, first served basis, but only non-competing advertisers are allowed.

Companies that do not support open source may not be welcome at all.

For exclusive advertising on the blog picture, with no other commercial advertisers, it will be $YYY a month with a minimum 3-month contract. For a longer exclusive contract, the price can be adjusted further.

Advertisements will be pictures of the same size as the images on www.decisionstats.com, and advertisers would need to supply a minimum of three separate graphics (as they rotate).

I would work with the creative team to ensure the size as well as the clarity of the pictures. Advertisements can also take the form of videos in the same space. Contact me at www.twitter.com/decisionstats to get special discounted rates on XXX and YYY above.

Credentials

Recommended by KDnuggets.

Twice winner of Contributor of the Month at AnalyticBridge
Once winner of Blogger of the Week at Smart Data Collective

Also see some numbers for the size of the Decisionstats community:

Twitter Followers: 1,052 (RSS)
LinkedIn Group Members: 794
Facebook Fans: 52
Google Analytics Monthly Count: 3,147 Visits, 5,565 Pageviews

Some charts, since transparent statistics is what this site is all about

NOTE FROM AJAY-

(As you can see, the Google AdSense ads on the right margin already seem to be quite relevant to the content :)) Google named it AD SENSE, but sometimes it is AD HOC SENSE.

Twitter Channel for SAS Users

Dear All,

Screenshot-SAS Language Central (sascommunity) on Twitter - Mozilla Firefox

I just created a Twitter channel for everything about the SAS language (independent of SAS Institute as of now).

That’s right: using RSS feeds, keyword filters, key bloggers AND tweeters, and http://www.twitterfeed.com, this is a firehose of information on everything SAS, including Google Search and Twitter Search. I am still modifying it before transitioning it; Scandinavian Airlines messes up some results, for example.

Anyways, if you like to follow tech tweets, this is one more:

http://twitter.com/sascommunity

PAW Blog Partner and 15 % off for you

Dear Readers,

If you plan to attend Predictive Analytics World (Oct 20-21) in Washington DC, here are the speakers (source):

Speakers Washington DC 2009:

Stephen L. Baker, Senior writer, BusinessWeek

Stephen L. Baker, author of The Numerati, is a senior writer at BusinessWeek, covering technology. Previously he was a Paris correspondent. Baker joined BusinessWeek in March, 1987, as manager of the Mexico City bureau, where he was responsible for covering Mexico and Latin America. He was named Pittsburgh bureau manager in 1992. Before BusinessWeek, Baker was a reporter for the El Paso Herald-Post. Prior to that, he was chief economic reporter for The Daily Journal in Caracas, Venezuela. Baker holds a bachelor’s degree from the University of Wisconsin and a master’s from the Columbia University Graduate School of Journalism. He blogs at TheNumerati.net and Blogspotting.net, and can be found on Twitter at @stevebaker.


John F. Elder, Ph.D., CEO and Founder, Elder Research, Inc.

Dr. John F. Elder heads a data mining consulting team with offices in Charlottesville, Virginia and Washington DC. Founded in 1995, Elder Research, Inc. focuses on scientific and commercial applications of pattern discovery and optimization, including stock selection, image recognition, text mining, biometrics, drug efficacy, credit scoring, cross-selling, investment timing, and fraud detection.

John obtained a BS and MEE in Electrical Engineering from Rice University, and a PhD in Systems Engineering from the University of Virginia, where he’s an adjunct professor, teaching Optimization or Data Mining. Prior to 13 years leading ERI, he spent 5 years in aerospace defense consulting, 4 heading research at an investment management firm, and 2 in Rice’s Computational & Applied Mathematics department.

Dr. Elder has authored innovative data mining tools, is active on Statistics, Engineering, and Finance conferences and boards, is a frequent keynote conference speaker, and is General Chair of the 2009 Knowledge Discovery and Data Mining conference in Paris. John’s courses on data analysis techniques – taught at dozens of universities, companies, and government labs – are noted for their clarity and effectiveness. Dr. Elder was honored to serve for 5 years on a panel appointed by the President to guide technology for National Security. His book on Practical Data Mining, with Bob Nisbet and Gary Miner, will appear in May 2009.


Usama Fayyad, Ph.D., CEO, Open Insights

Dr. Usama Fayyad was until recently Yahoo!’s Chief Data Officer and Executive Vice President of Research & Strategic Data Solutions where he was responsible for Yahoo!’s global data strategy, architecting Yahoo!’s data policies and systems, prioritizing data investments, and managing the Company’s data analytics and data processing infrastructure. Fayyad also founded and oversaw the Yahoo! Research organization with offices around the world. Yahoo! Research is building the premier scientific research organization to develop the new sciences of the Internet, on-line marketing, and innovative interactive applications.

Prior to joining Yahoo!, Fayyad co-founded and led the DMX Group, a data mining and data strategy consulting and technology company that was acquired by Yahoo! in 2004. In early 2000, he co-founded and served as CEO of Revenue Science, Inc.(digiMine, Inc.), a data analysis and data mining company that built, operated and hosted data warehouses and analytics for some of the world’s largest enterprises in online publishing, retail, manufacturing, telecommunications and financial services. The company today specializes in Behavioral Targeting and advertising networks. Fayyad’s professional experience also includes five years spent leading the data mining and exploration group at Microsoft Research and building the data mining products for Microsoft’s server division. From 1989 to 1996 Fayyad held a leadership role at NASA’s Jet Propulsion Laboratory (JPL), where his work in the analysis and exploration of scientific databases gathered from observatories, remote-sensing platforms and spacecraft garnered him the top research excellence award that Caltech awards to JPL scientists, as well as a U.S. Government medal from NASA.

Fayyad earned his Ph.D. in engineering from the University of Michigan, Ann Arbor (1991), and also holds BSE’s in both electrical and computer engineering (1984); MSE in computer science and engineering (1986); and M.Sc. in mathematics (1989). He has published over 100 technical articles in the fields of data mining and Artificial Intelligence, is a Fellow of the AAAI and a Fellow of the ACM, has edited two influential books on the data mining and launched and served as editor-in-chief of both the primary scientific journal in the field of data mining and the primary newsletter in the technical community published by the ACM: SIGKDD Explorations.


Eric Siegel, Ph.D., Conference Chair

The president of Prediction Impact, Inc., Eric Siegel is an expert in predictive analytics and data mining and a former computer science professor at Columbia University, where he won awards for teaching, including graduate-level courses in machine learning and intelligent systems – the academic terms for predictive analytics. After Columbia, Dr. Siegel co-founded two software companies for customer profiling and data mining, and then started Prediction Impact in 2003, providing predictive analytics services and training to mid-tier through Fortune 100 companies.

Dr. Siegel is the instructor of the acclaimed training program, Predictive Analytics for Business, Marketing and Web, and the online version, Predictive Analytics Applied. He has published 13 papers in data mining research and computer science education, has served on 10 conference program committees, and has chaired a AAAI Symposium held at MIT.

You can register at http://www.predictiveanalyticsworld.com/register.php

Here is the pricing

Pricing
Predictive Analytics World Fall 2009

Includes breakfasts, lunches, priceless networking during coffee breaks, the PAW Reception, and full access to program sessions and sponsor expositions.

Two Day Pass (Oct 20-21): $1190 Super Early Bird (till June 30), $1390 Early Bird (July 1 – Sept 4), $1590 Regular

Predictive Modeling Methods Workshop (Oct 22): $695 Super Early Bird, $795 Early Bird, $895 Regular

Putting Predictive Analytics to Work (Oct 19): $695 Super Early Bird, $795 Early Bird, $895 Regular

The discount code I can distribute to you readers is the following: BLOGDC09 (15% off a two-day pass). You can do the maths…
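For anyone who would rather not do the maths by hand, here is a minimal sketch of the arithmetic. It assumes the 15% comes straight off whichever two-day pass price applies on the day you book; the conference site may apply the code differently, so treat the output as an estimate.

```python
from decimal import Decimal

# Two-day pass prices from the pricing list above, in US dollars.
TWO_DAY_PASS = {
    "Super Early Bird": Decimal("1190"),
    "Early Bird": Decimal("1390"),
    "Regular": Decimal("1590"),
}
DISCOUNT = Decimal("0.15")  # BLOGDC09: 15% off a two-day pass


def discounted(price, discount=DISCOUNT):
    """Price after taking `discount` (a fraction of the price) off."""
    return (price * (1 - discount)).quantize(Decimal("0.01"))


if __name__ == "__main__":
    for tier, price in TWO_DAY_PASS.items():
        after = discounted(price)
        print(f"{tier}: ${price} -> ${after} (you save ${price - after})")
```

Under that assumption the two-day pass comes to $1011.50, $1181.50 or $1351.50 depending on the tier.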

(Ajay: Nope, I don’t get any money at all from these activities, whatever some people may have blasted me for, but I do hope to get some good karma. Have a good time and book now.)

Interview Karim Chine BIOCEP (Cloud Computing with R)

Here is an interview with Karim Chine of http://www.biocep.net/

Working with R or Scilab on clusters/grids/clouds becomes as simple as working with them locally.

Karim Chine, Biocep.

Ajay- Please describe your career in the field of science. What advice would you give to young science graduates in this recession?

Karim- My original background is in theoretical physics. I did my Master’s thesis at the Ecole Normale’s Statistical Physics Laboratory, where I worked on phase separation in two-dimensional additive mixtures with Dr Werner Krauth. I came to computer science after graduating from the Ecole Polytechnique, and I spent two years at TELECOM ParisTech studying software architecture and distributed systems design. I then worked for the IBM Paris Laboratory (VisualAge Pacbase applications’ generator), Schlumberger (Over the Air Platform and Web platform for smartcards personalization services), Air France (SSO deployment) and ILOG (OPL-CPLEX-ODM Development System). This gave me intense exposure to real-world, large-scale software design. I crossed the borders of cultural, technical and organizational domains several times, and I worked with a broad palette of technologies and with some of the best and most innovative engineers.

I moved to Cambridge in 2006 and worked for the European Bioinformatics Institute. It’s where I started dealing with the integration of R into various types of applications. I left the EBI in November 2007. I was looking for institutional support to help me bring into reality a vision that was becoming clearer and clearer: a universal platform for scientific and statistical computing. I failed to get that support, and I have been working on BIOCEP full time for most of the last 18 months without being funded. A few days of consultancy here and there allowed me to keep going. I spent several weeks at Imperial College, at the National Center for e-Social Sciences and at Berkeley’s department of statistics during that period. Those visits were extremely useful in refining the use cases of my platform. I am still looking for a partner to back the project.

You asked me to give advice. The only advice I would give is to be creative and to try again and again to do what you really want to do. Crises come and go, they always will, and extreme situations are part of life. I believe hard work and sincerity can prevail over anything.

Ajay- Describe BIOCEP’s scope and ambition. What operational analytics can users who have data do with it today?

Karim- My first ambition with BIOCEP is to deliver a universal platform for scientific and statistical computing and to create an open, federative and collaborative environment for the production, sharing and reuse of all the artifacts of computing. My second ambition is to dramatically enhance the accessibility of mathematical and statistical computing, to make HPC commonplace and to put new analytical, numerical and processing capabilities in the hands of everyone (open science).

The open source software conquest has gone very far. Environments like R or Scilab, technologies like Java, operating systems like Linux-Ubuntu, and tools like OpenOffice are being used by millions of people. Very little doubt remains about OSS’s final victory in some domains. The cloud is already a reality and it will take computing to a whole new realm. What is currently missing is the software that, by making the cloud’s usage seamless, will create new ecosystems and will provide room for creativity, innovation and knowledge discovery on an unprecedented scale.

BIOCEP is one more building block into this. BIOCEP is built on top of R and Scilab and anything that you can do within those environments is accessible through BIOCEP. Here is what you have uniquely with this new R/Scilab-based e-platform:

High productivity via the most advanced cross-platform workbench available for the R environment.

Advanced Graphics: with BIOCEP, a graphic transducer allows graphics produced on the server side to be rendered on the client side and enables advanced capabilities like zooming/unzooming/scrolling for R graphics. A client-side mouse tracker can dynamically display information that is related to the graphics and depends on the coordinates. Several virtual R devices showing different data can be coupled in zooming/scrolling, which helps in visually comparing complex graphics.

Extensibility with plug-ins: new views (IDE-like views, analytical interfaces…) can be created very easily either programmatically or via drag-and-drop GUI designers.

Extensibility with server-side extensions: any Java code can be packaged and used on the server side. The code can interact seamlessly with R and Scilab or provide generic bridges to other software. For example, I provide an extension that allows you to use OpenOffice as a universal converter between various file formats on the server side.

Seamless High Performance Computing: working with R or Scilab on clusters/grids/clouds becomes as simple as working with them locally. Distributed computing becomes seamless: creating a large number of remote R and Scilab engines and using them to solve large-scale problems becomes easier than ever. From the R console, the user can create logical links to existing R engines or to newly created ones and use those logical links to pilot the remote workers from within the R session. R functions use the logical links to import/export variables between the R session and the different workers. R commands/scripts can be executed by the R workers synchronously or asynchronously. Many logical R links can be aggregated into one logical cluster variable that can be used to pilot the R workers in a coordinated way. A cluster.apply function uses the logical cluster to apply a function to a big data structure by slicing it and sending elementary execution commands to the workers. The workers apply the user’s function to the slices in parallel. The elementary results are aggregated to compose the final result, which becomes available within the R session.

Collaboration: your R/Scilab server running in the cloud can be accessed simultaneously by you and your collaborators. Everything gets broadcast, including graphics. A spreadsheet lets you view and edit data collaboratively. Anyone can write plug-ins to take advantage of the collaborative capabilities of the frameworks. If your IP address is public, you can give a URL to anyone and let them connect to your locally running R.

Powerful frameworks for Java developers: BIOCEP provides frameworks and tools to use R as if it were an object-oriented Java toolkit or a web toolkit for R-based dynamic applications.

Web services for C#, Perl and Python users/developers: most of the capabilities of BIOCEP, including piloting R/Scilab engines on the cloud for distributed computing or for building scalable analytical web applications, are accessible from most programming languages thanks to the SOAP front-end.

RESTful API: simple URLs can perform computing using R/Scilab engines and return the result as XML or as graphics in any format. This works like Google Charts and has all the power of R, since the graphic is described with an R script provided as a parameter of the URL. The same API can be exposed on demand by the workbench. This allows, for example, a Cloud-R to be integrated with Excel or OpenOffice. The workbench works as a bridge between the cloud and those applications.

Advanced pooling framework for distributed resources: useful for deploying pools of R/Scilab engines on multi-node systems and having them used simultaneously by several distributed client processes in a scalable/optimal way. A supervision GUI is provided for user-friendly management of the pools/nodes/engines.

Simultaneous use of R and Scilab: Using java scripting, data can be transferred from R to Scilab and vice versa.
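(Ajay- For readers who want to picture the cluster.apply idea Karim describes above, slicing a big data structure, having workers apply the function to the slices in parallel and aggregating the pieces, here is a rough Python sketch. It is not BIOCEP code or its API: the names `slice_data` and `cluster_apply` are hypothetical, and local threads stand in for remote R workers.)

```python
from concurrent.futures import ThreadPoolExecutor


def slice_data(data, n_slices):
    """Split `data` into at most `n_slices` contiguous, nearly equal slices."""
    n_slices = max(1, min(n_slices, len(data)))
    size, extra = divmod(len(data), n_slices)
    slices, start = [], 0
    for i in range(n_slices):
        # The first `extra` slices take one additional element each.
        end = start + size + (1 if i < extra else 0)
        slices.append(data[start:end])
        start = end
    return slices


def cluster_apply(func, data, n_workers=4):
    """Apply `func` to every element of `data` using `n_workers` local workers."""
    slices = slice_data(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Each worker applies `func` to every element of its own slice.
        partials = pool.map(lambda chunk: [func(x) for x in chunk], slices)
        # pool.map yields results in input order, so the pieces line up again.
        return [y for part in partials for y in part]


if __name__ == "__main__":
    print(cluster_apply(lambda x: x * x, list(range(10)), n_workers=3))
```

The caller only sees the final aggregated list, which is the point of the design: the slicing and dispatching stay hidden behind one function call.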

Ajay- Could you tell us about a successful BIOCEP installation and what it led to? Can BIOCEP be used by the rest of the R community for other packages? What would be an ideal BIOCEP user /customer for whom cloud based analytics makes more sense ?

Karim- BIOCEP is still in the pre-beta stage. However, it is a robust and polished pre-beta that several organizations are already using. Janssen Pharmaceutica is using it to create and deliver statistical applications for drug discovery that use R engines running on their backend servers. The platform is seen there as the way to go for the ultimate optimization of some of their data analysis pipelines. Janssen’s head of statistics is said to be very interested in the capabilities BIOCEP gives statisticians to create their own analytical user interfaces and deliver them with their models without needing specific software development skills. Shell is creating prototypes of BIOCEP-based applications to explore the feasibility and advantages of migrating some of Shell’s applications to the cloud. One group from Shell Global Solutions is planning to use BIOCEP to run Scilab in the cloud for corrosion simulation modeling. Dr Ivo Dinov’s team at UCLA is studying the migration of some of the SOCR applications to the BIOCEP platform as plug-ins and extensions. Dr Ivo Dinov has also applied for an important grant for building DISCb (Distributed Infrastructure for Statistical Computing in Biomedicine). If the grant application is successful, BIOCEP will be the backbone, at the software architecture level, of that new infrastructure. In cooperation with the Institute of Biostatistics, Leibniz University of Hannover, Bernd Bischl and Kornelius Rohmeyer have developed a framework for user-friendly R GUIs of different complexity. The toolkit has used BIOCEP as an R backend since release 2.0. Several small projects have been implemented using this framework, and some are in production, such as an application for education in biostatistics at the University of Hannover. The ESNATS project is also planning to use the BIOCEP frameworks. Some development is being done at the EBI to customize the workbench and use it to give end users the possibility of running R and Bioconductor on the EBI’s LSF cluster.

I’ve been in touch with Phil Butcher, Sanger’s head of IT, and he is considering the deployment of BIOCEP on Sanger’s systems simultaneously with Eucalyptus. The same type of deployment has been discussed with the director of OMII-UK, Neil Chue Hong. BIOCEP’s deployment is probably going to follow the deployment of the Eucalyptus system on NGS. Tena Sakai deployed BIOCEP at the Ernest Gallo Clinic and Research Centre, and he is currently exploring the usage of R on the cloud via BIOCEP (Eucalyptus / AWS). The platform has been deployed by a small consultancy company specializing in R on the systems of several London-based investment banks. I have had a go-ahead from Nancy Wilkins-Diehr (Director for Science Gateways, SDSC) for deploying on TeraGrid, and a deployment on EGEE has been discussed with Dr Steven Newhouse (EGEE Technical Director). Both deployments are on standby at the moment.

Quest Diagnostics is planning to use BIOCEP extensively. Sudeep Talati (University of Manchester) is doing his Master’s project on BIOCEP. He is supervised by Professor Andy Brass and he is exploring the use of a BIOCEP-based infrastructure to deliver microarray analysis workflows in a simple and intuitive way to biologists with and without the Cloud. In Manchester, Robin Pinning (e-Science team leader, Research Computing Services) has the deployment of BIOCEP on Manchester’s research cluster on his agenda…

As I have said, anything that you can do with R including installing, loading and using any R package is accessible through BIOCEP. The platform aims to be universal and to become a tool for productivity and collaboration used by everyone dealing with computing/analytics with or without the cloud.

The Cloud, whether public or private, will become widespread, and everyone will become a cloud user in one way or another.

Ajay- What motivated you to build BIOCEP and mash up cloud computing and R? What scope do you see for cloud computing in developing countries in Asia and Africa?

Karim– When I was at the EBI, I worked on the integration of R within scalable web applications. I explored and tested the available frameworks and tools and all of them were too low level or too simple to answer the problem. I decided to build new frameworks. I had the opportunity to be able to stand on the shoulders of giants.

Simon Urbanek’s packages already bridged the C API of R with Java reliably. Martin Morgan’s RWebServices package defined class mappings between R types, including S4 classes, and Java.

Progressively R became usable as a Java object oriented toolkit, then as a Java Server. Then I built a pooling framework for distributed resources that made it possible for multiple clients to use multiple R engines optimally.
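
To make the pooling idea concrete, here is a minimal sketch in Python. It is not BIOCEP’s API: the class name `EnginePool`, its methods and the engine names are hypothetical, and plain strings stand in for R engines. It only illustrates the general technique of lending a fixed set of engines to clients and taking them back when the client is done.

```python
import queue
from contextlib import contextmanager


class EnginePool:
    """Lend a fixed set of engines to clients, one client per engine at a time."""

    def __init__(self, engines):
        # Idle engines wait in a thread-safe FIFO queue.
        self._idle = queue.Queue()
        for engine in engines:
            self._idle.put(engine)

    @contextmanager
    def borrow(self, timeout=None):
        # Block until an engine is idle; raises queue.Empty if the timeout expires.
        engine = self._idle.get(timeout=timeout)
        try:
            yield engine
        finally:
            # Always hand the engine back so other clients can use it.
            self._idle.put(engine)

    def idle_count(self):
        return self._idle.qsize()


# Hypothetical engine names; a real pool would hold handles to R engines.
pool = EnginePool(["r-engine-1", "r-engine-2"])
with pool.borrow() as engine:
    in_use = engine
    idle_while_borrowed = pool.idle_count()
idle_after_release = pool.idle_count()
```

Returning the engine in a `finally` clause means a client that fails halfway through a computation does not permanently remove an engine from the pool.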

I started building a GUI to validate the server’s increasingly sophisticated API. That GUI progressively became the workbench.

When I was at Imperial, I worked with the National Grid Service team at the Oxford e-Research Centre to deploy my platform on Oxford’s core cluster. That deployment led to many changes in the architecture to meet all the security requirements.

It was obvious that the next step was to make BIOCEP available on Amazon’s Cloud. Academic Grids are for researchers and the cloud is for everyone. Making the platform work seamlessly on EC2 took a few months. With the cloud came the focus on collaborative features (collaborative views, graphics, spreadsheets…).

I can only talk about the example of a country I know, Tunisia, and I guess some of this applies to Asian countries. Even though broadband is everywhere today and is becoming accessible and affordable for a majority of Tunisians, I am not sure that adoption of the cloud will happen soon.

Simple considerations like the obligation to pay for the compute cycles in dollars (and not in dinars) are a barrier to adoption. Spending foreign currencies is generally subject to several restrictions, for companies and for individuals; few Tunisians have credit cards that can be used to pay Amazon. Companies would prefer to buy and administer their own machines because the cost of operation and maintenance is lower in Tunisia than it is in Europe/US.

Even if the cloud would help in giving Tunisian researchers access to affordable Computing cycles on demand, it seems that most of them have learned to live without HPC resources and that their research is more theoretical and less computational than it could be. Others are collaborating with research groups in Europe (France) and they are using those European groups’ infrastructures.

Ajay- How would BIOCEP address the problem of data hygiene, data security and privacy? Are encrypted and compressed data transfers supported or planned?

Karim- With BIOCEP, a computational engine is exposed as a distributed component via a single mono-directional HTTP port. When you run such an engine on an EC2 instance you have two options:

  • 1/ Totally sandbox the machine (via the security group) and leave only the SSH port open. Private key authentication is required to access the machine. In this case you use an SSH tunnel (created with a tool like PuTTY, for example), which lets you see the engine as if it were running on your local machine on a port of your choice, the one specified when creating the tunnel. When you start the Virtual Workbench and connect in HTTP mode to your local host via the specified port, you are effectively connecting to the EC2 R engine. 100% of the information exchanged between your workbench and the engine, including your data, is encrypted thanks to the SSH tunnel. The Virtual Workbench embeds JSch and can create the tunnel for you automatically. This mode doesn’t allow collaboration since it requires the private key to let the workbench talk to the EC2 R/Scilab engine.
  • 2/ Tell the EC2 machine at startup (via the “user data”) to require specific credentials from the user. When the machine starts running, the user needs to provide those credentials to get a session ID and to be able to pilot a virtual EC2 R/Scilab engine. This mode enables collaboration. The client (workbench/scripts) connects to the EC2 machine instance via HTTP (this will become HTTPS in the near future).
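
For the first option, the tunnel can also be opened by hand with a standard OpenSSH client instead of PuTTY. The Python sketch below only builds the command line. The key file, user name, host name and port numbers are all hypothetical placeholders, and the port the engine actually listens on is an assumption to be replaced with your own setting.

```python
import shlex


def build_tunnel_command(key_path, user, host, local_port, engine_port):
    """Return the argv for an OpenSSH tunnel from localhost:local_port to the engine."""
    return [
        "ssh",
        "-i", key_path,  # private key authentication
        "-N",            # run no remote command, only forward the port
        "-L", f"{local_port}:localhost:{engine_port}",  # local port forwarding
        f"{user}@{host}",
    ]


# All values below are hypothetical placeholders.
command = build_tunnel_command(
    key_path="my-ec2-key.pem",
    user="ec2-user",
    host="ec2-host.example.com",
    local_port=8080,
    engine_port=8080,
)
print(shlex.join(command))
```

Passing the list to `subprocess.Popen(command)` would open the tunnel. The workbench is then pointed at the local host on the chosen local port, and everything it exchanges with the engine travels through the encrypted SSH connection.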

Ajay- Suppose I have 20 GB of data per month and my organization has decided to cut back on expensive annually licensed software. How can the current version of BIOCEP help me do the following?

Karim– Ways BIOCEP can help you right now.

1) Data aggregation and Reporting in terms of spreadsheet, presentation and graphs

  • BIOCEP provides a highly programmable server side spreadsheet.
  • It can be used interactively as a view of the workbench, and simple clicks allow the transfer of data from cells to R variables and vice versa. It can be created and populated from R (console / scripts).
  • Any R function can be used within dynamically computed cells. The evaluation of those dynamic cells is done on server side and can use high performance computing functions. Macros allow adding reactivity to the spreadsheets.
  • A macro allows the user to execute any R code in response to a change in the value of an R variable or in the content of a range within a spreadsheet. Variable-docking macros allow the mirroring of R variables of any type (vectors, matrices, data frames…) with ranges within the spreadsheet in read/write mode.

Several ready-to-use user interface components can be created and docked anywhere within the spreadsheet. Those components include

  • an R graphics viewer (PDF viewer) showing graphics produced by a user-defined R script and reacting to changes in user-defined variables and cell ranges,
  • customizable sliders mirroring R variables,
  • buttons executing user-defined R code when pressed,
  • combo boxes mirroring factor variables.

The spreadsheet-based analytical user interface can pilot an R engine running at any location (local R, Grid R, Cloud R…). It can be created in minutes just by pointing, clicking and copying/pasting.
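
The macro mechanism described above, where code runs in response to a change in a cell range, can be illustrated with a small observer-style sketch in Python. This is not BIOCEP code: the `ReactiveSheet` class and its methods are hypothetical, and a plain Python function stands in for the R code a real macro would execute on the server.

```python
class ReactiveSheet:
    """A tiny cell store that runs registered macros when a watched cell changes."""

    def __init__(self):
        self._cells = {}
        self._macros = []  # list of (set of watched cell names, callback)

    def on_change(self, watched, callback):
        # Register a macro to run whenever one of the watched cells changes.
        self._macros.append((set(watched), callback))

    def get(self, name, default=None):
        return self._cells.get(name, default)

    def set(self, name, value):
        if self._cells.get(name) == value:
            return  # value unchanged, so no macro fires
        self._cells[name] = value
        for watched, callback in self._macros:
            if name in watched:
                callback(self)


def update_total(sheet):
    # Stand-in for user-defined R code: keep C1 equal to A1 + B1.
    sheet.set("C1", sheet.get("A1", 0) + sheet.get("B1", 0))


sheet = ReactiveSheet()
sheet.on_change(["A1", "B1"], update_total)
sheet.set("A1", 2)
sheet.set("B1", 5)
total = sheet.get("C1")
```

A docked slider or combo box fits the same pattern: the component writes its new value into a cell or variable, and every macro watching that name is re-run.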

Cell contents, macros and reactive docked components can be saved in a zip file and become a Workbench plug-in. Like all BIOCEP plug-ins, the spreadsheet-based GUI can be delivered to the end user via a simple URL. It can use a cloud R or a local R created transparently on the user’s machine.

2) Build time series models, regression models

BIOCEP’s workbench is extensible and I am hoping that contributors will soon start writing plug-ins or converting available GUIs to BIOCEP plug-ins in order to make the creation of those models as easy as possible.

Biography-

Karim Chine
Karim Chine graduated from the French Ecole Polytechnique and TELECOM ParisTech. He worked at Ecole Normale Supérieure-LPS (phase separation in two-dimensional additive mixture), IBM (VisualAge Pacbase), Schlumberger (Over the Air Platform and Web platform for smartcards personalization services), Air France (SSO deployment), ILOG (OPL-CPLEX-ODM Development System), European Bioinformatics Institute (Expression Profiler, Biocep) and Imperial College London-Internet Center (Biocep). He contributed to open source software (AdaBroker) and he is the author of the Biocep platform. He currently works on the seamless integration of the new platform within utility computing infrastructures (Amazon EC2), its deployment on Grids (NGS) and its use as a tool for education. He is also trying to build collaborations with academic and industrial partners.

You can view his resume here http://www.biocep.net/scan/CV_Karim_Chine_June_2009.pdf