Here comes PySpread- 85,899,345 rows and 14,316,555 columns


What's new: one more open source analytics package, built like a spreadsheet with the ability to import a million cells.

From http://pyspread.sourceforge.net/index.html

About

Pyspread is a cross-platform Python spreadsheet application. It is based on and written in the programming language Python.

Instead of spreadsheet formulas, Python expressions are entered into the spreadsheet cells. Each expression returns a Python object that can be accessed from other cells. These objects can represent anything including lists or matrices.

Features

In pyspread, cells expect Python expressions and return Python objects. Therefore, complex data types such as lists, trees or matrices can be handled within a single cell. Macros can be used for functions that are too complex for a single expression.

Since Python modules can be easily used without external scripts, arbitrary size rational numbers (via gmpy), fixed point decimal numbers for business calculations (via the decimal module from the standard library) and advanced statistics including plotting functions (via RPy) can be used in the spreadsheet. Everything is directly available from each cell. Just use the grid.
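To give a feel for what such a cell expression might look like, here is a minimal sketch using only the standard library. The `decimal` module is the one named above; `fractions.Fraction` is used here as a stand-in for gmpy's rationals (an assumption, so the example runs without gmpy installed).

```python
from decimal import Decimal
from fractions import Fraction

# Binary floats pick up rounding error when adding amounts like 0.10.
float_total = sum([0.1] * 3)

# Decimal keeps fixed point amounts exact, which suits business calculations.
decimal_total = sum([Decimal("0.10")] * 3, Decimal("0"))

# Fraction (stand-in for gmpy rationals) keeps exact arbitrary size ratios.
ratio = Fraction(1, 3) + Fraction(1, 6)

print(float_total == 0.3)   # the float sum is not exactly 0.3
print(decimal_total)        # exact decimal result
print(ratio)                # exact rational result
```

Any of the three right-hand sides could be typed into a cell as a single expression, provided the module is importable in the spreadsheet.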

Data can be imported and exported using csv files or the clipboard. Other forms of data exchange are possible using external Python modules.

In order to simplify sparse matrix editing, pyspread features a three-dimensional grid that can be sized up to 85,899,345 rows and 14,316,555 columns (64-bit systems, depends on row height and column width). Note that importing a million cells requires about 500 MB of memory.
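A quick back-of-the-envelope check shows why the grid has to be sparse. This sketch assumes 1 MB means 10**6 bytes and that memory use scales linearly with the number of filled cells; neither is stated by the pyspread site.

```python
ROWS = 85_899_345
COLS = 14_316_555

# Number of addressable cells in one table of the maximum size.
cells_per_table = ROWS * COLS

# "A million cells requires about 500 MB" implies roughly this many bytes per cell.
bytes_per_cell = 500 * 10**6 / 1_000_000

# Memory a completely filled table would need under the linear assumption.
full_table_terabytes = cells_per_table * bytes_per_cell / 10**12

print(cells_per_table)
print(bytes_per_cell)
print(round(full_table_terabytes))
```

At about 500 bytes per cell, a fully populated table is far beyond any machine's memory, so the large limits are only usable for sparsely filled sheets.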

The concept of pyspread allows doing everything from each cell that a Python script can do. This may very well include deleting your hard drive or sending your data via the Internet. Of course this is a non-issue if you sandbox properly or if you only use self developed spreadsheets. Since this is not the case for everyone (see the discussion at lwn.net), a GPG signature based trust model for spreadsheet files has been introduced. It ensures that only your own trusted files are executed on loading. Untrusted files are displayed in safe mode. You can trust a file manually. Inspect carefully.
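The trust check can be sketched as follows. This is not pyspread's implementation: pyspread uses GPG signatures, whereas this sketch swaps in an HMAC from the standard library so that it runs without a GPG setup. The key, the function names and the file contents are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; pyspread relies on the user's GPG key instead.
SECRET_KEY = b"my-private-key"


def sign(content: bytes) -> str:
    """Return a signature for the file content (HMAC stand-in for GPG)."""
    return hmac.new(SECRET_KEY, content, hashlib.sha256).hexdigest()


def open_mode(content: bytes, signature) -> str:
    """Decide whether cell code may run when the file is loaded."""
    if signature is not None and hmac.compare_digest(sign(content), signature):
        return "execute"
    # Unsigned or modified files are only displayed, never evaluated.
    return "safe mode"


own_file = b"S[0, 0, 0] = '1 + 1'"
own_signature = sign(own_file)
tampered_file = own_file + b"  # changed by someone else"

print(open_mode(own_file, own_signature))
print(open_mode(tampered_file, own_signature))
print(open_mode(own_file, None))
```

The point of the design is that a file's code runs automatically only if the content still matches a signature made with your own key.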


Requirements

Pyspread runs on Linux, Windows and *nix platforms with GTK+ support. There are reports that it works with Mac OS X as well. If you would like to contribute by testing on OS X please contact me.

Dependencies

Highly recommended for full functionality

  • PyMe >=0.8.1. Note for Windows™ users: if you want to use signatures without compiling PyMe, try out Gpg4win.
  • gmpy >=1.1.0
  • rpy >=1.0.3

Maturity

Pyspread is in early beta release. This means that the core functionality is fully implemented but the program needs testing and polish.

And from the wiki:

http://sourceforge.net/apps/mediawiki/pyspread/index.php?title=Main_Page

a spreadsheet with more powerful functions and data structures that are accessible inside each cell. Something like Python that empowers you to do things quickly. And yes, it should be free and it should run on Linux as well as on Windows. I looked around and found nothing that suited me. Therefore, I started pyspread.

Concept

  • Each cell accepts any input that works in a Python command line.
  • The inputs are parsed and evaluated by Python’s eval command.
  • The result objects are accessible via a 3D numpy object array.
  • String representations of the result objects are displayed in the cells.
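The four points above can be sketched in a few lines. This is an illustration, not pyspread's code: a plain dict keyed by (row, column, table) stands in for the 3D numpy object array, and the class name `Grid` and the variable name `S` are assumptions made for this example.

```python
class Grid:
    """Toy grid: stores expression text per cell and evaluates it on access."""

    def __init__(self):
        # (row, column, table) -> expression source text
        self.code = {}

    def __setitem__(self, key, expression):
        self.code[key] = expression

    def __getitem__(self, key):
        # The cell text is evaluated with Python's eval; the name S lets an
        # expression read the result objects of other cells.
        return eval(self.code[key], {"S": self})

    def display(self, key):
        # What the cell shows is the string representation of the result.
        return str(self[key])


S = Grid()
S[0, 0, 0] = "[1, 2, 3]"
S[1, 0, 0] = "sum(x * x for x in S[0, 0, 0])"
S[2, 0, 0] = "{'total': S[1, 0, 0]}"

print(S[0, 0, 0])          # a real list object, not text
print(S[1, 0, 0])          # computed from another cell with a generator expression
print(S.display((2, 0, 0)))
```

Because `eval` runs whatever the cell contains, a sketch like this also shows why opening spreadsheets from untrusted sources is risky.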

Benefits

  • Each cell returns a Python object. This object can be anything including arrays and third party library objects.
  • Generator expressions can be used efficiently for data manipulation.
  • Efficient numpy slicing is used.
  • numpy methods are accessible for the data.

Installation

  1. Download the pyspread tarball or zip and unzip it at a convenient place.
  2. In case you do not have them already, get and install Python, wxPython and numpy. If you want the examples to work, install gmpy, R and rpy. Really do check the version requirements that are mentioned on http://pyspread.sf.net
  3. Get install privileges (e.g. become root).
  4. Change into the directory and type
python setup.py install
Windows: Replace “python” with your Python interpreter (absolute path).
  5. Become normal user again.
  6. Start pyspread by typing
pyspread
  7. Enjoy.
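Before running the installer it can help to check which of the dependencies Python can already find. The sketch below is not part of pyspread; the import names (`wx` for wxPython, `pyme` for PyMe, and so on) are the usual ones for these packages but should be treated as assumptions, and the script does not check version numbers.

```python
from importlib.util import find_spec

# Import name -> role, following the requirement lists above.
DEPENDENCIES = {
    "wx": "required (wxPython)",
    "numpy": "required",
    "pyme": "recommended (signatures)",
    "gmpy": "recommended (rational numbers)",
    "rpy": "recommended (R statistics)",
}


def check(modules):
    """Return {module name: True if Python can locate it, else False}."""
    return {name: find_spec(name) is not None for name in modules}


status = check(DEPENDENCIES)
for name, found in status.items():
    print(f"{name:6} {'found' if found else 'MISSING':8} {DEPENDENCIES[name]}")
```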


Next on the spreadsheet wishlist:

an MSI bundle or Windows self-installer that has all dependencies bundled in, plus linking to PostgreSQL 😉 etc.

Way to go, Mr Martin Manns

mmanns < at > gmx < dot > net

LibreOffice News and Google Musings


Official Bloggers on LibreOffice- http://planet.documentfoundation.org/

Note: for some strange reason I continue to be among the top ranked LibreOffice blogs, maybe because I write more on the software itself than on Oracle politics or coffee spillovers.

LibreOffice Beta 2 is ready and I just installed it on Windows 7. It works nicely, and I somehow think OpenOffice and Google need an example to stop being so scary with the cautioning: hey, hey, it's a beta (do you see Oracle saying this release is a beta, or Windows saying hey, this Windows Vista is a beta for Windows 7? No, right?).

See the screenshot of the solver in a LibreOffice spreadsheet: it works just fine.

We can't wait for Chromium OS and LibreOffice integration (or Google Docs-LibreOffice integration), so Google starts thinking on those lines (of course

Google also needs to ramp up Google Storage and Google Predict API, but dude, are you sure you wanna take on Amazon, Oracle and MS and Yahoo and Apple at the same time? Dear Herr Schmidt: the last German guy who did that ended up in a bunker in Berlin. (Ever since I had to pay 50 euros as an airline transit fee, which Indian passport holders have to do in Germany, I am kind of non-objective on that issue.)

Google management is busy nowadays thinking of trying to beat Facebook (hint, hint: buy out the biggest makers of Facebook apps, create an API for downloading Facebook info and uploading it into Orkut, and maybe invest like an angel in that startup called Diaspora, http://www.joindiaspora.com/).

Back to the topic (and there are enough people blogging on what Google should or shouldn't do).

-LibreOffice aesthetically rocks! It has a cool feel.

More news- The Wiki is up and awaits you at http://wiki.documentfoundation.org/Documentation

And there is a general pow-wow scheduled at http://www.oookwv.de/ for the Open Office Congress (Kongress)

As you can see, I used the Chrome extension for Google Translate for an instant translation from German into English (though it still needs some work, Herr Translator).

Back to actually working on LibreOffice: if Word and PowerPoint are all you do, save some money for Christmas and download it today from

LibreOffice Beta 2 (Office Fork off Oracle) launches!

 


 

Announcement from Code Ninjas at Document Foundation

10 years after the StarOffice code has been opened as OpenOffice.org, The Document Foundation is proud to announce the availability of LibreOffice Beta 2 for public testing. Please, download the suitable package(s) from

http://www.documentfoundation.org/download/

 

Ajay- Note that the first beta was downloaded almost 100,000 times.

install them, and start testing! Should you find bugs, please report them to the FreeDesktop Bugzilla:

https://bugs.freedesktop.org

If you want to get involved in this exciting project, you can contribute code:

http://www.documentfoundation.org/develop/

translate LibreOffice to your language:

http://www.freedesktop.org/wiki/Software/LibreOffice/i18n/translating_3.3

or just donate:

http://www.documentfoundation.org/contribution/

A list of known issues with Beta 2 is available in our wiki:

http://wiki.documentfoundation.org/Beta2

Beta Release Notes

This beta release is not intended for production use!

There are a number of known issues being worked on:

  • The Windows build is an International build – you can choose the user interface language that is suitable for you, but the help is always English. We are currently working on improving the delivery mechanism to be able to provide you with the localized help. We are also working on smaller problems like wrong description of several languages.
  • The Linux and MacOSX builds are English builds with the possibility to install language packs. Please browse the archives to get the language pack you need for your platform.
  • The LibreOffice branding and renaming is new and work in progress. You may still see old graphics, icons or websites. So please bear with us. This also applies to the BrOffice.org branding – applicable in Brazil.
  • Filters for the legacy StarOffice binary formats are missing.

I tested it- it seems okay enough. Once again Open Source tends to underplay expectations (when was the last time you saw that in enterprise software?)

Ubuntu One goes musical

Heavenly choirs singing? Not quite, but music streaming on a cloudy platform seems like a pretty cool thing.

Read http://voices.canonical.com/ubuntuone/?p=617:

Ubuntu One Basic – available now
This is the same as the current free 2 GB option but with a new name. Users can continue to sync files, contacts, bookmarks and notes for free as part of our basic service and access the integrated Ubuntu One Music Store. We are also extending our platform support to include a Windows client, which will be available in Beta very soon.

Ubuntu One Mobile – available October 7th
Ubuntu One Mobile is our first example of a service that helps you do more with the content stored in your personal cloud. With Ubuntu One Mobile’s main feature – mobile music streaming – users can listen to any MP3 songs in their personal cloud (any owned MP3s, not just those purchased from the Ubuntu One Music Store) using our custom developed apps for iPhone and Android (coming soon to their respective marketplaces). These will be open source and available from Launchpad. Ubuntu One Mobile will also include the mobile contacts sync feature that was launched in Beta for the 10.04 release.

Ubuntu One Mobile is available for $3.99 (USD) per month or $39.99 (USD) per year. Users interested in this add-on can try the service free for 30 days. Ubuntu One Mobile will be the perfect companion to your morning exercise, daily commute, and weekend at the beach – we’re really excited to bring you this service!

Ubuntu One 20-Packs – available now
A 20-Pack is 20 GB of storage for files, contacts, notes, and bookmarks. Users will be able to add multiple 20-Packs at $2.99 (USD) per month or $29.99 (USD) per year each. If you start with Ubuntu One Basic (2 GB) and add 1 20-Pack (20 GB), you will have 22 GB of storage.

All add-ons are available for purchase in multiple currencies – USD, EUR and, recently added, GBP.

Users currently paying for the old 50 GB plan (including mobile contacts sync) can either keep their existing service or switch to the new plans structure to get more value from Ubuntu One at a lower price.
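The 20-Pack arithmetic from the announcement can be worked through in a few lines. This sketch uses only the prices and sizes quoted above; the function and key names are made up for illustration, and it ignores the separate Ubuntu One Mobile add-on.

```python
from decimal import Decimal

BASIC_GB = 2                      # free Ubuntu One Basic storage
PACK_GB = 20                      # storage added by each 20-Pack
PACK_MONTHLY = Decimal("2.99")    # USD per pack per month
PACK_YEARLY = Decimal("29.99")    # USD per pack per year


def plan(packs):
    """Total storage and yearly cost for a given number of 20-Packs."""
    monthly_billing = PACK_MONTHLY * 12 * packs
    yearly_billing = PACK_YEARLY * packs
    return {
        "storage_gb": BASIC_GB + PACK_GB * packs,
        "per_year_if_billed_monthly": monthly_billing,
        "per_year_if_billed_yearly": yearly_billing,
        "saving_with_yearly_billing": monthly_billing - yearly_billing,
    }


one_pack = plan(1)
print(one_pack["storage_gb"])                  # 22, as in the announcement
print(one_pack["saving_with_yearly_billing"])  # difference between the two billing options
```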

Oracle Open World/ RODM package

From the press release, here comes Oracle Open World. They really have an excellent rock concert in that as well.

.NET and Windows @ Oracle Develop and Oracle OpenWorld 2010

Oracle Develop will again feature a .NET track for Oracle developers. Oracle Develop is suited for all levels of .NET developers, from beginner to advanced. It covers introductory Oracle .NET material, new features, deep dive application tuning, and includes three hours of hands-on labs to apply what you learned from the sessions.

To register, go to Oracle Develop registration site.

Oracle OpenWorld will include several sessions on using the Oracle Database on Windows and .NET.

Session schedules and locations for Windows and .NET sessions at Oracle Develop and OpenWorld are now available.

Download: 32-bit ODAC 11.2.0.1.2 for Visual Studio 2010 and .NET Framework 4

With ODAC 11.2.0.1.2, developers can connect to Oracle Database versions 9.2 and higher from Visual Studio 2010 and .NET Framework 4. ODAC components support the full framework, as well as the new .NET Framework Client Profile.

Statement of Direction: Oracle Database and Microsoft Entity Framework

Learn about Oracle’s beta and production plans to support Microsoft Entity Framework with Oracle Database.

Also see http://www.oracle.com/technetwork/articles/datawarehouse/saternos-r-161569.html

for

Data Mining Using the RODM Package

By Casimir Saternos

Some excerpts-

Open R and enter the following command.

> library(RODM)

This command loads the RODM library as well as the dependent RODBC package. The next step is to make a database connection.

> DB <- RODM_open_dbms_connection(dsn="orcl", uid="dm", pwd="dm")

Subsequent commands use the DB object (an instance of the RODBC class) to connect to the database. The DSN specified in the command is the name you used earlier for the Data Source Name during the ODBC connection configuration. You can view the actual R code being executed by the command by simply typing the function name (without parentheses).

> RODM_open_dbms_connection

And, say, making a model in Oracle and R:

> numrows <- length(orange_data[,1])
> orange_data.rows <- length(orange_data[,1])
> orange_data.id <- matrix(seq(1, orange_data.rows), nrow=orange_data.rows, ncol=1, dimnames=list(NULL, c("CASE_ID")))
> orange_data <- cbind(orange_data.id, orange_data)

This adjustment to the data frame then needs to be propagated to the database. You can confirm the change using the sqlColumns function, as listed earlier.

> RODM_create_dbms_table(DB, "orange_data")
> sqlColumns(DB, 'orange_data')$COLUMN_NAME

> glm <- RODM_create_glm_model(
database = DB,
data_table_name = "orange_data",
case_id_column_name = "CASE_ID",
target_column_name = "circumference",
model_name = "GLM_MODEL",
mining_function = "regression")

Information about this model can then be obtained by analyzing the value returned from the model and stored in the variable named glm.

> glm$model.model_settings
> glm$glm.globals
> glm$glm.coefficients

Once you have a model, you can apply the model to a new set of data. To begin, create or retrieve sample data in the same format as the training data.

> query<-('select 999 case_id, 1 tree, 120 age, 
32 circumference from dual')

> orange_test<-sqlQuery(DB, query)
> RODM_create_dbms_table(DB, "orange_test")

Finally, the model can be applied to the new data set and the results analyzed.

> results <- RODM_apply_model(database = DB, 
data_table_name = "orange_test",
model_name = "GLM_MODEL",
supplemental_cols = "circumference")

When your session is complete, you can clean up objects that were created (if you like) and you should close the database connection:

> RODM_drop_model(database=DB,'GLM_MODEL')
> RODM_drop_dbms_table(DB, "orange_test")
> RODM_drop_dbms_table(DB, "orange_data")
> RODM_close_dbms_connection(DB)

See the full article at http://www.oracle.com/technetwork/articles/datawarehouse/saternos-r-161569.html

Q&A with David Smith, Revolution Analytics.

Here’s a group of questions that David Smith of Revolution Analytics was kind enough to answer after the launch of RevoScaleR, the new R package which integrates Hadoop and R.

Ajay- How does RevoScaleR work from a technical viewpoint in terms of Hadoop integration?

David- The point isn’t that there’s a deep technical integration between Revolution R and Hadoop, rather that we see them as complementary (not competing) technologies. Hadoop is amazing at reliably (if slowly) processing huge volumes of distributed data; the RevoScaleR package complements Hadoop by providing statistical algorithms to analyze the data processed by Hadoop. The analogy I use is to compare a freight train with a race car: use Hadoop to slog through a distributed data set and use Map/Reduce to output an aggregated, rectangular data file; then use RevoScaleR to perform statistical analysis on the processed data (and use the speed of RevoScaleR to iterate through many model options to find the best one).

Ajay- How is it different from MapReduce and Rhipe, the existing R Hadoop packages?

David- They’re complementary. In fact, we’ll be publishing a white paper soon by Saptarshi Guha, author of the Rhipe R/Hadoop integration, showing how he uses Hadoop to process vast volumes of packet-level VOIP data to identify call time/duration from the packets, and then do a regression on the table of calls using RevoScaleR. There’s a little more detail in this blog post: http://blog.revolutionanalytics.com/2010/08/announcing-big-data-for-revolution-r.html

Ajay- Is it going to be proprietary, free or licensable (open source)?

David- RevoScaleR is a proprietary package, available to paid subscribers (or free to academics) with Revolution R Enterprise. (If you haven’t seen it, you might be interested in this Q&A I did with Matt Shotwell: http://biostatmatt.com/archives/533 )

Ajay- Are there any existing client case studies for terabyte-level analysis using R?

David- The VOIP example above gets close, but most of the case studies we’ve seen in beta testing have been in the 10’s to 100’s of GB range. We’ve tested RevoScaleR on larger data sets internally, but we’re eager to hear about real-life use cases in the terabyte range.

Ajay- How can I use RevoScaleR on my dual chip Win Intel laptop for, say, 5 GB of data?

David- One of the great things about RevoScaleR is that it’s designed to work on commodity hardware like a dual-core laptop. You won’t be constrained by the limited RAM available, and the parallel processing algorithms will make use of all cores available to speed up the analysis even further. There’s an example in this white paper (http://info.revolutionanalytics.com/bigdata.html) of doing linear regression on 13 GB of data on a simple dual-core laptop in less than 5 seconds.

Ajay- Thanks to David Smith for this fast response, and congratulations to him, Saptarshi Guha, Dr Norman Nie and the rest of the guys at Revolution Analytics on this new product launch.
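The two-stage workflow David describes (aggregate raw records with Map/Reduce into a rectangular table, then fit a model on that table) can be sketched in plain Python. This is only an illustration of the idea: it uses neither Hadoop, Rhipe nor RevoScaleR, and the packet records and function names are invented for the example.

```python
from collections import defaultdict

# Hypothetical packet-level records: (call id, timestamp in seconds).
packets = [
    ("a", 0.0), ("a", 30.0),
    ("b", 5.0), ("b", 65.0),
    ("c", 10.0), ("c", 100.0),
]


def map_packet(packet):
    """Map step: emit a (key, value) pair for each packet."""
    call_id, timestamp = packet
    return call_id, timestamp


def reduce_call(timestamps):
    """Reduce step: one row per call, as (start time, duration)."""
    return min(timestamps), max(timestamps) - min(timestamps)


# Shuffle: group the mapped values by key, then reduce each group.
grouped = defaultdict(list)
for key, value in map(map_packet, packets):
    grouped[key].append(value)
calls = [reduce_call(values) for values in grouped.values()]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Analysis stage: regress call duration on call start time.
slope, intercept = fit_line(calls)
print(calls)
print(slope, intercept)
```

The first stage is the slow, bulk part that reduces many packets to a small table of calls; the second stage is cheap enough to rerun with many model variations.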

Google Web Intelligence (Beta)

Here is a screenshot of the kind of insights that can be created by the new Intelligence features in the free Google Analytics.

It can be used on regular websites as well as technical support websites to help create customer segments based on the behavior of visitors.


Avg. Time on Site

  • Total Traffic: 00:03:23, 81% (expected: 00:01:23-00:01:59)
  • Landing Page: / (36 Visits, 15.3% of total): 00:06:29, 180% (expected: 00:01:59-00:02:48)

Bounce Rate

  • Total Traffic: 51.49%, 29% (expected: 71.50%-73.53%)
  • Visitor Type: New Visitor (163 Visits, 69.4% of total): 49.69%, 30% (expected: 67.03%-73.71%)
  • Country/Territory: United States (124 Visits, 52.8% of total): 53.23%, 27% (expected: 68.88%-76.93%)
  • Visitor Type: Returning Visitor (72 Visits, 30.6% of total): 55.56%, 26% (expected: 70.66%-79.68%)

Pageviews

  • Total Traffic: 578, 162% (expected: 199-221)
  • Country/Territory: United States (124 Visits, 52.8% of total): 333, 233% (expected: 95-108)
  • Visitor Type: New Visitor (163 Visits, 69.4% of total): 428, 178% (expected: 136-170)
  • Medium: referral (93 Visits, 39.6% of total): 213, 168% (expected: 70-84)
  • Source: google (56 Visits, 23.8% of total): 116, 86% (expected: 61-87)
  • Visitor Type: Returning Visitor (72 Visits, 30.6% of total): 150, 122% (expected: 62-76)

Visitors

  • Total Traffic: 201, 74% (expected: 111-120)

Visits

  • Total Traffic: 235, 97% (expected: 112-124)
  • Country/Territory: United States (124 Visits, 52.8% of total): 124, 112% (expected: 0-58)
  • Source: (direct) (75 Visits, 31.9% of total): 75, 115% (expected: 0-41)
  • Visitor Type: New Visitor (163 Visits, 69.4% of total): 163, 95% (expected: 0-85)
  • Medium: referral (93 Visits, 39.6% of total): 93, 144% (expected: 0-41)
  • Visitor Type: Returning Visitor (72 Visits, 30.6% of total): 72, 76% (expected: 0-43)
  • Source: linkedin.com (51 Visits, 21.7% of total): 51, 107% (expected: 0-25)
  • Referral Path: linkedin.com/news (48 Visits, 20.4% of total): 48, 98% (expected: 0-26)
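Every entry in the report pairs an observed value with an expected range. A minimal sketch of that idea follows, assuming that an alert simply means the observed value lies outside the expected range. How Google derives the range, the percentage and the significance is not stated in the report, so none of that is modelled here, and the function names are made up.

```python
def to_seconds(hms):
    """Convert an 'HH:MM:SS' duration, as shown in the report, to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def is_alert(observed, low, high):
    """Flag a metric whose observed value falls outside the expected range."""
    return not (low <= observed <= high)


# Avg. Time on Site, Total Traffic: 00:03:23 against 00:01:23-00:01:59.
time_alert = is_alert(
    to_seconds("00:03:23"), to_seconds("00:01:23"), to_seconds("00:01:59")
)

# Pageviews, Total Traffic: 578 against an expected 199-221.
pageview_alert = is_alert(578, 199, 221)

print(time_alert, pageview_alert)
```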