Tag Archives: sql
Databases in the cloud
One more day of me mucking around with MySQL and Amazon (hoping to get to R)
Related articles
- Copying MySQL Databases to Another Machine
http://dev.mysql.com/doc/refman/5.0/en/copying-databases.html
- Announcing Rackspace MySQL Cloud Database Private Beta (rackspace.com)
- Amazon Web Services adds free version of database in the cloud (techworld.com.au)
Data Frame in Python
Exploring some Python and R packages to move between, and work with, both Python and R without melting your brain or exceeding your project deadline
—————————————
If you liked the data.frame structure in R, there are several ways to work with a similar structure in Python, often at a faster processing speed.
Here are three packages that enable you to do so:
(1) pydataframe
http://code.google.com/p/pydataframe/
An implementation of an almost R-like DataFrame object. (Install via PyPI/pip: “pip install pydataframe”)
Usage:
u = DataFrame( { "Field1": [1, 2, 3], "Field2": ['abc', 'def', 'hgi']}, optional: ['Field1', 'Field2'] ["rowOne", "rowTwo", "thirdRow"])
A DataFrame is basically a table with rows and columns.
Columns are named, rows are numbered (but can be named) and can be easily selected and calculated upon. Internally, columns are stored as 1d numpy arrays. If you set row names, they’re converted into a dictionary for fast access. There is a rich subselection/slicing API, see help(DataFrame.get_item) (it also works for setting values). Please note that any slice gets you another DataFrame; to access individual entries use get_row(), get_column(), get_value().
DataFrames also understand basic arithmetic and you can either add (multiply,…) a constant value, or another DataFrame of the same size / with the same column names, like this:
#multiply every value in ColumnA that is smaller than 5 by 6.
my_df[my_df[:,'ColumnA'] < 5, 'ColumnA'] *= 6
#you always need to specify both row and column selectors, use : to mean everything
my_df[:, 'ColumnB'] = my_df[:,'ColumnA'] + my_df[:, 'ColumnC']
#let's take every row that starts with Shu in ColumnA and replace it with a new list (comprehension)
select = my_df.where(lambda row: row['ColumnA'].startswith('Shu'))
my_df[select, 'ColumnA'] = [row['ColumnA'].replace('Shu', 'Sha') for row in my_df[select,:].iter_rows()]
DataFrames talk directly to R via rpy2 (rpy2 is not a prerequisite for the library!)
(2) pandas
http://pandas.pydata.org/
Library Highlights
- A fast and efficient DataFrame object for data manipulation with integrated indexing;
- Tools for reading and writing data between in-memory data structures and different formats: CSV and text files, Microsoft Excel, SQL databases, and the fast HDF5 format;
- Intelligent data alignment and integrated handling of missing data: gain automatic label-based alignment in computations and easily manipulate messy data into an orderly form;
- Flexible reshaping and pivoting of data sets;
- Intelligent label-based slicing, fancy indexing, and subsetting of large data sets;
- Columns can be inserted and deleted from data structures for size mutability;
- Aggregating or transforming data with a powerful group by engine allowing split-apply-combine operations on data sets;
- High performance merging and joining of data sets;
- Hierarchical axis indexing provides an intuitive way of working with high-dimensional data in a lower-dimensional data structure;
- Time series-functionality: date range generation and frequency conversion, moving window statistics, moving window linear regressions, date shifting and lagging. Even create domain-specific time offsets and join time series without losing data;
- The library has been ruthlessly optimized for performance, with critical code paths compiled to C;
- Python with pandas is in use in a wide variety of academic and commercial domains, including Finance, Neuroscience, Economics, Statistics, Advertising, Web Analytics, and more.
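For comparison, here is a minimal sketch of how the three pydataframe operations shown earlier (conditional multiply, column arithmetic, prefix replacement) might look in pandas. The data and the column names are made up for illustration; only `DataFrame`, `.loc`, and the `.str` accessor are standard pandas.

```python
import pandas as pd

# Hypothetical data; the column names only mirror the pydataframe example.
my_df = pd.DataFrame(
    {
        "ColumnA": [1, 4, 7, 10],
        "ColumnC": [10, 20, 30, 40],
        "Name": ["Shubert", "Anna", "Shula", "Bob"],
    },
    index=["rowOne", "rowTwo", "rowThree", "rowFour"],
)

# Multiply every value in ColumnA that is smaller than 5 by 6.
my_df.loc[my_df["ColumnA"] < 5, "ColumnA"] *= 6

# Column arithmetic is element-wise and aligned on the row labels.
my_df["ColumnB"] = my_df["ColumnA"] + my_df["ColumnC"]

# Replace the prefix "Shu" with "Sha" in rows whose Name starts with "Shu".
select = my_df["Name"].str.startswith("Shu")
my_df.loc[select, "Name"] = my_df.loc[select, "Name"].str.replace(
    "Shu", "Sha", regex=False
)

print(my_df)
```

Unlike pydataframe, pandas does not require both a row and a column selector: `my_df["ColumnB"]` addresses a whole column directly, and `.loc` is used when a row mask is needed as well.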
Why not R?
First of all, we love open source R! It is the most widely-used open source environment for statistical modeling and graphics, and it provided some early inspiration for pandas features. R users will be pleased to find this library adopts some of the best concepts of R, like the foundational DataFrame (one user familiar with R has described pandas as “R data.frame on steroids”). But pandas also seeks to solve some frustrations common to R users:
- R has barebones data alignment and indexing functionality, leaving much work to the user. pandas makes it easy and intuitive to work with messy, irregularly indexed data, like time series data. pandas also provides rich tools, like hierarchical indexing, not found in R;
- R is not well-suited to general purpose programming and system development. pandas enables you to do large-scale data processing seamlessly when developing your production applications;
- Hybrid systems connecting R to a low-productivity systems language like Java, C++, or C# suffer from significantly reduced agility and maintainability, and you’re still stuck developing the system components in a low-productivity language;
- The “copyleft” GPL license of R can create concerns for commercial software vendors who want to distribute R with their software under another license. Python and pandas use more permissive licenses.
(3) datamatrix
http://pypi.python.org/pypi/datamatrix/0.8
datamatrix 0.8
A Pythonic implementation of R’s data.frame structure.
Latest Version: 0.9
This module allows access to comma- or other delimiter-separated files as if they were tables, using a dictionary-like syntax. DataMatrix objects can be manipulated, rows and columns added and removed, or even transposed.
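The idea of treating a delimited file as a table with dictionary-like access can be sketched with nothing but the standard library. This is not datamatrix’s API (which is not shown here); the sample data and variable names are hypothetical.

```python
import csv
import io

# Hypothetical delimited data standing in for a file on disk.
raw = "name,score\nann,3\nbob,5\n"

# One dict per row, keyed by the header line.
rows = list(csv.DictReader(io.StringIO(raw)))

# Dictionary-like access: column name -> list of values.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Add a derived column, then remove an existing one.
columns["double"] = [int(value) * 2 for value in columns["score"]]
del columns["name"]

# Transpose the remaining columns back into row tuples.
transposed = list(zip(*columns.values()))
print(transposed)
```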
—————————————————————–
Modeling in Python
Decisionstats.com is back from a DDoS
- Servers were okay, it was the DNS server that got swamped.
- I am sorry for the downtime; hopefully you didn’t even notice.
- I have faced challenges like domain name hijacking, SQL injection, and malicious WP plugins, and that’s why I shifted to professional hosting. I stand by my vendors and their professional judgement; moving away would mean the hackers won.
- It was very clever to swamp the DNS provider. My compliments to the tech talent behind this.
- You would think that every webmaster would have a backup plan in case their site got hit by a DDoS, but surprisingly even corporate websites don’t have a backup (under attack) plan.
Anonymous grows up and matures…Anonanalytics.com
I liked the design, user interfaces and the conceptual ideas behind the latest Anonymous hacktivist websites (much better than the shabby graphic design of Wikileaks, or Friends of Wikileaks, though I guess they have been busy what with Julian’s escapades and Syrian emails).
I disagree (and let us agree to disagree some of the time) with the complete lack of respect for Graphical User Interfaces for tools. If DDoS really took off due to LOIC, why not build a GUI for SQL injection (or at least for testing the top 25 vulnerabilities as per this list)?
http://www.sans.org/top25-software-errors/
Shouldn’t Tor be embedded within the next generation of LOIC?
Automated testing tools are used by companies like Adobe (and others), so why not create a simple GUI for the existing tools? I may be completely off track here, but I think hacker education has been a critical misstep (one that has undermined Western democracies’ preparedness for cyber tactics by hostile regimes). How do we create the next generation of hackers? By easy tutorials (see Codecademy and build appropriate modules).
-A slick website to be funded by Bitcoins (money can buy everything including Mastercard and Visa, but Bitcoins are an innovative step towards an internet economy currency)
-A collaborative wiki
http://wiki.echelon2.org/wiki/Main_Page
Seriously dude, why not make this a part of Wikipedia? (I know Jimmy Wales got shifty eyes, but can you trust someone?)
-Analytics for Anonymous (sighs! I should have thought about this earlier)
http://anonanalytics.com/
(can be used to play and bill both sides of corporate espionage and be cyber private investigators)
What We Do
We provide the public with investigative reports exposing corrupt companies. Our team includes analysts, forensic accountants, statisticians, computer experts, and lawyers from various jurisdictions and backgrounds. All information presented in our reports is acquired through legal channels, fact-checked, and vetted thoroughly before release. This is both for the protection of our associates as well as groups/individuals who rely on our work.
-And lastly, creative content for Pinterest.com and public relations (what next? Tom Cruise to play Julian Assange in the new movie?)
http://www.par-anoia.net/
Potentially Alarming Research: Anonymous Intelligence Agency
Information is and will be free. Expect it. ~ Anonymous
Links of interest
- Latest Scientology Mails (Austria)
- Full FBI call transcript
- Arrest Tracker
- HBGary Email Viewer
- The Pirate Bay Proxy
- We Are Anonymous – Book
- To be announced…
Google Cloud is finally here
Amazon gets some competition, and customers should see some relief, unless Google withdraws commitment on these products after a few years of trying (like it often does now!)
http://cloud.google.com/products/index.html
| Machine Type Pricing | ||||||
|---|---|---|---|---|---|---|
| Configuration | Virtual Cores | Memory | GCEU * | Local disk | Price/Hour | $/GCEU/hour |
| n1-standard-1-d | 1 | 3.75GB *** | 2.75 | 420GB *** | $0.145 | 0.053 |
| n1-standard-2-d | 2 | 7.5GB | 5.5 | 870GB | $0.29 | 0.053 |
| n1-standard-4-d | 4 | 15GB | 11 | 1770GB | $0.58 | 0.053 |
| n1-standard-8-d | 8 | 30GB | 22 | 2 x 1770GB | $1.16 | 0.053 |
| Network Pricing | |
|---|---|
| Ingress | Free |
| Egress to the same Zone. | Free |
| Egress to a different Cloud service within the same Region. | Free |
| Egress to a different Zone in the same Region (per GB) | $0.01 |
| Egress to a different Region within the US | $0.01 **** |
| Inter-continental Egress | At Internet Egress Rate |
| Internet Egress (Americas/EMEA destination) per GB | |
| 0-1 TB in a month | $0.12 |
| 1-10 TB | $0.11 |
| 10+ TB | $0.08 |
| Internet Egress (APAC destination) per GB | |
| 0-1 TB in a month | $0.21 |
| 1-10 TB | $0.18 |
| 10+ TB | $0.15 |
| Persistent Disk Pricing | |
|---|---|
| Provisioned space | $0.10 GB/month |
| Snapshot storage** | $0.125 GB/month |
| IO Operations | $0.10 per million |
| IP Address Pricing | |
|---|---|
| Static IP address (assigned but unused) | $0.01 per hour |
| Ephemeral IP address (attached to instance) | Free |
** coming soon
*** 1GB is defined as 2^30 bytes
**** promotional pricing; eventually will be charged at internet download rates
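The last column of the machine type table and the tiered egress prices can be checked with a little arithmetic. The sketch below copies its figures from the tables above and makes two assumptions that the tables do not state: that 1 TB is 1024 GB (in line with the 2^30 footnote), and that each egress tier’s rate applies only to the traffic falling inside that tier. The function names are illustrative.

```python
# Figures copied from the Machine Type Pricing table.
MACHINES = {
    "n1-standard-1-d": {"gceu": 2.75, "price_per_hour": 0.145},
    "n1-standard-2-d": {"gceu": 5.5, "price_per_hour": 0.29},
    "n1-standard-4-d": {"gceu": 11, "price_per_hour": 0.58},
    "n1-standard-8-d": {"gceu": 22, "price_per_hour": 1.16},
}

# Internet egress to Americas/EMEA: (upper bound of tier in GB, $ per GB).
# Assumption: 1 TB = 1024 GB and tiers are charged marginally.
EGRESS_TIERS = [(1024, 0.12), (10 * 1024, 0.11), (float("inf"), 0.08)]


def price_per_gceu(name):
    """Hourly price divided by GCEU, rounded like the table's last column."""
    machine = MACHINES[name]
    return round(machine["price_per_hour"] / machine["gceu"], 3)


def egress_cost(gb):
    """Monthly Internet egress cost in dollars for `gb` gigabytes."""
    cost, lower = 0.0, 0.0
    for upper, rate in EGRESS_TIERS:
        if gb > lower:
            cost += (min(gb, upper) - lower) * rate
        lower = upper
    return cost


for name in MACHINES:
    print(name, price_per_gceu(name))
print("2 TB egress:", round(egress_cost(2048), 2))
```

Every configuration works out to the same rounded price per GCEU, which is why the table shows 0.053 in each row: the larger machines are simply multiples of the smallest one.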
Google Prediction API
Tap into Google’s machine learning algorithms to analyze data and predict future outcomes.
Leverage machine learning without the complexity
Use the familiar RESTful interface
Enter input in any format – numeric or text
Build smart apps
Learn how you can use Prediction API to build customer sentiment analysis, spam detection or document and email classification.
Google Translation API
Use Google Translate API to build multilingual apps and programmatically translate text in your webpage or application.
Translate text into other languages programmatically
Use the familiar RESTful interface
Take advantage of Google’s powerful translation algorithms
Build multilingual apps
Learn how you can use Translate API to build apps that can programmatically translate text in your applications or websites.
Google BigQuery
Analyze Big Data in the cloud using SQL and get real-time business insights in seconds using Google BigQuery. Use a fully-managed data analysis service with no servers to install or maintain.
Reliable & Secure
Complete peace of mind as your data is automatically replicated across multiple sites and secured using access control lists.
Scale infinitely
You can store up to hundreds of terabytes, paying only for what you use.
Blazing fast
Run ad hoc SQL queries on multi-terabyte datasets in seconds.
Google App Engine
Create apps on Google’s platform that are easy to manage and scale. Benefit from the same systems and infrastructure that power Google’s applications.
Focus on your apps
Let us worry about the underlying infrastructure and systems.
Scale infinitely
See your applications scale seamlessly from hundreds to millions of users.
Business ready
Premium paid support and 99.95% SLA for business users
Interview Alvaro Tejada Galindo, SAP Labs Montreal, Using SAP Hana with #Rstats
Here is a brief interview with Alvaro Tejada Galindo, aka Blag, who is a developer working with SAP HANA and R at SAP Labs, Montreal. SAP HANA is SAP’s latest offering in BI; it is also a database and a computing environment, and using R and HANA together on the cloud can give major productivity gains in terms of both speed and analytical ability, as per preliminary use cases.
Ajay- What made the R language a fit for SAP HANA? Did you consider other languages? What is your view on the Julia/Python/SPSS/SAS/Matlab languages?
Blag- I think “R” is a must for SAP HANA. As the fastest database in the market, we needed a language that could help us shape the data in the best possible way. “R” filled that purpose very well. Right now, “R” is not the only language as “L” can be used as well (http://wiki.tcl.tk/17068) …not forgetting “SQLScript” which is our own version of SQL (http://goo.gl/x3bwh). I have to admit that I tried Julia, but couldn’t manage to make it work. Regarding Python, it’s an interesting question as I’m going to blog about Python and SAP HANA soon. About Matlab, SPSS and SAS I haven’t used them, so I got nothing to say there.
Ajay- What is your view on some of the limitations of R that can be overcome by using it with SAP HANA?
Blag- I think mostly the ability of SAP HANA to work with big data. Again, SAP HANA and “R” can work very nicely together and achieve things that weren’t possible before.
Ajay- Have you considered other vendors of R, including working with RStudio, Revolution Analytics, and even Oracle R Enterprise?
Blag- I’m not really part of the SAP HANA or the R groups inside SAP, so I can’t really comment on that. I can only say that I use RStudio every time I need to do something with R. Regarding Oracle…I don’t think so…but they can use any of our products whenever they want.
Ajay- Do you have a case study on an actual usage of R with SAP HANA that led to great results?
Blag- Right now the use of “R” and SAP HANA is very preliminary, I don’t think many people have started working on it…but as an example that it works, you can check this awesome blog entry from my friend Jitender Aswani, “Big Data, R and HANA: Analyze 200 Million Data Points and Later Visualize Using Google Maps” (http://allthingsr.blogspot.com/#!/2012/04/big-data-r-and-hana-analyze-200-million.html)
Ajay- Does your group in SAP plan to give back to the R ecosystem by attending conferences like UseR 2012, sponsoring meets, package development, etc.?
Blag- My group is in charge of everything developers, so sure, we’re planning to get more in touch with R developers and their ecosystem. Not sure how we’re going to deal with it, but at least I’m going to get myself involved in the Montreal R Group.
About-
http://scn.sap.com/people/alvaro.tejadagalindo3
| Name: | Alvaro Tejada Galindo |
| Email: | a.tejada.galindo@sap.com |
| Profession: | Development |
| Company: | SAP Canada Labs-Montreal |
| Town/City: | Montreal |
| Country: | Canada |
| Instant Messaging Type: | |
| Instant Messaging ID: | Blag |
| Personal URL: | http://blagrants.blogspot.com |
| Professional Blog URL: | http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/u/252210910 |
| My Relation to SAP: | employee |
| Short Bio: | Development Expert for the Technology Innovation and Developer Experience team.Used to be an ABAP Consultant for the last 11 years. Addicted to programming since 1997. |
http://www.sap.com/solutions/technology/in-memory-computing-platform/hana/overview/index.epx
and from
http://en.wikipedia.org/wiki/SAP_HANA
SAP HANA is SAP AG’s implementation of in-memory database technology. There are four components within the software group:[1]
- SAP HANA DB (or HANA DB) refers to the database technology itself,
- SAP HANA Studio refers to the suite of tools provided by SAP for modeling,
- SAP HANA Appliance refers to HANA DB as delivered on partner certified hardware (see below) as an appliance. It also includes the modeling tools from HANA Studio as well as replication and data transformation tools to move data into HANA DB,[2]
- SAP HANA Application Cloud refers to the cloud based infrastructure for delivery of applications (typically existing SAP applications rewritten to run on HANA).
R is integrated in HANA DB via TCP/IP. HANA uses SQL-SHM, a shared memory-based data exchange to incorporate R’s vertical data structure. HANA also introduces R scripts equivalent to native database operations like join or aggregation.[20] HANA developers can write R scripts in SQL and the types are automatically converted in HANA. R scripts can be invoked with HANA tables as both input and output in the SQLScript. R environments need to be deployed to use R within SQLScript
More blog posts on using SAP and R together
Dealing with R and HANA
http://scn.sap.com/community/in-memory-business-data-management/blog/2011/11/28/dealing-with-r-and-hana
R meets HANA
http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/29/r-meets-hana
HANA meets R
http://scn.sap.com/community/in-memory-business-data-management/blog/2012/01/26/hana-meets-r
When SAP HANA met R – First kiss
http://scn.sap.com/community/developer-center/hana/blog/2012/05/21/when-sap-hana-met-r–first-kiss
Using RODBC with SAP HANA DB-
SAP HANA: My experiences on using SAP HANA with R
and of course the blog that started it all-
Jitender Aswani’s
http://allthingsr.blogspot.in/
Anonymous Operation India- Using Amazon AWS to go to PirateBay
The cyber-group known as Anonymous has now decided to fight for internet freedom for my 1.2 billion countrymen (India).
So in Operation India they go and knock some websites off. The immediate provocation:
1) The legal system prevented access to Pirate Bay (and other sites)
This, as per the Anons, restricts the freedom of the glorious motherland of India (which incidentally does have a high number of engineers).
A slight modification to using violence (like DDoS) is to use non-violence. This approach is to use the free tier at Amazon EC2:
http://aws.amazon.com/free/
Sign up and start the Windows tier.
AWS Free Usage Tier (Per Month): (only if your torrents are going to be less than 15 GB a month!!)
- 750 hours of Amazon EC2 Linux Micro Instance usage (613 MB of memory and 32-bit and 64-bit platform support) – enough hours to run continuously each month*
- 750 hours of Amazon EC2 Microsoft Windows Server Micro Instance usage (613 MB of memory and 32-bit and 64-bit platform support) – enough hours to run continuously each month*
- 750 hours of an Elastic Load Balancer plus 15 GB data processing*
- 30 GB of Amazon Elastic Block Storage, plus 2 million I/Os and 1 GB of snapshot storage*
- 5 GB of Amazon S3 standard storage, 20,000 Get Requests, and 2,000 Put Requests*
- 100 MB of storage, 5 units of write capacity, and 10 units of read capacity for Amazon DynamoDB**
- 25 Amazon SimpleDB Machine Hours and 1 GB of Storage**
- 1,000 Amazon SWF workflow executions can be initiated for free. A total of 10,000 activity tasks, signals, timers and markers, and 30,000 workflow-days can also be used for free**
- 100,000 Requests of Amazon Simple Queue Service**
- 100,000 Requests, 100,000 HTTP notifications and 1,000 email notifications for Amazon Simple Notification Service**
- 10 Amazon Cloudwatch metrics, 10 alarms, and 1,000,000 API requests**
- 15 GB of bandwidth out aggregated across all AWS services*
and get download speeds of 190 kB/s when connecting to Pirate Bay from the US!!
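Two of the free tier numbers are easy to sanity-check: whether 750 hours really covers a full month, and how quickly the 15 GB outbound allowance would be used up at the quoted speed. The sketch below assumes the speed means roughly 190 kilobytes per second and uses binary units (1 GB = 1024 × 1024 kB); both are assumptions, not figures from Amazon.

```python
FREE_HOURS_PER_MONTH = 750
FREE_BANDWIDTH_GB = 15
SPEED_KB_PER_SECOND = 190  # assumed reading of the quoted speed

# The longest month has 31 days, so continuous running needs 744 hours.
hours_in_longest_month = 31 * 24
runs_all_month = hours_in_longest_month <= FREE_HOURS_PER_MONTH

# Time to move the whole bandwidth allowance at a constant speed.
total_kb = FREE_BANDWIDTH_GB * 1024 * 1024
hours_to_use_allowance = total_kb / SPEED_KB_PER_SECOND / 3600

print("Runs continuously all month:", runs_all_month)
print("Hours until the allowance is used:", round(hours_to_use_allowance, 1))
```

Under these assumptions the bandwidth allowance, not the instance hours, is the binding limit: about a day of sustained transfer at that speed would consume it.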
So you don’t know Linux, huh (but do know how to torrent). Well, Amazon has a Windows instance for free too. Shame on you for not knowing Linux though! Illegal torrents hurt artists like Shahrukh Khan the most!!!
http://aws.amazon.com/windows/
How to create a Windows Amazon Instance
http://aws.amazon.com/resources/webinars/?vid=OLfmqcYnhUM&p=015041767CFA57C8
To download your precious data (why?) from your remote instance to your local PC, use these instructions.
1. Find the RDP file Amazon asked you to download onto your local PC. Right-click –> Edit
2. Go to the “Local Resources” tab –> “Local devices and resources” –> “More” button
3. Expand “Drives” and check the disks you want to share when you TS to the remote box.
4. After connecting, you will see the new drives in My Computer, already mounted for you.
For me, copy speed is 200-300kB/Second. Enjoy!
or even easier
Installing dropbox on both your client machine and EC2 instance is one of the easiest ways to do it. (go to http://dropbox.com) or try the new Google Drive to share content.
–
As for Anonymous: DDoS attacks are easy, IRC press conferences are fun, but there are enough techies in India, kids.
NOTE- You are liable legally for your actions whether on Amazon AWS or on your own laptop. This is just a technical note- not a moral note.
PS- I wonder if the Chinese can use this to access Facebook. Maybe it is time Anonymous got the guts to hit China for its unfree internet.
PPS- Message to Anons— Next time, try giving us a pdf tutorial on how to create an anonymized sql injection/ddos !
Custom T Shirt-
INDIA- Writing code since 3000 BC.
INDIA- We made the zero 0.




