Tag Archives: privacy
A neat technical innovation: Proxmate is a browser plugin with Chrome and Firefox versions. It allows internet users outside the US to access US-only sites, including Google’s Play Store, Spotify, Turntable and others.
It is very professionally designed and is now being used quite a lot.
Great work by Dave Mohl at http://proxmate.dave.cx/
I wish the same principle could be applied to create a fork of Chromium/Firefox that mashes up with Tor and do-not-track privacy software. Or, if a fork is too much work, even a plugin.
I am going to give a Government of India sponsored talk on Big Data Analytics in Bangalore on Friday the 13th of July. If you are in Bangalore, India, you may drop in for a dekko. Schedule and Abstracts (I am on page 7 of 9).
Your taxpayer money is hard at work (a joke only if you are a desi: laugh, and you're hooked).
13 July 2012 (9.30 – 11.00 & 11.30 – 1.00)
Big Data Big Analytics
The talk will showcase the use of open source technologies in statistical computing for big data, namely the R programming language and its use cases in big data analysis. It will review case studies using the Amazon Cloud, custom R packages for Big Data, tools like Revolution Analytics’ RevoScaleR package, and the newly launched SAP HANA used with R. We will also review Oracle R Enterprise. In addition, we will show some case studies using BigML.com (built with Clojure) and approaches using PiCloud. The talk will also showcase some of Google’s APIs for Big Data analysis.
Lastly, we will talk about social media analysis, national security use cases (i.e. cyber war) and the privacy hazards of big data analytics.
Here is an interview with Hjálmar Gíslason, CEO of DataMarket.com. DataMarket is an active marketplace for structured data and statistics. Through powerful search and visual data exploration, DataMarket connects data seekers with data providers.
HG- DataMarket is my fourth tech start-up since I started out at age 20 in 1996. The previous ones were in gaming, mobile and web search. I come from a technical background but have been moving more and more to the business side over the years. I can still prototype, but I hope there isn’t a single line of my code in production!
Funny you should ask about the 10 things that have surprised me the most on this journey, as I gave a presentation, literally yesterday, titled “9 things nobody told me about the start-up business”:
* Do NOT generalize – especially not to begin with
* Prioritize – and find a workflow that works for you
* Meet people – face to face
* You are a sales person – whether you like it or not
* Technology is not a product – it’s the entire experience
* Sell the current version – no matter how amazing the next one is
* Learn from mistakes – preferably others’
* Pick the right people – good people is not enough
* Tell a good story – but don’t make them up
I obviously elaborate on each of these points in the talk, but they roughly illustrate some of the things I believe I’ve learned … so far.
Ajay- Both Amazon and Google have entered the public datasets space. Infochimps has 14,000+ public datasets. The US has http://www.data.gov/
So clearly the space is competitive, and yet the demand for public data repositories is still underserved.
How does DataMarket intend to address this market in a unique way that differentiates it from others?
HG- DataMarket is about delivering business data to decision makers. We help data seekers find the data they need for planning and informed decision making, and help data publishers reach this audience. DataMarket.com is the meeting point, where data seekers can come to find the best available data, and data publishers can make their data available, whether for free or for a fee. We’ve populated the site with a wealth of data from public sources such as the UN, Eurostat, World Bank, IMF and others, but there is also premium data that is only available to those who subscribe to and pay for access. For example, we resell the entire data offering from the EIU (Economist Intelligence Unit) (link: http://datamarket.com/data/list/?q=provider:eiu)
DataMarket.com allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.
We see many of these efforts not as competition, but as valuable potential sources of data for our offering, while others may be competing with parts of our proposition, such as easy access to the public data sets.
Ajay- What are your views on data confidentiality and on access to data that is owned by governments and funded by taxpayer money?
HG- My views are very simple: any data that is gathered or created with taxpayers’ money should be open and free of charge unless higher priorities such as privacy or national security indicate otherwise.
Reflecting that, any data that is originally open and free of charge is still open and free of charge on DataMarket.com, just easier to find and work with.
HG- The scene is quite vibrant, given the small community. Good teams with promising concepts have been able to get the funding they need to get started and test their footing internationally. When the rapid growth phase is reached, outside funding may still be needed.
There are positive and negative things about any location. Among the good things about Iceland from the standpoint of a technology start-up are highly skilled tech people and a relatively simple corporate environment. Among the bad things are a tiny local market, a lack of skills in international sales and marketing, and the capital controls that were put in place after the crash of the Icelandic economy in 2008.
I’ve jokingly said that if a company is hot in the eyes of VCs it would get funding even if it was located in the jungles of Congo, while if they’re only lukewarm towards you, they will be looking for any excuse not to invest. Location can certainly be one of them, and in that case being close to the investor communities – physically – can be very important.
We’re opening up our sales and marketing offices in Boston as we speak. Not to be close to investors though, but to be close to our market and current customers.
Ajay- Describe your hobbies when you are not founding amazing tech startups.
HG- Most of my time is spent working, which happens to be my number one hobby.
It is still important to step away from it all every now and then to see things in perspective and come back with a clear mind.
I *love* traveling to exotic places. My wife and I have done quite a lot of traveling in Africa and South America: safari, scuba diving, skiing, enjoying nature. When at home I try to do some sports activities at least 3-4 times a week and, recently, play with my now 8-month-old son as much as I can.
Hjálmar Gíslason, Founder and CEO: Hjalmar is a successful entrepreneur, founder of three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after their acquisition of his search startup – Spurl. Hjalmar offers a mix of business, strategy and technical expertise. DataMarket is based largely on his vision of the need for a global exchange for structured data.
To know more, have a quick look at http://datamarket.com/
Over the Christmas break, I created a Google AdWords campaign using the $100 credit generously given by Google. I did it using my alumni ID, even though I have a perfectly normal Gmail ID. I guess if Google allows me to use the credit on any account, well, I will take it. And so a free experiment was born.
But whom to target with Google, if not Google itself? It seemed logical.
So I created a campaign for the names of prominent Googlers (from a Google+ list at https://plus.google.com/103399926392582289066/posts/LX4g7577DqD) and limited the ad location to Mountain View, California.
NULL HYPOTHESIS- People who are googled a lot from within the office are either popular or just checking themselves.
My ad was-
Hire Ajay Ohri
or see screenshot below.
Here are the results: 88 clicks and 43,000 impressions (and $83 of Google’s own money).
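As a rough sanity check, the three reported figures translate into the standard ad metrics (click-through rate, average cost per click, cost per thousand impressions). The snippet below is only that arithmetic on the numbers above, rounded for display; the variable names are illustrative.

```python
# Figures reported in the post: 88 clicks, 43,000 impressions, $83 spent.
clicks = 88
impressions = 43_000
spend_usd = 83.0

ctr = clicks / impressions            # click-through rate
cpc = spend_usd / clicks              # average cost per click
cpm = spend_usd / impressions * 1000  # cost per thousand impressions

print(f"CTR: {ctr:.2%}")   # CTR: 0.20%
print(f"CPC: ${cpc:.2f}")  # CPC: $0.94
print(f"CPM: ${cpm:.2f}")  # CPM: $1.93
```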
Clearly Vic Gundotra is googled a lot within Mountain View, California. Does he Google himself?
So is Matt Cutts. Does HE Google himself, or does he get elves to help him?
To my disappointment, not many people clicked on my LI offer. I am still blogging.
And there were few clicks on Marissa Mayer. Why Google her when she is right down the corridor?
The null hypothesis is thus rejected. Also, most clicks came from display and not from search.
I need to find something better to do with my Christmas break this year. I still have $16 of credit left.
Part 1 in this series is available at http://www.decisionstats.com/analytics-for-cyber-conflict/
The next articles in this series will cover:
- the kinds of algorithms that are currently used or being proposed for cyber conflict, as well as for its detection
Cyber Conflict requires some basic elements of the following broad disciplines within Computer and Information Science (besides the obvious discipline of heterogeneous database types for different kinds of data):
1) Cryptography – particularly a cryptographic hash function that maximizes the cost and time for an enemy trying to break it.
The ideal cryptographic hash function has four main properties:
- it is easy (but not necessarily quick) to compute the hash value for any given message
- it is infeasible to generate a message that has a given hash
- it is infeasible to modify a message without changing the hash
- it is infeasible to find two different messages with the same hash
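To make these properties a little more concrete, here is a small sketch using SHA-256 from Python's standard `hashlib` module, chosen only as one widely used cryptographic hash; the messages are made up. A script can only show the "easy to compute" side, the fixed digest length and the sensitivity to tiny changes; the three infeasibility properties rest on the design of the function itself, not on anything a short example can demonstrate.

```python
import hashlib

def sha256_hex(message: str) -> str:
    """Return the SHA-256 digest of a UTF-8 encoded message as hex."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()

a = sha256_hex("transfer 100 to alice")
b = sha256_hex("transfer 900 to alice")  # one character changed

# Easy to compute, and always 256 bits (64 hex characters) whatever the input size.
print(len(a), len(b))  # 64 64

# Deterministic: the same message always gives the same digest.
print(a == sha256_hex("transfer 100 to alice"))  # True

# A one-character edit produces an unrelated-looking digest,
# so a modified message cannot silently keep its old hash.
print(a)
print(b)
print(sum(x != y for x, y in zip(a, b)), "of 64 hex digits differ")
```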
A commercial spin-off is to use this to anonymize all customer data stored in any database, so that no database (or data table) that is breached contains personally identifiable information. For example, anonymizing IP addresses and DNS records with a mashup (embedded by default within all browsers) of the Tor and MafiaaFire extensions can help create better information privacy on the internet.
This can also help create better encryption between instant messengers.
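A minimal sketch of the database idea, assuming customer records are plain Python dicts and that a secret key is kept outside the database. It swaps in a keyed hash (HMAC-SHA-256) for a bare hash, because an unkeyed hash of something as small as the IPv4 address space can be reversed by hashing all 2**32 possible addresses. Strictly speaking this is pseudonymization: the same input always maps to the same token, so tables can still be joined, but whoever holds the key can re-link values. `PSEUDONYM_KEY`, `pseudonymize` and the sample records are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical secret key; in practice it would live outside the database
# (e.g. in a key-management service), so a breached table alone is not enough.
PSEUDONYM_KEY = secrets.token_bytes(32)

def pseudonymize(value: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace a PII value with a keyed SHA-256 (HMAC) token in hex."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

customers = [
    {"email": "alice@example.com", "ip": "203.0.113.7", "spend": 120},
    {"email": "bob@example.com", "ip": "198.51.100.23", "spend": 75},
]

# Only tokens and non-identifying fields go into the analytics table.
safe_rows = [
    {
        "email_token": pseudonymize(c["email"]),
        "ip_token": pseudonymize(c["ip"]),
        "spend": c["spend"],
    }
    for c in customers
]

for row in safe_rows:
    print(row["email_token"][:16], row["ip_token"][:16], row["spend"])
```

Because the tokens are deterministic for a given key, counts and joins on `email_token` still work, while rotating or destroying the key breaks the link back to the raw values.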
2) Data Disaster Planning for Data Storage (but also simulations of breaches), including using cloud computing, time sharing, or RAID for backing up data. This means planning and running an annual (?) exercise that simulates a cyber breach of confidential data, a cyber audit similar to an annual accounting audit.
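RAID levels such as RAID 5 survive the loss of one disk by storing XOR parity alongside the data. The toy sketch below shows only that parity arithmetic, on three hypothetical in-memory "disks" plus one parity block; real RAID rotates parity across the disks and operates on block devices, not Python byte strings.

```python
from functools import reduce

def xor_blocks(*blocks: bytes) -> bytes:
    """Byte-wise XOR of equal-length blocks (the parity idea behind RAID 5)."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))

# Three hypothetical data disks, each holding one 8-byte stripe block.
disk0 = b"CUSTOMER"
disk1 = b"LEDGER01"
disk2 = b"BACKUP!!"
parity = xor_blocks(disk0, disk1, disk2)  # stored on a fourth disk

# Simulate losing disk1: XOR-ing the survivors with the parity rebuilds it,
# because x ^ x == 0 cancels every block except the missing one.
rebuilt = xor_blocks(disk0, disk2, parity)
print(rebuilt)  # b'LEDGER01'
```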
3) Basic Data Reduction Algorithms for visualizing large amounts of information. This can include:
- K-Means Clustering: http://www.jstor.org/pss/2346830, http://www.cs.ust.hk/~qyang/Teaching/537/Papers/huang98extensions.pdf and http://stackoverflow.com/questions/6372397/k-means-with-really-large-matrix
- Topic Models (LDA): http://www.decisionstats.com/topic-models/
- Social Network Analysis: http://en.wikipedia.org/wiki/Social_network_analysis
- Graph Analysis: http://micans.org/mcl/ and http://www.ncbi.nlm.nih.gov/pubmed/19407357
- MapReduce and parallelization algorithms for computational boosting: http://www.slideshare.net/marin_dimitrov/large-scale-data-analysis-with-mapreduce-part-i
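As an illustration of the first and last items together, the sketch below runs Lloyd's k-means with each iteration written as a map step (assign every point to its nearest centroid) and a reduce step (average the points of each cluster), which is the usual way k-means is split across MapReduce jobs. Here both phases run in a single Python process; the toy 2-D points and the fixed initial centroids are assumptions for the example (random or k-means++ initialization is more common in practice).

```python
import math
from collections import defaultdict

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def map_phase(points, centroids):
    """Map: each point independently emits (cluster_id, (x, y, count))."""
    for x, y in points:
        yield nearest((x, y), centroids), (x, y, 1)

def reduce_phase(pairs):
    """Reduce: sum coordinates and counts per cluster id, then average."""
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for cid, (x, y, n) in pairs:
        acc = sums[cid]
        acc[0] += x
        acc[1] += y
        acc[2] += n
    return {cid: (sx / n, sy / n) for cid, (sx, sy, n) in sums.items()}

def kmeans(points, initial_centroids, iterations=10):
    centroids = list(initial_centroids)
    for _ in range(iterations):
        updated = reduce_phase(map_phase(points, centroids))
        # A cluster that received no points keeps its previous centroid.
        centroids = [updated.get(i, centroids[i]) for i in range(len(centroids))]
    return centroids

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
# Both starting centroids deliberately sit in the same blob.
centroids = kmeans(points, initial_centroids=[points[0], points[1]])
print([(round(x, 2), round(y, 2)) for x, y in centroids])
# [(0.33, 0.33), (10.33, 10.33)]
```

Since the map step only needs the current centroids, it can run on many machines at once; only the small per-cluster sums have to be combined in the reduce step.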
In the next article we will examine:
- the role of non-state agents as well as state agents, both competing and cooperating,
- and what precautions knowledge discovery in databases (KDD) practitioners can employ to avoid breaches of security, ethics, and regulation.