Interview: Hjálmar Gíslason, CEO of DataMarket.com

Here is an interview with Hjálmar Gíslason, CEO of DataMarket.com. DataMarket is an active marketplace for structured data and statistics. Through powerful search and visual data exploration, DataMarket connects data seekers with data providers.

 

Ajay- Describe your journey as an entrepreneur and techie in Iceland. What are the 10 things that surprised you most as a tech entrepreneur?

HG- DataMarket is my fourth tech start-up since I started out in 1996, at age 20. The previous ones have been in gaming, mobile and web search. I come from a technical background but have been moving more and more to the business side over the years. I can still prototype, but I hope there isn’t a single line of my code in production!

Funny you should ask about the 10 things that have surprised me the most on this journey, as I gave a presentation – literally yesterday – titled: “9 things nobody told me about the start-up business”

They are:
* Do NOT generalize – especially not to begin with
* Prioritize – and find a work-flow that works for you
* Meet people – face to face
* You are a sales person – whether you like it or not
* Technology is not a product – it’s the entire experience
* Sell the current version – no matter how amazing the next one is
* Learn from mistakes – preferably others’
* Pick the right people – good people is not enough
* Tell a good story – but don’t make them up

I obviously elaborate on each of these points in the talk, but the points illustrate roughly some of the things I believe I’ve learned … so far 😉


Ajay-

Both Amazon and Google have entered the public datasets space. Infochimps has 14,000+ public datasets. The US has http://www.data.gov/

So the space is clearly competitive, and yet the demand for public data repositories is still under-served.

How does DataMarket intend to address this market in a unique way to differentiate itself from others?

HG- DataMarket is about delivering business data to decision makers. We help data seekers find the data they need for planning and informed decision making, and help data publishers reach this audience. DataMarket.com is the meeting point, where data seekers can come to find the best available data, and data publishers can make their data available whether for free or for a fee. We’ve populated the site with a wealth of data from public sources such as the UN, Eurostat, World Bank, IMF and others, but there is also premium data that is only available to those that subscribe to and pay for the access. For example we resell the entire data offering from the EIU (Economist Intelligence Unit) (link: http://datamarket.com/data/list/?q=provider:eiu)

DataMarket.com allows all this data to be searched, visualized, compared and downloaded in a single place in a standard, unified manner.

We see many of these efforts not as competition, but as valuable potential sources of data for our offering, while others may be competing with parts of our proposition, such as easy access to the public data sets.

 

Ajay- What are your views on data confidentiality and access to data owned by governments and funded by taxpayer money?

HG- My views are very simple: Any data that is gathered or created for taxpayers’ money should be open and free of charge unless higher priorities such as privacy or national security indicate otherwise.

Reflecting that, any data that is originally open and free of charge is still open and free of charge on DataMarket.com, just easier to find and work with.

Ajay- How is the technology entrepreneurship and venture capital scene in Iceland? What things work and what things can be improved?

HG- The scene is quite vibrant, given the small community. Good teams with promising concepts have been able to get the funding they need to get started and test their footing internationally. When the rapid growth phase is reached outside funding may still be needed.

There are positive and negative things about any location. Among the good things about Iceland from the standpoint of a technology start-up are highly skilled tech people and a relatively simple corporate environment. Among the bad things are a tiny local market, a lack of skills in international sales and marketing, and capital controls that were put in place after the crash of the Icelandic economy in 2008.

I’ve jokingly said that if a company is hot in the eyes of VCs it would get funding even if it was located in the jungles of Congo, while if they’re only lukewarm towards you, they will be looking for any excuse not to invest. Location can certainly be one of them, and in that case being close to the investor communities – physically – can be very important.

We’re opening up our sales and marketing offices in Boston as we speak. Not to be close to investors though, but to be close to our market and current customers.

Ajay- Describe your hobbies when you are not founding amazing tech startups.

HG- Most of my time is spent working – which happens to be my number one hobby.

It is still important to step away from it all every now and then to see things in perspective and come back with a clear mind.

I *love* traveling to exotic places. My wife and I have done quite a lot of traveling in Africa and S-America: safari, scuba diving, skiing, enjoying nature. When at home I try to do some sports activities 3-4 times a week at least, and – recently – play with my now 8-month-old son as much as I can.

About-

http://datamarket.com/p/about/team/

Management

Hjálmar Gíslason, Founder and CEO: Hjalmar is a successful entrepreneur, founder of three startups in the gaming, mobile and web sectors since 1996. Prior to launching DataMarket, Hjalmar worked on new media and business development for companies in the Skipti Group (owners of Iceland Telecom) after their acquisition of his search startup – Spurl. Hjalmar offers a mix of business, strategy and technical expertise. DataMarket is based largely on his vision of the need for a global exchange for structured data.

hjalmar.gislason@datamarket.com

To know more, have a quick look at http://datamarket.com/

Using Google Adwords to target Vic Gundotra and Matt Cutts stochastically

Over the Christmas break, I created a Google Adwords campaign using the $100 credit generously given by Google. I did it using my alumni id, even though I have a perfectly normal gmail id. I guess if Google allows me to use the credit on any account, well, I will take it. And so a free experiment was born.

But whom to target with Google, if not Google itself? It seemed logical.

So I created a campaign for the names of prominent Googlers  (from a list of Google + at https://plus.google.com/103399926392582289066/posts/LX4g7577DqD ) and limited the ad location to Mountain View, California.

NULL HYPOTHESIS- People who are googled a lot from within the office are either popular or just checking themselves.

Since Google’s privacy policy is great, and has now been shown billions of times, well, I guess what’s a little ad targeting between brother geeks. Right?

My ad was-

Hire Ajay Ohri
He is
Awesome
linkedin.com/in/ajayohri 

or see screenshot below.

Here are the results: 88 clicks and 43,000 impressions (and $83 of Google’s own money).
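For anyone who wants the usual campaign metrics from those three numbers, here is a minimal Python sketch. Only the clicks, impressions and spend come from the results above; the variable names and the choice of metrics (click-through rate, cost per click, cost per thousand impressions) are my own illustration.

```python
# Figures reported in the post; everything else is derived from them.
clicks = 88
impressions = 43_000
spend_usd = 83.0

ctr = clicks / impressions             # click-through rate, as a fraction
cost_per_click = spend_usd / clicks    # average cost per click, in USD
cpm = spend_usd / impressions * 1000   # cost per 1,000 impressions, in USD

print(f"Click-through rate: {ctr:.2%}")
print(f"Average cost per click: ${cost_per_click:.2f}")
print(f"Cost per 1,000 impressions: ${cpm:.2f}")
```

That works out to a click-through rate of roughly 0.2% and a little under a dollar per click.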

Clearly Vic Gundotra is googled a lot within Mountain View, California. Does he Google himself?

So is Matt Cutts. Does HE Google himself, or does he get elves to help him?

To my disappointment not many people clicked my LI offer, so I am still blogging.

And there were few clicks on Marissa Mayer. Why Google her when she is right down the corridor?

The null hypothesis is thus rejected. Also most clicks were from display and not from search.

I need to find something better to do with the Christmas break this year. I still have a credit of $16 left.

 

How to use Bit Torrents

I really liked the software qBittorrent, available from http://www.qbittorrent.org/. I think bit torrents should be the default way of sharing huge content, especially software downloads. For protecting intellectual property there should be much better codes and software keys than presently available.

The qBittorrent project aims to provide a Free Software alternative to µtorrent. Additionally, qBittorrent runs and provides the same features on all major platforms (Linux, Mac OS X, Windows, OS/2, FreeBSD).

qBittorrent is based on Qt4 toolkit and libtorrent-rasterbar.

qBittorrent v2 Features

  • Polished µTorrent-like User Interface
  • Well-integrated and extensible Search Engine
    • Simultaneous search in most famous BitTorrent search sites
    • Per-category-specific search requests (e.g. Books, Music, Movies)
  • All Bittorrent extensions
    • DHT, Peer Exchange, Full encryption, Magnet/BitComet URIs, …
  • Remote control through a Web user interface
    • Nearly identical to the regular UI, all in Ajax
  • Advanced control over trackers, peers and torrents
    • Torrents queueing and prioritizing
    • Torrent content selection and prioritizing
  • UPnP / NAT-PMP port forwarding support
  • Available in ~25 languages (Unicode support)
  • Torrent creation tool
  • Advanced RSS support with download filters (inc. regex)
  • Bandwidth scheduler
  • IP Filtering (eMule and PeerGuardian compatible)
  • IPv6 compliant
  • Sequential downloading (aka “Download in order”)
  • Available on most platforms: Linux, Mac OS X, Windows, OS/2, FreeBSD

So if you are new to Bit Torrents, here is a brief tutorial.
Some terminology from

Tracker

A tracker is a server that keeps track of which seeds and peers are in the swarm.

Seed

Seed is used to refer to a peer who has 100% of the data. When a leech obtains 100% of the data, that peer automatically becomes a Seed.

Peer

A peer is one instance of a BitTorrent client running on a computer on the Internet to which other clients connect and transfer data.

Leech

Leech is a term with two meanings. Primarily, leech (or leeches) refers to a peer (or peers) who has a negative effect on the swarm by having a very poor share ratio (downloading much more than they upload, creating a ratio less than 1.0).

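As a rough illustration of the share ratio in the leech definition, here is a small Python sketch. The function names are hypothetical and mine alone; the 1.0 threshold is the one given in the definition, and real clients keep track of this for you.

```python
def share_ratio(uploaded_bytes: int, downloaded_bytes: int) -> float:
    """Return uploaded / downloaded; infinite if nothing was downloaded."""
    if downloaded_bytes == 0:
        # e.g. the original seeder, who never downloaded the data
        return float("inf")
    return uploaded_bytes / downloaded_bytes


def is_leech(uploaded_bytes: int, downloaded_bytes: int, threshold: float = 1.0) -> bool:
    """A peer counts as a leech (in the 'poor ratio' sense) below the threshold."""
    return share_ratio(uploaded_bytes, downloaded_bytes) < threshold


# A peer that downloaded 1000 MB but uploaded only 200 MB
print(share_ratio(200, 1000), is_leech(200, 1000))
```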
1) Download and install the software from  http://www.qbittorrent.org/
2) If you want to search for new files, you can use the nice search features in here
3) If you want to CREATE new bit torrents- go to Tools -Torrent Creator
4) For sharing content- just seed the torrent you just created. What is seeding – hey did you read the terminology in the beginning?
5) Additionally –
From

Trackers: Below are some popular public trackers. They are servers which help peers to communicate.

Here are some good trackers you can use:

 

http://open.tracker.thepiratebay.org/announce
http://www.torrent-downloads.to:2710/announce
http://denis.stalker.h3q.com:6969/announce
udp://denis.stalker.h3q.com:6969/announce
http://www.sumotracker.com/announce

and

Super-seeding

When a file is new, much time can be wasted because the seeding client might send the same file piece to many different peers, while other pieces have not yet been downloaded at all. Some clients, like ABC, Vuze, BitTornado, TorrentStorm, and µTorrent have a “super-seed” mode, where they try to only send out pieces that have never been sent out before, theoretically making the initial propagation of the file much faster. However the super-seeding becomes less effective and may even reduce performance compared to the normal “rarest first” model in cases where some peers have poor or limited connectivity. This mode is generally used only for a new torrent, or one which must be re-seeded because no other seeds are available.
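To make the two ideas in that passage a bit more concrete, here is a toy Python sketch of "rarest first" piece selection next to a much simplified "super-seed" rule. This is only an illustration under my own assumptions: the function names and the dictionary of peers are hypothetical, and real clients are far more involved (for example, a real super-seeder also watches how pieces spread before offering more).

```python
from collections import Counter


def piece_availability(peer_pieces):
    """Count how many peers hold each piece index."""
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)
    return counts


def rarest_first(needed, peer_pieces):
    """Pick the needed piece held by the fewest peers (ties: lowest index)."""
    counts = piece_availability(peer_pieces)
    candidates = [p for p in needed if counts[p] > 0]
    if not candidates:
        return None  # nobody in the swarm has anything we need
    return min(candidates, key=lambda p: (counts[p], p))


def super_seed_offer(all_pieces, already_sent):
    """Simplified super-seed rule: offer a piece never sent out before."""
    unsent = sorted(set(all_pieces) - set(already_sent))
    return unsent[0] if unsent else None


# Hypothetical swarm: piece 2 is held by only one peer, piece 3 by nobody.
swarm = {"peer_a": {0, 1, 2}, "peer_b": {0, 1}, "peer_c": {0}}
print(rarest_first({0, 1, 2, 3}, swarm))
print(super_seed_offer(range(4), {0, 1}))
```

In this made-up swarm both rules end up choosing piece 2: it is the rarest piece a downloader can actually get, and it is the first piece the seeder has not yet sent out.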
Note- you use this tutorial and any or all steps at your own risk. I am not legally responsible for any mishaps you get into. Please be responsible while being an efficient bit torrenter. That means respecting individual property rights.

Interview: Michal Kosinski, Concerto Web Based App using #Rstats

Here is an interview with Michal Kosinski, leader of the team that has created Concerto – a web based application using R. What is Concerto? As per http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm

Concerto is a web based, adaptive testing platform for creating and running rich, dynamic tests. It combines the flexibility of HTML presentation with the computing power of the R language, and the safety and performance of the MySQL database. It’s totally free for commercial and academic use, and it’s open source.

Ajay-  Describe your career in science from high school to this point. What are the various stats platforms you have trained on- and what do you think about their comparative advantages and disadvantages?  

Michal- I started with maths, but quickly realized that I prefer social sciences – thus after one year, I switched to a psychology major and obtained my MSc in Social Psychology with a specialization in Consumer Behaviour. At that time I was mostly using SPSS – as it was the only statistical package that was taught to students in my department. Also, it was not too bad for small samples and the rather basic analyses I was performing at that time.

 

My more recent research, performed during my MPhil course in Psychometrics at Cambridge University followed by my current PhD project in social networks and research work at Microsoft Research, requires significantly more powerful tools. Initially, I tried to squeeze as much as possible from SPSS/PASW by mastering the syntax language. SPSS was all I knew, though I reached its limits pretty quickly and was forced to switch to R. It was a pretty dreary experience at the start, switching from an unwieldy but familiar environment into an unwelcoming command line interface, but I quickly realized how empowering and convenient this tool was.

 

I believe that a course in R should be obligatory for all students that are likely to come close to any data analysis in their careers. It is really empowering – once you got the basics you have the potential to use virtually any method there is, and automate most tasks related to analysing and processing data. It is also free and open-source – so you can use it wherever you work. Finally, it enables you to quickly and seamlessly migrate to other powerful environments such as Matlab, C, or Python.

Ajay- What was the motivation behind building Concerto?

Michal- We deal with a lot of online projects at the Psychometrics Centre – one of them attracted more than 7 million unique participants. We needed a powerful tool that would allow researchers and practitioners to conveniently build and deliver online tests.

Also, our relationships with the website designers and software engineers that worked on developing our tests were rather difficult. We had trouble successfully explaining our needs; each little change was implemented with a delay and at significant cost. Not to mention the difficulties with embedding some more advanced methods (such as adaptive testing) in our tests.

So we created a tool allowing us, psychometricians, to easily develop psychometric tests from scratch and publish them online. And all this without having to hire software developers.

Ajay- Why did you choose R as the background for Concerto? What other languages and platforms did you consider? Apart from Concerto, how else do you utilize R in your center, department and University?

Michal- R was a natural choice as it is open-source, free, and nicely integrates with a server environment. Also, we believe that it is becoming a universal statistical and data processing language in science. We put increasing emphasis on teaching R to our students and we hope that it will replace SPSS/PASW as a default statistical tool for social scientists.

Ajay -What all can Concerto do besides a computer adaptive test?

Michal- We did not plan it initially, but Concerto turned out to be extremely flexible. In a nutshell, it is a web interface to the R engine with a built-in MySQL database and an easy-to-use developer panel. It can be installed on both Windows and Unix systems and used over the network or locally.

Effectively, it can be used to build any kind of web application that requires a powerful and quickly deployable statistical engine. For instance, I envision an easy to use website (that could look a bit like SPSS) allowing students to analyse their data using a web browser alone (learning the underlying R code simultaneously). Also, the authors of R libraries (or anyone else) could use Concerto to build user-friendly web interfaces to their methods.

Finally, Concerto can be conveniently used to build simple non-adaptive tests and questionnaires. It might seem to be slightly less intuitive at first than popular questionnaire services (such as my favourite Survey Monkey), but has virtually unlimited flexibility when it comes to item format, test flow, feedback options, etc. Also, it’s free.

Ajay- How do you see the cloud computing paradigm growing? Do you think browser based computation is here to stay?

Michal – I believe that cloud infrastructure is the future. Dynamically sharing computational and network resources between online service providers has a great competitive advantage over traditional strategies to deal with network infrastructure. I am sure the security concerns will be resolved soon, finishing the transformation of the network infrastructure as we know it. On the other hand, however, I do not see a reason why client-side (or browser) processing of the information should cease to exist – I rather think that the border between the cloud and personal or local computer will continually dissolve.

About

Michal Kosinski is Director of Operations for The Psychometrics Centre and Leader of the e-Psychometrics Unit. He is also a research advisor to the Online Services and Advertising group at the Microsoft Research Cambridge, and a visiting lecturer at the Department of Mathematics in the University of Namur, Belgium. You can read more about him at http://www.michalkosinski.com/

You can read more about Concerto at http://code.google.com/p/concerto-platform/ and http://www.psychometrics.cam.ac.uk/page/300/concerto-testing-platform.htm

Facebook IPO- Do you feel lucky?

2 Jan 2011 dealbook.nytimes.com

Facebook has raised $500 million from Goldman Sachs and a Russian investor in a transaction that values the company at $50 billion

29 Jan 2011 - www.bloomberg.com - $82.9 billion

14 Jun 2011 - CNBC - $100 billion

27 Jun 2011 - news.cnet.com - $70 billion

27 Sep 2011 - Venturebeat.com - $82.5 billion

$100 billion valuation divided by 1,000 million subscribers

= $100 net present value of ad profit per subscriber (note that an $80 billion valuation with 800 million subscribers gives the same figure)

= $250 net present value of ad revenue per subscriber (assuming 40% profitability)

= $2,500 net present value of online purchases per Facebook ad-clicking customer

(assuming advertisers dedicate 10% of revenue to advertising on Facebook)
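The back-of-the-envelope chain above can be written out as a short Python sketch. The 40% profitability and 10% advertising share are the assumptions stated in the post; the function and variable names are hypothetical and only there for illustration.

```python
def per_subscriber_values(valuation_usd, subscribers,
                          profit_margin=0.40, ad_share_of_revenue=0.10):
    """Return (ad profit, ad revenue, online purchases) per subscriber in USD."""
    ad_profit = valuation_usd / subscribers       # valuation read as NPV of ad profit
    ad_revenue = ad_profit / profit_margin        # profit = margin * revenue
    purchases = ad_revenue / ad_share_of_revenue  # ad spend = share * advertiser revenue
    return ad_profit, ad_revenue, purchases


# $100 billion with 1,000 million subscribers
print(per_subscriber_values(100e9, 1000e6))
# $80 billion with 800 million subscribers gives the same per-subscriber figures
print(per_subscriber_values(80e9, 800e6))
```

Changing either assumption (say, a 30% margin) shifts the implied purchases per subscriber quite a lot, which is really the point of the exercise.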

And the lucky Russian investor who invested at a $50 billion valuation only to see it double in six months: where else has he invested?

http://nymag.com/daily/intel/2011/01/facebooks_russian_investor_hel.html

Digital Sky Technologies co-founder Yuri Milner, who co-invested in the Goldman-Facebook deal, enviably poised in the middle. DST has been investing early and aggressively in some of the biggest names in the tech bubble boom like Facebook (DST first invested in May 2009), Zynga (the company that makes Farmville and Cityville for Facebook), and Groupon (the dudes that just turned down Google’s $6 billion).

NOTE - Both Groupon and Zynga IPO investors lost money, as both stocks are now below their IPO price.

http://openchannel.msnbc.msn.com/_news/2011/01/05/5771129-russian-facebook-investors-have-sparked-us-concerns

More on Digital Sky Tech and Yuri Milner and the free internet in Putin’s Russia

Digital Sky got particular attention because of its broad control of the Russian Internet. DNI noted that the company is “a dominant force in the Runet,” owning the most popular Websites in the former Soviet Union, including Russia, Ukraine, Kazakhstan, Georgia, and Armenia as well as others in the Czech Republic and Poland. By some estimates it reported “over 70 percent of all page views in the Russian-language Internet are on its companies’ Websites.”

 

 

From Wall Street Journal-

May 1, 2011

http://www.zdnet.com/blog/facebook/wsj-facebook-growth-exceeds-expectations-100-billion-valuation-justifiable/1306

Last month, a private-market transaction of 100,000 shares of Facebook Class B Common Stock priced at $32.00 apiece gave the website a valuation of $80 billion. Two months ago, Facebook was valued at $65 billion, when investment firm General Atlantic reportedly bought 0.1 percent of Facebook by purchasing roughly 2.5 million Facebook shares from former Facebook employees. Three months ago, Kleiner Perkins Caufield & Byers (KPCB) invested $38 million in Facebook, which was only worth 0.00073 percent of the social network, but still resulted in a valuation of $52 billion.
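The figures in that quote can be cross-checked with a few lines of Python. This is a sketch under one loud assumption of mine: that the share count implied by the $80 billion / $32 transaction also applies to the earlier deals. The variable names are hypothetical.

```python
# Share count implied by an $80 billion valuation at $32.00 per share
implied_shares = 80e9 / 32.00

# General Atlantic: roughly 2.5 million shares, reported as 0.1 percent
general_atlantic_fraction = 2.5e6 / implied_shares

# KPCB: $38 million at a $52 billion valuation
kpcb_fraction = 38e6 / 52e9

print(f"Implied shares outstanding: {implied_shares:,.0f}")
print(f"General Atlantic stake: {general_atlantic_fraction:.3%}")
print(f"KPCB stake: {kpcb_fraction:.5f} as a fraction, {kpcb_fraction:.3%} as a percentage")
```

On these numbers the General Atlantic stake does come out at 0.1 percent, while the KPCB figure of 0.00073 looks like a fraction rather than a percentage, that is, about 0.073 percent.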

 

related-

http://techcrunch.com/2011/01/10/facebook-5/

 

Something’s gotta give?

Go ahead, please. Buy Facebook stock!

Do you feel lucky?

Note on Internet Privacy (Updated) and a note on DNSCrypt

I noticed the brouhaha over Google’s privacy policy. I am afraid that social networks capture much more private information than search engines (even if they integrate my browser history, my social network, my emails, my search engine keywords) – I am still okay. All they are going to do is sell me better ads (maybe rather than just flood me with ads hoping to get a click). Of course Microsoft should take it one step forward and capture data from my desktop as well for better ads, that would really complete the curve. In any case, with the Patriot Act, most information is available to the Government anyway.

But it does make sense to have an easier to understand privacy policy, and one of my disappointments is the complete lack of visual appeal in such notices. Make things as simple as possible, but no simpler, as Al-E said.

 

Privacy activists forget that ads run on models built on AGGREGATED data, and most models are scored automatically. Unless you do something really weird or fake, chances are the data pertaining to you gets automatically collected, algorithmically aggregated, then modeled and scored, and an ad corresponding to your score or segment is shown to you. Probably no human eyes see raw data (but big G can clarify that).

 

( I also noticed Google gets a lot of free advice from bloggers. hey, if you were really good at giving advice to Google- they WILL hire you !)

On to another approach to privacy, one that is tool based rather than legalese based.

I noticed tools like DNSCrypt increase internet security, so that all my integrated data goes straight to people I am okay with having it (ad sellers not governments!)

Unfortunately it is Mac only, and I will wait for Windows or X based tools for a better review. I noticed some lag in updating these tools, so I can only guess that the boys of Baltimore have been there, so it is best used for home users alone.

 

Maybe they can find a chrome extension for DNS dummies.

http://www.opendns.com/technology/dnscrypt/

Why DNSCrypt is so significant

In the same way the SSL turns HTTP web traffic into HTTPS encrypted Web traffic, DNSCrypt turns regular DNS traffic into encrypted DNS traffic that is secure from eavesdropping and man-in-the-middle attacks. It doesn’t require any changes to domain names or how they work, it simply provides a method for securely encrypting communication between our customers and our DNS servers in our data centers. We know that claims alone don’t work in the security world, however, so we’ve opened up the source to our DNSCrypt code base and it’s available on GitHub.

DNSCrypt has the potential to be the most impactful advancement in Internet security since SSL, significantly improving every single Internet user’s online security and privacy.
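To see why "regular DNS traffic" is easy to eavesdrop on, here is a small Python sketch that builds an ordinary, unencrypted DNS query packet and shows that the hostname sits in it as readable bytes. It sends nothing over the network and it is not DNSCrypt code; the function name, the query ID and the example.com hostname are just my illustration, and the packet layout follows the classic DNS message format.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a minimal plain DNS query (type A, class IN) as raw bytes."""
    # Header: ID, flags (recursion desired), 1 question, no other records
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label prefixed by its length, ended by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


packet = build_dns_query("example.com")
print(packet)
# Anyone who can see this packet on the wire can read the name being looked up
print(b"example" in packet)
```

Encrypting the link between the client and the resolver, which is what the quoted passage describes, is what hides those bytes from someone sitting in the middle.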

and

http://dnscurve.org/crypto.html

The DNSCurve project adds link-level public-key protection to DNS packets. This page discusses the cryptographic tools used in DNSCurve.

Elliptic-curve cryptography

DNSCurve uses elliptic-curve cryptography, not RSA.

RSA is somewhat older than elliptic-curve cryptography: RSA was introduced in 1977, while elliptic-curve cryptography was introduced in 1985. However, RSA has shown many more weaknesses than elliptic-curve cryptography. RSA’s effective security level was dramatically reduced by the linear sieve in the late 1970s, by the quadratic sieve and ECM in the 1980s, and by the number-field sieve in the 1990s. For comparison, a few attacks have been developed against some rare elliptic curves having special algebraic structures, and the amount of computer power available to attackers has predictably increased, but typical elliptic curves require just as much computer power to break today as they required twenty years ago.

IEEE P1363 standardized elliptic-curve cryptography in the late 1990s, including a stringent list of security criteria for elliptic curves. NIST used the IEEE P1363 criteria to select fifteen specific elliptic curves at five different security levels. In 2005, NSA issued a new “Suite B” standard, recommending the NIST elliptic curves (at two specific security levels) for all public-key cryptography and withdrawing previous recommendations of RSA.

Some specific types of elliptic-curve cryptography are patented, but DNSCurve does not use any of those types of elliptic-curve cryptography.
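For readers who have never seen elliptic-curve arithmetic, here is a toy Python sketch of point addition and a Diffie-Hellman style key agreement. Everything in it is an assumption for illustration: the tiny curve y^2 = x^3 + 2x + 3 over the integers modulo 97, the base point and the two secret numbers are made up, offer no security at all, and are not the parameters DNSCurve uses. It needs Python 3.8 or later for the modular inverse via pow.

```python
# Toy curve y^2 = x^3 + A*x + B over the integers modulo a small prime P.
P, A, B = 97, 2, 3
G = (3, 6)  # base point: 6^2 = 36 and 3^3 + 2*3 + 3 = 36


def on_curve(pt):
    """Check the curve equation; None stands for the point at infinity."""
    if pt is None:
        return True
    x, y = pt
    return (y * y - (x * x * x + A * x + B)) % P == 0


def add(p1, p2):
    """Add two points using the standard chord-and-tangent rule."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # a point plus its inverse gives the point at infinity
    if p1 == p2:
        slope = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P  # tangent line
    else:
        slope = (y2 - y1) * pow(x2 - x1, -1, P) % P          # chord line
    x3 = (slope * slope - x1 - x2) % P
    y3 = (slope * (x1 - x3) - y1) % P
    return (x3, y3)


def scalar_mult(k, pt):
    """Compute k * pt by double-and-add."""
    result, addend = None, pt
    while k:
        if k & 1:
            result = add(result, addend)
        addend = add(addend, addend)
        k >>= 1
    return result


# Each side keeps a secret number and publishes secret * G
alice_secret, bob_secret = 13, 29
alice_public = scalar_mult(alice_secret, G)
bob_public = scalar_mult(bob_secret, G)

# Both sides arrive at the same shared point without revealing their secrets
shared_by_alice = scalar_mult(alice_secret, bob_public)
shared_by_bob = scalar_mult(bob_secret, alice_public)
print(shared_by_alice == shared_by_bob)
```

Real deployments use carefully chosen curves over fields of around 256 bits, where recovering a secret from its public point is believed to be infeasible.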

 
