Month: December 2017
Installing xgboost in Windows 10 for Python
!pip install numpy scipy scikit-learn pandas
!pip install deap update_checker tqdm stopit
C:\Users\KOGENTIX>git clone --recursive https://github.com/dmlc/xgboost
Download the DLL from http://www.picnet.com.au/blogs/guido/post/2016/09/22/xgboost-windows-x64-binaries-for-download/
and put it in the xgboost/python-package folder
Change the environment variables so that Python can find the xgboost DLL
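The post does not say which environment variable to change. As a minimal sketch, assuming it is PATH and that changing it for the current Python process is enough, the helper below (add_dll_dir_to_path is a made-up name, not part of xgboost) prepends a folder to PATH. Whether this is sufficient depends on the Python version and on how the installed xgboost locates its library, so treat it as a starting point only.

```python
import os
from pathlib import Path


def add_dll_dir_to_path(dll_dir):
    """Prepend dll_dir to PATH for the current process (hypothetical helper)."""
    dll_dir = str(Path(dll_dir).resolve())
    entries = [e for e in os.environ.get("PATH", "").split(os.pathsep) if e]
    if dll_dir not in entries:
        # Put the folder first so it is searched before other locations.
        os.environ["PATH"] = os.pathsep.join([dll_dir] + entries)
    return os.environ["PATH"]


# Example call; the folder is an assumption, adjust it to where you cloned the repo:
# add_dll_dir_to_path(r"C:\path\to\xgboost\python-package")
# import xgboost
```

The change only lasts for the running process. To make it permanent, edit PATH in the Windows environment variable settings instead.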
3 Python Libraries to Watch
PyTorch - Deep Learning: http://pytorch.org/about/
TPOT - Automated Machine Learning: https://rhiever.github.io/tpot/
projx - Network Analysis: http://davebshow.github.io/projx/getting-started/
Also see https://neo4j.com/developer/python/
How AI will be the future of e-commerce
E-commerce, or electronic commerce, has grown rapidly in the past decade, leveraging the internet to deliver a wide variety of goods and services. It includes players like Amazon, Flipkart and Alibaba that sell a wide variety of products, and players like Pepper Fry that sell furniture. Eventually, electronic commerce is expected to eclipse traditional brick and mortar enterprises.
E-commerce is efficient in multiple ways. It can save on inventory by using warehouses for dispatch and logistics instead of storing goods in showrooms. It can use the data captured by online analytics software to better forecast demand for specific stock keeping units (SKUs). Lastly, it offers room for faster experimentation in interfaces, including things like recommendation engines (i.e. those who bought this book also bought these other books). The online data captured from the customer clickstream can be used to refine pricing and discounts, which are critical in a very competitive market.
E-commerce and Big Data
A large number of customers come to an e-commerce site every day, every hour. They click on certain links, follow certain pages, post reviews, view items (without purchasing), and finally purchase items. This continuous stream of data, called the clickstream, adds up to very large volume and velocity, and the different behaviors create huge variety as well (i.e. some customers view one page and buy, while others view twenty pages before buying). This data is like crude oil: it needs to be refined before the business can act on the insights.
Consider, for example, association analysis or recommendation engines. Past data will be a huge sparse matrix (a matrix where most entries are 0) whose column headers are a huge variety of goods (let's say book titles). By looking at which book titles sell well together, the final book page can show a section such as "people who viewed this also viewed that" or "people who bought this also bought that". This in turn will trigger impulse purchases by future customers.
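A minimal sketch of the "bought this also bought that" idea, using only the Python standard library. The orders and book titles are made-up toy data, and the function names are illustrative. Instead of a full sparse matrix it stores only the non-zero co-purchase counts, which is one simple way to handle sparsity.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy, made-up orders: each set holds the titles bought together in one order.
orders = [
    {"Book A", "Book B", "Book C"},
    {"Book A", "Book B"},
    {"Book B", "Book C"},
    {"Book A", "Book D"},
]


def co_purchase_counts(orders):
    """Count how often each pair of titles appears in the same order."""
    counts = defaultdict(Counter)
    for basket in orders:
        for x, y in combinations(sorted(basket), 2):
            counts[x][y] += 1
            counts[y][x] += 1
    return counts


def also_bought(title, counts, top_n=2):
    """Return the titles most often bought together with the given title."""
    return [other for other, _ in counts[title].most_common(top_n)]


counts = co_purchase_counts(orders)
print(also_bought("Book A", counts, top_n=1))
```

A production system would recompute or update these counts continuously as new orders arrive, and would usually normalise them (for example by how popular each title is overall) rather than rank on raw counts.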
The math behind this is simple, but it cannot be done on static data; it has to be done on rapidly changing data. Thus it needs big data, machine learning and automated interface design together. In addition, we can run A/B tests on the interface to keep it in sync with customer flow.
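One common way to read the result of an A/B test on an interface is a two-proportion z-test on conversion rates. The sketch below assumes that is the comparison wanted (the post does not name a test), and the visitor and conversion numbers in the example call are invented.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of variants A and B.

    Returns the z statistic and the two-sided p-value under the
    normal approximation. Assumes the pooled rate is not 0 or 1.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Invented numbers: 1000 visitors per variant, 100 vs 150 purchases.
z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(round(z, 2), p < 0.01)
```

A small p-value suggests the difference between the two page designs is unlikely to be noise, but the sample size and test duration should be fixed before looking at the data.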
Behind the scenes, data will be stored in a distributed manner on HBase and MySQL, processed using MapReduce and Spark, with MLlib, R and scikit-learn used for machine learning. Big data also helps identify problem pages, where either the search rank is low or the bounce rate is high (customers leave soon after reaching the page).
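As a rough sketch of how a high bounce rate could be spotted, the function below computes a bounce rate per landing page from a click log. It assumes a bounce means a session with exactly one page view (a common definition, not one the post states) and that events arrive in time order within each session. The log itself is invented, and at real scale the same logic would run on Spark or a similar engine rather than in plain Python.

```python
from collections import defaultdict

# Invented click log: (session_id, page), in time order within each session.
events = [
    ("s1", "/home"), ("s1", "/book/1"),
    ("s2", "/book/2"),
    ("s3", "/book/2"), ("s3", "/cart"),
    ("s4", "/home"),
    ("s5", "/book/2"),
]


def bounce_rate_by_landing_page(events):
    """Return {landing_page: share of sessions that viewed only that page}."""
    sessions = defaultdict(list)
    for session_id, page in events:
        sessions[session_id].append(page)

    landings = defaultdict(int)
    bounces = defaultdict(int)
    for pages in sessions.values():
        landing = pages[0]
        landings[landing] += 1
        if len(pages) == 1:  # single-page session counts as a bounce
            bounces[landing] += 1
    return {page: bounces[page] / landings[page] for page in landings}


rates = bounce_rate_by_landing_page(events)
print(rates)
```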
E-commerce and Analytics and Machine Learning
Just like the association analysis example above, e-commerce uses analytics in a wide variety of ways. The following are some of them:
- Inventory forecasts: these use prior data to estimate future purchases. Mostly this relies on time series data, but for the probability of purchase/non-purchase it can use classification algorithms like Naive Bayes as well.
- Logistics optimization: instead of showrooms, e-commerce has websites, but it does have warehouses. Delivering goods on time and in good condition is critical to brand reputation and customer loyalty. This leads to optimization of routes for trucks and delivery boys, a modern application of the classical transportation problem in operations research.
- Analysis of customer behaviour: mouse tracking (heatmaps) or eye tracking to improve the web page interface (see the heatmap of a webpage at https://blog.kissmetrics.com/eye-tracking-studies/ ).
- Dynamic pricing (or discounts): e-commerce depends on discounts and promotions. It takes less than a minute for a customer to open a rival e-commerce site and compare prices (something which was not the case in traditional brick and mortar). By using algorithms for dynamic pricing based on customer behaviour (tracked by cookies), e-commerce tries to tweak profits. Too much discount and profit is lost. Too little discount and a potential customer is lost. So dynamic pricing based on prior behaviour is the key. This can be done using linear models like regressors.
- Classification of large numbers of images of stock keeping units (SKUs), generating tags for a large amount of data (say a few tags for a computer followed by "read more" product info), and building the webpage accordingly. This can be done using deep learning as well as topic modeling for tags.
- Search results pages that give the most relevant result for different keywords: this can use Big Data technologies like Solr.
- Classifying reviews into spam/not spam and looking at sentiment, using text mining.
- Classifying sellers and discarding those who supply low quality products, using reviews.
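To make the dynamic pricing item concrete, here is a toy sketch of the "linear regressor" idea: fit a straight line of units sold against discount, then pick the discount with the highest predicted profit. All numbers (discounts, units, list price, unit cost) are invented for illustration, and a real system would use many more features, such as the customer behaviour mentioned above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Invented history: discount in percent vs units sold at that discount.
discounts = [0, 5, 10, 15, 20]
units_sold = [10, 20, 30, 40, 50]
slope, intercept = fit_line(discounts, units_sold)


def predicted_profit(discount, list_price=100.0, unit_cost=55.0):
    """Predicted profit at a discount, using the fitted demand line."""
    price = list_price * (1 - discount / 100)
    units = slope * discount + intercept
    return (price - unit_cost) * units


# Try whole-percent discounts from 0 to 40 and keep the most profitable one.
best_discount = max(range(0, 41), key=predicted_profit)
print(best_discount, predicted_profit(best_discount))
```

The profit curve rises and then falls as the discount grows, which is the trade-off described in the list: too much discount loses margin, too little loses the sale.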
E-commerce and AI
Both analytics and machine learning are subsets of artificial intelligence. Using AI we can build better discounts (by using xgboost regression rather than OLS regression), better web pages (faster A/B testing or eye-tracking studies), better routes for trucks and delivery boys (and even drones), and better prediction of customer needs (shown as prompts). The key thrust will of course be on deep learning and TensorFlow sitting on top of Hadoop big data for better e-commerce insights. Indeed, the company with the best AI system will dwarf the competition, since everything else in the e-commerce world can be copied easily apart from AI-based analytics.
(Created as part of a blogathon contest by Lymbyc, a company that creates Epoch, an AI-based platform and virtual data scientist, at https://www.lymbyc.com/ )