Setting up a Python environment

yum install python2.7
yum install lapack lapack-devel blas blas-devel

pip install virtualenv
cd ~/src/yourprojectdir
virtualenv -p `which python2.7` venv --distribute
source venv/bin/activate
vi requirements.txt
(add ipython, scipy and other dependencies, one per line)

pip install -r requirements.txt
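To confirm that the activated virtualenv is really the interpreter in use and that the requirements actually import, a small check script can help. This is a minimal sketch that is not part of the original steps; it is written to run under both Python 2.7 and Python 3, and the module list in the usage part is an assumption you should adapt to your own requirements.txt.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a virtual environment."""
    # Classic virtualenv (the tool used above) sets sys.real_prefix;
    # newer virtualenv releases and the stdlib venv module instead make
    # sys.base_prefix differ from sys.prefix.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # These are import names, not pip package names:
    # the ipython package is imported as IPython.
    print("inside virtualenv: %s" % in_virtualenv())
    print("missing: %s" % missing_modules(["IPython", "numpy", "scipy"]))
```

Run it with the venv's python after `source venv/bin/activate`; an empty "missing" list means `pip install -r requirements.txt` did its job.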



Install VirtualBox on CentOS 6.5

I followed these instructions:

1. Change to the root user
su -
2. Install Fedora or RHEL Repo Files
cd /etc/yum.repos.d/
3. Update to the latest packages and check your kernel version
Update packages
yum update
Check that you are running the latest installed kernel version.
The version numbers printed by the following commands should match:
rpm -qa kernel |sort -V |tail -n 1
uname -r
Note: if you got a kernel update, or are running an older kernel than the newest installed one, reboot the machine.
4. Install the following dependency packages
CentOS 6/5 and Red Hat (RHEL) 6/5 need the EPEL repository; install it with the following command:
## CentOS 6 and RHEL 6 ##
rpm -Uvh

5. Install VirtualBox Latest Version 4.3 (currently 4.3.10)
yum install VirtualBox-4.3
This command automatically creates the vboxusers group; every VirtualBox user must be a member of that group.
It also builds the required kernel modules.
You can rebuild the kernel modules with the following command:
service vboxdrv setup
6. Add VirtualBox User(s) to vboxusers Group
Replace user_name with your own user name or another real user name.
usermod -a -G vboxusers user_name
7. Start VirtualBox
- Start from GUI Applications -> … -> VM
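The kernel check in step 3 is easy to get wrong by eye, because `sort -V` compares the numeric parts as numbers (release 431 is newer than release 71, even though "71" sorts after "431" as text). Below is a minimal Python sketch of the same check. It assumes `rpm -qa kernel` prints names like `kernel-2.6.32-431.el6.x86_64`, whose part after `kernel-` matches what `uname -r` prints; the function names are hypothetical, and the sort key only approximates `sort -V`.

```python
import platform
import re
import subprocess


def version_key(version):
    """Split a version into number and text chunks so numbers compare
    numerically (an approximation of `sort -V`, not an exact copy)."""
    parts = re.split(r"(\d+)", version)
    # Tag chunks so an int is never compared with a str.
    return [(0, int(p)) if p.isdigit() else (1, p) for p in parts if p]


def latest_installed(rpm_lines):
    """Pick the newest kernel from `rpm -qa kernel` output lines."""
    prefix = "kernel-"
    versions = [line.strip()[len(prefix):] for line in rpm_lines
                if line.strip().startswith(prefix)]
    return max(versions, key=version_key)


def running_latest(rpm_lines, running):
    """True if the running kernel is the newest installed one."""
    return latest_installed(rpm_lines) == running


def check_running_kernel():
    """Same check as the two shell commands; needs rpm on the host."""
    output = subprocess.check_output(["rpm", "-qa", "kernel"])
    lines = output.decode("utf-8").splitlines()
    # platform.release() returns the same string as `uname -r` on Linux.
    return running_latest(lines, platform.release())
```

On the CentOS host, `check_running_kernel()` returning False means a reboot is due.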

Then I downloaded a ready-to-use VM image:

It is valid for only 30 days, so I plan to buy a licensed copy of Windows 8.1 if it is stable and performs well with 8 GB of RAM. If not, I am going to downgrade to Windows 7 or Vista.

Dimensionality Reduction with the PCA Algorithm

There are a lot of good articles that describe the theory of dimensionality reduction with various algorithms such as PCA. Some of them have really good examples (for instance, this one).
However, to apply and use it, I wanted to develop an intuition: what does it mean, from a mathematical/machine standpoint, to reduce a 132342-dimensional space to, let's say, 2D? After several hours of playing around with the sklearn PCA implementation, I came up with the following representation, which shows the 1st component of the 2-dimensional space:

This is how a machine sees the data. On the left are the 2 untransformed input data samples. On the right are the same samples projected to 2D and then mapped back into the 132342-dimensional space for the 1st component only: simply the 1st element of the 2-element projected array multiplied by the 1st column of the so-called U matrix, which has 132342 elements.
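A minimal sketch of that reconstruction with scikit-learn's PCA, on synthetic data: two hypothetical groups of 1000-dimensional samples stand in for the real 132342-dimensional ones. One assumption to flag: sklearn does not expose a matrix called U; the principal axes are stored as the rows of `pca.components_`, and here they play the role of the "U matrix columns" described above.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(0)
n_features = 1000  # small stand-in for the 132342 dimensions in the post

# Two hypothetical groups of samples, shifted in opposite directions
direction = rng.normal(size=n_features)
X = np.vstack([
    rng.normal(size=(20, n_features)) + 3 * direction,
    rng.normal(size=(20, n_features)) - 3 * direction,
])

pca = PCA(n_components=2, svd_solver="full")
Z = pca.fit_transform(X)  # shape (40, 2): every sample becomes 2 numbers

# Principal axes are the rows of components_, shape (2, n_features)
first_axis = pca.components_[0]

# 1st element of each 2-element vector times the 1st axis: the data as
# seen through the 1st component only, back in n_features dimensions
first_component_view = np.outer(Z[:, 0], first_axis)  # shape (40, n_features)

# Adding the 2nd component's view and the mean gives sklearn's own reconstruction
second_component_view = np.outer(Z[:, 1], pca.components_[1])
reconstruction = pca.mean_ + first_component_view + second_component_view
print(np.allclose(reconstruction, pca.inverse_transform(Z)))  # True
```

`inverse_transform` also adds back `pca.mean_`; the 1st-component picture described above is only the product term, without the mean.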

As you can see, once the data points are projected into 2D there is a clear separation between the different types of data points, which can then be used by a logistic regression classifier.
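Feeding the 2D projections into logistic regression can be sketched as a scikit-learn pipeline. This is an illustration on synthetic, hypothetical data (two labelled groups with 1000 features instead of 132342), not the author's data or code.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rng = np.random.RandomState(0)
n_features = 1000  # stand-in for the 132342 dimensions in the post

# Hypothetical labelled data: two groups shifted in opposite directions
direction = rng.normal(size=n_features)
X = np.vstack([
    rng.normal(size=(20, n_features)) + 3 * direction,
    rng.normal(size=(20, n_features)) - 3 * direction,
])
y = np.array([0] * 20 + [1] * 20)

# PCA reduces each sample to 2 numbers; logistic regression sees only those
model = make_pipeline(PCA(n_components=2, svd_solver="full"), LogisticRegression())
model.fit(X, y)
print(model.score(X, y))  # accuracy on the training data
```

Wrapping both steps in one pipeline means the PCA is fitted only on whatever data `fit` receives, so the same object can later be cross-validated. The score printed here is on the training data only; a real evaluation needs a held-out set.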

Data Science course

A practical 101 course on Big Data from Harvard, taught by Hanspeter Pfister and Joe Blitzstein. I highly recommend it for junior and mid-level developers. Together with Andrew Ng's Machine Learning course and the AI course by Sebastian Thrun and Peter Norvig, it builds a solid foundation for further intelligent Big Data processing.
Course home page

HW0 – Set up the environment:
- Python Environment:

Play 2.0 on Heroku

A few simple steps, and your web application built on the Play 2.0 framework is running in the Heroku cloud.

Install the Play 2.0 framework
Sign up for Heroku and install the Heroku Toolbelt
Create a public/private SSH key pair
Create a new project (play new)
Create a Procfile in the root directory of your project containing a single line (web: target/start -Dhttp.port=${PORT} ${JAVA_OPTS})
Commit to Git (git init, git add ., git commit -m "init")
Log in to Heroku
Create an app on the Cedar stack (heroku create --stack cedar)
Push your changes to Heroku (git push heroku master)
Scale (start) your first web dyno (heroku ps:scale web=1, heroku restart)
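The Procfile maps each process type (here only `web`) to a command, and Heroku assigns the HTTP port at runtime through the `PORT` environment variable, which is why the command passes `${PORT}` to Play. The sketch below only illustrates that format; it is not Heroku's own parser. The function names and the example values are hypothetical, and the expansion helper handles only the `${VAR}` form.

```python
import re

# The single-line Procfile from the steps above
PROCFILE = "web: target/start -Dhttp.port=${PORT} ${JAVA_OPTS}\n"


def parse_procfile(text):
    """Map each process type (e.g. "web") to its command line."""
    processes = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Format: <process type>: <command>; split on the first colon only
        name, _, command = line.partition(":")
        processes[name.strip()] = command.strip()
    return processes


def expand_vars(command, env):
    """Approximate the shell's ${VAR} expansion: unset variables become ''."""
    return re.sub(r"\$\{(\w+)\}", lambda m: env.get(m.group(1), ""), command)


if __name__ == "__main__":
    web = parse_procfile(PROCFILE)["web"]
    # Hypothetical value; on Heroku the platform sets PORT for each dyno
    print(expand_vars(web, {"PORT": "5000"}))
```

With `JAVA_OPTS` unset the option simply disappears, so the app still starts on the assigned port.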


Design high frequency trading system

An approximation of algorithmic trading.

In this post I will outline how high-frequency trading systems (algorithmic trading systems) are designed. The post is based on material from a developer conference held in April 2010.

You can take it as a rough design guide if you are looking for a way to build your own trading system or want to learn how such systems work. There are several posts covering the topic, and each one goes deeper into the technical details of the system.

So what actually is high-frequency trading? The definition varies depending on who describes it, but everyone will agree on the following points:

  • a piece of software running on a system to trade (buy/sell) certain asset classes
  • trading activity beyond the execution abilities of a human
  • a direct connection to a brokerage firm, a stock exchange or another trading network


In essence, this amounts to assuming an optimistic prior probability distribution over the possible variants of the environment, which forces the agent, at first, to behave as if wonderful rewards were scattered everywhere.