Install MySQL on CentOS

How to install MySQL

Install MySQL
yum install mysql-server mysql php-mysql

How to configure MySQL

Set the MySQL service to start on boot
chkconfig --levels 235 mysqld on
Start the MySQL service
service mysqld start
Log into MySQL
mysql -u root
Set the root user password for all local hosts
SET PASSWORD FOR 'root'@'localhost' = PASSWORD('new-password');
SET PASSWORD FOR 'root'@'localhost.localdomain' = PASSWORD('new-password');
SET PASSWORD FOR 'root'@'127.0.0.1' = PASSWORD('new-password');
Drop the anonymous (any) users
DROP USER ''@'localhost';
DROP USER ''@'localhost.localdomain';
Exit MySQL
exit

Install Scala on CentOS

cd /tmp
sudo rpm -i scala-2.11.2.rpm

Scala files are now here
tar xvf scala-2.10.1.tgz
sudo mv scala-2.10.1 /usr/lib
sudo ln -s /usr/lib/scala-2.10.1 /usr/lib/scala
export PATH=$PATH:/usr/lib/scala/bin
scala -version

Setting up a Python environment

yum install python2.7
yum install lapack lapack-devel blas blas-devel

pip install virtualenv
cd ~/src/yourprojectdir
virtualenv -p `which python2.7` venv --distribute
source venv/bin/activate
vi requirements.txt
Add ipython, scipy and other dependencies

pip install -r requirements.txt
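A quick way to confirm that the virtualenv actually picked up the dependencies is to try importing them from inside the activated environment. This is a minimal sketch, not part of the original setup: the module list (IPython, scipy, numpy) is a hypothetical stand-in for whatever you put in requirements.txt, and the script avoids Python-3-only syntax so it should also run under the python2.7 venv.

```python
import importlib


def missing_modules(names):
    """Return the names from `names` that cannot be imported in this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical list: adjust to match your requirements.txt.
    # Import names can differ from pip package names (e.g. ipython -> IPython).
    wanted = ["IPython", "scipy", "numpy"]
    absent = missing_modules(wanted)
    if absent:
        print("Missing: " + ", ".join(absent))
    else:
        print("All requested modules import fine.")
```

Run it with the venv's interpreter (python check_env.py, where the file name is up to you); anything reported as missing usually means the requirements.txt entry was misspelled or pip install failed for that package.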



Dimensionality Reduction with PCA algorithm

There are many good articles that describe the theory of dimensionality reduction with various algorithms such as PCA. Some of them have really good examples (for instance this one:
However, in order to apply and use PCA, I wanted to develop an intuition for what it means, from a mathematical (machine) standpoint, to reduce a 132342-dimensional space to, let's say, 2D. After several hours of playing around with the sklearn PCA implementation, I came up with the following representation, which shows the 1st component of the 2-dimensional space:

This is how a machine sees the data. On the left are two untransformed input data samples. On the right are the same samples projected to 2D and mapped back into the 132342D space using only the 1st component. That mapping is simply the 1st element of the 2-element projection multiplied by the 1st column of the so-called U matrix, which has 132342 elements.
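The per-component mapping described above can be sketched with plain NumPy (Python 3, NumPy 1.17 or newer). This is an illustration under assumptions, not the author's code: it runs SVD on mean-centred data, which is how sklearn's PCA computes its components, and uses random data with 50 features as a stand-in for the 132342-dimensional samples. With samples as rows, the principal axes are the rows of Vt (sklearn exposes them as `components_`); the "column of U" above is the same axis when the data matrix is transposed so that features are rows.

```python
import numpy as np


def pca_fit(X, n_components):
    """Fit PCA via SVD of mean-centred data; rows of X are samples."""
    mean = X.mean(axis=0)
    # Economy SVD: Xc = U * S * Vt, the rows of Vt are the principal axes.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]


def project(X, mean, axes):
    """Map samples into the reduced space (one coordinate per axis)."""
    return (X - mean) @ axes.T


def component_contribution(Z, axes, k):
    """Map only component k back into the original space:
    the k-th coordinate of each sample times the k-th principal axis."""
    return np.outer(Z[:, k], axes[k])


rng = np.random.default_rng(0)
X = rng.normal(size=(10, 50))  # 10 samples, 50 features (stand-in for 132342)

mean, axes = pca_fit(X, n_components=2)
Z = project(X, mean, axes)                   # shape (10, 2): the 2D points
first = component_contribution(Z, axes, 0)   # shape (10, 50): 1st component only

# Mean plus both component contributions gives the rank-2 reconstruction.
approx = mean + first + component_contribution(Z, axes, 1)
print(Z.shape, first.shape, approx.shape)
```

Each row of `first` is what the right-hand picture shows for one sample: a single scalar from the 2D projection scaling one fixed direction in the original space.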

As you can see, once the data points are projected into 2D there is a clear separation between the different types of data points, which can be used as input to a subsequent logistic regression algorithm.
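To illustrate how well-separated 2D projections can feed a logistic regression, here is a small NumPy-only sketch. Everything in it is an assumption for illustration: two synthetic, clearly separated clusters stand in for the projected samples, and plain batch gradient descent stands in for a library solver such as sklearn's `LogisticRegression`.

```python
import numpy as np


def fit_logistic(Z, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the mean log-loss; Z is (n, d), y holds 0/1 labels."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])  # append a bias column
    w = np.zeros(Zb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Zb @ w))    # predicted probabilities
        w -= lr * Zb.T @ (p - y) / len(y)    # gradient step on mean log-loss
    return w


def predict(Z, w):
    """Return 0/1 class labels using the 0.5 probability threshold."""
    Zb = np.hstack([Z, np.ones((Z.shape[0], 1))])
    return (Zb @ w > 0).astype(int)


rng = np.random.default_rng(1)
# Synthetic stand-in for 2D PCA projections: one cluster per class.
Z0 = rng.normal(loc=[-3.0, 0.0], scale=0.5, size=(20, 2))
Z1 = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(20, 2))
Z = np.vstack([Z0, Z1])
y = np.array([0] * 20 + [1] * 20)

w = fit_logistic(Z, y)
accuracy = (predict(Z, w) == y).mean()
print("training accuracy:", accuracy)
```

With only two input features the classifier has three weights to learn, compared with 132343 if it were trained on the raw samples, which is the practical payoff of the reduction.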

Data Science course

A practical 101 course on Big Data from Harvard by Hanspeter Pfister and Joe Blitzstein. I highly recommend it for junior and mid-level developers. Together with Andrew Ng's Machine Learning course and Sebastian Thrun and Peter Norvig's AI course, it builds a solid foundation for further intelligent Big Data processing.
Course home page

HW0 – Setup environment:
- Python Environment:

Play 2.0 on Heroku

A few simple steps, and your web application built on the Play 2.0 framework is running in the Heroku cloud.

Install the Play 2.0 framework
Sign up for Heroku and install the Heroku Toolbelt
Create a public/private SSH key pair
Create a new project (play new)
Create a Procfile in the root directory of your project with a single line (web: target/start -Dhttp.port=${PORT} ${JAVA_OPTS})
Commit to Git (git init, git add ., git commit -m "init")
Log in to Heroku (heroku login)
Create an app on the Cedar stack (heroku create --stack cedar)
Push your changes to Heroku (git push heroku master)
Scale (start) your first web dyno (heroku ps:scale web=1, then heroku restart)