December 20, 2013

Apache BigTop (Hadoop ecosystem installation)

Installing Hadoop and all its related software is a painful experience.  For a development environment, it's probably easiest to use BigTop.  The site describes it as:
    Bigtop is a project for the development of packaging and tests of the Apache Hadoop ecosystem.

    The primary goal of Bigtop is to build a community around the packaging and interoperability testing of Hadoop-related projects. This includes testing at various levels (packaging, platform, runtime, upgrade, etc...) developed by a community with a focus on the system as a whole, rather than individual projects.
Installing Hadoop and all the related software can take days or weeks -- Hadoop/Yarn, ZooKeeper, Pig, Hive, HBase, Mahout, Whirr, Oozie, Sqoop, Hue, Flume, etc.  BigTop makes it possible to install most of them very quickly (in about an hour or less).

You just need to make sure you meet all the requirements and follow the steps correctly.  I've done this on CentOS 6.3 64-bit and it worked fine.

Requirements -
Install Steps -

The latest version is 0.7.0, so for the first step, I did this:

  1. sudo wget -O /etc/yum.repos.d/bigtop.repo
  2. sudo yum install hadoop\* flume-\* mahout\* oozie\* whirr-\* hive\* hue\*
I haven't tested this thoroughly, and it's for a single-server setup.  For a cluster setup, I'm still going to install manually -- I have to know how the systems are structured, which configuration files are involved and where they are located anyway.  For development, this seems to be the way to go for a quick setup.

December 10, 2013

Bitcoin flow in realtime

You can watch bitcoin flow in realtime.  Pretty cool.  I wonder why CN is getting so much despite China's ban. - "watch the world currencies flow into BTC in realtime"

December 6, 2013

Opera 64-bit, v12.16 download

Get it from here:

ssh problem using Vagrant on Windows

Vagrant is a very convenient tool for development tasks.  It's not just a VirtualBox VM management tool; it is also a tool for reusing and sharing VMs.

If you are not sure what it is or why you should use it, please read these:

To use Vagrant, you need three applications:
  1. VirtualBox
  2. Vagrant
  3. PuTTY or Git for Windows to use SSH
When you install Vagrant on Windows and start up the VM, you'll face an issue -- no ssh client is installed by default on Windows, so you can't log onto the VM you just created.  This can be resolved by using PuTTY or the ssh.exe that comes with Git.  You can also use Cygwin's ssh.  Another issue: although you can simply use the default id/password (vagrant/vagrant), to use the private key with PuTTY, the key file needs to be converted into PuTTY's private key format.

Here is an example:

C:\>mkdir vagrant
C:\>cd vagrant
C:\vagrant>vagrant init precise32
C:\vagrant>vagrant up
Bringing machine 'default' up with 'virtualbox' provider...
[default] Importing base box 'precise32'...
[default] Matching MAC address for NAT networking...
[default] Setting the name of the VM...
[default] Clearing any previously set forwarded ports...
[default] Creating shared folders metadata...
[default] Clearing any previously set network interfaces...
[default] Preparing network interfaces based on configuration...
[default] Forwarding ports...
[default] -- 22 => 2222 (adapter 1)
[default] Booting VM...
[default] Waiting for machine to boot. This may take a few minutes...
[default] Machine booted and ready!
[default] The guest additions on this VM do not match the installed version of
VirtualBox! In most cases this is fine, but in rare cases it can
cause things such as shared folders to not work properly. If you see
shared folder errors, please update the guest additions within the
virtual machine and reload your VM.

Guest Additions Version: 4.2.0
VirtualBox Version: 4.3
[default] Mounting shared folders...
[default] -- /vagrant

C:\vagrant>vagrant ssh
`ssh` executable not found in any directories in the %PATH% variable. Is an
SSH client installed? Try installing Cygwin, MinGW or Git, all of which
contain an SSH client. Or use the PuTTY SSH client with the following
authentication information shown below:

Port: 2222
Username: vagrant
Private key: C:/Users/kkim/.vagrant.d/insecure_private_key

As you see here, it can't find ssh.exe and here are a few solutions.  Please note that the default id/password is vagrant/vagrant:

  1. With Git
    • Add path to ssh.exe to PATH environment variable.   If you installed git for Windows from below link (see Download section below), it's here:
      C:\Program Files (x86)\Git\bin
      Add to the environment variable or set it as:
      C:\vagrant>set PATH=%PATH%;C:\Program Files (x86)\Git\bin
      then type,
      C:\vagrant>vagrant ssh
      The downside of this method is that ssh runs within the cmd window, and unless you use something like Console2, the cmd window cannot be resized.  (You probably want to use Console2.  It's a lot better than the default cmd; it provides tabs, and Cygwin and PowerShell can also be used within it.)
  2. With PuTTY
    • Method (a): using PuTTY, ssh to the VM with the default id/password.
    • Method (b): use PuTTY's PuTTYgen to convert the above private key to PuTTY's format, and use that to ssh to the Vagrant VM.
    • Method (c), use this plugin,
      Install it by issuing this command:
      C:\vagrant>vagrant plugin install vagrant-multi-putty
      Then enter,
      C:\vagrant>vagrant putty
      You still need the directory containing putty.exe in your PATH:
      C:\vagrant>set PATH=%PATH%;C:\Program Files (x86)\PuTTY
      When you enter vagrant putty, it spawns a PuTTY window and connects to the VM.
  3. Using Cygwin ssh
    $ ssh -p 2222 localhost -l vagrant
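The "`ssh` executable not found" error above comes down to a PATH lookup.  As a rough sketch (in Python, which is not part of the original setup), the same lookup can be done by hand, and the connection details Vagrant printed can be turned into an ssh command line.  The function names find_ssh and build_ssh_command are made up for this illustration; they are not something Vagrant provides.

```python
import shutil


def find_ssh(extra_dirs=()):
    """Return the full path of an ssh client, or None if none is found.

    Searches PATH first, then any extra directories (for example the
    Git for Windows bin folder mentioned above).
    """
    found = shutil.which("ssh")
    if found:
        return found
    for directory in extra_dirs:
        found = shutil.which("ssh", path=directory)
        if found:
            return found
    return None


def build_ssh_command(port, user, key_path=None, host="localhost"):
    """Build an ssh argument list from the values `vagrant ssh` prints."""
    cmd = ["ssh", "-p", str(port), "-l", user]
    if key_path:
        cmd += ["-i", key_path]
    cmd.append(host)
    return cmd


if __name__ == "__main__":
    # Hypothetical extra location: the Git bin folder from method 1.
    git_bin = r"C:\Program Files (x86)\Git\bin"
    print("ssh client:", find_ssh(extra_dirs=[git_bin]))
    print(" ".join(build_ssh_command(2222, "vagrant")))
```

With port 2222 and user vagrant, the second function produces the same arguments as the Cygwin command in item 3.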
To suspend (saving state) the VM, do:
C:\vagrant>vagrant suspend

To destroy the VM:
C:\vagrant>vagrant destroy

My environment
  • Windows 7, 64-bit
  • Vagrant v1.3.5
  • VirtualBox v4.3.4

November 18, 2013

Open URL from Sublime Text

I keep a lot of text files to track progress and notes on projects, and they contain a lot of URLs.  Some notes have so many URLs that it became a hassle to copy and paste each one into a browser to check the page.  I found this handy tool for the purpose:

I tried on Mac and Windows -- and it works quite well. 

The installation steps are straightforward and well explained -- but I'm entering the steps here as a note to myself.  You'll need a Git client for your OS to follow the steps below.

For Windows:
  1. Open Sublime Text
  2. Press CTRL+`
  3. Copy and paste this (for Sublime Text 2):
    import urllib2,os; pf='Package Control.sublime-package'; ipp = sublime.installed_packages_path(); os.makedirs( ipp ) if not os.path.exists(ipp) else None; urllib2.install_opener( urllib2.build_opener( urllib2.ProxyHandler( ))); open( os.path.join( ipp, pf), 'wb' ).write( urllib2.urlopen( '' +pf.replace( ' ','%20' )).read()); print( 'Please restart Sublime Text to finish installation')
  4. Restart Sublime Text
  5. Select from the menu, Preferences -> Browse Packages.  This will open a folder -- in my case, it is "C:\Users\kkim\AppData\Roaming\Sublime Text 2\Packages"
  6. Open cmd in that folder (or move to that directory)
  7. Run git command (see the above open-url github page):
    git clone --branch st2
  8. Restart Sublime Text.
Use CTRL-u or ALT-double click on URL in text file to open the web page using default browser.
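To give a rough idea of what such a tool does with a notes file, here is a minimal sketch in Python that pulls the URLs out of a block of text.  This is not the plugin's code; the pattern and the extract_urls name are assumptions made up for this illustration, and the pattern is deliberately simple.

```python
import re

# Simple pattern: http:// or https:// followed by characters that are
# not whitespace, angle brackets, parentheses or square brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def extract_urls(text):
    """Return the URLs found in text, in order, without duplicates."""
    seen = []
    for match in URL_PATTERN.findall(text):
        # Drop punctuation that usually belongs to the sentence, not the URL.
        url = match.rstrip(".,;:!?\"'")
        if url and url not in seen:
            seen.append(url)
    return seen


if __name__ == "__main__":
    notes = "Docs: http://example.com/a, mirror (https://example.org/b)."
    for url in extract_urls(notes):
        print(url)
```

Passing one of the results to the standard library's webbrowser.open() opens it in the default browser.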

November 16, 2013

Trying ApacheDS

I've only used LDAP a few times in the past, and now I need to explore some of the features in ApacheDS.  I'm putting some notes here on installing and using it.  The environment used here is CentOS for the server and Windows 7 for the client, both 64-bit.

There are mainly two open source LDAP servers:

To install ApacheDS on CentOS, download the RPM from here:

Below is the install/setup process:

rpm -qpl apacheds-2.0.0-M15-x86_64.rpm
sudo ln -s /etc/init.d/apacheds-2.0.0_M15-default /etc/init.d/apacheds 
sudo chkconfig apacheds on
sudo service apacheds start

ApacheDS failed to start.  The log file showed that it couldn't locate the java executable.  Modify /opt/apacheds-2.0.0_M15/conf/wrapper.conf
to set '' :

# Path to java executable
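To find out which path to put there, something like the following Python sketch can help.  It is only an illustration: it assumes the common convention that JAVA_HOME points at the Java install directory, and the find_java name is made up here, not part of ApacheDS.

```python
import os
import shutil


def find_java(env=None):
    """Return a path to the java executable, or None if not found.

    Checks JAVA_HOME/bin first (assumed convention), then falls back
    to a PATH lookup.
    """
    env = os.environ if env is None else env
    exe_name = "java.exe" if os.name == "nt" else "java"
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", exe_name)
        if os.path.isfile(candidate):
            return candidate
    return shutil.which("java", path=env.get("PATH"))


if __name__ == "__main__":
    print("java executable:", find_java())
```

If this prints None on the server, Java is either not installed or not visible to the account running the script, which would match the startup failure in the log.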

Then use an LDAP client to connect to port 10389, with no authentication.  The LDAP instance files are located at: /var/lib/apacheds-2.0.0_M15/

 LDAP client software

Microsoft Active Directory Explorer
Apache Directory Studio
LDAP Admin

Since I'm using ApacheDS, I want to try Apache Directory Studio here.

If you use this software and just want to test LDAP locally, there is no need to install a separate ApacheDS.  Studio comes with ApacheDS, and you can set up local LDAP servers.

Some readings

MS Strategy for Lightweight Directory Access Protocol (LDAP)

November 1, 2013

October 31, 2013

Setting up ElasticSearch on CentOS: plugin, run as a service

Setting up ElasticSearch is straightforward, but there are a few things to note.

CentOS on VM or a physical machine
- Assumptions: sshd is installed.

1. If it's a fresh install of CentOS, you're accessing it remotely, and you're sure it's on a secure network (not directly connected to the internet), then you can turn off the firewall to reach ElasticSearch's default port, 9200.
$ sudo service iptables save
$ sudo service iptables stop
$ sudo chkconfig iptables off
$ sudo service ip6tables save
$ sudo service ip6tables stop
$ sudo chkconfig ip6tables off

2. Install ElasticSearch
- Download ElasticSearch.
- Uncompress the files on the CentOS server and put them wherever you keep optional third-party software.  I use /opt.

3. Install as a service
# generate wrapper for ElasticSearch
$ curl -L | tar -xz
$ mv *servicewrapper*/service /usr/local/share/elasticsearch/bin/
$ rm -Rf *servicewrapper*
$ sudo /usr/local/share/elasticsearch/bin/service/elasticsearch install

# Add it as a service
$ sudo chkconfig --add elasticsearch

# check if installed as service
$ sudo chkconfig --list | grep elasticsearch

4. Start it as a service
sudo service elasticsearch start
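Once the service is started, a quick way to confirm that the port is actually reachable from the client machine is a plain TCP connection test.  The Python sketch below is an illustration only; the is_port_open name and the "localhost" host are assumptions, and 9200 is the default port mentioned in step 1.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open".
        return False


if __name__ == "__main__":
    # "localhost" is a placeholder; use the CentOS server's address
    # when checking from a remote client.
    print("ElasticSearch reachable:", is_port_open("localhost", 9200))
```

A False result from a remote client while the service is running usually points back at the firewall settings from step 1.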

5. Install the plugins.  The instruction is here.

October 29, 2013


I recently found PicoLisp, another Lisp dialect.  Its notable features are that it is small and fast.  But it's an interpreted environment; it's a "dialect", meaning it is slightly different from Common Lisp; and it uses fixed-point arithmetic, with no floating-point support.

It is an MIT-licensed open source project, and I've compiled it on CentOS 32-bit and 64-bit environments and also in Cygwin.  It runs pretty quickly, and I really like its simplicity.  I'm still learning it and so far it looks pretty good.

The author is extremely helpful, and the mailing list members respond pretty quickly although the group is small.

Lisp + Emacs + Slime on Windows, Lisp Cabinet

LispBox is very much outdated.  Use Lisp Cabinet:


The small installer downloads the packages and Lisp implementations I select, and that's it.  Very convenient.

Neo4j Install on Windows 7 and GUI client

It's been a while since I played with Neo4j -- and this came up today in the discussion with my colleague, so I want to try again.

My main home development system is Windows 7 with lots of Linux VMs, and I have Neo4j running on one of my VPSs on CentOS, so for local development I want to install it on Windows.

The current Neo4j version is 2.0.0M06 (community edition), and it has a 64-bit Windows installer.  Nice.

Neo4j Manual is here:


Ian Robinson - What is a Graph Database? What is Neo4j? from Neo Technology on Vimeo.

And I also found this nice GUI client (windows only):


October 18, 2013

fish shell

I've been using the bash shell for a long time.  I looked around for a better shell, tried a few, and decided to try the fish shell.

The compilation failed on my VPS running CentOS 6 64-bit for some reason ("missing ncurses", even though all the ncurses packages are installed), but I found an RPM from here.  This saves a lot of time mucking around with the source code and makefiles.

fish user document is here,

To change the default shell, issue "chsh" command.

So far it's good.  Try it if you haven't.

October 2, 2013

[Windows] Check your computer time

Amazingly, my Win7 PC was 8 seconds behind.  Windows' default time sync frequency is 7 days.  Here is how to change the frequency:

  1. Open regedit and go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\w32Time\TimeProviders\NtpClient
  2. Open "SpecialPollInterval" to modify it.
  3. Change the base to "Decimal".  The default is 604800 seconds.
  4. I changed this to 28800 (8 hours).
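The values are plain seconds, so any other interval is easy to work out.  A small Python sketch of the arithmetic (the helper name is made up for this illustration):

```python
SECONDS_PER_HOUR = 60 * 60
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR


def hours_to_poll_interval(hours):
    """Convert a number of hours to a SpecialPollInterval value in seconds."""
    return int(hours * SECONDS_PER_HOUR)


if __name__ == "__main__":
    print(7 * SECONDS_PER_DAY)        # the 7-day default: 604800
    print(hours_to_poll_interval(8))  # the 8-hour value used above: 28800
```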

July 27, 2013

OpenCL and Erlang

I recently purchased a used Tesla C2050 on eBay (a new one is still very expensive for personal use) to experiment with GPGPU.  I'm especially interested in using it with Python, Java and Erlang.  I found and tested Python and Java bindings, but for Erlang there aren't many choices.

For Erlang, I found Tony Rogvall's library to be a good choice, as it is more complete than other implementations.

My home development machine is Windows 7 64-bit.  I prefer CentOS for general development and Mac for general use (I have all three at both work and home); however, because of some devices and applications I use, I decided to use Windows.  Windows is actually very convenient when it comes to odd hardware.  On Mac, such hardware is either not supported or needs a workaround, and Linux needs too much tinkering; I'd rather spend that time on actual development.
Anyway, compiling Tony Rogvall's library didn't go well in my environment.  After spending several hours, I finally figured it out.  Again, these difficulties and steps only apply to my case:
  • Windows 7 pro, 64bit.
  • Nvidia CUDA 5 SDK installed. (CUDA 5 SDK comes with OpenCL, not a separate SDK)
    • H/W: NVidia Quadro 600 and Tesla C2050.
  • Erlang R15B02 (erts-5.9.2) 64bit.
  • Visual Studio 2010 Pro.
Other tools necessary:
  • rebar
  • Git
How to compile:
  1. open cmd
  2. "C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\vcvarsall.bat" amd64
  3. git clone
  5. update c_src\Makefile, adding "MCL=1"
  6. rebar compile
Step #2 is for VS2010 and 64bit only.  And step #4 is due to rebar and/or Makefile.  Step#5 is not necessary but Makefile says so.
Hope this helps someone facing the same problem.
Now it's time to actually explore OpenCL in Erlang!

UPDATE: the author confirmed that Makefile is not used.  Therefore step#5 is not necessary.

May 31, 2013

[Link] 10 Object Oriented Design Principles Java Programmer should know

Nice posting on OOD for Java.


  • Don't repeat yourself.
  • Encapsulate what changes.
  • Open closed design principle.
  • Single responsibility principle.
  • Dependency injection or inversion principle.
  • Favor composition over inheritance.
  • Liskov substitution principle.
  • Interface segregation principle.
  • Programming for interface not implementation.
  • Delegation principle.

May 28, 2013

Game: Karateka

The retro game Karateka has been revamped for iOS -- it was a shocking experience seeing Karateka for the first time back in the '80s.  Its realistic animation amazed everyone back then.  I believe it was written by the same programmer who developed Prince of Persia.

Last year (2012), it was revamped for iOS, and also there is the classic Karateka for iOS as well:

New Karateka in iTunes
Classic Karateka in iTunes

I had both apps (the new Karateka and the classic) installed and played them with my son -- he's not too impressed with the classic version.  It's difficult, and there is too much waiting (just like a real Apple II!) between menu selections or between different screens (e.g. restarting the game).

As for the new version, my son loved it, but he finished the game in 1-2 hours.  It's way too simple and easy.  Its graphics are very nice and the game itself is not too violent -- just martial-arts-style fighting.  It is rated 9+ on iTunes.

May 27, 2013

Book: The Most Human Human: What Talking with Computers Teaches Us About What It Means to Be Alive

Amazon's Book Description:
Each year, the AI community convenes to administer the famous (and famously controversial) Turing test, pitting sophisticated software programs against humans to determine if a computer can “think.” The machine that most often fools the judges wins the Most Human Computer Award. But there is also a prize, strange and intriguing, for the “Most Human Human.”

Brian Christian—a young poet with degrees in computer science and philosophy—was chosen to participate in a recent competition. This playful, profound book is not only a testament to his efforts to be deemed more human than a computer, but also a rollicking exploration of what it means to be human in the first place.

* * *

I played with an Eliza clone when I was a kid.  I was fascinated by it at first, but quickly got bored with its simple responses.  Nowadays, it's probably true that everyone has talked to an automated voice system or a support chat-bot on some company's web site.  And there are several chat-bot apps for mobile platforms.

I got to know about the Turing test while studying various AI topics like NLP and ANN.  This book talks about the Turing test and related ideas, and it made me think about languages, intelligence and the meaning of communication more deeply and in different ways.

It is not a technical book full of algorithms and mathematics.  It's an easy read that you can breeze through quickly, yet it offers a lot of insight and many fascinating, thought-provoking stories.

I really enjoyed the book and couldn't wait to get a chance to read it each day.  No wonder this book got so many good reviews on Amazon.

May 26, 2013

Apple II game: Star Blazer

This was one of my favorite games when I was a kid, playing on Apple II.  I was so fascinated by the movement of dropping bombs and the chasing rockets.  I programmed in BASIC and assembly back then, and knew some graphics programming (bitmap, page switching, vector, a bit of 3D) on it -- I was amazed and puzzled how such smooth realistic bomb dropping and chasing rocket movement was programmed.

Those were fun days -- things were simple, and programming was more fun when I was trying to figure out how things were done, algorithm-wise and graphics-wise.

May 25, 2013

Book: Handbook of Neuroevolution Through Erlang

What is Neuroevolution? (Wikipedia):
Neuroevolution, or neuro-evolution, is a form of machine learning that uses evolutionary algorithms to train artificial neural networks. It is useful for applications such as games and robot motor control, where it is easy to measure a network's performance at a task but difficult or impossible to create a syllabus of correct input-output pairs for use with a supervised learning algorithm. In the classification scheme for neural network learning these methods usually belong in the reinforcement learning category.

Amazon's Book Description:
Handbook of Neuroevolution Through Erlang presents both the theory behind, and the methodology of, developing a neuroevolutionary-based computational intelligence system using Erlang. With a foreword written by Joe Armstrong, this handbook offers an extensive tutorial for creating a state of the art Topology and Weight Evolving Artificial Neural Network (TWEANN) platform. In a step-by-step format, the reader is guided from a single simulated neuron to a complete system. By following these steps, the reader will be able to use novel technology to build a TWEANN system, which can be applied to Artificial Life simulation, and Forex trading. Because of Erlang’s architecture, it perfectly matches that of evolutionary and neurocomptational systems. As a programming language, it is a concurrent, message passing paradigm which allows the developers to make full use of the multi-core & multi-cpu systems. Handbook of Neuroevolution Through Erlang explains how to leverage Erlang’s features in the field of machine learning, and the system’s real world applications, ranging from algorithmic financial trading to artificial life and robotics.

* * *

Disclaimer: I'm not an AI expert -- it's just my personal interest.

When I first started to study Erlang, I was very excited that it is a language well suited to ANN.  And while I was reading and studying GA/GP, it seemed natural that ANN and GA/GP should merge.

I wanted to find more on ANN implementations in Erlang, but I found very few resources and examples (maybe I didn't try hard enough) -- unfortunately, they were all very rudimentary ANN implementations, more of a proof of concept.  I also wanted to find a way to combine ANN and GA/GP.  Then I found this book.  It felt like I had struck a gold mine -- I ordered the book right away.  I'm still in the middle of it and I enjoy every minute of reading it.  I highly recommend this book.


Book: C++ Concurrency in Action

When I reviewed this book, C++11 wasn't available yet (it was called C++0x then), and the development branch of GNU g++ didn't support all the features of the threading specification.  I haven't dealt with C++ since then, so I'm not sure how things are in 2013, but it was surely a good read.  From a quick look at Amazon now, it seems the book is in demand and many people purchase it despite the high price.

Well, as a Java developer, Java surely made threading so darn easy.  (But still not easy to get it right!) :-)

May 24, 2013

Book: ElasticSearch Server

As far as I know, this is the only printed book on ElasticSearch.  I think Manning has an early access book on it too, but it hasn't been published yet.  There are many blogs and articles, and the ElasticSearch site has many documents too, but I found the information too scattered and needed a book, and this was the only one available anyway.  It is actually good and very helpful.  However, it seems to have been rushed to print, and I hope the next edition has more special cases and real examples.

ElasticSearch is very impressive software: there are a lot of things it can do, and so many things can be configured.  But there is no decent documentation for it (yet), and googling and reading what's available across the net takes too much time.  The information isn't organized, so it gets confusing, especially after reading so many postings on the web.  The book helped a lot, but for your unique problems it's still better to interact with the community.

What is ElasticSearch? (Wikipedia):
ElasticSearch is a distributed, RESTful, free/open source search server based on Apache Lucene. It is developed by Shay Banon and is released under the terms of the Apache License. ElasticSearch is developed in Java.

May 23, 2013

Book: Web Application Architecture - Principles, Protocols and Practices

I reviewed the Search section with Otis (co-author of Lucene in Action), and read the whole book -- I know the author personally, and he's one of the most intelligent and energetic people I've known.  The book covers most of the internet technologies and common web application architectures, and every developer should read it.

Book: Taming Text

When I was reviewing this book for Manning, I was quite surprised that this book covers all the things I and my colleagues were doing -- NLP, search, entity extraction, categorization, topics, etc...  And with great examples.  If you're working on one of those, you ought to get this book.

May 22, 2013

Book: Hadoop in Action

There are several books on Hadoop now, but this one stands out as a good introductory book.  I reviewed this book for Manning Publications before it was published.  It's a bit old now, so unless you can get it really cheap, I would recommend getting a more recently published Hadoop book. :-)

Book: Lucene in Action

Popular open source search engines such as Solr and ElasticSearch are based on Lucene.  And this is the only book on the Lucene library -- and it's written quite well too.

Book: Erlang and OTP in Action

Erlang got a lot of attention until 2-3 years ago, and I've noticed more demand for Erlang skills nowadays.  As usual for Manning's In Action series, this book introduces Erlang quite well.

What is Erlang? From Wikipedia:
Erlang is a general-purpose concurrent, garbage-collected programming language and runtime system. The sequential subset of Erlang is a functional language, with strict evaluation, single assignment, and dynamic typing. It was designed by Ericsson to support distributed, fault-tolerant, soft-real-time, non-stop applications. It supports hot swapping, so that code can be changed without stopping a system.

By its nature (functional, distributed, with an actor-based concurrency model), it is the way to do the next level of parallel programming, IMO.

Book: HBase in Action

What is HBase?
"Apache HBase is the Hadoop database, a distributed, scalable, big data store."

HBase is an open source, non-relational, distributed database modeled after Google's BigTable and is written in Java. It is developed as part of Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.

And this is a book on HBase.  :-)  I've reviewed this book for Manning, and found it's very easy to follow, like any other Manning's in-action series books.

May 21, 2013

Book: Hibernate Search

It was published in 2009, and a bit old – but regardless it is a good book to have and read.  It shows various techniques using search/lucene in practical ways.
I’ve reviewed this book for Manning publication, and I highly recommend this book.

May 19, 2013

Moving VirtualBox's VM files to a different location

Disclaimer: I use Oracle VirtualBox v4.2.12 (updated recently) on Windows 7 64-bit.  These instructions may not apply to other versions of VirtualBox.

I recently had to move my VM files to a different drive as I was running out of space on my C: drive.  When I googled for how to move VM files to a different location, I only found methods for older versions of VirtualBox that involve editing an XML file, which don't apply to the version I have.  After playing around with it a bit, I figured out how.

Change the default location of VM files

VirtualBox's default configuration is to create VM files on the C: drive, which may not be the desired location.  If no VM has been created yet, the default location can be changed by going to File -> Preferences (Ctrl-G); the "VirtualBox - Settings" dialog box will pop up.  Change the "Default Machine Folder" setting.



Move existing VM files to different location

If a VM has already been created, there is one caveat: the drive file(s) can be moved, but with this method the log files and settings files stay where they were created.  This method doesn't modify the registry or any XML files; it only uses the VirtualBox Manager UI.

  1. Shut down the guest VM.
  2. Copy the VM folder to a different drive or directory.  E.g. I copied the whole directory "C:\Users\kkim\VirtualBox VMs\CentOS-1" to "G:\VirtualBox VMs\CentOS-1". 
  3. In VirtualBox Manager, go to File -> Virtual Media Manager.   (a) Right-click on the drive of the VM you want to move, and select "Release".  The entry should still be there.  (b) Right-click on it again, and select "Remove".  Since the files were already copied, it is safe to delete the old file, but it's not necessary.

  4. In VirtualBox Manager, right-click on the VM and select "Settings".  Click on General, then the "Advanced" tab, and change the Snapshot Folder location to the new VM file location.
  5. Click on Storage, click on the "Add Hard Disk" icon next to "Controller: SATA", and click on "Choose existing disk".
  6. Select the VM disk file you moved, and start the VM. 
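The copy in step 2 can also be scripted.  Below is a minimal Python sketch, assuming the guest VM is already shut down; the copy_vm_folder name is made up for this illustration and the paths in the usage note are the ones from step 2.  It only copies files, so steps 3 to 6 in VirtualBox Manager are still needed.

```python
import shutil
from pathlib import Path


def copy_vm_folder(source, destination_root):
    """Copy a VM folder into destination_root and return the new path.

    The source is left untouched so it can be removed by hand later,
    once the VM is confirmed to start from the new location.
    """
    source = Path(source)
    target = Path(destination_root) / source.name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    # copytree creates the target directory (and missing parents) itself.
    shutil.copytree(source, target)
    return target
```

For the example in step 2, the call would be copy_vm_folder(r"C:\Users\kkim\VirtualBox VMs\CentOS-1", r"G:\VirtualBox VMs").  Refusing to overwrite an existing target is a deliberate choice here, so a half-finished earlier copy is never silently mixed with a new one.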


Pretty simple.

March 14, 2013

ElasticSearch Admin UI plug-ins

Tried a few of them and found these are the most useful ones:

How to install:
$ cd /opt/elasticsearch
$ bin/plugin -install mobz/elasticsearch-head
$ bin/plugin -install lukas-vlcek/bigdesk

After install, go to: