Using Numpy and Pandas on AWS Lambda

I am a big fan of AWS Lambda for running small jobs, but the libraries it ships with are fairly limited, and they certainly do not cover data science or statistics.

So, instead of installing libraries from a remote repository (i.e., with pip), you have to compile the libraries or dependencies offline and then prepare a zip (which includes your dependencies and code) to upload.
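
To make the shape of that zip concrete, here is a minimal Python sketch of packaging a directory so that everything in it lands at the top level of the archive. The function name build_bundle and the directory and file names in the usage comment are hypothetical, and the sketch assumes the output zip is written outside the directory being zipped.

```python
import os
import zipfile


def build_bundle(source_dir, zip_path):
    """Zip everything under source_dir, storing paths relative to it.

    Assumes zip_path is NOT inside source_dir; otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as bundle:
        for root, _dirs, files in os.walk(source_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Relative archive names keep the script and the library
                # folders at the top level of the zip, not nested under
                # the name of the source directory.
                arcname = os.path.relpath(full_path, source_dir)
                bundle.write(full_path, arcname)
    return zip_path


# Hypothetical usage, once the bundle directory has been assembled:
# build_bundle("lambda_bundle", "lambda_bundle.zip")
```

Running zip -r from inside the directory does the same job; the Python version is only handy if you want to script the packaging step.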

The problem is that you must actually compile the libraries on a version of Amazon Linux, not on some other distribution or even on your own computer… So I thought I would post how to do that.

  1. Fire up an Amazon Linux t2.micro EC2 instance (it should be free to spin up).
  2. SSH into the instance using:
    ssh -i <your pem file> ec2-user@<instance ip>
  3. Once in, run the following as a prerequisite:
    sudo yum -y update
    sudo yum -y upgrade
    sudo yum -y groupinstall "Development Tools"
    sudo yum -y install blas --enablerepo=epel
    sudo yum -y install lapack --enablerepo=epel
    sudo yum -y install Cython --enablerepo=epel
    sudo yum -y install python27-devel python27-pip gcc
  4. Do the following to create and enter the virtualenv:
    virtualenv ~/env
    source ~/env/bin/activate

    The output from this step is critical: it tells you where your virtual Python environment is located. You will need this location to copy out the compiled libraries and, eventually, to prepare your bundle.

    It should look like the following:

    [ec2-user@ip-xxxxxx ~]$ virtualenv ~/env
    New python executable in /home/ec2-user/env/bin/python2.7
    Also creating executable in /home/ec2-user/env/bin/python
    Installing setuptools, pip...done.
  5. Then, finally install the packages:
    sudo ~/env/bin/pip2.7 install numpy
    sudo ~/env/bin/pip2.7 install pandas
  6. Now it is time to get everything out. For convenience, zip up both site-packages directories (the lib64 one as site-packages.zip, the lib one as dist-packages.zip). To do this, run the following:
    cd /home/ec2-user/env/lib64/python2.7/site-packages/ && zip -r ~/site-packages.zip .
    
    cd /home/ec2-user/env/lib/python2.7/site-packages/ && zip -r ~/dist-packages.zip .
  7. Copy the two files back to your computer:
    scp -i xxxx.pem ec2-user@xxxxx:~/dist-packages.zip .
    scp -i xxxx.pem ec2-user@xxxxx:~/site-packages.zip .
    
  8. Unzip the two files, and create another directory to hold the Lambda Python script that you would like to execute.
  9. Expand the newly unzipped site-packages and dist-packages folders, and copy these specific folders and files out of both and into the new directory:
    [screenshot: screen-shot-2016-11-24-at-8-11-54-pm]
  10. Put your Lambda Python script into this folder. Note: name your script file lambda_function.py and your main function lambda_handler, since lambda_function.lambda_handler is the default handler Lambda looks for.
  11. Now, if you put your imports (i.e., numpy and pandas) at the top of your file, your script should run properly.
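
If you want to check the bundle before writing real code, a small smoke-test handler can help. The sketch below is one possible lambda_function.py, not the only way to do it. It deliberately imports the libraries inside the handler rather than at the top of the file, which is an assumption made for this test only, so that a broken bundle shows up as a readable message instead of a failed invocation.

```python
import importlib


def lambda_handler(event, context):
    """Report whether the bundled libraries can be imported."""
    report = {}
    for name in ("numpy", "pandas"):
        try:
            module = importlib.import_module(name)
            # Most packages expose __version__; fall back if one does not.
            report[name] = getattr(module, "__version__", "unknown")
        except ImportError as exc:
            # Typically a missing folder in the bundle, or libraries
            # compiled on a platform other than Amazon Linux.
            report[name] = "import failed: {}".format(exc)
    return report
```

Once both entries come back with version numbers, you can switch to ordinary imports at the top of the file as described in step 11.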