AI Development: Tools and Techniques (Part II)


AI Development: Tools and Techniques has two parts:

  1. Part I
  2. Part II

This is the second part of my series on the tools and techniques for AI development that everyone should know. If you have not had the chance to read the first part, I suggest you take a look at Part I first.

In Part I, I talked about the big picture of this series and then introduced five levels of tools and techniques you should be at least aware of:

  1. Operating systems (OS)
  2. Programming languages
  3. Shell scripting languages
  4. Environment tools
  5. Integrated development environments (IDE)

In this part, I talk about the last two items: environment tools and IDEs. Collectively, these tools make it easier and faster for you to systematically automate your workflows, procedures, and routines, and hence be more productive in your projects. Part of productivity is the amount of joy and fun you have in every single moment of your work.

4. Environment Tools

So far we have seen the roles of the OS, programming languages, and user interfaces in AI development. The main goal is to use advanced tools to spend less time on configuration and setup and more time on core development and experimentation. For that, we need to equip ourselves with powerful tools and mechanisms.

Python is very popular nowadays, and the community has created tools for every single part of the Python development workflow. One part is managing virtual environments. What is a virtual environment, and why do we need one? I mostly try to address two of the three critical questions: what and why; the how is mostly yours to work out. When we install an OS, nothing comes installed by nature, so we go ahead and install Python and end up with a fresh, system-wide Python environment. So far everything is fine. The problem arises when we want to install two different libraries or packages that have conflicting dependencies. This is when we wonder what to do. People have bumped into this before and found a solution: the virtual environment. As the name implies, it creates an isolated environment on our system, so each environment keeps its own separate, non-conflicting dependencies. For instance, in one VE I might want to install PyTorch with Python 3.7, while at the same time I need TensorFlow 1.5 with Python 2.7 in another VE. Using a VE management system, I can simply handle this situation and run code with each specification easily.
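To make the "how" a little more concrete, here is a minimal sketch using Python's built-in venv module; the environment path /tmp/demo-env is purely illustrative:

```shell
# Create an isolated virtual environment (path is illustrative)
python3 -m venv /tmp/demo-env

# Activate it: python and pip now resolve to this environment's copies
source /tmp/demo-env/bin/activate

# Print the environment's prefix to confirm we are inside it
python -c "import sys; print(sys.prefix)"

# Leave the environment when done
deactivate
```

Each environment created this way carries its own interpreter and site-packages directory, which is what keeps dependencies from clashing.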

The two most popular VE systems are the Python virtual environment (PVE) and Anaconda. Once you are in a VE, you need to be able to manage packages there: install and uninstall them and, most importantly, check for dependencies and install the latest compatible versions. Package managers handle this for us and make our life easier, so we can focus on the fun parts. Pip is the most common package manager, while Anaconda ships with one of its own. Depending on whether you are working on your desktop system or a system in the cloud, you may need to choose one over the other, so I highly recommend learning to use both very well. I personally like to use Anaconda on my personal system, but at work, depending on the cluster I use, I pick PVE+Pip in most cases. One thing cluster management teams dislike about Anaconda is that it occupies a large amount of storage for each user, while PVE+Pip is more minimal.
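As a side-by-side sketch of the two workflows (environment names, the pinned Python version, and the chosen package are all illustrative, and the conda commands assume Anaconda is installed):

```shell
# --- PVE + Pip workflow ---
python3 -m venv ~/envs/project-a        # create the environment
source ~/envs/project-a/bin/activate    # enter it
pip install numpy                       # pip resolves and installs dependencies
deactivate                              # leave it

# --- Anaconda workflow ---
conda create -y --name project-b python=3.7   # create an env with a pinned Python
conda activate project-b
conda install -y numpy                        # conda resolves dependencies itself
conda deactivate
```

Note that conda can pin the Python interpreter version per environment, which is one reason it is convenient on personal machines; the venv approach reuses whichever Python binaries are already installed on the system.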

One important and very handy skill, especially when you are working with a remote system (in the cloud or on a cluster), is managing your remote SSH shells. To connect to an instance remotely, you typically use SSH, which opens a new shell on the remote instance so you can start interacting with it. The SSH session stays open only as long as the terminal on your local machine stays open; once you close that terminal, the SSH session ends and all running applications on the remote system are killed. To handle this, you need a tool like Tmux, a terminal multiplexer. It not only keeps sessions alive in the background but also lets you open multiple sessions from a single terminal, which spares you from juggling multiple windows and makes switching between sessions easy. Once you have Tmux on the remote instance, you can run code and scripts inside it and safely close the SSH session while the processes keep running in the background. Tools such as Tmuxinator or tmuxp sit on top of Tmux and launch sessions in a particular layout as quickly as possible, automating the process as much as possible. I personally don't use them as much as Tmux itself; I prefer to open sessions and run scripts manually.
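A sketch of that workflow might look like the following; the session name "training" and the script train.py are hypothetical:

```shell
# On the remote instance: start a detached tmux session running the job
tmux new-session -d -s training 'python train.py'

# You can now safely close the SSH connection; the process keeps running.
# After reconnecting later, reattach to check on it:
tmux attach -t training       # detach again with Ctrl-b, then d

# List active sessions, or kill one once the job is done
tmux ls
tmux kill-session -t training
```

The key detail is the -d flag: the session starts detached, so it is never tied to your SSH connection in the first place.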

5. Integrated Development Environments

Last but not least are integrated development environments (IDEs). IDEs integrate various development tools and automation scripts into one unified framework. They were very popular for languages like C/C++ and Java, but scripting languages like Python have changed this a bit. Some developers prefer lightweight, distributed tool sets, while others prefer unified and integrated ones. The first group uses code editors like Vim, Emacs, Atom, and VS Code to write and run code, while for the others PyCharm or Spyder are common choices. A newer approach that has recently gained interest is browser-based IDEs like Jupyter Notebook and JupyterLab. More than a different class of IDEs, they are a different approach to scripting: as you gradually develop your algorithm, you can debug and run your code along the way, much like coding in agile. You then stitch blocks of working code together into larger blocks, functions, and classes. In the other paradigm, you begin from the big picture and gradually dive into the smaller pieces.
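If you take the browser-IDE route on a remote machine, a common pattern is to tunnel the notebook server over SSH; the port number and host name here are illustrative:

```shell
# On the remote instance (ideally inside a Tmux session so it survives disconnects):
jupyter lab --no-browser --port 8888

# On your local machine: forward local port 8888 to the remote server
ssh -N -L 8888:localhost:8888 user@remote-host

# Then open http://localhost:8888 in your local browser.
```

This way the heavy computation stays on the remote instance while the notebook interface runs in your local browser.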

As you can see, there are many pieces to development and programming in general. They were built by people like us who started earlier, bumped into problems and issues, and then thought about how to resolve them and make the framework more fun for others. This attitude of modestly sharing with others is brilliant in the open-source community, and I always admire it. You need to share if you want others to share with you.

To summarize, I presented the following options to set up your system for AI research and development:

  • OS: Linux (Ubuntu, CentOS) – i3 WM
  • Programming language: Python
  • Shell scripting language: Zsh
  • Environment tools: Anaconda and Pip
  • IDE: PyCharm

As always, these are just options to get you started thinking about the choices. You should definitely explore and find the ones that best match your preferences and setup. I have learned that if you want to be a good researcher and developer in AI, you must never stop learning. This is what makes this field so different from others, not only within computer science but across the sciences and engineering disciplines. Let me know what you think about my preferences, and share the ones you find interesting and useful yourself.
