Hello World CUDA program for Jetson TK1

This post is the first in a series introducing image processing and machine learning using C++ and CUDA parallel processing. It covers unboxing the development board, connecting to it remotely, and building, debugging and profiling a hello world CUDA program.

GPUs have become the go-to platform for accelerating machine learning for many applications, from image classification and natural language processing to robotics and UAVs. And the Jetson TK1 Embedded Dev Kit has become a must-have for mobile and embedded parallel computing due to the amazing level of performance packed into such a low-power board.

Unboxing and booting

The development kit comes pre-installed with Ubuntu, the CUDA Toolkit, OpenGL 4.4 drivers, and support for OpenCV, as stated in the official Jetson getting started guide. All you need to do is plug in the power cable, connect a monitor over HDMI, attach a keyboard and push the power button on the board. Ubuntu will then start and be ready for use immediately. The default credentials are ubuntu/ubuntu. Several tutorials on YouTube also guide users through this process. Here’s an example.

Installing JetPackTK1-1.2

Chances are that you will buy a kit with outdated software. The quick fix is to reinstall the software using the all-in-one installer JetPackTK1. The Jetson TK1 Development Pack (JetPack TK1) is an on-demand all-in-one package that bundles and installs all software tools required for NVIDIA® Tegra® K1 development on the NVIDIA Jetson TK1 platform (including flashing your Jetson TK1 with the latest OS images).

The steps for installation are as follows:

  • Download the installer to the Ubuntu 12 or 14 computer on which you plan to develop (not to the board). Newer versions of Ubuntu, such as version 15, are currently not supported by JetPack.
  • Connect your Ubuntu development computer to the board using the micro-USB cable. Neither a network cable nor a monitor connection to the board is required for the installation.
  • Start the installation as described in the official tutorial.
  • Disconnect the micro-USB cable, since you will no longer need it for connecting to the board.

Remote connection

One approach to developing programs for the Jetson TK1 is to connect the board to the network, write your program on the development computer using the software installed by JetPack, and then build, deploy, debug and profile it remotely.

The remote connection can be set up as follows:

  • Connect your board to your local network router or switch using a network cable.
  • Plug a monitor and a keyboard into the board and log in locally. Because the board has only one USB port and one micro-USB port, you’ll most probably have to use Ubuntu with the keyboard only, without a mouse.
  • Open the Desktop Sharing application and configure remote access. This will let you remotely connect using a VNC program. After this you should no longer need the monitor and the keyboard attached to the board.


  • Open a VNC client, such as the default Remmina, on your Ubuntu development computer and configure a new remote connection. The default network name of the board is tegra-ubuntu, and the username/password is ubuntu/ubuntu. You can configure them as shown in the screenshot below. Select the default configuration for the Advanced and SSH tabs.


This VNC remote connection is not required to build programs with the tools provided by JetPack, but it makes it much easier to see what is happening on your board. You can configure an SSH connection in Remmina in the same way, changing only the protocol to SSH.

Writing, building, deploying, debugging and profiling the Hello World CUDA program

JetPack comes with several tools for building CUDA programs, one of them being NVIDIA® Nsight™ Eclipse Edition, a full-featured IDE powered by the Eclipse platform that provides an all-in-one integrated environment to edit, build, debug and profile CUDA-C applications.

Here are the steps to create a simple new CUDA project in sync mode. This is the same as the third option described in the article Remote Application Development.

  • To synchronize CUDA projects between the host and target systems, you need to configure git on both systems. This typically means setting an identity with git config --global user.name and git config --global user.email.

  • Start the Nsight IDE. Under Linux, execute the nsight shell command, or /usr/local/cuda-6.5/bin/nsight if it is not found in the path.
  • Click on File->New->CUDA C/C++ Project.
  • Select Empty project, choose a Project name and optionally a Workspace location, and click Next.


  • Select Generate PTX Code 3.0 and GPU Code 3.2 and click Next.
  • Configure the target as shown in the screenshot below to write code locally and to build (compile) and debug remotely. If you choose a folder under /tmp, the files will be erased after each board restart.


  • Create a new source file main.cu. The file name should end with the .cu extension so that the nvcc compiler can separate the GPU code from the CPU code.

  • Enter the sample source code, which 1) declares an array of 96 integers, 2) copies the array into GPU global memory, 3) launches a thread block of 96 threads that compute the cube of each element in parallel, 4) copies the result array back into CPU memory and 5) prints the array. The complete explanation of this example can be found in this online class Lesson 1 – Cubing Numbers Using CUDA.

  • Switch to remote build by clicking on the arrow near the debug icon, select the Debug option under ubuntu@tegra-ubuntu and click on the icon. The build should first connect to the board using SSH, then copy the files using Git and finally build them on the board itself. The output should be similar to this one.

  • To run the application remotely, right-click on the project -> Run As -> Remote C/C++ Application. In the config window, keep all defaults and finish. The application will execute on the board and the console should look like this.

  • To debug the application, click on its corresponding icon and select ‘Hello-World-CUDA on ubuntu_tegra-ubuntu’.


  • The perspective switches to Debugging, where the CPU and GPU code, memory and threads can be inspected.


  • To profile the application, click on the Profile button and select the Release version of ‘Hello-World-CUDA on ubuntu_tegra-ubuntu’. This will start the NVIDIA Visual Profiler.
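For reference, the main.cu described in the steps above (96 integers cubed by one block of 96 threads) could look roughly like the following. This is a minimal sketch rather than the original listing: the names cube, CUDA_CHECK, h_in, h_out, d_in and d_out are illustrative choices, and the int element type is an assumption taken from the "96 integers" in the description.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): abort with a readable
// message if a CUDA runtime call returns an error.
#define CUDA_CHECK(call)                                            \
    do {                                                            \
        cudaError_t err = (call);                                   \
        if (err != cudaSuccess) {                                   \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",            \
                    cudaGetErrorString(err), __FILE__, __LINE__);   \
            exit(EXIT_FAILURE);                                     \
        }                                                           \
    } while (0)

// Kernel: each thread cubes exactly one element of the input array.
// With a single block, threadIdx.x is the element index.
__global__ void cube(int *d_out, const int *d_in)
{
    int idx = threadIdx.x;
    int v = d_in[idx];
    d_out[idx] = v * v * v;
}

int main()
{
    const int ARRAY_SIZE = 96;
    const size_t ARRAY_BYTES = ARRAY_SIZE * sizeof(int);

    // 1) Declare the input and output arrays on the host (CPU).
    int h_in[ARRAY_SIZE];
    int h_out[ARRAY_SIZE];
    for (int i = 0; i < ARRAY_SIZE; ++i) {
        h_in[i] = i;
    }

    // 2) Allocate GPU global memory and copy the input array to it.
    int *d_in = NULL;
    int *d_out = NULL;
    CUDA_CHECK(cudaMalloc((void **)&d_in, ARRAY_BYTES));
    CUDA_CHECK(cudaMalloc((void **)&d_out, ARRAY_BYTES));
    CUDA_CHECK(cudaMemcpy(d_in, h_in, ARRAY_BYTES, cudaMemcpyHostToDevice));

    // 3) Launch one thread block of 96 threads.
    cube<<<1, ARRAY_SIZE>>>(d_out, d_in);
    CUDA_CHECK(cudaGetLastError());

    // 4) Copy the result back; this cudaMemcpy waits for the kernel to finish.
    CUDA_CHECK(cudaMemcpy(h_out, d_out, ARRAY_BYTES, cudaMemcpyDeviceToHost));

    // 5) Print the cubes, four values per line.
    for (int i = 0; i < ARRAY_SIZE; ++i) {
        printf("%d%s", h_out[i], (i % 4 == 3) ? "\n" : "\t");
    }

    CUDA_CHECK(cudaFree(d_in));
    CUDA_CHECK(cudaFree(d_out));
    return 0;
}
```

Because the input is filled with 0 through 95, the program prints the cubes of those values. The error-checking macro is optional for a hello world, but it makes a failed allocation or launch visible in the remote console instead of silently printing garbage.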


Git Source Code


