# TensorFlow with Intel RealSense Cameras

## Introduction

TensorFlow is an extremely popular open-source platform for machine learning. This tutorial series will highlight different ways TensorFlow-based machine learning can be applied with Intel RealSense Depth Cameras. We'll be using Python as the language of choice, but the same concepts can be easily ported to other languages.

## Installation

We'll need the following components:

1. `python 3.6` - [download page](https://www.python.org/downloads/release/python-360/). Version 3.6 was chosen due to its compatibility with the components below.
2. `pyrealsense2` - on x86 Linux and Windows platforms it can be installed by running `pip install pyrealsense2`. For additional installation instructions please see the [official documentation](https://github.com/IntelRealSense/librealsense/tree/master/wrappers/python#installation). We'll be using `pyrealsense2` to communicate with the camera and fetch frames from the device.
3. `numpy` - We'll be using NumPy for image storage and manipulation. Install via `pip install numpy`.
4. `opencv` - We'll be using OpenCV for loading and saving images, basic image processing, and inference in some examples. OpenCV can be installed via `pip install opencv-python`.
5. `tensorflow` - TensorFlow is the main focus of this set of tutorials. We'll be using TensorFlow version TODO, or TensorFlow-GPU version 2.2.0. We'll also be using the version of the Keras library bundled inside the TensorFlow installation. Keras offers a set of declarative APIs that simplify network declaration and improve readability.

> **Note on GPU Support**: In order to run TensorFlow with GPU acceleration on NVidia GPUs you need to install the `tensorflow-gpu` python package and compatible versions of the CUDA and cuDNN libraries. [List of compatible combinations](https://www.tensorflow.org/install/source_windows#gpu)
We assume you are already familiar with the basics of operating Intel RealSense devices in Python. Please see the [official documentation](https://github.com/IntelRealSense/librealsense/tree/development/wrappers/python#python-wrapper) for more information and code samples.
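
As a quick sanity check (not part of the original examples), you can confirm that everything imports and that a camera is visible; a minimal sketch:

```py
import pyrealsense2 as rs
import numpy as np
import cv2
import tensorflow as tf

print("NumPy:", np.__version__)
print("OpenCV:", cv2.__version__)
print("TensorFlow:", tf.__version__)
# List any connected RealSense devices
print("RealSense devices:",
      [d.get_info(rs.camera_info.name) for d in rs.context().query_devices()])
```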

## Part 1 - Object Detection and Classification
The Intel RealSense camera can be used for object detection and classification with TensorFlow like any other video source. [Example 1](example1%20-%20object%20detection.py) shows standard object detection using TensorFlow on data from the RGB sensor.

In order to run this example, you will need a model file. Please download and extract one of the models from the [TensorFlow-Object-Detection-API](https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API#use-existing-config-file-for-your-model) page. We are using [Faster-RCNN Inception v2](https://arxiv.org/pdf/1611.10012.pdf) for this example (TODO: Check), but other networks can be easily swapped in. The extracted `frozen_inference_graph.pb` is expected to be in the working directory when running the script.

The code should be familiar to anyone who has worked with TensorFlow before. We start by creating a Graph object and loading it from file:

```py
# Load the Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.compat.v1.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.compat.v1.import_graph_def(od_graph_def, name='')
    sess = tf.compat.v1.Session(graph=detection_graph)
```

Next, we will initialize the relevant input and output vectors needed in this sample:

```py
# Input tensor is the image
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
# Output tensors are the detection boxes, scores, and classes
# Each box represents a part of the image where a particular object was detected
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
# Each score represents level of confidence for each of the objects.
# The score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')

# Number of objects detected
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
```

We'll start the camera by specifying the type of stream and its resolution:

```py
pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.color, 1280, 720, rs.format.bgr8, 30)
```
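
The script presumably also starts the pipeline and records the color stream resolution into the `W` and `H` variables used later; a minimal sketch under that assumption:

```py
# Start streaming and query the color stream resolution (used as W, H below)
profile = pipeline.start(config)
color_profile = profile.get_stream(rs.stream.color).as_video_stream_profile()
W, H = color_profile.width(), color_profile.height()
```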

Inside the main loop we will get color data from the camera and convert it into a NumPy array:

```py
frames = pipeline.wait_for_frames()
color_frame = frames.get_color_frame()
color_image = np.asanyarray(color_frame.get_data())
```
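
The detection graph expects a batch of images, so the single frame gets a leading batch dimension before inference; this is the `image_expanded` used in the next step (a minimal sketch):

```py
# Add a batch dimension: (H, W, 3) -> (1, H, W, 3)
image_expanded = np.expand_dims(color_image, axis=0)
```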

Next we can perform inference using our TensorFlow session:

```py
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: image_expanded})
```

Finally, we will assign a random persistent color to each detection class and draw a bounding box around the object. We filter out low-confidence predictions using the `score` output.

```py
for idx in range(int(num)):
    class_ = classes[idx]
    score = scores[idx]
    box = boxes[idx]
    if class_ not in colors_hash:
        colors_hash[class_] = tuple(np.random.choice(range(256), size=3))
    if score > 0.8:
        left = box[1] * W
        top = box[0] * H
        right = box[3] * W
        bottom = box[2] * H

        width = right - left
        height = bottom - top
        bbox = (int(left), int(top), int(width), int(height))
        p1 = (int(bbox[0]), int(bbox[1]))
        p2 = (int(bbox[0] + bbox[2]), int(bbox[1] + bbox[3]))
        # draw box
        r, g, b = colors_hash[class_]
        cv2.rectangle(color_image, p1, p2, (int(r), int(g), int(b)), 2, 1)
```
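
To watch the annotated stream, the frame can be shown in an OpenCV window inside the same loop; a minimal sketch (the window name and exit key are arbitrary choices, not part of the original example):

```py
# Display the annotated color image; press 'q' to stop
cv2.imshow("RealSense object detection", color_image)
if cv2.waitKey(1) & 0xFF == ord('q'):
    break
```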

Expected output:

TODO

## Part 2 - Augmenting output using Depth data

Since Intel RealSense Cameras also offer per-pixel depth information, we can use this extra data to solve additional problems related to our detection and classification example. In [Example 2](example2%20-%20object%20distances.py) (TODO: Rename) we'll use color data to detect people and depth data to quickly estimate the height of each person.

In this example we will configure the depth stream in addition to color:
```py
config.enable_stream(rs.stream.depth, 848, 480, rs.format.z16, 30)
```

We'll also need `pointcloud` and `align` helper objects for depth data manipulation:

```py
aligned_stream = rs.align(rs.stream.color) # alignment between color and depth
point_cloud = rs.pointcloud()
```

Inside the main loop we will first make sure depth data is aligned to the color sensor viewport and then generate an array of XYZ coordinates instead of raw depth:
```py
frames = aligned_stream.process(frames)
depth_frame = frames.get_depth_frame()
points = point_cloud.calculate(depth_frame)
verts = np.asanyarray(points.get_vertices()).view(np.float32).reshape(-1, W, 3) # xyz
```

This allows us to query the XYZ coordinates of each detected object and to separate the individual coordinates (in meters):

```py
obj_points = verts[int(bbox[1]):int(bbox[1] + bbox[3]), int(bbox[0]):int(bbox[0] + bbox[2])].reshape(-1, 3)
zs = obj_points[:, 2]
ys = obj_points[:, 1]
```

To avoid outliers we will delete any Y values corresponding to Z values far away from the Z median. This ensures the background has minimal interference with our calculation.

```py
z = np.median(zs)
ys = np.delete(ys, np.where((zs < z - 1) | (zs > z + 1)))
```

Assuming the camera is horizontal, a person's height can be approximated by the extent of the detection in the Y direction. This can be easily calculated using the max and min Y values:

```py
my = np.amin(ys, initial=1)
My = np.amax(ys, initial=-1)
height = (My - my)
```
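
One way to surface the estimate is to draw it next to the person's bounding box; a minimal sketch (text placement and formatting are arbitrary):

```py
# Report the estimated height (in meters) above the bounding box
height_txt = "%.2f m" % height
cv2.putText(color_image, height_txt, (p1[0], p1[1] - 5),
            cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)
```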

Expected output: TODO

## Part 3 - Deploying TensorFlow model using OpenCV

While TensorFlow is convenient to install and use, deploying it is not as convenient as deploying OpenCV. OpenCV is ported to most platforms and is well optimized for various types of CPUs. It also comes with a built-in DNN module capable of loading and running TensorFlow models without having TensorFlow (or its dependencies) installed.

[Example 3](example3%20-%20opencv%20deploy.py) is functionally equivalent to Example 2, but instead of using TensorFlow APIs directly it loads and runs inference using OpenCV.

In addition to the model file, you will need the `pbtxt` file accompanying the model. This file can be found [at this link](https://github.com/opencv/opencv/wiki/TensorFlow-Object-Detection-API#use-existing-config-file-for-your-model).

```py
net = cv2.dnn.readNetFromTensorflow(r"frozen_inference_graph.pb",
                                    r"faster_rcnn_inception_v2_coco_2018_01_28.pbtxt")
```

After converting the color image to a NumPy array, inference can be done as follows:

```py
scaled_size = (int(W), int(H))
net.setInput(cv2.dnn.blobFromImage(color_image, size=scaled_size, swapRB=True, crop=False))
detections = net.forward()
```

The resulting `detections` array will contain all detections and their associated information.
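
For detection models of this kind, the OpenCV DNN module typically returns a blob of shape `[1, 1, N, 7]`, where each row holds `[batchId, classId, score, left, top, right, bottom]` with box coordinates normalized to [0, 1]; a minimal sketch of iterating over it (the 0.8 threshold mirrors the earlier examples):

```py
for detection in detections[0, 0]:
    score = float(detection[2])
    if score > 0.8:
        class_id = int(detection[1])
        # Convert normalized coordinates to pixels
        left, top = detection[3] * W, detection[4] * H
        right, bottom = detection[5] * W, detection[6] * H
```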

## Part 4 - Training on Depth data using TensorFlow

#### Problem Statement
To improve the functionality of robots and drones it is essential to know the accurate depth of objects. In ds5 cameras, infrared is used to measure object depth, and this measurement is disturbed by environmental noise (e.g. electromagnetic waves, sound, heat, etc.), which negatively impacts the measured distance to objects. To reduce this noise, we built a denoising autoencoder based on the U-Net network architecture.
In this tutorial we'll show how to train a network for depth denoising and hole filling. Please note this project is provided for educational purposes and the resulting depth is not claimed to be superior to the camera output. Our goal is to document the end-to-end process of developing a new network using depth data as its input and output.

#### Unet Network Architecture
Unet is a deep learning architecture commonly used in image segmentation, denoising and inpainting applications. For the original paper, please refer to [U-Net: Convolutional Networks for Biomedical Image Segmentation](https://arxiv.org/pdf/1505.04597.pdf).
Unet offers significant advantages compared to classic autoencoder architecture,
![foxdemo](images/Unet.PNG)
###### The image is taken from the article referred to above.
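
As an illustration of the idea only (not the exact network trained in this project), a minimal Keras sketch of a U-Net-style encoder/decoder with skip connections could look like this:

```py
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(128, 128, 1)):
    inputs = tf.keras.Input(shape=input_shape)
    # Contracting path: extract features while reducing resolution
    c1 = layers.Conv2D(16, 3, activation='relu', padding='same')(inputs)
    p1 = layers.MaxPooling2D()(c1)
    c2 = layers.Conv2D(32, 3, activation='relu', padding='same')(p1)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck
    b = layers.Conv2D(64, 3, activation='relu', padding='same')(p2)
    # Expanding path: restore resolution, reusing skip connections
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding='same')(b)
    c3 = layers.Conv2D(32, 3, activation='relu', padding='same')(layers.concatenate([u2, c2]))
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding='same')(c3)
    c4 = layers.Conv2D(16, 3, activation='relu', padding='same')(layers.concatenate([u1, c1]))
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(c4)
    return tf.keras.Model(inputs, outputs)
```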


In the left pathway of Unet, the number of filters (features) increases as we go down, which means the network becomes very good at detecting more and more features. The first few layers of a convolutional network capture only a small amount of semantic information and lower-level features; as you go down, these features become larger and larger, but when we throw away information the CNN
The output is a BAG file that could be opened by the RealSense Viewer.
![foxdemo](images/conver_to_bag.PNG)


## Part 5 - Applying trained network to real data

[Example 5](https://github.com/nohayassin/librealsense/blob/tensorflow/wrappers/tensorflow/example5%20-%20denoise.py) shows how to use the trained network from Part 4 on live data from an Intel RealSense camera. It can be invoked as follows:

```
python camera_simulation.py <path to the model>
```
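
Under the hood, the script loads the trained Keras model and runs prediction on each incoming depth frame; a rough sketch of that step, assuming a single-channel depth input normalized to [0, 1] (the exact preprocessing may differ):

```py
import numpy as np
import tensorflow as tf

# model_path and depth_frame are assumed to come from the surrounding script
model = tf.keras.models.load_model(model_path)

# Normalize the 16-bit depth image and add batch/channel dimensions
depth = np.asanyarray(depth_frame.get_data()).astype(np.float32)
sample = (depth / 65535.0)[None, ..., None]
predicted = model.predict(sample)[0, ..., 0]
```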

Expected output is the original frame and model prediction given it as an input.

(TODO: Better image, also what about IR??)

![foxdemo](images/camera_simulation.PNG)

It also shows time statistics for each frame:

![foxdemo](images/camera_simulation_performance.PNG)


## Conclusions

This article shows a small number of examples for using deep learning together with Intel RealSense hardware. It is intended to be further extended and you are welcome to propose enhancements and new code samples. You are also free to use the provided sample code, dataset and model for research or commercial use, in compliance with the Intel RealSense SDK 2.0 [License](https://github.com/IntelRealSense/librealsense/blob/master/LICENSE).

