Documentation updates for branch 0.2 #617

Merged: 5 commits, Aug 28, 2020
27 changes: 14 additions & 13 deletions docs/FAQ.md
@@ -27,8 +27,8 @@ these changes and release updates as quickly as possible.
### Which distributions are supported?

The RAPIDS Accelerator for Apache Spark officially supports
[Apache Spark](get-started/getting-started.md),
[Databricks Runtime 7.0](get-started/getting-started-with-rapids-accelerator-on-databricks.md)
[Apache Spark](get-started/getting-started-on-prem.md),
[Databricks Runtime 7.0](get-started/getting-started-databricks.md)
and [Google Cloud Dataproc](get-started/getting-started-gcp.md).
Most distributions based off of Apache Spark 3.0.0 should work, but because the plugin replaces
parts of the physical plan that Apache Spark considers to be internal the code for those plans
@@ -37,7 +37,7 @@ set up testing and validation on their distributions.

### What is the right hardware setup to run GPU accelerated Spark?

Reference Architectures should be available around Q4 2020.
Reference architectures should be available around Q4 2020.

### What CUDA versions are supported?

@@ -75,7 +75,7 @@ speedup, with a 4x speedup typical. We have seen as high as 100x in some specifi
* Writing Parquet/ORC
* Reading CSV
* Transcoding (reading an input file and doing minimal processing before writing it out again,
possibly in a different format, like CSV to parquet)
possibly in a different format, like CSV to Parquet; see the sketch below)
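
A minimal sketch of the transcoding case above, assuming a cluster where the RAPIDS Accelerator is already configured; the input and output paths are placeholders, not taken from the docs:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: read CSV and write it back out as Parquet with minimal processing.
// Assumes the RAPIDS Accelerator is already configured on the cluster;
// the input and output paths are placeholders.
val spark = SparkSession.builder().appName("csv-to-parquet-transcode").getOrCreate()

val df = spark.read
  .option("header", "true")   // minimal processing: just parse the header row
  .csv("/data/input_csv")

df.write
  .mode("overwrite")
  .parquet("/data/output_parquet")
```

The heavy lifting here is exactly the accelerated pieces listed above: the CSV read and the Parquet write.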

### Are there initialization costs?

@@ -114,8 +114,9 @@ Yes, DPP still works. It might not be as efficient as it could be, and we are w

### Is Adaptive Query Execution (AQE) Supported?

We are in the process of making sure AQE works. Some parts work now, but other parts require some
changes to the internals of Spark, that we are working with the community to be able to support.
In the 0.2 release, AQE is supported, but all exchanges will default to the CPU. As of the 0.3
release, when running on Spark 3.0.1 and higher, any operation that is supported on the GPU will stay on
the GPU when AQE is enabled.
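
As a sketch only, with the caveat that which operations stay on the GPU depends on the Spark and plugin versions described above, AQE is enabled through the standard Spark setting:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: a session with the RAPIDS Accelerator plugin and AQE both enabled.
// Whether individual operations stay on the GPU under AQE depends on the
// Spark/plugin versions, as described above.
val spark = SparkSession.builder()
  .appName("aqe-with-rapids")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")  // RAPIDS Accelerator plugin class
  .config("spark.sql.adaptive.enabled", "true")           // turn on Adaptive Query Execution
  .getOrCreate()
```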

### Are cache and persist supported?

@@ -127,28 +128,28 @@ the Spark community on changes that would allow us to accelerate compression whe
No, that is not currently supported. It would require much larger changes to Apache Spark to be able
to support this.

### Is pyspark supported?
### Is PySpark supported?

Yes

### Are the R APIs for Spark supported?

Yes, but we don't actively test them.

## Are the Java APIs for Spark supported?
### Are the Java APIs for Spark supported?

Yes, but we don't actively test them.

## Are the Scala APIs for Spark supported?
### Are the Scala APIs for Spark supported?

Yes

## Is the GPU needed on the driver? Are there any benefits to having a GPU on the driver?
### Is the GPU needed on the driver? Are there any benefits to having a GPU on the driver?

The GPU is not needed on the driver and there is no benefit to having one available on the driver
for the RAPIDS plugin.

## How does the performance compare to DataBricks' DeltaEngine?
### How does the performance compare to Databricks' Delta Engine?

We have not evaluated the performance yet. Delta Engine is not open source, so any analysis needs to
be done with Databricks in some form. When Delta Engine is generally available and the terms of
@@ -186,7 +187,7 @@ for this issue.
To fix it, you can either disable the IOMMU, or disable the use of pinned memory by setting
[spark.rapids.memory.pinnedPool.size](configs.md#memory.pinnedPool.size) to 0.
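
A hedged sketch of the second workaround; plugin settings like this are normally passed at application launch (for example via `--conf` on `spark-submit`), so the builder form below is just for illustration:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: disable the pinned memory pool to work around the IOMMU issue above.
val spark = SparkSession.builder()
  .appName("rapids-without-pinned-pool")
  .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
  .config("spark.rapids.memory.pinnedPool.size", "0")  // 0 disables pinned memory
  .getOrCreate()
```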

# Is speculative execution supported?
### Is speculative execution supported?

Yes, speculative execution in Spark is fine with the RAPIDS accelerator plugin.

@@ -196,4 +197,4 @@ to see how often task speculation occurs and how often the speculating task (i.e
later) finishes before the slow task that triggered speculation. If the speculating task often
finishes first, that's good; it is working as intended. If many tasks are speculating but the
original task always finishes first, this is a pure loss; the speculation is adding load to
the Spark cluster with no benefit.
the Spark cluster with no benefit.
65 changes: 36 additions & 29 deletions docs/compatibility.md
@@ -177,7 +177,7 @@ For reads when `spark.sql.legacy.parquet.datetimeRebaseModeInWrite` is set to `C
between the Julian and Gregorian calendars are wrong, but dates are fine. When
`spark.sql.legacy.parquet.datetimeRebaseModeInWrite` is set to `LEGACY`, however both dates and
timestamps are read incorrectly before the Gregorian calendar transition as described
[here]('https://github.com/NVIDIA/spark-rapids/issues/133).
[here](https://github.com/NVIDIA/spark-rapids/issues/133).

When writing `spark.sql.legacy.parquet.datetimeRebaseModeInWrite` is currently ignored as described
[here](https://github.com/NVIDIA/spark-rapids/issues/144).
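
For reference, a hedged sketch of how the rebase mode is normally set on a session; note that, per the text above, the GPU write path currently ignores this setting. The SparkSession `spark`, the DataFrame `df`, and the output path are placeholders:

```scala
// Sketch: request proleptic Gregorian (no rebase) dates/timestamps when writing Parquet,
// given an existing SparkSession `spark` and DataFrame `df` (both placeholders).
spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
df.write.mode("overwrite").parquet("/data/events_parquet")
```
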
@@ -193,6 +193,13 @@ The plugin supports reading `uncompressed`, `snappy` and `gzip` Parquet files an
fall back to the CPU when reading an unsupported compression format, and will error out
in that case.

## Regular Expressions
The RAPIDS Accelerator for Apache Spark currently supports string literal matches, not wildcard
matches.

If a null character (`'\0'`) is in a string that is being matched by a regular expression, `LIKE` sees it as
the end of the string. This will be fixed in a future release; the issue is tracked [here](https://github.com/NVIDIA/spark-rapids/issues/119).
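
As a hedged illustration of the literal-versus-wildcard distinction, here is a sketch using `regexp_replace`; which expressions actually run on the GPU is governed by the plugin's supported-operations list, not by this snippet:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, regexp_replace}

val spark = SparkSession.builder().appName("regex-literal-sketch").getOrCreate()
import spark.implicits._

val df = Seq("spark rocks", "rapids").toDF("name")

// A literal search string: the kind of match described as supported above.
val literal = df.withColumn("fixed", regexp_replace(col("name"), "spark", "rapids"))

// A regex with wildcards: the kind of match described as unsupported above.
val wildcard = df.withColumn("fixed", regexp_replace(col("name"), "spa.*", "rapids"))
```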

## Timestamps

Spark stores timestamps internally relative to the JVM time zone. Converting an
@@ -235,18 +242,18 @@ The following formats/patterns are supported on the GPU. Timezone of UTC is assu

| Format or Pattern | Supported on GPU? |
| --------------------- | ----------------- |
| `"yyyy"` | Yes. |
| `"yyyy-[M]M"` | Yes. |
| `"yyyy-[M]M "` | Yes. |
| `"yyyy-[M]M-[d]d"` | Yes. |
| `"yyyy-[M]M-[d]d "` | Yes. |
| `"yyyy-[M]M-[d]d *"` | Yes. |
| `"yyyy-[M]M-[d]d T*"` | Yes. |
| `"epoch"` | Yes. |
| `"now"` | Yes. |
| `"today"` | Yes. |
| `"tomorrow"` | Yes. |
| `"yesterday"` | Yes. |
| `"yyyy"` | Yes |
| `"yyyy-[M]M"` | Yes |
| `"yyyy-[M]M "` | Yes |
| `"yyyy-[M]M-[d]d"` | Yes |
| `"yyyy-[M]M-[d]d "` | Yes |
| `"yyyy-[M]M-[d]d *"` | Yes |
| `"yyyy-[M]M-[d]d T*"` | Yes |
| `"epoch"` | Yes |
| `"now"` | Yes |
| `"today"` | Yes |
| `"tomorrow"` | Yes |
| `"yesterday"` | Yes |

## String to Timestamp

@@ -257,22 +264,22 @@ Casting from string to timestamp currently has the following limitations.

| Format or Pattern | Supported on GPU? |
| ------------------------------------------------------------------- | ------------------|
| `"yyyy"` | Yes. |
| `"yyyy-[M]M"` | Yes. |
| `"yyyy-[M]M "` | Yes. |
| `"yyyy-[M]M-[d]d"` | Yes. |
| `"yyyy-[M]M-[d]d "` | Yes. |
| `"yyyy-[M]M-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]"` | Partial [1]. |
| `"yyyy-[M]M-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]"` | Partial [1]. |
| `"[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]"` | Partial [1]. |
| `"T[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]"` | Partial [1]. |
| `"epoch"` | Yes. |
| `"now"` | Yes. |
| `"today"` | Yes. |
| `"tomorrow"` | Yes. |
| `"yesterday"` | Yes. |

- [1] The timestamp portion must be complete in terms of hours, minutes, seconds, and
| `"yyyy"` | Yes |
| `"yyyy-[M]M"` | Yes |
| `"yyyy-[M]M "` | Yes |
| `"yyyy-[M]M-[d]d"` | Yes |
| `"yyyy-[M]M-[d]d "` | Yes |
| `"yyyy-[M]M-[d]dT[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]"` | Partial [\[1\]](#Footnote1) |
| `"yyyy-[M]M-[d]d [h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]"` | Partial [\[1\]](#Footnote1) |
| `"[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]"` | Partial [\[1\]](#Footnote1) |
| `"T[h]h:[m]m:[s]s.[ms][ms][ms][us][us][us][zone_id]"` | Partial [\[1\]](#Footnote1) |
| `"epoch"` | Yes |
| `"now"` | Yes |
| `"today"` | Yes |
| `"tomorrow"` | Yes |
| `"yesterday"` | Yes |

- <a name="Footnote1"></a>[1] The timestamp portion must be complete in terms of hours, minutes, seconds, and
milliseconds, with 2 digits each for hours, minutes, and seconds, and 6 digits for milliseconds.
Only timezone 'Z' (UTC) is supported. Casting unsupported formats will result in null values.
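
A short, hedged sketch exercising two of the formats above; the sample values are made up, and the full timestamp uses two-digit time fields, a six-digit fraction, and the 'Z' zone as the footnote requires:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("string-to-timestamp-sketch").getOrCreate()
import spark.implicits._

// One date-only value and one complete timestamp with the 'Z' (UTC) zone.
val df = Seq("2020-03-15", "2020-03-15T12:34:56.123456Z").toDF("s")

// Unsupported formats would come back as null, per the note above.
val withTs = df.withColumn("ts", col("s").cast("timestamp"))
withTs.show(truncate = false)
```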

2 changes: 1 addition & 1 deletion docs/dev/README.md
@@ -1,7 +1,7 @@
---
layout: page
title: Developer Overview
nav_order: 8
nav_order: 9
has_children: true
permalink: /developer-overview/
---
9 changes: 6 additions & 3 deletions docs/get-started/getting-started-gcp.md
@@ -22,7 +22,10 @@ gcloud services enable storage-api.googleapis.com
```

After the command line environment is set up, log in to your GCP account. You can now create a Dataproc cluster with the configuration shown below.
The configuration will allow users to run any of the [notebook demos](../demo/GCP) on GCP. Alternatively, users can also start 2*2T4 worker nodes.
The configuration will allow users to run any of the [notebook demos](https://github.com/NVIDIA/spark-rapids/tree/branch-0.2/docs/demo/GCP) on GCP. Alternatively, users can also start two worker nodes with two T4 GPUs each (2*2T4).

The script below will initialize with the following:

* [GPU Driver](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/gpu) and [RAPIDS Accelerator for Apache Spark](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids) through initialization actions (the init action is only available in US region public buckets as of 2020-07-16)
* One 8-core master node and 5 32-core worker nodes
* Four NVIDIA T4 GPUs for each worker node
@@ -32,7 +35,7 @@ The configuration will allow users to run any of the [notebook demos](../demo/GC


```bash
export REGION=[Your Prefer GCP Region]
export REGION=[Your Preferred GCP Region]
export GCS_BUCKET=[Your GCS Bucket]
export CLUSTER_NAME=[Your Cluster Name]
export NUM_GPUS=4
@@ -65,7 +68,7 @@ To use notebooks with a Dataproc cluster, click on the cluster name under the Da

The notebook will first transcode CSV files into Parquet files and then run an ETL query to prepare the dataset for training. In the sample notebook, we use 2016 data as the evaluation set and the rest as a training set, saving to respective GCS locations. Using the default notebook configuration, the first stage should take ~110 seconds (1/3 of the CPU execution time with the same config) and the second stage takes ~170 seconds (1/7 of the CPU execution time with the same config). The notebook depends on the pre-compiled [Spark RAPIDS SQL plugin](https://mvnrepository.com/artifact/com.nvidia/rapids-4-spark-parent) and [cuDF](https://mvnrepository.com/artifact/ai.rapids/cudf/0.14), which are pre-downloaded by the GCP Dataproc [RAPIDS init script]().

Once data is prepared, we use the [Mortgage XGBoost4j Scala Notebook](../demo/GCP/mortgage-xgboost4j-gpu-scala.zpln) in Dataproc's Zeppelin service to execute the training job on the GPU. NVIDIA also ships [Spark XGBoost4j](https://github.com/NVIDIA/spark-xgboost) which is based on [DMLC xgboost](https://github.com/dmlc/xgboost). Precompiled [XGBoost4j]() and [XGBoost4j Spark](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.0.0-0.1.0/) libraries can be downloaded from maven. They are pre-downloaded by the GCP [RAPIDS init action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids). Since github cannot render a Zeppelin notebook, we prepared a [Jupyter Notebook with Scala code](../demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb) for you to view the code content.
Once data is prepared, we use the [Mortgage XGBoost4j Scala Notebook](../demo/GCP/mortgage-xgboost4j-gpu-scala.zpln) in Dataproc's Zeppelin service to execute the training job on the GPU. NVIDIA also ships [Spark XGBoost4j](https://github.com/NVIDIA/spark-xgboost) which is based on [DMLC xgboost](https://github.com/dmlc/xgboost). Precompiled [XGBoost4j](https://repo1.maven.org/maven2/com/nvidia/xgboost4j_3.0/) and [XGBoost4j Spark](https://repo1.maven.org/maven2/com/nvidia/xgboost4j-spark_3.0/1.0.0-0.1.0/) libraries can be downloaded from maven. They are pre-downloaded by the GCP [RAPIDS init action](https://github.com/GoogleCloudDataproc/initialization-actions/tree/master/rapids). Since github cannot render a Zeppelin notebook, we prepared a [Jupyter Notebook with Scala code](../demo/GCP/mortgage-xgboost4j-gpu-scala.ipynb) for you to view the code content.

The training time should be around 480 seconds (1/10 of the CPU execution time with the same config). This is shown under the cell below:
```scala
57 changes: 0 additions & 57 deletions docs/get-started/getting-started-menu.md

This file was deleted.
