Skip to content

Latest commit

 

History

History
157 lines (107 loc) · 10.3 KB

63. Intro to Plotting.md

File metadata and controls

157 lines (107 loc) · 10.3 KB

Intro to Plotting

1. Intro

Plotting, in the realm of data visualization, refers to the graphical representation of data. It transforms raw data into visual elements like lines, bars, dots, and various other forms to make data more understandable and interpretable.

Why Plotting

Quick Data Interpretation: A well-constructed plot can convey trends, outliers, and patterns in data faster than raw data tables.

Decision Making: In business and research, decisions often rely on data. Visual representations help stakeholders understand data quickly and make informed decisions.

Storytelling: Plots can narrate a story. For instance, a line graph of a company's revenue over the years can tell a story of growth, stagnation, or decline.

Communication: Plots are a universal language. They can be understood across different domains, cultures, and even by those who might not be experts in the data's subject matter

Components of a Plot

Axes: Most plots have horizontal (x-axis) and vertical (y-axis) lines where data points are plotted. The axes are usually labeled to indicate the variables or scale.

Data Points: These are individual values represented on the plot. Their position, size, or color can indicate their actual value.

Legend: A guide that helps readers understand the symbols, colors, and lines used in the plot.

Title and Labels: These provide context. The title gives an overview of what the plot represents, while labels (usually on the axes) indicate what each axis measures.

Grid: A series of horizontal and vertical lines that help readers gauge the value of individual data points.

2. Types of Plots

The choice of plot often depends on the nature of the data and the insights one wishes to extract from it.

Common Type of Plot Description Use-case Example
Line Plots Represents data points in a series connected by straight line segments. Often used for time-series data to show trends over time. Stock prices over a month.
Bar Charts Uses rectangular bars to represent data. Comparing quantities of different categories. Sales of different products.
Histograms Similar to bar charts but for frequency distribution. Understanding the distribution of a dataset. Distribution of student grades.
Scatter Plots Uses dots for two different variables. Observing relationships between two variables. Relationship between age and salary.
Pie Charts Circular chart divided into slices. Showing percentage or proportional data. Market share of companies.
Box Plots Displays data distribution based on five summary statistics. Understanding data spread and spotting outliers. Exam score distribution.
Heatmaps Represents data in a matrix format with colors. Observing variations across two dimensions. Website click patterns.
Area Charts Similar to line plots but with filled area. Representing quantities through time. Website traffic over time.

3. Tools and Libraries

Before Python

Before the rise of Python-based plotting tools, JavaScript (and other languages) had been used for plotting, especially in the context of web-based visualizations. Here's a brief overview:

  • JavaScript Libraries

As JavaScript engines in browsers became more powerful and the language itself evolved, several JavaScript libraries emerged for creating web-based visualizations.

D3.js: Introduced in 2011, D3.js (Data-Driven Documents) quickly became one of the most popular libraries for creating complex, interactive data visualizations in web browsers. It allows direct manipulation of the Document Object Model (DOM) based on data and offers a high degree of customization.

  • Server-side Plotting

Before the rise of client-side plotting with JavaScript, many visualizations were generated on the server side. Tools like GNUplot, R's plotting functions, and others would generate static images (like PNGs) that would then be served to the web browser.

  • Desktop Applications

For non-web-based visualizations, there were (and still are) many desktop applications like Excel, MATLAB, and others that researchers and analysts used for plotting and data visualization.

Python

Python-based plotting tools have gained immense popularity over the past decade, and there are several reasons for this surge:

  • Rise of Python in Data Science: Python has become the de facto language for data science and machine learning. With libraries like NumPy, Pandas, and Scikit-learn, Python offers a comprehensive ecosystem for data analysis. As data visualization is a crucial part of data analysis, it was only natural for Python-based plotting tools to emerge and gain traction.

  • Ease of Use: Python is known for its simplicity and readability.

  • Integration with Data Libraries: Python plotting tools integrate seamlessly with data manipulation libraries like Pandas. This integration allows users to easily visualize data directly from data frames and other data structures.

  • Interactive Visualizations: With the rise of libraries like Plotly and Bokeh, Python users can create interactive web-based visualizations without needing to know JavaScript or other web technologies.

image

Feature/Aspect Matplotlib Seaborn Plotly
Overview Comprehensive library for creating a wide range of static and interactive visualizations. High-level interface built on Matplotlib for drawing attractive statistical graphics. Interactive graphing library with support for over 40 unique chart types.
Interactivity Basic interactivity with support for zooming and panning. Inherits interactivity from Matplotlib. Highly interactive with features like zoom, pan, hover tooltips, and more.
Styling Offers basic styling options. Comes with several built-in themes and color palettes for better aesthetics. Provides a modern look with built-in themes and styles.
Use Cases Scientific computing, academic research, and when high customizability is required. Statistical data visualization, especially when using Pandas DataFrames. Web-based interactive visualizations, dashboards, and applications.

Underlying Technologies

Attribute Matplotlib Seaborn Plotly
Backend Frameworks Supports multiple backends including GUI (Qt5, GTK3) and non-GUI (PDF, SVG). Inherits backends from Matplotlib. Web-based, primarily leveraging D3.js.
Renderer Uses Anti-Grain Geometry (AGG) library for high-quality rendering. Inherits renderer from Matplotlib (AGG). Primarily uses D3.js and WebGL for specific plot types.
Event Handling Has an event handling system for user interactions like mouse clicks and key presses. Inherits event handling from Matplotlib. Built-in interactivity features like zoom, pan, and hover tooltips.

The terms "backend" and "renderer" have distinct roles: The backend is responsible for the entire process of visualizing and displaying a figure or plot. It encompasses everything from handling user interactions (like mouse clicks and keyboard inputs) to the actual drawing of the plot. The renderer is a component of the backend that is specifically responsible for drawing the plot. It takes the abstract representation of graphical elements (like lines, shapes, and text) and translates them into pixels or vector graphics, depending on the output format.

Matplotlib's event handling allows for interactive visualizations by responding to user actions like mouse clicks and key presses. By using the mpl_connect method, specific events can be linked to custom callback functions, which are executed when the event occurs. In contrast, Plotly was built from the ground up for web-based interactivity. Its visualizations are inherently interactive, with built-in tools for zooming, panning, and hovering without the need for custom callbacks.

D3.js (Data-Driven Documents)

D3.js is a JavaScript library for producing dynamic, interactive data visualizations in web browsers.

  • Data Binding: D3 excels at binding data to Document Object Model (DOM)* elements, enabling the generation, update, and removal of visual elements based on data.

  • Flexibility: Unlike many visualization libraries that offer predefined charts, D3 provides the foundational building blocks, allowing for the creation of bespoke visualizations from scratch.

  • Interactivity: Built for the web, D3 visualizations are inherently interactive, allowing for custom user interactions and responsive designs.

Document Object Model (DOM)

<html>
    <head>
        <title>Page Title</title>
    </head>
    <body>
        <h1>Page Title</h1>
        <p>This is a really interesting paragraph.</p>
    </body>
</html>

The Document Object Model refers to the hierarchical structure of HTML. Each bracketed tag is an element, and we refer to elements’ relative relationships to each other in human terms: parent, child, sibling, ancestor, and descendant. In the HTML above, body is the parent element to both of its children, h1 and p (which are siblings to each other). All elements on the page are descendants of html.

Web browsers parse the DOM in order to make sense of a page’s content.

References

https://insights.stackoverflow.com/trends?tags=matplotlib%2Cseaborn%2Cplotly

ChatGPT 4 with Plugins

https://plotly.com/python/basic-charts/

https://en.wikipedia.org/wiki/D3.js

https://scottmurray.org/tutorials/d3/fundamentals