Skip to content

rigogsilva/sqldf

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

13 Commits
 
 
 
 
 
 
 
 

Repository files navigation

sqldf

Library for quering DataFrames using SQL.

Overview

SQL DataFrame is a library used for querying multiple types of DataFrame using ANSI SQL syntax. DataFrames supported are:

  1. RAW: usually used so the user doesn’t need to have pandas installed. Example of a RAW "DataFrame":
list_of_dictionaries = [{'name': 'Rigo', 'age': 33}, {'name': 'Bruno', 'age': 33}]
  1. pandas DataFrame.

  2. pyspark DataFrame: sqldf uses pyspark DataFrame in the background to process big data. Note, that the DataFrame return will be a pyspark DataFrame

How to use sqldf

from sqldf import sqldf

# RAW DataFrame
inventory = [{'item': 'Banana', 'quantity': 33}, {'item': 'Apple', 'quantity': 2}]
orders = [{'order_number': 1, 'item': 'Banana', 'quantity': 10}, {'order_number': 2, 'item': 'Apple', 'quantity': 10}]

To select data from a DataFrame and also register a table in memory do the following:

print('Inventory:')
inventory_pyspark_df = sqldf.sql(
	"""
	SELECT item,
           quantity AS quantity_available
      FROM inventory_table
	""",
	inventory,
	table='inventory_table')
inventory_pyspark_df.show()

print('Orders:')
orders_pyspark_df = sqldf.sql(
	"""
	SELECT order_number,
           item,
           quantity AS quantity_ordered
      FROM order_table
	""",
	orders,
	table='order_table')
orders_pyspark_df.show()

Since the table has been specified above, the table will be registered in memory. The next time you want to select data from the table jut do the following (Note that we don't specify table or DataFrame):

# Get inventory below quantity of 10 so we can order more of these items.
print('Items low in quantity:')
inventory_low = sqldf.sql(
	"""
	SELECT item,
           quantity AS quantity_low
      FROM inventory_table
     WHERE quantity < {{ quantity }}
	""",
	quantity=10)
inventory_low.show()

Now we are going to get a list of orders that will be able to be fulfilled. You can specify the table name if you want to register the results of this query in memory so it can be used later.

print('Orders with inventory: ')
orders_with_inventory = sqldf.sql(
	"""
	SELECT ot.*
	  FROM inventory_table it
	  JOIN order_table ot
	    ON it.item = ot.item
	 WHERE it.quantity >= ot.quantity
	"""
	)
orders_with_inventory.show()

About

Library for quering DataFrames using SQL

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages