Skip to content

Commit

Permalink
Check in helpful mapping notebook
Browse files Browse the repository at this point in the history
  • Loading branch information
e-belfer committed Sep 26, 2024
1 parent ac1ce73 commit e2b6f96
Show file tree
Hide file tree
Showing 2 changed files with 125 additions and 1 deletion.
123 changes: 123 additions & 0 deletions devtools/pudl_id_mapping_help.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,123 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "f22a1df7",
"metadata": {},
"source": [
"# PUDL ID Mapping Help\n",
"\n",
"This notebook helps to support the manual mapping of FERC to EIA plant IDs. See the [PUDL ID mapping](https://catalystcoop-pudl.readthedocs.io/en/latest/dev/pudl_id_mapping.html) documentation for more information."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "5d7c4cba-9e4c-4f00-b120-046d63929ed7",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import pudl\n",
"import pudl.logging_helpers\n",
"from pudl.etl import default_assets, defs\n",
"\n",
"logger = pudl.logging_helpers.get_logger(__name__)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "acada4d1-cb5e-4a9b-a372-52596e2cf5f0",
"metadata": {},
"outputs": [],
"source": [
"plants_eia = defs.load_asset_value(\"out_eia__yearly_plants\")\n",
"plants_pudl = defs.load_asset_value(\"core_pudl__entity_plants_pudl\")\n",
"plants_ferc = defs.load_asset_value(\"out_ferc1__yearly_all_plants\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3f0e2a0c-c94b-4b11-88f9-720094bb768a",
"metadata": {},
"outputs": [],
"source": [
"cols_eia = [\"plant_id_pudl\",\"plant_id_eia\",\"plant_name_eia\",\"utility_name_eia\",\"city\",\"county\", \"latitude\",\"longitude\",\"state\"]\n",
"cols_ferc = [\"plant_id_pudl\",\"plant_id_ferc1\",\"plant_name_ferc1\", \"utility_name_ferc1\", \"capacity_mw\", \"record_id\"]\n",
"plants = pd.merge(\n",
" plants_pudl,\n",
" plants_eia[cols_eia].drop_duplicates(),\n",
" how=\"outer\",\n",
" on=[\"plant_id_pudl\"],\n",
" validate=\"1:m\"\n",
").merge(\n",
" plants_ferc[cols_ferc].drop_duplicates(subset=[col for col in cols_ferc if col != \"record_id\"]),\n",
" how=\"outer\",\n",
" on=[\"plant_id_pudl\"],\n",
" suffixes=(\"_eia\", \"_ferc\")\n",
")\n",
"plants.plant_name_eia = plants.plant_name_eia.str.lower()"
]
},
{
"cell_type": "markdown",
"id": "607c048f",
"metadata": {},
"source": [
"Use the snippet of code below to speed up searching for plant matches. Update the matching ID value in the spreadsheet by linking it to the cell, _not_ by hard-coding the value!"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "8fda99b8-091f-4c16-958b-51ebffbc95eb",
"metadata": {},
"outputs": [],
"source": [
"name_bit = \"richmond\"\n",
"# when you actually need to restrict it by state bc there are too many\n",
"# add your state and un-comment out the state line below\n",
"state = \"VT\"\n",
"plants[\n",
" (plants.plant_name_eia.str.contains(name_bit)\n",
" | plants.plant_name_pudl.str.contains(name_bit)\n",
" | plants.plant_name_ferc1.str.contains(name_bit))\n",
" & ((plants.state == state) | plants.state.isnull())\n",
"].sort_values([\"latitude\"])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c6accad0",
"metadata": {},
"outputs": [],
"source": [
"plants_entity = defs.load_asset_value(\"out_eia__entity_plants\")"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
3 changes: 2 additions & 1 deletion docs/dev/pudl_id_mapping.rst
Original file line number Diff line number Diff line change
Expand Up @@ -164,7 +164,8 @@ plant name string (e.g. for ``chenango solar``, you could search for ``chen``,
or ``chenan``). Searching the entire plant tab helps find other records within
both FERC and EIA that may be the same or part of the same facility. Searching
for a piece can help catch misspellings in the plant name, which are more common
in the FERC records.
in the FERC records. Use the ``devtools/pudl_id_mapping_help.ipynb`` notebook to speed
up this process.

* **If co-located EIA plants have distinct plant IDs and no FERC 1 plant:**
they should not be lumped under a single PUDL Plant ID, as that artificially
Expand Down

0 comments on commit e2b6f96

Please sign in to comment.