Troubleshooting geo data

nvkelso edited this page Sep 8, 2012 · 2 revisions

# Generating Shapefile shx Files

From a now-expired blog post by Joel Lawhead, PMP, November 2, 2011.

Shapefile shx files help software locate records quickly, but they are not strictly necessary. Even so, if you end up with a shapefile that is missing its shx file, most software will complain and refuse to open it.

The purpose of the shx file is to provide faster access to a particular record in a shapefile without storing the entire record set of the shp and dbf files in memory. The header of the shx file is 100 bytes long, and each record after it is 8 bytes long. So to access record 3, I skip the two records before it (2*8 = 16 bytes) and jump to byte 100+16 = 116 in the shx file. There I read the 8-byte record to get the offset and record length within the shp file (both stored as big-endian integers counted in 16-bit words), and then jump straight to that location in the shp file.
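The arithmetic above can be sketched with Python's standard `struct` module. This is only an illustration under stated assumptions: the function names and the synthetic `demo_shx` bytes are made up for the example, and the header is left blank because only the record entries matter here.

```python
import struct

SHX_HEADER_BYTES = 100  # fixed-size file header
SHX_RECORD_BYTES = 8    # two 4-byte big-endian integers per record


def shx_entry_position(record_number):
    """Byte position in the shx file of the entry for a 1-based record number."""
    return SHX_HEADER_BYTES + (record_number - 1) * SHX_RECORD_BYTES


def read_shx_entry(shx_bytes, record_number):
    """Return (offset, content_length) in bytes for that record in the shp file."""
    pos = shx_entry_position(record_number)
    offset_words, length_words = struct.unpack(
        ">ii", shx_bytes[pos:pos + SHX_RECORD_BYTES]
    )
    # The shx stores both values in 16-bit words, so double them to get bytes.
    return offset_words * 2, length_words * 2


# Synthetic index (hypothetical data): a blank header plus three entries for
# point records, each with 10 words of content, starting at words 50, 64, 78.
demo_shx = bytes(100) + struct.pack(">6i", 50, 10, 64, 10, 78, 10)

print(shx_entry_position(3))        # 116
print(read_shx_entry(demo_shx, 3))  # (156, 20)
```

The returned offset points at the record's 8-byte header in the shp file, so the shape content itself starts 8 bytes later.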

While the shx file is convenient, it isn't necessary. Most software balks if it is missing, but pyshp handles it gracefully: if the shx index is there, it is used for record access; if not, pyshp reads through the shp records into memory and handles them as a Python list.
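To show why the index is optional, here is a rough sketch of walking a shp file record by record using only the record headers. It is not pyshp's actual code; `scan_shp_records`, `point_record`, and the synthetic `demo_shp` bytes are hypothetical names and data for the example. It relies on the shp layout: a 100-byte file header, then records that each begin with a big-endian record number and a content length in 16-bit words.

```python
import struct


def scan_shp_records(shp_bytes):
    """Walk the shp record headers in order.

    Returns a list of (record_number, offset, content_length) tuples,
    with offset and content_length expressed in bytes.
    """
    entries = []
    pos = 100  # skip the 100-byte shp file header
    while pos + 8 <= len(shp_bytes):
        record_number, length_words = struct.unpack(">ii", shp_bytes[pos:pos + 8])
        if length_words < 0:
            raise ValueError("negative content length at byte %d" % pos)
        entries.append((record_number, pos, length_words * 2))
        pos += 8 + length_words * 2  # jump over the header and the content
    return entries


def point_record(number, x, y):
    """Build one synthetic Point record (shape type 1) for the demo."""
    content = struct.pack("<idd", 1, x, y)  # 20 bytes of little-endian content
    return struct.pack(">ii", number, len(content) // 2) + content


demo_shp = bytes(100) + point_record(1, 0.0, 0.0) + point_record(2, 1.0, 2.0)
print(scan_shp_records(demo_shp))  # [(1, 100, 20), (2, 128, 20)]
```

Reaching record n this way means reading every record header before it, which is the cost the shx index avoids.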

Sometimes shx files become corrupted or go missing. You can build a new shx index using pyshp. It's kind of a hack but still very simple. In the following example we build an index file for a point shapefile named "myshape" that has two files, "myshape.shp" and "myshape.dbf":

    # Build a new shx index file
    # (pyshp 1.x API; Writer.save() was removed in pyshp 2.0)
    import shapefile

    # Explicitly pass the shp and dbf file objects
    # so pyshp ignores the missing/corrupt shx
    myshp = open("myshape.shp", "rb")
    mydbf = open("myshape.dbf", "rb")
    r = shapefile.Reader(shp=myshp, shx=None, dbf=mydbf)
    w = shapefile.Writer(r.shapeType)
    # Copy everything from the reader object to the writer object
    w._shapes = r.shapes()
    w.records = r.records()
    w.fields = list(r.fields)
    # Everything is in memory now, so close the inputs before overwriting them
    myshp.close()
    mydbf.close()
    # Saving rewrites the shp and dbf and generates the shx
    w.save("myshape")

If the shx file is missing, it will be created. If it's corrupt, it will be overwritten. The moral of the story is that, because a shapefile consists of multiple files, it is actually a robust format. The data in the individual files can usually be accessed in isolation from the other files, despite what the standard requires, assuming the software you're using is willing to cooperate.
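If pyshp is not at hand, an index can in principle be assembled by hand. This is a minimal sketch, not the method from the original post. It assumes you already know each record's byte offset and content length in the shp file (both can be read from the shp record headers), and that the shx header is a copy of the shp header with only the file-length field changed. `build_shx` and the demo values are hypothetical.

```python
import struct


def build_shx(shp_header, entries):
    """Assemble shx bytes from the 100-byte shp file header and a list of
    (offset, content_length) pairs given in bytes."""
    if len(shp_header) != 100:
        raise ValueError("expected the 100-byte shp file header")
    header = bytearray(shp_header)
    # Bytes 24-27 hold the file length in 16-bit words (big-endian); for the
    # shx that is the 100-byte header plus 8 bytes per record.
    struct.pack_into(">i", header, 24, (100 + 8 * len(entries)) // 2)
    # Each index entry stores offset and content length in 16-bit words.
    body = b"".join(
        struct.pack(">ii", offset // 2, length // 2) for offset, length in entries
    )
    return bytes(header) + body


# Demo with a mostly blank header (9994 is the shapefile file code) and two
# 20-byte point records located at bytes 100 and 128 of the shp file.
demo_header = struct.pack(">i", 9994) + bytes(96)
demo_shx = build_shx(demo_header, [(100, 20), (128, 20)])

print(len(demo_shx))                            # 116
print(struct.unpack(">i", demo_shx[24:28])[0])  # 58
print(struct.unpack(">4i", demo_shx[100:]))     # (50, 10, 64, 10)
```

With a real file, the header would be the first 100 bytes of the shp and the result would be written out in binary mode as the matching .shx. Check the output against a known-good shapefile before relying on it.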