Add PaddleDetection-based Layout Model #54
Conversation
setup.py
Outdated
```
@@ -27,12 +27,22 @@
    "torch",
    "torchvision",
    "iopath",
    "tqdm",
```

Please remove this one -- installing paddleocr will pull in this module automatically.
```
if not enforce_cpu:
    # initial GPU memory(M), device ID
    config.enable_use_gpu(200, 0)
```

Please change this to 2000.
Thanks for this PR. Here are some general questions/thoughts:

Thanks for your reply, I'll modify it later!
Hi, for the

Hi, I've modified the code, including:
setup.py
Outdated
```
@@ -33,6 +33,15 @@
    'google-cloud-vision==1',
    'pytesseract'
],
"paddleocr": [
```
please don't change anything here - I'll modify it later myself.
Done
src/layoutparser/__init__.py
Outdated
```diff
@@ -11,11 +11,13 @@

 from .ocr import (
     GCVFeatureType, GCVAgent,
-    TesseractFeatureType, TesseractAgent
+    TesseractFeatureType, TesseractAgent,
+    PaddleocrAgent
```
Please don't add OCR Agent in this PR -- right now we focus on LayoutModel.
Done
```diff
@@ -86,7 +86,6 @@
         0: "Table"
     },
 }
-# fmt: on
```
why removed?
Sorry, I've modified it back
```
}

# fmt: off
LABEL_MAP_CATALOG = {
```
Please remove the labeling mappings for unused datasets.
Done
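A trimmed catalog along the lines requested might look like this sketch (a hypothetical example: the label maps shown follow the PubLayNet and TableBank conventions used elsewhere in layoutparser, and exactly which datasets to keep depends on which Paddle models ship):

```python
# fmt: off
# Hypothetical trimmed catalog: keep only the label maps for datasets
# that the Paddle layout models are actually trained on.
LABEL_MAP_CATALOG = {
    "PubLayNet": {0: "Text", 1: "Title", 2: "List", 3: "Table", 4: "Figure"},
    "TableBank": {0: "Table"},
}
# fmt: on
```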
```
# fmt: on


class LayoutParserDetectron2ModelHandler(PathHandler):
```
Change the class name to `LayoutParserPaddleModelHandler`.
Done
```
trt_min_shape=1,
trt_max_shape=1280,
trt_opt_shape=640,
min_subgraph_size=3):
```
And is there any documentation, in English, on how to set and adjust these parameters? If not, I would suggest removing them.
Done
```
enable_mkldnn=True,
thread_num=10,
min_subgraph_size=3,
use_dynamic_shape=False,
```
And apparently some of the hyperparameters/configurations are not used. Please remove.
Done
```
    inputs = self.create_inputs(im, im_info)
    return inputs

def postprocess(self, np_boxes, np_masks, inputs):
```
I think we need to merge `postprocess` with `gather_output`. It's not reasonable to break them into two functions.
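A minimal sketch of what the merged function could look like, assuming `np_boxes` rows follow PaddleDetection's `[class_id, score, x1, y1, x2, y2]` layout; the dict-based output here is made up for illustration (the real code would build `layoutparser` elements):

```python
import numpy as np

def postprocess(np_boxes, threshold=0.5, label_map=None):
    """Filter raw detections and gather them into the output layout in one pass."""
    label_map = label_map or {}
    layout = []
    for class_id, score, x1, y1, x2, y2 in np_boxes:
        if score < threshold:  # drop low-confidence detections
            continue
        layout.append({
            "type": label_map.get(int(class_id), int(class_id)),
            "score": float(score),
            "box": (float(x1), float(y1), float(x2), float(y2)),
        })
    return layout

boxes = np.array([[0, 0.9, 10, 10, 50, 50],
                  [1, 0.2,  5,  5, 20, 20]])
print(postprocess(boxes, label_map={0: "Text", 1: "Table"}))
# → [{'type': 'Text', 'score': 0.9, 'box': (10.0, 10.0, 50.0, 50.0)}]
```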
I've merged them, thanks!
```
if not enforce_cpu:
    # initial GPU memory(M), device ID
    config.enable_use_gpu(2000, 0)
```
Why is it 2000 rather than another value? Please leave a note in the comment. Thanks.
Done
setup.py
Outdated
```
@@ -33,6 +33,12 @@
    'google-cloud-vision==1',
    'pytesseract'
],
"paddlepaddle": [
```
Sorry for not being clear earlier. Please remove all the new `extras_require` entries, thanks.
Got it.
```diff
@@ -10,7 +10,7 @@ class DropboxHandler(HTTPURLHandler):
     """

     def _get_supported_prefixes(self):
-        return ["https://www.dropbox.com"]
+        return ["https://www.dropbox.com", "https://paddle-model-ecology.bj.bcebos.com"]
```
I don't think this is a good idea -- perhaps it's better to create something like `class PaddleModelURLHandler(HTTPURLHandler):` in the paddle model `catalog.py` file.
Please see the comments below for more detailed instructions.
```
    layout = self.gather_output(np_boxes, np_masks)
    return layout

def untar_files(self, model_tar, model_dir):
```
It feels counter-intuitive to put this function inside the modeling folder; I think it's better to include this part inside the PathManager, by rewriting the `_get_local_path` function. So basically it will be something like this:
```python
class PaddleModelURLHandler(HTTPURLHandler):
    """
    Supports download and file check for paddle model links
    """

    def _get_supported_prefixes(self):
        return ["https://paddle-model-ecology.bj.bcebos.com"]

    def _isfile(self, path):
        return path in self.cache_map

    def _get_local_path(
        self,
        path: str,
        force: bool = False,
        cache_dir: Optional[str] = None,
        **kwargs: Any,
    ) -> str:
        """
        This implementation downloads the remote resource and caches it locally.
        The resource will only be downloaded if not previously requested.
        """
        self._check_kwargs(kwargs)
        if (
            force
            or path not in self.cache_map
            or not os.path.exists(self.cache_map[path])
        ):
            logger = logging.getLogger(__name__)
            parsed_url = urlparse(path)
            dirname = os.path.join(
                get_cache_dir(cache_dir), os.path.dirname(parsed_url.path.lstrip("/"))
            )
            filename = path.split("/")[-1]
            if len(filename) > self.MAX_FILENAME_LEN:
                filename = filename[:100] + "_" + uuid.uuid4().hex
            cached = os.path.join(dirname, filename)
            with file_lock(cached):
                if not os.path.isfile(cached):
                    logger.info("Downloading {} ...".format(path))
                    cached = download(path, dirname, filename=filename)
                #### ----> Add from here
                if path.endswith(".tar"):
                    untar_function()
            logger.info("URL {} cached in {}".format(path, cached))
            self.cache_map[path] = cached
        return self.cache_map[path]
```
You might also need to check whether the untar'ed version exists beforehand to avoid duplicated downloading.
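For that duplicate-extraction check, a small helper along these lines could sit next to `_get_local_path` (a sketch; the name `ensure_untarred` and its signature are made up for illustration):

```python
import os
import tarfile

def ensure_untarred(tar_path, target_dir):
    """Extract tar_path into target_dir, skipping work if a previous call already did it."""
    if os.path.isdir(target_dir) and os.listdir(target_dir):
        return target_dir  # cached extraction found -- avoid re-extracting
    os.makedirs(target_dir, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        tar.extractall(target_dir)
    return target_dir
```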
```
image, im_info = permute(image, im_info)

inputs = {}
inputs['image'] = np.array((image, )).astype('float32')
```
This feels weird -- probably it's better to have `np.array(image)[np.newaxis, :].astype('float32')`.
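Both spellings produce the same batched array; the `np.newaxis` form just states the intent (add a leading batch axis) explicitly. A quick check:

```python
import numpy as np

image = np.zeros((640, 640, 3), dtype=np.uint8)

a = np.array((image,)).astype('float32')              # original: wrap in a 1-tuple
b = np.array(image)[np.newaxis, :].astype('float32')  # suggested: explicit batch axis

print(a.shape, b.shape)      # (1, 640, 640, 3) (1, 640, 640, 3)
print(np.array_equal(a, b))  # True
```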
```
        config of model, defined by `Config(model_dir)`
    model_path (str):
        The path to the saved weights of the model.
    threshold (float):
```
Don't forget to clean up the docstrings -- and you can specify what the extra configs are under the `extra_configs` block.
```
config_path = self._reconstruct_path_with_detector_name(config_path)
model_tar = PathManager.get_local_path(config_path)

pre_dir = os.path.dirname(model_tar)
```
please remove this -- see the details below.
```
    thread_num=extra_config.get('thread_num',10))

self.threshold = extra_config.get('threshold',0.5)
self.input_shape = extra_config.get('input_shape',[3,640,640])
```
As `input_shape` is in `self.im_info`, you might want to remove this variable.
```
self.threshold = extra_config.get('threshold',0.5)
self.input_shape = extra_config.get('input_shape',[3,640,640])
self.label_map = label_map
self.im_info = {
```
So basically this is something like `default_image_info`, right? It's better to use a more descriptive name than `im_info`, which is also the output of the preprocessors.
```
    im_info (dict): info of processed image
    """
    image = image.astype(np.float32, copy=False)
    mean = np.array(mean)[np.newaxis, np.newaxis, :]
```
Why initialize the arrays this way on every call? We can just initialize `mean` and `std` to be the appropriate `ndarray`s once.
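That is, something like the following sketch, where the broadcast-ready arrays are built a single time (the ImageNet mean/std values and the 0-255 input range are illustrative assumptions):

```python
import numpy as np

# Built once, e.g. in __init__, instead of on every preprocess() call.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)[np.newaxis, np.newaxis, :]
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)[np.newaxis, np.newaxis, :]

def normalize(image):
    """Normalize an HWC image (0-255) with the precomputed statistics."""
    return (image.astype(np.float32) / 255.0 - MEAN) / STD
```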
```
    inputs (dict): input of model
    """
    # read rgb image
    image, im_info = decode_image(image, self.im_info)
```
I don't like the design here:

- In the current setting, `self.im_info` will be changed for every different image input, which doesn't make sense.
- `im_info` is only changed in the `resize` function, and `decode_image` doesn't change `image` at all.
- Can we further simplify the APIs?
For example, an alternative could be:

```python
image = image.copy()
input_shape = np.array(image.shape[:2], dtype=np.float32)
image, scale_factor = resize(image)  # change the return value
image = (image - self.extra_config['pixel_mean']) \
    / self.extra_config['pixel_std']  # may need to change how the values are loaded
image = image.transpose((2, 0, 1)).copy()  # The model requires channel-first input
model_input_image_shape = np.array(image.shape[1:], dtype=np.float32)
image_info = {
    'scale_factor': scale_factor,
    'im_shape': model_input_image_shape,
    'input_shape': input_shape,
}
```
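To make the proposed flow concrete, here is a self-contained, runnable version with a toy nearest-neighbour `resize` standing in for the real one (the function names, the fixed 640x640 target, and passing mean/std as arguments are all assumptions for illustration):

```python
import numpy as np

def resize(image, target=(640, 640)):
    """Toy nearest-neighbour resize to a fixed target; returns image and scale."""
    h, w = image.shape[:2]
    ys = np.arange(target[0]) * h // target[0]
    xs = np.arange(target[1]) * w // target[1]
    scale_factor = np.array([target[0] / h, target[1] / w], dtype=np.float32)
    return image[ys][:, xs], scale_factor

def preprocess(image, pixel_mean, pixel_std):
    input_shape = np.array(image.shape[:2], dtype=np.float32)  # original H, W
    image, scale_factor = resize(image)
    image = (image.astype(np.float32) - pixel_mean) / pixel_std
    image = image.transpose((2, 0, 1)).copy()  # channel-first for the model
    image_info = {
        'scale_factor': scale_factor,
        'im_shape': np.array(image.shape[1:], dtype=np.float32),  # resized H, W
        'input_shape': input_shape,
    }
    return image, image_info

img = np.random.randint(0, 255, (480, 320, 3), dtype=np.uint8)
out, info = preprocess(img, np.zeros(3, np.float32), np.ones(3, np.float32))
print(out.shape, info['im_shape'], info['input_shape'])
```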
Thanks for the updates! The current version is better than the previous one, but I think the code could be further simplified and made more structured -- please check the comments for possible updates.
I've also carefully checked the code, and added some changes. Some key updates:
Please note, you mentioned in the previous version that the paddle model may support specifying

And in the future we might need to move the paddle model to dropbox/aws, as the downloading speed is ~2MB/s in the US, taking around 2~3 minutes to download the whole model, which might be suboptimal.

Otherwise I think this PR is ready to be merged - please take a quick look and let me know if the new changes are OK. Thanks for the great work!
Hi, thanks for your new changes, which further simplify the code.
The changes are OK for me, too. Through this repo and the whole modification process, I also learned a lot. Thank you very much!
I think ultimately we'll provide better downloading support, e.g., mirrored download sites or something like what's mentioned in #43. At that point, people from different regions can choose the best download method accordingly.
Hi, I have reorganized the file structure of Paddle.
Please review, thanks!