Support %B directive for strptime #91

Closed
slashmili opened this issue Sep 23, 2021 · 4 comments

Comments

@slashmili
Owner

We want to support the %B and %b directives in the strptime function.

The %B directive is already supported by the standard datetime module:

>>> import datetime
>>> datetime.datetime.strptime('14 September 2021', "%d %B %Y")
datetime.datetime(2021, 9, 14, 0, 0)

Expected behaviour:

jdatetime.datetime.strptime('14 Khordad 1400', '%d %B %Y')
jdatetime.datetime(1400, 3, 14, 0, 0)

Open questions:

  • How should we support this directive across the different possible locales?
@Mojtaba-saf
Contributor

To clarify: by different locales, do you mean it should also behave like this?

jdatetime.datetime.strptime('14 خرداد 1400', '%d %B %Y')
jdatetime.datetime(1400, 3, 14, 0, 0)

It was meant to be "day j_month year", but the editor moved the Persian name to the end.

@slashmili
Owner Author

strptime is flexible enough to let the parts move around.

The request should be:

jdatetime.datetime.strptime('۱۴ خرداد ۱۴۰۰', '%d %B %Y') # the string is in : day month-name year

jdatetime.datetime.strptime('14 Khordad 1400', "%d %B %Y")

My question was whether we should support a mix of English and Persian characters, like:

jdatetime.datetime.strptime('14 خرداد 1400', '%d %B %Y')

And if yes, how would it be implemented?

@Mojtaba-saf
Contributor

Mojtaba-saf commented Feb 7, 2022

I think Python 3 can convert Unicode digits to integers:

int("۱۴") == 14
int("1۴") == 14

But the current regex patterns won't match Persian digits.
I propose we change the patterns to something like this, so that they match both Persian and ASCII digits:

_DIRECTIVE_PATTERNS = {
    '%Y': '(?P<Y>[0-9\u06F0-\u06F9]{4})',
    '%y': '(?P<y>[0-9\u06F0-\u06F9]{2})',
    '%m': '(?P<m>[0-9\u06F0-\u06F9]{1,2})',
    '%d': '(?P<d>[0-9\u06F0-\u06F9]{1,2})',
    '%H': '(?P<H>[0-9\u06F0-\u06F9]{1,2})',
    '%M': '(?P<M>[0-9\u06F0-\u06F9]{1,2})',
    '%S': '(?P<S>[0-9\u06F0-\u06F9]{1,2})',
    '%f': '(?P<f>[0-9\u06F0-\u06F9]{1,6})',
}

With these patterns, any combination of digits is matched. For example, all of these are now accepted:

"1400"
"۱۴۰۰"
"۱۴01"

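A minimal sketch of the digit class in isolation, to show that the regex and int() agree on mixed input. The names DIGIT, YEAR_RE, and parse_year are made up for this illustration and are not part of jdatetime:

```python
import re

# Character class from the proposal: ASCII digits plus the Persian
# (Extended Arabic-Indic) digits U+06F0..U+06F9.
DIGIT = "[0-9\u06F0-\u06F9]"
YEAR_RE = re.compile(f"(?P<Y>{DIGIT}{{4}})")


def parse_year(text):
    """Return the year as an int, or None if text is not exactly four digits."""
    match = YEAR_RE.fullmatch(text)
    # int() accepts any Unicode decimal digit, so no manual translation is needed.
    return int(match.group("Y")) if match else None


for sample in ("1400", "۱۴۰۰", "۱۴01"):
    print(repr(sample), "->", parse_year(sample))
```

Note that int() also accepts other Unicode decimal digits (for example the Arabic-Indic digits U+0660..U+0669), while this character class deliberately matches only the ASCII and Persian ranges.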
We do the same for %B. The pattern below matches all the en_j_months names:
'%B': '(?P<B>[a-zA-Z]{3,12})'
so Farvardin, farvardin, and FaRvardin are all matched.
If we add the Unicode Persian characters to this pattern, it matches Persian names as well.

_DIRECTIVE_PATTERNS = {
    '%Y': '(?P<Y>[0-9\u06F0-\u06F9]{4})',
    '%y': '(?P<y>[0-9\u06F0-\u06F9]{2})',
    '%m': '(?P<m>[0-9\u06F0-\u06F9]{1,2})',
    '%d': '(?P<d>[0-9\u06F0-\u06F9]{1,2})',
    '%H': '(?P<H>[0-9\u06F0-\u06F9]{1,2})',
    '%M': '(?P<M>[0-9\u06F0-\u06F9]{1,2})',
    '%S': '(?P<S>[0-9\u06F0-\u06F9]{1,2})',
    '%f': '(?P<f>[0-9\u06F0-\u06F9]{1,6})',
    '%B': '(?P<B>[a-zA-Z\u0600-\u06EF\uFB8A\u067E\u0686\u06AF]{3,12})',
    '%b': '(?P<b>[a-zA-Z]{3})',
}
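To make the role of this table concrete, here is a rough sketch of how a format string could be turned into a regex with named groups. PATTERNS and format_to_regex are hypothetical names, only a subset of the directives is included, and this is not necessarily how jdatetime builds its regex internally:

```python
import re

# Subset of the proposed directive table (PATTERNS is a hypothetical name).
PATTERNS = {
    "%Y": "(?P<Y>[0-9\u06F0-\u06F9]{4})",
    "%d": "(?P<d>[0-9\u06F0-\u06F9]{1,2})",
    "%B": "(?P<B>[a-zA-Z\u0600-\u06EF\uFB8A\u067E\u0686\u06AF]{3,12})",
}


def format_to_regex(fmt):
    """Escape the literal parts of fmt, then substitute each known directive."""
    regex = re.escape(fmt)
    for directive, pattern in PATTERNS.items():
        # Escape the directive the same way fmt was escaped, so the lookup
        # works regardless of whether re.escape touches the '%' character.
        regex = regex.replace(re.escape(directive), pattern)
    return re.compile(regex + "$")


match = format_to_regex("%d %B %Y").match("۱۴ خرداد ۱۴۰۰")
print(match.groupdict() if match else None)
```

The captured groups are still strings at this point; the digits go through int(), and the month name needs a separate name-to-number lookup.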

Now the regex can match Persian words, but we still need to map these names to a month number. After the regex matches, we check whether %b or %B was used; if so, we check whether the name is ASCII and use the corresponding list of month names in the date class to map it to a number:

match = _re.match(regex + '$', date_string)
# Prefer a captured month name (%B or %b); otherwise use the numeric %m.
month = get('B') or get('b') or int(get('m', 1))
if isinstance(month, str):
    # A month name was captured, so map it to its month number.
    try:
        if get('b'):
            month = date.j_month_short_to_num(name=month)
        elif month.isascii():
            month = date.j_month_to_num(name=month)
        else:
            month = date.j_months_fa_to_num(name=month)
    except ValueError:
        raise ValueError(
            "time data '%s' does not match format '%s'" %
            (date_string, format)
        )
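As a self-contained illustration of the name-to-number step, the sketch below uses plain lists instead of the date class helpers. EN_MONTHS, FA_MONTHS, and month_name_to_number are assumed names typed out for this example; they are not the jdatetime API:

```python
# Month names typed out for illustration, in calendar order.
EN_MONTHS = ["Farvardin", "Ordibehesht", "Khordad", "Tir", "Mordad", "Shahrivar",
             "Mehr", "Aban", "Azar", "Dey", "Bahman", "Esfand"]
FA_MONTHS = ["فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور",
             "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند"]


def month_name_to_number(name):
    """Map an English (case-insensitive) or Persian month name to 1..12."""
    if name.isascii():  # str.isascii() needs Python 3.7+
        candidates = [m.lower() for m in EN_MONTHS]
        key = name.lower()
    else:
        candidates = FA_MONTHS
        key = name
    try:
        return candidates.index(key) + 1
    except ValueError:
        raise ValueError(f"unknown month name: {name!r}") from None


print(month_name_to_number("Khordad"), month_name_to_number("خرداد"))
```

Branching on isascii() keeps the lookup simple: the regex has already restricted the input to Latin letters or Persian characters, so anything non-ASCII can only be a Persian name.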

This should work for all of the cases below:

jdatetime.datetime.strptime('14 Khordad 1400', "%d %B %Y")
jdatetime.datetime.strptime('۱۴ Khordad ۱۴۰۰', "%d %B %Y")
jdatetime.datetime.strptime('۱۴ Khordad 1400', "%d %B %Y")
jdatetime.datetime.strptime('۱۴ خرداد ۱۴۰۰', '%d %B %Y') # the string is in : day month-name year
jdatetime.datetime.strptime('14 خرداد 1400', '%d %B %Y') # the string is in : day month-name year
jdatetime.datetime.strptime('1۴ خرداد 14۰0', '%d %B %Y') # the string is in : day month-name year

PS:
The pattern for Persian month names could be tightened, because the longest name is "اردیبهشت", which is 8 characters.
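A quick way to check the length bounds before tightening the pattern. The list of names is typed out here rather than imported from jdatetime, and the variable names are made up for this sketch:

```python
import re

# Persian month names in calendar order, typed out for illustration.
FA_MONTHS = ["فروردین", "اردیبهشت", "خرداد", "تیر", "مرداد", "شهریور",
             "مهر", "آبان", "آذر", "دی", "بهمن", "اسفند"]

lengths = {name: len(name) for name in FA_MONTHS}
longest = max(lengths.values())
shortest = min(lengths.values())

# The character class and {3,12} bounds proposed above for %B.
proposed = re.compile("[a-zA-Z\u0600-\u06EF\uFB8A\u067E\u0686\u06AF]{3,12}")
unmatched = [name for name in FA_MONTHS if not proposed.fullmatch(name)]

print("longest:", longest, "shortest:", shortest, "unmatched:", unmatched)
```

Besides confirming the upper bound of 8, this also shows that the lower bound matters: "دی" is only 2 characters, so a minimum length of 3 would not match it as written.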

@slashmili
Owner Author

Deployed as part of v4.1.0
