[pull] master from torvalds:master #35

Merged: 639 commits, Jun 4, 2020

This pull request is big! We’re only showing the most recent 250 commits.

Commits on May 20, 2020

  1. media: atomisp: fix several typos

    Running checkpatch.pl's codespell logic found several typos in the
    atomisp driver.

    Fix them using --fix-inplace.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 4636a85)
  2. media: atomisp: fix several coding style issues

    Use checkpatch.pl --fix-inplace --strict to solve several
    coding style issues, manually reviewing the produced code and
    fixing the problems introduced by checkpatch.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit bdfe0be)
  3. media: atomisp: do lots of other coding style cleanups

    Use some auto-reformat tools to make the atomisp style
    a little better. There are still lots of weird things there,
    but this will hopefully reduce the number of pure coding
    style patches submitted upstream.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit eaa399e)
  4. media: atomisp: remove some dead code

    There are several parts of atomisp that are meant to be
    built in different environments, selected via ifdefs.

    Remove some of them, as this code should build only on
    Linux.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 02330fb)
  5. media: atomisp: simplify math_support.h

    There are some unneeded defines there. Simplify the header, and make
    it independent of those defines.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 9a0d7fb)
  6. media: atomisp: add a way for the driver to know the chipset version

    The atomisp driver supports two different chipsets: ISP2400 and ISP2401.
    Right now, this is controlled by ugly #defines inside the driver.

    Add a global boolean to identify the type of hardware. While this
    is hacky, it is a quick way to start removing the ugly
    ifdefs.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit e1ac35b)
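
    A minimal standalone sketch of the pattern this introduces (names here
    are illustrative, not the driver's actual symbols): a flag set once at
    probe time replaces compile-time #ifdef ISP2401 checks.

        #include <stdbool.h>
        #include <stdio.h>

        /* Hypothetical global, set once at PCI probe time from the device ID. */
        static bool hw_is_isp2401;

        static void isp_init(void)
        {
                if (hw_is_isp2401)      /* was: #ifdef ISP2401 */
                        puts("ISP2401 init path");
                else                    /* was: #else ... #endif */
                        puts("ISP2400 init path");
        }

        int main(void)
        {
                hw_is_isp2401 = true;   /* would come from the PCI device ID */
                isp_init();
                return 0;
        }
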
  7. media: atomisp: atomisp_cmd.c test ISP version in runtime

    The logic there has lots of ifdef dependencies on whether the hardware
    is ISP2400 or ISP2401.

    Replace them with runtime checks.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit ea419fd)
  8. media: atomisp: atomisp_dfs_tables.h: don't depend on ISP version

    This header has a dependency on the ISP model. While
    this sounds really weird (as just one resolution needs it),
    we don't know what the right value is, so let's just keep it.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 268ff5b)
  9. media: atomisp: pci/atomisp2/*.h remove #ifdef ISP2401

    Those ifs can easily be removed without breaking the code.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 643405b)
  10. media: atomisp: atomisp_ioctl.c: get rid of an ISP2400/ISP2401 dependency

    Replace #ifdef occurrences there with runtime checks.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 9ace178)
  11. media: atomisp: atomisp_v4l2.c: set wdt timers according to ISP version
    
    Add a runtime check to use the proper wdt timer init at runtime,
    depending on the chipset revision.
    
    For now, we can't get rid of the remaining version checks, as
    the rest of the code is not yet prepared to detect the ISP
    version at runtime.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit a19b190)
  12. media: atomisp: atomisp_subdev.c check ISP version on runtime

    Remove ISP-version-dependent ifdefs.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 02c3923)
  13. media: atomisp: atomisp_csi2.c: remove useless ifdefs

    The ifdefs there are meaningless. Just remove them for good.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 78e2888)
  14. media: atomisp: atomisp_compat_css20.c: detect ISP at runtime

    Remove ifdefs that check ISP version from the code, switching
    to specific ISP-dependent code at runtime.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 7ef17aa)
  15. media: atomisp: atomisp_compat_ioctl32.c: be independent of ISP version

    There are two ioctls that are only available with ISP2401. Yet,
    at the compat level, we don't really need to care, as
    the native ioctl handler will already return an error code if
    the ioctl doesn't exist.

    So, let's just remove the ifdefs here.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 483f521)
  16. media: atomisp: sh_css_defs.h: get rid of build time dependencies

    There are several #ifdefs checking for the ISP version there. Some
    of them are just two different ways to represent the same constants,
    while 3 parameters are actually different, depending on the ISP
    version.

    Change the header so that it is compatible with both
    versions, and change dependent code to keep running, removing
    ifdefs there only where possible.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit ffa1236)
  17. media: atomisp: make sh_css_struct.h independent of ISP version

    Use the same struct for both ISP2400 and ISP2401.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 977e77c)
  18. media: atomisp: make sh_css_sp_init_pipeline() ISP version independent

    This function call has two parameters that are used only with
    ISP2401, enclosed in some ugly ifdefs. Make the function independent,
    passing NULL values for ISP2400 (as sketched below).
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 4f744a5)
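
    A standalone sketch of this kind of unification (all names hypothetical):
    arguments that only one ISP version needs become nullable pointers, and
    the other version's call sites simply pass NULL.

        #include <stddef.h>
        #include <stdio.h>

        /* The last two arguments are only meaningful on ISP2401;
         * ISP2400 callers pass NULL for both. */
        static void sp_init_pipeline(int pipe, const int *coord, const int *info)
        {
                printf("init pipe %d\n", pipe);
                if (coord && info)      /* ISP2401-only extra setup */
                        printf("extra setup: %d %d\n", *coord, *info);
        }

        int main(void)
        {
                int coord = 1, info = 2;

                sp_init_pipeline(0, NULL, NULL);        /* ISP2400 call site */
                sp_init_pipeline(1, &coord, &info);     /* ISP2401 call site */
                return 0;
        }
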
  19. media: atomisp: remove ISP version macros from sh_css_legacy.h

    This header is really version-independent. So, just get rid
    of the macros from it.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 6a2782c)
  20. media: atomisp: remove table duplication from dfs tables

    The way atomisp_dfs_tables.h is defined, it ends up duplicating
    all data structs there in both atomisp_v4l2.c and atomisp_cmd.c.

    Change the logic in order to place the definitions in a single
    place only, as sketched below.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 5e09474)
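
    The duplication happens because defining data in a header instantiates
    it in every compilation unit that includes it. A standalone sketch of
    the usual C fix (table name and contents are made up):

        #include <stdio.h>

        /* What the header should carry: a declaration only. */
        extern const int dfs_rates[3];

        /* What exactly one .c file should carry: the single definition.
         * When the definition lived in the header, atomisp_v4l2.c and
         * atomisp_cmd.c each got their own copy of the data. */
        const int dfs_rates[3] = { 100, 200, 400 };

        int main(void)
        {
                printf("%d\n", dfs_rates[0]);
                return 0;
        }
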
  21. media: atomisp: unify sh_css_params_shading_id_table_generate()

    Instead of packing parameters differently on ISP2400 and ISP2401,
    use just one way of passing them for both.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 19801a1)
  22. media: atomisp: sh_css_param_dvs.h remove ISP version macros

    As namespaces aren't duplicated here, just remove the ifdefs.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 1e97292)
  23. media: atomisp: print css_version in runtime

    The CSS version returned by ISP2400 is different from the one
    returned by ISP2401.
    
    While we could return just one version for both, as this sounds
    like just an informative string, for now, let's keep returning
    different versions, as we don't know if this would affect
    userspace.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit fe670b2)
  24. media: atomisp: add support for possible new names

    This patch addresses what seems to be a change in the
    name of some ACPI registers on newer ACPI tables.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 1c874c1)
  25. media: atomisp: css_trace.h: use the newest tracing code

    The css_trace header for ISP2401 also builds on older versions, and
    seems to be compatible with all versions. So, remove all ifdefs
    in favor of the ISP2401 version.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit bd3016e)
  26. media: atomisp: ia_css_binary_get_shading_info(): don't test version

    It doesn't make any sense to change the number of parameters
    for this function depending on the ISP version.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit c06e212)
  27. media: atomisp: get rid of some non-existing functions for ISP2401

    There are no ia_css_set_system_mode() nor
    ia_css_is_system_mode_suspend_or_resume() functions in the driver.

    So, get rid of the code that would try to call them.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 8a85fe1)
  28. media: atomisp: make util.c work with ISP2401 runtime detection

    Don't hide those small functions behind ifdefs.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 7535c68)
  29. media: atomisp: sh_css: detect ISP version at runtime

    Get rid of all those ifdefs that were checking for ISP2401 inside
    sh_css.c.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 406ae76)
  30. media: atomisp: isp_const.h: get rid of an unused big define list

    None of those SH_CSS_BINARY_ID_* symbols are used by this driver
    anymore. So, get rid of all of them.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit badd9b3)
  31. media: atomisp: get rid of several typedef-style defines

    Those vars aren't used in this driver anymore. Get rid of
    them.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit d4cf993)
  32. media: atomisp: get rid of trivial ISP2401 dependencies on header files

    On several header files, the dependency on ISP2401 is
    trivial: for example, it just adds new fields to structs or
    declares new functions.
    
    Get rid of those trivial cases.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 8fba22f)
  33. media: atomisp: get rid of unused header files

    Those 4 header files aren't used anywhere. So, send them to
    the trash can.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 4dcf781)
  34. media: atomisp: remove unused definitions at */isp_capture_defs.h

    The isp_capture_defs.h files contain several unused defines.
    Get rid of some of them, making all 3 instances identical.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit c8b1a84)
  35. media: atomisp: remove several duplicated files

    Those files have identical contents, but are located in
    different parts of the driver. As their contents are identical,
    we can simply remove the duplicates.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 7c2b6c1)
  36. media: atomisp: remove unused hive_isp_css_host_ids_hrt.h

    Nothing here is really used by the driver. So, let's just
    get rid of it.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit ecdb2e3)
  37. media: atomisp: hive_isp_css_defs.h: keep just one copy of it

    While those headers are different, the differing fields
    aren't used by the driver. So, remove those different
    unused fields, rename one define and use just one header
    for all 3 different versions of the ISP.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 14131db)
  38. media: atomisp: finish de-duplication of hrt/hive*.h

    The last header (hive_isp_css_2401_irq_types_hrt.h) is also
    almost identical, except for an #if ISP2400 inside a comment
    block.
    
    Remove the duplication and keep just one file.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit c343a51)
  39. media: atomisp: add Asus Transformer T101HA ACPI vars

    Those were extracted from an ACPI dump:
    
     * Original Table Header:
     *     Signature        "DSDT"
     *     Length           0x0001A0BD (106685)
     *     Revision         0x02
     *     Checksum         0x76
     *     OEM ID           "_ASUS_"
     *     OEM Table ID     "Notebook"
     *     OEM Revision     0x01072009 (17244169)
     *     Compiler ID      "INTL"
     *     Compiler Version 0x20120913 (538052883)
     */
    DefinitionBlock ("", "DSDT", 2, "_ASUS_", "Notebook", 0x01072009)
    ...
                        Local0 = Package (0x12)
                            {
                                "CamId",
                                "ov2680",
                                "CamType",
                                "1",
                                "CsiPort",
                                "0",
                                "CsiLanes",
                                "1",
                                "CsiFmt",
                                "15",
                                "CsiBayer",
                                "0",
                                "CamClk",
                                "1",
                                "Regulator1p8v",
                                "0",
                                "Regulator2p8v",
                                "0"
                            }
    
    Note: the DMI_MATCH() line probably needs to be tweaked.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 0a76fd8)
  40. media: atomisp: use regulator_get_optional() for first attempt

    Some BIOSes seem to use different names for some regulators.
    
    Use regulator_get_optional() for the first attempt, in order
    to avoid using the dummy regulator and producing a warning
    when the first attempt fails (see the sketch below).
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 5060e35)
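
    A kernel-context sketch of this lookup order (the supply names are
    hypothetical, and the fragment belongs inside a probe function, not a
    standalone program): regulator_get_optional() returns an error pointer
    instead of a dummy regulator, so a failed first attempt can fall back
    quietly.

        struct regulator *reg;

        /* First try: fails cleanly if the BIOS uses another name. */
        reg = regulator_get_optional(dev, "V1P8SX");
        if (IS_ERR(reg))
                reg = regulator_get(dev, "V1P8SL");     /* alternate name */
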
  41. media: atomisp: remove bayer_io_ls duplication

    There are two instances of those, one for isp2401 and another
    one for isp2400, both with identical contents, except for
    comments and an ifdef.
    
    Get rid of one of them.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 3a5e9f4)
  42. media: atomisp: rename anr2 param header file

    This file is different from the anr1 version. So, let's name
    it differently.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 5254591)
  43. media: atomisp: get rid of io_ls/ subdir

    The contents of this directory are identical to ipu2_io_ls, except
    for the bayer directory, which is only at ipu2_io_ls.

    So, get rid of the duplicated code.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 33c0411)
  44. media: atomisp: remove unused duplicated files

    Those files aren't used. So, just get rid of them.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 1360fa6)
  45. media: atomisp: get rid of trivial version checks at *.h

    Most of the remaining ifdefs checking for ISP2401 are trivial.
    
    Get rid of them.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit e6c1310)
  46. media: atomisp: get rid of ia_css_sc_param.h version dependency

    That's the last header file which had ifdefs for ISP2401.
    
    The problem is that the conflicting dependencies were in another
    file (sh_css_defs.h). Move the conflicting code to it, adding
    a prefix describing which version the macro applies to.

    Then, ensure that binary.c will use the right version,
    according to the hardware version.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 8022c2e)
  47. media: atomisp: get rid of ISP_VMEM_IS_BAMEM unused defines

    There are several unused defines in the ISP-specific definition
    sets, related to VMEM_BAMEM. Get rid of those.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit e3292f8)
  48. media: atomisp: get rid of __ISP define tests

    This is not defined anywhere, so just get rid of the dead
    source code.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit c6552ae)
  49. media: atomisp: make all file names unique at atomisp driver

    The *system_*.h files contain ISP-specific definitions, and are
    used everywhere.

    While the best would be to get rid of those in favor of some
    ISP-specific structs, a change like that would require lots
    of changes.

    So, instead, let's rename those files, replacing them with new
    ones that have ISP ifdefs in them, in order to select between the
    two different versions (see the sketch below).

    We shall later convert this to some abstraction layer,
    but this change should help to be able to build support for
    either ISP2400 or ISP2401.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit b82cd6b)
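
    A sketch of the renaming scheme (file names illustrative): each
    version-specific header gets a unique name, and a thin wrapper selects
    between them at build time.

        /* system_global.h, now just a selector: */
        #ifdef ISP2401
        #include "isp2401_system_global.h"
        #else
        #include "isp2400_system_global.h"
        #endif
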
  50. media: atomisp: simplify makefiles

    Remove an unneeded define and Makefile.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit d876334)
  51. media: atomisp: cleanup directory hierarchy

    This driver has very long directory paths without a good
    reason (IMHO). Let's drop two directories from the hierarchy,
    in order to simplify things a little bit and make the dir
    output a bit more readable.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 9d4fa1a)
  52. media: atomisp: get rid of some broken code

    Probably due to some version conflicts while the atomisp code
    was generated, some things don't build for ISP2401. So, use
    the ISP2400 variant when available, or get rid of the
    code that doesn't build.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 8d4af31)
  53. media: atomisp: change function orders and fix include

    With the current code, lots of errors are produced, because
    the public header contains wrong definitions and the private
    one has functions defined in the wrong order.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit fe4586c)
  54. media: atomisp: allow building for isp2401

    Now that everything needed to build for ISP2401 is solved,
    we can setup atomisp to build either for ISP2400 or ISP2401.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 0850936)
  55. media: atomisp: cleanup contents of css_2400_system/

    Everything there is for ISP2400 only. So, we can trivially
    solve all ifdefs at once.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 9935e29)
  56. media: atomisp: cleanup contents of css_2401_csi2p_system

    Everything there is for ISP2401 only. So, we can trivially
    solve all ifdefs at once.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit f172f6e)
  57. media: atomisp: cleanup contents of css_2401_system

    Everything there is for ISP2401 only. So, we can trivially
    solve all ifdefs at once.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit f064805)
  58. media: atomisp: get rid of most checks for ISP2401 version

    There are lots of places inside this driver checking for the
    ISP2400/ISP2401 version. Get rid of most of those, while
    keeping it building for both.

    Most of the changes in this patch are trivial to solve.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 3c0538f)
  59. media: atomisp: add firmware load code for ISP2401 rev B0

    The Asus Transformer T101HA comes with a newer hardware
    version. Add support to load firmware for it.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit bbf3f78)
  60. media: atomisp: remove some file duplication and do more dir renames

    There are currently two identical copies of some files, one
    at css_2401_csi2p_system/ and another one at css_2401_system/.
    
    Get rid of one of them, moving the remaining files to the
    directory with the shortest name.
    
    While here, do more renames, in order to get smaller path
    names.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 0057131)
  61. media: atomisp: use add_qos_request instead of update

    It doesn't make sense to update a request that was not
    created. So, instead of using cpu_latency_qos_update_request(),
    let's use cpu_latency_qos_add_request() in the device
    probing code (see the sketch below).
    
    This should fix this issue:
    
    [    9.691775] cpu_latency_qos_update_request called for unknown object
    [    9.695279] WARNING: CPU: 3 PID: 523 at kernel/power/qos.c:296 cpu_latency_qos_update_request+0x3a/0xb0
    [    9.698826] Modules linked in: snd_soc_acpi_intel_match snd_rawmidi snd_soc_acpi snd_soc_rl6231 snd_soc_core ath mac80211 snd_compress snd_hdmi_lpe_audio ac97_bus hid_sensor_accel_3d snd_pcm_dmaengine hid_sensor_gyro_3d hid_sensor_trigger industrialio_triggered_buffer kfifo_buf hid_sensor_iio_common processor_thermal_device industrialio cfg80211 snd_pcm snd_seq intel_rapl_common atomisp(C+) libarc4 intel_soc_dts_iosf cros_ec_ishtp intel_xhci_usb_role_switch mei_txe cros_ec videobuf_vmalloc mei roles atomisp_ov2680(C) videobuf_core snd_seq_device snd_timer spi_pxa2xx_platform videodev snd mc dw_dmac intel_hid dw_dmac_core 8250_dw soundcore int3406_thermal int3400_thermal intel_int0002_vgpio acpi_pad acpi_thermal_rel soc_button_array int3403_thermal int340x_thermal_zone mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 hid_sensor_custom hid_sensor_hub intel_ishtp_loader intel_ishtp_hid crct10dif_pclmul crc32_pclmul ghash_clmulni_intel i915 mmc_block i2c_algo_bit
    [    9.698885]  aesni_intel crypto_simd drm_kms_helper cryptd syscopyarea sysfillrect glue_helper sysimgblt fb_sys_fops cec intel_ish_ipc drm lpc_ich intel_ishtp hid_asus intel_soc_pmic_chtdc_ti asus_wmi i2c_hid sparse_keymap sdhci_acpi wmi video sdhci hid_generic usbhid hid
    [    9.736699] CPU: 3 PID: 523 Comm: systemd-udevd Tainted: G         C        5.7.0-rc1+ #2
    [    9.741309] Hardware name: ASUSTeK COMPUTER INC. T101HA/T101HA, BIOS T101HA.305 01/24/2018
    [    9.745962] RIP: 0010:cpu_latency_qos_update_request+0x3a/0xb0
    [    9.750615] Code: 89 e5 41 55 41 54 41 89 f4 53 48 89 fb 48 81 7f 28 e0 7f c6 9e 74 1c 48 c7 c6 60 f3 65 9e 48 c7 c7 e8 a9 99 9e e8 b2 a6 f9 ff <0f> 0b 5b 41 5c 41 5d 5d c3 0f 1f 44 00 00 44 3b 23 74 ef 44 89 e2
    [    9.760065] RSP: 0018:ffffa865404f39c0 EFLAGS: 00010282
    [    9.764734] RAX: 0000000000000000 RBX: ffff9d2aefc84350 RCX: 0000000000000000
    [    9.769435] RDX: ffff9d2afbfa97c0 RSI: ffff9d2afbf99808 RDI: ffff9d2afbf99808
    [    9.774125] RBP: ffffa865404f39d8 R08: 0000000000000304 R09: 0000000000aaaaaa
    [    9.778804] R10: 0000000000000000 R11: 0000000000000001 R12: 00000000ffffffff
    [    9.783491] R13: ffff9d2afb4640b0 R14: ffffffffc07ecf20 R15: 0000000091000000
    [    9.788187] FS:  00007efe67ff8880(0000) GS:ffff9d2afbf80000(0000) knlGS:0000000000000000
    [    9.792864] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    9.797482] CR2: 00007ffc6424bdc8 CR3: 0000000178998000 CR4: 00000000001006e0
    [    9.802126] Call Trace:
    [    9.806775]  atomisp_pci_probe.cold.19+0x15f/0x116f [atomisp]
    [    9.811441]  local_pci_probe+0x47/0x80
    [    9.816085]  pci_device_probe+0xff/0x1b0
    [    9.820706]  really_probe+0x1c8/0x3e0
    [    9.825247]  driver_probe_device+0xd9/0x120
    [    9.829769]  device_driver_attach+0x58/0x60
    [    9.834294]  __driver_attach+0x8f/0x150
    [    9.838782]  ? device_driver_attach+0x60/0x60
    [    9.843205]  ? device_driver_attach+0x60/0x60
    [    9.847634]  bus_for_each_dev+0x79/0xc0
    [    9.852033]  ? kmem_cache_alloc_trace+0x167/0x230
    [    9.856462]  driver_attach+0x1e/0x20
    
    Reported-by: Patrik Gfeller <patrik.gfeller@gmail.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit ac378c9)
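
    A kernel-context sketch of the rule behind the fix (the request field
    name is illustrative): a pm_qos_request must be registered with
    cpu_latency_qos_add_request() before it may be updated or removed.

        /* In the probe path: create and register the request. */
        cpu_latency_qos_add_request(&isp->pm_qos, PM_QOS_DEFAULT_VALUE);

        /* Only after that are updates legal: */
        cpu_latency_qos_update_request(&isp->pm_qos, new_value);

        /* And on driver removal: */
        cpu_latency_qos_remove_request(&isp->pm_qos);
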
  62. media: atomisp: fix querycap initialization logic

    Some recent changes at V4L2 core changed the way querycap is handled.
    
    Due to that, this warning is generated:
    
    	WARNING: CPU: 1 PID: 503 at drivers/media/v4l2-core/v4l2-dev.c:885 __video_register_device+0x93e/0x1120 [videodev]
    
    as introduced by this commit:
    
    	commit 3c13505
    	Author: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    	Date:   Tue Jul 23 04:21:25 2019 -0400
    
    	    media: v4l2-dev/ioctl: require non-zero device_caps, verify sane querycap results
    
    	    Now that all V4L2 drivers set device_caps in struct video_device, we can add
    	    a check for this to ensure all future drivers fill this in.
    
    The fix is simple: we just need to initialize device_caps before
    registering the V4L2 dev (see the sketch below).

    While here, solve other problems with the VIDIOC_QUERYCAP ioctl.
    
    Reported-by: Patrik Gfeller <patrik.gfeller@gmail.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 8ac1714)
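
    A kernel-context sketch of the fix (the capability flags are typical
    values, not necessarily the ones atomisp sets): device_caps must be
    filled in before video_register_device() is called, otherwise the V4L2
    core emits the WARNING above.

        vdev->device_caps = V4L2_CAP_VIDEO_CAPTURE | V4L2_CAP_STREAMING;

        ret = video_register_device(vdev, VFL_TYPE_VIDEO, -1);
        if (ret)
                return ret;
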
  63. media: atomisp: move ia_css_configure_sc() implementation

    With the changes, this function is now undefined if built
    for ISP2400. So, move its implementation to the file which
    calls it.
    
    Reported-by: Francescodario Cuzzocrea <francescodario.cuzzocrea@mail.polimi.it>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 32efca3)
  64. media: atomisp: disable the dummy PM driver if the atomisp driver is built

    As the atomisp driver should already be handling the ISP
    PCI ID, there's no sense in keeping the dummy driver enabled
    in this case.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 1ab7098)
  65. media: atomisp: print a better message when fw version is wrong

    The printed message when a firmware version is wrong says nothing
    useful:
    
    	atomisp-isp2 0000:00:03.0: Fw version check failed.
    	atomisp-isp2: probe of 0000:00:03.0 failed with error -22
    
    Print the expected and the received firmware version instead.
    
    In order to do that, the firmware functions will need at least
    a struct device pointer, so pass it.
    
    While writing this patch, it was noticed that some of the
    abstraction layers of this driver have functions that are never
    called, but use this interface. Get rid of them.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 8568fe6)
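
    A sketch of what the improved message could look like, assuming the
    firmware code now receives a struct device pointer and has both
    version strings at hand (variable names hypothetical):

        dev_err(dev, "Firmware version mismatch: expected '%s', got '%s'\n",
                expected_version, fw_version);
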
  66. media: atomisp: limit the name of the firmware file

    The firmware name field in the header has 64 bytes. Properly limit
    the name to that size.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit f770e91)
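
    A minimal standalone sketch of the bounded copy (field size from the
    commit message, names illustrative): the destination is a fixed
    64-byte header field, so the copy must be truncated and
    NUL-terminated.

        #include <stdio.h>

        #define FW_NAME_SIZE 64         /* the header field is 64 bytes */

        int main(void)
        {
                char header_name[FW_NAME_SIZE];
                const char *name = "shisp_2401a0_v21.bin";

                /* snprintf() truncates and always NUL-terminates; kernel
                 * code would typically use strscpy() for the same effect. */
                snprintf(header_name, sizeof(header_name), "%s", name);
                puts(header_name);
                return 0;
        }
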
  67. media: atomisp: fix clock rate frequency setting

    changeset d5426f4 ("media: staging: atomisp: use clock framework for camera clocks")
    removed platform-specific code to set the clock rate, in favor of
    using the Kernel clock framework.

    However, instead of passing the frequency to clk_set_rate(),
    it is passing either 0 or 1.

    Looking at the original patchset, it seems that there are two
    possible configurations for the ISP:

    	0 - it will use a 25 MHz XTAL to provide the clock;
    	1 - it will use a PLL which is set to 19.2 MHz
    	    (only for the CHT version?)

    Eventually, different XTALs and/or PLL frequencies might
    be possible some day, so re-implement the logic to be
    more generic (see the sketch below).
    
    Fixes: d5426f4 ("media: staging: atomisp: use clock framework for camera clocks")
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 9b7632e)
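
    A standalone sketch of the corrected logic, using the two frequencies
    from the commit message: the 0/1 configuration value selects a clock
    source, and the corresponding rate in Hz is what should be handed to
    clk_set_rate().

        #include <stdio.h>

        enum clock_source { CLOCK_XTAL_25MHZ = 0, CLOCK_PLL_19P2MHZ = 1 };

        static unsigned long clock_rate(int source)
        {
                switch (source) {
                case CLOCK_PLL_19P2MHZ:
                        return 19200000;        /* 19.2 MHz PLL */
                case CLOCK_XTAL_25MHZ:
                default:
                        return 25000000;        /* 25 MHz XTAL */
                }
        }

        int main(void)
        {
                /* The driver would pass this rate to clk_set_rate(),
                 * instead of the raw 0/1 configuration value. */
                printf("%lu Hz\n", clock_rate(CLOCK_PLL_19P2MHZ));
                return 0;
        }
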
  68. media: atomisp: improve device detection code

    - Remove useless check if !dev at the probe function: if
      such function is called, the device is defined.
    - Cleanup the PCI ID table using macros.
    - Use the same macros at the version-dependent part of the
      atomisp_v4l2.c file;
    - Add print messages to help understand what model the
      driver detects;
    - If device is not valid, better explain why.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit ca133c3)
  69. media: atomisp: relax firmware version detection criteria

    As getting the exact version used by the driver is not easy,
    let's relax the version detection and hope for the best,
    producing just a warning.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 33c24f8)
  70. media: atomisp: free PCI resources when probing fail

    The atomisp probe error logic is incomplete. Add the missing
    bits to return the PCI device to its original state.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 25bccb9)
  71. media: atomisp: make dfs_config_merr_117a struct const

    This setting is used only for one of the Merrifield PCI IDs.

    As this is an ISP2400, we can just get rid of a version
    test, writing the right value directly inside the struct.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 9972311)
  72. media: atomisp: add -DDEBUG when building this driver

    This driver still has lots of issues. Let's enable debug
    there unconditionally, as we need more information in order
    to address the pending issues.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 88a4711)
  73. media: atomisp: Add some ACPI detection info

    When someone would report problems with a new device, we
    need to know the DMI product ID and the ACPI name for the
    detected sensor. So, print them at dmesg.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 0d64e94)
  74. media: atomisp: better display DMI and EFI found entries

    There are several pieces of device-specific data that are obtained
    either via DMI or EFI, which change the driver's behavior.

    Display what has been detected, as such info may help with
    identifying troubles in the driver.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 85df845)
  75. media: atomisp: print the type of PMIC that will be used

    While the current code is hardcoded to just one specific
    type of PMIC, it can support several types. Those should
    be board-dependent. Instead of just printing a number,
    change the message to display what type of PMIC control
    is used at runtime.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit d03f2e2)
  76. media: atomisp: reduce the risk of a race condition

    This driver is really in bad shape. One of the problems
    is that, as soon as the I2C transfers start to happen, it
    times out detecting a camera:
    
    	ov2680 i2c-OVTI2680:00: ov2680_probe: ACPI detected it on bus ID=CAM1, HID=OVTI2680
    	atomisp-isp2 0000:00:03.0: no camera attached or fail to detect
    	ov2680 i2c-OVTI2680:00: gmin: initializing atomisp module subdev data using PMIC regulator
    	...
    
    The right fix here would be to use deferred probing, but the driver
    is still in too bad a shape for that.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 09d8746)
  77. media: atomisp: warn if unsupported subdevs are found

    Right now, the driver supports just one VCM and just one
    flash device. Warn if more than one such device is
    probed.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit a79afb9)
  78. media: atomisp: detect the PMIC type

    A sub-device's power management can be provided in different ways.

    Instead of hardcoding it, add code that detects it.

    This uses code similar to what's found in the atomisp driver
    inside the Intel Aero repository:

    	https://github.com/intel-aero/meta-intel-aero.git

    (the driver was removed in some commit, but it can be found in
    the git history).
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 93e24ec)
  79. media: atomisp: move atomisp_gmin_platform.c to pci/ dir

    The atomisp_gmin_platform.c is not a platform driver anymore,
    but it is, instead, part of the atomisp driver.
    
    Move it to be together with the driver. As a bonus, as the
    atomisp i2c drivers depend on its contents, probing them
    should automatically load the atomisp core. This should
    likely avoid some possible race conditions.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 0741bf6)
  80. media: atomisp: add support for different PMIC configurations

    This patch required lots of research and work. The existing
    atomisp driver at staging assumed that all Intel PMICs would
    be using regulators, but upstream didn't follow that. Instead,
    the intel_pmic.c driver added a hack: instead of using i2c_transfer,
    it writes I2C values directly via regmapped registers.
    
    Oh, well... At least, it provided a common API for doing that.
    
    The PMIC settings used here came from the driver at the
    yocto Aero distribution:
    
    	https://download.01.org/aero/deb/pool/main/l/linux-4.4.76-aero-1.3/
    
    The logic itself was re-written, in order to use the I2C address
    detected by the probing part.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit b4dc4e1)
  81. media: atomisp: spctrl: be sure to zero .code_addr after free

    We need that to avoid a double free.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 4877b19)
  82. media: atomisp: use pcim_enable_device() again

    Changing to pci_enable_device() didn't produce the expected
    result. It could also eventually lead to problems when the driver
    is removed, due to object lifetime issues. So, let's just
    return to the previous behavior.
    
    Suggested-by: Andy Shevchenko <andy.shevchenko@gmail.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit a27b581)
  83. media: atomisp: simplify the power down/up code

    Use the version from intel_atomisp2_pm.c for power up/down,
    removing some code duplication and using just one kAPI call
    for modifying the ISPSSPM0 register.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 0f441fd)
  84. media: atomisp: remove a misplaced #endif

    There is an #endif in the middle of a comment in
    ia_css_xnr3.host.c. Remove it.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 1351ea6)
  85. media: atomisp: fix an inverted logic

    When changing the IFs to select isp2401 at runtime, one of
    the conditions ended up being written wrong.

    The code was double-checked against both Yocto Aero's driver
    version and the previous code.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 3117ddd)
  86. media: atomisp: get rid of spmem_dump.c

    Those files seem to be firmware-dependent, probably being used
    by some debug interface.
    
    Well, their contents are not really used by atomisp, so let's
    just send them to the trash can, as they shouldn't have any
    use upstream.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 983e5ac)
  87. media: atomisp: get rid of __bo_alloc() macro

    Simplify the hmm_bo code a little bit by removing this
    macro. This will avoid printing errors twice when
    allocations fail.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 5f1e9dd)
  88. media: atomisp: fix a slab error due to a wrong free

    The mmu mapping code uses a different logic depending on the
    RAM size: if it is lower than 2GB, it uses kmem_cache_zalloc(),
    but if memory is bigger than that, it uses its own way to
    allocate memory.

    Yet, when freeing, it uses kmem_cache_free() in all cases.

    On recent Kernels, slab tracks the memory allocated from it,
    which causes these warnings:
    
     virt_to_cache: Object is not a Slab page!
     WARNING: CPU: 0 PID: 758 at mm/slab.h:475 cache_from_obj+0xab/0xf0
     Modules linked in: snd_soc_sst_cht_bsw_rt5645(E) mei_hdcp(E) gpio_keys(E) intel_rapl_msr(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) ghash_clmulni_intel(E) atomisp_ov2680(CE) intel_cstate(E) asus_nb_wmi(E) wdat_wdt(E) pcspkr(E) ath10k_pci(E) ath10k_core(E) intel_chtdc_ti_pwrbtn(E) ath(E) mac80211(E) btusb(E) joydev(E) btrtl(E) btbcm(E) btintel(E) bluetooth(E) libarc4(E) ecdh_generic(E) cfg80211(E) ecc(E) hid_sensor_gyro_3d(E) hid_sensor_accel_3d(E) hid_sensor_trigger(E) hid_sensor_iio_common(E) industrialio_triggered_buffer(E) kfifo_buf(E) industrialio(E) atomisp(CE) videobuf_vmalloc(E) videobuf_core(E) videodev(E) mc(E) snd_soc_rt5645(E) snd_soc_rl6231(E) snd_intel_sst_acpi(E) snd_intel_sst_core(E) snd_soc_sst_atom_hifi2_platform(E) snd_soc_acpi_intel_match(E) intel_hid(E) spi_pxa2xx_platform(E) snd_soc_acpi(E) snd_soc_core(E) snd_compress(E) dw_dmac(E) intel_xhci_usb_role_switch(E) int3406_thermal(E)
      snd_hdmi_lpe_audio(E) int3403_thermal(E) int3400_thermal(E) acpi_thermal_rel(E) snd_seq(E) intel_int0002_vgpio(E) soc_button_array(E) snd_seq_device(E) acpi_pad(E) snd_pcm(E) snd_timer(E) snd(E) soundcore(E) lpc_ich(E) mei_txe(E) mei(E) processor_thermal_device(E) intel_soc_dts_iosf(E) intel_rapl_common(E) int340x_thermal_zone(E) ip_tables(E) hid_sensor_hub(E) intel_ishtp_loader(E) intel_ishtp_hid(E) mmc_block(E) hid_multitouch(E) crc32c_intel(E) i915(E) i2c_algo_bit(E) drm_kms_helper(E) hid_asus(E) asus_wmi(E) sparse_keymap(E) rfkill(E) drm(E) intel_ish_ipc(E) intel_ishtp(E) wmi(E) video(E) i2c_hid(E) sdhci_acpi(E) sdhci(E) mmc_core(E) pwm_lpss_platform(E) pwm_lpss(E) fuse(E)
     CPU: 0 PID: 758 Comm: v4l_id Tainted: G         C  E     5.7.0-rc2+ #40
     Hardware name: ASUSTeK COMPUTER INC. T101HA/T101HA, BIOS T101HA.306 04/23/2019
     RIP: 0010:cache_from_obj+0xab/0xf0
     Code: c3 31 c0 80 3d 1c 38 72 01 00 75 f0 48 c7 c6 20 12 06 b5 48 c7 c7 10 f3 37 b5 48 89 04 24 c6 05 01 38 72 01 01 e8 2c 99 e0 ff <0f> 0b 48 8b 04 24 eb ca 48 8b 57 58 48 8b 48 58 48 c7 c6 30 12 06
     RSP: 0018:ffffb0a4c07cfb10 EFLAGS: 00010282
     RAX: 0000000000000029 RBX: 0000000000000048 RCX: 0000000000000000
     RDX: ffffa004fbca5b80 RSI: ffffa004fbc19cc8 RDI: ffffa004fbc19cc8
     RBP: 0000000000c49000 R08: 00000000000004f7 R09: 0000000000000001
     R10: 0000000000aaaaaa R11: ffffffffb50e0600 R12: ffffffffc0be0a00
     R13: ffffa003f2448000 R14: 0000000000c49000 R15: ffffa003f2448000
     FS:  00007f9060c9cb80(0000) GS:ffffa004fbc00000(0000) knlGS:0000000000000000
     CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
     CR2: 0000559fc55b8000 CR3: 0000000165b02000 CR4: 00000000001006f0
     Call Trace:
      kmem_cache_free+0x19/0x180
      mmu_l2_unmap+0xd1/0x100 [atomisp]
      mmu_unmap+0xd0/0xf0 [atomisp]
      hmm_bo_unbind+0x62/0xb0 [atomisp]
      hmm_free+0x44/0x60 [atomisp]
      ia_css_spctrl_unload_fw+0x30/0x50 [atomisp]
      ia_css_uninit+0x3a/0x90 [atomisp]
      atomisp_open+0x50b/0x5c0 [atomisp]
      v4l2_open+0x85/0xf0 [videodev]
      chrdev_open+0xdd/0x210
      ? cdev_device_add+0xc0/0xc0
      do_dentry_open+0x13a/0x380
      path_openat+0xa9a/0xfe0
      do_filp_open+0x75/0x100
      ? __check_object_size+0x12e/0x13c
      ? __alloc_fd+0x44/0x150
      do_sys_openat2+0x8a/0x130
      __x64_sys_openat+0x46/0x70
      do_syscall_64+0x5b/0xf0
      entry_SYSCALL_64_after_hwframe+0x44/0xa9
    
    Solve it by calling free_page() directly (see the pairing sketch below).
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 7f98b89)
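
    A kernel-context sketch of the pairing rule behind the fix (the exact
    allocation call in the driver may differ): memory obtained from the
    page allocator must go back to the page allocator; kmem_cache_free()
    is only valid for objects that came from the matching kmem_cache.

        unsigned long pt = get_zeroed_page(GFP_KERNEL);

        if (pt)
                free_page(pt);  /* correct: matches the allocator */

        /* Calling kmem_cache_free(cache, (void *)pt) here is what
         * triggered the "Object is not a Slab page!" warning above. */
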
  89. media: atomisp: fix the value for CamClk on Asus T101HA

    The value returned by the BIOS is 1. Fix it in the driver,
    as it won't read this from EFI.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 39c91e1)
  90. media: atomisp: keep the ISP powered on when setting it

    The current code causes ISP2401 to power down and never come
    back to life, causing the driver to crash.

    Fix it by commenting out the bad code. It should be noticed that
    the Yocto Aero code has something similar to it.
    
    Maybe the issue is related to an ISP bug (or maybe PM is
    controlled on a different way for this hardware).
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 95d1f39)
  91. media: atomisp: change the code to properly wait for sensor

    The sensor should finish its init before the atomisp driver, as
    otherwise the atomisp driver won't be able to talk with it.

    So, we need to turn atomisp_gmin_platform into a module
    again, so it doesn't depend on the atomisp driver to finish
    probing, and add some delay in atomisp to let the sensor
    driver finish probing.

    Yeah, this is hacky. The real solution here would be to use
    the async framework, but for now, our goal is to make the
    driver work. So, let's postpone such a change to be done
    later.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 1d6e5c3)
  92. media: atomisp: ov2680: improve debug messages

    Change some code in ov2680 so it better reports what's
    happening at the sensor level.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 5589ea0)
  93. media: atomisp: use read/write routines from mainstream

    There is an ov2680 driver in mainstream. Use the read/write
    routines from it, as the ones inside this driver are
    generating some errors:

    	ov2680 i2c-OVTI2680:00: ov2680_i2c_write: i2c write reg=0x3086, value 0x00, error -121

    Maybe the code that converts to/from BE is not right.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit 4f78f08)
  94. media: atomisp-ov2680: get rid of the type field

    This isn't really used, so get rid of it, in order to make the code
    simpler.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020 (commit b0ac238)
  95. media: atomisp: simplify ov2680 array write logic

    Instead of trying to send multiple bytes at the same time,
    just go one by one, like the upstream driver does.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    1bc075c
  96. media: atomisp: turn on camera before setting it

    The camera cannot be configured while it is powered off.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    eda1310
  97. media: atomisp: disable the dynamic and reserved pools

    The memory management code for atomisp is complex: it has two
    extra pools (plus some ION-specific code).

    The code for those extra pools is complex, and some parts of it
    were even forked from mm/ code, probably from Kernel 3.10.

    Let's just use a single pool, in order to make the driver
    simpler.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    814634b
  98. media: atomisp: add a notice about possible leak resources

    Calling acpi_bus_get_device() may end up allocating resources that
    aren't freed. So, add a notice about that, as, if those drivers
    get out of staging, we may need some changes.
    
    Fixes: 0d64e94 ("media: atomisp: Add some ACPI detection info")
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    c03496b
  99. media: atomisp: isp_mmu: don't use kmem_cache

    Instead of using it only if system memory is below 2 GB,
    don't use it at all. The problem is that the code there is no
    longer compatible with modern kernels:
    
    [  179.552797] virt_to_cache: Object is not a Slab page!
    [  179.552821] WARNING: CPU: 0 PID: 1414 at mm/slab.h:475 cache_from_obj+0xab/0xf0
    [  179.552824] Modules linked in: ccm(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) ip6table_nat(E) ip6table_mangle(E) ip6table_raw(E) ip6table_security(E) iptable_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) libcrc32c(E) nf_defrag_ipv4(E) iptable_mangle(E) iptable_raw(E) iptable_security(E) ip_set(E) nf_tables(E) nfnetlink(E) ip6table_filter(E) ip6_tables(E) iptable_filter(E) cmac(E) bnep(E) sunrpc(E) vfat(E) fat(E) mei_hdcp(E) snd_soc_sst_cht_bsw_rt5645(E) gpio_keys(E) intel_rapl_msr(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) kvm(E) irqbypass(E) crct10dif_pclmul(E) crc32_pclmul(E) asus_nb_wmi(E) ath10k_pci(E) ghash_clmulni_intel(E) ath10k_core(E) intel_cstate(E) wdat_wdt(E) pcspkr(E) ath(E) mac80211(E) intel_chtdc_ti_pwrbtn(E) joydev(E) btusb(E) btrtl(E) btbcm(E) btintel(E) libarc4(E) bluetooth(E) cfg80211(E) ecdh_generic(E) ecc(E) mei_txe(E) mei(E) lpc_ich(E)
    [  179.552887]  hid_sensor_accel_3d(E) hid_sensor_gyro_3d(E) hid_sensor_trigger(E) hid_sensor_iio_common(E) industrialio_triggered_buffer(E) kfifo_buf(E) industrialio(E) atomisp_ov2680(CE) snd_soc_rt5645(E) snd_intel_sst_acpi(E) snd_soc_rl6231(E) snd_intel_sst_core(E) snd_soc_sst_atom_hifi2_platform(E) intel_hid(E) snd_soc_acpi_intel_match(E) spi_pxa2xx_platform(E) snd_soc_acpi(E) snd_soc_core(E) snd_compress(E) dw_dmac(E) snd_hdmi_lpe_audio(E) int3400_thermal(E) int3406_thermal(E) snd_seq(E) acpi_thermal_rel(E) int3403_thermal(E) atomisp(CE) snd_seq_device(E) snd_pcm(E) intel_int0002_vgpio(E) soc_button_array(E) acpi_pad(E) intel_xhci_usb_role_switch(E) snd_timer(E) videobuf_vmalloc(E) videobuf_core(E) snd(E) atomisp_gmin_platform(CE) soundcore(E) videodev(E) processor_thermal_device(E) intel_soc_dts_iosf(E) mc(E) intel_rapl_common(E) int340x_thermal_zone(E) ip_tables(E) hid_sensor_hub(E) intel_ishtp_loader(E) intel_ishtp_hid(E) mmc_block(E) hid_multitouch(E) crc32c_intel(E) i915(E)
    [  179.552936]  hid_asus(E) i2c_algo_bit(E) asus_wmi(E) sparse_keymap(E) rfkill(E) drm_kms_helper(E) intel_ish_ipc(E) intel_ishtp(E) drm(E) wmi(E) video(E) i2c_hid(E) pwm_lpss_platform(E) pwm_lpss(E) sdhci_acpi(E) sdhci(E) mmc_core(E) fuse(E)
    [  179.552961] CPU: 0 PID: 1414 Comm: v4l2grab Tainted: G         C  EL    5.7.0-rc2+ #42
    [  179.552963] Hardware name: ASUSTeK COMPUTER INC. T101HA/T101HA, BIOS T101HA.306 04/23/2019
    [  179.552968] RIP: 0010:cache_from_obj+0xab/0xf0
    [  179.552973] Code: c3 31 c0 80 3d 1c 38 72 01 00 75 f0 48 c7 c6 20 12 06 9f 48 c7 c7 10 f3 37 9f 48 89 04 24 c6 05 01 38 72 01 01 e8 2c 99 e0 ff <0f> 0b 48 8b 04 24 eb ca 48 8b 57 58 48 8b 48 58 48 c7 c6 30 12 06
    [  179.552976] RSP: 0018:ffffaf1f00c3fae0 EFLAGS: 00010282
    [  179.552980] RAX: 0000000000000029 RBX: 00000000000003ff RCX: 0000000000000007
    [  179.552983] RDX: 00000000fffffff8 RSI: 0000000000000082 RDI: ffff9cb6bbc19cc0
    [  179.552985] RBP: 0000000001000000 R08: 00000000000005a4 R09: ffffaf1f00c3f970
    [  179.552988] R10: 0000000000000005 R11: 0000000000000000 R12: ffffffffc0713da0
    [  179.552991] R13: ffff9cb5a7bb1000 R14: 0000000001000000 R15: ffff9cb5a7bb1000
    [  179.552995] FS:  0000000000000000(0000) GS:ffff9cb6bbc00000(0000) knlGS:0000000000000000
    [  179.552998] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  179.553000] CR2: 00007fe780544400 CR3: 000000002480a000 CR4: 00000000001006f0
    [  179.553003] Call Trace:
    [  179.553015]  kmem_cache_free+0x19/0x180
    [  179.553070]  mmu_l2_unmap+0xd1/0x100 [atomisp]
    [  179.553113]  ? __bo_merge+0x8f/0xa0 [atomisp]
    [  179.553155]  mmu_unmap+0xd0/0xf0 [atomisp]
    [  179.553198]  hmm_bo_unbind+0x62/0xb0 [atomisp]
    [  179.553240]  hmm_free+0x44/0x60 [atomisp]
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    1985e93
  100. media: atomisp: print IRQ when debugging

    Add a debug printk to show what IRQ is popping up.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    cf3cd3b
  101. media: atomisp: don't produce errs for ignored IRQs

    Depending on the ISP-specific HAS_NO_INPUT_FORMATTER macro,
    some IRQs will be ignored by the driver. Yet, those keep
    happening, as reported by this debug print:
    
    	[   61.620746] atomisp-isp2 0000:00:03.0: atomisp_css_irq_enable: css irq info 0x00000004: disable.
    
    Causing this warning:
    	[   61.620749] atomisp-isp2 0000:00:03.0: atomisp_css_irq_enable:Invalid irq info.
    
    Well, if this is a normal situation, just ignore it without
    warnings.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    58d6ccc
  102. media: atomisp: adjust some code at sh_css that could be broken

    When checking sh_css.c against the Yocto Aero version, it can
    be noticed that some isp2401 dependencies may have been taken
    wrongly.

    Change the code to work like Yocto Aero's, as that driver
    was tested in the past with an ISP2401 device.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    27333da
  103. media: atomisp: update TODO with the current data

    The TODO list doesn't reflect the current status of the driver.
    
    Update it.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    6456267
  104. media: atomisp: unify the version for isp2401 a0 and b0 versions

    Based on the Yocto Aero repository, the file name for the isp2401
    is the same for the B0 release.

    So, unify it in the driver.
    
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    mchehab committed May 20, 2020
    8c86642
  105. media: staging: dt-bindings: phy-rockchip-dphy-rx0: remove non-used reg property

    The reg property is not used in the Rockchip MIPI DPHY RX0 bindings, thus
    remove it.
    
    Suggested-by: Johan Jonker <jbx6244@gmail.com>
    Signed-off-by: Helen Koike <helen.koike@collabora.com>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    helen-fornazier authored and mchehab committed May 20, 2020
    00994f0
  106. media: dt-bindings: phy: phy-rockchip-dphy-rx0: move rockchip dphy rx0 bindings out of staging

    Move the phy-rockchip-dphy-rx0 bindings to Documentation/devicetree/bindings/phy.
    
    Verified with:
    make ARCH=arm64 dt_binding_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/phy/rockchip-mipi-dphy-rx0.yaml
    
    Signed-off-by: Helen Koike <helen.koike@collabora.com>
    Acked-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    helen-fornazier authored and mchehab committed May 20, 2020
    960b2de

Commits on May 25, 2020

  1. media: dvbdev: Fix tuner->demod media controller link

    Fixes bug exposed by:
    
    [a3fbc2e: media: mc-entity.c: use WARN_ON, validate link pads]
    
    The dvbdev incorrectly requests a tuner sink pad to connect to a demux
    sink pad. The media controller failure percolates back and the dvb device
    creation fails. Fix this by requesting a tuner source pad. Instead of
    forcing that pad to be index zero, check whether a negative error code
    is returned. A note is added that the first source pad found is chosen.
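
    A minimal sketch of the idea (assuming the media controller core's
    media_get_pad_index() helper; the exact call site in dvbdev may differ):

        /* Ask the tuner for a source pad instead of hardcoding index 0. */
        pad_source = media_get_pad_index(tuner, false, PAD_SIGNAL_ANALOG);
        if (pad_source < 0)
                return pad_source;      /* propagate the error code */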
    
    Affected bridges cx231xx and em28xx printed the below warning[s]
    when a variety of media controller dvb enabled devices were connected.
    The warning returns an error causing all affected devices to fail DVB
    device creation.
    
    [  253.138332] ------------[ cut here ]------------
    [  253.138339] WARNING: CPU: 0 PID: 1550 at drivers/media/mc/mc-entity.c:669 media_create_pad_link+0x1e0/0x200 [mc]
    [  253.138339] Modules linked in: si2168 em28xx_dvb(+) em28xx si2157 lgdt3306a cx231xx_dvb dvb_core cx231xx_alsa cx25840 cx231xx tveeprom cx2341x i2c_mux videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc ir_rc5_decoder rc_hauppauge mceusb rc_core eda
    c_mce_amd kvm nls_iso8859_1 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper efi_pstore wmi_bmof k10temp asix usbnet mii nouveau snd_hda_codec_realtek snd_hda_codec_generic input_leds ledtrig_audio snd_hda_codec_hdmi mxm_wmi snd_hda_in
    tel video snd_intel_dspcfg ttm snd_hda_codec drm_kms_helper snd_hda_core drm snd_hwdep snd_seq_midi snd_seq_midi_event i2c_algo_bit snd_pcm snd_rawmidi fb_sys_fops snd_seq syscopyarea sysfillrect snd_seq_device sysimgblt snd_timer snd soundcore ccp mac_hid sch_fq_codel parport_p
    c ppdev lp parport ip_tables x_tables autofs4 vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio hid_generic usbhid hid i2c_piix4 ahci libahci wmi gpio_amdpt
    [  253.138370]  gpio_generic
    [  253.138372] CPU: 0 PID: 1550 Comm: modprobe Tainted: G        W         5.7.0-rc2+ #181
    [  253.138373] Hardware name: MSI MS-7A39/B350M GAMING PRO (MS-7A39), BIOS 2.G0 04/27/2018
    [  253.138376] RIP: 0010:media_create_pad_link+0x1e0/0x200 [mc]
    [  253.138378] Code: 26 fd ff ff 44 8b 4d d0 eb d9 0f 0b 41 b9 ea ff ff ff 44 89 c8 c3 0f 0b 41 b9 ea ff ff ff eb f2 0f 0b 41 b9 ea ff ff ff eb e8 <0f> 0b 41 b9 ea ff ff ff eb af 0f 0b 41 b9 ea ff ff ff eb a5 66 90
    [  253.138379] RSP: 0018:ffffb9ecc0ee7a78 EFLAGS: 00010246
    [  253.138380] RAX: ffff943f706c99d8 RBX: 0000000000000000 RCX: 0000000000000000
    [  253.138381] RDX: ffff943f613e0180 RSI: 0000000000000000 RDI: ffff943f706c9958
    [  253.138381] RBP: ffffb9ecc0ee7ab0 R08: 0000000000000001 R09: ffff943f613e0180
    [  253.138382] R10: ffff943f613e0180 R11: ffff943f706c9400 R12: 0000000000000000
    [  253.138383] R13: 0000000000000001 R14: ffff943f706c9958 R15: 0000000000000001
    [  253.138384] FS:  00007f3cd29ba540(0000) GS:ffff943f8ec00000(0000) knlGS:0000000000000000
    [  253.138385] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  253.138385] CR2: 000055f7de0ca830 CR3: 00000003dd208000 CR4: 00000000003406f0
    [  253.138386] Call Trace:
    [  253.138392]  media_create_pad_links+0x104/0x1b0 [mc]
    [  253.138397]  dvb_create_media_graph+0x350/0x5f0 [dvb_core]
    [  253.138402]  em28xx_dvb_init+0x5ea/0x2600 [em28xx_dvb]
    [  253.138408]  em28xx_register_extension+0x63/0xc0 [em28xx]
    [  253.138410]  ? 0xffffffffc039c000
    [  253.138412]  em28xx_dvb_register+0x15/0x1000 [em28xx_dvb]
    [  253.138416]  do_one_initcall+0x71/0x250
    [  253.138418]  ? do_init_module+0x27/0x22e
    [  253.138421]  ? _cond_resched+0x1a/0x50
    [  253.138423]  ? kmem_cache_alloc_trace+0x1ec/0x270
    [  253.138425]  ? __vunmap+0x1e3/0x240
    [  253.138427]  do_init_module+0x5f/0x22e
    [  253.138430]  load_module+0x2525/0x2d40
    [  253.138436]  __do_sys_finit_module+0xe5/0x120
    [  253.138438]  ? __do_sys_finit_module+0xe5/0x120
    [  253.138442]  __x64_sys_finit_module+0x1a/0x20
    [  253.138443]  do_syscall_64+0x57/0x1b0
    [  253.138445]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [  253.138446] RIP: 0033:0x7f3cd24dc839
    [  253.138448] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
    [  253.138449] RSP: 002b:00007ffe4fc514d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [  253.138450] RAX: ffffffffffffffda RBX: 000055a9237f63f0 RCX: 00007f3cd24dc839
    [  253.138451] RDX: 0000000000000000 RSI: 000055a922c3ad2e RDI: 0000000000000000
    [  253.138451] RBP: 000055a922c3ad2e R08: 0000000000000000 R09: 0000000000000000
    [  253.138452] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
    [  253.138453] R13: 000055a9237f5550 R14: 0000000000040000 R15: 000055a9237f63f0
    [  253.138456] ---[ end trace a60f19c54aa96ec4 ]---
    
    [  234.915628] ------------[ cut here ]------------
    [  234.915640] WARNING: CPU: 0 PID: 1502 at drivers/media/mc/mc-entity.c:669 media_create_pad_link+0x1e0/0x200 [mc]
    [  234.915641] Modules linked in: si2157 lgdt3306a cx231xx_dvb(+) dvb_core cx231xx_alsa cx25840 cx231xx tveeprom cx2341x i2c_mux videobuf2_vmalloc videobuf2_memops videobuf2_v4l2 videobuf2_common videodev mc ir_rc5_decoder rc_hauppauge mceusb rc_core edac_mce_amd kvm nls_iso8859
    _1 crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper efi_pstore wmi_bmof k10temp asix usbnet mii nouveau snd_hda_codec_realtek snd_hda_codec_generic input_leds ledtrig_audio snd_hda_codec_hdmi mxm_wmi snd_hda_intel video snd_intel_dspcf
    g ttm snd_hda_codec drm_kms_helper snd_hda_core drm snd_hwdep snd_seq_midi snd_seq_midi_event i2c_algo_bit snd_pcm snd_rawmidi fb_sys_fops snd_seq syscopyarea sysfillrect snd_seq_device sysimgblt snd_timer snd soundcore ccp mac_hid sch_fq_codel parport_pc ppdev lp parport ip_tab
    les x_tables autofs4 vfio_pci irqbypass vfio_virqfd vfio_iommu_type1 vfio hid_generic usbhid hid i2c_piix4 ahci libahci wmi gpio_amdpt gpio_generic
    [  234.915700] CPU: 0 PID: 1502 Comm: modprobe Not tainted 5.7.0-rc2+ #181
    [  234.915702] Hardware name: MSI MS-7A39/B350M GAMING PRO (MS-7A39), BIOS 2.G0 04/27/2018
    [  234.915709] RIP: 0010:media_create_pad_link+0x1e0/0x200 [mc]
    [  234.915712] Code: 26 fd ff ff 44 8b 4d d0 eb d9 0f 0b 41 b9 ea ff ff ff 44 89 c8 c3 0f 0b 41 b9 ea ff ff ff eb f2 0f 0b 41 b9 ea ff ff ff eb e8 <0f> 0b 41 b9 ea ff ff ff eb af 0f 0b 41 b9 ea ff ff ff eb a5 66 90
    [  234.915714] RSP: 0018:ffffb9ecc1b6fa50 EFLAGS: 00010246
    [  234.915717] RAX: ffff943f8c94a9d8 RBX: 0000000000000000 RCX: 0000000000000000
    [  234.915719] RDX: ffff943f613e0900 RSI: 0000000000000000 RDI: ffff943f8c94a958
    [  234.915721] RBP: ffffb9ecc1b6fa88 R08: 0000000000000001 R09: ffff943f613e0900
    [  234.915723] R10: ffff943f613e0900 R11: ffff943f6b590c00 R12: 0000000000000000
    [  234.915724] R13: 0000000000000001 R14: ffff943f8c94a958 R15: 0000000000000001
    [  234.915727] FS:  00007f4ca3646540(0000) GS:ffff943f8ec00000(0000) knlGS:0000000000000000
    [  234.915729] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [  234.915731] CR2: 00007fff7a53ba18 CR3: 00000003da614000 CR4: 00000000003406f0
    [  234.915733] Call Trace:
    [  234.915745]  media_create_pad_links+0x104/0x1b0 [mc]
    [  234.915756]  dvb_create_media_graph+0x350/0x5f0 [dvb_core]
    [  234.915766]  dvb_init.part.4+0x691/0x1360 [cx231xx_dvb]
    [  234.915780]  dvb_init+0x1a/0x20 [cx231xx_dvb]
    [  234.915787]  cx231xx_register_extension+0x71/0xa0 [cx231xx]
    [  234.915791]  ? 0xffffffffc042f000
    [  234.915796]  cx231xx_dvb_register+0x15/0x1000 [cx231xx_dvb]
    [  234.915802]  do_one_initcall+0x71/0x250
    [  234.915807]  ? do_init_module+0x27/0x22e
    [  234.915811]  ? _cond_resched+0x1a/0x50
    [  234.915816]  ? kmem_cache_alloc_trace+0x1ec/0x270
    [  234.915820]  ? __vunmap+0x1e3/0x240
    [  234.915826]  do_init_module+0x5f/0x22e
    [  234.915831]  load_module+0x2525/0x2d40
    [  234.915848]  __do_sys_finit_module+0xe5/0x120
    [  234.915850]  ? __do_sys_finit_module+0xe5/0x120
    [  234.915862]  __x64_sys_finit_module+0x1a/0x20
    [  234.915865]  do_syscall_64+0x57/0x1b0
    [  234.915870]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
    [  234.915872] RIP: 0033:0x7f4ca3168839
    [  234.915876] Code: 00 f3 c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 1f f6 2c 00 f7 d8 64 89 01 48
    [  234.915878] RSP: 002b:00007ffcea3db3b8 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
    [  234.915881] RAX: ffffffffffffffda RBX: 000055af22c29340 RCX: 00007f4ca3168839
    [  234.915882] RDX: 0000000000000000 RSI: 000055af22c38390 RDI: 0000000000000001
    [  234.915884] RBP: 000055af22c38390 R08: 0000000000000000 R09: 0000000000000000
    [  234.915885] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
    [  234.915887] R13: 000055af22c29060 R14: 0000000000040000 R15: 0000000000000000
    [  234.915896] ---[ end trace a60f19c54aa96ec3 ]---
    
    Signed-off-by: Brad Love <brad@nextdimension.cc>
    Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Signed-off-by: Sean Young <sean@mess.org>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    b-rad-NDi authored and mchehab committed May 25, 2020
    9f984ca
  2. media: dvb-usb: Add Cinergy S2 PCIe Dual Port support

    Terratec Cinergy S2 PCIe Dual is a PCIe device with two tuners that
    actually contains two USB devices. The devices are visible in the
    lsusb printout.
    
    Bus 004 Device 002: ID 153b:1182 TerraTec Electronic GmbH Cinergy S2 PCIe Dual Port 2
    Bus 003 Device 002: ID 153b:1181 TerraTec Electronic GmbH Cinergy S2 PCIe Dual Port 1
    
    The devices use the Montage M88DS3000/M88TS2022 demod/tuner.
    
    Signed-off-by: Dirk Nehring <dnehring@gmx.net>
    Signed-off-by: Sean Young <sean@mess.org>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    dnehring7 authored and mchehab committed May 25, 2020
    528b1a1
  3. media: dt-bindings: ov8856: Document YAML bindings

    This patch adds device tree binding documentation in YAML schema for the
    OV8856 CMOS image sensor.
    
    Signed-off-by: Dongchun Zhu <dongchun.zhu@mediatek.com>
    Signed-off-by: Robert Foss <robert.foss@linaro.org>
    Reviewed-by: Maxime Ripard <mripard@kernel.org>
    Reviewed-by: Rob Herring <robh@kernel.org>
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    Dongchun Zhu authored and mchehab committed May 25, 2020
    932300e
  4. media: ov8856: Add devicetree support

    Add a match table and enable ov8856_probe() to support
    both ACPI and DT modes.

    ACPI and DT modes are primarily distinguished by checking
    for ACPI mode and by whether the relevant resources are
    NULL.
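
    A minimal sketch of the DT side of such a match table (the compatible
    string follows the ov8856 bindings; the rest of the driver glue is
    omitted):

        static const struct of_device_id ov8856_of_match[] = {
                { .compatible = "ovti,ov8856" },
                { /* sentinel */ }
        };
        MODULE_DEVICE_TABLE(of, ov8856_of_match);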
    
    Signed-off-by: Robert Foss <robert.foss@linaro.org>
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    robertfoss authored and mchehab committed May 25, 2020
    0c2c7a1
  5. media: ov8856: Implement sensor module revision identification

    Query the sensor for its module revision, and compare it
    to known revisions.
    
    Currently, 2A and 1B revision identification is supported.
    
    [Sakari Ailus: Wrap a line over 80, alignment, use %u for printing u32]
    
    Signed-off-by: Robert Foss <robert.foss@linaro.org>
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    robertfoss authored and mchehab committed May 25, 2020
    96b5b11
  6. media: i2c: Add ov2740 image sensor driver

    The OmniVision ov2740 is a 2-megapixel RAW RGB image sensor which can
    deliver 1920x1080@60fps frames. This driver adds support for
    vertical blanking, exposure, test pattern, and digital and analog gain
    control for the sensor.
    
    Signed-off-by: Bingbu Cao <bingbu.cao@intel.com>
    Signed-off-by: Shawn Tu <shawnx.tu@intel.com>
    Signed-off-by: Qiu, Tianshu <tian.shu.qiu@intel.com>
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    bingbucao authored and mchehab committed May 25, 2020
    866edc8
  7. media: i2c: imx219: Drop <linux/clk-provider.h> and <linux/clkdev.h>

    The IMX219 camera driver is not a clock provider, but merely a clock
    consumer, and thus does not need to include <linux/clk-provider.h> and
    <linux/clkdev.h>.
    
    Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    geertu authored and mchehab committed May 25, 2020
    cd25993
  8. media: s5k5baf: Replace zero-length array with flexible-array

    The current codebase makes use of the zero-length array language
    extension to the C90 standard, but the preferred mechanism to declare
    variable-length types such as these ones is a flexible array member[1][2],
    introduced in C99:
    
    struct foo {
            int stuff;
            struct boo array[];
    };
    
    By making use of the mechanism above, we will get a compiler warning
    in case the flexible array does not occur last in the structure, which
    will help us prevent some kind of undefined behavior bugs from being
    inadvertently introduced[3] to the codebase from now on.
    
    Also, notice that dynamic memory allocations won't be affected by
    this change:
    
    "Flexible array members have incomplete type, and so the sizeof operator
    may not be applied. As a quirk of the original implementation of
    zero-length arrays, sizeof evaluates to zero."[1]
    
    sizeof(flexible-array-member) triggers a warning because flexible array
    members have incomplete type[1]. There are some instances of code in
    which the sizeof operator is being incorrectly/erroneously applied to
    zero-length arrays and the result is zero. Such instances may be hiding
    some bugs. So, this work (flexible-array member conversions) will also
    help to get completely rid of those sorts of issues.
    
    This issue was found with the help of Coccinelle.
    
    [1] https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
    [2] KSPP#21
    [3] commit 7649773 ("cxgb3/l2t: Fix undefined behaviour")
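
    As a minimal sketch of the conversion pattern (the struct below is
    illustrative, not the exact s5k5baf layout):

        /* Before: zero-length array, a GNU C extension. */
        struct example {
                u16 count;
                u16 data[0];
        };

        /* After: C99 flexible array member. */
        struct example {
                u16 count;
                u16 data[];
        };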
    
    Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    GustavoARSilva authored and mchehab committed May 25, 2020
    142d064
  9. media: Documentation: media: Refer to mbus format documentation from CSI-2 docs

    The media bus formats to be used on serial busses are documented, but there
    was no reference from the CSI-2 documentation. Add that now.
    
    Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
    Acked-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
    Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
    Sakari Ailus authored and mchehab committed May 25, 2020
    938b29d

Commits on Jun 4, 2020

  1. mm: thp: make the THP mapcount atomic against __split_huge_pmd_locked()

    Write-protect anon page faults require an accurate mapcount to decide
    whether to break the COW or not. This is implemented in the THP path with
    reuse_swap_page() ->
    page_trans_huge_map_swapcount()/page_trans_huge_mapcount().

    If the COW triggers while the other processes sharing the page are
    under a huge pmd split, to do an accurate reading, we must ensure the
    mapcount isn't computed while it's being transferred from the head
    page to the tail pages.

    reuse_swap_page() already runs serialized by the page lock, so it's
    enough to also take the page lock around __split_huge_pmd_locked(), in
    order to add the missing serialization.
    
    Note: the commit in "Fixes" is just to facilitate the backporting,
    because the code before such commit didn't try to do an accurate THP
    mapcount calculation and it instead used the page_count() to decide if
    to COW or not. Both the page_count and the pin_count are THP-wide
    refcounts, so they're inaccurate if used in
    reuse_swap_page(). Reverting such commit (besides the unrelated fix to
    the local anon_vma assignment) would have also opened the window for
    memory corruption side effects to certain workloads as documented in
    such commit header.
    
    Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
    Suggested-by: Jann Horn <jannh@google.com>
    Reported-by: Jann Horn <jannh@google.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Fixes: 6d0a07e ("mm: thp: calculate the mapcount correctly for THP pages during WP faults")
    Cc: stable@vger.kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    aagit authored and torvalds committed Jun 4, 2020
    c444eb5
  2. mm/slub: fix a memory leak in sysfs_slab_add()

    syzkaller reports a memory leak when kobject_init_and_add() returns an
    error in the function sysfs_slab_add() [1].

    When this happens, kobject_put() is not called for the
    corresponding kobject, which potentially leads to a memory leak.

    This patch fixes the issue by calling kobject_put() even if
    kobject_init_and_add() fails.
    
    [1]
      BUG: memory leak
      unreferenced object 0xffff8880a6d4be88 (size 8):
      comm "syz-executor.3", pid 946, jiffies 4295772514 (age 18.396s)
      hex dump (first 8 bytes):
        70 69 64 5f 33 00 ff ff                          pid_3...
      backtrace:
         kstrdup+0x35/0x70 mm/util.c:60
         kstrdup_const+0x3d/0x50 mm/util.c:82
         kvasprintf_const+0x112/0x170 lib/kasprintf.c:48
         kobject_set_name_vargs+0x55/0x130 lib/kobject.c:289
         kobject_add_varg lib/kobject.c:384 [inline]
         kobject_init_and_add+0xd8/0x170 lib/kobject.c:473
         sysfs_slab_add+0x1d8/0x290 mm/slub.c:5811
         __kmem_cache_create+0x50a/0x570 mm/slub.c:4384
         create_cache+0x113/0x1e0 mm/slab_common.c:407
         kmem_cache_create_usercopy+0x1a1/0x260 mm/slab_common.c:505
         kmem_cache_create+0xd/0x10 mm/slab_common.c:564
         create_pid_cachep kernel/pid_namespace.c:54 [inline]
         create_pid_namespace kernel/pid_namespace.c:96 [inline]
         copy_pid_ns+0x77c/0x8f0 kernel/pid_namespace.c:148
         create_new_namespaces+0x26b/0xa30 kernel/nsproxy.c:95
         unshare_nsproxy_namespaces+0xa7/0x1e0 kernel/nsproxy.c:229
         ksys_unshare+0x3d2/0x770 kernel/fork.c:2969
         __do_sys_unshare kernel/fork.c:3037 [inline]
         __se_sys_unshare kernel/fork.c:3035 [inline]
         __x64_sys_unshare+0x2d/0x40 kernel/fork.c:3035
         do_syscall_64+0xa1/0x530 arch/x86/entry/common.c:295
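
    A minimal sketch of the required error handling (simplified from
    sysfs_slab_add(); kobject_init_and_add() documents that kobject_put()
    must be called even when it fails):

        err = kobject_init_and_add(&s->kobj, &slab_ktype, NULL, "%s", name);
        if (err) {
                /* The kobject was initialized, so it must be put to
                 * release its memory, even though adding it failed. */
                kobject_put(&s->kobj);
                return err;
        }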
    
    Fixes: 80da026 ("mm/slub: fix slab double-free in case of duplicate sysfs filename")
    Reported-by: Hulk Robot <hulkci@huawei.com>
    Signed-off-by: Wang Hai <wanghai38@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Christoph Lameter <cl@linux.com>
    Cc: Pekka Enberg <penberg@kernel.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Link: http://lkml.kernel.org/r/20200602115033.1054-1-wanghai38@huawei.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Wang Hai authored and torvalds committed Jun 4, 2020
    dde3c6b
  3. mm/memcg: optimize memory.numa_stat like memory.stat

    Currently reading memory.numa_stat traverses the underlying memcg tree
    multiple times to accumulate the stats to present the hierarchical view of
    the memcg tree.  However the kernel already maintains the hierarchical
    view of the stats and uses it in memory.stat.  Just use the same mechanism
    in memory.numa_stat as well.

    I ran a simple benchmark which reads root_mem_cgroup's memory.numa_stat
    file in the presence of 10000 memcgs.  The results are:
    
    Without the patch:
    $ time cat /dev/cgroup/memory/memory.numa_stat > /dev/null
    
    real    0m0.700s
    user    0m0.001s
    sys     0m0.697s
    
    With the patch:
    $ time cat /dev/cgroup/memory/memory.numa_stat > /dev/null
    
    real    0m0.001s
    user    0m0.001s
    sys     0m0.000s
    
    [akpm@linux-foundation.org: avoid forcing out-of-line code generation]
    Signed-off-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Link: http://lkml.kernel.org/r/20200304022058.248270-1-shakeelb@google.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    shakeelb authored and torvalds committed Jun 4, 2020
    dd8657b
  4. mm/gup: move __get_user_pages_fast() down a few lines in gup.c

    Patch series "mm/gup, drm/i915: refactor gup_fast, convert to pin_user_pages()", v2.
    
    In order to convert the drm/i915 driver from get_user_pages() to
    pin_user_pages(), a FOLL_PIN equivalent of __get_user_pages_fast() was
    required.  That led to refactoring __get_user_pages_fast(), with the
    following goals:
    
    1) As above: provide a pin_user_pages*() routine for drm/i915 to call,
       in place of __get_user_pages_fast(),
    
    2) Get rid of the gup.c duplicate code for walking page tables with
       interrupts disabled. This duplicate code is a minor maintenance
       problem anyway.
    
    3) Make it easy for an upcoming patch from Souptick, which aims to
       convert __get_user_pages_fast() to use a gup_flags argument, instead
       of a bool writeable arg.  Also, if this series looks good, we can
       ask Souptick to change the name as well, to whatever the consensus
       is. My initial recommendation is: get_user_pages_fast_only(), to
       match the new pin_user_pages_only().
    
    This patch (of 4):
    
    This is in order to avoid a forward declaration of
    internal_get_user_pages_fast(), in the next patch.
    
    This is code movement only--all generated code should be identical.
    
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Souptick Joarder <jrdr.linux@gmail.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: http://lkml.kernel.org/r/20200522051931.54191-1-jhubbard@nvidia.com
    Link: http://lkml.kernel.org/r/20200519002124.2025955-1-jhubbard@nvidia.com
    Link: http://lkml.kernel.org/r/20200519002124.2025955-2-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    johnhubbard authored and torvalds committed Jun 4, 2020
    9e1f058
  5. mm/gup: refactor and de-duplicate gup_fast() code

    There were two nearly identical sets of code for gup_fast() style of
    walking the page tables with interrupts disabled.  This has led to the
    usual maintenance problems that arise from having duplicated code.
    
    There is already a core internal routine in gup.c for gup_fast(), so just
    enhance it very slightly: allow skipping the fall-back to "slow" (regular)
    get_user_pages(), via the new FOLL_FAST_ONLY flag.  Then, just call
    internal_get_user_pages_fast() from __get_user_pages_fast(), and adjust
    the API to match pre-existing API behavior.
    
    There is a change in behavior from this refactoring: the nested form of
    interrupt disabling is used in all gup_fast() variants now.  That's
    because there is only one place that interrupt disabling for page walking
    is done, and so the safer form is required.  This should, if anything,
    eliminate possible (rare) bugs, because the non-nested form of enabling
    interrupts was fragile at best.
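
    A minimal sketch of the resulting call pattern (simplified from the
    gup.c changes this describes):

        int __get_user_pages_fast(unsigned long start, int nr_pages, int write,
                                  struct page **pages)
        {
                unsigned int gup_flags = FOLL_GET | FOLL_FAST_ONLY;

                if (write)
                        gup_flags |= FOLL_WRITE;

                /* FOLL_FAST_ONLY: never fall back to the "slow" GUP path. */
                return internal_get_user_pages_fast(start, nr_pages, gup_flags,
                                                    pages);
        }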
    
    [jhubbard@nvidia.com: fixup]
      Link: http://lkml.kernel.org/r/20200521233841.1279742-1-jhubbard@nvidia.com
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Souptick Joarder <jrdr.linux@gmail.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: http://lkml.kernel.org/r/20200519002124.2025955-3-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    johnhubbard authored and torvalds committed Jun 4, 2020
    376a34e
  6. mm/gup: introduce pin_user_pages_fast_only()

    This is the FOLL_PIN equivalent of __get_user_pages_fast(), except with a
    more descriptive name, and gup_flags instead of a boolean "write" in the
    argument list.
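
    Per the description above, the new routine's signature is roughly
    (sketched; see mm/gup.c for the definitive prototype):

        int pin_user_pages_fast_only(unsigned long start, int nr_pages,
                                     unsigned int gup_flags,
                                     struct page **pages);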
    
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: Souptick Joarder <jrdr.linux@gmail.com>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Link: http://lkml.kernel.org/r/20200519002124.2025955-4-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    johnhubbard authored and torvalds committed Jun 4, 2020
    104acc3
  7. drm/i915: convert get_user_pages() --> pin_user_pages()

    This code was using get_user_pages*(), in a "Case 2" scenario (DMA/RDMA),
    using the categorization from [1].  That means that it's time to convert
    the get_user_pages*() + put_page() calls to pin_user_pages*() +
    unpin_user_pages() calls.
    
    There is some helpful background in [2]: basically, this is a small part
    of fixing a long-standing disconnect between pinning pages, and file
    systems' use of those pages.
    
    [1] Documentation/core-api/pin_user_pages.rst
    
    [2] "Explicit pinning of user-space pages":
        https://lwn.net/Articles/807108/
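
    A minimal before/after sketch of the conversion pattern (the i915 call
    sites carry extra flags and error handling not shown here):

        /* Before: get_user_pages() paired with put_page(). */
        ret = get_user_pages(addr, num_pages, flags, pages, NULL);
        /* ... use the pages for DMA ... */
        for (i = 0; i < num_pages; i++)
                put_page(pages[i]);

        /* After: pin_user_pages() paired with unpin_user_pages(). */
        ret = pin_user_pages(addr, num_pages, flags, pages, NULL);
        /* ... use the pages for DMA ... */
        unpin_user_pages(pages, num_pages);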
    
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
    Cc: Souptick Joarder <jrdr.linux@gmail.com>
    Cc: Matthew Wilcox <willy@infradead.org>
    Cc: Jani Nikula <jani.nikula@linux.intel.com>
    Cc: "Joonas Lahtinen" <joonas.lahtinen@linux.intel.com>
    Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
    Cc: David Airlie <airlied@linux.ie>
    Cc: Daniel Vetter <daniel@ffwll.ch>
    Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
    Cc: Matthew Auld <matthew.auld@intel.com>
    Link: http://lkml.kernel.org/r/20200519002124.2025955-5-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    johnhubbard authored and torvalds committed Jun 4, 2020
    2170ecf
  8. mm/gup: might_lock_read(mmap_sem) in get_user_pages_fast()

    Instead of scattering these assertions across the drivers, do this
    assertion inside the core of get_user_pages_fast*() functions.  That also
    includes pin_user_pages_fast*() routines.
    
    Add a might_lock_read(mmap_sem) call to internal_get_user_pages_fast().
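
    Roughly, the assertion placement looks like this (a sketch of the
    internal_get_user_pages_fast() change):

        /*
         * With FOLL_FAST_ONLY, mmap_sem is never taken; otherwise the
         * slow-path fallback may acquire it, so tell lockdep about it.
         */
        if (!(gup_flags & FOLL_FAST_ONLY))
                might_lock_read(&current->mm->mmap_sem);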
    
    Suggested-by: Matthew Wilcox <willy@infradead.org>
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Matthew Wilcox <willy@infradead.org>
    Cc: Michel Lespinasse <walken@google.com>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Link: http://lkml.kernel.org/r/20200522010443.1290485-1-jhubbard@nvidia.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    johnhubbard authored and torvalds committed Jun 4, 2020
    f81cd17
  9. kasan: stop tests being eliminated as dead code with FORTIFY_SOURCE

    Patch series "Fix some incompatibilites between KASAN and FORTIFY_SOURCE", v4.
    
    3 KASAN self-tests fail on a kernel with both KASAN and FORTIFY_SOURCE:
    memchr, memcmp and strlen.
    
    When FORTIFY_SOURCE is on, a number of functions are replaced with
    fortified versions, which attempt to check the sizes of the operands.
    However, these functions often directly invoke __builtin_foo() once they
    have performed the fortify check.  The compiler can detect that the
    results of these functions are not used, and knows that they have no other
    side effects, and so can eliminate them as dead code.
    
    Why are only memchr, memcmp and strlen affected?
    ================================================
    
    Of string and string-like functions, kasan_test tests:
    
     * strchr  ->  not affected, no fortified version
     * strrchr ->  likewise
     * strcmp  ->  likewise
     * strncmp ->  likewise
    
     * strnlen ->  not affected, the fortify source implementation calls the
                   underlying strnlen implementation which is instrumented, not
                   a builtin
    
     * strlen  ->  affected, the fortify source implementation calls a __builtin
                   version which the compiler can determine is dead.
    
     * memchr  ->  likewise
     * memcmp  ->  likewise
    
     * memset ->   not affected, the compiler knows that memset writes to its
    	       first argument and therefore is not dead.
    
    Why does this not affect the functions normally?
    ================================================
    
    In string.h, these functions are not marked as __pure, so the compiler
    cannot know that they do not have side effects.  If relevant functions are
    marked as __pure in string.h, we see the following warnings and the
    functions are elided:
    
    lib/test_kasan.c: In function `kasan_memchr':
    lib/test_kasan.c:606:2: warning: statement with no effect [-Wunused-value]
      memchr(ptr, '1', size + 1);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~
    lib/test_kasan.c: In function `kasan_memcmp':
    lib/test_kasan.c:622:2: warning: statement with no effect [-Wunused-value]
      memcmp(ptr, arr, size+1);
      ^~~~~~~~~~~~~~~~~~~~~~~~
    lib/test_kasan.c: In function `kasan_strings':
    lib/test_kasan.c:645:2: warning: statement with no effect [-Wunused-value]
      strchr(ptr, '1');
      ^~~~~~~~~~~~~~~~
    ...
    
    This annotation would make sense to add and could be added at any point,
    so the behaviour of test_kasan.c should change.
    
    The fix
    =======
    
    Make all the functions that are pure write their results to a global,
    which makes them live.  The strlen and memchr tests now pass.
    
    The memcmp test still fails to trigger, which is addressed in the next
    patch.
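
    A minimal sketch of the fix (global sink variables; the names here
    follow the pattern described but are illustrative):

        /* A global gives the call a visible side effect, so the
         * compiler cannot eliminate it as dead code. */
        void *kasan_ptr_result;

        static noinline void __init kasan_memchr(void)
        {
                char *ptr;
                size_t size = 24;

                ptr = kmalloc(size, GFP_KERNEL | __GFP_ZERO);
                if (!ptr)
                        return;

                /* Assigning the result keeps the call live. */
                kasan_ptr_result = memchr(ptr, '1', size + 1);
                kfree(ptr);
        }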
    
    [dja@axtens.net: drop patch 3]
      Link: http://lkml.kernel.org/r/20200424145521.8203-2-dja@axtens.net
    Fixes: 0c96350 ("lib/test_kasan.c: add tests for several string/memory API functions")
    Signed-off-by: Daniel Axtens <dja@axtens.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: David Gow <davidgow@google.com>
    Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
    Cc: Daniel Micay <danielmicay@gmail.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Alexander Potapenko <glider@google.com>
    Link: http://lkml.kernel.org/r/20200423154503.5103-1-dja@axtens.net
    Link: http://lkml.kernel.org/r/20200423154503.5103-2-dja@axtens.net
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    daxtens authored and torvalds committed Jun 4, 2020
    adb72ae
  10. string.h: fix incompatibility between FORTIFY_SOURCE and KASAN

    The memcmp KASAN self-test fails on a kernel with both KASAN and
    FORTIFY_SOURCE.
    
    When FORTIFY_SOURCE is on, a number of functions are replaced with
    fortified versions, which attempt to check the sizes of the operands.
    However, these functions often directly invoke __builtin_foo() once they
    have performed the fortify check.  Using __builtins may bypass KASAN
    checks if the compiler decides to inline its own implementation as a
    sequence of instructions, rather than emit a function call that goes out
    to a KASAN-instrumented implementation.
    
    Why is only memcmp affected?
    ============================
    
    Of the string and string-like functions that kasan_test tests, only memcmp
    is replaced by an inline sequence of instructions in my testing on x86
    with gcc version 9.2.1 20191008 (Ubuntu 9.2.1-9ubuntu2).
    
    I believe this is due to compiler heuristics.  For example, if I annotate
    kmalloc calls with the alloc_size annotation (and disable some fortify
    compile-time checking!), the compiler will replace every memset except the
    one in kmalloc_uaf_memset with inline instructions.  (I have some WIP
    patches to add this annotation.)
    
    Does this affect other functions in string.h?
    =============================================
    
    Yes. Anything that uses __builtin_* rather than __real_* could be
    affected. This looks like:
    
     - strncpy
     - strcat
     - strlen
     - strlcpy maybe, under some circumstances?
     - strncat under some circumstances
     - memset
     - memcpy
     - memmove
     - memcmp (as noted)
     - memchr
     - strcpy
    
    Whether a function call is emitted always depends on the compiler.  Most
    bugs should get caught by FORTIFY_SOURCE, but the missed memcmp test shows
    that this is not always the case.
    
    Isn't FORTIFY_SOURCE disabled with KASAN?
    =========================================
    
    The string headers on all arches supporting KASAN disable fortify with
    kasan, but only when address sanitisation is _also_ disabled.  For example
    from x86:
    
     #if defined(CONFIG_KASAN) && !defined(__SANITIZE_ADDRESS__)
     /*
      * For files that are not instrumented (e.g. mm/slub.c) we
      * should use not instrumented version of mem* functions.
      */
     #define memcpy(dst, src, len) __memcpy(dst, src, len)
     #define memmove(dst, src, len) __memmove(dst, src, len)
     #define memset(s, c, n) __memset(s, c, n)
    
     #ifndef __NO_FORTIFY
     #define __NO_FORTIFY /* FORTIFY_SOURCE uses __builtin_memcpy, etc. */
     #endif
    
     #endif
    
    This comes from commit 6974f0c ("include/linux/string.h: add the
    option of fortified string.h functions"), and doesn't work when KASAN is
    enabled and the file is supposed to be sanitised - as with test_kasan.c
    
    I'm pretty sure this is not wrong, but not as expansive as it should be:
    
     * we shouldn't use __builtin_memcpy etc in files where we don't have
       instrumentation - it could devolve into a function call to memcpy,
       which will be instrumented. Rather, we should use __memcpy which
       by convention is not instrumented.
    
     * we also shouldn't be using __builtin_memcpy when we have a KASAN
       instrumented file, because it could be replaced with inline asm
       that will not be instrumented.
    
    What is correct behaviour?
    ==========================
    
    Firstly, there is some overlap between fortification and KASAN: both
    provide some level of _runtime_ checking. Only fortify provides
    compile-time checking.
    
    KASAN and fortify can pick up different things at runtime:
    
     - Some fortify functions, notably the string functions, could easily be
       modified to consider sub-object sizes (e.g. members within a struct),
       and I have some WIP patches to do this. KASAN cannot detect these
       because it cannot insert poison between members of a struct.
    
     - KASAN can detect many over-reads/over-writes when the sizes of both
       operands are unknown, which fortify cannot.
    
    So there are a couple of options:
    
     1) Flip the test: disable fortify in santised files and enable it in
        unsanitised files. This at least stops us missing KASAN checking, but
        we lose the fortify checking.
    
     2) Make the fortify code always call out to real versions. Do this only
        for KASAN, for fear of losing the inlining opportunities we get from
        __builtin_*.
    
    (We can't use kasan_check_{read,write}: because the fortify functions are
    _extern inline_, you can't include _static_ inline functions without a
    compiler warning. kasan_check_{read,write} are static inline so we can't
    use them even when they would otherwise be suitable.)
    
    Take approach 2 and call out to real versions when KASAN is enabled.
    
    Use __underlying_foo to distinguish from __real_foo: __real_foo always
    refers to the kernel's implementation of foo, __underlying_foo could be
    either the kernel implementation or the __builtin_foo implementation.
    
    This is sometimes enough to make the memcmp test succeed with
    FORTIFY_SOURCE enabled. It is at least enough to get the function call
    into the module. One more fix is needed to make it reliable: see the next
    patch.
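
    A minimal sketch of the indirection (simplified from the string.h
    change described above; __RENAME() forces a call to the kernel's own
    symbol):

        #if defined(CONFIG_KASAN)
        /* Force a real, instrumented function call when KASAN is on. */
        extern void *__underlying_memcpy(void *p, const void *q,
                                         __kernel_size_t size) __RENAME(memcpy);
        #else
        /* Otherwise keep the builtin for its inlining opportunities. */
        #define __underlying_memcpy    __builtin_memcpy
        #endif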
    
    Fixes: 6974f0c ("include/linux/string.h: add the option of fortified string.h functions")
    Signed-off-by: Daniel Axtens <dja@axtens.net>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: David Gow <davidgow@google.com>
    Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
    Cc: Daniel Micay <danielmicay@gmail.com>
    Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
    Cc: Alexander Potapenko <glider@google.com>
    Link: http://lkml.kernel.org/r/20200423154503.5103-3-dja@axtens.net
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    daxtens authored and torvalds committed Jun 4, 2020
    47227d2
  11. mm: clarify __GFP_MEMALLOC usage

    It seems that the existing documentation is not explicit enough about the
    expected usage and potential risks.  While it calls out that
    users have to free memory when using this flag, it is not really apparent
    that users have to be careful not to deplete memory reserves and that they
    should implement some sort of throttling wrt. the freeing process.
    
    This is partly based on Neil's explanation [1].
    
    Let's also call out that a preallocated pool allocator should be
    considered.
    
    [1] http://lkml.kernel.org/r/877dz0yxoa.fsf@notabene.neil.brown.name
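
    As an illustration of the preallocated-pool alternative (a generic
    mempool sketch, not text from the documentation change itself;
    MIN_RESERVED and struct my_elem are made-up names):

        #include <linux/mempool.h>

        /* Reserve MIN_RESERVED elements up front instead of dipping into
         * the emergency reserves with __GFP_MEMALLOC. */
        mempool_t *pool = mempool_create_kmalloc_pool(MIN_RESERVED,
                                                      sizeof(struct my_elem));

        void *elem = mempool_alloc(pool, GFP_NOIO);
        /* ... use elem, then return it so the pool never depletes ... */
        mempool_free(elem, pool);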
    
    [akpm@linux-foundation.org: coding style fixes]
    [mhocko@kernel.org: update]
      Link: http://lkml.kernel.org/r/20200406070137.GC19426@dhcp22.suse.cz
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Neil Brown <neilb@suse.de>
    Cc: Paul E. McKenney <paulmck@kernel.org>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Link: http://lkml.kernel.org/r/20200403083543.11552-2-mhocko@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Michal Hocko authored and torvalds committed Jun 4, 2020
    574c1ae
  12. mm: memblock: replace dereferences of memblock_region.nid with API calls

    Patch series "mm: rework free_area_init*() functions".
    
    After the discussion [1] about removal of CONFIG_NODES_SPAN_OTHER_NODES
    and CONFIG_HAVE_MEMBLOCK_NODE_MAP options, I took it a bit further and
    updated the node/zone initialization.
    
    Since all architectures have memblock, it is possible to use only the
    newer version of free_area_init_node() that calculates the zone and node
    boundaries based on memblock node mapping and architectural limits on
    possible zone PFNs.
    
    The architectures that still determine zone and hole sizes themselves can
    be switched to the generic code, and the old code that took those zone and
    hole sizes can simply be removed.
    
    And, since it all started from the removal of
    CONFIG_NODES_SPAN_OTHER_NODES, the memmap_init() is now updated to iterate
    over memblocks and so it does not need to perform an early_pfn_to_nid()
    query for every PFN.
    
    [1] https://lore.kernel.org/lkml/1585420282-25630-1-git-send-email-Hoan@os.amperecomputing.com
    
    This patch (of 21):
    
    There are several places in the code that directly dereference
    memblock_region.nid despite this field being defined only when
    CONFIG_HAVE_MEMBLOCK_NODE_MAP=y.

    Replace these with calls to memblock_get_region_node() to improve code
    robustness and to avoid possible breakage when
    CONFIG_HAVE_MEMBLOCK_NODE_MAP is removed.
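
    A minimal before/after sketch (memblock_get_region_node() is the
    accessor declared in <linux/memblock.h>):

        /* Before: direct dereference; the field only exists with
         * CONFIG_HAVE_MEMBLOCK_NODE_MAP=y. */
        nid = region->nid;

        /* After: the accessor, whose stub returns 0 when node
         * information is not tracked. */
        nid = memblock_get_region_node(region);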
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Mike Rapoport <rppt@kernel.org>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-1-rppt@kernel.org
    Link: http://lkml.kernel.org/r/20200412194859.12663-2-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    d622abf
  13. mm: make early_pfn_to_nid() and related definitions close to each other

    early_pfn_to_nid() and its helper __early_pfn_to_nid() are spread around
    include/linux/mm.h, include/linux/mmzone.h and mm/page_alloc.c.
    
    Drop unused stub for __early_pfn_to_nid() and move its actual generic
    implementation close to its users.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-3-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 6f24fbd
  14. mm: remove CONFIG_HAVE_MEMBLOCK_NODE_MAP option

    CONFIG_HAVE_MEMBLOCK_NODE_MAP is used to differentiate the
    initialization of node and zone structures between the systems that
    have a region-to-node mapping in memblock and those that don't.
    
    Currently all the NUMA architectures enable this option, and for
    non-NUMA systems we can presume that all the memory belongs to node 0,
    so the compile-time configuration option is not required.
    
    The remaining few architectures that use DISCONTIGMEM without NUMA are
    easily updated to use memblock_add_node() instead of memblock_add() and
    thus have proper correspondence of memblock regions to NUMA nodes.
    
    Still, free_area_init_node() must have a backward-compatible version
    because its semantics differ with and without
    CONFIG_HAVE_MEMBLOCK_NODE_MAP.  Once all the architectures use the new
    semantics, the entire compatibility layer can be dropped.
    
    To avoid adding extra runtime memory to store the node id for
    architectures that keep memblock but have only a single node, the node id
    field of memblock_region is guarded by CONFIG_NEED_MULTIPLE_NODES, and
    the corresponding accessors presume that in those cases it is always 0.
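
    As a rough sketch of that arrangement (simplified, not the exact kernel
    code; the accessor name follows the earlier message in this series):
    
    	struct memblock_region {
    		phys_addr_t base;
    		phys_addr_t size;
    		enum memblock_flags flags;
    	#ifdef CONFIG_NEED_MULTIPLE_NODES
    		int nid;
    	#endif
    	};
    
    	static inline int memblock_get_region_nid(const struct memblock_region *r)
    	{
    	#ifdef CONFIG_NEED_MULTIPLE_NODES
    		return r->nid;
    	#else
    		return 0;	/* single node: all memory is on node 0 */
    	#endif
    	}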
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-4-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 3f08a30
  15. mm: free_area_init: use maximal zone PFNs rather than zone sizes

    Currently, architectures that use free_area_init() to initialize the
    memory map and the node and zone structures need to calculate zone and
    hole sizes.  We can use free_area_init_nodes() instead and let it detect
    the zone boundaries, while the architectures only have to supply the
    possible limits for the zones.
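
    A hedged before/after sketch of what this means for an architecture's
    paging_init() (the zone names and limits here are illustrative, and
    count_holes() is a hypothetical arch-specific helper):
    
    	/* Before: the architecture computes zone spans and holes itself. */
    	unsigned long zones_size[MAX_NR_ZONES] = { 0 };
    	unsigned long zholes_size[MAX_NR_ZONES] = { 0 };
    
    	zones_size[ZONE_NORMAL] = max_low_pfn - min_low_pfn;
    	zholes_size[ZONE_NORMAL] = count_holes();
    	free_area_init_node(0, zones_size, min_low_pfn, zholes_size);
    
    	/* After: only the maximal PFN of each zone is supplied; the spans
    	 * and holes are derived from memblock inside the core code. */
    	unsigned long max_zone_pfn[MAX_NR_ZONES] = { 0 };
    
    	max_zone_pfn[ZONE_NORMAL] = max_low_pfn;
    	free_area_init_nodes(max_zone_pfn);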
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-5-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: fa3354e
  16. mm: use free_area_init() instead of free_area_init_nodes()

    free_area_init() has effectively become a wrapper for
    free_area_init_nodes() and there is no point in keeping it.  Still, the
    free_area_init() name is shorter and more general, as it does not imply
    the necessity to initialize multiple nodes.
    
    Rename free_area_init_nodes() to free_area_init(), update the callers,
    and drop the old version of free_area_init().
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-6-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 9691a07
  17. alpha: simplify detection of memory zone boundaries

    free_area_init() only requires the definition of the maximal PFN for
    each of the supported zones rather than the calculation of actual zone
    sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-7-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 3076020
  18. arm: simplify detection of memory zone boundaries

    free_area_init() only requires the definition of the maximal PFN for
    each of the supported zones rather than the calculation of actual zone
    sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-8-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: a32c1c6
  19. arm64: simplify detection of memory zone boundaries for UMA configs

    The free_area_init() function only requires the definition of the
    maximal PFN for each of the supported zones rather than the calculation
    of actual zone sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Acked-by: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-9-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 584cb13
  20. csky: simplify detection of memory zone boundaries

    The free_area_init() function only requires the definition of the
    maximal PFN for each of the supported zones rather than the calculation
    of actual zone sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-10-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 8f4693f
  21. m68k: mm: simplify detection of memory zone boundaries

    free_area_init() only requires the definition of the maximal PFN for
    each of the supported zones rather than the calculation of actual zone
    sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-11-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 5d2ee1a
  22. parisc: simplify detection of memory zone boundaries

    free_area_init() only requires the definition of the maximal PFN for
    each of the supported zones rather than the calculation of actual zone
    sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-12-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 625bf73
  23. sparc32: simplify detection of memory zone boundaries

    free_area_init() only requires the definition of the maximal PFN for
    each of the supported zones rather than the calculation of actual zone
    sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-13-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: bee3b3c
  24. unicore32: simplify detection of memory zone boundaries

    free_area_init() only requires the definition of the maximal PFN for
    each of the supported zones rather than the calculation of actual zone
    sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-14-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 1b02ec0
  25. xtensa: simplify detection of memory zone boundaries

    free_area_init() only requires the definition of the maximal PFN for
    each of the supported zones rather than the calculation of actual zone
    sizes and the sizes of the holes between the zones.
    
    After the removal of CONFIG_HAVE_MEMBLOCK_NODE_MAP, free_area_init()
    is available to all architectures.
    
    Using this function instead of free_area_init_node() simplifies the zone
    detection.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-15-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: da50c57
  26. mm: memmap_init: iterate over memblock regions rather than check each PFN
    
    When called during boot, the memmap_init_zone() function checks whether
    each PFN is valid and actually belongs to the node being initialized,
    using early_pfn_valid() and early_pfn_in_nid().
    
    Each such check may cost up to O(log(n)), where n is the number of
    memory banks, so for large amounts of memory the overall time spent in
    early_pfn*() becomes substantial.
    
    Since the information is present in memblock anyway, we can iterate over
    memblock memory regions in memmap_init() and only call memmap_init_zone()
    for PFN ranges that are known to be valid and in the appropriate node.
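
    A simplified sketch of the resulting iteration (details of the actual
    memmap_init() elided):
    
    	void __init memmap_init(unsigned long size, int nid, unsigned long zone,
    				unsigned long range_start_pfn)
    	{
    		unsigned long start_pfn, end_pfn;
    		unsigned long range_end_pfn = range_start_pfn + size;
    		int i;
    
    		/* Walk only this node's memblock ranges: each range is known
    		 * to be valid memory in the right node, so no per-PFN
    		 * early_pfn_valid()/early_pfn_in_nid() checks are needed. */
    		for_each_mem_pfn_range(i, nid, &start_pfn, &end_pfn, NULL) {
    			start_pfn = clamp(start_pfn, range_start_pfn, range_end_pfn);
    			end_pfn = clamp(end_pfn, range_start_pfn, range_end_pfn);
    			if (end_pfn > start_pfn)
    				memmap_init_zone(end_pfn - start_pfn, nid, zone,
    						 start_pfn, MEMMAP_EARLY, NULL);
    		}
    	}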
    
    [cai@lca.pw: fix a compilation warning from Clang]
      Link: http://lkml.kernel.org/r/CF6E407F-17DC-427C-8203-21979FB882EF@lca.pw
    [bhe@redhat.com: fix the incorrect hole in fast_isolate_freepages()]
      Link: http://lkml.kernel.org/r/8C537EB7-85EE-4DCF-943E-3CC0ED0DF56D@lca.pw
      Link: http://lkml.kernel.org/r/20200521014407.29690-1-bhe@redhat.com
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Cc: Qian Cai <cai@lca.pw>
    Link: http://lkml.kernel.org/r/20200412194859.12663-16-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Baoquan He authored and torvalds committed Jun 4, 2020
    Commit: 73a6e47
  27. mm: remove early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES

    The memmap_init() function was made to iterate over memblock regions,
    and as a result the early_pfn_in_nid() function became obsolete.  Since
    CONFIG_NODES_SPAN_OTHER_NODES is only used to pick a stub or a real
    implementation of early_pfn_in_nid(), it is also no longer needed.
    
    Remove both early_pfn_in_nid() and CONFIG_NODES_SPAN_OTHER_NODES.
    
    Co-developed-by: Hoan Tran <Hoan@os.amperecomputing.com>
    Signed-off-by: Hoan Tran <Hoan@os.amperecomputing.com>
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-17-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: acd3f5c
  28. mm: free_area_init: allow defining max_zone_pfn in descending order

    Some architectures (e.g. ARC) have the ZONE_HIGHMEM zone below
    ZONE_NORMAL.  Allowing free_area_init() to parse the max_zone_pfn array
    even when it is sorted in descending order makes it possible to use
    free_area_init() on such architectures.
    
    Add top -> down traversal of the max_zone_pfn array in free_area_init()
    and use the latter in ARC node/zone initialization.
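
    A sketch of the traversal-direction logic (simplified from
    free_area_init(); arch_has_descending_max_zone_pfns() is the hook this
    patch introduces, returning true on ARC):
    
    	bool descending = arch_has_descending_max_zone_pfns();
    	unsigned long start_pfn = 0, end_pfn;
    	int i, zone;
    
    	for (i = 0; i < MAX_NR_ZONES; i++) {
    		/* Walk max_zone_pfn[] from the top when the array is
    		 * sorted in descending order. */
    		zone = descending ? MAX_NR_ZONES - i - 1 : i;
    		if (zone == ZONE_MOVABLE)
    			continue;
    
    		end_pfn = max(max_zone_pfn[zone], start_pfn);
    		arch_zone_lowest_possible_pfn[zone] = start_pfn;
    		arch_zone_highest_possible_pfn[zone] = end_pfn;
    
    		start_pfn = end_pfn;
    	}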
    
    [rppt@kernel.org: ARC fix]
      Link: http://lkml.kernel.org/r/20200504153901.GM14260@kernel.org
    [rppt@linux.ibm.com: arc: free_area_init(): take into account PAE40 mode]
      Link: http://lkml.kernel.org/r/20200507205900.GH683243@linux.ibm.com
    [akpm@linux-foundation.org: declare arch_has_descending_max_zone_pfns()]
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Cc: Guenter Roeck <linux@roeck-us.net>
    Link: http://lkml.kernel.org/r/20200412194859.12663-18-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 51930df
  29. mm: rename free_area_init_node() to free_area_init_memoryless_node()

    free_area_init_node() is only used by x86 to initialize memory-less
    nodes.  Make its name reflect this and drop all the function parameters
    except the node ID, as they are all zero anyway.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-19-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: bc9331a
  30. mm: clean up free_area_init_node() and its helpers

    free_area_init_node() now always uses memblock info and the zone PFN
    limits, so it does not need the backwards compatibility functions to
    calculate the zone spanned and absent pages.  The removal of the compat_
    versions of zone_{absent,spanned}_pages_in_node(), in turn, makes the
    zone_size and zhole_size parameters unused.
    
    The node_start_pfn is determined by get_pfn_range_for_nid(), so there is
    no need to pass it to free_area_init_node().
    
    As a result, the only required parameter to free_area_init_node() is the
    node ID; all the rest are removed along with the no longer used
    compat_zone_{absent,spanned}_pages_in_node() helpers.
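
    A sketch of the slimmed-down entry point (body elided; the PFN range
    now comes from memblock via get_pfn_range_for_nid()):
    
    	static void __init free_area_init_node(int nid)
    	{
    		pg_data_t *pgdat = NODE_DATA(nid);
    		unsigned long start_pfn = 0;
    		unsigned long end_pfn = 0;
    
    		get_pfn_range_for_nid(nid, &start_pfn, &end_pfn);
    		/* ... initialize pgdat and the node's zones from the
    		 * memblock-derived [start_pfn, end_pfn) range ... */
    	}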
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-20-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 854e884
  31. mm: simplify find_min_pfn_with_active_regions()

    find_min_pfn_with_active_regions() calls find_min_pfn_for_node() with
    the nid parameter set to MAX_NUMNODES.  This makes find_min_pfn_for_node()
    traverse all memblock memory regions, although the first PFN in the
    system can be easily found with memblock_start_of_DRAM().
    
    Use memblock_start_of_DRAM() in find_min_pfn_with_active_regions() and
    drop the now unused find_min_pfn_for_node().
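
    A sketch of the simplified helper; the lowest PFN in the system is just
    the start of DRAM as recorded in memblock:
    
    	unsigned long __init find_min_pfn_with_active_regions(void)
    	{
    		return PHYS_PFN(memblock_start_of_DRAM());
    	}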
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-21-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 8a1b25f
  32. docs/vm: update memory-models documentation

    Update the memory-models documentation to reflect the updates to the
    free_area_init() family of functions.
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Hoan Tran <hoan@os.amperecomputing.com>	[arm64]
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Brian Cain <bcain@codeaurora.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Geert Uytterhoeven <geert@linux-m68k.org>
    Cc: Greentime Hu <green.hu@gmail.com>
    Cc: Greg Ungerer <gerg@linux-m68k.org>
    Cc: Guan Xuetao <gxt@pku.edu.cn>
    Cc: Guo Ren <guoren@kernel.org>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Ley Foon Tan <ley.foon.tan@intel.com>
    Cc: Mark Salter <msalter@redhat.com>
    Cc: Matt Turner <mattst88@gmail.com>
    Cc: Max Filippov <jcmvbkbc@gmail.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Michal Simek <monstr@monstr.eu>
    Cc: Nick Hu <nickhu@andestech.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Richard Weinberger <richard@nod.at>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Stafford Horne <shorne@gmail.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/20200412194859.12663-22-rppt@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit: 237e506
  33. mm/page_alloc.c: bad_[reason|flags] is not necessary when PageHWPoison

    Patch series "mm/page_alloc.c: cleanup on check page", v3.
    
    This patchset does some cleanup related to page checking.
    
    1. Remove an unnecessary bad_reason assignment
    2. Remove the bad_flags argument of bad_page()
    3. Rename functions to follow the naming convention
    4. Extract the common part of the page checks
    
    Thanks for suggestions from David Rientjes and Anshuman Khandual.
    
    This patch (of 5):
    
    Since the function returns directly in the PageHWPoison case,
    bad_[reason|flags] is not used anywhere, so move this check to the
    front.
    
    This is a follow-up cleanup to commit e570f56 ("mm:
    check_new_page_bad() directly returns in __PG_HWPOISON case").
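
    A sketch of the reordering in check_new_page_bad() (the remainder of
    the function is elided):
    
    	static void check_new_page_bad(struct page *page)
    	{
    		if (unlikely(page->flags & __PG_HWPOISON)) {
    			/* Don't complain about hwpoisoned pages */
    			page_mapcount_reset(page);	/* remove PageBuddy */
    			return;
    		}
    
    		/* ... only now derive the bad_reason for the report ... */
    	}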
    
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: David Rientjes <rientjes@google.com>
    Link: http://lkml.kernel.org/r/20200411220357.9636-2-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jun 4, 2020
    Commit: 833d8a4
  34. mm/page_alloc.c: bad_flags is not necessary for bad_page()

    After commit 5b57b8f ("mm/debug.c: always print flags in
    dump_page()"), page->flags is always printed for a bad page.  It is not
    necessary to have bad_flags any more.
    
    Suggested-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Link: http://lkml.kernel.org/r/20200411220357.9636-3-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jun 4, 2020
    Commit: 82a3241
  35. mm/page_alloc.c: rename free_pages_check_bad() to check_free_page_bad()

    free_pages_check_bad() is the counterpart of check_new_page_bad().  Rename
    it to use the same naming convention.
    
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Link: http://lkml.kernel.org/r/20200411220357.9636-4-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jun 4, 2020
    Commit: 0d0c48a
  36. mm/page_alloc.c: rename free_pages_check() to check_free_page()

    free_pages_check() is the counterpart of check_new_page().  Rename it to
    use the same naming convention.
    
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Link: http://lkml.kernel.org/r/20200411220357.9636-5-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jun 4, 2020
    Commit: 534fe5e
  37. mm/page_alloc.c: extract check_[new|free]_page_bad() common part to page_bad_reason()
    
    We share similar code in check_[new|free]_page_bad() to get the page's bad
    reason.
    
    Let's extract it and reduce code duplication.
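
    A sketch of the extracted helper (simplified; both checkers pass their
    own flags mask):
    
    	static const char *page_bad_reason(struct page *page, unsigned long flags)
    	{
    		const char *bad_reason = NULL;
    
    		if (unlikely(atomic_read(&page->_mapcount) != -1))
    			bad_reason = "nonzero mapcount";
    		if (unlikely(page->mapping != NULL))
    			bad_reason = "non-NULL mapping";
    		if (unlikely(page_ref_count(page) != 0))
    			bad_reason = "nonzero _refcount";
    		if (unlikely(page->flags & flags))
    			bad_reason = "page flag(s) set";
    
    		return bad_reason;
    	}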
    
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: David Rientjes <rientjes@google.com>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Link: http://lkml.kernel.org/r/20200411220357.9636-6-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jun 4, 2020
    Commit: 58b7f11
  38. mm,page_alloc,cma: conditionally prefer cma pageblocks for movable allocations
    
    Currently a CMA area is barely used by the page allocator because it is
    used only as a fallback for movable allocations; however, kswapd tries
    hard to make sure that the fallback path isn't used.
    
    This results in a system evicting memory and pushing data into swap
    while lots of CMA memory is still available.  This happens despite the
    fact that alloc_contig_range is perfectly capable of moving any movable
    allocations out of the way of an allocation.
    
    To use the CMA area effectively, let's alter the rules: if the zone has
    more free CMA pages than half of the total free pages in the zone, use
    CMA pageblocks first and fall back to movable blocks in the case of
    failure.
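
    A sketch of the rule as it might look in the allocator's fast path
    (simplified; __rmqueue_cma_fallback() takes pages from CMA pageblocks):
    
    	if (IS_ENABLED(CONFIG_CMA) && migratetype == MIGRATE_MOVABLE &&
    	    zone_page_state(zone, NR_FREE_CMA_PAGES) >
    	    zone_page_state(zone, NR_FREE_PAGES) / 2) {
    		page = __rmqueue_cma_fallback(zone, order);
    		if (page)
    			return page;
    	}
    	/* otherwise take the regular path and keep CMA as a fallback */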
    
    [guro@fb.com: ifdef the cma-specific code]
      Link: http://lkml.kernel.org/r/20200311225832.GA178154@carbon.DHCP.thefacebook.com
    Co-developed-by: Rik van Riel <riel@surriel.com>
    Signed-off-by: Roman Gushchin <guro@fb.com>
    Signed-off-by: Rik van Riel <riel@surriel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Qian Cai <cai@lca.pw>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Anshuman Khandual <anshuman.khandual@arm.com>
    Cc: Joonsoo Kim <js1304@gmail.com>
    Link: http://lkml.kernel.org/r/20200306150102.3e77354b@imladris.surriel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rgushchin authored and torvalds committed Jun 4, 2020
    Commit: 1686766
  39. mm/page_alloc.c: remove unused free_bootmem_with_active_regions

    Since commit 397dc00 ("mips: sgi-ip27: switch from DISCONTIGMEM
    to SPARSEMEM"), the last caller of free_bootmem_with_active_regions()
    is gone, and no user calls it anymore.
    
    Let's remove it.
    
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Link: http://lkml.kernel.org/r/20200402143455.5145-1-bhe@redhat.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Baoquan He authored and torvalds committed Jun 4, 2020
    Commit: 4ca7be2
  40. mm/page_alloc.c: only tune sysctl_lowmem_reserve_ratio value once when changing it
    
    Patch series "improvements about lowmem_reserve and /proc/zoneinfo", v2.
    
    This patch (of 3):
    
    When people write to /proc/sys/vm/lowmem_reserve_ratio to change
    sysctl_lowmem_reserve_ratio[], setup_per_zone_lowmem_reserve() is called
    to recalculate all ->lowmem_reserve[] for each zone of all nodes as below:
    
    static void setup_per_zone_lowmem_reserve(void)
    {
    ...
    	for_each_online_pgdat(pgdat) {
    		for (j = 0; j < MAX_NR_ZONES; j++) {
    			...
    			while (idx) {
    				...
    				if (sysctl_lowmem_reserve_ratio[idx] < 1) {
    					sysctl_lowmem_reserve_ratio[idx] = 0;
    					lower_zone->lowmem_reserve[j] = 0;
    				} else {
    					...
    				}
    			}
    		}
    	}
    }
    
    Meanwhile, here, sysctl_lowmem_reserve_ratio[idx] will be tuned if its
    value is smaller than '1'.  As we know, sysctl_lowmem_reserve_ratio[] is
    set per zone without regard to which node the zone belongs to.  That
    means the tuning will be repeated on all nodes, even though it has
    already been done for the first node.
    
    The tuning is also done even when init_per_zone_wmark_min() calls
    setup_per_zone_lowmem_reserve(), where nobody actually tries to change
    sysctl_lowmem_reserve_ratio[].
    
    So move the tuning into lowmem_reserve_ratio_sysctl_handler() to make
    the code logic more reasonable.
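
    A sketch of the handler after the move (signature elided; the clamp now
    runs once per write instead of once per node/zone pass):
    
    	int lowmem_reserve_ratio_sysctl_handler(...)
    	{
    		int i;
    
    		proc_dointvec_minmax(table, write, buffer, length, ppos);
    
    		/* Tune the ratios once, right where they are changed. */
    		for (i = 0; i < MAX_NR_ZONES; i++)
    			if (sysctl_lowmem_reserve_ratio[i] < 1)
    				sysctl_lowmem_reserve_ratio[i] = 0;
    
    		setup_per_zone_lowmem_reserve();
    		return 0;
    	}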
    
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Baoquan He <bhe@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: David Rientjes <rientjes@google.com>
    Link: http://lkml.kernel.org/r/20200402140113.3696-1-bhe@redhat.com
    Link: http://lkml.kernel.org/r/20200402140113.3696-2-bhe@redhat.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Baoquan He authored and torvalds committed Jun 4, 2020
    Commit: 86aaf25
  41. mm/page_alloc.c: clear out zone->lowmem_reserve[] if the zone is empty

    When a memory allocation request for a specific zone cannot be
    satisfied, the allocator falls back to lower zones.  In this case, a
    lower zone's ->lowmem_reserve[] helps protect its own memory resource:
    the higher the relevant ->lowmem_reserve[] is, the harder it is for
    the upper zone to get memory from this lower zone.
    
    However, this protection mechanism only makes sense for populated
    zones, not empty ones.  Filling ->lowmem_reserve[] for an empty zone
    is unnecessary and may mislead people into thinking it is valid data.
    
    Node 2, zone      DMA
      pages free     0
            min      0
            low      0
            high     0
            spanned  0
            present  0
            managed  0
            protection: (0, 0, 1024, 1024)
    Node 2, zone    DMA32
      pages free     0
            min      0
            low      0
            high     0
            spanned  0
            present  0
            managed  0
            protection: (0, 0, 1024, 1024)
    Node 2, zone   Normal
      per-node stats
          nr_inactive_anon 0
          nr_active_anon 143
          nr_inactive_file 0
          nr_active_file 0
          nr_unevictable 0
          nr_slab_reclaimable 45
          nr_slab_unreclaimable 254
    
    So clear out zone->lowmem_reserve[] if the zone is empty.
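    
    A sketch of the idea inside setup_per_zone_lowmem_reserve()'s inner
    loop (details may differ from the actual patch):
    
    	/* An empty zone needs no protection; keep its reserves at zero. */
    	if (!zone_managed_pages(lower_zone)) {
    		lower_zone->lowmem_reserve[j] = 0;
    		continue;
    	}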
    
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20200402140113.3696-3-bhe@redhat.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Baoquan He authored and torvalds committed Jun 4, 2020
    Commit f636615
  42. mm/vmstat.c: do not show lowmem reserve protection information of empty zone
    
    The lowmem reserve protection of an empty zone conveys no information;
    it only adds one more line to /proc/zoneinfo.
    
    Let's stop showing it for empty zones.
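    
    A sketch of the check in the /proc/zoneinfo printing path (assuming it
    sits in zoneinfo_show_print(); the exact placement may differ):
    
    	/* If the zone is unpopulated, no further information is useful. */
    	if (!populated_zone(zone)) {
    		seq_putc(m, '\n');
    		return;
    	}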
    
    Signed-off-by: Baoquan He <bhe@redhat.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20200402140113.3696-4-bhe@redhat.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Baoquan He authored and torvalds committed Jun 4, 2020
    Commit 26e7dea
  43. mm/page_alloc: use ac->high_zoneidx for classzone_idx

    Patch series "integrate classzone_idx and high_zoneidx", v5.
    
    This patchset is a follow-up to a problem reported and discussed two
    years ago [1, 2].  The problem it solves is related to classzone_idx
    on NUMA systems: a problem occurs when the lowmem reserve protection
    exists for some zones on a node that do not exist on other nodes.
    
    The problem was reported two years ago and, at that time, the solution
    got general agreement [2].  But it was not upstreamed.
    
    [1]: http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop
    [2]: http://lkml.kernel.org/r/1525408246-14768-1-git-send-email-iamjoonsoo.kim@lge.com
    
    This patch (of 2):
    
    Currently, we use classzone_idx to calculate the lowmem reserve
    protection for an allocation request.  This classzone_idx causes a
    problem on NUMA systems when the lowmem reserve protection exists for
    some zones on a node that do not exist on other nodes.
    
    Before further explanation, I should first clarify how to compute the
    classzone_idx and the high_zoneidx.
    
    - ac->high_zoneidx is computed via the arcane gfp_zone(gfp_mask) and
      represents the index of the highest zone the allocation can use
    
    - classzone_idx was supposed to be the index of the highest zone on the
      local node that the allocation can use, that is actually available in
      the system
    
    Consider the following example.  Node 0 has 4 populated zones,
    DMA/DMA32/NORMAL/MOVABLE.  Node 1 has 1 populated zone, NORMAL.  Some
    zones, such as MOVABLE, don't exist on node 1, and this makes the
    following difference.
    
    Assume an allocation request whose gfp_zone(gfp_mask) is the MOVABLE
    zone.  Then, its high_zoneidx is 3.  If this allocation is initiated
    on node 0, its classzone_idx is 3, since the actually available/usable
    zone on the local node (node 0) is MOVABLE.  If this allocation is
    initiated on node 1, its classzone_idx is 2, since the actually
    available/usable zone on the local node (node 1) is NORMAL.
    
    You can see that the classzone_idx of an allocation request differs
    according to its starting node, even when the high_zoneidx is the same.
    
    Now think more about these two allocation requests.  If they are
    processed locally, there is no problem.  However, if the allocation
    initiated on node 1 is processed remotely, in this example at the
    NORMAL zone on node 0 due to memory shortage, a problem occurs.  The
    different classzone_idx leads to a different lowmem reserve and then a
    different min watermark.  See the following example.
    
    root@ubuntu:/sys/devices/system/memory# cat /proc/zoneinfo
    Node 0, zone      DMA
      per-node stats
    ...
      pages free     3965
            min      5
            low      8
            high     11
            spanned  4095
            present  3998
            managed  3977
            protection: (0, 2961, 4928, 5440)
    ...
    Node 0, zone    DMA32
      pages free     757955
            min      1129
            low      1887
            high     2645
            spanned  1044480
            present  782303
            managed  758116
            protection: (0, 0, 1967, 2479)
    ...
    Node 0, zone   Normal
      pages free     459806
            min      750
            low      1253
            high     1756
            spanned  524288
            present  524288
            managed  503620
            protection: (0, 0, 0, 4096)
    ...
    Node 0, zone  Movable
      pages free     130759
            min      195
            low      326
            high     457
            spanned  1966079
            present  131072
            managed  131072
            protection: (0, 0, 0, 0)
    ...
    Node 1, zone      DMA
      pages free     0
            min      0
            low      0
            high     0
            spanned  0
            present  0
            managed  0
            protection: (0, 0, 1006, 1006)
    Node 1, zone    DMA32
      pages free     0
            min      0
            low      0
            high     0
            spanned  0
            present  0
            managed  0
            protection: (0, 0, 1006, 1006)
    Node 1, zone   Normal
      per-node stats
    ...
      pages free     233277
            min      383
            low      640
            high     897
            spanned  262144
            present  262144
            managed  257744
            protection: (0, 0, 0, 0)
    ...
    Node 1, zone  Movable
      pages free     0
            min      0
            low      0
            high     0
            spanned  262144
            present  0
            managed  0
            protection: (0, 0, 0, 0)
    
    - static min watermark for the NORMAL zone on node 0 is 750.
    
    - lowmem reserve for the request with classzone idx 3 at the NORMAL on
      node 0 is 4096.
    
    - lowmem reserve for the request with classzone idx 2 at the NORMAL on
      node 0 is 0.
    
    So, the overall min watermark is:
    allocation initiated on node 0 (classzone_idx 3): 750 + 4096 = 4846
    allocation initiated on node 1 (classzone_idx 2): 750 + 0 = 750
    
    The allocation initiated on node 1 takes precedence over the
    allocation initiated on node 0 because the min watermark of the former
    is lower.  So the allocation initiated on node 1 can succeed on node 0
    when the allocation initiated on node 0 cannot, and this can cause too
    many numa_miss allocations.  Performance can then degrade.
    
    Recently, there was a regression report about this problem with the
    CMA patches, since those patches place CMA memory in ZONE_MOVABLE.  I
    verified that the problem disappears with this fix, which uses
    high_zoneidx for classzone_idx.
    
    http://lkml.kernel.org/r/20180102063528.GG30397@yexl-desktop
    
    Using high_zoneidx for classzone_idx is more consistent than the
    previous approach because the system's memory layout doesn't affect
    it.  With this patch, both classzone_idx values in the above example
    will be 3, so the requests will have the same min watermark.
    
    allocation initiated on node 0: 750 + 4096 = 4846
    allocation initiated on node 1: 750 + 4096 = 4846
    
    One could wonder if there is a side effect: an allocation initiated on
    node 1 would face a higher bar when it is handled locally, since its
    classzone_idx could be higher than before.  This will not happen,
    because a zone without managed pages doesn't contribute to
    lowmem_reserve at all.
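    
    For reference, this is roughly how the two values compose in the
    allocator's suitability check (a simplified sketch of
    __zone_watermark_ok()):
    
    	/* The zone is usable only if free pages stay above the min
    	 * watermark plus the reserve selected by classzone_idx. */
    	if (free_pages <= min_wmark_pages(z) +
    			  z->lowmem_reserve[classzone_idx])
    		return false;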
    
    Reported-by: Ye Xiaolong <xiaolong.ye@intel.com>
    Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Ye Xiaolong <xiaolong.ye@intel.com>
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: David Rientjes <rientjes@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Link: http://lkml.kernel.org/r/1587095923-7515-1-git-send-email-iamjoonsoo.kim@lge.com
    Link: http://lkml.kernel.org/r/1587095923-7515-2-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    JoonsooKim authored and torvalds committed Jun 4, 2020
    Commit 3334a45
  44. mm/page_alloc: integrate classzone_idx and high_zoneidx

    classzone_idx is now just a different name for high_zoneidx.  So
    integrate them, and add a comment to struct alloc_context in order to
    reduce future confusion about the meaning of this variable.
    
    The accessor ac_classzone_idx() is also removed since it isn't needed
    after the integration.
    
    In addition to the integration, this patch renames high_zoneidx to
    highest_zoneidx since that name represents its meaning more precisely.
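    
    The commented field then looks roughly like this (a sketch of struct
    alloc_context after the rename):
    
    	struct alloc_context {
    		struct zonelist *zonelist;
    		nodemask_t *nodemask;
    		struct zoneref *preferred_zoneref;
    		int migratetype;
    
    		/*
    		 * highest_zoneidx represents the highest usable zone
    		 * index of the allocation request.  Memory on a lower
    		 * zone than highest_zoneidx is protected by
    		 * lowmem_reserve[highest_zoneidx].
    		 */
    		enum zone_type highest_zoneidx;
    		bool spread_dirty_pages;
    	};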
    
    Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Acked-by: David Rientjes <rientjes@google.com>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Ye Xiaolong <xiaolong.ye@intel.com>
    Link: http://lkml.kernel.org/r/1587095923-7515-3-git-send-email-iamjoonsoo.kim@lge.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    JoonsooKim authored and torvalds committed Jun 4, 2020
    Commit 97a225e
  45. mm/page_alloc.c: use NODE_MASK_NONE in build_zonelists()

    Slightly simplify the code by initializing user_mask with NODE_MASK_NONE,
    instead of later calling nodes_clear().  This saves a line of code.
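    
    The change amounts to one line in build_zonelists() (a sketch, using
    the variable name from the summary):
    
    	nodemask_t user_mask = NODE_MASK_NONE;	/* was: nodes_clear() later */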
    
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
    Link: http://lkml.kernel.org/r/20200330220840.21228-1-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jun 4, 2020
    Commit d0ddf49
  46. mm: rename gfpflags_to_migratetype to gfp_migratetype for same convention
    
    The pageblock migrate type is encoded in GFP flags, just like
    zone_type and zonelist.
    
    Currently we use gfp_zone() and gfp_zonelist() to extract the related
    information, so it is proper to use the same naming convention for the
    migrate type.
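    
    A simplified sketch of the renamed helper (the real one carries extra
    sanity checks):
    
    	static inline int gfp_migratetype(const gfp_t gfp_flags)
    	{
    		if (unlikely(page_group_by_mobility_disabled))
    			return MIGRATE_UNMOVABLE;
    
    		/* Group based on mobility */
    		return (gfp_flags & GFP_MOVABLE_MASK) >> GFP_MOVABLE_SHIFT;
    	}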
    
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
    Link: http://lkml.kernel.org/r/20200329080823.7735-1-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jun 4, 2020
    Commit 01c0bfe
  47. mm/page_alloc.c: reset numa stats for boot pagesets

    Initially, the per-cpu pagesets of each zone are set to the boot pagesets.
    The real pagesets are allocated later but before that happens, page
    allocations do occur and the numa stats for the boot pagesets get
    incremented since they are common to all zones at that point.
    
    The real pagesets, however, are allocated for the populated zones only.
    Unpopulated zones, like those associated with memory-less nodes, continue
    using the boot pageset and end up skewing the numa stats of the
    corresponding node.
    
    E.g.
    
      $ numactl -H
      available: 2 nodes (0-1)
      node 0 cpus: 0 1 2 3
      node 0 size: 0 MB
      node 0 free: 0 MB
      node 1 cpus: 4 5 6 7
      node 1 size: 8131 MB
      node 1 free: 6980 MB
      node distances:
      node   0   1
        0:  10  40
        1:  40  10
    
      $ numastat
                                 node0           node1
      numa_hit                     108           56495
      numa_miss                      0               0
      numa_foreign                   0               0
      interleave_hit                 0            4537
      local_node                   108           31547
      other_node                     0           24948
    
    Hence, the boot pageset stats need to be cleared after the real pagesets
    are allocated.
    
    After this point, the stats of the boot pagesets do not change as page
    allocations requested for a memory-less node will either fail (if
    __GFP_THISNODE is used) or get fulfilled by a preferred zone of a
    different node based on the fallback zonelist.
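    
    A sketch of the reset, done once the real pagesets exist (assuming it
    lives in setup_per_cpu_pageset(); field names are assumptions and are
    NUMA-only):
    
    	/* Clear the NUMA stats that accumulated in the shared boot
    	 * pagesets during early boot. */
    	for_each_possible_cpu(cpu) {
    		struct per_cpu_pageset *pcp = &per_cpu(boot_pageset, cpu);
    
    		memset(pcp->vm_numa_stat_diff, 0,
    		       sizeof(pcp->vm_numa_stat_diff));
    	}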
    
    [sandipan@linux.ibm.com: v3]
      Link: http://lkml.kernel.org/r/20200511170356.162531-1-sandipan@linux.ibm.com
    Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: "Kirill A . Shutemov" <kirill@shutemov.name>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
    Link: http://lkml.kernel.org/r/9c9c2d1b15e37f6e6bf32f99e3100035e90c4ac9.1588868430.git.sandipan@linux.ibm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    sandip4n authored and torvalds committed Jun 4, 2020
    Commit b418a0f
  48. mm, page_alloc: reset the zone->watermark_boost early

    Updating the zone watermarks by any means, like min_free_kbytes,
    watermark_scale_factor, etc., when ->watermark_boost is set will
    result in higher low and high watermarks than the user asked for.
    
    Below are the steps to reproduce the problem on system setup of Android
    kernel running on Snapdragon hardware.
    
    1) Default settings of the system are as below:
    
       #cat /proc/sys/vm/min_free_kbytes = 5162
       #cat /proc/zoneinfo | grep -e boost -e low -e "high " -e min -e Node
    	Node 0, zone   Normal
    		min      797
    		low      8340
    		high     8539
    
    2) Monitor zone->watermark_boost (by adding a debug print in the
       kernel) and, whenever it is greater than zero, write back the same
       min_free_kbytes value obtained in step 1.
    
       #echo 5162 > /proc/sys/vm/min_free_kbytes
    
    3) Then read the zone watermarks in the system while the
       ->watermark_boost is zero.  This should show the same watermark
       values as step 1, but it instead shows higher values than asked for.
    
       #cat /proc/zoneinfo | grep -e boost -e low -e "high " -e min -e Node
    	Node 0, zone   Normal
    		min      797
    		low      21148
    		high     21347
    
    These higher values result from updating the zone watermarks with the
    macro min_wmark_pages(zone), which also adds in zone->watermark_boost:
    
    	#define min_wmark_pages(z) (z->_watermark[WMARK_MIN] + \
    					z->watermark_boost)
    
    So the steps that lead to the issue are:
    
    1) On the extfrag event, watermarks are boosted by storing the required
       value in ->watermark_boost.
    
    2) User tries to update the zone watermarks level in the system through
       min_free_kbytes or watermark_scale_factor.
    
    3) Later, when kswapd wakes up, it resets zone->watermark_boost to
       zero.
    
    In step 2), we use the min_wmark_pages() macro to store the watermarks
    in the zone structure, so the values are always offset by the
    ->watermark_boost value.  This can be avoided by resetting
    ->watermark_boost to zero before it is used.
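    
    A sketch of the fix in __setup_per_zone_wmarks() (placement per this
    description; the actual patch may differ in detail):
    
    	/* Drop any transient boost before the watermarks are stored,
    	 * so min_wmark_pages() reflects only the requested level. */
    	zone->watermark_boost = 0;
    	zone->_watermark[WMARK_LOW]  = min_wmark_pages(zone) + tmp;
    	zone->_watermark[WMARK_HIGH] = min_wmark_pages(zone) + tmp * 2;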
    
    Signed-off-by: Charan Teja Reddy <charante@codeaurora.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Cc: Vinayak Menon <vinmenon@codeaurora.org>
    Link: http://lkml.kernel.org/r/1589457511-4255-1-git-send-email-charante@codeaurora.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Charan Teja Reddy authored and torvalds committed Jun 4, 2020
    Commit aa09259
  49. mm/page_alloc: restrict and formalize compound_page_dtors[]

    Restrict elements in compound_page_dtors[] array per NR_COMPOUND_DTORS and
    explicitly position them according to enum compound_dtor_id.  This
    improves protection against possible misalignment between
    compound_page_dtors[] and enum compound_dtor_id later on.
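    
    The array then looks roughly like this (a sketch using designated
    initializers keyed by enum compound_dtor_id):
    
    	static compound_page_dtor * const
    	compound_page_dtors[NR_COMPOUND_DTORS] = {
    		[NULL_COMPOUND_DTOR] = NULL,
    		[COMPOUND_PAGE_DTOR] = free_compound_page,
    	#ifdef CONFIG_HUGETLB_PAGE
    		[HUGETLB_PAGE_DTOR] = free_huge_page,
    	#endif
    	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
    		[TRANSHUGE_PAGE_DTOR] = free_transhuge_page,
    	#endif
    	};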
    
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Link: http://lkml.kernel.org/r/1589795958-19317-1-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Anshuman Khandual authored and torvalds committed Jun 4, 2020
    Commit ae70edd
  50. mm/pagealloc.c: call touch_nmi_watchdog() on max order boundaries in deferred init
    
    Patch series "initialize deferred pages with interrupts enabled", v4.
    
    Keep interrupts enabled during deferred page initialization in order to
    make code more modular and allow jiffies to update.
    
    Original approach, and discussion can be found here:
     http://lkml.kernel.org/r/20200311123848.118638-1-shile.zhang@linux.alibaba.com
    
    This patch (of 3):
    
    deferred_init_memmap() disables interrupts the entire time, so it calls
    touch_nmi_watchdog() periodically to avoid soft lockup splats.  Soon it
    will run with interrupts enabled, at which point cond_resched() should be
    used instead.
    
    deferred_grow_zone() makes the same watchdog calls through code shared
    with deferred init but will continue to run with interrupts disabled, so
    it can't call cond_resched().
    
    Pull the watchdog calls up to these two places to allow the first to be
    changed later, independently of the second.  The frequency reduces from
    twice per pageblock (init and free) to once per max order block.
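    
    The resulting loop shape in deferred_init_memmap() (a hedged sketch
    based on this description; macro and helper names assumed):
    
    	for_each_free_mem_pfn_range_in_zone(i, zone, &spfn, &epfn) {
    		while (spfn < epfn) {
    			nr_pages += deferred_init_maxorder(&i, zone,
    							   &spfn, &epfn);
    			touch_nmi_watchdog();	/* once per max order block */
    		}
    	}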
    
    Fixes: 3a2d7fa ("mm: disable interrupts while initializing deferred pages")
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: James Morris <jmorris@namei.org>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Yiqian Wei <yiwei@redhat.com>
    Cc: <stable@vger.kernel.org>	[4.17+]
    Link: http://lkml.kernel.org/r/20200403140952.17177-2-pasha.tatashin@soleen.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit 117003c
  51. mm: initialize deferred pages with interrupts enabled

    Initializing struct pages is a long task, and keeping interrupts
    disabled for the duration of this operation introduces a number of
    problems.
    
    1. jiffies are not updated for a long period of time, and thus
       incorrect time is reported.  See the proposed solution and
       discussion here:
       lkml/20200311123848.118638-1-shile.zhang@linux.alibaba.com
    2. It prevents improving deferred page initialization further by
       allowing intra-node multi-threading.
    
    We are keeping interrupts disabled to solve a rather theoretical
    problem that was never observed in the real world (see 3a2d7fa).
    
    Let's keep interrupts enabled.  In case we ever encounter a scenario
    where an interrupt thread wants to allocate a large amount of memory
    this early in boot, we can deal with that by growing the zone (see
    deferred_grow_zone()) by the needed amount before starting the
    deferred_init_memmap() threads.
    
    Before:
    [    1.232459] node 0 initialised, 12058412 pages in 1ms
    
    After:
    [    1.632580] node 0 initialised, 12051227 pages in 436ms
    
    Fixes: 3a2d7fa ("mm: disable interrupts while initializing deferred pages")
    Reported-by: Shile Zhang <shile.zhang@linux.alibaba.com>
    Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Vlastimil Babka <vbabka@suse.cz>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Yiqian Wei <yiwei@redhat.com>
    Cc: <stable@vger.kernel.org>	[4.17+]
    Link: http://lkml.kernel.org/r/20200403140952.17177-3-pasha.tatashin@soleen.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    soleen authored and torvalds committed Jun 4, 2020
    Commit 3d06085
  52. mm: call cond_resched() from deferred_init_memmap()

    Now that deferred pages are initialized with interrupts enabled, we
    can replace touch_nmi_watchdog() with cond_resched(), as it was before
    3a2d7fa.
    
    For now, we cannot do the same in deferred_grow_zone(), as it still
    initializes pages with interrupts disabled.
    
    This change fixes the RCU problem described in
    https://lkml.kernel.org/r/20200401104156.11564-2-david@redhat.com
    
    [   60.474005] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
    [   60.475000] rcu:  1-...0: (0 ticks this GP) idle=02a/1/0x4000000000000000 softirq=1/1 fqs=15000
    [   60.475000] rcu:  (detected by 0, t=60002 jiffies, g=-1199, q=1)
    [   60.475000] Sending NMI from CPU 0 to CPUs 1:
    [    1.760091] NMI backtrace for cpu 1
    [    1.760091] CPU: 1 PID: 20 Comm: pgdatinit0 Not tainted 4.18.0-147.9.1.el8_1.x86_64 #1
    [    1.760091] Hardware name: Red Hat KVM, BIOS 1.13.0-1.module+el8.2.0+5520+4e5817f3 04/01/2014
    [    1.760091] RIP: 0010:__init_single_page.isra.65+0x10/0x4f
    [    1.760091] Code: 48 83 cf 63 48 89 f8 0f 1f 40 00 48 89 c6 48 89 d7 e8 6b 18 80 ff 66 90 5b c3 31 c0 b9 10 00 00 00 49 89 f8 48 c1 e6 33 f3 ab <b8> 07 00 00 00 48 c1 e2 36 41 c7 40 34 01 00 00 00 48 c1 e0 33 41
    [    1.760091] RSP: 0000:ffffba783123be40 EFLAGS: 00000006
    [    1.760091] RAX: 0000000000000000 RBX: fffffad34405e300 RCX: 0000000000000000
    [    1.760091] RDX: 0000000000000000 RSI: 0010000000000000 RDI: fffffad34405e340
    [    1.760091] RBP: 0000000033f3177e R08: fffffad34405e300 R09: 0000000000000002
    [    1.760091] R10: 000000000000002b R11: ffff98afb691a500 R12: 0000000000000002
    [    1.760091] R13: 0000000000000000 R14: 000000003f03ea00 R15: 000000003e10178c
    [    1.760091] FS:  0000000000000000(0000) GS:ffff9c9ebeb00000(0000) knlGS:0000000000000000
    [    1.760091] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
    [    1.760091] CR2: 00000000ffffffff CR3: 000000a1cf20a001 CR4: 00000000003606e0
    [    1.760091] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    [    1.760091] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
    [    1.760091] Call Trace:
    [    1.760091]  deferred_init_pages+0x8f/0xbf
    [    1.760091]  deferred_init_memmap+0x184/0x29d
    [    1.760091]  ? deferred_free_pages.isra.97+0xba/0xba
    [    1.760091]  kthread+0x112/0x130
    [    1.760091]  ? kthread_flush_work_fn+0x10/0x10
    [    1.760091]  ret_from_fork+0x35/0x40
    [   89.123011] node 0 initialised, 1055935372 pages in 88650ms
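    
    The change itself is small (a sketch; compare the loop shown for the
    first patch in this series):
    
    	while (spfn < epfn) {
    		nr_pages += deferred_init_maxorder(&i, zone, &spfn, &epfn);
    		cond_resched();	/* safe now that IRQs stay enabled */
    	}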
    
    Fixes: 3a2d7fa ("mm: disable interrupts while initializing deferred pages")
    Reported-by: Yiqian Wei <yiwei@redhat.com>
    Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Reviewed-by: Pankaj Gupta <pankaj.gupta.linux@gmail.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: James Morris <jmorris@namei.org>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Sasha Levin <sashal@kernel.org>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: <stable@vger.kernel.org>	[4.17+]
    Link: http://lkml.kernel.org/r/20200403140952.17177-4-pasha.tatashin@soleen.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    soleen authored and torvalds committed Jun 4, 2020
    Commit da97f2d
  53. padata: remove exit routine

    Patch series "padata: parallelize deferred page init", v3.
    
    Deferred struct page init is a bottleneck in kernel boot--the biggest for
    us and probably others.  Optimizing it maximizes availability for
    large-memory systems and allows spinning up short-lived VMs as needed
    without having to leave them running.  It also benefits bare metal
    machines hosting VMs that are sensitive to downtime.  In projects such as
    VMM Fast Restart[1], where guest state is preserved across kexec reboot,
    it helps prevent application and network timeouts in the guests.
    
    So, multithread deferred init to take full advantage of system memory
    bandwidth.
    
    Extend padata, a framework that handles many parallel singlethreaded jobs,
    to handle multithreaded jobs as well by adding support for splitting up
    the work evenly, specifying a minimum amount of work that's appropriate
    for one helper thread to do, load balancing between helpers, and
    coordinating them.  More documentation in patches 4 and 8.
    
    This series is the first step in a project to address other memory
    proportional bottlenecks in the kernel such as pmem struct page init, vfio
    page pinning, hugetlb fallocate, and munmap.  Deferred page init doesn't
    require concurrency limits, resource control, or priority adjustments like
    these other users will because it happens during boot when the system is
    otherwise idle and waiting for page init to finish.
    
    This has been run on a variety of x86 systems and speeds up kernel boot by
    4% to 49%, saving up to 1.6 out of 4 seconds.  Patch 6 has more numbers.
    
    This patch (of 8):
    
    padata_driver_exit() is unnecessary because padata isn't built as a module
    and doesn't exit.
    
    padata's init routine will soon allocate memory, so getting rid of the
    exit function now avoids pointless code to free it.
    
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Josh Triplett <josh@joshtriplett.org>
    Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Robert Elliott <elliott@hpe.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Link: http://lkml.kernel.org/r/20200527173608.2885243-1-daniel.m.jordan@oracle.com
    Link: http://lkml.kernel.org/r/20200527173608.2885243-2-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit 305dacf
  54. padata: initialize earlier

    padata will soon initialize the system's struct pages in parallel, so it
    needs to be ready by page_alloc_init_late().
    
    The error return from padata_driver_init() triggers an initcall warning,
    so add a warning to padata_init() to avoid silent failure.
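    
    The intended ordering then looks like this (a hedged sketch; the exact
    call site in init/main.c is an assumption based on the summary):
    
    	padata_init();
    	page_alloc_init_late();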
    
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Josh Triplett <josh@joshtriplett.org>
    Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Robert Elliott <elliott@hpe.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Link: http://lkml.kernel.org/r/20200527173608.2885243-3-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit f1b192b
  55. padata: allocate work structures for parallel jobs from a pool

    padata allocates per-CPU, per-instance work structs for parallel jobs.  A
    do_parallel call assigns a job to a sequence number and hashes the number
    to a CPU, where the job will eventually run using the corresponding work.
    
    This approach fit with how padata used to bind a job to each CPU
    round-robin, but it makes less sense after commit bfde23c ("padata:
    unbind parallel jobs from specific CPUs") because a work isn't bound
    to a particular CPU anymore, and it isn't needed at all for
    multithreaded jobs because they don't have sequence numbers.
    
    Replace the per-CPU works with a preallocated pool, which allows sharing
    them between existing padata users and the upcoming multithreaded user.
    The pool will also facilitate setting NUMA-aware concurrency limits with
    later users.
    
    The pool is sized according to the number of possible CPUs.  With this
    limit, MAX_OBJ_NUM no longer makes sense, so remove it.
    
    If the global pool is exhausted, a parallel job is run in the current task
    instead to throttle a system trying to do too much in parallel.
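    
    The pool-backed allocator is small (a sketch close to the shape of the
    change; structure and list names are assumptions):
    
    	static struct padata_work *padata_work_alloc(void)
    	{
    		struct padata_work *pw;
    
    		lockdep_assert_held(&padata_works_lock);
    
    		if (list_empty(&padata_free_works))
    			return NULL;	/* caller runs the job itself */
    
    		pw = list_first_entry(&padata_free_works,
    				      struct padata_work, pw_list);
    		list_del(&pw->pw_list);
    		return pw;
    	}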
    
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Josh Triplett <josh@joshtriplett.org>
    Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Robert Elliott <elliott@hpe.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Link: http://lkml.kernel.org/r/20200527173608.2885243-4-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit 4611ce2
  56. padata: add basic support for multithreaded jobs

    Sometimes the kernel doesn't take full advantage of system memory
    bandwidth, leading to a single CPU spending excessive time in
    initialization paths where the data scales with memory size.
    
    Multithreading naturally addresses this problem.
    
    Extend padata, a framework that handles many parallel yet singlethreaded
    jobs, to also handle multithreaded jobs by adding support for splitting up
    the work evenly, specifying a minimum amount of work that's appropriate
    for one helper thread to do, load balancing between helpers, and
    coordinating them.
    
    This is inspired by work from Pavel Tatashin and Steve Sistare.
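    
    The job description the new interface takes looks roughly like this (a
    sketch of the added API):
    
    	struct padata_mt_job {
    		void (*thread_fn)(unsigned long start, unsigned long end,
    				  void *arg);
    		void		*fn_arg;	/* thread_fn's argument */
    		unsigned long	start;		/* first unit of work */
    		unsigned long	size;		/* units of work to do */
    		unsigned long	align;		/* chunk alignment */
    		unsigned long	min_chunk;	/* smallest useful chunk */
    		int		max_threads;	/* cap on helper threads */
    	};
    
    	void padata_do_multithreaded(struct padata_mt_job *job);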
    
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Josh Triplett <josh@joshtriplett.org>
    Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Robert Elliott <elliott@hpe.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Link: http://lkml.kernel.org/r/20200527173608.2885243-5-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit 004ed42
  57. mm: don't track number of pages during deferred initialization

    Deferred page init used to report the number of pages initialized:
    
      node 0 initialised, 32439114 pages in 97ms
    
    Tracking this makes the code more complicated when using multiple threads.
    Given that the statistic probably has limited value, especially since a
    zone grows on demand so that the page count can vary, just remove it.
    
    The boot message now looks like
    
      node 0 deferred pages initialised in 97ms
    
    Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Josh Triplett <josh@joshtriplett.org>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Robert Elliott <elliott@hpe.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Link: http://lkml.kernel.org/r/20200527173608.2885243-6-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit 89c7c40
  58. mm: parallelize deferred_init_memmap()

    Deferred struct page init is a significant bottleneck in kernel boot.
    Optimizing it maximizes availability for large-memory systems and allows
    spinning up short-lived VMs as needed without having to leave them
    running.  It also benefits bare metal machines hosting VMs that are
    sensitive to downtime.  In projects such as VMM Fast Restart[1], where
    guest state is preserved across kexec reboot, it helps prevent application
    and network timeouts in the guests.
    
    Multithread to take full advantage of system memory bandwidth.
    
    The maximum number of threads is capped at the number of CPUs on the
    node because the speedup improved with additional threads on every
    system tested, and at this phase of boot the system is otherwise idle
    and waiting on page init to finish.
    
    Helper threads operate on section-aligned ranges both to avoid false
    sharing when setting the pageblock's migrate type and to avoid
    accessing uninitialized buddy pages, though max order alignment is
    enough for the latter.
    
    The minimum chunk size is also a section.  There was benefit to using
    multiple threads even on relatively small memory (1G) systems, and this is
    the smallest size that the alignment allows.
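    
    The per-zone init then hands padata a multithreaded job (a sketch
    based on the series' API; helper names as in the series):
    
    	while (spfn < epfn) {
    		unsigned long epfn_align = ALIGN(epfn, PAGES_PER_SECTION);
    		struct padata_mt_job job = {
    			.thread_fn	= deferred_init_memmap_chunk,
    			.fn_arg		= zone,
    			.start		= spfn,
    			.size		= epfn_align - spfn,
    			.align		= PAGES_PER_SECTION,
    			.min_chunk	= PAGES_PER_SECTION,
    			.max_threads	= max_threads,
    		};
    
    		padata_do_multithreaded(&job);
    		deferred_init_mem_pfn_range_in_zone_from(&i, zone, &spfn,
    							 &epfn, epfn_align);
    	}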
    
    The time (milliseconds) is the slowest node to initialize since boot
    blocks until all nodes finish.  intel_pstate is loaded in active mode
    without hwp and with turbo enabled, and intel_idle is active as well.
    
        Intel(R) Xeon(R) Platinum 8167M CPU @ 2.00GHz (Skylake, bare metal)
          2 nodes * 26 cores * 2 threads = 104 CPUs
          384G/node = 768G memory
    
                       kernel boot                 deferred init
                       ------------------------    ------------------------
        node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
              (  0)         --   4089.7 (  8.1)         --   1785.7 (  7.6)
           2% (  1)       1.7%   4019.3 (  1.5)       3.8%   1717.7 ( 11.8)
          12% (  6)      34.9%   2662.7 (  2.9)      79.9%    359.3 (  0.6)
          25% ( 13)      39.9%   2459.0 (  3.6)      91.2%    157.0 (  0.0)
          37% ( 19)      39.2%   2485.0 ( 29.7)      90.4%    172.0 ( 28.6)
          50% ( 26)      39.3%   2482.7 ( 25.7)      90.3%    173.7 ( 30.0)
          75% ( 39)      39.0%   2495.7 (  5.5)      89.4%    190.0 (  1.0)
         100% ( 52)      40.2%   2443.7 (  3.8)      92.3%    138.0 (  1.0)
    
        Intel(R) Xeon(R) CPU E5-2699C v4 @ 2.20GHz (Broadwell, kvm guest)
          1 node * 16 cores * 2 threads = 32 CPUs
          192G/node = 192G memory
    
                       kernel boot                 deferred init
                       ------------------------    ------------------------
        node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
              (  0)         --   1988.7 (  9.6)         --   1096.0 ( 11.5)
           3% (  1)       1.1%   1967.0 ( 17.6)       0.3%   1092.7 ( 11.0)
          12% (  4)      41.1%   1170.3 ( 14.2)      73.8%    287.0 (  3.6)
          25% (  8)      47.1%   1052.7 ( 21.9)      83.9%    177.0 ( 13.5)
          38% ( 12)      48.9%   1016.3 ( 12.1)      86.8%    144.7 (  1.5)
          50% ( 16)      48.9%   1015.7 (  8.1)      87.8%    134.0 (  4.4)
          75% ( 24)      49.1%   1012.3 (  3.1)      88.1%    130.3 (  2.3)
         100% ( 32)      49.5%   1004.0 (  5.3)      88.5%    125.7 (  2.1)
    
        Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, bare metal)
          2 nodes * 18 cores * 2 threads = 72 CPUs
          128G/node = 256G memory
    
                       kernel boot                 deferred init
                       ------------------------    ------------------------
        node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
              (  0)         --   1680.0 (  4.6)         --    627.0 (  4.0)
           3% (  1)       0.3%   1675.7 (  4.5)      -0.2%    628.0 (  3.6)
          11% (  4)      25.6%   1250.7 (  2.1)      67.9%    201.0 (  0.0)
          25% (  9)      30.7%   1164.0 ( 17.3)      81.8%    114.3 ( 17.7)
          36% ( 13)      31.4%   1152.7 ( 10.8)      84.0%    100.3 ( 17.9)
          50% ( 18)      31.5%   1150.7 (  9.3)      83.9%    101.0 ( 14.1)
          75% ( 27)      31.7%   1148.0 (  5.6)      84.5%     97.3 (  6.4)
         100% ( 36)      32.0%   1142.3 (  4.0)      85.6%     90.0 (  1.0)
    
        AMD EPYC 7551 32-Core Processor (Zen, kvm guest)
          1 node * 8 cores * 2 threads = 16 CPUs
          64G/node = 64G memory
    
                       kernel boot                 deferred init
                       ------------------------    ------------------------
        node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
              (  0)         --   1029.3 ( 25.1)         --    240.7 (  1.5)
           6% (  1)      -0.6%   1036.0 (  7.8)      -2.2%    246.0 (  0.0)
          12% (  2)      11.8%    907.7 (  8.6)      44.7%    133.0 (  1.0)
          25% (  4)      13.9%    886.0 ( 10.6)      62.6%     90.0 (  6.0)
          38% (  6)      17.8%    845.7 ( 14.2)      69.1%     74.3 (  3.8)
          50% (  8)      16.8%    856.0 ( 22.1)      72.9%     65.3 (  5.7)
          75% ( 12)      15.4%    871.0 ( 29.2)      79.8%     48.7 (  7.4)
         100% ( 16)      21.0%    813.7 ( 21.0)      80.5%     47.0 (  5.2)
    
    Server-oriented distros that enable deferred page init sometimes run in
    small VMs, and they still benefit even though the fraction of boot time
    saved is smaller:
    
        AMD EPYC 7551 32-Core Processor (Zen, kvm guest)
          1 node * 2 cores * 2 threads = 4 CPUs
          16G/node = 16G memory
    
                       kernel boot                 deferred init
                       ------------------------    ------------------------
        node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
              (  0)         --    716.0 ( 14.0)         --     49.7 (  0.6)
          25% (  1)       1.8%    703.0 (  5.3)      -4.0%     51.7 (  0.6)
          50% (  2)       1.6%    704.7 (  1.2)      43.0%     28.3 (  0.6)
          75% (  3)       2.7%    696.7 ( 13.1)      49.7%     25.0 (  0.0)
         100% (  4)       4.1%    687.0 ( 10.4)      55.7%     22.0 (  0.0)
    
        Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz (Haswell, kvm guest)
          1 node * 2 cores * 2 threads = 4 CPUs
          14G/node = 14G memory
    
                       kernel boot                 deferred init
                       ------------------------    ------------------------
        node% (thr)    speedup  time_ms (stdev)    speedup  time_ms (stdev)
              (  0)         --    787.7 (  6.4)         --    122.3 (  0.6)
          25% (  1)       0.2%    786.3 ( 10.8)      -2.5%    125.3 (  2.1)
          50% (  2)       5.9%    741.0 ( 13.9)      37.6%     76.3 ( 19.7)
          75% (  3)       8.3%    722.0 ( 19.0)      49.9%     61.3 (  3.2)
         100% (  4)       9.3%    714.7 (  9.5)      56.4%     53.3 (  1.5)
    
    On Josh's 96-CPU and 192G memory system:
    
        Without this patch series:
        [    0.487132] node 0 initialised, 23398907 pages in 292ms
        [    0.499132] node 1 initialised, 24189223 pages in 304ms
        ...
        [    0.629376] Run /sbin/init as init process
    
        With this patch series:
        [    0.231435] node 1 initialised, 24189223 pages in 32ms
        [    0.236718] node 0 initialised, 23398907 pages in 36ms
    
    [1] https://static.sched.com/hosted_files/kvmforum2019/66/VMM-fast-restart_kvmforum2019.pdf
    
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Josh Triplett <josh@joshtriplett.org>
    Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Robert Elliott <elliott@hpe.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Link: http://lkml.kernel.org/r/20200527173608.2885243-7-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit e444314
  59. mm: make deferred init's max threads arch-specific

    Using padata during deferred init has only been tested on x86, so for now
    limit it to this architecture.
    
    If another arch wants this, it can find the max thread limit that's best
    for it and override deferred_page_init_max_threads().
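    
    A sketch of the weak default and the x86 override:
    
    	/* mm/page_alloc.c: conservative generic default */
    	__weak int __init
    	deferred_page_init_max_threads(const struct cpumask *node_cpumask)
    	{
    		return 1;
    	}
    
    	/* arch/x86: one thread per CPU on the node measured best */
    	int __init
    	deferred_page_init_max_threads(const struct cpumask *node_cpumask)
    	{
    		return max_t(int, cpumask_weight(node_cpumask), 1);
    	}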
    
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Josh Triplett <josh@joshtriplett.org>
    Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Robert Elliott <elliott@hpe.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Link: http://lkml.kernel.org/r/20200527173608.2885243-8-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit ecd0965
  60. padata: document multithreaded jobs

    Add Documentation for multithreaded jobs.
    
    Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Josh Triplett <josh@joshtriplett.org>
    Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
    Cc: Alex Williamson <alex.williamson@redhat.com>
    Cc: Dan Williams <dan.j.williams@intel.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David Hildenbrand <david@redhat.com>
    Cc: Herbert Xu <herbert@gondor.apana.org.au>
    Cc: Jason Gunthorpe <jgg@ziepe.ca>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
    Cc: Michal Hocko <mhocko@kernel.org>
    Cc: Pavel Machek <pavel@ucw.cz>
    Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Robert Elliott <elliott@hpe.com>
    Cc: Shile Zhang <shile.zhang@linux.alibaba.com>
    Cc: Steffen Klassert <steffen.klassert@secunet.com>
    Cc: Steven Sistare <steven.sistare@oracle.com>
    Cc: Tejun Heo <tj@kernel.org>
    Cc: Zi Yan <ziy@nvidia.com>
    Link: http://lkml.kernel.org/r/20200527173608.2885243-9-daniel.m.jordan@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    danieljordan10 authored and torvalds committed Jun 4, 2020
    Commit ec3b39c
  61. mm/page_alloc.c: add missing newline

    Add the missing trailing newlines to pr_warn() messages.
    
    Signed-off-by: Chen Tao <chentao107@huawei.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20200603063547.235825-1-chentao107@huawei.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Chen Tao authored and torvalds committed Jun 4, 2020
    Commit 633bf2f
  62. khugepaged: add self test

    Patch series "thp/khugepaged improvements and CoW semantics", v4.
    
    The patchset adds a khugepaged selftest (anon-THP only for now),
    expands the cases khugepaged can handle, and switches anon-THP
    copy-on-write handling to 4k.
    
    This patch (of 8):
    
    The test checks whether khugepaged is able to recover a huge page
    where we expect it to do so.  It only covers anon-THP for now.
    
    Currently the test shows a few failures.  They are going to be
    addressed by the following patches.
    
    [colin.king@canonical.com: fix several spelling mistakes]
      Link: http://lkml.kernel.org/r/20200420084241.65433-1-colin.king@canonical.com
    [aneesh.kumar@linux.ibm.com: replace the usage of system(3) in the test]
      Link: http://lkml.kernel.org/r/20200429110727.89388-1-aneesh.kumar@linux.ibm.com
    [kirill@shutemov.name: fixup for issues I've noticed]
      Link: http://lkml.kernel.org/r/20200429124816.jp272trghrzxx5j5@box
    [jhubbard@nvidia.com: add khugepaged to .gitignore]
      Link: http://lkml.kernel.org/r/20200517002509.362401-1-jhubbard@nvidia.com
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
    Signed-off-by: John Hubbard <jhubbard@nvidia.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: William Kucharski <william.kucharski@oracle.com>
    Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Link: http://lkml.kernel.org/r/20200416160026.16538-1-kirill.shutemov@linux.intel.com
    Link: http://lkml.kernel.org/r/20200416160026.16538-2-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed Jun 4, 2020
    Configuration menu
    Copy the full SHA
    e0c13f9 View commit details
    Browse the repository at this point in the history
  63. khugepaged: do not stop collapse if less than half PTEs are referenced

    __collapse_huge_page_swapin() checks the number of referenced PTEs to
    decide if the memory range is hot enough to justify swapin.
    
    There are a few problems with the approach:
    
     - It is way too late: we can do the check much earlier and save time.
       khugepaged_scan_pmd() already knows if we have any pages to swap in
       and the number of referenced pages.
    
     - It stops the collapse altogether if there are not enough referenced
       pages, not only the swap-in.
    
    Fix it by making the right check early.  We can also avoid the
    additional page table scan if khugepaged_scan_pmd() hasn't found any
    swap entries.
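    
    A sketch of the early check in khugepaged_scan_pmd() (threshold as
    described; details may differ from the actual patch):
    
    	if (!writable) {
    		result = SCAN_PAGE_RO;
    	} else if (!referenced ||
    		   (unmapped && referenced < HPAGE_PMD_NR / 2)) {
    		/* Not hot enough to justify swap-in; don't collapse. */
    		result = SCAN_LACK_REFERENCED_PAGE;
    	} else {
    		result = SCAN_SUCCEED;
    	}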
    
    Fixes: 0db501f ("mm, thp: convert from optimistic swapin collapsing to conservative")
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Link: http://lkml.kernel.org/r/20200416160026.16538-3-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed Jun 4, 2020
    Commit ffe945e
  64. khugepaged: drain all LRU caches before scanning pages

    Having a page in the LRU add cache offsets its refcount and gives a
    false negative on PageLRU().  It reduces the collapse success rate.
    
    Drain all LRU add caches before scanning.  This happens relatively
    rarely and should not disturb the system too much.
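    
    A minimal sketch of the drain (lru_add_drain_all() is the existing
    helper for this; its placement at the start of the scan pass is
    assumed):
    
        /* flush per-CPU LRU add caches so PageLRU() sees the truth */
        lru_add_drain_all();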
    
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Link: http://lkml.kernel.org/r/20200416160026.16538-4-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed Jun 4, 2020
    a980df3
  65. khugepaged: drain LRU add pagevec after swapin

    collapse_huge_page() tries to swap in pages that are part of the PMD
    range.  A just-swapped-in page goes through the LRU add cache.  The
    cache gets an extra reference on the page.
    
    The extra reference can cause the collapse to fail: the following
    __collapse_huge_page_isolate() would check the refcount and abort the
    collapse upon seeing an unexpected refcount.
    
    The fix is to drain the local LRU add cache in
    __collapse_huge_page_swapin() if we successfully swapped in any pages.
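    
    A hedged sketch, assuming a local swapped_in counter in
    __collapse_huge_page_swapin():
    
        /* the swapins happened on this CPU; drain only its LRU add cache */
        if (swapped_in)
                lru_add_drain();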
    
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Link: http://lkml.kernel.org/r/20200416160026.16538-5-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed Jun 4, 2020
    ae2c5d8
  66. khugepaged: allow to collapse a page shared across fork

    The page can be included in the collapse as long as it doesn't have
    extra pins (from GUP or otherwise).
    
    Logic to check the refcount is moved to a separate function.  For pages in
    swap cache, add compound_nr(page) to the expected refcount, in order to
    handle the compound page case.  This is in preparation for the following
    patch.
    
    VM_BUG_ON_PAGE() was removed from __collapse_huge_page_copy() as the
    invariant it checks is no longer valid: the source can be mapped multiple
    times now.
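    
    A sketch of the separated check (names assumed to match the patch;
    treat as illustrative):
    
        static bool is_refcount_suitable(struct page *page)
        {
                int expected_refcount = total_mapcount(page);
    
                /* the swap cache holds one reference per subpage */
                if (PageSwapCache(page))
                        expected_refcount += compound_nr(page);
    
                return page_count(page) == expected_refcount;
        }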
    
    [yang.shi@linux.alibaba.com: remove error message when checking external pins]
      Link: http://lkml.kernel.org/r/1589317383-9595-1-git-send-email-yang.shi@linux.alibaba.com
    [cai@lca.pw: fix set-but-not-used warning]
      Link: http://lkml.kernel.org/r/20200521145644.GA6367@ovpn-112-192.phx2.redhat.com
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: John Hubbard <jhubbard@nvidia.com>
    Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Link: http://lkml.kernel.org/r/20200416160026.16538-6-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed Jun 4, 2020
    9445689
  67. khugepaged: allow to collapse PTE-mapped compound pages

    We can collapse PTE-mapped compound pages.  We only need to avoid
    handling them more than once: lock/unlock the page only once if it's
    present in the PMD range multiple times, as it is handled at the
    compound level.  The same goes for LRU isolation and putback.
    
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Link: http://lkml.kernel.org/r/20200416160026.16538-7-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed Jun 4, 2020
    5503fbf
  68. thp: change CoW semantics for anon-THP

    Currently we have different copy-on-write semantics for anon- and
    file-THP.  For anon-THP we try to allocate huge page on the write fault,
    but on file-THP we split PMD and allocate 4k page.
    
    Arguably, the file-THP semantics are more desirable: we don't
    necessarily want to unshare the full PMD range from the parent on the
    first access.  This is the primary reason THP is unusable for some
    workloads, like Redis.
    
    The original THP refcounting didn't allow PTE-mapped compound pages, so
    we had no option but to allocate a huge page on CoW (with fallback to
    512 4k pages).
    
    The current refcounting doesn't have such limitations and we can cut a
    lot of complex code out of the fault path.
    
    khugepaged is now able to recover THP from such ranges if the
    configuration allows.
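    
    A hedged sketch of the simplified anon-THP write-fault path (heavily
    abridged: locking, statistics and mmu cache updates are omitted):
    
        /* sole owner: make the existing huge page writable in place */
        if (reuse_swap_page(page, NULL)) {
                pmd_t entry = pmd_mkyoung(orig_pmd);
                entry = maybe_pmd_mkwrite(pmd_mkdirty(entry), vma);
                set_pmd_at(vma->vm_mm, haddr, vmf->pmd, entry);
                return VM_FAULT_WRITE;
        }
        /* shared: split the PMD and let the 4k CoW path copy one page */
        __split_huge_pmd(vma, vmf->pmd, vmf->address, false, NULL);
        return VM_FAULT_FALLBACK;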
    
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Link: http://lkml.kernel.org/r/20200416160026.16538-8-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed Jun 4, 2020
    3917c80
  69. khugepaged: introduce 'max_ptes_shared' tunable

    'max_ptes_shared' specifies how many pages can be shared across multiple
    processes.  Exceeding the number would block the collapse::
    
    	/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_shared
    
    A higher value may increase memory footprint for some workloads.
    
    By default, at least half of the pages must not be shared.
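    
    A minimal sketch of the gate, assuming the scan loop counts shared PTEs
    (the result code and counter names are taken from the patch context;
    treat as illustrative):
    
        if (page_mapcount(page) > 1 &&
            ++shared > khugepaged_max_ptes_shared) {
                result = SCAN_EXCEED_SHARED_PTE;
                goto out_unmap;
        }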
    
    [colin.king@canonical.com: fix several spelling mistakes]
      Link: http://lkml.kernel.org/r/20200420084241.65433-1-colin.king@canonical.com
    Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Signed-off-by: Colin Ian King <colin.king@canonical.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Zi Yan <ziy@nvidia.com>
    Reviewed-by: William Kucharski <william.kucharski@oracle.com>
    Reviewed-by: Zi Yan <ziy@nvidia.com>
    Acked-by: Yang Shi <yang.shi@linux.alibaba.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Ralph Campbell <rcampbell@nvidia.com>
    Link: http://lkml.kernel.org/r/20200416160026.16538-9-kirill.shutemov@linux.intel.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    kiryl authored and torvalds committed Jun 4, 2020
    71a2c11
  70. hugetlbfs: add arch_hugetlb_valid_size

    Patch series "Clean up hugetlb boot command line processing", v4.
    
    Longpeng(Mike) reported a weird message from hugetlb command line
    processing and proposed a solution [1].  While the proposed patch does
    address the specific issue, there are other related issues in command line
    processing.  As hugetlbfs evolved, updates to command line processing have
    been made to meet immediate needs and not necessarily in a coordinated
    manner.  The result is that some processing is done in arch specific code,
    some is done in arch independent code and coordination is problematic.
    Semantics can vary between architectures.
    
    The patch series does the following:
    - Define arch specific arch_hugetlb_valid_size routine used to validate
      passed huge page sizes.
    - Move hugepagesz= command line parsing out of arch specific code and into
      an arch independent routine.
    - Clean up command line processing to follow desired semantics and
      document those semantics.
    
    [1] https://lore.kernel.org/linux-mm/20200305033014.1152-1-longpeng2@huawei.com
    
    This patch (of 3):
    
    The architecture independent routine hugetlb_default_setup sets up the
    default huge page size.  It has no way to verify if the passed value is
    valid, so it accepts it and attempts to validate it at a later time.
    This requires undocumented cooperation between the arch specific and
    arch independent code.
    
    For architectures that support more than one huge page size, provide a
    routine arch_hugetlb_valid_size to validate a huge page size.
    hugetlb_default_setup can use this to validate passed values.
    
    arch_hugetlb_valid_size will also be used in a subsequent patch to move
    processing of the "hugepagesz=" in arch specific code to a common routine
    in arch independent code.
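    
    A hedged sketch of what an architecture supporting 2M and 1G pages
    might provide (arch_has_gigantic_pages() is an assumed placeholder for
    a real capability check):
    
        bool __init arch_hugetlb_valid_size(unsigned long size)
        {
                if (size == PMD_SIZE)
                        return true;
                /* assumed helper standing in for a capability check */
                if (size == PUD_SIZE && arch_has_gigantic_pages())
                        return true;
                return false;
        }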
    
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
    Acked-by: Will Deacon <will@kernel.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Albert Ou <aou@eecs.berkeley.edu>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Longpeng <longpeng2@huawei.com>
    Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Nitesh Narayan Lal <nitesh@redhat.com>
    Cc: Anders Roxell <anders.roxell@linaro.org>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
    Cc: Qian Cai <cai@lca.pw>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Link: http://lkml.kernel.org/r/20200428205614.246260-1-mike.kravetz@oracle.com
    Link: http://lkml.kernel.org/r/20200428205614.246260-2-mike.kravetz@oracle.com
    Link: http://lkml.kernel.org/r/20200417185049.275845-1-mike.kravetz@oracle.com
    Link: http://lkml.kernel.org/r/20200417185049.275845-2-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    mjkravetz authored and torvalds committed Jun 4, 2020
    ae94da8
  71. hugetlbfs: move hugepagesz= parsing to arch independent code

    Now that architectures provide arch_hugetlb_valid_size(), parsing of
    "hugepagesz=" can be done in architecture independent code.  Create a
    single routine to handle hugepagesz= parsing and remove all arch specific
    routines.  We can also remove the interface hugetlb_bad_size() as this is
    no longer used outside arch independent code.
    
    This also provides consistent behavior of hugetlbfs command line options.
    The hugepagesz= option should only be specified once for a specific size,
    but some architectures allow multiple instances.  This appears to be
    more of an oversight from when code was added by some architectures to
    set up ALL huge page sizes.
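    
    A sketch of the single arch-independent handler, reusing
    arch_hugetlb_valid_size() from the previous patch (simplified; the real
    routine also needs to reject such duplicate sizes):
    
        static int __init hugepagesz_setup(char *s)
        {
                unsigned long size = memparse(s, NULL);
    
                if (!arch_hugetlb_valid_size(size)) {
                        pr_err("HugeTLB: unsupported hugepagesz=%s\n", s);
                        return 0;
                }
                hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
                return 1;
        }
        __setup("hugepagesz=", hugepagesz_setup);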
    
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Sandipan Das <sandipan@linux.ibm.com>
    Reviewed-by: Peter Xu <peterx@redhat.com>
    Acked-by: Mina Almasry <almasrymina@google.com>
    Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
    Acked-by: Will Deacon <will@kernel.org>
    Cc: Albert Ou <aou@eecs.berkeley.edu>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Longpeng <longpeng2@huawei.com>
    Cc: Nitesh Narayan Lal <nitesh@redhat.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Anders Roxell <anders.roxell@linaro.org>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
    Cc: Qian Cai <cai@lca.pw>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Link: http://lkml.kernel.org/r/20200417185049.275845-3-mike.kravetz@oracle.com
    Link: http://lkml.kernel.org/r/20200428205614.246260-3-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    mjkravetz authored and torvalds committed Jun 4, 2020
    359f254
  72. hugetlbfs: remove hugetlb_add_hstate() warning for existing hstate

    hugetlb_add_hstate() prints a warning if the hstate already exists.  This
    was originally done as part of kernel command line parsing.  If
    'hugepagesz=' was specified more than once, the warning
    
    	pr_warn("hugepagesz= specified twice, ignoring\n");
    
    would be printed.
    
    Some architectures want to enable all huge page sizes.  They would call
    hugetlb_add_hstate for all supported sizes.  However, this was done after
    command line processing and as a result hstates could have already been
    created for some sizes.  To make sure no warnings were printed, there
    would often be code like:
    
    	if (!size_to_hstate(size))
    		hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);
    
    The only time we want to print the warning is as the result of command
    line processing.  So, remove the warning from hugetlb_add_hstate and add
    it to the single arch independent routine processing "hugepagesz=".  After
    this, calls to size_to_hstate() in arch specific code can be removed and
    hugetlb_add_hstate can be called without worrying about warning messages.
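    
    After the change, arch code that enables all supported sizes can simply
    do (illustrative):
    
        /* duplicates are now harmless: no size_to_hstate() guard needed */
        hugetlb_add_hstate(ilog2(size) - PAGE_SHIFT);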
    
    [mike.kravetz@oracle.com: fix hugetlb initialization]
      Link: http://lkml.kernel.org/r/4c36c6ce-3774-78fa-abc4-b7346bf24348@oracle.com
      Link: http://lkml.kernel.org/r/20200428205614.246260-5-mike.kravetz@oracle.com
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Anders Roxell <anders.roxell@linaro.org>
    Acked-by: Mina Almasry <almasrymina@google.com>
    Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
    Acked-by: Will Deacon <will@kernel.org>
    Cc: Albert Ou <aou@eecs.berkeley.edu>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Longpeng <longpeng2@huawei.com>
    Cc: Nitesh Narayan Lal <nitesh@redhat.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
    Cc: Qian Cai <cai@lca.pw>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Link: http://lkml.kernel.org/r/20200417185049.275845-4-mike.kravetz@oracle.com
    Link: http://lkml.kernel.org/r/20200428205614.246260-4-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    mjkravetz authored and torvalds committed Jun 4, 2020
    3823783
  73. hugetlbfs: clean up command line processing

    With all hugetlb command line processing done in a single file, clean up
    the code.
    
    - Make code match desired semantics
      - Update documentation with semantics
    - Make all warning and error messages start with 'HugeTLB:'.
    - Consistently name command line parsing routines.
    - Warn if !hugepages_supported() and command line parameters have
      been specified.
    - Add comments to code
      - Describe some of the subtle interactions
      - Describe semantics of command line arguments
    
    This patch also fixes issues with implicitly setting the number of
    gigantic huge pages to preallocate.  Previously on X86 command line,
    
            hugepages=2 default_hugepagesz=1G
    
    would result in zero 1G pages being preallocated and,
    
            # grep HugePages_Total /proc/meminfo
            HugePages_Total:       0
            # sysctl -a | grep nr_hugepages
            vm.nr_hugepages = 2
            vm.nr_hugepages_mempolicy = 2
            # cat /proc/sys/vm/nr_hugepages
            2
    
    After this patch 2 gigantic pages will be preallocated and all the proc,
    sysfs, sysctl and meminfo files will accurately reflect this.
    
    To address the issue with gigantic pages, a small change in behavior was
    made to command line processing.  Previously the command line,
    
            hugepages=128 default_hugepagesz=2M hugepagesz=2M hugepages=256
    
    would result in the allocation of 256 2M huge pages.  The value 128 would
    be ignored without any warning.  After this patch, 128 2M pages will be
    allocated and a warning message will be displayed indicating the value of
    256 is ignored.  This change in behavior is required because allocation
    of implicitly specified gigantic pages must be done when
    default_hugepagesz= is encountered.  Previously the code waited until
    later in the boot process (hugetlb_init) to allocate pages of the
    default size.  However, the bootmem allocator required for gigantic
    allocations is not available at that point.
    
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Sandipan Das <sandipan@linux.ibm.com>
    Acked-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>	[s390]
    Acked-by: Will Deacon <will@kernel.org>
    Cc: Albert Ou <aou@eecs.berkeley.edu>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Christophe Leroy <christophe.leroy@c-s.fr>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: David S. Miller <davem@davemloft.net>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Cc: Longpeng <longpeng2@huawei.com>
    Cc: Mina Almasry <almasrymina@google.com>
    Cc: Nitesh Narayan Lal <nitesh@redhat.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Peter Xu <peterx@redhat.com>
    Cc: Randy Dunlap <rdunlap@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Anders Roxell <anders.roxell@linaro.org>
    Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>
    Cc: Qian Cai <cai@lca.pw>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Link: http://lkml.kernel.org/r/20200417185049.275845-5-mike.kravetz@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    mjkravetz authored and torvalds committed Jun 4, 2020
    282f421
  74. hugetlbfs: fix changes to command line processing

    Previously, a check for hugepages_supported was added before processing
    hugetlb command line parameters.  On some architectures such as powerpc,
    hugepages_supported() is not set to true until after command line
    processing.  Therefore, no hugetlb command line parameters would be
    accepted.
    
    Remove the additional checks for hugepages_supported.  In hugetlb_init,
    print a warning if !hugepages_supported and command line parameters were
    specified.
    
    Reported-by: Sandipan Das <sandipan.osd@gmail.com>
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Stephen Rothwell <sfr@canb.auug.org.au>
    Link: http://lkml.kernel.org/r/b1f04f9f-fa46-c2a0-7693-4a0679d2a1ee@oracle.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    mjkravetz authored and torvalds committed Jun 4, 2020
    c2833a5
  75. mm/hugetlb: avoid unnecessary check on pud and pmd entry in huge_pte_offset
    
    When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE
    or PMD_SIZE.  If sz is PUD_SIZE and the code can reach the pud, then
    *pud must be none, a normal hugetlb entry, or a non-present (migration
    or hwpoisoned) hugetlb entry, and we can directly return pud.  When sz
    is PMD_SIZE, the pud must be none or present, and if the code can reach
    the pmd, we can directly return pmd.
    
    So after this patch the code is simplified by checking the parameter sz
    first, avoiding the unnecessary checks in the current code.  The
    semantics of the existing code are maintained.
    
    More details about relevant commits:
    commit 9b19df2 ("mm/hugetlb.c: make huge_pte_offset() consistent
    and document behaviour") changed the code path for pud and pmd handling,
    see comments about why this patch intends to change it.
    ...
    	pud = pud_offset(p4d, addr);
    	if (sz != PUD_SIZE && pud_none(*pud)) // [1]
    		return NULL;
    	/* hugepage or swap? */
    	if (pud_huge(*pud) || !pud_present(*pud)) // [2]
    		return (pte_t *)pud;
    
    	pmd = pmd_offset(pud, addr);
    	if (sz != PMD_SIZE && pmd_none(*pmd)) // [3]
    		return NULL;
    	/* hugepage or swap? */
    	if (pmd_huge(*pmd) || !pmd_present(*pmd)) // [4]
    		return (pte_t *)pmd;
    
    	return NULL; // [5]
    ...
    [1]: this is necessary, return NULL for sz == PMD_SIZE;
    [2]: if sz == PUD_SIZE, all valid values of pud entry will cause return;
    [3]: dead code, sz != PMD_SIZE never true;
    [4]: all valid values of pmd entry will cause return;
    [5]: dead code, because of check in [4].
    
    Now, this patch combines [1] and [2] for pud, and combines [3], [4] and
    [5] for pmd, so avoid unnecessary checks.
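    
    A hedged sketch of the combined flow (bracketed comments refer to the
    cases above):
    
    	pud = pud_offset(p4d, addr);
    	if (sz == PUD_SIZE)
    		/* [1]+[2]: none, huge or non-present all return pud */
    		return (pte_t *)pud;
    	if (!pud_present(*pud))
    		return NULL;
    
    	pmd = pmd_offset(pud, addr);	/* sz == PMD_SIZE */
    	/* [3]+[4]+[5]: none, huge or non-present all return pmd */
    	return (pte_t *)pmd;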
    
    I don't try to catch any invalid values in the page table entry, as
    those will be checked by the caller; this also avoids an extra branch in
    this function.  There is likewise no assert that sz must equal PUD_SIZE
    or PMD_SIZE, since this function is only called for hugetlb mappings.
    
    For commit 3c1d7e6 ("mm/hugetlb: fix a addressing exception caused by
    huge_pte_offset"), since we don't read the entry more than once now, the
    variables pud_entry and pmd_entry are not needed.
    
    Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Jason Gunthorpe <jgg@mellanox.com>
    Cc: Punit Agrawal <punit.agrawal@arm.com>
    Cc: Longpeng <longpeng2@huawei.com>
    Link: http://lkml.kernel.org/r/1587794313-16849-1-git-send-email-lixinhai.lxh@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Li Xinhai authored and torvalds committed Jun 4, 2020
    8ac0b81
  76. arm64/mm: drop __HAVE_ARCH_HUGE_PTEP_GET

    Patch series "mm/hugetlb: Add some new generic fallbacks", v3.
    
    This series adds the following new generic fallbacks.  Before that it
    drops __HAVE_ARCH_HUGE_PTEP_GET from arm64 platform.
    
    1. is_hugepage_only_range()
    2. arch_clear_hugepage_flags()
    
    After this, arm (32 bit) remains the sole platform defining its own
    huge_ptep_get() via __HAVE_ARCH_HUGE_PTEP_GET.
    
    This patch (of 3):
    
    A platform specific huge_ptep_get() is required only when fetching the
    huge PTE involves more than just dereferencing the page table pointer.
    This is not the case on the arm64 platform.  Hence huge_ptep_get() can
    be dropped along with its __HAVE_ARCH_HUGE_PTEP_GET subscription.
    Before that, it updates the generic huge_ptep_get() with READ_ONCE(),
    which will prevent known page table issues with THP on arm64.
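    
    The updated generic fallback looks roughly like this (sketch):
    
    	#ifndef __HAVE_ARCH_HUGE_PTEP_GET
    	static inline pte_t huge_ptep_get(pte_t *ptep)
    	{
    		/* READ_ONCE() prevents torn reads racing with THP splits */
    		return READ_ONCE(*ptep);
    	}
    	#endif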
    
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Rich Felker <dalias@libc.org>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Link: http://lkml.kernel.org/r/1588907271-11920-1-git-send-email-anshuman.khandual@arm.com
    Link: http://lkml.kernel.org/r//1506527369-19535-1-git-send-email-will.deacon@arm.com/
    Link: http://lkml.kernel.org/r/1588907271-11920-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Anshuman Khandual authored and torvalds committed Jun 4, 2020
    be51e3f
  77. mm/hugetlb: define a generic fallback for is_hugepage_only_range()

    There are multiple similar definitions for is_hugepage_only_range() on
    various platforms.  Let's just add a generic fallback definition for
    platforms that do not override it.  This helps reduce code duplication.
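    
    A sketch of the generic fallback, guarded so platforms can still
    override it:
    
    	#ifndef is_hugepage_only_range
    	static inline int is_hugepage_only_range(struct mm_struct *mm,
    						 unsigned long addr,
    						 unsigned long len)
    	{
    		return 0;	/* most platforms: no hugepage-only regions */
    	}
    	#define is_hugepage_only_range is_hugepage_only_range
    	#endif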
    
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Cc: Rich Felker <dalias@libc.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Link: http://lkml.kernel.org/r/1588907271-11920-3-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Anshuman Khandual authored and torvalds committed Jun 4, 2020
    b0eae98
  78. mm/hugetlb: define a generic fallback for arch_clear_hugepage_flags()

    There are multiple similar definitions for arch_clear_hugepage_flags()
    on various platforms.  Let's just add a generic fallback definition for
    platforms that do not override it.  This helps reduce code duplication.
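    
    Likewise, a sketch of the fallback, a no-op for platforms with no
    hugepage-related flags to clear:
    
    	#ifndef arch_clear_hugepage_flags
    	static inline void arch_clear_hugepage_flags(struct page *page) { }
    	#define arch_clear_hugepage_flags arch_clear_hugepage_flags
    	#endif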
    
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: Fenghua Yu <fenghua.yu@intel.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com>
    Cc: Helge Deller <deller@gmx.de>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
    Cc: Vasily Gorbik <gor@linux.ibm.com>
    Cc: Christian Borntraeger <borntraeger@de.ibm.com>
    Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
    Cc: Rich Felker <dalias@libc.org>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Link: http://lkml.kernel.org/r/1588907271-11920-4-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Anshuman Khandual authored and torvalds committed Jun 4, 2020
    5be9934
  79. mm: simplify calling a compound page destructor

    None of the three callers of get_compound_page_dtor() want to know the
    value; they just want to call the function.  Replace it with
    destroy_compound_page() which calls the dtor for them.
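    
    A sketch of the replacement helper (the dtor table lookup mirrors what
    get_compound_page_dtor() did):
    
    	static inline void destroy_compound_page(struct page *page)
    	{
    		VM_BUG_ON_PAGE(page[1].compound_dtor >= NR_COMPOUND_DTORS, page);
    		compound_page_dtors[page[1].compound_dtor](page);
    	}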
    
    Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Reviewed-by: David Hildenbrand <david@redhat.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Link: http://lkml.kernel.org/r/20200517105051.9352-1-willy@infradead.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Matthew Wilcox (Oracle) authored and torvalds committed Jun 4, 2020
    ff45fc3
  80. mm/vmscan.c: use update_lru_size() in update_lru_sizes()

    We already defined the helper update_lru_size().
    
    Let's use this to reduce code duplication.
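    
    The deduplicated loop looks roughly like this (sketch):
    
    	static __always_inline void update_lru_sizes(struct lruvec *lruvec,
    			enum lru_list lru, unsigned long *nr_zone_taken)
    	{
    		int zid;
    
    		for (zid = 0; zid < MAX_NR_ZONES; zid++) {
    			if (!nr_zone_taken[zid])
    				continue;
    			update_lru_size(lruvec, lru, zid, -nr_zone_taken[zid]);
    		}
    	}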
    
    Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Baoquan He <bhe@redhat.com>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Link: http://lkml.kernel.org/r/20200331221550.1011-1-richard.weiyang@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    RichardWeiYang authored and torvalds committed Jun 4, 2020
    a892cb6
  81. mm/vmscan: count lazyfree pages and fix nr_isolated_* mismatch

    Fix an nr_isolate_* mismatch problem between cma and dirty lazyfree pages.
    
    If try_to_unmap_one is used for reclaim and it detects a dirty lazyfree
    page, then the lazyfree page is changed to a normal anon page having
    SwapBacked by commit 802a3a9 ("mm: reclaim MADV_FREE pages").  Even
    with the change, reclaim context correctly counts isolated files because
    it uses is_file_lru to distinguish files.  And the change to anon does
    not happen if try_to_unmap_one is used for migration.  So migration
    context like compaction also correctly counts isolated files even though
    it uses page_is_file_lru instead of is_file_lru.  Recently
    page_is_file_cache was renamed to page_is_file_lru by commit 9de4f22
    ("mm: code cleanup for MADV_FREE").
    
    But the nr_isolate_* mismatch problem happens on cma alloc.  There is
    reclaim_clean_pages_from_list which is used only by cma.  It was
    introduced by commit 02c6de8 ("mm: cma: discard clean pages during
    contiguous allocation instead of migration") to reclaim clean file pages
    without migration.  The cma alloc uses both reclaim_clean_pages_from_list
    and migrate_pages, and it uses page_is_file_lru to count isolated files.
    If there are dirty lazyfree pages allocated from the cma memory region,
    the pages are counted as isolated file at the beginning but are counted
    as isolated anon after finishing.
    
    Mem-Info:
    Node 0 active_anon:3045904kB inactive_anon:611448kB active_file:14892kB inactive_file:205636kB unevictable:10416kB isolated(anon):0kB isolated(file):37664kB mapped:630216kB dirty:384kB writeback:0kB shmem:42576kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no
    
    As in the log above, there were too many isolated files, 37664kB, which
    triggers too_many_isolated in reclaim even when there is actually no
    isolated file system-wide.  It can be reproduced by running two
    programs, one writing to MADV_FREE pages and one doing cma alloc,
    respectively.  Although isolated anon is 0, I found that the internal
    value of isolated anon was the negative of the isolated file count.
    
    Fix this by compensating the isolated count for both LRU lists.  Count
    non-discarded lazyfree pages in shrink_page_list, then compensate the
    counted number in reclaim_clean_pages_from_list.
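    
    A hedged sketch of the compensation in reclaim_clean_pages_from_list()
    (the stat field name is assumed from the description above):
    
    	/* lazyfree pages were isolated as file but rotate back as anon */
    	mod_node_page_state(pgdat, NR_ISOLATED_ANON,
    			    stat.nr_lazyfree_fail);
    	mod_node_page_state(pgdat, NR_ISOLATED_FILE,
    			    -(long)stat.nr_lazyfree_fail);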
    
    Reported-by: Yong-Taek Lee <ytk.lee@samsung.com>
    Suggested-by: Minchan Kim <minchan@kernel.org>
    Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Johannes Weiner <hannes@cmpxchg.org>
    Cc: Marek Szyprowski <m.szyprowski@samsung.com>
    Cc: Michal Nazarewicz <mina86@mina86.com>
    Cc: Shaohua Li <shli@fb.com>
    Link: http://lkml.kernel.org/r/20200426011718.30246-1-jaewon31.kim@samsung.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Jaewon31Kim authored and torvalds committed Jun 4, 2020
    1f318a9
  82. mm/vmscan.c: change prototype for shrink_page_list

    commit 3c710c1 ("mm, vmscan extract shrink_page_list reclaim counters
    into a struct") changed the data type used by the function, so change
    the return type of the function and its caller accordingly.
    
    Signed-off-by: Vaneet Narang <v.narang@samsung.com>
    Signed-off-by: Maninder Singh <maninder1.s@samsung.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Amit Sahrawat <a.sahrawat@samsung.com>
    Cc: Mel Gorman <mgorman@suse.de>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Link: http://lkml.kernel.org/r/1588168259-25604-1-git-send-email-maninder1.s@samsung.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    maninder42 authored and torvalds committed Jun 4, 2020
    730ec8c
  83. mm/vmscan: update the comment of should_continue_reclaim()

    try_to_compact_zone() has been replaced by try_to_compact_pages(), so
    the comment of should_continue_reclaim() needs to be updated
    accordingly.
    
    Signed-off-by: Qiwu Chen <chenqiwu@xiaomi.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Link: http://lkml.kernel.org/r/20200501034907.22991-1-chenqiwu@xiaomi.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Qiwu Chen authored and torvalds committed Jun 4, 2020
    df3a45f
  84. mm: fix NUMA node file count error in replace_page_cache()

    Patch series "mm: memcontrol: charge swapin pages on instantiation", v2.
    
    This patch series reworks memcg to charge swapin pages directly at
    swapin time, rather than at fault time, which may be much later, or
    not happen at all.
    
    Changes in version 2:
    - prevent double charges on pre-allocated hugepages in khugepaged
    - leave shmem swapcache when charging fails to avoid double IO (Joonsoo)
    - fix temporary accounting bug by switching rmap<->commit (Joonsoo)
    - fix double swap charge bug in cgroup1/cgroup2 code gating
    - simplify swapin error checking (Joonsoo)
    - mm: memcontrol: document the new swap control behavior (Alex)
    - review tags
    
    The delayed swapin charging scheme we have right now causes problems:
    
    - Alex's per-cgroup lru_lock patches rely on pages that have been
      isolated from the LRU to have a stable page->mem_cgroup; otherwise
      the lock may change underneath him. Swapcache pages are charged only
      after they are added to the LRU, and charging doesn't follow the LRU
      isolation protocol.
    
    - Joonsoo's anon workingset patches need a suitable LRU at the time
      the page enters the swap cache and displaces the non-resident
      info. But the correct LRU is only available after charging.
    
    - It's a containment hole / DoS vector. Users can trigger arbitrarily
      large swap readahead using MADV_WILLNEED. The memory is never
      charged unless somebody actually touches it.
    
    - It complicates the page->mem_cgroup stabilization rules
    
    In order to charge pages directly at swapin time, the memcg code base
    needs to be prepared, and several overdue cleanups become a necessity:
    
    To charge pages at swapin time, we need to always have cgroup
    ownership tracking of swap records. We also cannot rely on
    page->mapping to tell apart page types at charge time, because that's
    only set up during a page fault.
    
    To eliminate the page->mapping dependency, memcg needs to ditch its
    private page type counters (MEMCG_CACHE, MEMCG_RSS, NR_SHMEM) in favor
    of the generic vmstat counters and accounting sites, such as
    NR_FILE_PAGES, NR_ANON_MAPPED etc.
    
    To switch to generic vmstat counters, the charge sequence must be
    adjusted such that page->mem_cgroup is set up by the time these
    counters are modified.
    
    The series is structured as follows:
    
    1. Bug fixes
    2. Decoupling charging from rmap
    3. Swap controller integration into memcg
    4. Direct swapin charging
    
    This patch (of 19):
    
    When replacing one page with another one in the cache, we have to decrease
    the file count of the old page's NUMA node and increase the one of the new
    NUMA node, otherwise the old node leaks the count and the new node
    eventually underflows its counter.
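    
    A sketch of the balanced accounting (simplified; mainline also exempts
    hugetlb pages, which do not participate in page cache accounting):
    
    	if (!PageHuge(old))
    		__dec_node_page_state(old, NR_FILE_PAGES);
    	if (!PageHuge(new))
    		__inc_node_page_state(new, NR_FILE_PAGES);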
    
    Fixes: 74d6095 ("page cache: Add and replace pages using the XArray")
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Reviewed-by: Balbir Singh <bsingharora@gmail.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Roman Gushchin <guro@fb.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-1-hannes@cmpxchg.org
    Link: http://lkml.kernel.org/r/20200508183105.225460-2-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    f4129ea
  85. mm: memcontrol: fix stat-corrupting race in charge moving

    The move_lock is a per-memcg lock, but the VM accounting code that needs
    to acquire it comes from the page and follows page->mem_cgroup under RCU
    protection.  That means that the page becomes unlocked not when we drop
    the move_lock, but when we update page->mem_cgroup.  And that assignment
    doesn't imply any memory ordering.  If that pointer write gets reordered
    against the reads of the page state - page_mapped, PageDirty, etc. - the
    state may change while we rely on it being stable, and we can end up
    corrupting the counters.
    
    Place an SMP memory barrier to make sure we're done with all page state by
    the time the new page->mem_cgroup becomes visible.
    
    Also replace the open-coded move_lock with a lock_page_memcg() to make it
    more obvious what we're serializing against.
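    
    A hedged sketch of the ordering inside the move (locking elided):
    
    	/* all reads of page state (page_mapped, PageDirty, ...) go above */
    	smp_mb();	/* order the state reads before the pointer write */
    	page->mem_cgroup = to;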
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-3-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    abb242f
  86. mm: memcontrol: drop @compound parameter from memcg charging API

    The memcg charging API carries a boolean @compound parameter that tells
    whether the page we're dealing with is a hugepage.
    mem_cgroup_commit_charge() has another boolean @lrucare that indicates
    whether the page needs LRU locking or not while charging.  The majority of
    callsites know those parameters at compile time, which results in a lot of
    naked "false, false" argument lists.  This makes for cryptic code and is a
    breeding ground for subtle mistakes.
    
    Thankfully, the huge page state can be inferred from the page itself and
    doesn't need to be passed along.  This is safe because charging
    completes before the page is published and before anybody could split
    it.
    
    Simplify the callsites by removing @compound, and let memcg infer the
    state by using hpage_nr_pages() unconditionally.  That function does
    PageTransHuge() to identify huge pages, which also helpfully asserts that
    nobody passes in tail pages by accident.
    
    The following patches will introduce a new charging API, best not to carry
    over unnecessary weight.
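    
    The call-site effect, roughly (signatures as they stand at this point in
    the series; illustrative):
    
    	/* before: what do "false, false" mean again? */
    	mem_cgroup_try_charge(page, mm, gfp_mask, &memcg, false);
    	/* after: the page itself tells memcg whether it is huge */
    	mem_cgroup_try_charge(page, mm, gfp_mask, &memcg);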
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-4-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    3fba69a
  87. mm: shmem: remove rare optimization when swapin races with hole punching

    Commit 215c02b ("tmpfs: fix shmem_getpage_gfp() VM_BUG_ON")
    recognized that hole punching can race with swapin and removed the
    BUG_ON() for a truncated entry from the swapin path.
    
    The patch also added a swapcache deletion to optimize this rare case:
    Since swapin has the page locked, and free_swap_and_cache() merely
    trylocks, this situation can leave the page stranded in swapcache.
    Usually, page reclaim picks up stale swapcache pages, and the race can
    happen at any other time when the page is locked.  (The same happens for
    non-shmem swapin racing with page table zapping.) The thinking here was:
    we already observed the race and we have the page locked, we may as well
    do the cleanup instead of waiting for reclaim.
    
    However, this optimization complicates the next patch which moves the
    cgroup charging code around.  As this is just a minor speedup for a race
    condition that is so rare that it required a fuzzer to trigger the
    original BUG_ON(), it's no longer worth the complications.
    
    Suggested-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Hugh Dickins <hughd@google.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200511181056.GA339505@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    14235ab
  88. mm: memcontrol: move out cgroup swaprate throttling

    The cgroup swaprate throttling is about matching new anon allocations to
    the rate of available IO when that is being throttled.  It's the io
    controller hooking into the VM, rather than a memory controller thing.
    
    Rename mem_cgroup_throttle_swaprate() to cgroup_throttle_swaprate(), and
    drop the @memcg argument which is only used to check whether the preceding
    page charge has succeeded and the fault is proceeding.
    
    We could decouple the call from mem_cgroup_try_charge() here as well, but
    that would cause unnecessary churn: the following patches convert all
    callsites to a new charge API and we'll decouple as we go along.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-5-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    6caa6a0
  89. mm: memcontrol: convert page cache to a new mem_cgroup_charge() API

    The try/commit/cancel protocol that memcg uses dates back to when pages
    used to be uncharged upon removal from the page cache, and thus couldn't
    be committed before the insertion had succeeded.  Nowadays, pages are
    uncharged when they are physically freed; it doesn't matter whether the
    insertion was successful or not.  For the page cache, the transaction
    dance has become unnecessary.
    
    Introduce a mem_cgroup_charge() function that simply charges a newly
    allocated page to a cgroup and sets up page->mem_cgroup in one single
    step.  If the insertion fails, the caller doesn't have to do anything but
    free/put the page.
    
    Then switch the page cache over to this new API.
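    
    A sketch of the resulting caller pattern (the error label is
    illustrative and the exact signature at this point in the series is
    simplified here):
    
    	error = mem_cgroup_charge(page, current->mm, gfp);
    	if (error)
    		goto out_put;	/* nothing to cancel; just put the page */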
    
    Subsequent patches will also convert anon pages, but it needs a bit more
    prep work.  Right now, memcg depends on page->mapping being already set up
    at the time of charging, so that it can maintain its own MEMCG_CACHE and
    MEMCG_RSS counters.  For anon, page->mapping is set under the same pte
    lock under which the page is published, so a single charge point that can
    block doesn't work there just yet.
    
    The following prep patches will replace the private memcg counters with
    the generic vmstat counters, thus removing the page->mapping dependency,
    then complete the transition to the new single-point charge API and delete
    the old transactional scheme.
    
    v2: leave shmem swapcache when charging fails to avoid double IO (Joonsoo)
    v3: rebase on the preceding shmem simplification patch
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-6-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    3fea5a4
  90. mm: memcontrol: prepare uncharging for removal of private page type counters
    
    The uncharge batching code adds up the anon, file, kmem counts to
    determine the total number of pages to uncharge and references to drop.
    But the next patches will remove the anon and file counters.
    
    Maintain an aggregate nr_pages in the uncharge_gather struct.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-7-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    9f762db
  91. mm: memcontrol: prepare move_account for removal of private page type counters
    
    When memcg uses the generic vmstat counters, it doesn't need to do
    anything at charging and uncharging time.  It does, however, need to
    migrate counts when pages move to a different cgroup in move_account.
    
    Prepare the move_account function for the arrival of NR_FILE_PAGES,
    NR_ANON_MAPPED, NR_ANON_THPS etc. by having a branch for files and a
    branch for anon, which can then be divided into sub-branches.
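    
    A sketch of the branch structure being prepared (the counter moves are
    abbreviated to comments):
    
    	if (!PageAnon(page)) {
    		if (page_mapped(page)) {
    			/* move NR_FILE_MAPPED from -> to */
    		}
    		if (PageDirty(page)) {
    			/* move NR_FILE_DIRTY from -> to */
    		}
    	} else {
    		/* anon sub-branches arrive with NR_ANON_MAPPED etc. */
    	}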
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-8-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    49e50d2
  92. mm: memcontrol: prepare cgroup vmstat infrastructure for native anon counters
    
    Anonymous compound pages can be mapped by ptes, which means that if we
    want to track NR_ANON_MAPPED, NR_ANON_THPS on a per-cgroup basis, we have
    to be prepared to see tail pages in our accounting functions.
    
    Make mod_lruvec_page_state() and lock_page_memcg() deal with tail pages
    correctly, namely by redirecting to the head page which has the
    page->mem_cgroup set up.
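    
    A sketch of the tail-page redirect (body abridged):
    
    	struct mem_cgroup *lock_page_memcg(struct page *page)
    	{
    		struct page *head = compound_head(page); /* rmap on tail pages */
    		struct mem_cgroup *memcg = head->mem_cgroup;
    		/* ... take memcg->move_lock against head, as before ... */
    		return memcg;
    	}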
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-9-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020
    9da7b52
  93. mm: memcontrol: switch to native NR_FILE_PAGES and NR_SHMEM counters

    Memcg maintains private MEMCG_CACHE and NR_SHMEM counters.  This
    divergence from the generic VM accounting means unnecessary code overhead,
    and creates a dependency for memcg that page->mapping is set up at the
    time of charging, so that page types can be told apart.
    
    Convert the generic accounting sites to mod_lruvec_page_state and friends
    to maintain the per-cgroup vmstat counters of NR_FILE_PAGES and NR_SHMEM.
    The page is already locked in these places, so page->mem_cgroup is stable;
    we only need minimal tweaks of two mem_cgroup_migrate() calls to ensure
    it's set up in time.
    
    Then replace MEMCG_CACHE with NR_FILE_PAGES and delete the private
    NR_SHMEM accounting sites.
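
    A converted accounting site then looks schematically like this (the
    call site is chosen for illustration, not quoted from the diff):

            /* page cache insertion, after the page has been charged */
            __inc_lruvec_page_state(page, NR_FILE_PAGES);
            if (PageSwapBacked(page))
                    __inc_lruvec_page_state(page, NR_SHMEM);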
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-10-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 0d1c207)
  94. mm: memcontrol: switch to native NR_ANON_MAPPED counter

    Memcg maintains a private MEMCG_RSS counter.  This divergence from the
    generic VM accounting means unnecessary code overhead, and creates a
    dependency for memcg that page->mapping is set up at the time of charging,
    so that page types can be told apart.
    
    Convert the generic accounting sites to mod_lruvec_page_state and friends
    to maintain the per-cgroup vmstat counter of NR_ANON_MAPPED.  We use
    lock_page_memcg() to stabilize page->mem_cgroup during rmap changes, the
    same way we do for NR_FILE_MAPPED.
    
    With the previous patch removing MEMCG_CACHE and the private NR_SHMEM
    counter, this patch finally eliminates the need to have page->mapping set
    up at charge time.  However, we need to have page->mem_cgroup set up by
    the time rmap runs and does the accounting, so switch the commit and the
    rmap callbacks around.
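
    Schematically, the rmap side then carries the accounting (a sketch
    based on the 5.8-era rmap code, not the verbatim diff):

            /* page_add_new_anon_rmap(), sketch */
            int nr = compound ? hpage_nr_pages(page) : 1;

            /* page->mem_cgroup was set up by mem_cgroup_charge() earlier */
            __mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);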
    
    v2: fix temporary accounting bug by switching rmap<->commit (Joonsoo)
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-11-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit be5d0a7)
  95. mm: memcontrol: switch to native NR_ANON_THPS counter

    With rmap memcg locking already in place for NR_ANON_MAPPED, it's just a
    small step to remove the MEMCG_RSS_HUGE wart and switch memcg to the
    native NR_ANON_THPS accounting sites.
    
    [hannes@cmpxchg.org: fixes]
      Link: http://lkml.kernel.org/r/20200512121750.GA397968@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Acked-by: Randy Dunlap <rdunlap@infradead.org>	[build-tested]
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-12-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 468c398)
  96. mm: memcontrol: convert anon and file-thp to new mem_cgroup_charge() API

    With the page->mapping requirement gone from memcg, we can charge anon and
    file-thp pages in one single step, right after they're allocated.
    
    This removes two out of three API calls - especially the tricky commit
    step that needed to happen at just the right time between when the page is
    "set up" and when it's "published" - somewhat vague and fluid concepts
    that varied by page type.  All we need is a freshly allocated page and a
    memcg context to charge.
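
    Usage then collapses to a single call; a sketch of an anon fault path
    (flags and labels are illustrative assumptions):

            page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
            if (!page)
                    goto oom;
            if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL))
                    goto oom_free_page;
            /* set the page up and map it - no separate commit step */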
    
    v2: prevent double charges on pre-allocated hugepages in khugepaged
    
    [hannes@cmpxchg.org: Fix crash - *hpage could be ERR_PTR instead of NULL]
      Link: http://lkml.kernel.org/r/20200512215813.GA487759@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Cc: Qian Cai <cai@lca.pw>
    Link: http://lkml.kernel.org/r/20200508183105.225460-13-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 9d82c69)
  97. mm: memcontrol: drop unused try/commit/cancel charge API

    There are no more users. RIP in peace.
    
    [arnd@arndb.de: fix an unused-function warning]
      Link: http://lkml.kernel.org/r/20200528095640.151454-1-arnd@arndb.de
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Arnd Bergmann <arnd@arndb.de>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-14-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit f0e45fb)
  98. mm: memcontrol: prepare swap controller setup for integration

    A few cleanups to streamline the swap controller setup:
    
    - Replace the do_swap_account flag with cgroup_memory_noswap. This
      brings it in line with other functionality that is usually available
      unless explicitly opted out of - nosocket, nokmem.
    
    - Remove the really_do_swap_account flag that stores the boot option
      and is later used to switch the do_swap_account. It's not clear why
      this indirection is/was necessary. Use do_swap_account directly.
    
    - Minor coding style polishing
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-15-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit eccb52e)
  99. mm: memcontrol: make swap tracking an integral part of memory control

    Without swap page tracking, users that are otherwise memory controlled can
    easily escape their containment and allocate significant amounts of memory
    that they're not being charged for.  That's because swap does readahead,
    but without the cgroup records of who owned the page at swapout, readahead
    pages don't get charged until somebody actually faults them into their
    page table and we can identify an owner task.  This can be maliciously
    exploited with MADV_WILLNEED, which triggers arbitrary readahead
    allocations without charging the pages.
    
    Make swap page tracking an integral part of memcg and remove the
    Kconfig options.  In the first place, it was only made configurable to
    allow users to save some memory.  But the overhead of tracking cgroup
    ownership per swap page is minimal - 2 bytes per page, or 512k per 1G of
    swap, or 0.04%.  Saving that at the expense of broken containment
    semantics is not something we should present as a coequal option.
    
    The swapaccount=0 boot option will continue to exist, and it will
    eliminate the page_counter overhead and hide the swap control files, but
    it won't disable swap slot ownership tracking.
    
    This patch makes sure we always have the cgroup records at swapin time;
    the next patch will fix the actual bug by charging readahead swap pages at
    swapin time rather than at fault time.
    
    v2: fix double swap charge bug in cgroup1/cgroup2 code gating
    
    [hannes@cmpxchg.org: fix crash with cgroup_disable=memory]
      Link: http://lkml.kernel.org/r/20200521215855.GB815153@cmpxchg.org
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Cc: Naresh Kamboju <naresh.kamboju@linaro.org>
    Link: http://lkml.kernel.org/r/20200508183105.225460-16-hannes@cmpxchg.org
    Debugged-by: Hugh Dickins <hughd@google.com>
    Debugged-by: Michal Hocko <mhocko@kernel.org>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 2d1c498)
  100. mm: memcontrol: charge swapin pages on instantiation

    Right now, users that are otherwise memory controlled can easily escape
    their containment and allocate significant amounts of memory that they're
    not being charged for.  That's because swap readahead pages are not being
    charged until somebody actually faults them into their page table.  This
    can be exploited with MADV_WILLNEED, which triggers arbitrary readahead
    allocations without charging the pages.
    
    There are additional problems with the delayed charging of swap pages:
    
    1. To implement refault/workingset detection for anonymous pages, we
       need to have a target LRU available at swapin time, but the LRU is not
       determinable until the page has been charged.
    
    2. To implement per-cgroup LRU locking, we need page->mem_cgroup to be
       stable when the page is isolated from the LRU; otherwise, the locks
       change under us.  But swapcache gets charged after it's already on the
    LRU, and even then we cannot isolate it ourselves (since charging is
    not exactly optional).
    
    The previous patch ensured we always maintain cgroup ownership records for
    swap pages.  This patch moves the swapcache charging point from the fault
    handler to swapin time to fix all of the above problems.
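
    Schematically (a sketch of the swapin path; the error labels are
    assumptions):

            /* __read_swap_cache_async(), after this change */
            page = alloc_page_vma(gfp_mask, vma, addr);
            if (!page)
                    goto fail;
            if (mem_cgroup_charge(page, NULL, gfp_mask))
                    goto fail_unlock;
            /* add to swap cache and LRU; the fault finds a charged page */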
    
    v2: simplify swapin error checking (Joonsoo)
    
    [hughd@google.com: fix livelock in __read_swap_cache_async()]
      Link: http://lkml.kernel.org/r/alpine.LSU.2.11.2005212246080.8458@eggly.anvils
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Hugh Dickins <hughd@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Cc: Rafael Aquini <aquini@redhat.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-17-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 4c6355b)
  101. mm: memcontrol: document the new swap control behavior

    Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-18-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    alexshi authored and torvalds committed Jun 4, 2020 (commit 0a27cae)
  102. mm: memcontrol: delete unused lrucare handling

    Swapin faults were the last event to charge pages after they had already
    been put on the LRU list.  Now that we charge directly on swapin, the
    lrucare portion of the charge code is unused.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Alex Shi <alex.shi@linux.alibaba.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-19-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit d9eb1ea)
  103. mm: memcontrol: update page->mem_cgroup stability rules

    The previous patches have simplified the access rules around
    page->mem_cgroup somewhat:
    
    1. We never change page->mem_cgroup while the page is isolated by
       somebody else.  This was by far the biggest exception to our rules and
       it didn't stop at lock_page() or lock_page_memcg().
    
    2. We charge pages before they get put into page tables now, so the
       somewhat fishy rule about "can be in page table as long as it's still
       locked" is now gone and boiled down to having an exclusive reference to
       the page.
    
    Document the new rules.  Any of the following will stabilize the
    page->mem_cgroup association:
    
    - the page lock
    - LRU isolation
    - lock_page_memcg()
    - exclusive access to the page
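
    For illustration, any one of the following pins the association for
    the caller (a sketch, not an exhaustive API listing):

            lock_page(page);                /* the page lock */
            isolate_lru_page(page);         /* LRU isolation */
            memcg = lock_page_memcg(page);  /* the memcg move lock */
            /* ...or holding the only reference to the page */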
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
    Reviewed-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Roman Gushchin <guro@fb.com>
    Cc: Shakeel Butt <shakeelb@google.com>
    Cc: Balbir Singh <bsingharora@gmail.com>
    Link: http://lkml.kernel.org/r/20200508183105.225460-20-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit a0b5b41)
  104. mm: fix LRU balancing effect of new transparent huge pages

    The reclaim code that balances between swapping and cache reclaim tries to
    predict likely reuse based on in-memory reference patterns alone.  This
    works in many cases, but when it fails it cannot detect when the cache is
    thrashing pathologically, or when we're in the middle of a swap storm.
    
    The high seek cost of rotational drives under which the algorithm evolved
    also meant that mistakes could quickly result in lockups from too
    aggressive swapping (which is predominantly random IO).  As a result, the
    balancing code has been tuned over time to a point where it mostly goes
    for page cache and defers swapping until the VM is under significant
    memory pressure.
    
    The resulting strategy doesn't make optimal caching decisions - where
    optimal is the least amount of IO required to execute the workload.
    
    The proliferation of fast random IO devices such as SSDs, in-memory
    compression such as zswap, and persistent memory technologies on the
    horizon, has made this undesirable behavior very noticeable: Even in the
    presence of large amounts of cold anonymous memory and a capable swap
    device, the VM refuses to even seriously scan these pages, and can leave
    the page cache thrashing needlessly.
    
    This series sets out to address this.  Since commit a528910e12ec ("mm:
    thrash detection-based file cache sizing") we have exact tracking of
    refault IO - the ultimate cost of reclaiming the wrong pages.  This allows
    us to use an IO cost based balancing model that is more aggressive about
    scanning anonymous memory when the cache is thrashing, while being able to
    avoid unnecessary swap storms.
    
    These patches base the LRU balance on the rate of refaults on each list,
    times the relative IO cost between swap device and filesystem
    (swappiness), in order to optimize reclaim for least IO cost incurred.
    
    	History
    
    I floated these changes in 2016.  At the time they were incomplete and
    full of workarounds due to a lack of infrastructure in the reclaim code:
    We didn't have PageWorkingset, we didn't have hierarchical cgroup
    statistics, and there were problems with the cgroup swap controller.
    As swapping
    wasn't too high a priority then, the patches stalled out.  With all
    dependencies in place now, here we are again with much cleaner,
    feature-complete patches.
    
    I kept the acks for patches that stayed materially the same :-)
    
    Below is a series of test results that demonstrate certain problematic
    behavior of the current code, as well as showcase the new code's more
    predictable and appropriate balancing decisions.
    
    	Test #1: No convergence
    
    This test shows an edge case where the VM currently doesn't converge at
    all on a new file workingset with a stale anon/tmpfs set.
    
    The test sets up a cold anon set the size of 3/4 RAM, then tries to
    establish a new file set half the size of RAM (flat access pattern).
    
    The vanilla kernel refuses to even scan anon pages and never converges.
    The file set is perpetually served from the filesystem.
    
    The first test kernel is with the series up to the workingset patch
    applied.  This allows thrashing page cache to challenge the anonymous
    workingset.  The VM then scans the lists based on the current
    scanned/rotated balancing algorithm.  It converges on a stable state where
    all cold anon pages are pushed out and the fileset is served entirely from
    cache:
    
    			    noconverge/5.7-rc5-mm	noconverge/5.7-rc5-mm-workingset
    Scanned			417719308.00 (    +0.00%)		64091155.00 (   -84.66%)
    Reclaimed		417711094.00 (    +0.00%)		61640308.00 (   -85.24%)
    Reclaim efficiency %	      100.00 (    +0.00%)		      96.18 (    -3.78%)
    Scanned file		417719308.00 (    +0.00%)		59211118.00 (   -85.83%)
    Scanned anon			0.00 (    +0.00%)	         4880037.00 (          )
    Swapouts			0.00 (    +0.00%)	         2439957.00 (          )
    Swapins				0.00 (    +0.00%)		     257.00 (          )
    Refaults		415246605.00 (    +0.00%)		59183722.00 (   -85.75%)
    Restore refaults		0.00 (    +0.00%)	        54988252.00 (          )
    
    The second test kernel is with the full patch series applied, which
    replaces the scanned/rotated ratios with refault/swapin rate-based
    balancing.  It evicts the cold anon pages more aggressively in the
    presence of a thrashing cache and the absence of swapins, and so converges
    with about 60% of the IO and reclaim activity:
    
    			noconverge/5.7-rc5-mm-workingset	noconverge/5.7-rc5-mm-lrubalance
    Scanned				64091155.00 (    +0.00%)		37579741.00 (   -41.37%)
    Reclaimed			61640308.00 (    +0.00%)		35129293.00 (   -43.01%)
    Reclaim efficiency %		      96.18 (    +0.00%)		      93.48 (    -2.78%)
    Scanned file			59211118.00 (    +0.00%)		32708385.00 (   -44.76%)
    Scanned anon			 4880037.00 (    +0.00%)		 4871356.00 (    -0.18%)
    Swapouts			 2439957.00 (    +0.00%)		 2435565.00 (    -0.18%)
    Swapins				     257.00 (    +0.00%)		     262.00 (    +1.94%)
    Refaults			59183722.00 (    +0.00%)		32675667.00 (   -44.79%)
    Restore refaults		54988252.00 (    +0.00%)		28480430.00 (   -48.21%)
    
    We're triggering this case in host sideloading scenarios: When a host's
    primary workload is not saturating the machine (primary load is usually
    driven by user activity), we can optimistically sideload a batch job; if
    user activity picks up and the primary workload needs the whole host
    during this time, we freeze the sideload and rely on it getting pushed to
    swap.  Frequently that swapping doesn't happen and the completely inactive
    sideload simply stays resident while the expanding primary workload is
    struggling to gain ground.
    
    	Test #2: Kernel build
    
    This test is a kernel build that is slightly memory-restricted (make -j4
    inside a 400M cgroup).
    
    Despite the very aggressive swapping of cold anon pages in test #1, this
    test shows that the new kernel carefully balances swap against cache
    refaults when both the anon and the file set are pressured.
    
    It shows the patched kernel to be slightly better at finding the coldest
    memory from the combined anon and file set to evict under pressure.  The
    result is lower aggregate reclaim and paging activity:
    
    				    5.7-rc5-mm	5.7-rc5-mm-lrubalance
    Real time		   210.60 (    +0.00%)	   210.97 (    +0.18%)
    User time		   745.42 (    +0.00%)	   746.48 (    +0.14%)
    System time		    69.78 (    +0.00%)	    69.79 (    +0.02%)
    Scanned file		354682.00 (    +0.00%)	293661.00 (   -17.20%)
    Scanned anon		465381.00 (    +0.00%)	378144.00 (   -18.75%)
    Swapouts		185920.00 (    +0.00%)	147801.00 (   -20.50%)
    Swapins			 34583.00 (    +0.00%)	 32491.00 (    -6.05%)
    Refaults		212664.00 (    +0.00%)	172409.00 (   -18.93%)
    Restore refaults	 48861.00 (    +0.00%)	 80091.00 (   +63.91%)
    Total paging IO		433167.00 (    +0.00%)	352701.00 (   -18.58%)
    
    	Test #3: Overload
    
    This next test is not about performance, but rather about the
    predictability of the algorithm.  The current balancing behavior doesn't
    always lead to comprehensible results, which makes performance analysis
    and parameter tuning (swappiness e.g.) very difficult.
    
    The test shows the balancing behavior under equivalent anon and file
    input.  Anon and file sets are created of equal size (3/4 RAM), have the
    same access patterns (a hot-cold gradient), and synchronized access rates.
    Swappiness is raised from the default of 60 to 100 to indicate equal IO
    cost between swap and cache.
    
    With the vanilla balancing code, anon scans make up around 9% of the total
    pages scanned, or a ~1:10 ratio.  This is a surprisingly skewed ratio, and
    it's an outcome that is hard to explain given the input parameters to the
    VM.
    
    The new balancing model targets a 1:2 balance: All else being equal,
    reclaiming a file page costs one page IO - the refault; reclaiming an anon
    page costs two IOs - the swapout and the swapin.  In the test we observe a
    ~1:3 balance.
    
    The scanned and paging IO numbers indicate that the anon LRU algorithm we
    have in place right now does a slightly worse job at picking the coldest
    pages compared to the file algorithm.  There is ongoing work to improve
    this, like Joonsoo's anon workingset patches; however, it's difficult to
    compare the two aging strategies when the balancing between them is
    behaving unintuitively.
    
    The slightly less efficient anon reclaim results in a deviation from the
    optimal 1:2 scan ratio we would like to see here - however, 1:3 is much
    closer to what we'd want to see in this test than the vanilla kernel's
    aging of 10+ cache pages for every anonymous one:
    
    			overload-100/5.7-rc5-mm-workingset	overload-100/5.7-rc5-mm-lrubalance-realfile
    Scanned				 533633725.00 (    +0.00%)			  595687785.00 (   +11.63%)
    Reclaimed			 494325440.00 (    +0.00%)			  518154380.00 (    +4.82%)
    Reclaim efficiency %			92.63 (    +0.00%)				 86.98 (    -6.03%)
    Scanned file			 484532894.00 (    +0.00%)			  456937722.00 (    -5.70%)
    Scanned anon			  49100831.00 (    +0.00%)			  138750063.00 (  +182.58%)
    Swapouts			   8096423.00 (    +0.00%)			   48982142.00 (  +504.98%)
    Swapins				  10027384.00 (    +0.00%)			   62325044.00 (  +521.55%)
    Refaults			 479819973.00 (    +0.00%)			  451309483.00 (    -5.94%)
    Restore refaults		 426422087.00 (    +0.00%)			  399914067.00 (    -6.22%)
    Total paging IO			 497943780.00 (    +0.00%)			  562616669.00 (   +12.99%)
    
    	Test #4: Parallel IO
    
    It's important to note that these patches only affect the situation where
    the kernel has to reclaim workingset memory, which is usually a
    transitional period.  The vast majority of page reclaim occurring in a
    system is from trimming the ever-expanding page cache.
    
    These patches don't affect cache trimming behavior.  We never swap as long
    as we only have use-once cache moving through the file LRU; we only
    consider swapping when the cache is actively thrashing.
    
    The following test demonstrates this.  It has an anon workingset that
    takes up half of RAM and then writes a file that is twice the size of RAM
    out to disk.
    
    As the cache is funneled through the inactive file list, no anon pages are
    scanned (aside from apparently some background noise of 10 pages):
    
    					  5.7-rc5-mm		          5.7-rc5-mm-lrubalance
    Scanned			    10714722.00 (    +0.00%)		       10723445.00 (    +0.08%)
    Reclaimed		    10703596.00 (    +0.00%)		       10712166.00 (    +0.08%)
    Reclaim efficiency %		  99.90 (    +0.00%)			     99.89 (    -0.00%)
    Scanned file		    10714722.00 (    +0.00%)		       10723435.00 (    +0.08%)
    Scanned anon			   0.00 (    +0.00%)			     10.00 (          )
    Swapouts			   0.00 (    +0.00%)			      7.00 (          )
    Swapins				   0.00 (    +0.00%)			      0.00 (    +0.00%)
    Refaults			  92.00 (    +0.00%)			     41.00 (   -54.84%)
    Restore refaults		   0.00 (    +0.00%)			      0.00 (    +0.00%)
    Total paging IO			  92.00 (    +0.00%)			     48.00 (   -47.31%)
    
    This patch (of 14):
    
    Currently, THP are counted as single pages until they are split right
    before being swapped out.  However, at that point the VM is already in the
    middle of reclaim, and adjusting the LRU balance then is useless.
    
    Always account THP by the number of basepages, and remove the fixup from
    the splitting path.
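
    The LRU accounting sites then scale with compound size, roughly
    (sketch):

            /* e.g. when moving a page between LRU lists */
            int nr_pages = hpage_nr_pages(page);    /* 1 or HPAGE_PMD_NR */

            update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);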
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Rik van Riel <riel@surriel.com>
    Reviewed-by: Shakeel Butt <shakeelb@google.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-1-hannes@cmpxchg.org
    Link: http://lkml.kernel.org/r/20200520232525.798933-2-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 5df7419)
  105. mm: keep separate anon and file statistics on page reclaim activity

    Having statistics on pages scanned and pages reclaimed for both anon and
    file pages makes it easier to evaluate changes to LRU balancing.
    
    While at it, clean up the stat-keeping mess for isolation, putback,
    reclaim stats etc.  a bit: first the physical LRU operation (isolation and
    putback), followed by vmstats, reclaim_stats, and then vm events.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-3-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 497a6c1)
  106. mm: allow swappiness that prefers reclaiming anon over the file workingset
    
    With the advent of fast random IO devices (SSDs, PMEM) and in-memory swap
    devices such as zswap, it's possible for swap to be much faster than
    filesystems, and for swapping to be preferable over thrashing filesystem
    caches.
    
    Allow setting swappiness - which defines the rough relative IO cost of
    cache misses between page cache and swap-backed pages - to reflect such
    situations by making the swap-preferred range configurable.
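
    In effect (sketch; variable names follow the existing
    get_scan_count() code quoted earlier in this series):

            int anon_prio = swappiness;        /* now settable up to 200 */
            int file_prio = 200 - anon_prio;   /* 0 means swap is "free" */

    With swappiness=200, reclaim treats swapouts as costless relative to
    refaults and goes after anonymous memory first - the right bias for
    in-memory swap such as zswap.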
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-4-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit c843966)
  107. mm: fold and remove lru_cache_add_anon() and lru_cache_add_file()

    They're the same function, and for the purpose of all callers they are
    equivalent to lru_cache_add().
    
    [akpm@linux-foundation.org: fix it for local_lock changes]
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Rik van Riel <riel@surriel.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-5-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 6058eae)
  108. mm: workingset: let cache workingset challenge anon

    We activate cache refaults with reuse distances in pages smaller than the
    size of the total cache.  This allows new pages with competitive access
    frequencies to establish themselves, as well as challenge and potentially
    displace pages on the active list that have gone cold.
    
    However, that assumes that active cache can only replace other active
    cache in a competition for the hottest memory.  This is not a great
    default assumption.  The page cache might be thrashing while there are
    enough completely cold and unused anonymous pages sitting around that we'd
    only have to write to swap once to stop all IO from the cache.
    
    Activate cache refaults when their reuse distance in pages is smaller than
    the total userspace workingset, including anonymous pages.
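
    A sketch of the refault test (assuming the 5.8-era lruvec stat
    helpers; not the verbatim diff):

            /* workingset_refault(), sketch */
            workingset_size = lruvec_page_state(lruvec, NR_ACTIVE_FILE);
            if (mem_cgroup_get_nr_swap_pages(memcg) > 0) {
                    workingset_size += lruvec_page_state(lruvec, NR_INACTIVE_ANON);
                    workingset_size += lruvec_page_state(lruvec, NR_ACTIVE_ANON);
            }
            if (refault_distance <= workingset_size)
                    SetPageActive(page);    /* refault beats the workingset */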
    
    Reclaim can still decide how to balance pressure among the two LRUs
    depending on the IO situation.  Rotational drives will prefer avoiding
    random IO from swap and go harder after cache.  But fundamentally, hot
    cache should be able to compete with anon pages for a place in RAM.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-6-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 34e58ca)
  109. mm: remove use-once cache bias from LRU balancing

    When the splitlru patches divided page cache and swap-backed pages into
    separate LRU lists, the pressure balance between the lists was biased to
    account for the fact that streaming IO can cause memory pressure with a
    flood of pages that are used only once.  New page cache additions would
    tip the balance toward the file LRU, and repeat access would neutralize
    that bias again.  This ensured that page reclaim would always go for
    used-once cache first.
    
    Since e986850 ("mm,vmscan: only evict file pages when we have
    plenty"), page reclaim generally skips over swap-backed memory entirely as
    long as there is used-once cache present, and will apply the LRU balancing
    when only repeatedly accessed cache pages are left - at which point the
    previous use-once bias will have been neutralized.  This makes the
    use-once cache balancing bias unnecessary.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-7-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 9682468)
  110. mm: vmscan: drop unnecessary div0 avoidance rounding in get_scan_count()

    When we calculate the relative scan pressure between the anon and file LRU
    lists, we have to assume that reclaim_stat can contain zeroes.  To avoid
    div0 crashes, we add 1 to all denominators like so:
    
            anon_prio = swappiness;
            file_prio = 200 - anon_prio;
    
    	[...]
    
            /*
             * The amount of pressure on anon vs file pages is inversely
             * proportional to the fraction of recently scanned pages on
             * each list that were recently referenced and in active use.
             */
            ap = anon_prio * (reclaim_stat->recent_scanned[0] + 1);
            ap /= reclaim_stat->recent_rotated[0] + 1;
    
            fp = file_prio * (reclaim_stat->recent_scanned[1] + 1);
            fp /= reclaim_stat->recent_rotated[1] + 1;
            spin_unlock_irq(&pgdat->lru_lock);
    
            fraction[0] = ap;
            fraction[1] = fp;
            denominator = ap + fp + 1;
    
    While reclaim_stat can contain 0, it's not actually possible for ap + fp
    to be 0.  One of anon_prio or file_prio could be zero, but they must still
    add up to 200.  And the reclaim_stat fraction, due to the +1 in there, is
    always at least 1.  So if one of the two numerators is 0, the other one
    can't be.  ap + fp is always at least 1.  Drop the + 1.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-8-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit a4fe163)
  111. mm: base LRU balancing on an explicit cost model

    Currently, scan pressure between the anon and file LRU lists is balanced
    based on a mixture of reclaim efficiency and a somewhat vague notion of
    "value" of having certain pages in memory over others.  That concept of
    value is problematic, because it has caused us to count any event that
    remotely makes one LRU list more or less preferable for reclaim, even
    when these events are not directly comparable and impose very different
    costs on the system.  One example is referenced file pages that we still
    deactivate and referenced anonymous pages that we actually rotate back to
    the head of the list.
    
    There is also conceptual overlap with the LRU algorithm itself.  By
    rotating recently used pages instead of reclaiming them, the algorithm
    already biases the applied scan pressure based on page value.  Thus, when
    rebalancing scan pressure due to rotations, we should think of reclaim
    cost, and leave assessing the page value to the LRU algorithm.
    
    Lastly, considering both value-increasing as well as value-decreasing
    events can sometimes cause the same type of event to be counted twice,
    i.e.  how rotating a page increases the LRU value, while reclaiming it
    successfully decreases the value.  In itself this will balance out fine,
    but it quietly skews the impact of events that are only recorded once.
    
    The abstract metric of "value", the murky relationship with the LRU
    algorithm, and accounting both negative and positive events make the
    current pressure balancing model hard to reason about and modify.
    
    This patch switches to a balancing model of accounting the concrete,
    actually observed cost of reclaiming one LRU over another.  For now, that
    cost includes pages that are scanned but rotated back to the list head.
    Subsequent patches will add consideration for IO caused by refaulting of
    recently evicted pages.
    
    Replace struct zone_reclaim_stat with two cost counters in the lruvec, and
    make everything that affects cost go through a new lru_note_cost()
    function.
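
    At this point in the series, the function is essentially (sketch):

            void lru_note_cost(struct lruvec *lruvec, bool file,
                               unsigned int nr_pages)
            {
                    if (file)
                            lruvec->file_cost += nr_pages;
                    else
                            lruvec->anon_cost += nr_pages;
            }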
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-9-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 1431d4d)
  112. mm: deactivations shouldn't bias the LRU balance

    Operations like MADV_FREE, FADV_DONTNEED etc.  currently move any affected
    active pages to the inactive list to accelerate their reclaim (good) but
    also steer page reclaim toward that LRU type, or away from the other
    (bad).
    
    The reason why this is undesirable is that such operations are not part of
    the regular page aging cycle, but rather a fluke that doesn't say much
    about the remaining pages on that list; they might all be in heavy use,
    and once the chunk of easy victims has been purged, the VM continues to
    apply elevated pressure on those remaining hot pages.  The other LRU,
    meanwhile, might have easily reclaimable pages, and there was never a need
    to steer away from it in the first place.
    
    As the previous patch outlined, we should focus on recording actually
    observed cost to steer the balance rather than speculating about the
    potential value of one LRU list over the other.  In that spirit, leave
    explicitly deactivated pages to the LRU algorithm to pick up, and let
    rotations decide which list is the easiest to reclaim.
    
    [cai@lca.pw: fix set-but-not-used warning]
      Link: http://lkml.kernel.org/r/20200522133335.GA624@Qians-MacBook-Air.local
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Rik van Riel <riel@surriel.com>
    Cc: Qian Cai <cai@lca.pw>
    Link: http://lkml.kernel.org/r/20200520232525.798933-10-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit fbbb602)
  113. mm: only count actual rotations as LRU reclaim cost

    When shrinking the active file list we rotate referenced pages only when
    they're in an executable mapping.  The others get deactivated.  When it
    comes to balancing scan pressure, though, we count all referenced pages as
    rotated, even the deactivated ones.  Yet they do not carry the same cost
    to the system: the deactivated page *might* refault later on, but the
    deactivation is tangible progress toward freeing pages; rotations on the
    other hand cost time and effort without getting any closer to freeing
    memory.
    
    Don't treat both events as equal.  The following patch will hook up LRU
    balancing to cache and anon refaults, which are a much more concrete cost
    signal for reclaiming one list over the other.  Thus, remove the maybe-IO
    cost bias from page references, and only note the CPU cost for actual
    rotations that prevent the pages from getting reclaimed.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Minchan Kim <minchan@kernel.org>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-11-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 264e90c)
  114. mm: balance LRU lists based on relative thrashing

    Since the LRUs were split into anon and file lists, the VM has been
    balancing between page cache and anonymous pages based on per-list ratios
    of scanned vs.  rotated pages.  In most cases that tips page reclaim
    towards the list that is easier to reclaim and has the fewest actively
    used pages, but there are a few problems with it:
    
    1. Refaults and LRU rotations are weighted the same way, even though
       one costs IO and the other costs a bit of CPU.
    
    2. The less we scan an LRU list based on already observed rotations,
       the more we increase the sampling interval for new references, and
       rotations become even more likely on that list. This can enter a
       death spiral in which we stop looking at one list completely until
       the other one is all but annihilated by page reclaim.
    
    Since commit a528910 ("mm: thrash detection-based file cache sizing")
    we have refault detection for the page cache.  Along with swapin events,
    they are good indicators of when the file or anon list, respectively, is
    too small for its workingset and needs to grow.
    
    For example, if the page cache is thrashing, the cache pages need more
    time in memory, while there may be colder pages on the anonymous list.
    Likewise, if swapped pages are faulting back in, it indicates that we
    reclaim anonymous pages too aggressively and should back off.
    
    Replace LRU rotations with refaults and swapins as the basis for relative
    reclaim cost of the two LRUs.  This will have the VM target list balances
    that incur the least amount of IO on aggregate.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-12-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 314b57f)
  115. mm: vmscan: determine anon/file pressure balance at the reclaim root

    We split the LRU lists into anon and file, and we rebalance the scan
    pressure between them when one of them begins thrashing: if the file cache
    experiences workingset refaults, we increase the pressure on anonymous
    pages; if the workload is stalled on swapins, we increase the pressure on
    the file cache instead.
    
    With cgroups and their nested LRU lists, we currently don't do this
    correctly.  While recursive cgroup reclaim establishes a relative LRU
    order among the pages of all involved cgroups, LRU pressure balancing is
    done on an individual cgroup LRU level.  As a result, when one cgroup is
    thrashing on the filesystem cache while a sibling may have cold anonymous
    pages, pressure doesn't get equalized between them.
    
    This patch moves LRU balancing decision to the root of reclaim - the same
    level where the LRU order is established.
    
    It does this by tracking LRU cost recursively, so that every level of the
    cgroup tree knows the aggregate LRU cost of all memory within its domain.
    When the page scanner calculates the scan balance for any given individual
    cgroup's LRU list, it uses the values from the ancestor cgroup that
    initiated the reclaim cycle.
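
    A sketch of the recursive bookkeeping (parent_lruvec() is assumed
    here as the name of the parent-walking helper):

            /* lru_note_cost(), hierarchical version */
            do {
                    if (file)
                            lruvec->file_cost += nr_pages;
                    else
                            lruvec->anon_cost += nr_pages;
            } while ((lruvec = parent_lruvec(lruvec)));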
    
    If one sibling is then thrashing on the cache, it will tip the pressure
    balance inside its ancestors, and the next hierarchical reclaim iteration
    will go more after the anon pages in the tree.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-13-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 7cf111b)
  116. mm: vmscan: reclaim writepage is IO cost

    The VM tries to balance reclaim pressure between anon and file so as to
    reduce the amount of IO incurred due to the memory shortage.  It already
    counts refaults and swapins, but in addition it should also count
    writepage calls during reclaim.
    
    For swap, this is obvious: it's IO that wouldn't have occurred if the
    anonymous memory hadn't been under memory pressure.  From a relative
    balancing point of view this makes sense as well: even if anon is cold and
    reclaimable, a cache that isn't thrashing may have equally cold pages that
    don't require IO to reclaim.
    
    For file writeback, it's trickier: some of the reclaim writepage IO would
    have likely occurred anyway due to dirty expiration.  But not all of it -
    premature writeback reduces batching and generates additional writes.
    Since the flushers are already woken up by the time the VM starts writing
    cache pages one by one, let's assume that we're likely causing writes that
    wouldn't have happened without memory pressure.  In addition, the per-page
    cost of IO would have probably been much cheaper if written in larger
    batches from the flusher thread rather than the single-page-writes from
    kswapd.
    
    For our purposes - getting the trend right to accelerate convergence on a
    stable state that doesn't require paging at all - this is sufficiently
    accurate.  If we later wanted to optimize for sustained thrashing, we can
    still refine the measurements.
    
    Count all writepage calls from kswapd as IO cost toward the LRU that the
    page belongs to.
    
    Why do this dynamically?  Don't we know in advance that anon pages require
    IO to reclaim, and so could build in a static bias?
    
    First, scanning is not the same as reclaiming.  If all the anon pages are
    referenced, we may not swap for a while just because we're scanning the
    anon list.  During this time, however, it's important that we age
    anonymous memory and the page cache at the same rate so that their
    hot-cold gradients are comparable.  Everything else being equal, we still
    want to reclaim the coldest memory overall.
    
    Second, we keep copies in swap unless the page changes.  If there is
    swap-backed data that's mostly read (tmpfs file) and has been swapped out
    before, we can reclaim it without incurring additional IO.
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-14-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit 96f8bf4)
  117. mm: vmscan: limit the range of LRU type balancing

    When LRU cost only shows up on one list, we abruptly stop scanning that
    list altogether.  That's an extreme reaction: by the time the other list
    starts thrashing and the pendulum swings back, we may have no recent age
    information on the first list anymore, and we could have significant
    latencies until the scanner has caught up.
    
    Soften this change in the feedback system by ensuring that no list
    receives less than a third of overall pressure, and by distributing
    the other two thirds according to LRU cost.  This ensures that we
    maintain a minimum
    rate of aging on the entire workingset while it's being pressured, while
    still allowing a generous rate of convergence when the relative sizes of
    the lists need to adjust.
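
    The mechanism is simple: add the overall cost to each side before
    computing the scan fractions, which bounds either share below at one
    third (a sketch close to the get_scan_count() change):

            total_cost = sc->anon_cost + sc->file_cost;
            anon_cost = total_cost + sc->anon_cost;
            file_cost = total_cost + sc->file_cost;
            total_cost = anon_cost + file_cost;

            ap = swappiness * (total_cost + 1);
            ap /= anon_cost + 1;

            fp = (200 - swappiness) * (total_cost + 1);
            fp /= file_cost + 1;

    If one list accrued all of the raw cost C, its weighted cost is 2C
    against the other list's C; since pressure is inversely proportional
    to cost, it still receives (1/2C) / (1/2C + 1/C) = 1/3 of the scan
    pressure rather than none.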
    
    Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Michal Hocko <mhocko@suse.com>
    Cc: Minchan Kim <minchan@kernel.org>
    Cc: Rik van Riel <riel@surriel.com>
    Link: http://lkml.kernel.org/r/20200520232525.798933-15-hannes@cmpxchg.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    hnaz authored and torvalds committed Jun 4, 2020 (commit d483a5d)
  118. mm: swap: fix vmstats for huge pages

    Many of the callbacks called by pagevec_lru_move_fn() do not correctly
    update the vmstats for huge pages.  Fix that.  Also make
    __pagevec_lru_add_fn() use the irq-unsafe alternative to update the
    stat, as irqs are already disabled.
    
    Signed-off-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Link: http://lkml.kernel.org/r/20200527182916.249910-1-shakeelb@google.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    shakeelb authored and torvalds committed Jun 4, 2020 (commit 5d91f31)
  119. mm: swap: memcg: fix memcg stats for huge pages

    Commit 2262185 ("mm: per-cgroup memory reclaim stats") added
    PGLAZYFREE, PGACTIVATE & PGDEACTIVATE stats for cgroups but missed a
    couple of places, and PGLAZYFREE missed huge page handling.  Fix that.
    Also, for PGLAZYFREE, use the irq-unsafe function to update the stat,
    as irqs are already disabled.
    
    Fixes: 2262185 ("mm: per-cgroup memory reclaim stats")
    Signed-off-by: Shakeel Butt <shakeelb@google.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Johannes Weiner <hannes@cmpxchg.org>
    Link: http://lkml.kernel.org/r/20200527182947.251343-1-shakeelb@google.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    shakeelb authored and torvalds committed Jun 4, 2020 (commit 21e330f)
  120. tools/vm/page_owner_sort.c: filter out unneeded line

    To see a sorted result from page_owner, a tiresome preprocessing step
    is currently required before running page_owner_sort.  This patch simply
    filters out lines which start with "PFN" while reading the page owner
    report.
    
    Signed-off-by: Changhee Han <ch0.han@lge.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Vlastimil Babka <vbabka@suse.cz>
    Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
    Cc: Jonathan Corbet <corbet@lwn.net>
    Link: http://lkml.kernel.org/r/20200429052940.16968-1-ch0.han@lge.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    iehlog0zb authored and torvalds committed Jun 4, 2020 (commit 5b94ce2)
  121. mm, mempolicy: fix up gup usage in lookup_node

    ba84107 ("mm/mempolicy: Allow lookup_node() to handle fatal signal")
    added special casing for a 0 return value because that was a possible
    gup return value when interrupted by a fatal signal.  This has been
    fixed by ae46d2a ("mm/gup: Let __get_user_pages_locked() return -EINTR
    for fatal signal") in the meantime, so ba84107 can be reverted.
    
    This patch however doesn't go all the way to revert it, because the check
    for 0 is wrong and confusing here.  Firstly, it is inherently unsafe to
    access the page when get_user_pages_locked() returns 0 (aka no page
    returned).
    
    Fortunately, this will not happen because get_user_pages_locked() will
    not return 0 when nr_pages > 0 unless FOLL_NOWAIT is specified, which is
    not the case here.  Document this potential error code in the gup code
    while we are at it.
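    
    A hedged sketch of the resulting shape in lookup_node(), using the
    v5.7-era get_user_pages_locked() signature (simplified, not the
    literal patch):
    
    	int ret, locked = 1;
    	struct page *page;
    
    	ret = get_user_pages_locked(addr & PAGE_MASK, 1, 0, &page, &locked);
    	if (ret > 0) {
    		/* exactly one page pinned; safe to inspect it */
    		ret = page_to_nid(page);
    		put_page(page);
    	}
    	/* ret <= 0: propagate the error; 0 cannot happen for nr_pages > 0
    	 * without FOLL_NOWAIT, and -EINTR now covers fatal signals */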
    
    Signed-off-by: Michal Hocko <mhocko@suse.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Peter Xu <peterx@redhat.com>
    Link: http://lkml.kernel.org/r/20200421071026.18394-1-mhocko@kernel.org
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Michal Hocko authored and torvalds committed Jun 4, 2020
    Commit 2d3a36a
  122. include/linux/memblock.h: fix minor typo and unclear comment

    Fix a minor typo ("usabe" -> "usable") in the current description of the
    member variable "memory" in struct memblock.
    
    Also, the member variable "base" in struct memblock_type is currently
    described as the physical address of the memory region; describing it
    as the base address of the region is clearer, since the variable is
    declared as phys_addr_t.
    
    Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
    Link: http://lkml.kernel.org/r/1588846952-32166-1-git-send-email-qiwuchen55@gmail.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    qiwuchen authored and torvalds committed Jun 4, 2020
    Commit 8cbd54f
  123. sparc32: register memory occupied by kernel as memblock.memory

    sparc32 never registered the memory occupied by the kernel image with
    memblock_add() and it only reserved this memory with memblock_reserve().
    
    With openbios as system firmware, the memory occupied by the kernel is
    reserved in openbios and removed from mem.available.  The prom setup code
    in the kernel uses mem.available to set up the memory banks and
    essentially there is a hole for the memory occupied by the kernel image.
    
    Later in bootmem_init() this memory is memblock_reserve()d.
    
    Up until recently, memmap initialization would call __init_single_page()
    for the pages in that hole, the free_low_memory_core_early() would mark
    them as reserved and everything would be Ok.
    
    After the change in memmap initialization introduced by the commit "mm:
    memmap_init: iterate over memblock regions rather that check each PFN",
    the hole is skipped and the page structs for it are not initialized.  And
    when they are passed from memblock to page allocator as reserved, the
    latter gets confused.
    
    Simply registering the memory occupied by the kernel with memblock_add()
    resolves this issue.
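    
    A sketch of the fix's shape (symbol names hypothetical; the real patch
    works from the prom-reported memory banks):
    
    	/* make the kernel image known to memblock as usable memory,
    	 * then keep it reserved so the allocator never hands it out */
    	memblock_add(kernel_phys_base, kernel_image_size);
    	memblock_reserve(kernel_phys_base, kernel_image_size);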
    
    Tested on qemu-system-sparc with Debian Etch [1] userspace.
    
    [1] https://people.debian.org/~aurel32/qemu/sparc/debian_etch_sparc_small.qcow2
    
    Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: David S. Miller <davem@davemloft.net>
    Cc: Guenter Roeck <linux@roeck-us.net>
    Link: https://lkml.kernel.org/r/20200517000050.GA87467@roeck-us.net
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    rppt authored and torvalds committed Jun 4, 2020
    Commit 4360dfa
  124. hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs

    In a 32-bit program running on the arm64 architecture, when the address
    space below the mmap base is completely exhausted, shmat() for huge pages
    will return ENOMEM, but shmat() for normal pages can still succeed in
    no-legacy mode.  This is inconsistent.
    
    For normal pages, the calling trace of get_unmapped_area() is:
    
    	=> mm->get_unmapped_area()
    	if on legacy mode,
    		=> arch_get_unmapped_area()
    			=> vm_unmapped_area()
    	if on no-legacy mode,
    		=> arch_get_unmapped_area_topdown()
    			=> vm_unmapped_area()
    
    For huge pages, the calling trace of get_unmapped_area() is:
    
    	=> file->f_op->get_unmapped_area()
    		=> hugetlb_get_unmapped_area()
    			=> vm_unmapped_area()
    
    To solve this issue, we only need to make hugetlb_get_unmapped_area()
    behave the same way as mm->get_unmapped_area().  Add *bottomup() and
    *topdown() variants for hugetlbfs, and check the current
    mm->get_unmapped_area() to decide which one to use: if
    mm->get_unmapped_area equals arch_get_unmapped_area_topdown(),
    hugetlb_get_unmapped_area() calls the topdown routine, otherwise the
    bottomup routine.
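    
    A hedged sketch of the dispatch (helper names illustrative):
    
    	unsigned long
    	hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
    				  unsigned long len, unsigned long pgoff,
    				  unsigned long flags)
    	{
    		if (current->mm->get_unmapped_area ==
    		    arch_get_unmapped_area_topdown)
    			return hugetlb_get_unmapped_area_topdown(file, addr,
    					len, pgoff, flags);
    		return hugetlb_get_unmapped_area_bottomup(file, addr,
    				len, pgoff, flags);
    	}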
    
    Reported-by: kbuild test robot <lkp@intel.com>
    Signed-off-by: Shijie Hu <hushijie3@huawei.com>
    Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Will Deacon <will@kernel.org>
    Cc: Xiaoming Ni <nixiaoming@huawei.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: yangerkun <yangerkun@huawei.com>
    Cc: ChenGang <cg.chen@huawei.com>
    Cc: Chen Jie <chenjie6@huawei.com>
    Link: http://lkml.kernel.org/r/20200518065338.113664-1-hushijie3@huawei.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Shijie Hu authored and torvalds committed Jun 4, 2020
    Commit 8859025
  125. mm: thp: don't need to drain lru cache when splitting and mlocking THP

    Since commit 8f18227 ("mm/swap.c: flush lru pvecs on compound page
    arrival") THPs no longer stay in a pagevec.  So the optimization made by
    commit d965432 ("thp: increase split_huge_page() success rate"), which
    tried to unpin munlocked THPs from the pagevec by draining it, doesn't
    make sense anymore.
    
    Draining the lru cache before isolating a THP in the mlock path is also
    unnecessary.  b676b29 ("mm, thp: fix mapped pages avoiding unevictable
    list on mlock") added it, and 9a73f61 ("thp, mlock: do not mlock
    PTE-mapped file huge pages") accidentally carried it over after the above
    optimization went in.
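    
    The removed calls have this shape (a sketch of the pattern, not the
    exact call sites):
    
    	lru_add_drain();	/* dead weight: since 8f18227, a THP is
    				 * flushed out of the pagevec on arrival,
    				 * so there is nothing left to unpin here */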
    
    Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
    Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
    Cc: Hugh Dickins <hughd@google.com>
    Cc: Andrea Arcangeli <aarcange@redhat.com>
    Link: http://lkml.kernel.org/r/1585946493-7531-1-git-send-email-yang.shi@linux.alibaba.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Yang Shi authored and torvalds committed Jun 4, 2020
    Commit 67e4eb0
  126. powerpc/mm: drop platform defined pmd_mknotpresent()

    Patch series "mm/thp: Rename pmd_mknotpresent() as pmd_mknotvalid()", v2.
    
    This series renames pmd_mknotpresent() as pmd_mknotvalid().  Before that,
    it drops an existing pmd_mknotpresent() definition from the powerpc
    platform, which was never required as it defines its pmdp_invalidate()
    through subscribing __HAVE_ARCH_PMDP_INVALIDATE.  This does not create
    any functional change.
    
    This rename was suggested by Catalin during a previous discussion while we
    were trying to change the THP helpers on arm64 platform for migration.
    
    https://patchwork.kernel.org/patch/11019637/
    
    This patch (of 2):
    
    A platform needs to define pmd_mknotpresent() for the generic
    pmdp_invalidate() only when __HAVE_ARCH_PMDP_INVALIDATE is not
    subscribed.  Otherwise a platform-specific pmd_mknotpresent() is not
    required.  Hence just drop it.
    
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Link: http://lkml.kernel.org/r/1587520326-10099-1-git-send-email-anshuman.khandual@arm.com
    Link: http://lkml.kernel.org/r/1584680057-13753-1-git-send-email-anshuman.khandual@arm.com
    Link: http://lkml.kernel.org/r/1584680057-13753-2-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Anshuman Khandual authored and torvalds committed Jun 4, 2020
    Commit 124cb3a
  127. mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()

    pmd_present() is expected to test positive after pmdp_mknotpresent() as
    the PMD entry still points to a valid huge page in memory.
    pmdp_mknotpresent() merely invalidates the given PMD entry from the MMU's
    perspective while still holding on to the valid huge page referred to by
    pmd_page(), so the entry keeps passing the pmd_present() test.  This
    creates the following situation, which is counter-intuitive.
    
    [pmd_present(pmd_mknotpresent(pmd)) = true]
    
    This renames pmd_mknotpresent() as pmd_mkinvalid(), reflecting the
    helper's functionality more accurately, and turns the above-mentioned
    situation into the following.  This does not create any functional
    change.
    
    [pmd_present(pmd_mkinvalid(pmd)) = true]
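    
    In code, a hedged illustration of the invariant (not from the patch;
    assumes pmd maps a huge page):
    
    	pmd_t invalid = pmd_mkinvalid(pmd);
    
    	/* invalidated for the MMU, but the huge page is still attached,
    	 * so generic code still sees the entry as present */
    	VM_BUG_ON(!pmd_present(invalid));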
    
    This is not applicable for platforms that define their own
    pmdp_invalidate() via __HAVE_ARCH_PMDP_INVALIDATE.  The suggestion for
    the rename came up during a previous discussion here.
    
    https://patchwork.kernel.org/patch/11019637/
    
    [anshuman.khandual@arm.com: change pmd_mknotvalid() to pmd_mkinvalid() per Will]
      Link: http://lkml.kernel.org/r/1587520326-10099-3-git-send-email-anshuman.khandual@arm.com
    Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
    Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: Will Deacon <will@kernel.org>
    Cc: Vineet Gupta <vgupta@synopsys.com>
    Cc: Russell King <linux@armlinux.org.uk>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Steven Rostedt <rostedt@goodmis.org>
    Cc: Dave Hansen <dave.hansen@linux.intel.com>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: Michael Ellerman <mpe@ellerman.id.au>
    Cc: Paul Mackerras <paulus@samba.org>
    Link: http://lkml.kernel.org/r/1584680057-13753-3-git-send-email-anshuman.khandual@arm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Anshuman Khandual authored and torvalds committed Jun 4, 2020
    Commit 86ec2da
  128. drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
    
    Searching for a particular memory block by id is an O(n) operation because
    each memory block's underlying device is kept in an unsorted linked list
    on the subsystem bus.
    
    We can cut the lookup cost to O(log n) if we cache each memory block
    in an xarray (a sketch follows the measurements below).  This time
    complexity improvement is significant on systems with many memory
    blocks.  For example:
    
    1. A 128GB POWER9 VM with 256MB memblocks has 512 blocks.  With this
       change memory_dev_init() completes ~12ms faster and walk_memory_blocks()
       completes ~12ms faster.
    
    Before:
    [    0.005042] memory_dev_init: adding memory blocks
    [    0.021591] memory_dev_init: added memory blocks
    [    0.022699] walk_memory_blocks: walking memory blocks
    [    0.038730] walk_memory_blocks: walked memory blocks 0-511
    
    After:
    [    0.005057] memory_dev_init: adding memory blocks
    [    0.009415] memory_dev_init: added memory blocks
    [    0.010519] walk_memory_blocks: walking memory blocks
    [    0.014135] walk_memory_blocks: walked memory blocks 0-511
    
    2. A 256GB POWER9 LPAR with 256MB memblocks has 1024 blocks.  With
       this change memory_dev_init() completes ~88ms faster and
       walk_memory_blocks() completes ~87ms faster.
    
    Before:
    [    0.252246] memory_dev_init: adding memory blocks
    [    0.395469] memory_dev_init: added memory blocks
    [    0.409413] walk_memory_blocks: walking memory blocks
    [    0.433028] walk_memory_blocks: walked memory blocks 0-511
    [    0.433094] walk_memory_blocks: walking memory blocks
    [    0.500244] walk_memory_blocks: walked memory blocks 131072-131583
    
    After:
    [    0.245063] memory_dev_init: adding memory blocks
    [    0.299539] memory_dev_init: added memory blocks
    [    0.313609] walk_memory_blocks: walking memory blocks
    [    0.315287] walk_memory_blocks: walked memory blocks 0-511
    [    0.315349] walk_memory_blocks: walking memory blocks
    [    0.316988] walk_memory_blocks: walked memory blocks 131072-131583
    
    3. A 32TB POWER9 LPAR with 256MB memblocks has 131072 blocks.  With
       this change we complete memory_dev_init() ~37 minutes faster and
       walk_memory_blocks() at least ~30 minutes faster.  The exact timing
       for walk_memory_blocks() is missing, though I observed that the
       soft lockups in walk_memory_blocks() disappeared with the change,
       suggesting that lower bound.
    
    Before:
    [   13.703907] memory_dev_init: adding blocks
    [ 2287.406099] memory_dev_init: added all blocks
    [ 2347.494986] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 2527.625378] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 2707.761977] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 2887.899975] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 3068.028318] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 3248.158764] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 3428.287296] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 3608.425357] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 3788.554572] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 3968.695071] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    [ 4148.823970] [c000000014c5bb60] [c000000000869af4] walk_memory_blocks+0x94/0x160
    
    After:
    [   13.696898] memory_dev_init: adding blocks
    [   15.660035] memory_dev_init: added all blocks
    (the walk_memory_blocks traces disappear)
    
    There should be no significant negative impact for machines with few
    memory blocks.  A sparse xarray has a small footprint and an O(log n)
    lookup is negligibly slower than an O(n) lookup for only the smallest
    number of memory blocks.
    
    1. A 16GB x86 machine with 128MB memblocks has 132 blocks.  With this
       change memory_dev_init() completes ~300us faster and walk_memory_blocks()
       completes no faster or slower.  The improvement is pretty close to noise.
    
    Before:
    [    0.224752] memory_dev_init: adding memory blocks
    [    0.227116] memory_dev_init: added memory blocks
    [    0.227183] walk_memory_blocks: walking memory blocks
    [    0.227183] walk_memory_blocks: walked memory blocks 0-131
    
    After:
    [    0.224911] memory_dev_init: adding memory blocks
    [    0.226935] memory_dev_init: added memory blocks
    [    0.227089] walk_memory_blocks: walking memory blocks
    [    0.227089] walk_memory_blocks: walked memory blocks 0-131
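    
    A hedged sketch of the caching idea (field and helper names
    illustrative, not necessarily the patch's):
    
    	static DEFINE_XARRAY(memory_blocks);	/* keyed by block id */
    
    	static struct memory_block *find_memory_block_by_id(unsigned long block_id)
    	{
    		return xa_load(&memory_blocks, block_id);	/* O(log n) */
    	}
    
    	/* on block registration (caller holds the hotplug lock): */
    	ret = xa_err(xa_store(&memory_blocks, mem->dev.id, mem, GFP_KERNEL));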
    
    [david@redhat.com: document the locking]
      Link: http://lkml.kernel.org/r/bc21eec6-7251-4c91-2f57-9a0671f8d414@redhat.com
    Signed-off-by: Scott Cheloha <cheloha@linux.ibm.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Acked-by: David Hildenbrand <david@redhat.com>
    Acked-by: Nathan Lynch <nathanl@linux.ibm.com>
    Acked-by: Michal Hocko <mhocko@suse.com>
    Cc: Rafael J. Wysocki <rafael@kernel.org>
    Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    Cc: Rick Lindsley <ricklind@linux.vnet.ibm.com>
    Cc: Scott Cheloha <cheloha@linux.ibm.com>
    Link: http://lkml.kernel.org/r/20200121231028.13699-1-cheloha@linux.ibm.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    Scott Cheloha authored and torvalds committed Jun 4, 2020
    Commit 4fb6eab
  129. mm: add DEBUG_WX support

    Patch series "Extract DEBUG_WX to shared use".
    
    Some architectures support the DEBUG_WX feature with near-verbatim
    copies of the same code, so extract it to mm/Kconfig.debug for shared
    use.
    
    The PPC and ARM ports don't support the generic page table dumper yet,
    so this patch series only converts the x86 and arm64 ports.
    
    For the RISC-V port, DEBUG_WX support depends on other patches which
    have already been merged:
      - RISC-V page table dumper
      - Support strict kernel memory permissions for security
    
    This patch (of 4):
    
    Some architectures support the DEBUG_WX feature with near-verbatim
    copies of the same code.  Extract it to mm/Kconfig.debug for shared use.
    
    [akpm@linux-foundation.org: reword text, per Will Deacon & Zong Li]
      Link: http://lkml.kernel.org/r/20200427194245.oxRJKj3fn%25akpm@linux-foundation.org
    [zong.li@sifive.com: remove the specific name of arm64]
      Link: http://lkml.kernel.org/r/3a6a92ecedc54e1d0fc941398e63d504c2cd5611.1589178399.git.zong.li@sifive.com
    [zong.li@sifive.com: add MMU dependency for DEBUG_WX]
      Link: http://lkml.kernel.org/r/4a674ac7863ff39ca91847b10e51209771f99416.1589178399.git.zong.li@sifive.com
    Suggested-by: Palmer Dabbelt <palmer@dabbelt.com>
    Signed-off-by: Zong Li <zong.li@sifive.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: Will Deacon <will@kernel.org>
    Link: http://lkml.kernel.org/r/cover.1587455584.git.zong.li@sifive.com
    Link: http://lkml.kernel.org/r/23980cd0f0e5d79e24a92169116407c75bcc650d.1587455584.git.zong.li@sifive.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    zongbox authored and torvalds committed Jun 4, 2020
    Commit 375d315
  130. riscv: support DEBUG_WX

    Support DEBUG_WX to check whether there are mappings with both write
    and execute permission at the same time.
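    
    With the option enabled, the check runs once the kernel has finalized
    its memory permissions; a hedged sketch of the call shape (helper name
    as used by the arch ports of this era, treat as illustrative):
    
    	/* at the end of mark_rodata_ro(), after permissions are final: */
    	debug_checkwx();	/* walk the kernel page tables and warn
    				 * about any W+X mapping found */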
    
    [akpm@linux-foundation.org: replace macros with C]
    Signed-off-by: Zong Li <zong.li@sifive.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Link: http://lkml.kernel.org/r/282e266311bced080bc6f7c255b92f87c1eb65d6.1587455584.git.zong.li@sifive.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    zongbox authored and torvalds committed Jun 4, 2020
    Commit b422d28
  131. x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined

    Extract DEBUG_WX to mm/Kconfig.debug for shared use.  Change to use
    ARCH_HAS_DEBUG_WX instead of a DEBUG_WX option defined by the arch port.
    
    Signed-off-by: Zong Li <zong.li@sifive.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Link: http://lkml.kernel.org/r/430736828d149df3f5b462d291e845ec690e0141.1587455584.git.zong.li@sifive.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    zongbox authored and torvalds committed Jun 4, 2020
    Commit 7e01ccb
  132. arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined

    Extract DEBUG_WX to mm/Kconfig.debug for shared use.  Change to use
    ARCH_HAS_DEBUG_WX instead of a DEBUG_WX option defined by the arch port.
    
    Signed-off-by: Zong Li <zong.li@sifive.com>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Catalin Marinas <catalin.marinas@arm.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: Palmer Dabbelt <palmer@dabbelt.com>
    Cc: Paul Walmsley <paul.walmsley@sifive.com>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Will Deacon <will@kernel.org>
    Link: http://lkml.kernel.org/r/e19709e7576f65e303245fe520cad5f7bae72763.1587455584.git.zong.li@sifive.com
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    zongbox authored and torvalds committed Jun 4, 2020
    Commit 09587a0
  133. Merge branch 'akpm' (patches from Andrew)

    Merge more updates from Andrew Morton:
     "More mm/ work, plenty more to come
    
      Subsystems affected by this patch series: slub, memcg, gup, kasan,
      pagealloc, hugetlb, vmscan, tools, mempolicy, memblock, hugetlbfs,
      thp, mmap, kconfig"
    
    * akpm: (131 commits)
      arm64: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
      x86: mm: use ARCH_HAS_DEBUG_WX instead of arch defined
      riscv: support DEBUG_WX
      mm: add DEBUG_WX support
      drivers/base/memory.c: cache memory blocks in xarray to accelerate lookup
      mm/thp: rename pmd_mknotpresent() as pmd_mkinvalid()
      powerpc/mm: drop platform defined pmd_mknotpresent()
      mm: thp: don't need to drain lru cache when splitting and mlocking THP
      hugetlbfs: get unmapped area below TASK_UNMAPPED_BASE for hugetlbfs
      sparc32: register memory occupied by kernel as memblock.memory
      include/linux/memblock.h: fix minor typo and unclear comment
      mm, mempolicy: fix up gup usage in lookup_node
      tools/vm/page_owner_sort.c: filter out unneeded line
      mm: swap: memcg: fix memcg stats for huge pages
      mm: swap: fix vmstats for huge pages
      mm: vmscan: limit the range of LRU type balancing
      mm: vmscan: reclaim writepage is IO cost
      mm: vmscan: determine anon/file pressure balance at the reclaim root
      mm: balance LRU lists based on relative thrashing
      mm: only count actual rotations as LRU reclaim cost
      ...
    torvalds committed Jun 4, 2020
    Commit ee01c4d
  134. Merge tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
    
    Pull media updates from Mauro Carvalho Chehab:
    
     - Media documentation is now split into admin-guide, driver-api and
       userspace-api books (a longstanding request from Jon);
    
     - The media Kconfig was reorganized, in order to make it easier to
       select drivers and their dependencies;
    
     - The testing drivers now have a separate directory;
    
     - Added a new driver for the Rockchip Video Decoder IP;
    
     - The atomisp staging driver was resurrected. It is meant to work with
       4 generations of cameras on Atom-based laptops, tablets and cell
       phones, so it seems worth investing time to clean up this driver and
       get it into good shape.
    
     - Added some V4L2 core ancillary routines to help with h264 codecs;
    
     - Added an ov2740 image sensor driver;
    
     - The si2157 gained support for analog TV which, in turn, allowed some
       cx231xx and cx23885 boards to also support analog standards;
    
     - Added some V4L2 controls (V4L2_CID_CAMERA_ORIENTATION and
       V4L2_CID_CAMERA_SENSOR_ROTATION) to help identify where the camera
       is located on the device;
    
     - VIDIOC_ENUM_FMT was extended to support MC-centric devices;
    
     - Lots of drivers improvements and cleanups.
    
    * tag 'media/v5.8-1' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media: (503 commits)
      media: Documentation: media: Refer to mbus format documentation from CSI-2 docs
      media: s5k5baf: Replace zero-length array with flexible-array
      media: i2c: imx219: Drop <linux/clk-provider.h> and <linux/clkdev.h>
      media: i2c: Add ov2740 image sensor driver
      media: ov8856: Implement sensor module revision identification
      media: ov8856: Add devicetree support
      media: dt-bindings: ov8856: Document YAML bindings
      media: dvb-usb: Add Cinergy S2 PCIe Dual Port support
      media: dvbdev: Fix tuner->demod media controller link
      media: dt-bindings: phy: phy-rockchip-dphy-rx0: move rockchip dphy rx0 bindings out of staging
      media: staging: dt-bindings: phy-rockchip-dphy-rx0: remove non-used reg property
      media: atomisp: unify the version for isp2401 a0 and b0 versions
      media: atomisp: update TODO with the current data
      media: atomisp: adjust some code at sh_css that could be broken
      media: atomisp: don't produce errs for ignored IRQs
      media: atomisp: print IRQ when debugging
      media: atomisp: isp_mmu: don't use kmem_cache
      media: atomisp: add a notice about possible leak resources
      media: atomisp: disable the dynamic and reserved pools
      media: atomisp: turn on camera before setting it
      ...
    torvalds committed Jun 4, 2020
    Commit a98f670
  135. atomisp: avoid warning about unused function

    The atomisp_mrfld_power() function isn't actually ever called, because
    its two call sites have the call commented out, as it breaks on some
    platforms.  That results in:
    
      drivers/staging/media/atomisp/pci/atomisp_v4l2.c:764:12: warning: ‘atomisp_mrfld_power’ defined but not used [-Wunused-function]
        764 | static int atomisp_mrfld_power(struct atomisp_device *isp, bool enable)
            |            ^~~~~~~~~~~~~~~~~~~
    
    during the build.
    
    Rather than commenting out the use entirely, just disable it
    semantically instead (using a "0 &&" construct), leaving the call in
    place from a syntax standpoint, and avoiding the warning.
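    
    A sketch of the idiom (the guard shape only, not the literal call
    site):
    
    	/* "0 &&" short-circuits, so atomisp_mrfld_power() is never run,
    	 * but the reference keeps -Wunused-function from firing */
    	if (0 && atomisp_mrfld_power(isp, true))
    		return -EINVAL;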
    
    I really don't want my builds to have any warnings that can then hide
    real issues.
    
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    torvalds committed Jun 4, 2020
    Commit 6929f71