Skip to content

Commit

Permalink
Remove TranslateUnicodeToOem and all related code (#14745)
Browse files Browse the repository at this point in the history
The overarching intention of this PR is to improve our Unicode support.
Most
of our APIs still don't support anything beyond UCS-2 and DBCS
sequences.
This commit doesn't fix the UTF-16 support (by supporting surrogate
pairs),
but it does improve support for UTF-8 by allowing longer `char`
sequences.

It does so by removing `TranslateUnicodeToOem` which seems to have had
an almost
viral effect on code quality wherever it was used. It made the
assumption that
_all_ narrow glyphs encode to 1 `char` and most wide glyphs to 2
`char`s.
It also didn't bother to check whether `WideCharToMultiByte` failed or
returned
a different amount of `char`s. So up until now it was easily possible to
read
uninitialized stack memory from conhost. Any code that used this
function was
forced to do the same "measurement" of narrow/wide glyphs, because _of
course_
it didn't had any way to indicate to the caller how much memory it needs
to
store the result. Instead all callers were forced to sorta replicate how
it
worked to calculate the required storage ahead of time.
Unsurprisingly, none of the callers used the same algorithm...

Without it the code is much leaner and easier to understand now. The
best
example is `COOKED_READ_DATA::_handlePostCharInputLoop` which used to
contain 3
blocks of _almost_ identical code, but with ever so subtle differences.
After
reading the old code for hours I still don't know if they were relevant
or not.
It used to be 200 lines of code lacking any documentation and it's now
50 lines
with descriptive function names. I hope this doesn't break anything, but
to
be honest I can't imagine anyone having relied on this mess in the first
place.

I needed some helpers to handle byte slices (`std::span<char>`), which
is why
a new `til/bytes.h` header was added. Initially I wrote a `buf_writer`
class
but felt like such a wrapper around a slice/span was annoying to use.
As such I've opted for freestanding functions which take slices as
mutable
references and "advance" them (offset the start) whenever they're read
from
or written to. I'm not particularly happy with the design but they do
the job.

Related to #8000
Fixes #4551
Fixes #7589
Fixes #8663

## Validation Steps Performed
* Unit and feature tests ✅
* Far Manager ✅
* Fixes test cases in #4551, #7589 and #8663
  • Loading branch information
lhecker authored Feb 28, 2023
1 parent cf87590 commit 599b550
Show file tree
Hide file tree
Showing 23 changed files with 603 additions and 1,224 deletions.
74 changes: 0 additions & 74 deletions src/host/dbcs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -177,77 +177,3 @@ BOOL IsAvailableEastAsianCodePage(const UINT uiCodePage)
return false;
}
}

_Ret_range_(0, cbAnsi)
ULONG TranslateUnicodeToOem(_In_reads_(cchUnicode) PCWCHAR pwchUnicode,
const ULONG cchUnicode,
_Out_writes_bytes_(cbAnsi) PCHAR pchAnsi,
const ULONG cbAnsi,
_Out_ std::unique_ptr<IInputEvent>& partialEvent)
{
const auto& gci = ServiceLocator::LocateGlobals().getConsoleInformation();
const auto TmpUni = new (std::nothrow) WCHAR[cchUnicode];
if (TmpUni == nullptr)
{
return 0;
}

memcpy(TmpUni, pwchUnicode, cchUnicode * sizeof(WCHAR));

CHAR AsciiDbcs[2];
AsciiDbcs[1] = 0;

ULONG i, j;
for (i = 0, j = 0; i < cchUnicode && j < cbAnsi; i++, j++)
{
if (IsGlyphFullWidth(TmpUni[i]))
{
const auto NumBytes = sizeof(AsciiDbcs);
ConvertToOem(gci.CP, &TmpUni[i], 1, (LPSTR)&AsciiDbcs[0], NumBytes);
if (IsDBCSLeadByteConsole(AsciiDbcs[0], &gci.CPInfo))
{
if (j < cbAnsi - 1)
{ // -1 is safe DBCS in buffer
pchAnsi[j] = AsciiDbcs[0];
j++;
pchAnsi[j] = AsciiDbcs[1];
AsciiDbcs[1] = 0;
}
else
{
pchAnsi[j] = AsciiDbcs[0];
break;
}
}
else
{
pchAnsi[j] = AsciiDbcs[0];
AsciiDbcs[1] = 0;
}
}
else
{
ConvertToOem(gci.CP, &TmpUni[i], 1, &pchAnsi[j], 1);
}
}

if (AsciiDbcs[1])
{
try
{
auto keyEvent = std::make_unique<KeyEvent>();
if (keyEvent.get())
{
keyEvent->SetCharData(AsciiDbcs[1]);
partialEvent.reset(static_cast<IInputEvent* const>(keyEvent.release()));
}
}
catch (...)
{
LOG_HR(wil::ResultFromCaughtException());
}
}

delete[] TmpUni;
return j;
}
7 changes: 0 additions & 7 deletions src/host/dbcs.h
Original file line number Diff line number Diff line change
Expand Up @@ -29,10 +29,3 @@ bool IsDBCSLeadByteConsole(const CHAR ch, const CPINFO* const pCPInfo);
BYTE CodePageToCharSet(const UINT uiCodePage);

BOOL IsAvailableEastAsianCodePage(const UINT uiCodePage);

_Ret_range_(0, cbAnsi)
ULONG TranslateUnicodeToOem(_In_reads_(cchUnicode) PCWCHAR pwchUnicode,
const ULONG cchUnicode,
_Out_writes_bytes_(cbAnsi) PCHAR pchAnsi,
const ULONG cbAnsi,
_Out_ std::unique_ptr<IInputEvent>& partialEvent);
68 changes: 7 additions & 61 deletions src/host/directio.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -169,75 +169,21 @@ void EventsToUnicode(_Inout_ std::deque<std::unique_ptr<IInputEvent>>& inEvents,
LockConsole();
auto Unlock = wil::scope_exit([&] { UnlockConsole(); });

std::deque<std::unique_ptr<IInputEvent>> partialEvents;
if (!IsUnicode)
{
if (inputBuffer.IsReadPartialByteSequenceAvailable())
{
partialEvents.push_back(inputBuffer.FetchReadPartialByteSequence(IsPeek));
}
}

size_t amountToRead;
if (FAILED(SizeTSub(eventReadCount, partialEvents.size(), &amountToRead)))
{
return STATUS_INTEGER_OVERFLOW;
}
std::deque<std::unique_ptr<IInputEvent>> readEvents;
auto Status = inputBuffer.Read(readEvents,
amountToRead,
IsPeek,
true,
IsUnicode,
false);
const auto Status = inputBuffer.Read(outEvents,
eventReadCount,
IsPeek,
true,
IsUnicode,
false);

if (CONSOLE_STATUS_WAIT == Status)
{
FAIL_FAST_IF(!(readEvents.empty()));
// If we're told to wait until later, move all of our context
// to the read data object and send it back up to the server.
waiter = std::make_unique<DirectReadData>(&inputBuffer,
&readHandleState,
eventReadCount,
std::move(partialEvents));
}
else if (SUCCEEDED_NTSTATUS(Status))
{
// split key events to oem chars if necessary
if (!IsUnicode)
{
try
{
SplitToOem(readEvents);
}
CATCH_LOG();
}

// combine partial and readEvents
while (!partialEvents.empty())
{
readEvents.push_front(std::move(partialEvents.back()));
partialEvents.pop_back();
}

// move events over
for (size_t i = 0; i < eventReadCount; ++i)
{
if (readEvents.empty())
{
break;
}
outEvents.push_back(std::move(readEvents.front()));
readEvents.pop_front();
}

// store partial event if necessary
if (!readEvents.empty())
{
inputBuffer.StoreReadPartialByteSequence(std::move(readEvents.front()));
readEvents.pop_front();
FAIL_FAST_IF(!(readEvents.empty()));
}
std::move(outEvents));
}
return Status;
}
Expand Down
Loading

0 comments on commit 599b550

Please sign in to comment.