Enable multi-source input in marian-server #505

Closed
wants to merge 357 commits into from
357 commits
0baf0ae
Merged PR 9998: new build option NO_BOOST to disable inclusion of Boo…
frankseide Oct 22, 2019
8e8aae3
use std::filebuf and InputFileStream derives from istream rather than…
Oct 22, 2019
61d7ba4
delete file buffer in destructor
Oct 22, 2019
84ce4cd
replace InputFileStream with InputFileStreamNew in some files
Oct 22, 2019
750d204
replace InputFileStream with InputFileStreamNew in everything except …
Oct 22, 2019
d342e44
replace all InputFileStream with InputFileStreamNew
Oct 22, 2019
b8e2f06
delete InputFileStream
Oct 22, 2019
a10f5fb
start OutputFileStream
Oct 22, 2019
908dcad
try using OutputFileStream
Oct 22, 2019
e4182e5
flush stream to make sure everything comes out. Doens't work
Oct 22, 2019
1f53dea
flush both filters
Oct 23, 2019
c5c1a4c
replace all OutputFileStream with OutputFileStreamNew
Oct 23, 2019
33f6002
delete OutputFileStream and other crappy classes
Oct 23, 2019
cdf9b52
debuggging message
Oct 23, 2019
999b8fa
start temp file class
Oct 23, 2019
f4056af
use new temp file class in corpus
Oct 23, 2019
d1ed2c2
More helpful error message when CUDA libraries cannot be found.
ugermann Oct 23, 2019
8f591f6
use new temp file class everywhere
Oct 23, 2019
46b8eba
delete old temp file class
Oct 23, 2019
9a89c25
Roll-back of all changes not directly related to graceful shutdown.
ugermann Oct 23, 2019
7159f45
make sure file open even if it doesn't exist
Oct 23, 2019
7c7fc54
CreateFileName()
Oct 23, 2019
79d8307
merge NormalizeTempPrefix into CreateFileName()
Oct 23, 2019
16d4ce6
rename name_ -> file_
Oct 23, 2019
95fc443
string -> path
Oct 23, 2019
9963bea
inherite from OutputFileStreamNew instead of std::fstream
Oct 23, 2019
6915f9c
don't delete inStream_. Passing to a unique ptr
Oct 23, 2019
b26ce69
store iStream_ in unique pointer and move to new location when requested
Oct 23, 2019
41d247e
rename new classes. Delete debugging messages
Oct 23, 2019
3a5d836
merge --ours
Oct 23, 2019
ff50a04
delete commented out code
Oct 24, 2019
0c29e7f
add getline back in
Oct 24, 2019
577dbe7
Merge pull request #517 from marian-nmt/ug-better-cmake-message-when-…
snukky Oct 24, 2019
8972fd9
fix code in files that didn't compile with default cmake command
Oct 24, 2019
12c07c0
change std::getline -> io::getline
Oct 24, 2019
15c1fdf
sheath naked pointer
Oct 24, 2019
704f4e0
merge
Oct 24, 2019
8cfa33d
changes for code review
Oct 24, 2019
05695ff
Fix whitespace and end of line.
ugermann Oct 24, 2019
216608d
manually roll back 'figure out where temp file class is used'
Oct 24, 2019
723b7fc
roll back ken's non-boost code
Oct 24, 2019
7676c5c
merge
Oct 24, 2019
cf283d0
delete unused includes
Oct 24, 2019
47000c5
roll back changes to vs project files
Oct 24, 2019
d81ec71
Merged PR 10111: Just the checkins that moved code from .h -> .cpp
Oct 24, 2019
9d31fb5
merge with previous pull request
Oct 24, 2019
5bafca4
more getline reversion
Oct 24, 2019
a73a5e4
don't use std::make_unique. Not C++11 compatible apparently
Oct 24, 2019
ae84923
put back header include and #ifdef
Oct 24, 2019
583a927
use original mkstemp to create temp file on linux
Oct 25, 2019
6745abd
delete ken's error handling gunk
Oct 25, 2019
bd0118c
const NormalizeTempPrefix(). Retain all streamBuf so we can delete th…
Oct 25, 2019
49e8792
istreambuf for istream class
Oct 25, 2019
e94fd17
Use auto /Frank
Oct 25, 2019
bbbc9a3
resolve remaining issues
Oct 25, 2019
6dcfdaa
Merge branch 'master' into ug-graceful-shutdown
ugermann Oct 25, 2019
1469e45
updated reference to sentencepiece repo to version without softlink
frankseide Oct 25, 2019
ebec822
extra assert
Oct 25, 2019
deb9b6a
implement setbufsize
Oct 25, 2019
fe72735
Update scheduler.cpp
ugermann Oct 25, 2019
d60cae0
Update scheduler.h
ugermann Oct 25, 2019
3dfd32c
Update scheduler.h
ugermann Oct 25, 2019
369148c
Update scheduler.h
ugermann Oct 25, 2019
82e82f6
Merged PR 10086: simplify file streams
Oct 25, 2019
ed5f586
bye bye boost
emjotde Oct 26, 2019
5406641
handle warnings from FBGEMM when compiled statically
emjotde Oct 26, 2019
4f4989b
Bug fix: installSignalHandlers_() -> installSignalHandlers() in src/t…
ugermann Oct 26, 2019
0f7e40e
Bug fix: signalHandler_ -> signalHandler in src/training/scheduler.cpp
ugermann Oct 26, 2019
b22d316
small clean-up
emjotde Oct 26, 2019
1174cec
merge with public master
emjotde Oct 26, 2019
31bac7d
switch off gradient clipping
emjotde Oct 26, 2019
e20de49
move pointer to fitting regression tests
emjotde Oct 26, 2019
27ce118
add comments about gradient clipping bug
emjotde Oct 26, 2019
abf95d0
move regtest pointer
emjotde Oct 26, 2019
e375905
update changelog and version
emjotde Oct 26, 2019
68f9d90
fix changelog
emjotde Oct 26, 2019
463b29c
Merged PR 10144: Bye Bye Boost
emjotde Oct 26, 2019
1ab2484
Merge branch 'master' into mjd/syncWithPublic
emjotde Oct 26, 2019
55e3bcf
bump patch and changelog
emjotde Oct 26, 2019
8fc0772
move regression test pointer
emjotde Oct 26, 2019
8653370
tweaks for windows compile
Oct 27, 2019
aff86fd
Update scheduler.cpp
ugermann Oct 27, 2019
ad6f7ff
Changes as requested by @snukky.
ugermann Oct 27, 2019
9f669f9
Merge pull request #442 from marian-nmt/ug-graceful-shutdown
emjotde Oct 27, 2019
b478917
Merge branch 'pmaster' into mjd/syncWithPublic
emjotde Oct 27, 2019
398ed0c
update version and changelog
emjotde Oct 27, 2019
05c24fc
Merged PR 10147: tweaks for windows compile
Oct 27, 2019
321fab9
move sigterm handling out of class Scheduler
emjotde Oct 28, 2019
2dfb302
Merged PR 10148: Update internal master with changes to public master
emjotde Oct 28, 2019
0a89e8f
fix compilation for various gcc and cuda combinations
emjotde Oct 28, 2019
a0e4722
remove compatibility check for gcc 4.9 and below, as we cannot compil…
emjotde Oct 28, 2019
47eb656
back-compat with CUDA 8.0
emjotde Oct 29, 2019
bf10d36
disable FBGEMM by default
emjotde Oct 29, 2019
42250be
Add --authors and --cite flags
snukky Oct 29, 2019
5476f62
add checking for GPU compute capability
emjotde Oct 29, 2019
6b6444d
fix tests for GPUs with lower compute capability
emjotde Oct 29, 2019
3a16eb6
remove left-over debug info
emjotde Oct 29, 2019
880cc5d
make sure that old GPUs do not run cublasGemmEx
emjotde Oct 29, 2019
9d8777b
fix CUDA 8.0 compilation
emjotde Oct 29, 2019
939625f
simplify code for compute capability of matrix products
emjotde Oct 29, 2019
af7df9d
Address comments from review
emjotde Oct 29, 2019
0bb5dd1
Merged PR 10173: Fix all gcc and cuda incompatibilities accros differ…
emjotde Oct 29, 2019
81c14cd
fix compilation error
emjotde Oct 29, 2019
0116646
Update badges
snukky Oct 30, 2019
6d330e3
Merge branch 'master' of https://github.com/marian-nmt/marian-dev int…
snukky Oct 30, 2019
9e8c772
Add a comment to authors()
snukky Oct 30, 2019
b930762
Update regression-tests submodule
snukky Oct 30, 2019
b32e677
Merge pull request #527 from marian-nmt/add-authors-and-cite-flags
snukky Oct 30, 2019
e53a46a
Fix multi-line string
snukky Oct 30, 2019
7b36b32
Use ccache for faster compilation if available. (#525)
ugermann Oct 30, 2019
7efde49
Merged PR 10205: towards multiple separated parameter types
emjotde Oct 30, 2019
8c3cb06
move regression tests
emjotde Oct 30, 2019
5ed441f
Merged PR 9284: Packed model support in production
ykim362 Nov 1, 2019
5050aad
Fix vs project warnings
ykim362 Nov 1, 2019
52c7618
Merged PR 10259: switch to change abort to exception
emjotde Nov 1, 2019
54fba78
update changelog and version
emjotde Nov 1, 2019
e3399ef
merge
emjotde Nov 1, 2019
1abd125
bump version once more
emjotde Nov 1, 2019
7886867
fix unit test
emjotde Nov 1, 2019
baf4d29
make compile for GCC >= 7
emjotde Nov 3, 2019
f042eae
Merged PR 10268: Make compile for GCC >= 7
emjotde Nov 5, 2019
78f671c
Merged PR 10297: Fast option look-up with lazy option contruction
emjotde Nov 5, 2019
7ba804b
bump patch version, update changelog
emjotde Nov 5, 2019
03bb51c
Merge branch 'master' into pmaster
emjotde Nov 5, 2019
233281c
Merged PR 10304: Remove naked pointers, add binary read mode
emjotde Nov 5, 2019
0cb8125
Merge branch 'master' into pmaster
emjotde Nov 5, 2019
a8826c5
make compile with cudnn
emjotde Nov 6, 2019
1171e3d
Merge branch 'master' into pmaster
emjotde Nov 6, 2019
59011c8
added ssse4.2 support
alvations Nov 6, 2019
ca61033
remove reference counting from fastopt
emjotde Nov 9, 2019
13e6182
bump version and changelog
emjotde Nov 9, 2019
66b95a5
Sort custom cmake options alphabetically (#547)
ugermann Nov 11, 2019
fd3404d
Merged PR 10382: Fix cublas math mode querying
emjotde Nov 11, 2019
c96d709
Merged PR 10376: Fix memory-mapping bug for default parameter-object …
emjotde Nov 11, 2019
5dfd8a7
Merged PR 10383: Align items while saving at 256-byte boundary
emjotde Nov 11, 2019
9353f06
Merged PR 10373: Replace IntrusivePtr with std::uniq_ptr in FastOpt
emjotde Nov 11, 2019
189d89e
Merged PR 10333: Batch-pruning in beam search
emjotde Nov 12, 2019
9a4f784
merge with internal master
emjotde Nov 12, 2019
5fb31b2
Merged PR 10415: Fix windows build errors
ykim362 Nov 13, 2019
4c0698f
Const diligence and thread safety with respect to Vocab and Shortlist…
ugermann Nov 22, 2019
d394641
Merge pull request #546 from alvations/marian-from-origin
emjotde Nov 22, 2019
61c0195
Bug fixes (missing const; instantiate *gen_) in SampledShortlistGener…
ugermann Nov 22, 2019
76e2293
Merge branch 'master' into ug-const-diligence
ugermann Nov 22, 2019
a27fda7
Return exit code 15 (SIGTERM) after SIGTERM. (#551)
ugermann Nov 22, 2019
b19820c
Merged PR 10588: bug fix: guided-alignment loss should not normalize …
frankseide Nov 22, 2019
a1763e2
Const diligence and thread safety ... (#553)
ugermann Nov 23, 2019
93b7ed8
Merged PR 10553: Fix multiple problems in reduce kernels that occurre…
emjotde Nov 26, 2019
9e090e3
bump version
emjotde Nov 26, 2019
26859d2
Merge with ms-internal master
emjotde Nov 26, 2019
f07042b
bump version based on PRs
emjotde Nov 26, 2019
e12a5db
regression-tests/
emjotde Nov 26, 2019
120ab8f
increase tolerance for unit test
emjotde Nov 26, 2019
49b54a6
catch n and y as strings in FastOpt instead of as boolean values
emjotde Nov 27, 2019
0197b89
bump version
emjotde Nov 27, 2019
f007772
Update README.md
emjotde Nov 28, 2019
9fd5ba9
Update README.md
emjotde Nov 28, 2019
9c9a240
Merged PR 10266: FBGEMM based Int8 model
ykim362 Dec 3, 2019
183d0b8
Update fbgemm submodule to the master branch
ykim362 Dec 3, 2019
67e9bc4
Fix windows build warning
ykim362 Dec 3, 2019
82da7d5
do not warn for a number of warnings for cpp files from src/3rd_party/..
emjotde Dec 4, 2019
b20d9d7
Merge branch 'master' into pmaster
emjotde Dec 4, 2019
4b4e6b5
Merged PR 10736: Unify options and type names
emjotde Dec 5, 2019
a6d0af0
bump version
emjotde Dec 5, 2019
6b3c8f5
Merge branch 'master' into pmaster
emjotde Dec 5, 2019
34e99da
Fix compilation on CPUs that don't support AVX and some white space i…
XapaJIaMnu Dec 5, 2019
ac20f77
abort when trying to use packed8 or packed16 without FBGEMM compiled …
emjotde Dec 5, 2019
4993417
Merge branch 'master' into pmaster
emjotde Dec 5, 2019
c12fd5b
Merge branch 'master' of ssh://github.com/marian-nmt/marian-dev into …
emjotde Dec 5, 2019
6224cb1
bump patch
emjotde Dec 5, 2019
734a879
Use __AVX__ compiler define instead of custom and broken NO_AVX flag.…
kpu Dec 5, 2019
c343ced
Add lexical shortlists to marian-server (#560)
snukky Dec 6, 2019
eb5f972
Fix word weighting with max length cropping (#562)
snukky Dec 6, 2019
2b14d49
Allow file name templated valid-translation-output files (#549)
ugermann Dec 9, 2019
5be8558
ammend changelog and bump version
emjotde Dec 13, 2019
e0500b2
Merged PR 10827: Sequential unlikelihood training and fixed gather op…
emjotde Dec 13, 2019
eba7aed
increase version
emjotde Dec 13, 2019
bab02e3
Merge branch 'cblas'
emjotde Dec 13, 2019
f882f27
Merged PR 10692: new factor conditioning, inline fixing suppression, …
frankseide Dec 20, 2019
0dc1ef1
Merged PR 10797: Differentiate packed8 type by layout
emjotde Dec 23, 2019
2bd986d
update version and changelog
emjotde Dec 23, 2019
24f062c
Add option to print word-level scores (#501)
snukky Jan 4, 2020
88d9980
Merged PR 10996: A number of smaller changes and clean-up
emjotde Jan 5, 2020
164d26c
Merged PR 10999: Splitting up add_all.h into *.h, *.cu and *.inc
emjotde Jan 6, 2020
0fab6ea
Merged PR 10709: Disable a warning in FBGEMM code. This issue only ap…
ykim362 Jan 11, 2020
af02867
Merged PR 11103: Clear cache for RNN object between batches
emjotde Jan 11, 2020
1f7a63d
bump version
emjotde Jan 11, 2020
703fcf4
update regression test pointer
emjotde Jan 11, 2020
b822cd4
move regression test pointer
emjotde Jan 11, 2020
b3a2310
Merged PR 11188: Handle empty inputs with batch purging
emjotde Jan 17, 2020
cfdde15
Merge branch 'master' into ug-const-diligence
ugermann Jan 29, 2020
7228698
Make Vocab const in beam search. Remove some trailing whitespace.
ugermann Jan 29, 2020
7f4c730
Merge pull request #591 from marian-nmt/ug-const-diligence
ugermann Jan 29, 2020
a43ccd6
Mention sentence cropping for --valid-max-length (#588)
snukky Jan 29, 2020
ad28f99
Add --valid-reset-stalled (#587)
snukky Jan 29, 2020
22ad592
Add error message about missing --no-restore-corpus (#584)
snukky Jan 29, 2020
5336040
Update script exporting embeddings to support tied embeddings (#569)
snukky Jan 29, 2020
990aeb5
Add a warning message for an incorrect use of output sampling (#585)
snukky Jan 31, 2020
1044f7f
Merged PR 11434: Fixes empty line handling with factored segmenter
emjotde Feb 7, 2020
bb44c2a
Remove printout of variable to std::cerr (#596)
ugermann Feb 11, 2020
64a67d5
Remove unused variables (#571)
XapaJIaMnu Feb 11, 2020
a2a567c
clang: Disable unused-function warnings for 3rd-party NCCL library
kpu Feb 11, 2020
0873311
Gate warning on clang
kpu Feb 11, 2020
e09f713
Merged PR 11566: removed overstuff and understuff features
frankseide Feb 15, 2020
63272d1
Merge pull request #599 from marian-nmt/nccl-unused-function
kpu Feb 15, 2020
24df8f1
Revert "Merge pull request #599 from marian-nmt/nccl-unused-function"
emjotde Feb 15, 2020
03fbf31
Combine two for-loops in nth_element.cpp on CPU (#601)
PinzhenChen Feb 26, 2020
67b055f
Update the submodule regression-tests
snukky Feb 26, 2020
00d2e99
Add support for compiling on Mac (and clang) (#598)
snukky Mar 5, 2020
45b83b2
Merged PR 11895: Use lowest() for INVALID_PATH_SCORE
emjotde Mar 7, 2020
015a218
Merged PR 11312: Guard scheduler against circular references
emjotde Mar 7, 2020
9f29403
version and changelog
emjotde Mar 7, 2020
bec7e02
bump version
emjotde Mar 7, 2020
f4ea823
sync with internal branch
emjotde Mar 7, 2020
cf7f032
Merged PR 11920: Compare external master against internal master
emjotde Mar 10, 2020
aad22c9
Add option for printing CMake cached variables (#583)
snukky Mar 10, 2020
8640031
resolve merge conflicts
emjotde Mar 10, 2020
4b23fe7
update to marian-dev
emjotde Mar 10, 2020
3c7a88f
update changelog and version
emjotde Mar 10, 2020
69d6f02
Merged PR 11998: Lazy init for cuda handles (cusparse and cublas)
emjotde Mar 13, 2020
adba021
bump version
emjotde Mar 13, 2020
f1be95f
Merged PR 11929: Move around code to make later comparison with FP16 …
emjotde Mar 14, 2020
32186be
Merge branch 'pmaster'
emjotde Mar 14, 2020
a5a5c62
bump version
emjotde Mar 14, 2020
9ccb075
Add templates for GitHub issues and pull requests
snukky Mar 17, 2020
95c65bb
Update logging messages for training
snukky Mar 17, 2020
d2b4f38
Merged PR 11831: Change the weight matrix quantization to use 7-bit m…
ykim362 Mar 25, 2020
4a1d918
Merged PR 12243: For int8 quantized model, use int8 quantization for …
ykim362 Mar 27, 2020
696bb44
Support tab-separated inputs (#617)
snukky Apr 10, 2020
2248a65
resolve merge conflicts
emjotde Apr 10, 2020
e78a068
bump version
emjotde Apr 10, 2020
a1d2f94
actually save the merge file
emjotde Apr 10, 2020
f561e12
use float values for catch::Approx
emjotde Apr 10, 2020
e6f82f5
Fix TSV training with mini-batch-fit after the last merge
snukky Apr 11, 2020
3126e2b
Update submodule regression-tests
snukky Apr 11, 2020
485a077
fix 0 * nan behavior in concatention
emjotde Apr 11, 2020
d593608
Fix 0 * nan behavior due to using -O3 instead of -OFast (#630)
emjotde Apr 11, 2020
fe0572b
Merge branch 'pmaster'
emjotde Apr 11, 2020
39cea6d
Update submodule regression-tests
snukky Apr 11, 2020
c70d93d
Merge branch 'pmaster'
emjotde Apr 11, 2020
cbb2990
Support relative paths in shortlist and sqlite options (#612)
snukky Apr 12, 2020
18e6a9a
Fix Iris example on CPU (#623)
snukky Apr 12, 2020
81631e8
Dump version
snukky Apr 12, 2020
c0b6686
Merge branch 'pmaster'
emjotde Apr 12, 2020
5af9899
Merged PR 12442: cherry pick a few improvements/fixes from Frank's br…
emjotde Apr 14, 2020
5e21a28
update changelog and version
emjotde Apr 14, 2020
3c0c1e1
python3 shebang from #620 (#621)
kpu Apr 16, 2020
58e316d
Update submodule regression-tests
snukky Apr 26, 2020
f2347a8
Update Simple-WebSocket-Server and move it to submodules (#639)
snukky Apr 27, 2020
d3c8fbd
Merge with multi-source-server
snukky May 1, 2020
455724d
Add function converting multi-line tab-separated textual input
snukky May 1, 2020
11 changes: 9 additions & 2 deletions src/command/marian_server.cpp
@@ -14,22 +14,29 @@ int main(int argc, char **argv) {
// Initialize translation task
auto options = parseOptions(argc, argv, cli::mode::server, true);
auto task = New<TranslateService<BeamSearch>>(options);
auto inputNum = options->get<std::vector<std::string>>("vocabs").size() - 1;

// Initialize web server
WSServer server;
server.config.port = (short)options->get<size_t>("port", 8080);

auto &translate = server.endpoint["^/translate/?$"];

translate.on_message = [&task](Ptr<WSServer::Connection> connection,
translate.on_message = [&task, inputNum](Ptr<WSServer::Connection> connection,
Ptr<WSServer::Message> message) {
// Get input text
auto inputText = message->string();
auto sendStream = std::make_shared<WSServer::SendStream>();

// Translate
timer::Timer timer;
auto outputText = task->run(inputText);
std::vector<std::string> inputs;
if (inputNum >= 2) {
inputs = utils::tsv2lists(inputText, inputNum);
} else {
inputs = std::vector<std::string>({inputText});
}
auto outputText = task->run(inputs);
Review comment (Member):

With this logic std::string run(std::string) is used nowhere, am I right? So either this version of the function can be removed, or the logic here can be something similar to this:

std::string o;
if(inputNum >= 2) {
  o = task->run(utils::tsv2lists(inputText, inputNum));
} else {
  o = task->run(inputText);
}

I think the latter is cleaner, but feel free to object.

LOG(info, "Best translation: {}", outputText);
*sendStream << outputText << std::endl;
LOG(info, "Translation took: {:.5f}s", timer.elapsed());
21 changes: 21 additions & 0 deletions src/common/utils.cpp
@@ -13,6 +13,7 @@
#endif
#include <codecvt>
#include <cwctype>
#include <vector>

namespace marian {
namespace utils {
@@ -93,6 +94,26 @@ std::string join(const std::vector<std::string>& words, const std::string& del /
return ss.str();
}

std::vector<std::string> tsv2lists(const std::string& inputText, int inputNum) {
Review comment (Member):

Please add a comment documenting the function, ideally with an example input and output.

std::string line_;
Review comment (Member):

Please remove the trailing underscore from a local variable.

std::vector<std::vector<std::string>> inputLists(inputNum);
std::istringstream inputStream(inputText);
while (std::getline(inputStream, line_)) {
auto items = marian::utils::splitAny(line_, "\t");
std::cerr << "Split into " << items.size() << std::endl;
Review comment (Member):

Please remove the debug output.

for (size_t i = 0; i < items.size(); ++i) {
inputLists[i].push_back(items[i]);
}
}

std::vector<std::string> inputs;
for (auto &inputList : inputLists) {
inputs.push_back(marian::utils::join(inputList, "\n"));
}
Review comment (Member):

Add a comment explaining why this part is needed.

return inputs;
}


Review comment (Member):

Double whitespace.
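
The review comments on `tsv2lists` above ask for a documented function with an example input and output. A minimal stand-alone sketch of what that could look like follows. It is not the PR's implementation: the name `tsv2listsSketch` is hypothetical, it uses only the standard library instead of `splitAny` and `join`, and padding short lines with empty fields while ignoring extra fields is an assumption made here to keep the outputs line-aligned.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-alone variant of tsv2lists (illustration only).
// Converts multi-line tab-separated text into one newline-joined string
// per input stream, so that line k of every output belongs to the same
// input tuple.
//
// Example with inputNum = 2:
//   input:  "a1\tb1\na2\tb2"
//   output: {"a1\na2", "b1\nb2"}
//
// Assumption: lines with fewer than inputNum fields are padded with empty
// fields, and fields beyond inputNum are ignored.
std::vector<std::string> tsv2listsSketch(const std::string& inputText,
                                         std::size_t inputNum) {
  std::vector<std::string> inputs(inputNum);
  std::istringstream inputStream(inputText);
  std::string line;
  bool firstLine = true;
  while(std::getline(inputStream, line)) {
    std::istringstream fields(line);
    std::string field;
    for(std::size_t i = 0; i < inputNum; ++i) {
      // A failed read means the line ran out of fields: use an empty one.
      if(!std::getline(fields, field, '\t'))
        field.clear();
      if(!firstLine)
        inputs[i] += '\n';
      inputs[i] += field;
    }
    firstLine = false;
  }
  return inputs;
}
```

Bounding the inner loop by `inputNum` rather than by the number of fields found also avoids indexing past the end of the per-input storage when a line contains more tabs than expected.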

// escapes a string for passing to popen, which uses /bin/sh to parse its argument string
static std::string escapeForPOpen(const std::string& arg) {
// e.g. abc -> 'abc'; my file.txt -> 'my file.txt'; $10 -> '$10'; it's -> 'it'\''s'
2 changes: 2 additions & 0 deletions src/common/utils.h
@@ -30,6 +30,8 @@ std::vector<std::string> splitAny(const std::string& line,

std::string join(const std::vector<std::string>& words, const std::string& del = " ");

std::vector<std::string> tsv2lists(const std::string& inputText, int inputNum);

std::string exec(const std::string& cmd, const std::vector<std::string>& args = {}, const std::string& arg = "");

std::pair<std::string, int> hostnameAndProcessId();
10 changes: 8 additions & 2 deletions src/data/text_input.cpp
@@ -23,8 +23,11 @@ const SentenceTuple& TextIterator::dereference() const {
TextInput::TextInput(std::vector<std::string> inputs,
std::vector<Ptr<Vocab>> vocabs,
Ptr<Options> options)
: DatasetBase(inputs, options), vocabs_(vocabs) {
// note: inputs are automatically stored in the inherited variable named paths_, but these are
: DatasetBase(inputs, options),
vocabs_(vocabs),
maxLength_(options_->get<size_t>("max-length")),
maxLengthCrop_(options_->get<bool>("max-length-crop")) {
// note: inputs are automatically stored in the inherited variable named paths_, but these ar
// texts not paths!
for(const auto& text : paths_)
files_.emplace_back(new std::istringstream(text));
@@ -43,6 +46,9 @@ SentenceTuple TextInput::next() {
std::string line;
if(io::getline(dummyStream, line)) {
Words words = vocabs_[i]->encode(line, /*addEOS =*/ true, /*inference =*/ inference_);
if(this->maxLengthCrop_ && words.size() > this->maxLength_) {
words.resize(maxLength_);
}
if(words.empty())
words.push_back(Word::ZERO); // @TODO: What is this for? @BUGBUG: addEOS=true, so this can never happen, right?
Review comment (Member):

I think the EOS token needs to be added after cropping the sentence, similar to https://github.com/marian-nmt/marian-dev/blob/master/src/data/corpus_base.cpp#L211.
Then that comment with TODO can be removed.
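
One way to read the suggestion above is to crop first and then make sure the last kept position holds the EOS token. The sketch below illustrates that idea in isolation. `WordId`, `cropKeepingEos`, and the integer EOS id are hypothetical stand-ins, not marian types or functions, and overwriting the last kept token (rather than appending EOS) is an assumption chosen here so the result never exceeds the length limit.

```cpp
#include <cstddef>
#include <vector>

using WordId = int;  // hypothetical stand-in for marian's Word type

// Crops an already encoded sentence (which ends in EOS) to at most
// maxLength tokens and restores EOS in the last kept position.
// Sentences within the limit are returned unchanged.
std::vector<WordId> cropKeepingEos(std::vector<WordId> words,
                                   std::size_t maxLength,
                                   WordId eosId) {
  if(maxLength > 0 && words.size() > maxLength) {
    words.resize(maxLength);  // drop everything past the limit
    words.back() = eosId;     // the dropped tail contained the original EOS
  }
  return words;
}
```

With the limit set to 3 and EOS id 0, the sequence `{5, 6, 7, 8, 0}` becomes `{5, 6, 0}`. Since the result still ends in EOS and is never empty, the `words.empty()` fallback would no longer be reachable for cropped sentences.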

tup.push_back(words);
3 changes: 3 additions & 0 deletions src/data/text_input.h
@@ -33,6 +33,9 @@ class TextInput : public DatasetBase<SentenceTuple, TextIterator, CorpusBatch> {

size_t pos_{0};

size_t maxLength_{0};
bool maxLengthCrop_{false};

public:
typedef SentenceTuple Sample;

7 changes: 7 additions & 0 deletions src/models/model_task.h
@@ -1,6 +1,9 @@
#pragma once

#include <string>
#include <vector>

#include "common/logging.h"

namespace marian {

@@ -10,5 +13,9 @@ struct ModelTask {

struct ModelServiceTask {
virtual std::string run(const std::string&) = 0;
virtual std::string run(const std::vector<std::string>&) {
ABORT("Not implemented");
Review comment (Member):

Why not a pure virtual function?

return "";
}
};
} // namespace marian
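
The reviewer's question about a pure virtual function can be illustrated with a small stand-alone sketch. The types `ServiceTaskSketch` and `EchoService` are hypothetical and only mirror the shape of `ModelServiceTask`. The point shown is the trade-off: with both overloads declared pure virtual, a subclass that forgets the multi-input overload fails to compile, whereas a default body that calls `ABORT` only fails when the overload is first called at run time.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical interface mirroring ModelServiceTask, with both overloads
// declared pure virtual so every service must implement them.
struct ServiceTaskSketch {
  virtual ~ServiceTaskSketch() = default;
  virtual std::string run(const std::string&) = 0;
  virtual std::string run(const std::vector<std::string>&) = 0;
};

// Toy service: joins its inputs with tabs instead of translating them.
struct EchoService : ServiceTaskSketch {
  // The single-input overload forwards to the multi-input one, as the
  // TranslateService change in this PR does.
  std::string run(const std::string& input) override {
    return run(std::vector<std::string>({input}));
  }

  std::string run(const std::vector<std::string>& inputs) override {
    std::string out;
    for(std::size_t i = 0; i < inputs.size(); ++i) {
      if(i > 0)
        out += '\t';
      out += inputs[i];
    }
    return out;
  }
};
```

The cost of the pure virtual variant is that every existing subclass has to be updated at once, which may be why the PR chose a default body instead.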
9 changes: 8 additions & 1 deletion src/translator/translator.h
@@ -1,5 +1,8 @@
#pragma once

#include <iostream>
#include <string>

#include "data/batch_generator.h"
#include "data/corpus.h"
#include "data/shortlist.h"
@@ -189,7 +192,11 @@ class TranslateService : public ModelServiceTask {
}

std::string run(const std::string& input) override {
auto corpus_ = New<data::TextInput>(std::vector<std::string>({input}), srcVocabs_, options_);
return run(std::vector<std::string>({input}));
}

std::string run(const std::vector<std::string>& inputs) override {
auto corpus_ = New<data::TextInput>(inputs, srcVocabs_, options_);
data::BatchGenerator<data::TextInput> batchGenerator(corpus_, options_);

auto collector = New<StringCollector>();