At about 70 client connections the client/server do not show/list all clients anymore #547

Closed
corrados opened this issue Aug 27, 2020 · 63 comments
Labels
bug Something isn't working

Comments

@corrados
Contributor

See the report in this post: #455 (comment)

@corrados corrados added the bug label on Aug 27, 2020
@corrados
Contributor Author

I did a quick test where I created 100 artificial clients with the following code in socket.cpp:

// server:

int iCurChanID;

// TEST: simulate 100 clients by incrementing the source port
for ( int i = 0; i < 100; i++ )
{
    if ( i > 0 )
    {
        RecHostAddr.iPort++;
    }

    if ( pServer->PutAudioData ( vecbyRecBuf, iNumBytesRead, RecHostAddr, iCurChanID ) )
    {
        // we have a new connection, emit a signal
        emit NewConnection ( iCurChanID, RecHostAddr );

        // this was an audio packet, start server if it is in sleep mode
        if ( !pServer->IsRunning() )
        {
            // (note that Qt will delete the event object when done)
            QCoreApplication::postEvent ( pServer,
                new CCustomEvent ( MS_PACKET_RECEIVED, 0, 0 ) );
        }
    }
}

But I cannot reproduce the issue. I could see all 100 faders in the client, and the server table was also complete. Maybe the problem was caused by the CPU being at 100%, which can cause weird effects.

@storeilly

We were doing undocumented tests last night on a server I've compiled to allow 255 clients and had similar issues: on one of my clients I could see 102 clients while the other was stuck at 53. I was watching htop and no CPU (of 4 total) was over 50%. I plan to do this test properly, with properly recorded results, over the next few days. Audio at 102 clients was badly broken but intelligible.

@maallyn

maallyn commented Aug 27, 2020

I just completed the following test and observation.

First, I started with 30 connections on the server. The client could see those 30 connections.

Then I added another 20 clients slowly, so as to not overrun the server with fast connection requests; I did each one every 5 seconds.

Then someone else came on (a client with a different name).

However, with 50 connections, I could not see that person on my Jamulus client (Windows 10), though I could see him on the master list as viewed with my connection status panel.

I then reduced the number of connections (by stopping each) until I got down to about 25 connections. Now I could see all of them without having to scroll sideways. However, I still could not see the one client whom I was talking with.

I then stopped and re-started the client and I did see him.

This suggests to me that there may be some data stuck in the client that had to be reset upon a restart of the client.

Throughout all of this, the sound from that one client was very good despite the fact that I could not see his fader on my panel.

Throughout all of this, there were only occasional instances when any of the four dedicated CPUs went over fifty percent.

@corrados
Contributor Author

@softins Maybe the issue is related to #255. The client list for the audio mixer board gets quite big with 100 clients (similar to the server list). Could you support us with a Wireshark analysis of that situation? Maybe if a test is running with more than 50 clients, you could log onto that session and check the protocol messages sent to your client?

@softins
Member

softins commented Aug 29, 2020

Sure, I'd be happy to help, although not available today (Sat).

@softins
Member

softins commented Aug 29, 2020

Just a couple of general comments before I disappear for the day:

  • If some device in the path were causing fragmented IP packets to be dropped, then the whole list would be lost, not partially received (like when a client sees no servers in Default or All Genres). So it looks to me more likely that something got out of sync between the protocol and the GUI. That is also supported by Mark's comment: "However, with 50 connections, I could not see that person on my Jamulus client (Windows 10), though I could see him on the master list as viewed with my connection status panel."
  • In the situation where one or more clients are not shown to another client, but are evidently connected, one can cross-check with the display from http://jamulus.softins.co.uk

@corrados
Contributor Author

So it looks to me more likely that something got out of sync between the protocol and the GUI.

The protocol transmits messages one by one, in the order they are scheduled. If one message does not make it through to the client because of fragmentation, the protocol mechanism will be stuck retransmitting that one message and will not process any further messages. I think that would explain the described behaviour (or rather, I guess it would; it would therefore be good to get proof of that by checking the network traffic with Wireshark).
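
The blocking behaviour described here can be sketched in a few lines (a minimal illustration with hypothetical names, not the actual Jamulus protocol classes): the head of the queue is retransmitted until it is acknowledged, so one undeliverable message stalls everything queued behind it.

```cpp
#include <cassert>
#include <deque>
#include <string>

// Sketch of a strictly ordered message queue: only an acknowledgement of
// the head message releases the next one, so a message that never gets
// through blocks the whole queue.
class CMessageQueue
{
public:
    void Enqueue ( const std::string& strMsg ) { queMsgs.push_back ( strMsg ); }

    // message to (re)transmit next; empty string if nothing is pending
    std::string NextTransmission() const
    {
        return queMsgs.empty() ? std::string() : queMsgs.front();
    }

    // only an acknowledgement releases the next message in the queue
    void OnAcknowledged()
    {
        if ( !queMsgs.empty() )
        {
            queMsgs.pop_front();
        }
    }

private:
    std::deque<std::string> queMsgs;
};
```

With this model, a "client list" message that is never acknowledged means the "mixer levels" message behind it is never even transmitted, which matches the partially updated GUI reported above.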

@maallyn

maallyn commented Aug 29, 2020

Okay, thank you for the suggestion, Simon. Sometime next week when I have time, I will re-set up everything, install Wireshark on both the server running the clients and the Jamulus server itself, and somehow capture the screens and record them as a video. I have a question: can I attach a video, or at least a link to a video, to this ticket?

Or better yet, can I set up a jitsi or zoom session and show the stuff live?

@corrados
Contributor Author

Thanks for offering the test. I guess it would be easiest if you talk to softins when you start your test so that he can connect to your server and capture the Wireshark output on his side. So you should do the test when both you and softins have time for it.

@softins
Member

softins commented Aug 29, 2020

Hi Mark, there's no Simon on this thread. I'm Tony (also on Facebook). Happy to liaise with you in the week. I'm on UK time, I think you are EDT? If you haven't seen it, check out my repo https://github.com/softins/jamulus-wireshark

@maallyn

maallyn commented Aug 30, 2020

I'm embarrassed that I made this mistake. I keep thinking that Corrados is Simon. Thank you for straightening me out. If you are in UK, then the best time for me would be my morning, which is your late afternoon. So, if I try to do this around nine in the morning Pacific U.S. Time, that would be six in the evening for you, is that an okay time?

@corrados
Contributor Author

I was looking at the original error description again. Here's what I found:

And the client slider panel stopped showing sliders after about 60 or so connections. However, the total count (the number indicated adjacent to the server name in the master server listings) showed the total count (81), along with the total number at the top of the client's panel.

I checked in the client code that the total count (in this case 81) is only shown if a client list protocol message for the audio mixer board was correctly and completely received. So maybe the issue is not related to #255. But I think it still makes sense to check the protocol under this stress situation anyway, to confirm that everything works as expected.

@softins
Member

softins commented Aug 30, 2020

If you are in UK, then the best time for me would be my morning, which is your late afternoon. So, if I try to do this around nine in the morning Pacific U.S. Time, that would be six in the evening for you, is that an okay time?

Except for the week straddling Oct/Nov (we end DST a week earlier than the US), we are 8 hours ahead of Pacific time, so your 9am is UK 5pm. That would be fine for me. I normally go to eat about 6pm or so. I think any day this week works for me. You can find me on Discord as Tony M or on Facebook Messenger as Tony Mountifield

@maallyn

maallyn commented Sep 1, 2020

I completed a test where I had 60 real connections (not using Volker's modification to the socket.cpp file). Those 60 connections were made from another server in the cloud.

Then I made a connection from my client, which is under a different name and could be identified on the mixer panel.

I confirmed, using the latest download for Windows 10, that it could be seen at the far right end of the mixer, with the bottom slider all the way to the right.

Unfortunately, I cannot create over 60 clients from an Ubuntu server because the JACK software starts to break down, which is an unrelated issue.

@corrados
Contributor Author

corrados commented Sep 2, 2020

We were doing undocumented tests last night on a server I've compiled to allow 255 clients and had similar issues: on one of my clients I could see 102 clients while the other was stuck at 53.

@storeilly Today I did some multithreading tests on your "jam mt 26" server and could reproduce the issue. And I think I now know what the issue is. If you quickly start a massive number of clients, we get huge protocol traffic to the clients. For each new client that connects to the server, the complete client list is updated and the mixer levels are also updated immediately, so there is a massive number of protocol messages in the queue. According to the Jamulus protocol MAIN FRAME design, there is a "1 byte cnt" which identifies the messages in the queue, so we can have 256 different messages. If that massive number of messages is transmitted in a very short time, the order of the received network packets can change, and a packet that should be processed 256 protocol messages later may arrive early. Since the counter wraps around, the protocol mechanism thinks it has received the correct message and acknowledges it, but it was the incorrect one (caused by the wrap-around). If that happens, the protocol system gets stuck and no message is delivered anymore.

One solution to the problem would be to use 2 bytes instead of one for the counter. But that would break compatibility with old Jamulus versions (client and server), which is not good.

@softins Do you think with your wireshark tools, you could prove that my assumption is true?
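
The wrap-around can be illustrated in a couple of lines (a hypothetical helper for illustration, not Jamulus code): with a 1-byte counter, two messages scheduled 256 queue positions apart carry exactly the same counter value, so an acknowledgement for one is indistinguishable from an acknowledgement for the other.

```cpp
#include <cassert>
#include <cstdint>

// The "1 byte cnt" field can only distinguish 256 messages: the counter is
// effectively the queue position truncated to 8 bits.
inline uint8_t MessageCounter ( int iQueuePosition )
{
    return static_cast<uint8_t> ( iQueuePosition % 256 );
}
```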

@storeilly

I see from the logs that you were connecting at about 40 per second at one stage. Well done! I don't have the ability to do that! I was working with @brynalf and we were leaving 5 seconds between each new connection; not sure if this helps?

@storeilly

We were doing undocumented tests last night on a server I've compiled to allow 255 clients and had similar issues, on one of my clients I could see 102 clients while the other was stuck at 53.

@storeilly Today I did some multithreading tests on your "jam mt 26" server and could reproduce the issue. And I think I now know what the issue is.......
@softins Do you think with your wireshark tools, you could prove that my assumption is true?

I've just installed Tshark on that machine, so if you want to 'hit' it again, just give me a little notice to start the capture and we can send it to @softins or yourself for analysis.

@softins
Member

softins commented Sep 2, 2020

Just seen this. Happy to help. I usually capture using tcpdump rather than tshark, and then copy to my PC for viewing.

I have a script that sets up tcpdump to capture just protocol packets without the audio - makes the files a lot smaller!

[root@vps2 ~]# cat capture-jamulus-proto.sh 
#!/bin/sh

# timestamped capture file name
DATE=`date '+%Y%m%d-%H%M%S'`
FILE=jamulus-proto-$DATE.pkt

cd /var/tmp

# Capture only protocol packets on the usual Jamulus port range: the BPF
# expression keeps packets that look like Jamulus protocol frames (zero tag
# word and a self-consistent embedded length field) rather than audio.
tcpdump -C 128 -i eth0 -nn -p -s0 -w $FILE udp portrange 22120-22139 and '(udp[8:2] == 0 and udp[4:2]-17 == (udp[14]<<8)+udp[13])' </dev/null >/dev/null 2>&1 &

You may need to change the interface name from eth0 depending on your system.

@softins
Member

softins commented Sep 2, 2020

@corrados It's certainly possible that the 8-bit sequence number rolled over. Happy to verify that if I can reproduce it or be sent a capture file.

If that is what is happening, then maybe when sending to a client, it could pause before using a sequence number that is still unacked from the previous time, and wait until the ack comes in? This situation would seldom occur in practice, so on the rare occasion it does, a few ms pause would be tolerable, I would think.

I haven't looked at the code, but that is how I would initially approach it.
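
That approach could be sketched roughly like this (hypothetical names and structure, not based on the actual code): track which of the 256 sequence numbers are still unacknowledged, and refuse to hand out a number that is still in flight, so the sender pauses instead of reusing a counter value that could be falsely acknowledged.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Guard against reusing a sequence number that is still unacked.
class CSeqGuard
{
public:
    // returns false if the next number is still in flight (caller waits)
    bool TryAllocate ( uint8_t& bySeq )
    {
        if ( bsUnacked.test ( byNext ) )
        {
            return false; // pause until the old acknowledgement arrives
        }
        bySeq = byNext;
        bsUnacked.set ( byNext );
        byNext = static_cast<uint8_t> ( byNext + 1 );
        return true;
    }

    void OnAck ( uint8_t bySeq ) { bsUnacked.reset ( bySeq ); }

private:
    std::bitset<256> bsUnacked;
    uint8_t          byNext = 0;
};
```

As noted above, the pause would only trigger when 256 messages are in flight at once, so the cost in practice should be a few milliseconds on rare occasions.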

@softins
Member

softins commented Sep 2, 2020

The Jamulus dissector for Wireshark is a single .lua file, available at https://github.com/softins/jamulus-wireshark

@corrados
Contributor Author

corrados commented Sep 3, 2020

Thanks for all your support. By looking at the code I have found a possible bug in the protocol mechanism. Hopefully I'll get some time this evening to investigate further. I'll keep you informed of my progress.

@maallyn

maallyn commented Sep 3, 2020

Folks:

Here is what I was finally able to do to overcome the restrictions of running 60 clients on Jackd.

I was able to create a second user on the Ubuntu server that I am using to send the clients to newark-music.allyn.com, and that new user could also send 60 clients over to newark-music, as it had its own instance of jackd using a separate aloop audio device.

I configured the script to wait 2 seconds between each client invocation so as not to overrun the server.

After about 70 to 80 clients were sent to the server, I noticed that the listing from the master server started to have a big gap of empty lines in the listing for newark-music, before the listing for the next Jamulus server. The total count listed at the top (next to the server listing itself) did say 100, but apparently not all 100 are listed; those after about 70 or so had a blank line.

I re-ran the scenario with no delay between client invocations and the effect was the same. So to me, it does not seem to be a speed-overrun issue.

I also noticed that I could not do a systemctl restart of the Jamulus server; I had to do a full reboot of the machine.

After I rebooted the machine, the master server's listing of the newark server remained stuck for a full three or four minutes until it finally reset to 0, at about the time the newark server finished its reboot. So that seems to indicate that the master server does not correct the listing for a while after my server was rebooted.

I am wondering if the issue is with the master server being overloaded. I checked the logs on the newark server and I saw no error indications.

All of these tests have been made with no music sent on any of the clients.

I hope this all helps.

Mark Allyn
Bellingham, Washington

@corrados
Contributor Author

corrados commented Sep 3, 2020

By looking at the code I have found a possible bug in the protocol mechanism.

Unfortunately it turned out not to be a bug.

Since the counter wraps around, the protocol mechanism thinks it has received the correct message and acknowledges it

Having checked some things today, I also no longer think that this is the case.

I re-ran the scenario with no delay between client invocations and the effect was the same. So to me, it does not seem to be a speed-overrun issue.

That is interesting. I just ran a set of tests this evening and found the opposite. When I start the clients without a delay, there is a threshold of 58 clients beyond which the server gets confused and all sorts of strange things happen. If I put a delay of about a second after the creation of each test client, I can start more than 58 and do not see the issue.

I'll further investigate the issue...

@maallyn

maallyn commented Sep 3, 2020

Just out of curiosity, I slowed down the script so that it issued a client connection once every 20 seconds. This got interesting. The missing connections in the listing from the master server got fewer. It made it to about 80 connections (instead of upper 60s/low 70s). But there was still a gap in the listing, and I could not connect my own client from my PC after we had 90 connections (the server has a capacity of 100).

At 45 connections, I then initiated my own connection from my PC and was able to hear myself. However, after about 75 connections, my return sound was very warbly and distorted. I checked htop and found no CPUs hitting over 80 percent, and I could see all four CPUs engaged. This is a dedicated CPU instance on Linode/Newark.

I check network and disk utilization on the Linode dashboard and the network never hit more than about 7 MB outbound. Disk and memory were only nominal.

I am wondering if we have something both performance-related (the ability to handle fast multiple connections) and functional, since slowing connection requests to one per 20 seconds reduced, but did not eliminate, the issues.

This entire session would have resulted in too big a file if I had run tcpdump.

I hope this all helps; if there anything I can try to do more, please let me know.

Mark

@maallyn

maallyn commented Sep 4, 2020

How do I get and compile your change? Is a new git sync enough, or do I need a tag or CONFIG?

@corrados
Contributor Author

corrados commented Sep 4, 2020

If you are in Git master, a git pull should be sufficient to get the latest changes.

@storeilly

The network only hit 80MB at about 21:30 last night (the resolution drops on AWS as time progresses). The network capacity is 7TB so I doubt that is the issue. I'll build that commit shortly. Thanks @corrados

@softins
Member

softins commented Sep 8, 2020

I noticed a test was done last night, so I uploaded two files to Dropbox: jamulus-proto-20200907-112301.pkt and jamulus10.log

There was nothing of interest in that packet file. Just a short-lived connection from a client in Malaysia at around 17:43 UTC yesterday.

@softins
Member

softins commented Sep 8, 2020

However, I did notice something interesting. When I deliberately slowed down the generation of clients to one every 20 seconds, I still saw the issue with missing clients (and distorted sound) after about 70 client connections. The display at jamulus.softins.co.uk worked fine with automatic updating until about 75 clients and then it stopped and showed newark-music.allyn.com with no clients.

@maallyn Now that's interesting, and I'd like to observe that. I now have a tcpdump on the backend of Jamulus Explorer (capturing specific IP addresses including newark-music), and a corresponding one on newark-music (capturing only traffic with my server and with client.allyn.com). If you can rerun your big test when convenient, I'll look at the traces. Please ping me on discord before you do, so I can make sure I am watching Explorer

@softins
Member

softins commented Sep 8, 2020

OK, I have looked at the packet traces from both newark-music and jamulus.softins while you did the test of one new client every 20 seconds. As we saw, jamulus.softins stopped displaying the Version/OS and client list for newark-music once the number of clients reached 62. This is partly due to the design of the Jamulus Explorer backend: it sends out all its pings to the servers in the server list, and when it gets a ping back it sends a version/OS request and a client list request. Because some servers will not respond, it needs to wait until it has received no packets for a certain length of time. This idle timeout is currently 1.5 seconds, which is usually plenty, after which it sends the accumulated data back to the Jamulus Explorer front end. Increasing the idle timeout makes the front end take longer to display when switching genres.

But when the number of clients in the jamulus server reaches a threshold, the delay in responding starts to increase disproportionately, making the replies too late to be caught by the jamulus explorer client. This is what I observed in this test:

Clients   Delay in responding to CLM ping or request
  61      ~0.8s
  62      ~2s
  63      ~4s
  64      ~6s
  65      ~8s
  66      ~10s

I looked at the other traffic at the time, and while a lot of it is taken up with sending level lists and client lists to the connected clients, there are still a lot of gaps, indicating that it is not due to network saturation. It is interesting to see how the delay increases so much.

@softins
Member

softins commented Sep 8, 2020

@maallyn On client.allyn.com, I have also made an updated version of your junker script, as junker2, to give the clients individual names:

#!/bin/bash
# launch 46 headless test clients, each with its own ini file and name
for i in {1..46}
do
  sleep 20
  NAME=`echo -n Test $i | base64`
  INIFILE=".jamulus$i.ini"
  echo "<client><name_base64>$NAME</name_base64></client>" >$INIFILE
  /home/maallyn/jamulus/Jamulus -i $INIFILE -j -n --connect 172.104.29.25 >/dev/null 2>&1 &
done

@maallyn

maallyn commented Sep 9, 2020 via email

@softins
Member

softins commented Sep 9, 2020

If replying to a GitHub message by email, you need to avoid quoting the message being replied to! I discovered this myself the other day.

@maallyn

maallyn commented Sep 10, 2020

I just did a test using the feature_protosplit branch.

Here is what I did on newark-music.allyn.com to do the build for both client and server:

==================================
#!/bin/bash
cd /home/maallyn
rm -rf jamulus
git clone https://github.com/corrados/jamulus.git
cd jamulus
git fetch --all --tags
git tag
git checkout feature_protosplit

# do any local changes here
git apply /home/maallyn/max-client.patch

# move the tree to the client machine
cd /home/maallyn
ssh maallyn@client.allyn.com rm -rf /home/maallyn/jamulus
rsync -av jamulus maallyn@client.allyn.com:

# do the compile on the client machine
ssh maallyn@client.allyn.com rm -f client_compile.sh
rm -f client_compile.sh

cat << 'EOF' > client_compile.sh
#!/bin/bash

cd jamulus
qmake Jamulus.pro
make clean
make
EOF

chmod +x client_compile.sh
scp client_compile.sh maallyn@client.allyn.com:client_compile.sh
ssh maallyn@client.allyn.com /bin/bash client_compile.sh

# do the server compile here (nosound build)
cd jamulus
server_qmake="CONFIG+=nosound"

qmake $server_qmake Jamulus.pro
make clean
make

==========================================================

The first set of 46 clients launched okay.

However after about client 28 on the 2nd set, the server stopped reporting and the output on jamulus.softins.co.uk collapsed.

When I did a killall, the server did restore proper operation without having to reboot or restart.

@corrados
Contributor Author

Do you run the test clients and the server on the same PC?

@maallyn

maallyn commented Sep 12, 2020

To Volker: I have two machines in the cloud in the same data center. One runs the server. The other runs the clients which are run via vncserver. Both are in the Linode Newark data center. If you need to have access and look around, I can install your ssh key in them. Tony already has access.

@corrados
Contributor Author

The first set of 46 clients launched okay. [...] However after about client 28 on the 2nd set, the server stopped reporting and the output on jamulus.softins.co.uk collapsed.

That is interesting. In my test today I could run about 70 clients on storeilly's server and it still worked with good audio quality. The question is why your server holds so many fewer clients...

@softins
Member

softins commented Sep 13, 2020

The first set of 46 clients launched okay. [...] However after about client 28 on the 2nd set, the server stopped reporting and the output on jamulus.softins.co.uk collapsed.

Note that there are limitations with Jamulus Explorer:

  1. It only uses connectionless (CLM_xxx) messages, and these haven't had the split extension applied yet. But in any case, the back-end has no problem with large UDP fragmented messages.
  2. As mentioned in my comment above, once a Jamulus server has more than a certain number of clients connected (61 in the instance I observed), it starts to take exponentially longer to respond to CLM_xxx messages, which causes the Jamulus Explorer back-end to time out before it receives the replies. I could increase the timeout, but that would make it longer to respond for everyone, as it needs to wait until the timeout to decide there are no more replies to be received.

This exponential delay in the Jamulus server responding is a separate issue that will need investigation at some point.

@maallyn

maallyn commented Sep 14, 2020

Tony:
If this is the case, does it explain why the audio degrades after about 60 to 70 client connections? And since most Jamulus use cases (jamming with instruments in small groups) do not approach 60 users, is Jamulus, both client and server as they are now, perfectly adequate for the vast majority of cases, including our own WorldJam, where even the waiting room rarely touches 40 clients?

The one major exception would be for choirs and perhaps a large orchestra.

I have been trying to sell this to my choirs, one of which has 50 voices and the other 130.

However, I am beginning to feel that I should not be pushing my choir members to use a client desktop that looks like a sound mixer and may be intimidating to members of my church/chorus who are techno-phobic and just want something simple and plug-and-play to participate, like Zoom. I have already had strong pushback from other members of my Unitarian Fellowship's audio-visual and tech committee, and I am beginning to agree with them.

If there were a very simple client, without the faders and the VU meters, would the traffic handled by the server be lower, and would the issue you are seeing with 60 to 70 plus connections go away?

@softins
Member

softins commented Sep 15, 2020

It may be related. I did some tests yesterday between separate client and server machines on my LAN, while monitoring the interface data rates using SNMP. I found the bandwidth usage on a server increased linearly with the number of clients, which makes sense, since it sends and receives one stream to/from each client. I only went up to 21 clients, so wasn't pushing the limits - I would need more client machines than just my Raspberry Pi to really exercise the server.

However, thinking about it, the demands placed on the system by the mixing will increase as the square of the number of clients: with N clients connected, and each client having their own separate mix generated, the server will be producing N mixes, each from N streams. So as the number of clients increases, there will come a point where it quickly degrades and can't keep up. That will depend on the power of the hardware. As I understand it (I'm still trying to fully understand the code structure), there is a high priority thread that handles the audio mixing, and a lower priority thread that is responsible for handling the protocol. I don't yet know how that is affected by the multi-threading enhancements.

I'm not sure that a simpler client would reduce the traffic enough to be significant, when compared to the bandwidth consumed by the actual audio, and the N² factor on the mixing. However, I can certainly see the benefit of such a client for user friendliness, where the users do not need/want to have their own fine control of a large number of participants.
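
The N² factor can be made concrete with a tiny cost model (an illustration of the reasoning above, not measured Jamulus behaviour): each of the N personal mixes sums the streams of the other N - 1 clients.

```cpp
#include <cassert>

// Per-frame mix summations for N connected clients: N personal mixes,
// each combining the other N - 1 streams, i.e. N * (N - 1) ~ N^2 work.
inline long MixSummations ( long lNumClients )
{
    return lNumClients * ( lNumClients - 1 );
}
```

This is why bandwidth scales linearly with clients (one stream each way per client) while CPU load for mixing does not.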

@softins
Member

softins commented Sep 15, 2020

Maybe for a choir kind of usage, we need some kind of architecture where there is a generic mix produced, that can be controlled by one person, but that is then sent identically to most of the clients. That would reduce both the mixing load and the technical usability burden on those users. There could still also be clients that have their own custom mix.

But these would be ideas for Jamulus V4.

@maallyn

maallyn commented Sep 15, 2020

Tony:
I agree that large choirs should wait for V4.
Considering that, could this ticket be a candidate for V4 rather than the current Jamulus? It seems to me that the current use case (jams of small groups) is supported well (witness the World Jam), where we have about 40 people max at once (in the waiting rooms) with no problem.

@storeilly

Maybe for a choir kind of usage, we need some kind of architecture where there is a generic mix produced, that can be controlled by one person, but that is then sent identically to most of the clients. That would reduce both the mixing load and the technical usability burden on those users. There could still also be clients that have their own custom mix.

But these would be ideas for Jamulus V4.

Even large choirs will need to mix their individual signal with the "supplied" mix, so does that still need N × N mixes? I've been under pressure and haven't had time to contribute for the last two weeks, but I am still very much engaged. I don't think we've figured out yet why the system creaks without peaking on the CPUs or network! I don't think the choirs can wait for V4. Mine can't anyhow; we will make something work. Please don't drop this!

@maallyn

maallyn commented Sep 16, 2020

I was thinking about a setup where the mix would be done by the choir director. He or she would have the 'normal' Jamulus client with all of the faders. Then each choir member would have a simple client with just a master volume control. In fact, it could be 'pre-provisioned' so that it would 'know' which server to connect to, which would be even better for those choir members who are afraid of technology. There are some in my two choirs who are almost afraid just to turn the computer on.
In addition, since only the choir master would have the mix panel, there would be mix-related traffic for that one client alone, which I hope would mitigate the issue of too much traffic handled by the server.

@corrados
Contributor Author

I don't think the choirs can wait for V4. Mine can't anyhow; we will make something work.

Have you tried fast server hardware with Jamulus multithreading enabled and all musicians using Mono mode? If the musicians do not all connect at the same time (which is usually not the case), the protocol traffic should be low enough for a session even if a lot of clients are connected.

@corrados
Contributor Author

Maybe for a choir kind of usage, we need some kind of architecture where there is a generic mix produced, that can be controlled by one person, but that is then sent identically to most of the clients.

Jamulus has different modes, like "Small network buffers" and Mono/Stereo. To keep such a proposed implementation simple, would it be an option that your new mode only allows large buffers (128 samples) and Mono? That would make the implementation easier, so that the one master client only has to generate and store a single block of OPUS-coded audio data.

@kraney

kraney commented Sep 19, 2020

I'd just like to chime in that I'm similarly trying to get this going for high school bands during COVID restrictions, and I also would benefit from being able to have a "director-controlled" mix, with all clients receiving the same mix. That potentially eliminates the n^2 problem as well as simplifying the user interface for those that don't need a mix, and possibly reducing the amount of info each client needs to get about the other channels, so smaller messages.

I'm hoping to be able to contribute some help, but I'm just ramping up on the code so I don't have a ton to offer yet.

Just to brainstorm a possible approach, possibly there could be two server UDP ports. One is a "director" port and functions as today. The other is a "member" port, which is included in the mix but simply gets back a verbatim replica of one of the "director" mixes. No custom mix if you're connected on the member port. If more than one person has joined via the director port, a member could perhaps select which of those mixes they use. The number of directors could be limited to the old max, like 15 or 20 or whatever, while members could go much higher.

Another option would be to just have one person be the director, and maybe let that person "pass the baton" (if they choose) to a different person, who then becomes the one-and-only director. Everyone else just gets a replica mix. Or you could password-protect the director port. But these options diverge further from the existing mode.

In my case at least, locking to 128-byte buffers would most likely be acceptable. I think I favor Mono-in/Stereo-out, if I understand its function correctly (so that the director can spread members across the stereo field), but the specific mode would not be a dealbreaker in any case.

@kraney
Copy link

kraney commented Sep 19, 2020

I ran some tests with 40 clients, which worked fine, but adding even a few more caused issues: the last clients had no name information and poor audio. The cutoff was pretty consistently at 40, which first led me to suspect that adding a third thread caused the issues. But doing the math on the ConnClientsList message, 40 clients is also right around where that packet would reach MTU size.

One plausible hypothesis for why you're seeing failures start at different client counts: @maallyn, you are setting client names, right? @corrados, are you? That message is variable length, so longer client names push it past the MTU sooner. At least in my case I don't think IP fragments survive the journey.

@corrados
Copy link
Contributor Author

Just to brainstorm a possible approach, possibly there could be two server UDP ports. One is a "director" port and functions as today.

I do not want to make big changes to Jamulus to support this. Here is my specification, which should be quite easy to implement by just adding a new command line argument to the server, e.g. --singlemix:

  • No multithreading (since there is only one mix and one encode, multithreading is not needed)
  • Only 128 sample frame size support
  • Only Mono support (this gives the highest possible number of connected clients, which is what this modification is all about)
  • The first client to connect to the server is the "director"; all clients that connect afterwards get his mix. So you just have to make sure that the director connects to the server before your session begins (this requirement should be very easy to fulfill).

There is a vecvecbyCodedData buffer which is used for both encoding and decoding. I'll introduce a separate buffer so that I can re-use the output buffer of the first client for all other clients. So instead of calling MixEncodeTransmitData for all the other clients, they simply get vecChannels[iCurChanID].PrepAndSendPacket ( &Socket, vecvecbyCodedData[iDirectorID], iCeltNumCodedBytes );.

I just did a quick hack: if I modify CreateChannelList so that no client is added, the audio mixer panel is simply empty. This would be the case for the slave clients, but then they cannot see how many clients are currently connected, which is not a big issue. If "--singlemix" is given, "-F" and "-T" are deactivated and a warning is shown that these options cannot be combined. In the OnNetTranspPropsReceived function we can check that the client uses 128 samples and, if not, refuse the connection.

@corrados
Copy link
Contributor Author

I just created a new branch for this: https://github.com/corrados/jamulus/tree/feature_singlemixserver

@corrados
Copy link
Contributor Author

I just created a new Issue for that: #599

@corrados
Copy link
Contributor Author

@kraney You can start testing now. The current version of the code does not implement any checks for correct client settings. So you have to make sure:

  • all clients use mono and also the same quality setting, e.g. "normal"

All clients see the full mixer panel but only the first connected client actually controls the mix. If the other clients move the faders, nothing will happen in the audio mix.

Have fun :-). Feedback welcome.

@corrados
Copy link
Contributor Author

I'll close this issue now, since the original problem caused by UDP packet drops should be solved by the "split protocol messages" fix, which is already implemented. The discussion about the singlemix server should be continued in the new issue I created.

@maallyn
Copy link

maallyn commented Sep 20, 2020

Just to let you know, now that this is closed, I went ahead and destroyed my test server client.allyn.com

5 participants