Server Maintenance Manual

From Clogopedia, the Natural Selection 2 Wiki
Revision as of 21:01, 31 July 2016 by Keats & Yeats (Talk | contribs) (removed colons in section titles)



Editorial Note

Please keep all comments on the UWE forums post or on the NS2 Server Ops discord.
All information is current as written on 30-07-2016.
All information is verified as written on 10-12-2015.

Please keep a version number with the date included on big changes.

Version 30.07.2016.001

dd.mm.yyyy.change-amount

Credits & Thanks


Intro and info

This is a manual for NS2 server admins (and users who like being informed). Some of it I wrote myself; other parts I noted down over the years when others posted them. I haven’t kept a list of who posted what, but if you feel you deserve credit for anything below, let me know.

  • It covers how you can monitor an NS2 server's performance.
  • It explains what certain values do and mean, and offers advice.
  • It will help interpret logs.
  • It also handles some client info so you can make sure “it’s not you”.
  • It does NOT handle ‘mere installation’. For installation please look at the Dedicated Server page. (see old wiki, in need of rewrite.)
  • It handles most common issues.
  • Known Dedicated Server Issues are also being noted down. That list is maintained at the Dedicated Server Issues page. (see old wiki, in need of rewrite.)

Server rates

Tickrate

The tickrate controls how often AI units update. The default is 30. It also acts as an upper limit to Sendrate. If the tickrate is dying, something ELSE has been dying long before!

NS2 servers auto adjust rates under heavy load to ‘try’ to compensate. However, if your server is in a state where it needs to be auto adjusted, you crossed a line in performance (or lack thereof) long ago! It lowers the tickrate to 20 when the thread runs out of CPU power. It does this both on Windows and Linux.

Additional dev note: Tickrate is the overall gameworld update tickrate. It is NOT the main loop. It should also not be mistaken for AI only.

Sendrate

How often the server sends updates to the players. Default is 20. Limited by "tickrate -1", server upload or client download, whichever is lower. You don't want to drop the sendrate. You will get an iwarn if a packet is not sent inside 75% of the interp interval, so a high sendrate is good. Note that the effective sendrate is the lesser of sendrate and tickrate, so you want tickrate to be about 20% higher than sendrate to ensure that sendrate is not compromised. So the -1 rule is the absolute limit, but the suggested difference is 20%.

There is an idle-client throttle. This means an alt-tabbed client which is idle for 2.8 seconds will have its sendrate reduced. (Obviously this can create packet loss and red plug on the client until the client is idle again. It's not an issue and nothing to worry about.) Note that depending on server size and the client, it can take a few seconds for a tabbed-in client to 'recover'.

Bwlimit

How much data a player can receive per second. The old default was 25600 bytes per second or 25kB/sec, which is LOW for an 18-24 slot server. Higher is better. 45kB/sec for a 20 slot server is already better, but note that this is an indication. As of build 279 the default is 51200 bytes/sec or 50kB/sec. Depending on the fighting going on, your usage may vary. Limited by outgoing server bandwidth and incoming client bandwidth. Hitting bwlimit feels hitchy. However, setting it too high and hitting the REAL bandwidth limit just chokes. Make sure the outgoing server bandwidth can hold bwlimit * players.
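The "bwlimit * players" check can be done with a few lines of Python. This is only a rough sketch: the function name is made up for this page, it assumes 1 Mbit = 1,000,000 bits, and it ignores protocol overhead.

```python
def required_upload_mbit(bwlimit_bytes_per_sec, players):
    """Worst-case server upload in Mbit/s if every player hits bwlimit.

    Assumption: 1 Mbit = 1,000,000 bits, no protocol overhead counted.
    """
    total_bytes_per_sec = bwlimit_bytes_per_sec * players
    return total_bytes_per_sec * 8 / 1_000_000


# Build 279 default bwlimit (51200 bytes/sec) on a 20 slot server.
print(round(required_upload_mbit(51200, 20), 2))
```

If the result is close to or above what your line can really push out, lower bwlimit or the player count.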

Moverate

How often a player sends movements and actions to the server. The default is 30. It is limited by the value itself or client fps, whatever is lower.

The problem with a low moverate (mr) is that the risk increases that you will not move inside a send interval, which is equivalent to dropping a packet. So the risk increases that you start to teleport around on other people's screens.

mr should be equal to or more than 1.25 * sendrate.

Interp

The interpolation buffer sets how much clients are ‘behind’ the server. The default is 2/sendrate, which equals 0.1 seconds or 100ms. Lowering this makes for tighter, more recent games but gives players with higher ping a disadvantage. Changing this is a bad idea if your server also has folk from other continents playing on it. Note that mods can and will enforce their own limits! NSL mod for example currently (26-10-2015) raises bwlimit. As any such limit strongly depends on the server's own hardware and bandwidth, it’s a VERY bad idea to put them in a mod by default! Shine gives admins the option to change these values. Note that NSL mod still overrides!
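The rules of thumb from the Tickrate, Sendrate, Moverate and Interp sections can be collected in one small check. This is a sketch only: the function names and warning texts are made up for this page and are not part of NS2 or any mod.

```python
def check_rates(tickrate=30, sendrate=20, moverate=30):
    """Return a list of warnings based on the rules of thumb on this page."""
    warnings = []
    if sendrate > tickrate - 1:
        # Absolute limit: sendrate is capped by "tickrate -1".
        warnings.append("sendrate exceeds tickrate - 1 (absolute limit)")
    elif tickrate < 1.2 * sendrate:
        # Suggested margin: tickrate about 20% higher than sendrate.
        warnings.append("tickrate is less than 20% above sendrate")
    if moverate < 1.25 * sendrate:
        warnings.append("moverate is below 1.25 * sendrate")
    return warnings


def default_interp_seconds(sendrate=20):
    """Default interpolation buffer: 2 / sendrate."""
    return 2 / sendrate


print(check_rates())             # the defaults pass every check
print(check_rates(30, 28, 30))   # a sendrate this close to tickrate gets warnings
print(default_interp_seconds())
```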




Rule of Thumb

  • Check performance on a max loaded server.
    • This is what matters. If it cannot hold peak load then your server is not up to the task. (So lower playercount for example.)
  • Keep up to date vanilla copies of your server.
    • Not all mods are updated on patchdays (yet). Outdated mods can break and potentially cause big problems.
    • Useful when a lot of mods break and you do not have a workshop backup, or the backup cannot keep up in terms of bandwidth.
    • Testing. Does your problem occur in vanilla?
  • Test with a well running client which has no issues.
    • To exclude any client side issues.
  • Most of the people complaining have client side issues.
    • Stuff can snowball. A client which is normally already struggling will have huge issues when a relatively small problem (clientside) hits.
    • Clients update movement frames every x frames. If it’s not getting to that frame due to being slow on rendering the previous frame, the client will suffer.
  • Most of the people who complain are misinformed and therefore wrong.
    • This does not mean you should ignore their complaints. Verify it’s not your server and servers connection!
  • While everyone cares about performance, most people do not know how to TRULY check this.
    • So you can see a lot of bad performing servers in use and servers with great performance empty.
  • Mods can, strongly, affect server and/or client performance!
  • Ask around, NICELY!
    • Many in the CDT & PT are willing to provide information (if allowed by NDA).
    • Many server admins may have had the same issues you have, or know something you do not.
  • Running the hacked executable to boost playercount can and will cause unique problems.
    • Although in honesty, players do not care usually.
  • Most game mods, Faded-SkulksWithShotguns-etc, do not play nice with other mods like shine.
    • Ask the mod author(s) if the current version works with your preferred mods, or run the server mostly vanilla.
  • If you host multiple servers, try to use 1 map as your default coldboot map.
    • This makes it easier to see if a server cold booted.
  • If you host multiple servers and want to share SOME files but not others, you can use symbolic links. See the internet or your operating system's manual for more info.
  • Use a json validator like http://jsonlint.com/
  • As a rough rule of thumb, a player uses about 20kB/sec or 200kbit/sec, so each Mbit should support 5 players max. Say half that to leave room for peak loads, so 10Mbit supports about 25 players and 100Mbit should support 250 players.
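The last rule of thumb in the list works out like this in Python. The function name and its parameters are invented for this example; the numbers (200kbit per player, half the line kept free for peaks) come straight from the bullet above.

```python
def players_supported(line_mbit, kbit_per_player=200, headroom=0.5):
    """Rough player capacity of an upload line, per the rule of thumb above."""
    players_per_mbit = 1000 / kbit_per_player   # 5 players per Mbit
    return int(line_mbit * players_per_mbit * headroom)


print(players_supported(10), players_supported(100))  # 25 250
```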




Server/Client Perfmon Lines

Note the command to activate this is 'perfmon'.

  • Type it once to enable.
  • Twice to enable detailed mode. (preferred)
  • Three times to enable detailed spam mode. (in case the problem is not showing nonstop.)
  • Four times to deactivate.

Note activating this in a client also puts the same info in the server log. It will use the client perfmon's format.


Order of Info & Meaning:

Server Performance colour, e.g. green.
Score. Anything above 20 is good, higher is better. Basically idle_cpu minus a Bad Thing score based on interp warnings/fails.
Quality of the score. Servers in use have a better quality score. Empty servers have nothing to log, so score quality drops.
Idle CPU. How much time the server spends waiting on incoming packets. This does NOT account for a CPU's interrupts. Account for a 30% minimum buffer!
Movement calculations on CPU in %.
Movement count.
Players/moves per player, e.g. 12 players which send an average of 19.9 moves per player. The default is 20 and it should not be lower (else someone could experience a choke, for example).
Average ms per move. Should be as LOW as possible, else move calc takes too long.
CPU usage in % for entities.
Entity count.
Average ms per entity. Should be as low as can be.
Tickrate. Should be 30 by default.
Overload compensation: ms extra between ticks.
Interp warning: a client received a network update after 0.75 * interp interval. Risks hitching on the client due to extrapolation.
Interp fail: a client received a network update after 1 * interp interval. Guaranteed to generate a bad client hitch.

Now the above list is displayed below from left to right.

 Perf green: Score 66,  Q 2,  idle 70.7%,  mv 20.2%( cnt 239( 12/19.9),  avg 0.85ms),  ent 4.3%( cnt 219,  avg 0.01ms),  tick 30.3 ( over 0.0ms),  iWarn 0,  iFail 0

Values replaced by ‘name’:

 Performance colour  :    Score  ,    Score quality  ,    idle CPU  ,    move CPU   (   count   (   players/moves per player received default 20  ),    average ms  ),
 entities CPU   ( count  ,    average  ),    tickrate   (   duration  ),    interp warn  ,    interp fail
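If you want to graph or alert on these lines, the layout above can be picked apart with a regular expression. This is a minimal sketch that assumes the line looks exactly like the sample on this page; the field names in the pattern are made up here and the real format may differ between builds.

```python
import re

# Sample line copied from this page.
SAMPLE = ("Perf green: Score 66,  Q 2,  idle 70.7%,  mv 20.2%( cnt 239( 12/19.9),  "
          "avg 0.85ms),  ent 4.3%( cnt 219,  avg 0.01ms),  tick 30.3 ( over 0.0ms),  "
          "iWarn 0,  iFail 0")

PERFMON_RE = re.compile(
    r"Perf\s+(?P<colour>\w+):\s*Score\s+(?P<score>-?\d+),\s*Q\s+(?P<quality>\d+),"
    r"\s*idle\s+(?P<idle_pct>[\d.]+)%,"
    r"\s*mv\s+(?P<move_pct>[\d.]+)%\(\s*cnt\s+(?P<move_count>\d+)"
    r"\(\s*(?P<players>\d+)/(?P<moves_per_player>[\d.]+)\),"
    r"\s*avg\s+(?P<move_avg_ms>[\d.]+)ms\),"
    r"\s*ent\s+(?P<ent_pct>[\d.]+)%\(\s*cnt\s+(?P<ent_count>\d+),"
    r"\s*avg\s+(?P<ent_avg_ms>[\d.]+)ms\),"
    r"\s*tick\s+(?P<tickrate>[\d.]+)\s*\(\s*over\s+(?P<over_ms>[\d.]+)ms\),"
    r"\s*iWarn\s+(?P<iwarn>\d+),\s*iFail\s+(?P<ifail>\d+)"
)


def parse_perfmon(line):
    """Return the perfmon fields as a dict, or None if the line does not match."""
    match = PERFMON_RE.search(line)
    if match is None:
        return None
    # Everything except the colour is numeric.
    return {key: (value if key == "colour" else float(value))
            for key, value in match.groupdict().items()}


perf = parse_perfmon(SAMPLE)
print(perf["colour"], perf["score"], perf["idle_pct"], perf["tickrate"])
```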

CPU Usage

Idle is how much time the main thread spends waiting for new moves to process, so it is independent of the number of cores. A server uses one thread for game processing and one thread for producing network updates. As a rule of thumb, the networking thread uses about 1% cpu/player at an update rate of 20. Other threads don't use much cpu at all.




Verbose Server Logging perfmon Lines

Note the server shows this info if you are logging in verbose 3. No command is needed.


Order of Info:

timestamp,
durationMs,
moverate,
interpMs,
tickrate,
sendrate,
maxPlayers,
score,
quality,
numPlayers,
updateIntervalMs,
incompleteCount * updateIntervalMs / 1000,
numEntitiesUpdated / (tickrate * dur),
timeSpentOnUpdate / (10 * dur),
movesProcessed / dur,
timeSpentOnMoves / (10 * dur),
timeSpentIdling / (10 * dur),	This does NOT account for a CPUs interrupts. Account for a 30% minimum buffer!
numInterpWarns / dur,
numInterpFails / dur,
bytesSent / (1024.0 * dur),
clearingTimeMs10:  worst theoretical clearing time for a 10Mbit upload. Clearing time is how long it takes the server to clear the send buffers. 
10Mbit is about 1Mbyte/sec, so each kB sent takes 1ms. The server pushes new updates every 50 ms, so if a network update exceeds 50kB, it will actually delay the next network update.  
clearingTimeMs50: as above but for 50Mbit,
clearingTimeMs100: as above but for 100Mbit
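The clearing time reasoning can be written out as a small calculation. This is a sketch using the same approximation as the text (10Mbit is about 1kB per ms); the function names and the update_kb parameter (total kB sent in one update round) are invented for this example.

```python
def clearing_time_ms(update_kb, line_mbit):
    """Time to push one round of network updates out of the send buffers.

    Uses the approximation from the text: a 10Mbit line moves about 1kB per ms.
    """
    kb_per_ms = line_mbit / 10
    return update_kb / kb_per_ms


def delays_next_update(update_kb, line_mbit, sendrate=20):
    """True if clearing takes longer than the interval between updates."""
    update_interval_ms = 1000 / sendrate   # 50 ms at sendrate 20
    return clearing_time_ms(update_kb, line_mbit) > update_interval_ms


print(clearing_time_ms(60, 10), delays_next_update(60, 10))
print(clearing_time_ms(60, 100), delays_next_update(60, 100))
```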

Now the above list is displayed below from left to right.

 [12461.807/30085](mr 30, ip 100, tr 30, sr 20, mp 20) Score 53, Q 87, np 20, ui 33(0) [ up(536/9%), mv (567/27%), id 60% ] [ iw 0, if 0 ], net 162.6(19/3/1)

Values replaced by ‘name’:

[   Timestamp  /   DurationMs  ](   MoveRate  ,    Interp  ,    TickRate  ,    SendRate  ,    MaxPlayers  )    Score   ,    QualityOfScore  ,    NumberOfPlayers  ,    UpdateIntervalMs  
(   incompleteCount*updateIntervalMs/1000  ) [    numEntitiesUpdated/(tickrate*dur)  /   timeSpentOnUpdate/(10*dur)  ),    movesProcessed/dur  /   timeSpentOnMoves/(10*dur)  ),   
 timeSpentIdling / (10 * dur)  ] [    InterpWarns/dur  ,    InterpFails/dur   ],    bytesSent/(1024.0*dur)  (   ClearingTimeMs10  /   ClearingTimeMs50  /   ClearingTimeMs100  )
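The verbose log line can be parsed the same way. Again a sketch under assumptions: the pattern is written against the single sample line above, and the group names are made up for this page (the log itself only uses short tags like mr, ip and np).

```python
import re

# Sample line copied from this page.
SAMPLE = ("[12461.807/30085](mr 30, ip 100, tr 30, sr 20, mp 20) Score 53, Q 87, "
          "np 20, ui 33(0) [ up(536/9%), mv (567/27%), id 60% ] [ iw 0, if 0 ], "
          "net 162.6(19/3/1)")

VERBOSE_RE = re.compile(
    r"\[\s*(?P<timestamp>-?[\d.]+)/(?P<duration_ms>\d+)\]\s*"
    r"\(mr\s+(?P<moverate>\d+),\s*ip\s+(?P<interp_ms>\d+),\s*tr\s+(?P<tickrate>\d+),"
    r"\s*sr\s+(?P<sendrate>\d+),\s*mp\s+(?P<max_players>\d+)\)\s*"
    r"Score\s+(?P<score>-?\d+),\s*Q\s+(?P<quality>\d+),\s*np\s+(?P<num_players>\d+),\s*"
    r"ui\s+(?P<update_interval_ms>\d+)\((?P<incomplete>-?[\d.]+)\)\s*"
    r"\[\s*up\((?P<ents_per_tick>-?[\d.]+)/(?P<update_cpu_pct>\d+)%\),\s*"
    r"mv\s*\((?P<moves_per_sec>-?[\d.]+)/(?P<move_cpu_pct>\d+)%\),\s*"
    r"id\s+(?P<idle_pct>\d+)%\s*\]\s*"
    r"\[\s*iw\s+(?P<iwarn>[\d.]+),\s*if\s+(?P<ifail>[\d.]+)\s*\],\s*"
    r"net\s+(?P<kb_per_sec>[\d.]+)"
    r"\((?P<clear10>\d+)/(?P<clear50>\d+)/(?P<clear100>\d+)\)"
)


def parse_verbose(line):
    """Return all fields of a verbose perf line as floats, or None if no match."""
    match = VERBOSE_RE.search(line)
    if match is None:
        return None
    return {key: float(value) for key, value in match.groupdict().items()}


row = parse_verbose(SAMPLE)
# The text asks for a 30% minimum idle buffer.
print(row["score"], row["num_players"], row["idle_pct"] >= 30)
```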




Perfbrowser

Shows in the NS2 server browser. The score is calculated as "%idle - BadThingsHappeningToYourPlayers". The only way to get listed as a Bad server is if the server consistently does BadThings to its players. And the only bad thing that counts is causing players to get an interpolation buffer overrun. Players experience an interpolation buffer overrun as teleporting, rubberbanding and loss of avatar control - basically, the game becomes impossible to enjoy.


In normal settings, the interp is set to 0.1 or 100ms, and normally the server sends out a network update every 50ms. So it is only if the server ever gets so busy that it is unable to send an update inside 100ms that it gets a really bad score (we give a small negative score for exceeding 75% of the interp buffer as well).


The server inner loop tries hard to maintain the sendrate, and it takes a concerted effort (or rather, very optimistic settings) to make it break it.

And with averaging and worst-case discarding, a server will only be in the Bad group if it consistently gives its players a bad game.

The server will maintain a performance history - basically, the worst 30-second periods in recent history - and use this to update the master server performance information.


For reference, this is how the browser interprets the performance score (a value ranging from -99 to 100):

  • Higher than 20: Good
  • Lower than 20: OK
  • Lower than 0: Loaded
  • Lower than -20: Bad

Note that a ‘Good’ score is no guarantee that no player ever received a interp fail. Higher is still better!
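The mapping from score to browser label could look like this. It is a sketch, not the browser's actual code: the list above does not say what happens at exactly 20, 0 or -20, so where those boundary values land here is an assumption.

```python
def browser_rating(score):
    """Map a performance score (-99 to 100) to the browser label.

    Assumption: boundary values fall into the better of the two neighbouring
    groups below 'Good' (20 -> OK, 0 -> OK, -20 -> Loaded).
    """
    if score > 20:
        return "Good"
    if score >= 0:
        return "OK"
    if score >= -20:
        return "Loaded"
    return "Bad"


for value in (66, 5, -10, -45):
    print(value, browser_rating(value))
```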




Rapid Release Cycle

Rapid Development

Up till now (build 284), NS2 has mainly used big patches. UWE is slowly switching over to faster, smaller patches. Big slow patches have a low update frequency and a high failure chance. Small fast patches have a high update frequency and a low failure chance.

Now what does this mean?

  • A big patch has more time to be tested, but also more lines of code which got changed or added. Weird stuff happens, things break.

If something breaks, it usually is harder to get an update rolling to fix the problems.

  • A small patch has less time to be tested, but also fewer lines of code, so less can go wrong.

So while you frequently get updates, the chance of an update breaking anything is smaller. If an update breaks something, you have a new update coming in soon because the development cycle is already set up for fast releases.

For server ops concerned that a fast patch cycle does not give them time to check their servers for problems after each patch: remember there is less to break. And if a mod updates while you are asleep, it can also break your server. Best to accept it and move on.

Extensions

As part of the rapid development cycle and its frequent release of patches, UWE has added Extensions.

  • An extension is a type of official mod.
  • They will try to limit themselves to 3 extensions in total.
  • Extensions will get merged with the upcoming builds to free up 'room' to add new extensions. (As in they will try to keep to those 3 extensions.)
  • A Google doc by UWE on extensions can be found here: https://docs.google.com/document/d/1Vo1zmIhxVbt51MD9MeZWJ52PxK6dfsppMsFmtCvsAxU/edit
  • Remember that this loads before all your mods! It loads EARLY.
  • It's NOT optional.

An example of an extension, in this case a hotfix, is below.

[15:07:44][UWEHotfix] Applying fix for tunnels not opening

Extensions Google doc copy as of 10 February 2016

Extensions
Introduction
As part of the rapid changes “initiative”, in order to refine NS2 in an iterative manner, the need to promote changes quickly and efficiently arose.
Stemming from a byproduct of this necessity a new pipeline was added to the current NS2 Build System: Extensions.

This new Build Pipeline allows UWE the means to implement changes without the need for “traditional” Steam releases. Utilizing this new system, obviously, does not come without a set of caveats.
Many steps will be taken in order to mitigate the impact of this change.

Technical Overview
When a Server is first started the main process will perform a HTTP (port 80) call to a remote host. This remote host will return a simple space-delimited list of all Active Extensions.
The HTTP call to the Extensions listing server, will terminate automatically if the host is unreachable or unavailable. The maximum amount of time this terminate action will be no more than 10 seconds.
Typically, the extensions listing server will respond in less than 450ms. The amount of data transferred will be very small, generally no more than 250 to 320 bytes per request. 

This HTTP-call is required to be a blocking process, so it will add to a Server’s startup and map-change time (A maximum of 10 seconds but typically less than 1s).
Servers will cache (stored in memory) the retrieved list of extensions. This action is repeated on all map changes. All extensions will appear normally when Clients are connection to a Server.
The listing server is hosted on Amazon’s EC2 service.

When the ServerWorld is started, it parses the retrieved extensions-list and inserts them into the mods store.
They are always inserted at the head of the mounted mods, and as a result will always be mounted before any admin-defined mods (regardless of MapCycle or cmd-line arguments).

Since the extensions use the existing mod functionality, this means the same historical issues can arise with them (i.e. Steamworks is down for whatever reason and data can’t be downloaded).
As a result, this system will have a dedicated global backup server for all extensions it delivers. The backup server is hosted on Amazon’s EC2 service.
It will only serve files specific to extensions, and will not act as a backup for all files NS2 uses.
 Server Maintenance Manual Editorial note: More on workshop backup servers in the upcoming chapter(s)

On a final Note, UWE will take great care to mitigate potential impact on existing mods.
Between individual developer’s tests and the playtesting process, we will strive for minimal impact of this new functionality.
Furthermore, this system will only ever deliver Lua-oriented changes and will never push any binaries.

Thank you!
McGlaspie - brock@naturalselection2.com




Workshop Backup Server

NS2 uses mods. The same goes for NS2 servers. There is almost no server out there which runs vanilla. Sadly NS2 is not high on the Steam priority list, so to speak, when it comes to the workshop. If the workshop has any issues, like during Steam sales, problems can arise.

It is STRONGLY advised to run or use one. If you use one, be sure that server host allows its usage! (as in be nice and don’t use up all their bandwidth.)


Issues:

  • Client cannot download mod.
  • Server cannot download mod.
  • Client mod is outdated, cannot download new mod. (crash during nsl match > no reconnect & no chance to mapchange!)

The workshop backup server can be run on both linux and windows. It runs, in essence, a simple python script. You can get your most recent workshop backup server download here: https://github.com/GhoulofGSG9/NS2_WorkshopBackup


It has NO bandwidth limiter! So be sure to keep that in mind. If you have somewhat limited bandwidth, use it as a mere backup, not as the first source for mod downloads.


It's advised to run the script out of a server's mod folder. This allows the backup server to zip the mod folder and offer it for download. This matters if the backup server does not have the mod and also cannot reach the workshop.

For this reason, run it out of your busiest server's mod folder. That way it always has the mods the server thinks are most recent, allowing players to connect.

As it keeps a mod history, if a mod gets updated during a match and someone who accidentally disconnects gets the update, that person can still connect thanks to the workshop backup.




Logs and their errors

Note some of the info below is only shown if you use verbose logging!

Please write down causes in pink, if needed. If you do not know how, please give someone who does a poke on forums or Discord.

To differentiate between a new log example and the same log example continuing on another line, please start any log example with [LE]. (As I strongly doubt that shows up in logs any time soon.) Ideas on how to handle this 'better' in the future on the wiki are most welcome. I hope many admins will help make the error section grow.

Vanilla Benign

[LE][  0.092] Main : Error: PhysX: Invalid Cuda context!
[LE][  4.453] Worker 13 : Error: Unable to open 'models/effects/frag_metal_01.model' (usage 0x1)
[LE][  4.453] Worker 13 : Error: Couldn't open file 'models/effects/frag_metal_01.model'
[LE][ MainThread : Error : Attempting to sync an invalid collision rep! (Has something to do with the network protocol and the spectator not yet being synced with the person you start to spectate upon death.)

On server shutdown:

[LE][36034.691] Main : Error: PhysX: Foundation destruction failed due to pending module references. Close/release all depending modules first.
[LE][36034.692] Main : Error: 10 memory leaks in 'PhysX' (2288 bytes)
[LE][36034.692] Main : Error: 10 memory leaks in 'Physics' (2288 bytes)
[LE][36034.692] Main : Error: 10 memory leaks in 'Engine' (2288 bytes)

IF a user connects:

[LE]TAW,O,- DCDarkling connected.
[LE]Client underflowed time credit (384.90 ms for move)
[LE]Client sent a move too far in the past (782799.94 ms behind)
[LE]Too many moves to process in a frame (6 allowed, 6 run), discarding move
[LE]Client sent a move in the future (628.42 ms ahead)

Mod Benign

[LE]Warning: The message ShineCustomVote was already hooked, old hook will be replaced
[LE]Warning: The message VoteChangeMap was already hooked, old hook will be replaced
[LE]Warning: The message VoteRandomizeRR was already hooked, old hook will be replaced
[LE]Warning: The message VotingForceEvenTeams was already hooked, old hook will be replaced
[LE]Warning: The message VoteKickPlayer was already hooked, old hook will be replaced
[LE]Warning: The message VoteResetGame was already hooked, old hook will be replaced

But I run custom rates!:

[LE]Server tickrate 30, client sendrate 20, bandwidth limit per player 25600 (Your mod has not loaded YET.)

Average (annoying)

[LE][  0.116] Main : Invalid history slot 3 cleared (pos 392): [  0.000/1967325184](mr 1966080, ip 6553600, tr 1966080, sr 1310720, mp 1310720) Score 16581, Q 0, np 0, 
ui 2162688(0) [ up(0/7%), mv (0/0%), id 80% ]  [ iw 0, if 0 ], net 0.0(0/0/0) (Delete your corrupt servers perfhist.bin)
[LE][  0.116] Main : Invalid history slot 4 cleared (pos 488): [ -0.000/1968504832](mr 1966080, ip 6553600, tr 1966080, sr 1310720, mp 1310720) Score 16500, Q 0, np 0, 
ui 2162688(0) [ up(-0/6%), mv (0/0%), id 82% ]  [ iw 0, if 0 ], net 0.0(0/0/0) (Delete your corrupt servers perfhist.bin)
[LE][  0.116] Main : Invalid history slot 5 cleared (pos 584): [ -0.000/1967718400](mr 1966080, ip 6553600, tr 1966080, sr 1310720, mp 1310720) Score 16515, Q 0, np 0, 
ui 2162688(0) [ up(-0/6%), mv (0/0%), id 82% ]  [ iw 0, if 0 ], net 0.0(0/0/0) (Delete your corrupt servers perfhist.bin)

Important (Users will suffer)

[LE][4059.527] Main : Error: Failed to add an obstacle at 5.05, -92.33, 144.96!
[LE]Client sent a move too far in the past (367.68 ms behind)
[LE]Too many moves to process in a frame (6 allowed, 6 run), discarding move
[LE]Network variable 'animationBlend' of class 'Ragdoll' has value 43630175829739231000000000000000000000.000000 which is outside the range 0.000000 to 1.000000
[LE]Network variable 'animationBlend' of class 'Ragdoll' has value 43630175829739231000000000000000000000.000000 which is outside the range 0.000000 to 1.000000
[LE]Client sent a move in the future (628.42 ms ahead)
[LE]Network variable 'posx' of class 'Damage' has value -1.#IND00 which is outside the range -1638.000000 to 1638.000000
[LE]Network variable 'posy' of class 'Damage' has value -1.#IND00 which is outside the range -1638.000000 to 1638.000000
[LE]Network variable 'posz' of class 'Damage' has value -1.#IND00 which is outside the range -1638.000000 to 1638.000000
[LE]Network variable 'posx' of class 'Damage' has value -1.#IND00 which is outside the range -1638.000000 to 1638.000000
[LE]Network variable 'posy' of class 'Damage' has value -1.#IND00 which is outside the range -1638.000000 to 1638.000000
[LE]Network variable 'posz' of class 'Damage' has value -1.#IND00 which is outside the range -1638.000000 to 1638.000000
[LE]Snapshot error ([5379.989] Main : Error: Exceeded maximum number of snapshots)


Critical (How is this server still up?)

nothing yet




Bandwidth and server load in general

Ifails, Iwarns and their penalty

Where does the GetNumInterpFails/Warns come from? If a client is experiencing interp failures due to something not directly related to the server,
for example a bad network at the clients end, does that increase the counters, and thus the performance score penalty?
It comes from how long the interval was between two network updates, compared to the interp variable. Warns is for > 75% of interp,
Fails is for > 100% of interp buffer.

Network latency or client problem does not figure into it, so internet problems will not affect the server performance score. 
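In other words, the counters only look at the gap between two network updates as sent by the server. A sketch of that comparison, with a made-up function name and the default interp of 100ms:

```python
def classify_update_gap(gap_ms, interp_ms=100):
    """Classify the interval between two network updates against interp.

    More than 100% of interp counts as a fail, more than 75% as a warn.
    """
    if gap_ms > interp_ms:
        return "fail"
    if gap_ms > 0.75 * interp_ms:
        return "warn"
    return "ok"


for gap in (50, 80, 120):
    print(gap, classify_update_gap(gap))
```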

raw tickrate vs tickrate

When viewing the raw tickrate (shown as the last value in the perf row with perfbrowser enabled),
it appears that servers with currently bad performance show their ticks (lower than 30), but every other server shows some value often higher than 100. What's that?

It is because what is shown as tickrate is actually the number of times the server goes through the main loop -
a lightly loaded server waits 5ms on the network socket before doing a loop, so it tops out at 200 "ticks" a second. It will only do a "proper" tick, i.e. update the AI entities, every 33 ms.

When a server becomes loaded, it will spend more and more time processing player moves and loop around fewer times.

Bandwidth

So what about bandwidth?
Required bandwidth goes up by about the square of the number of players... at a VERY rough guess,
you can probably fit 16 players inside a 10Mbit line... 32 players would be about 40Mbit or so, but I would definitely recommend a 100Mbit line for anyone running > 24 players.


If the server has a ‘BAD’ line:
max bwlimit in KB/s = (max server bandwidth in KB/s * 0.8 - 200 KB/s) / playercount
If the server has a ‘GOOD’ line:
max bwlimit in KB/s = (max server bandwidth in KB/s * 0.9 - 200 KB/s) / playercount

For an 18Mbit line (2304 KB/s) that is a good line with 20 players, this makes:
(2304*0.9-200)/20 = 93.68
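The two formulas as a small Python helper, in case you want to try different line speeds and player counts. The function and parameter names are made up for this page; the 0.8/0.9 factors and the 200 KB/s reserve are the ones from the formulas above.

```python
def max_bwlimit_kb(server_kb_per_sec, players, good_line=True, reserve_kb=200):
    """Maximum bwlimit in KB/s per player, following the formulas above."""
    usable = 0.9 if good_line else 0.8
    return (server_kb_per_sec * usable - reserve_kb) / players


line_kb = 18 * 1024 / 8   # an 18Mbit line expressed in KB/s (2304)
print(round(max_bwlimit_kb(line_kb, 20), 2))                    # 93.68
print(round(max_bwlimit_kb(line_kb, 20, good_line=False), 2))   # 82.16
```

Remember that bwlimit itself is set in bytes per second, so multiply the result by 1024 before using it.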


High Resolution Timer

My Windows server has an idling CPU, 3 players or so are on it, but the tickrate is dying. What is going on?
You probably experience a timer issue. Often known as a problem with the High Resolution Timer,
windows can need a little nudge to keep going.
Opening Windows Media Player, which forces the timer, will solve this issue.
Running a small service like the one from Brainless also solves the issue; http://www.brainless.us/downloads.aspx?did=5
More info on it here: https://msdn.microsoft.com/en-us/library/aa964692%28v=vs.85%29.aspx

Snapshots

At the moment snapshot history on servers is 3. (It used to be 2.) Why is this?
More snapshots slightly increase memory usage. But it also means a client can go up to 3 seconds without responding before it needs to be sent a full snapshot.
So it increases the window where the server can recover a user with a delta snapshot. Editorial note: a delta snapshot is a snapshot with just changes.
So it should decrease bandwidth usage in exchange for memory, server side.

'Unlimited' sizes servers

So I can run 99 slot servers now?
As of the latest few patches, slot size has been unlocked. This simply means the old restriction of 24 slots max has been lifted.
Do note however that an increase in slot size STILL increases the resource consumption. (See the whole guide for more details.)
DEVs currently advise not going above 48 total slots. This is INCLUDING spectator slots.
They stated the following:
Due to how the networking layer is managed, and (primarily) the memory allocators are setup...anything above that will cause problems. It's not a matter of "if", but rather "when" it will cause issues.

Servers and OS users

Do I need to run every NS2 server instance on its own user?
DEVs recently stated that due to engine code, running one user per instance is sort of a requirement.
There are many lines of code which do not take multi-instance usage into account for IO operations.
This matters for the mod service, for example. 
They are looking into it, to fix this for the future.

Consistency

The consistency file decides what files the server checks against hashes the client supplies. It’s not truly an anti-cheating mechanic, but more a way to make sure the well meaning client has no different files. (Cheats could just lie about what they use as the server uses info the client supplies.)


NS2 has a built-in default consistency. To use your own, you need to not only make a consistency json file, but also enable it in your server's config.

Certain mods, like NSL mod, supply their own consistency. This overwrites both vanilla NS2 and your own custom config.


If the server is not done caching mods when you load into the server, you use the server's consistency instead. So it is perfectly possible upon mapchange, on an official or compliant NSL server, to suddenly have a different consistency check than the NSL mod supplies.

Server admins can set their own consistency to one identical to NSLs, but like described it’s not an anti-cheat mechanic.




Netstats

You use netstats to either grab some network info from the connection to the server, or to let a client with issues check if it's not them.

Below is an example of how the “net_stats 1” command looks on a client. The one below is of an actual client that just connected to a server, hence the slight loss.

119.62 fps (avg 8 ms, bad > 18 ms)
Tick 30, idle 88.9%, move 2.5%(0.93ms/move), ent 5.6%, upload 3.2kb/sec
Ping: 88 ms
Average Sent: 2.53 Kb/s (loss 0.5%)
Average Recv: 3.63 Kb/s (loss 0.0%)
OutOfOrder : 0
Messages Sent: 0.00 per second
Messages Recv: 1.00 per second
Updates Recv: 19.93 per second
Server Had Error: false
Choke: 0.000000%
Prediction: 5 frames

What does it mean?

Current fps (average time in ms, bad frame highest time in ms)
Server actual tickrate, idle cpu %, move calc % (ms per move), entity calc in %, upload kb/sec
Client latency in ms
Average send in Kb/s (loss in %)
Average received in Kb/s (loss in %)
OutOfOrder : Total number of packets that have arrived out of order.
Messages sent per second
Messages received per second
Updates received per second. Yes this should match server sendrate!
Whether the server had any errors, both ‘critical errors’ and those ‘of almost no importance’.
Client choke in %.
Client prediction. The higher, the less actual info is received and the more your client has to guess.
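When a player pastes their net_stats output, the interesting numbers can be pulled out with a short script. This is only a sketch: it assumes the output looks like the example on this page, the key names are made up, and the 5% tolerance on "updates should match sendrate" is my own assumption, not an official threshold.

```python
import re

# Part of the example output from this page.
SAMPLE = """Ping: 88 ms
Average Sent: 2.53 Kb/s (loss 0.5%)
Average Recv: 3.63 Kb/s (loss 0.0%)
Updates Recv: 19.93 per second
Choke: 0.000000%
"""

PATTERNS = {
    "ping_ms": r"Ping:\s*([\d.]+)\s*ms",
    "sent_loss_pct": r"Average Sent:.*\(loss\s*([\d.]+)%\)",
    "recv_loss_pct": r"Average Recv:.*\(loss\s*([\d.]+)%\)",
    "updates_per_sec": r"Updates Recv:\s*([\d.]+)",
    "choke_pct": r"Choke:\s*([\d.]+)%",
}


def parse_net_stats(text):
    """Return the fields found in a pasted net_stats block as floats."""
    stats = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match:
            stats[name] = float(match.group(1))
    return stats


def updates_match_sendrate(stats, sendrate=20, tolerance=0.05):
    """True if updates received are within the (assumed) tolerance of sendrate."""
    return stats["updates_per_sec"] >= sendrate * (1 - tolerance)


stats = parse_net_stats(SAMPLE)
print(stats)
print(updates_match_sendrate(stats))
```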




Net Log

You can enable network logging on both client and server by the net_log command.

// logging for network is a bitfield
// Some logging requires multiple bits to be set; LOG_TRACE is required to trigger any LogHeaders
#define LOG_TRACE_BIT   0x20

#define LOG_MISC (m_verbose &   0x01)
#define LOG_DET (m_verbose &   0x02)
#define LOG_UREL (m_verbose &   0x04)
#define LOG_UREL_S (m_verbose &   0x08)
#define LOG_REL (m_verbose &   0x10)
#define LOG_TRACE (m_verbose & LOG_TRACE_BIT)
#define LOG_ACK (m_verbose &   0x40)
#define LOG_ANY (m_verbose !=   0)

#define LOG_PING (*m_verbose &   0x100)
#define LOG_PING_TRACE (*m_verbose &   0x200)
#define LOG_PING_QUAL (*m_verbose &   0x400)

These values are in hexadecimal. They can also be combined. What does this mean? Let's take the following example line:

 #define LOG_PING_QUAL (*m_verbose & 0x400)
400 hex is 1024 decimal, so the command becomes net_log 1024.

To include both PING_QUAL and PING_TRACE we add their values together.

#define LOG_PING_TRACE (*m_verbose & 0x200)
#define LOG_PING_QUAL (*m_verbose & 0x400)
400 hex is 1024 decimal.
200 hex is 512 decimal.
Both combined are 1536 decimal, so the command becomes net_log 1536.
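If you do not want to do the hex math by hand, a few lines of Python will combine the bits for you. The bit values are copied from the defines above; the dictionary and function names are made up for this example.

```python
# Bit values copied from the defines above.
LOG_BITS = {
    "MISC": 0x01, "DET": 0x02, "UREL": 0x04, "UREL_S": 0x08, "REL": 0x10,
    "TRACE": 0x20, "ACK": 0x40,
    "PING": 0x100, "PING_TRACE": 0x200, "PING_QUAL": 0x400,
}


def net_log_value(*names):
    """Combine the named log bits into the decimal value for net_log."""
    value = 0
    for name in names:
        value |= LOG_BITS[name]
    return value


print("net_log", net_log_value("PING_QUAL"))                 # net_log 1024
print("net_log", net_log_value("PING_TRACE", "PING_QUAL"))   # net_log 1536
```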




The p_logall command

Both on clients and servers you can log all performance data to a .plog file. For servers it's not advised to run this command for a long time. It will both drop performance due to the intensive logging and create a very large plog. Of course the bigger the server, the bigger the possible load while logging. The suggested max duration is around 5 minutes. Plogs are stored in the same location as your logfiles, like log-server.txt.

Starting p_logall

To start logging use the command sv_p_logall. You can also do it through rcon commands. Remember that both Shine and NS2 can be configured for rcon. The command for this with Shine is sh_rcon sv_p_logall.

Note that the client command omits the sv_ part, so it is p_logall.

Stopping p_logall

You can either enter the stop command or simply change map. To stop logging use the command sv_p_endlog. You can also do it through rcon commands. Remember that both Shine and NS2 can be configured for rcon. The command for this with Shine is sh_rcon sv_p_endlog.

Note that the client command omits the sv_ part, so it is p_endlog.




Client r_stats

Have the player run this on the client to rule out client-side issues.

Below is an example of the output of the “r_stats 1” command on a client.

 119.69 fps (avg 8 ms, bad > 18 ms)
 3[ 4] ms waiting for GPU
 1[ 1] ms waiting for render thread
 0[ 0] ms waiting for world update thread
 415 draw calls, 99841 primitives
 1144 MB virtual video memory free
 94 MB video memory allocated for render targets
 372 MB video memory used for textures (372 MB allocated)
 0 textures loaded last sec, 0 unloaded, max queuesize 0
 51 lights visible
 0 shadow maps generated
 115 models visible
 18 meshes visible
 0 particle emitters visible
 0 light probes generated


 Current fps (average time in ms, bad frame highest time in ms)
 GPU bottleneck in ms.
 Render/CPU bottleneck in ms.
 World bottleneck in ms. Think processing movement info and the like.
 Note for the above bottlenecks that things outside NS2 can influence these too, like an antivirus scanning every file before it can load.
 Draw calls: VERY simply said, stuff being drawn. And yes, this is an oversimplification. Primitives: another oversimplification, triangles and polygons.
 The virtual memory you have available. May not always be real VRAM!
 Texture info, obviously using close to your real VRAM will cause problems. Same for high queuesize.
 Absurd light count can affect performance, will vary depending on GPU.
 Shadows can affect performance, will vary depending on GPU.
 Affects performance varying on GPU.




Connecting to the server by links

For links in external programs like teamspeak. Below are example IPs and ports.


To connect with NS2 open or closed.

Connects via the Steam master browser and not NS2. This type of link uses the NS2 server's query port.

As of 11 December 2015 this method seems to work rather poorly; it suddenly stopped working a few months earlier. An inquiry with the CDT/devs does not seem to point at NS2. The ticket with Valve is still pending after months.

steam://connect/DNSorIPHere:QueryPort/PasswordIfAny

To connect with a closed NS2 directly.

Connects through NS2 itself. It does not use the masterbrowser to query for playercount or if the server is up. It’s a blind connect. It uses the gameport.

It’s a bit crashy. It will open NS2 (Steam app ID 4920) and run the command in the console. It also asks for confirmation.
Note that depending on where you place the link (Firefox, TeamSpeak, a website), you may need to use encodings like %20 instead of spaces.

steam://run/4920//"connect%20DNSOrIP:GamePort%20PassWordIfAny"
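
Both link formats can be generated with a small script so the %20 encoding is not typed by hand. This is only a sketch: the function names are invented, the address 192.0.2.10 and the ports 27015/27016 are placeholders, and the use of plain straight quotes around the connect command is an assumption.

```python
from urllib.parse import quote


def steam_browser_link(host, query_port, password=""):
    """Link that goes through the Steam master browser (uses the query port)."""
    link = "steam://connect/{}:{}".format(host, query_port)
    if password:
        link += "/" + quote(password, safe="")
    return link


def ns2_direct_link(host, game_port, password=""):
    """Link that starts NS2 (app 4920) and runs a console connect command."""
    command = "connect {}:{}".format(host, game_port)
    if password:
        command += " " + password
    # quote() turns spaces into %20; ':' is listed as safe so host:port stays readable.
    return 'steam://run/4920//"{}"'.format(quote(command, safe=":"))


# Placeholder address and ports, replace with your own server details.
print(steam_browser_link("192.0.2.10", 27016, "secret"))
print(ns2_direct_link("192.0.2.10", 27015, "secret"))
```

Some programs may need further escaping of the quote characters, so test the generated link in the program where you intend to post it.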




Modlist and mapcycle

It is best to load all your compatible mods on a server hard boot by putting them in the command line. The server will do its best to update them all as much as possible. Obviously, do not combine one gamemod with another gamemod or with other incompatible mods.

Your mapcycle should be configured to load as few mods as possible per map.

You can load any mod on a per map basis, not just mapmods. Do not put such mods in the normal list! However, this is only needed in certain situations, like a mapmod with a dependency.


Do not edit the mapcycle or add mods through the NS2 server browser. It is unreliable and messes things up.


{
	"maps": 
	[ 
		"ns2_summit",				 << normal official map.
		"ns2_eclipse",				 << normal official map.

		{
			"mods": 			
			[
				"63c559c"		 << Mod ID of custom map
			],
			"map": "ns2_caged"		 << Custom mod belonging to id
		},

		{
			"mods": 
			[
				"136d050a"
			],
			"map": "ns2_forgotten"

		}					 << Note the lack of a comma on the last one.
	],
	"time": 999,
	"mode": "order",
	"mods": 
	[
		"812f004",				 << normal non mapmods go here
		"706d242",
		"c6fbbb0"				 << Note the lack of a comma on the last one.
	]
}
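
Because a stray or missing comma breaks the file, one option is to generate the mapcycle instead of typing it by hand. The sketch below builds the same structure as the annotated example and lets Python's json module take care of the commas. The function name build_mapcycle is hypothetical, and the "time" and "mode" values are simply copied from the example.

```python
import json


def build_mapcycle(official_maps, custom_maps, global_mods, time_value=999, mode="order"):
    """Return a mapcycle dictionary shaped like the annotated example.

    custom_maps is a list of (map_name, [mod_ids]) pairs for maps that
    need their own mods loaded.
    """
    maps = list(official_maps)
    for map_name, mod_ids in custom_maps:
        maps.append({"mods": list(mod_ids), "map": map_name})
    return {"maps": maps, "time": time_value, "mode": mode, "mods": list(global_mods)}


cycle = build_mapcycle(
    ["ns2_summit", "ns2_eclipse"],
    [("ns2_caged", ["63c559c"]), ("ns2_forgotten", ["136d050a"])],
    ["812f004", "706d242", "c6fbbb0"],
)

# json.dumps always emits valid JSON, so there are no trailing-comma mistakes.
text = json.dumps(cycle, indent=4)
print(text)
```

Note that the "<<" annotations in the example are explanations only; they must not appear in the real file.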




Steam IDs

For adding users to your server for access to commands. NS2 does not use the usual SteamID format. If you have the old-style SteamID (from the 'status' command in a game like TF2, or from the profile page URL), you can convert it to an NS2 style SteamID using the following formulas:

STEAM_0:0:XXXXX: XXXXX * 2
STEAM_0:1:XXXXX: XXXXX * 2 + 1

Example:

STEAM_0:0:919317: 919317 * 2 = 1838634 
STEAM_0:1:919317: 919317 * 2 + 1 = 1838635

If you have difficulties, here is a working converter: http://steamid.org/ Just enter your SteamID; the 32-bit Steam Community ID in the output is your NS2 ID.

Alternatively, the NSL keeps accurate IDs of its members. These can be found on their site: http://www.ensl.org/

STEAMID64

NS2 uses the 32-bit Steam ID. However, if need be, you can derive it from the 64-bit one.

STEAMID64 - 76561197960265728 = STEAMID32
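
Both conversions in this chapter are simple enough to script. A minimal sketch, with invented function names; accepting any single digit after "STEAM_" is an assumption, since the text only shows STEAM_0.

```python
import re

# Offset between the 64-bit and 32-bit Steam IDs, as given in the text.
STEAMID64_BASE = 76561197960265728


def legacy_to_ns2(steam_id):
    """Convert 'STEAM_0:Y:XXXXX' to the NS2 ID: XXXXX * 2 + Y."""
    match = re.fullmatch(r"STEAM_\d:([01]):(\d+)", steam_id.strip())
    if match is None:
        raise ValueError("not a STEAM_X:Y:Z style ID: {!r}".format(steam_id))
    parity = int(match.group(1))
    number = int(match.group(2))
    return number * 2 + parity


def steamid64_to_ns2(steam_id64):
    """Convert a 64-bit Steam ID to the 32-bit ID that NS2 uses."""
    return int(steam_id64) - STEAMID64_BASE


print(legacy_to_ns2("STEAM_0:0:919317"))    # 1838634
print(legacy_to_ns2("STEAM_0:1:919317"))    # 1838635
print(steamid64_to_ns2(76561197962104363))  # 1838635
```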




Hive info

For Hive, the Elo-like skill tracking system.

Whitelisting

  • Servers get whitelisted every 2 months manually by UWE.
  • Servers with game modifications which do NOT declare them as such do not get whitelisted. (Faded, for example, declares itself a gamemod and thereby obeys the rule.)
    • They in fact get blacklisted.
  • To disable whitelisting on hive go to your ServerConfig.json and set hiveranking to false.
    "hiveranking":false,

Rookie Servers

Rookie only servers are, as the name implies, for rookies only.

  • Bots are added and removed as needed and are mandatory.
  • Non-rookies cannot join.
  • Information for this is pulled from Hive.
  • A few rookie-only variables, including the rookie-only setting itself, can be set in your ServerConfig.json.
  • The "rookie" tag is deprecated. It is no longer used.

Play Now

  • Play Now puts players on servers depending on a variety of info.
    • Server performance. Higher is better.
    • As equal skill as possible.
    • Within optimal player count.
  • It is on for servers by default.
  • It can be disabled by setting the tag ignore_playnow.




Windows DMP Quick Info

Windows will make a memory dump file (.dmp file) when the OS itself crashes, also known as the BSOD. Windows will also make a memory dump file when a program crashes. This manual focuses on the latter type.

Devs can use memory dumps to track down problems in much more detail. Memory dumps can be either minidumps, with a minimum amount of info but small in size, or full dumps. For full dumps the application's whole memory is dumped to file. (There are some in-between versions we shall leave out of this manual.) Because NS2 servers need plenty of memory to begin with, we will assume you are running a 64-bit OS. If you are not, switch to 64-bit.

Dump a running process by Task Manager

Make a 32-bit dump by opening C:\Windows\SysWOW64\taskmgr.exe. Right-click the process and make a dump file. You cannot select the location of the dump file; it is shown in the popup.

Make a 64-bit dump by opening C:\Windows\System32\taskmgr.exe, or by opening the Task Manager through the taskbar. Right-click the process and make a dump file. You cannot select the location of the dump file; it is shown in the popup.

ProcDump JIT & Live dumps

ProcDump is a tool from http://www.sysinternals.com (which redirects to MS Technet). It does not need to be installed and is easy to use. It can make 32-bit and 64-bit dumps. It can be enabled as a JIT handler (that is, it triggers when something crashes) or monitor a live app for unhandled exceptions.

JIT handler:

C:\tools\Procdump\procdump.exe -ma -i C:\AppDumps

  • C:\tools\Procdump\procdump.exe: ProcDump location
  • -ma: full dump
  • -i: install as JIT handler
  • C:\AppDumps: location to place the dmp

Please refer to its site for further info.

The following command starts ProcDump and makes it monitor ns2.exe live. It will make a full dump when it sees an unhandled exception. (Please refer to its site for all variables.)

C:\tools\Procdump\procdump.exe -ma -e ns2.exe C:\AppDumps

  • C:\tools\Procdump\procdump.exe: ProcDump location
  • -ma: full dump
  • -e: dump on unhandled exception
  • ns2.exe: process to monitor live
  • C:\AppDumps: place to store dumps

Note that you can only monitor one ns2.exe by name. If you run more than one, you need the PID or another variable! This can, for example, also be CPU usage.




Detailed rates info

Starters

  • tickrate: controls how often the AI units update (normally 30) and acts as an upper limit to sendrate
  • sendrate: controls how often players get updates sent to them (normally 20)
  • bwlimit: bandwidth limit, how much data a player can receive per second (normally 25600 bytes/sec, i.e. 25kB/sec)


These interact with two other variables that have been available for some time:

  • mr: move rate, how often a player sends their movements and actions to the server (standard is 30)
  • interp: interpolation buffer, how far behind the server the client will run (standard is 2/sendrate, or 0.1 sec (100ms))


What you want is a server that runs well enough to handle endgames, while still having a high move rate and low interp.


  • interp: the rough rule of thumb is 2/sendrate, or 0.1 sec for the standard 20 sendrate. This allows you to compensate for one lost packet and still stay inside the interp buffer. Less interp means that the game feels tighter.
  • mr: the move rate is how often the player's move is recorded. Increasing the moverate will lower the delay between when a user does something and when it actually happens. However, note that doubling the standard 30 moverate to 60 will only lower your average input delay from 16.5ms to 8.25ms, a gain of less than 10ms.
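
The two rules of thumb above can be turned into a quick calculator. This is a sketch with invented function names. It assumes the average input delay is half of the interval between two moves, which gives slightly different figures than the rounded 16.5ms and 8.25ms quoted in the text.

```python
def interp_ms(sendrate):
    """Rule of thumb from the text: interp = 2 / sendrate, here in milliseconds."""
    return 2000.0 / sendrate


def average_input_delay_ms(moverate):
    """Assumed model: on average an input waits half a move interval."""
    return 500.0 / moverate


for sendrate in (20, 25):
    print("sendrate {}: interp about {:.0f} ms".format(sendrate, interp_ms(sendrate)))

for moverate in (30, 60):
    print("mr {}: average input delay about {:.1f} ms".format(
        moverate, average_input_delay_ms(moverate)))
```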


Major areas

There are two major areas here.

bwlimit

The easy thing first: bwlimit (bandwidth limit). The limit has two effects: it avoids the server overrunning the outgoing internet pipe, and it helps protect the client from getting their incoming pipe overrun. That said, you should NOT be hitting the bwlimit while playing the game - if you do, the game will feel really awful and hitchy, and players will not feel in control of their avatar.

If you send more data than fits the pipe, packets will either be dropped or queued up - the second one is actually much worse than the first one from the POV of a real-time game; a delayed packet is worse than a dropped one, because the delayed packet eats up bandwidth for information that is out of date and useless anyhow.

So make sure that your outgoing pipe can handle bwlimit * numplayers.

The incoming pipe is less of a problem these days. 25kB/sec is about what a 256kbit line can handle ... which was a fair amount for a home connection back in 2005. These days, a 1Mbit download is considered to be a pretty low-rate connection, and that's about 100kB/sec (or bwlimit 100000).

How much bandwidth do you need? Well .. we don't quite know. What we do know is that the worst case for a 24 player server causes some choke to show up sometimes. So for 24 players and 20 sendrate, 25kB/sec seems to be borderline.


As a rough rule of thumb:

  • Doubling the sendrate means you will need double the bandwidth, pretty much.
  • Doubling the number of players means doubling the bandwidth


In practice though, if your outgoing line can handle it, we'd recommend setting bwlimit to 100000 (100kB/sec).
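
To check whether your outgoing line can handle a given bwlimit, the rules above can be put into numbers. This is a rough sketch with invented function names; it treats the rules of thumb as exactly linear, which the text only claims approximately.

```python
def required_upload_bytes_per_sec(bwlimit, num_players):
    """Worst case from the text: the outgoing pipe must handle bwlimit * numplayers."""
    return bwlimit * num_players


def bytes_per_sec_to_mbit(bytes_per_sec):
    """Convert bytes per second to megabits per second (1 Mbit = 1,000,000 bits)."""
    return bytes_per_sec * 8 / 1_000_000


def scaled_bwlimit(base_bwlimit, base_sendrate, base_players, sendrate, players):
    """Scale a known-borderline bwlimit linearly with sendrate and player count."""
    return base_bwlimit * (sendrate / base_sendrate) * (players / base_players)


upload = required_upload_bytes_per_sec(100000, 24)
print("24 players at bwlimit 100000 need up to {:.1f} Mbit/s upload".format(
    bytes_per_sec_to_mbit(upload)))

# Starting from the borderline case in the text: 24 players, 20 sendrate, 25600 bytes/sec.
print("Estimated borderline bwlimit at 40 sendrate: {:.0f}".format(
    scaled_bwlimit(25600, 20, 24, 40, 24)))
```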

There is ONE big thing about bwlimit though - on mapchanges, the server pushes a big chunk of data to the clients, and during that push, it WILL use up the bwlimit. If you set bwlimit too high, you WILL kill your outgoing pipe and/or the client pipe. This was actually accidentally tested during 267 development. For those with a puny 1Mbit download line and a bit of latency, the record for slow mapchange was 17 minutes. While we fixed some protocol issues that aggravated the situation, we would still recommend not going over 100kB/sec - it should be enough.

CPU & workload

The second limit is CPU. The server is pretty CPU hungry as is, and increasing mr, tickrate and (to a lesser degree) sendrate will increase the load.


The workload is (mr * number of players) + (tickrate * number of entities on the map) + (sendrate * number of players). Most of the networking is done in a second thread, so two cores are advisable if you have >20 players (on a 24 player server with a 20 sendrate, networking can use about 10-20% of a core).
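
The workload formula can be used to compare settings against each other. The sketch below is only a relative measure: the function name is invented, the entity count of 500 is a made-up placeholder, and the real cost per move, tick and update differs and varies over time.

```python
def workload(moverate, sendrate, tickrate, num_players, num_entities):
    """Workload per second as described in the text (unitless operation count)."""
    moves = moverate * num_players
    ticks = tickrate * num_entities
    updates = sendrate * num_players
    return moves + ticks + updates


# Hypothetical 24 player server with 500 entities on the map.
default = workload(moverate=30, sendrate=20, tickrate=30, num_players=24, num_entities=500)
raised_mr = workload(moverate=40, sendrate=20, tickrate=30, num_players=24, num_entities=500)

print(default)    # 16200
print(raised_mr)  # 16440
```

Since a single player move costs far more CPU than a single entity tick, do not read the raw totals as CPU percentages; use them only to see which term grows when you change a rate.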


Now, the exact cost to handle one player move, or how much a tick costs, varies over time. Unfortunately, the highest cost per move and per tick both happen at the same time - when the game peaks with the large climactic endgame battles.

THAT is when people need the smoothest running server, with no hitching or stutter. Letting your players down by giving them a shitty end to a good early/mid game is just bad style, and arguably bad for NS2.


YOU MUST SCALE YOUR GAME TO WHEN IT MATTERS - DURING ENDGAME BATTLES.


That means getting in there during the endgame and watching the server tickrate. If the tickrate is good 90% of the time but plummets during the endgame, you don't have enough CPU to serve your players.

Better tools to let players and admins check how loaded the server is are on the wishlist for 268+.


So, how can you best spend your CPU?

  • lower your tickrate. Yes, you heard right. LOWER it. The tickrate only controls how often AI units update, so you can save a bit of CPU by lowering it - player movement and actions are not affected by tickrate (this differs from Source, btw).

Do note that tickrate works as a limiter on sendrate; the clients will only be updated after a tick has passed, so the effective sendrate is min(sendrate, tickrate).

From a practical POV, the 267 server browser does not know that you have intentionally lowered your tickrate, so lowering it to 20-25 will show your server as having bad performance... so you probably don't want to lower the tickrate until the server browser knows about configurable tickrates.

Also, running a lower tickrate _should_ work - but it has not been tested.

  • increasing sendrate until it equals tickrate is probably the best way to spend CPU, especially if you are running on two cores. That will allow you to lower interp, which is good. (Notice that 267 has a small bug there: it uses "<" rather than "<=" in the sanity comparison, so you are probably required to keep sendrate 1 less than tickrate or even lower.)
  • make sure moverate > 1.25 * sendrate. Otherwise you risk sending out client updates that do not contain a move for a player, which is a little bad. Not fatal, just a bit of a waste.
  • the client can't generate more moves than his framerate, so the mr setting just sets the max number of moves.
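
The constraints in this list can be checked before you touch the server config. The following is a minimal sketch with invented function names. It reports warnings rather than errors, because the text describes breaking the moverate rule as wasteful, not fatal.

```python
def effective_sendrate(sendrate, tickrate):
    """Clients are only updated after a tick, so tickrate caps the sendrate."""
    return min(sendrate, tickrate)


def check_rates(tickrate, sendrate, moverate):
    """Return a list of warnings for a proposed set of rates."""
    warnings = []
    if sendrate >= tickrate:
        # The text notes that 267 compares with "<", so equal is not accepted.
        warnings.append("sendrate should be lower than tickrate")
    if moverate <= 1.25 * sendrate:
        warnings.append("moverate should be higher than 1.25 * sendrate")
    return warnings


print(effective_sendrate(40, 30))  # 30
print(check_rates(tickrate=30, sendrate=20, moverate=30))  # []
print(check_rates(tickrate=30, sendrate=30, moverate=30))
```

Note that a combination such as moverate 30 with sendrate 25 trips the second warning; treat the 1.25 factor as a guideline and decide for yourself how strictly to apply it.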


The most cost-ineffective way of spending CPU is to increase moverate. Most of the CPU is spent updating moves, so if you increase mr from 30 to 40, you will probably use about 15-20% more CPU, for a 5ms decrease in input latency. Having your server tank because you pay 20% more CPU for something no one can notice is NOT a good idea.

However, you do need to increase mr if you want to increase sendrate ... to lower interp.


Examples in a stage-by-stage fashion

#1: avoid choke
This is cheap and simple - the bwlimit default is probably unnecessarily tight.
bwlimit = 100000 // unless you have upload limits, 100kB/sec per player ensures that they never choke during play, and mapchanges should be safe as well
#2: Increase sendrate and lower interp (best, cheapest)
sendrate = 25 // for mr 30, 25 sendrate is about as high as you want to go.
interp = 80 // 40 ms send interval -> 80 ms interp.
#3: Bump mr a bit
This is pretty much required because you need mr > 1.25 * sendrate, and sendrate < tickrate, so we can't go further without knowing we can handle this
mr 40 // make sure you have the CPU for it; requires that your server can run endgame battles at 75% load or less
sendrate 29 // would make it 30, but must be < tickrate
interp 70 // higher sendrate -> lower interp
#4: Cut another 20ms of interp
Once we can handle 40mr, we can try increasing tickrate so we can increase sendrate so we can lower interp a bit more
tickrate 35 // watch that endgame load! Before you go here, you probably need the server to run endgames (with 40mr!) at 85% load or less.
sendrate 34 // if you don't run on two cores, DO SO NOW - this will easily use up 20-30% of a core in endgame battles!
interp 60
#5: Shampoo, rinse, repeat ...
Pretty much #3 and #4 again, but you will start running into diminishing returns; you will be spending CPU (and risking endgame overload = shitty play) for pretty much nothing at all, and going any lower on interp may start to cause problems for players with higher variations in latency.




Monitoring tools

We all like monitoring tools. Free or otherwise. Or what about monitoring advice? This chapter covers that.


Brainless Panel / UGCC

Found at http://brainless.us/. This is a webpanel interface, so you can install it and let other admins manage NS2 servers through a webpanel. Please remember that the info below is accurate right now, but it may change in the future. Please read up on their own website for recent info.

  • It has a licensed (paid) and an unlicensed version.
  • The licensed one can do more. Please look at their site for info on what suits you best.
  • It has an easy and a normal version. The easy one is preconfigured. The normal one needs to be installed with your own specifications.
  • Note you cannot update the panel software if it's unlicensed. If your license expires you cannot update the panel without losing the additional functionality.
    • The easy one is Windows only. The normal one is Windows/Linux. The normal one takes different webserver packages.


LGSM

https://gameservermanagers.com/lgsm/ns2server/

  • For linux.
  • Scripts which you can use to install, update, start or stop your server, etc.


TF Perfmon

Shameless of Tactical Freedom has made a mod and corresponding tool for perfmon. https://steamcommunity.com/sharedfiles/filedetails/?id=616698541 ModID: 24c212ad

  • The mod collects perfmon data and offers it through the default NS2 web admin as raw text data. (It adds a "perfmondata" action to webadmin that outputs the past 30 minutes of verbose perfmon data. Raw data can be read with ?request=getperfmondata.)
  • This can be fed into various tools, starting with Shameless's own tool called ns2web, which can be found at https://github.com/ShamelessCookie/ns2web.
  • It collects the data every second just like in console.
  • It actively shows data of the last 30 minutes and resets on mapchange.
  • It now has steam login integration on ns2web. (But for up to date info I strongly suggest to read the github link.)
  • The API aggregates this into chunks of 15 seconds. Note that no data is lost.