
Thursday, 1 May 2014

Nginx Common Section Configuration Considerations

This post covers the commonly used directives in the Nginx main and events sections.



Main module

error_log

error_log sets where errors are logged and the minimum severity to record. The syntax is:

error_log file | stderr [ debug | info | notice | warn | error | crit | alert | emerg ];

If nginx was compiled with "--with-debug", debug logging can be enabled per subsystem:
error_log LOGFILE [debug_core | debug_alloc | debug_mutex | debug_event | debug_http | debug_imap];
To effectively disable error logging, send it to /dev/null:

error_log /dev/null crit;
 

timer_resolution

By default, nginx calls gettimeofday() every time it returns from kevent(), epoll, /dev/poll, select() or poll(). That is often more than needed; timer_resolution makes nginx update its time only once per interval:
timer_resolution interval;

For example:
timer_resolution  100ms;

 

worker_cpu_affinity

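This directive binds each worker process to specific CPUs by calling sched_setaffinity(). Each cpumask is a bit string in which the rightmost bit stands for CPU 0.

worker_cpu_affinity cpumask ...;

For example, to bind four workers to one CPU each: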
worker_processes     4;
worker_cpu_affinity 0001 0010 0100 1000;
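
The mask pattern in the example above can be generated for any number of workers. The following is a minimal Python sketch, assuming one worker per CPU and as many CPUs as workers; the function names affinity_masks and affinity_directive are made up for illustration and are not part of nginx.

```python
from typing import List


def affinity_masks(worker_count: int) -> List[str]:
    """Return one bit-string mask per worker, binding worker i to CPU i."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    masks = []
    for cpu in range(worker_count):
        # The rightmost bit is CPU 0, so shift a single 1 bit left by `cpu`
        # and pad with zeros to one digit per CPU.
        masks.append(format(1 << cpu, "0{}b".format(worker_count)))
    return masks


def affinity_directive(worker_count: int) -> str:
    """Build the worker_cpu_affinity line for `worker_count` workers."""
    return "worker_cpu_affinity " + " ".join(affinity_masks(worker_count)) + ";"


print(affinity_directive(4))  # worker_cpu_affinity 0001 0010 0100 1000;
```

The output for 4 workers matches the hand-written line above.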

 

worker_priority

This directive sets the nice value (scheduling priority) of the worker processes. A negative number means a higher priority.
worker_priority number;

 

worker_processes

This directive sets the number of worker processes. Usually it should match the number of CPU cores, and the worker_cpu_affinity masks should match it as well.

 

worker_rlimit_nofile

This directive sets the maximum number of file descriptors each worker process can open.
worker_rlimit_nofile number;

 

Events module

worker_connections

This directive sets the maximum number of simultaneous connections each worker process can handle, which gives:
max clients = worker_processes * worker_connections
When nginx acts as a reverse proxy:
max clients = worker_processes * worker_connections / 4
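
As a quick check of the two formulas, here is a small Python sketch. The function name max_clients and the sample values are assumptions for illustration; the divisor of 4 is simply the rule of thumb stated above, not something nginx computes itself.

```python
def max_clients(worker_processes: int, worker_connections: int,
                reverse_proxy: bool = False) -> int:
    """Estimate the maximum number of concurrent clients."""
    total = worker_processes * worker_connections
    # Rule of thumb from the text: divide by 4 when acting as a reverse proxy.
    return total // 4 if reverse_proxy else total


print(max_clients(4, 1024))                      # 4096
print(max_clients(4, 1024, reverse_proxy=True))  # 1024
```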

 

use

This directive selects the I/O event method nginx uses. By default, it is epoll on Linux systems.
use [ kqueue | rtsig | epoll | /dev/poll | select | poll | eventport ];

Wednesday, 2 April 2014

Apache configuration – virtual host


Virtual hosts are widely used in web servers; probably most public-facing web servers use them. A virtual host lets a single Apache HTTP instance (or any other web server) host multiple sites.

There are two types of virtual host:
  • IP-based virtual host: uses different IP addresses to serve different content to the user
  • Name-based virtual host: uses different host names to serve different content to the user
Name-based virtual hosts are the more important of the two, because IP addresses are limited and each extra address involves additional infrastructure configuration.

Name-based virtual host

Name-based virtual hosting relies on the Host header field of the HTTP request.
Here is part of my HTTP request to the website:
 
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Host:fujtest:90
If-Modified-Since:Fri, 23 Sep 2011 00:52:33 GMT

The web server then serves the request based on the Host field.
For a large website, it is strongly recommended to keep the virtual host settings in a separate configuration file rather than in the main httpd.conf. This keeps the main httpd.conf concise and readable.

Here is a very basic name-based virtual host configuration.
In the main httpd.conf, include the virtual host configuration file:
Include conf/extra/httpd-vhosts.conf
In conf/extra/httpd-vhosts.conf, first enable name-based virtual hosting (needed in Apache 2.2; Apache 2.4 enables it automatically and ignores this directive):

NameVirtualHost [FQDN or IP]:[PORT]
Then define a VirtualHost section for each site:
<VirtualHost ip:port>
       ServerName
       DocumentRoot
       etc
</VirtualHost>


Here is a simple sample configuration.

NameVirtualHost 192.168.179.150:90
<VirtualHost 192.168.179.150:90>
     DocumentRoot "/usr/local/apache2/htdocs/fujtest1"
     ServerName fujtest1
     ErrorLog "logs/fujtest1_error.log"
     CustomLog "logs/fujtest1_access.log" common
</VirtualHost>

<VirtualHost 192.168.179.150:90>
    DocumentRoot "/usr/local/apache2/htdocs/fujtest2"
    ServerName fujtest2
    ErrorLog "logs/fujtest2_error.log"
    CustomLog "logs/fujtest2_access.log" common
</VirtualHost>
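
To make the selection logic concrete, here is a rough Python sketch of how a server could pick a document root from the Host header, using the two sample virtual hosts above. It is a simplified model, not Apache's actual code: the names VHOSTS and select_document_root are invented for illustration, ServerAlias and IPv6 literals are ignored, and the fallback mirrors Apache's behaviour of using the first listed virtual host when no ServerName matches.

```python
# (ServerName, DocumentRoot) pairs, in the order they appear in the config.
VHOSTS = [
    ("fujtest1", "/usr/local/apache2/htdocs/fujtest1"),
    ("fujtest2", "/usr/local/apache2/htdocs/fujtest2"),
]


def select_document_root(host_header: str) -> str:
    """Return the DocumentRoot of the virtual host matching the Host header."""
    # Drop an optional ":port" suffix and compare case-insensitively.
    hostname = host_header.strip().split(":", 1)[0].lower()
    for server_name, document_root in VHOSTS:
        if server_name.lower() == hostname:
            return document_root
    # No ServerName matched: fall back to the first virtual host listed.
    return VHOSTS[0][1]


print(select_document_root("fujtest2:90"))  # /usr/local/apache2/htdocs/fujtest2
print(select_document_root("unknown:90"))   # /usr/local/apache2/htdocs/fujtest1
```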

If a virtual host's DocumentRoot lies outside the main DocumentRoot, you may need to check that access to that directory is permitted.

Tuesday, 1 April 2014

Apache HTTP server configuration – Core Part




The Apache HTTP configuration usually has three parts: the core part, the container part and the extension part. This post briefly covers the core part.

The core part configures the main Apache modules and applies to the whole Apache configuration. Some common configuration items are listed below:

ServerName: specifies the default global server name for the HTTP server. You may use an FQDN or an IP address.
Sample config: ServerName www.example.com:80 (the host is www.example.com)

Listen: specifies the port the HTTP server listens on. It is usually 80, but it can be any other port.
Sample config: Listen 80 (the HTTP server listens on port 80)

ServerRoot: defines the root directory of the server installation. For a source build, it is usually the location given by --prefix=[location] at configure time. Binary installations usually use /etc/apache2.
Sample: ServerRoot "/usr/local/apache2" (the server root is /usr/local/apache2)

DocumentRoot: defines the default root location for HTML content. It can be overridden in the virtual host settings.
Sample: DocumentRoot "/usr/local/apache2/htdocs" (the document root is /usr/local/apache2/htdocs)

ServerAdmin: specifies the web administrator's email address. It is included in server-generated error pages so that users can report a problem by email.
Sample: ServerAdmin rafa.xu.au@gmail.com

ScriptAlias and Alias: both map a URL to a specific directory on the machine. ScriptAlias makes the server treat the files in the directory as CGI scripts, while Alias treats it as a normal directory.
Sample:
    ScriptAlias /cgi-bin/ "/usr/local/apache2/cgi-bin/"
    Alias /alias/ "/var/tmp/alias/"
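
The mapping done by these two directives can be pictured with a short Python sketch. This is only a simplified model under stated assumptions: the names ALIASES, DOCUMENT_ROOT and resolve are invented for illustration, the paths come from the samples in this post, and real Apache does much more (access checks, path normalisation, and so on).

```python
DOCUMENT_ROOT = "/usr/local/apache2/htdocs"

# (URL prefix, filesystem target, treat files as CGI scripts?)
ALIASES = [
    ("/cgi-bin/", "/usr/local/apache2/cgi-bin/", True),   # ScriptAlias
    ("/alias/", "/var/tmp/alias/", False),                # Alias
]


def resolve(url_path: str):
    """Map a URL path to (filesystem path, is_cgi)."""
    for prefix, target, is_cgi in ALIASES:
        if url_path.startswith(prefix):
            # Replace the URL prefix with the mapped directory.
            return target + url_path[len(prefix):], is_cgi
    # No alias matched: serve from the document root as a normal file.
    return DOCUMENT_ROOT + url_path, False


print(resolve("/alias/report.html"))  # ('/var/tmp/alias/report.html', False)
print(resolve("/cgi-bin/env.cgi"))    # ('/usr/local/apache2/cgi-bin/env.cgi', True)
print(resolve("/index.html"))         # ('/usr/local/apache2/htdocs/index.html', False)
```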

User and Group: define the user and group that the child processes run as. For security reasons, they should be a dedicated non-login account used only for the httpd processes.
User apache
Group apache

LoadModule: loads a module dynamically.
Sample: LoadModule mime_module modules/mod_mime.so (loads mime_module into Apache)

ErrorDocument: maps an error code to a friendly page or script.
Sample: ErrorDocument 404 /404.html (serves a custom page for 404 errors)


Monday, 24 March 2014

Nginx process structure




Nginx usually runs in multi-process mode: a master process forks several child processes, called worker processes.

  • The master process does not handle client connections or sessions. It only manages the child processes.
  • Each worker process handles requests from clients (usually HTTP/HTTPS) and serves them. The number of worker processes is set in the configuration.

Configuration Syntax

user userid;
worker_processes  [number];   # number of worker processes
events {
    worker_connections [number];   # number of connections one worker process can serve
}

Sample configuration
user  nginx;
worker_processes  10;

The resulting process list:

root      5031     1  0 13:41 ?        00:00:00 nginx: master process ./nginx
nginx     5032  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5033  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5034  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5035  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5036  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5037  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5038  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5039  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5040  5031  0 13:41 ?        00:00:00 nginx: worker process
nginx     5041  5031  0 13:41 ?        00:00:00 nginx: worker process

Process 5031 is the master (parent) process.
Processes 5032-5041 are the worker (child) processes.

Hardware and core configuration

Low traffic website (CPU: 2 cores, memory: 2 GB, visits: ~1/s)

worker_processes 2;
worker_priority -4;
worker_cpu_affinity 01 10;
events {
     worker_connections 128;
}

Medium traffic website (CPU: 4 cores, memory: 4 GB, visits: ~50/s)

worker_processes 4;
worker_priority 0;
worker_cpu_affinity 0001 0010 0100 1000;
events {
     worker_connections 1024;
}

High traffic website (CPU: 8 cores, memory: 12 GB, visits: ~1000/s)

worker_processes 8;
worker_priority 0;
events {
     worker_connections 8192;
}

Friday, 21 February 2014

TCP, HTTP and web performance

These are study notes for the Udemy class
https://www.udemy.com/tcp-http-spdy-deep-dive/

Page load time shapes how users feel about a website. Research suggests that about 100 ms is the ideal load time.


In general, page load time can be improved in the following four areas:
Markup/content:
       Make fewer HTTP requests
       Optimize CSS and scripts
       Minimize cookies
Browser:
       Use progressive enhancement
       Load scripts without blocking
       Use AJAX and deferred scripts
Network:
       Use caching and compression
       Use a CDN
       Reduce DNS lookups
       Avoid redirects
       Prefetch commonly used resources
Server:
       Load balancing
       Backend server scripts
       Optimize the database

Besides the web server and backend processing time, network overhead has a great impact on page load time.

TCP was designed and developed around 1980, for the much slower networks of that time. It handled low-bandwidth networks very well. It is stream-oriented, with features such as slow start, sliding windows, congestion windows and Nagle's algorithm.

RTT (round-trip time) is very important for web response time. Its lower bound is the time light takes to travel between you and the server and back, and it grows with other factors such as the number of network device hops and the available bandwidth.

How TCP/HTTP influences page load time:
1.       1 RTT to establish the TCP connection
2.       1 RTT to send the HTTP request and receive the start of the response
3.       1 more RTT to fetch data beyond the first 3 packets
4.       Severe slowdown when packets are lost and retransmission happens
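
The round-trip counting in the list above can be turned into a rough estimate. The Python sketch below is an idealised model, not a measurement tool: it assumes a 1460-byte segment size, a congestion window that doubles every round trip (slow start), no packet loss, and no DNS or TLS time. The function name round_trips and its parameters are made up for illustration.

```python
import math


def round_trips(response_bytes: int, initial_cwnd: int = 3,
                mss: int = 1460, new_connection: bool = True) -> int:
    """Estimate the RTTs needed to fetch one HTTP response."""
    segments = math.ceil(response_bytes / mss)
    rtts = 1 if new_connection else 0  # TCP handshake
    rtts += 1                          # request plus first flight of the response
    cwnd = initial_cwnd
    sent = min(cwnd, segments)
    while sent < segments:
        cwnd *= 2                      # slow start: the window doubles each RTT
        sent += cwnd
        rtts += 1
    return rtts


print(round_trips(4000))                         # 2
print(round_trips(20000))                        # 4
print(round_trips(20000, new_connection=False))  # 3
```

Multiplying the result by the RTT gives a lower bound on the fetch time; for example, 4 round trips at an assumed 50 ms RTT is at least 200 ms.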

What we can do to improve the response time:
1.       parallel TCP sessions
2.       reuse TCP sessions (persistent HTTP connections)
3.       pre-establish TCP sessions
4.       increase initial congestion window
5.       use CDN to reduce the RTT
6.       TCP fast open (HTTP GET request with TCP SYN)

Persistent HTTP Sessions
The TCP session is not closed after the HTTP response is sent. The feature is supported by all major websites and browsers. It saves TCP session setup overhead, but the web server has to keep the sessions open (more threads or worker processes). Apache sets a timeout for idle persistent connections (the KeepAliveTimeout directive).

Initial congestion window: Google's experiments show that 10 segments is a suitable value for current Internet congestion conditions. It lets the server send about 15 KB of data to the browser in the first round trip, so the content can already be shown if the page is well designed.
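
The figure of about 15 KB follows from simple arithmetic, sketched below in Python. The 1460-byte segment size is an assumption (a typical value on Ethernet), and first_flight_bytes is a made-up name for illustration.

```python
MSS_BYTES = 1460  # assumed maximum segment size


def first_flight_bytes(initial_cwnd_segments: int, mss: int = MSS_BYTES) -> int:
    """Bytes the server can send before waiting for the first ACK."""
    return initial_cwnd_segments * mss


print(first_flight_bytes(10))  # 14600, roughly 15 KB
print(first_flight_bytes(3))   # 4380
```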

TCP Fast Open: the HTTP request is sent in the SYN packet. It is only experimental.

