Rafa XU's technical blog: Web


Thursday, 1 May 2014

Nginx Common Section Configuration Considerations

This post covers the common sections of the Nginx configuration.

Main module

error_log

timer_resolution

By default, every time nginx returns from kevent(), epoll, /dev/poll, select() or poll(), it calls gettimeofday(). That call is often not needed on every iteration, so we can specify the interval with
timer_resolution interval
for example
timer_resolution 100ms;

worker_cpu_affinity

This directive uses sched_setaffinity() to bind each worker process to a dedicated CPU.

worker_cpu_affinity cpumask ...
for example
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;
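
The masks in the example follow a simple pattern: worker i gets a bitmask with only bit i set, where the rightmost bit stands for the first CPU. A minimal Python sketch of that pattern, assuming one worker per CPU (the function names are made up for illustration and are not part of nginx):

```python
def affinity_masks(worker_count):
    """Return one CPU bitmask string per worker, pinning worker i to CPU i."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    # 1 << i sets only bit i; the mask is zero-padded to worker_count digits.
    return [format(1 << i, "0{}b".format(worker_count)) for i in range(worker_count)]


def affinity_directive(worker_count):
    """Build the worker_cpu_affinity line for worker_count workers."""
    return "worker_cpu_affinity " + " ".join(affinity_masks(worker_count)) + ";"


if __name__ == "__main__":
    print(affinity_directive(4))
```

For four workers this reproduces the directive shown above.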

worker_priority

This directive sets the nice value for the worker processes.
worker_priority number

worker_processes

This directive defines the number of worker processes. Usually it should match the number of CPUs and the affinity settings.

worker_rlimit_nofile

This directive defines the maximum number of file descriptors each worker process can open.
worker_rlimit_nofile number

Events module

worker_connections

It defines the maximum number of connections each worker process can handle.
max clients = worker_processes * worker_connections
When nginx acts as a reverse proxy:
max clients = worker_processes * worker_connections / 4
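
The two formulas can be checked with a few lines of Python. This is only a sketch of the arithmetic above; the function name and the reverse_proxy flag are made up for illustration:

```python
def max_clients(worker_processes, worker_connections, reverse_proxy=False):
    """Estimate the maximum number of clients from the two directives."""
    total = worker_processes * worker_connections
    # In the reverse proxy case the post divides the total by 4.
    return total // 4 if reverse_proxy else total


if __name__ == "__main__":
    print(max_clients(4, 1024))
    print(max_clients(4, 1024, reverse_proxy=True))
```

With 4 workers and 1024 connections each, the estimate is 4096 clients, or 1024 behind a reverse proxy.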

use

Wednesday, 2 April 2014

Apache configuration – virtual host

Virtual hosts are widely used in web servers. Probably most public-facing web servers use the technology. It provides the ability to host multiple sites in a single Apache HTTP (or any other web server) instance.

There are two types of virtual host:

IP-based virtual host: uses different IP addresses to provide different content to the user
Name-based virtual host: uses different host names to provide different content to the user

Name-based virtual hosts are much more important, as IP addresses are limited and IP-based hosting involves a lot of infrastructure configuration.

Name-based virtual host

Name-based virtual hosting is based on a header called Host in the HTTP request.

Here is part of my http request to the website

Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Host:fujtest:90
If-Modified-Since:Fri, 23 Sep 2011 00:52:33 GMT

The web server then serves the request based on the Host header.

For a large website, it is strongly recommended to put the virtual host settings in a separate configuration file rather than in the main httpd.conf. This keeps the main httpd.conf concise and readable.

Here is a very basic configuration for the name-based virtual host configuration.

In main httpd.conf

# include the virtual host configuration
Include conf/extra/httpd-vhosts.conf

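In conf/extra/httpd-vhosts.conf, we first need to enable name-based virtual hosting (required on Apache 2.2; from Apache 2.4 the NameVirtualHost directive is deprecated and has no effect):

NameVirtualHost [FQDN or IP]:[PORT]
Then define the VirtualHost sections: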
<VirtualHost ip:port>
       ServerName
       DocumentRoot
       etc
</VirtualHost>

Here is a simple sample configuration.

NameVirtualHost 192.168.179.150:90
<VirtualHost 192.168.179.150:90>
     DocumentRoot "/usr/local/apache2/htdocs/fujtest1"
     ServerName fujtest1
     ErrorLog "logs/fujtest1_error.log"
     CustomLog "logs/fujtest1_access.log" common
</VirtualHost>

<VirtualHost 192.168.179.150:90>
    DocumentRoot "/usr/local/apache2/htdocs/fujtest2"
    ServerName fujtest2
    ErrorLog "logs/fujtest2_error.log"
    CustomLog "logs/fujtest2_access.log" common
</VirtualHost>
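
The selection logic behind this configuration can be sketched in Python. This is a simplified model, not Apache's implementation: it ignores ServerAlias and IPv6 literals, and the VHOSTS list and function name are hypothetical stand-ins for the two sections above. It relies on the fact that the first virtual host listed acts as the default when no ServerName matches.

```python
# Hypothetical mirror of the two <VirtualHost> sections above, in file order.
VHOSTS = [
    ("fujtest1", "/usr/local/apache2/htdocs/fujtest1"),
    ("fujtest2", "/usr/local/apache2/htdocs/fujtest2"),
]


def select_document_root(host_header, vhosts=VHOSTS):
    """Pick a DocumentRoot by comparing the Host header with each ServerName."""
    # Drop an optional ":port" suffix; host names are case-insensitive.
    hostname = host_header.split(":", 1)[0].strip().lower()
    for server_name, document_root in vhosts:
        if server_name.lower() == hostname:
            return document_root
    # No ServerName matched: fall back to the first virtual host listed.
    return vhosts[0][1]


if __name__ == "__main__":
    print(select_document_root("fujtest2:90"))
    print(select_document_root("fujtest:90"))
```

A request with Host: fujtest2:90 is served from the fujtest2 directory, while an unknown name such as fujtest:90 falls back to the first virtual host.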

If a virtual host's DocumentRoot is outside your main DocumentRoot, you may need to check the access permission options for that directory.

Tuesday, 1 April 2014

Apache HTTP server configuration – Core Part

The Apache HTTP configuration usually has 3 parts: the core part, the container part and the extension part. In this post, we are going to talk briefly about the core part.

The core part configures the main Apache modules and applies to the whole Apache configuration. Some common configuration items are listed below:

ServerName: specifies the default global server name for the HTTP server. You may use an FQDN or an IP address here.

Sample config: ServerName www.example.com:80 (my host is www.example.com)

Listen: specifies the listening port for the HTTP server. It is commonly set to 80, but it can be any other port.

Sample config: Listen 80 (the HTTP server listens on port 80)

ServerRoot: defines the server's root directory. For a source build it is usually the location given to configure with --prefix=[location]. For a binary installation it is usually /etc/apache2.

Sample: ServerRoot "/usr/local/apache2" (my server root is /usr/local/apache2)

DocumentRoot: defines the default HTML root location. It can be overridden in the virtual host settings.

Sample: DocumentRoot "/usr/local/apache2/htdocs" (my document root is /usr/local/apache2/htdocs)

ServerAdmin: specifies the web administrator's email address. When there is a problem, users can send an email to this address to report the issue.

Sample: ServerAdmin rafa.xu.au@gmail.com

ScriptAlias and Alias: used to map a URL to a specific directory on the machine. ScriptAlias makes the server recognise the files in the directory as CGI scripts, while Alias treats it as a normal directory.

Sample:

ScriptAlias /cgi-bin/ "/usr/local/apache2/cgi-bin/"

Alias /alias/ "/var/tmp/alias/"

User and Group: define the ownership of the child processes. For security reasons, the user and group should be dedicated to the httpd processes and be a non-login account.

User apache

Group apache

LoadModule: this directive is used to load modules dynamically.

Sample: LoadModule mime_module modules/mod_mime.so (loads mime_module into Apache)

ErrorDocument: maps an error code to a friendly page or script.

Sample: ErrorDocument 404 /404.html (serves a specific 404 page to the user)

Monday, 24 March 2014

Nginx process structure

Nginx usually works in multi-process mode: a master process forks multiple child processes called worker processes.

The master does not handle web client connections or session management. It only manages the child processes.
Worker processes handle the requests from clients (usually HTTP/HTTPS) and serve them. The number of worker processes is set in the configuration.

Configuration Syntax

user userid;
worker_processes [number]; # how many worker processes to start
events {
    worker_connections [number]; # how many connections one worker process can serve
}

Sample configuration

user nginx;

worker_processes 10;

The resulting process list:

root 5031 1 0 13:41 ? 00:00:00 nginx: master process ./nginx
nginx 5032 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5033 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5034 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5035 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5036 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5037 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5038 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5039 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5040 5031 0 13:41 ? 00:00:00 nginx: worker process
nginx 5041 5031 0 13:41 ? 00:00:00 nginx: worker process

Process 5031 is the master (parent) process.

Processes 5032-5041 are the worker (child) processes.

hardware and core configuration

low traffic website
CPU: 2 cores, memory: 2 GB, visits: ~1/s
worker_processes 2;
worker_priority -4;
worker_cpu_affinity 01 10;
events {
    worker_connections 128;
}

medium traffic website
CPU: 4 cores, memory: 4 GB, visits: ~50/s
worker_processes 4;
worker_priority 0;
worker_cpu_affinity 0001 0010 0100 1000;
events {
    worker_connections 1024;
}

high traffic website
CPU: 8 cores, memory: 12 GB, visits: ~1000/s
worker_processes 8;
worker_priority 0;
events {
    worker_connections 8192;
}

Friday, 21 February 2014

TCP, HTTP and web performance

These are my study notes for the Udemy class

https://www.udemy.com/tcp-http-spdy-deep-dive/

Page load performance affects how users feel about a website. Research shows that 100 ms is the ideal page load time.

In general, we can improve the page load time in the four areas below:

Markup/content:

Make fewer HTTP requests

Optimize CSS and scripts

Minimize cookies

Browser:

Use progressive enhancement

Load scripts without blocking

Use AJAX and deferred scripts

Network:

Use caching and compression

Use a CDN

Reduce DNS lookups

Avoid redirections

Prefetch commonly used resources

Server:

Load balancing

Backend server scripts

Optimize the database

Besides the web server and backend processing time, network overhead has a great impact on the page load time.

TCP was designed and developed around 1980, when network conditions were much poorer, and it handles low-bandwidth networks very well. It is stream-oriented, with features such as slow start, sliding windows, congestion windows and Nagle's algorithm.

RTT (round-trip time) is very important for web response time. Its lower bound is the time light takes to travel between you and the server, and many other factors, such as network device hops and bandwidth, add to it.
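
To get a feel for the light-travel part of the RTT, here is a rough Python estimate. It is a sketch under two assumptions that are not from the class: the signal travels through fibre at about two thirds of the speed of light in vacuum, and the cable follows a straight path. Real RTTs are higher because of hops, queuing and routing detours.

```python
SPEED_OF_LIGHT_KM_S = 299792.458  # speed of light in vacuum, km/s
FIBRE_FACTOR = 2 / 3              # assumption: light in fibre travels at about 2/3 c


def min_rtt_ms(distance_km, factor=FIBRE_FACTOR):
    """Lower bound on the round-trip time in milliseconds for a given distance."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * factor)
    # A round trip covers the distance twice; convert seconds to milliseconds.
    return 2 * one_way_seconds * 1000


if __name__ == "__main__":
    print(round(min_rtt_ms(10000), 1))
```

Under these assumptions, a server 10,000 km away cannot answer in much less than about 100 ms, whatever the bandwidth.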

How TCP and HTTP influence the page load time:

1. 1 RTT to establish the TCP connection

2. 1 RTT to send the HTTP request and receive the response

3. 1 more RTT to get any data beyond the first 3 packets

4. An extreme slowdown when packets are lost and retransmission happens

What we can do to improve the response time

1. parallel TCP sessions

2. reuse TCP sessions (persistent HTTP connections)

3. pre-establish TCP sessions

4. increase initial congestion window

5. use CDN to reduce the RTT

6. TCP fast open (HTTP GET request with TCP SYN)

Persistent HTTP Sessions

The TCP session is not closed after the HTTP response is sent. The feature is supported by all major websites and browsers. It saves TCP session setup overhead, but the web server has to keep the sessions open (more threads or worker processes). In Apache, a timeout is set for these connections.

Initial congestion window: Google's experiments show that 10 is a suitable value for current Internet congestion conditions. With it the server can send about 15 KB of data to the browser in the first round trip, so the content can be shown if the page is well designed.
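
The 15 KB figure, and the cost of a smaller initial window, can be reproduced with a simplified slow-start model in Python. This is a sketch under assumptions that are not from the class: a segment size of 1460 bytes, a window that doubles every round trip, no packet loss, and no receive-window limit. The function names are made up for illustration.

```python
MSS_BYTES = 1460  # assumption: typical segment size on an Ethernet path


def first_flight_bytes(initcwnd, mss=MSS_BYTES):
    """Bytes the server can send before it has to wait for the first ACK."""
    return initcwnd * mss


def round_trips_to_send(size_bytes, initcwnd, mss=MSS_BYTES):
    """Round trips needed to deliver size_bytes under idealised slow start."""
    if initcwnd < 1:
        raise ValueError("initcwnd must be at least 1")
    segments_left = -(-size_bytes // mss)  # ceiling division
    cwnd = initcwnd
    round_trips = 0
    while segments_left > 0:
        segments_left -= cwnd  # send one full window per round trip
        cwnd *= 2              # slow start doubles the window each round trip
        round_trips += 1
    return round_trips


if __name__ == "__main__":
    print(first_flight_bytes(10))
    print(round_trips_to_send(14600, 3), round_trips_to_send(14600, 10))
```

In this model a 14,600-byte page fits in one round trip with an initial window of 10, but takes three round trips with an initial window of 3.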

TCP Fast Open: the HTTP request is sent in the SYN packet. Only experimental.

WHY INTERNET

I have been working in traditional IT environments for about 8 years, and now (from 2004) I have decided to turn to the Internet area, as it is more exciting and challenging.

Now the area I am interested in and working on:

High-volume-traffic web infrastructure. This area has a very wide scope. It includes the web deployment architecture, the web framework and the support procedures for extremely high availability.

Web mining. This is more related to artificial intelligence and data mining. It is the technology used to find useful information in semi-structured web pages.

Other areas I am interested in include cloud platforms (OpenStack), NoSQL, Hadoop and stream computing. These are leading technologies widely used in Internet companies, but I only have a basic idea of how they work.

I can be contacted at rafa.xu.au@gmail.com

My Next 6 months study Areas:

OpenStack

Linux DevOps

Python/C

puppet(certified)
