Rafa XU's technical blog: April 2014

Friday 25 April 2014

File Magic Number

In many OSes, different types of files begin with different, characteristic bytes. These leading bytes are called the magic number. The ‘file’ command usually checks the magic number to decide what type a file is.

Examples of the ‘file’ command:

[ec2-user@ip-172-31-16-100 webchecker]$ file dnstest.py
dnstest.py: a /usr/bin/python script text executable
[ec2-user@ip-172-31-16-100 webchecker]$ file /bin/ls
/bin/ls: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped
[ec2-user@ip-172-31-16-100 webchecker]$ file hostname.conf
hostname.conf: ASCII text

When we write a script in Python, Perl or bash, we are actually setting the magic number of the file: the script starts with a "shebang" (#!, bytes 23 21 in hex) followed by the path to an interpreter, for example #!/usr/bin/python.
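To make the idea concrete, here is a minimal Python sketch of magic-number sniffing in the spirit of ‘file’. It checks only a handful of well-known signatures (ELF, the #! shebang, gzip, PDF). The names `classify_header` and `sniff_type` are hypothetical, and the real ‘file’ command consults a much larger magic database.

```python
# Hypothetical sketch: identify a file by its leading bytes (its magic number).
# Only a few well-known signatures are checked here.
SIGNATURES = [
    (b"\x7fELF", "ELF binary"),
    (b"#!", "script"),
    (b"\x1f\x8b", "gzip compressed data"),
    (b"%PDF", "PDF document"),
]


def classify_header(head):
    """Classify a file from its first bytes."""
    for magic, name in SIGNATURES:
        if head.startswith(magic):
            if magic == b"#!":
                # The interpreter path follows "#!" up to the end of the first line.
                first_line = head[2:].split(b"\n", 1)[0]
                return "script for " + first_line.strip().decode("ascii", "replace")
            return name
    return "unknown"


def sniff_type(path, size=128):
    """Read the first `size` bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_header(f.read(size))
```

For example, sniff_type("dnstest.py") reports the interpreter named on the script's first line.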

We can use the readelf command to see the magic number of an ELF binary:

[ec2-user@ip-172-31-16-100 webchecker]$ readelf -h /usr/bin/python
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x400620
  Start of program headers:          64 (bytes into file)
  Start of section headers:          7048 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           56 (bytes)
  Number of program headers:         8
  Size of section headers:           64 (bytes)
  Number of section headers:         31
  Section header string table index: 30
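The 16 bytes on the Magic line are the ELF identification (e_ident) array: 7f 45 4c 46 is "\x7fELF", byte 4 gives the class (2 = 64-bit), byte 5 the data encoding (1 = little endian), byte 6 the ELF version and byte 7 the OS/ABI. As a sketch, the hypothetical helper below decodes those bytes the way the first lines of readelf -h do; its lookup tables are deliberately partial.

```python
# Hypothetical decoder for the 16-byte ELF identification (e_ident) array.
# Byte offsets follow the ELF specification; the lookup tables are partial.
ELF_MAGIC = b"\x7fELF"
EI_CLASS, EI_DATA, EI_VERSION, EI_OSABI, EI_ABIVERSION = 4, 5, 6, 7, 8

CLASSES = {1: "ELF32", 2: "ELF64"}
ENCODINGS = {1: "2's complement, little endian", 2: "2's complement, big endian"}
OS_ABIS = {0: "UNIX - System V", 3: "GNU/Linux"}


def decode_e_ident(ident):
    """Decode class, data encoding, version and OS/ABI from e_ident bytes."""
    if len(ident) < 16 or ident[:4] != ELF_MAGIC:
        raise ValueError("not an ELF identification array")

    def lookup(table, value):
        return table.get(value, "unknown (%d)" % value)

    return {
        "class": lookup(CLASSES, ident[EI_CLASS]),
        "data": lookup(ENCODINGS, ident[EI_DATA]),
        "version": ident[EI_VERSION],
        "os_abi": lookup(OS_ABIS, ident[EI_OSABI]),
        "abi_version": ident[EI_ABIVERSION],
    }


def read_e_ident(path):
    """Return the first 16 bytes of a file, where e_ident lives."""
    with open(path, "rb") as f:
        return f.read(16)
```

Decoding the Magic bytes above gives ELF64, little endian, version 1 and UNIX - System V, matching the readelf output.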

Thursday 24 April 2014

Git Commit Golden Rules

Committing is an essential activity when using Git. Here are some important rules for commits.

Commit only one thing per commit

Don’t mix unrelated changes in a single commit. Each commit should contain just one feature or one bug fix.

Write a good commit message

A commit message is like a comment in a program. Make sure every change is committed with a suitable message for future reference.

Commit completed work

Never commit something that is half-done. If you need to save your current work temporarily, as you would in a clipboard, you can use Git's "stash" feature (git stash).

Commit only tested work

Related to the point above, you shouldn't commit code just because you think it works. Test it well before you commit it to the repository.

Friday 18 April 2014

Amazon AWS introduction – EBS

EBS (Elastic Block Store) is like a virtual hard drive that you can use with your Amazon EC2 instances.

The advantages of using EBS are:

Your data is kept separate from your compute instances
A volume can be attached to an instance only when needed
When an EC2 instance fails, the data on its EBS volumes is not lost

At the time of writing, an EBS volume can range from 1 GB to 1 TB. A volume is created in one Amazon AZ (Availability Zone) and replicated within that AZ to prevent data loss. An EBS volume can be attached to only one EC2 instance at a time, and it behaves much like a hard drive or any other block storage device. In practice, you can attach multiple EBS volumes to an EC2 instance and stripe I/O across them to increase read/write performance, especially for data-intensive applications such as databases.
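Striping is usually done with Linux software RAID 0 (for example, mdadm) across the attached volumes. The sketch below only models the address arithmetic: a logical offset is cut into fixed-size stripe units that are dealt round-robin to the volumes, which is why sequential I/O ends up spread over all of them. The function name `locate_stripe` and the 64 KiB stripe size are illustrative assumptions, not EBS settings.

```python
# Hypothetical model of a RAID 0 layout over several equally sized volumes.
# stripe_size (64 KiB) is an assumed value, not an EBS or mdadm requirement.
def locate_stripe(offset, n_volumes, stripe_size=64 * 1024):
    """Return (volume index, byte offset on that volume) for a logical offset."""
    if offset < 0 or n_volumes < 1 or stripe_size < 1:
        raise ValueError("invalid arguments")
    unit, within = divmod(offset, stripe_size)  # which stripe unit, and where inside it
    volume = unit % n_volumes                   # units are dealt round-robin
    row = unit // n_volumes                     # earlier units already on this volume
    return volume, row * stripe_size + within
```

With 4 volumes, consecutive 64 KiB chunks land on volumes 0, 1, 2, 3 and then wrap back to volume 0, so a large sequential read keeps all four busy.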

How to use EBS

In the AWS console, go to EC2 Dashboard – ELASTIC BLOCK STORE – Volumes.

Click “Create Volume” and choose the size, the AZ and the volume type. The volume must be in the same AZ as the instance you want to attach it to.

Then attach the EBS volume to the EC2 instance by right-clicking the volume and selecting “Attach Volume”.

Then you can partition it with fdisk, manage it with LVM tools and create a file system on it, just as you would with a normal local disk or SAN disk.

Thursday 17 April 2014

Advanced Linux Keyboard Skills

Sysadmins use a terminal or virtual terminal to control Linux systems, so it is very useful for a system admin to master some terminal skills. The shortcuts below are the defaults in bash (GNU Readline in its default emacs mode).

Ctrl-a	Move the cursor to the beginning of the line
Ctrl-e	Move the cursor to the end of the line
Ctrl-f	Move the cursor forward one character
Ctrl-b	Move the cursor back one character
Alt-f	Move the cursor forward one word
Alt-b	Move the cursor back one word
Ctrl-l	Clear the screen and keep the current command line
Alt-l	Change letters to lower case from the cursor to the end of the word
Alt-u	Change letters to upper case from the cursor to the end of the word
Ctrl-k	Cut from the cursor to the end of the line
Ctrl-u	Cut from the cursor to the beginning of the line
Alt-d	Cut from the cursor to the end of the word
Alt-Backspace	Cut from the cursor to the beginning of the word
Ctrl-y	Paste (yank) the most recently cut text
Alt-?	Same as [Tab][Tab]: list all possible completions
history	List the command history
Ctrl-r	Search backward through the command history
!!	Execute the last command
!n	Execute command number n from the history

Practice is the most efficient way to master these shortcuts.

Compiling and linking programs in Linux

Most system software is distributed with its source code, and we can compile it into executable binaries. This post introduces what happens inside the compilation process.

Compilation tools

On the Linux platform, gcc (the GNU Compiler Collection) is the most widely used compiler for C and C++, and it can even compile Java (through gcj).

Use

# which gcc

to check whether gcc has been installed.

Single file compilation

Here is a single C program, hello.c:

#include <stdio.h>

int main(void)
{
  printf("hello world!\n");
  return 0;
}

Now we can compile the program and run it:

[root@X001 cprogram]# ls
hello.c
[root@X001 cprogram]# gcc hello.c
[root@X001 cprogram]# ls
a.out  hello.c
[root@X001 cprogram]# ./a.out
hello world!

Or you may use 'gcc -o outputfile sourcefile' to give the output executable a name of your choice.

Multiple file compilation

Most applications have more than one source file, and source file A refers to functions defined in other files. When one of the files changes, do we need to recompile all of the related files?
The answer is no.
Here are two files:

File: a.c
#include <stdio.h>

void method(void);  /* declared here, defined in b.c */

int main(void)
{
    printf("this is from first file\n");
    method();
    return 0;
}

File: b.c
#include <stdio.h>

void method(void)
{
    printf("this is from second file\n");
}

Now we will:
1. compile each source file into an object file
2. link the object files into a binary

[root@X001 cprogram]# ll
total 8
-rw-r--r--. 1 root root 95 Apr 14 14:11 a.c
-rw-r--r--. 1 root root 87 Apr 14 14:05 b.c
======compile the files here ===========
[root@X001 cprogram]# gcc -c a.c b.c
[root@X001 cprogram]# ll
total 16
-rw-r--r--. 1 root root   95 Apr 14 14:11 a.c
-rw-r--r--. 1 root root 1568 Apr 14 14:11 a.o
-rw-r--r--. 1 root root   87 Apr 14 14:05 b.c
-rw-r--r--. 1 root root 1504 Apr 14 14:11 b.o
=======link the files here ==============
[root@X001 cprogram]# gcc -o result a.o b.o

[root@X001 cprogram]# ./result
this is from first file
this is from second file

If we change the print statement in b.c to "this is the updated file", we only need to recompile b.c into b.o (gcc -c b.c) and then relink the executable (gcc -o result a.o b.o).
The unchanged a.o is reused as it is.
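This "rebuild only what changed" rule is what make automates using file timestamps. Here is a minimal Python sketch of the decision, assuming modification times are passed in as plain numbers. The helpers `stale_sources` and `build_commands` are hypothetical, and a real Makefile would also track header dependencies.

```python
# Hypothetical sketch of make's timestamp rule for the a.c/b.c example:
# recompile a source only if its object file is missing or older than the source,
# then relink whenever any object was rebuilt.
def object_name(source):
    """Map 'b.c' to 'b.o'."""
    stem = source[:-2] if source.endswith(".c") else source
    return stem + ".o"


def stale_sources(source_mtimes, object_mtimes):
    """Return the sources (sorted) whose object file needs rebuilding."""
    stale = []
    for src, src_time in source_mtimes.items():
        obj_time = object_mtimes.get(object_name(src))
        if obj_time is None or obj_time < src_time:
            stale.append(src)
    return sorted(stale)


def build_commands(stale, sources, output="result"):
    """Return the gcc commands that bring the executable up to date."""
    commands = [["gcc", "-c", src] for src in stale]
    if commands:
        objects = [object_name(src) for src in sorted(sources)]
        commands.append(["gcc", "-o", output] + objects)
    return commands
```

With a.o newer than a.c but b.o older than the edited b.c, only b.c is listed as stale, and the commands are gcc -c b.c followed by gcc -o result a.o b.o.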

Linking a library

Almost no commercial software is developed from scratch. We reuse common software components called libraries. For example, you may use the math library to compute the sine of PI.
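In C, you would include <math.h>, call sin() and link the math library explicitly, for example gcc -o sine sine.c -lm. To keep the new examples in one language, here is a Python sketch of the same idea using the standard ctypes module, which loads the shared C math library at run time and calls its sin() directly. The library lookup is an assumption about a typical Linux system, where the library is libm.so.6.

```python
import ctypes
import ctypes.util
import math

# Locate the shared C math library (libm.so.6 on typical glibc systems).
# If the lookup returns None, CDLL(None) exposes symbols already loaded into
# the Python process; on Linux this normally includes sin().
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C prototype double sin(double) so ctypes converts values correctly.
libm.sin.argtypes = [ctypes.c_double]
libm.sin.restype = ctypes.c_double


def c_sin(x):
    """Call sin() from the C math library."""
    return libm.sin(x)


if __name__ == "__main__":
    # Floating-point PI is not exact, so sin(PI) is a tiny number, not 0.
    print(c_sin(math.pi))
```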

Sunday 13 April 2014

Linux file types

Linux supports multiple file types, such as normal files, directories and socket files. We can use the command ‘ll’ or ‘ls -al’ to list files and identify each file’s type by the first character of the output.

In general, a Linux system has the following file types:

-     normal file
d     directory file
l     link (symlink) file
s     socket file
p     pipe (FIFO) file
b     block device file
c     character device file

Normal file (marked as ‘-’)

Most files in a Linux system are normal files, including text files, library files, zip files and executable binaries.

Directory (marked as ‘d’)

A directory is a special kind of file that can contain other files, including other directories.

Block device file (marked as ‘b’)

Block device files are in the /dev/ directory; they are a kind of pseudo file. Their data can be accessed randomly, as with a disk or a USB stick.

Character device file (marked as ‘c’)

Character device files are also in the /dev/ directory; their data has to be accessed sequentially. Examples are the mouse, keyboard, serial ports and the console tty.

Both character and block files are called device files.

Socket file (marked as ‘s’)

A socket file is used for inter-process communication. Note that it is not a network socket; it is a Unix domain socket.

Link file (marked as ‘l’)

The file is a symbolic link (symlink), created by the ‘ln -s’ command.
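The same classification can be done programmatically. The sketch below uses Python's os.lstat() and the stat module's S_IS* tests to return the character that ls -l prints for each type; lstat() is used so that a symlink is reported as a link rather than as its target. The helper name `file_type` is hypothetical.

```python
import os
import stat

# Each test from the stat module, paired with the character ls -l prints.
TYPE_CHARS = [
    (stat.S_ISREG, "-", "normal file"),
    (stat.S_ISDIR, "d", "directory"),
    (stat.S_ISLNK, "l", "symbolic link"),
    (stat.S_ISSOCK, "s", "socket"),
    (stat.S_ISFIFO, "p", "named pipe (FIFO)"),
    (stat.S_ISBLK, "b", "block device"),
    (stat.S_ISCHR, "c", "character device"),
]


def file_type(path):
    """Return (ls type character, description) for path."""
    # lstat() does not follow symlinks, so a link is reported as 'l' like ls does.
    mode = os.lstat(path).st_mode
    for test, char, name in TYPE_CHARS:
        if test(mode):
            return char, name
    return "?", "unknown"
```

If you also need the permission bits, stat.filemode(mode) returns the whole ls-style string, such as '-rw-r--r--'.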

Tuesday 8 April 2014

Linux filesystem Introduction

In general, a Linux file system (such as ext2/ext3) has 3 parts:

Superblock: records the metadata of the file system, such as the total, used and free numbers of inodes and blocks, plus other file system information. The superblock information can be viewed with tune2fs -l.
Inode: records the file attributes. Every file has an inode.
Block: stores the file content.

Every file uses one inode and some blocks. Data is written to the file's first block, and when that block is full, the next block is used. A block can't be shared between files.

An inode contains the following information (most of it can be seen with the stat command):

1. permission

2. ownership

3. file size

4. ctime, atime, mtime

5. ACL

6. pointers to the file's data blocks
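A rough Python equivalent of the stat command is shown below as a sketch: os.lstat() returns most of these inode fields. ACLs are not part of this structure (getfacl shows them), and stat() does not expose the block pointers. The helper name `inode_info` is hypothetical.

```python
import os
import stat
import time


def inode_info(path):
    """Collect the inode attributes listed above for one file (symlinks not followed)."""
    st = os.lstat(path)
    return {
        "inode": st.st_ino,
        "permission": stat.filemode(st.st_mode),  # e.g. '-rw-r--r--'
        "owner_uid": st.st_uid,
        "group_gid": st.st_gid,
        "size": st.st_size,                       # in bytes
        "blocks_512": st.st_blocks,               # allocated space in 512-byte units
        "links": st.st_nlink,
        "atime": time.ctime(st.st_atime),         # last access
        "mtime": time.ctime(st.st_mtime),         # last content modification
        "ctime": time.ctime(st.st_ctime),         # last inode change (not creation)
    }
```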

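Inode 3-level index: besides 12 direct block pointers, an ext2/ext3 inode holds one single-indirect, one double-indirect and one triple-indirect pointer. Each indirect block is filled with further block pointers, so a small, fixed-size inode can map the blocks of a very large file.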
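To see how the levels fit together, here is a Python sketch that maps a logical block number of a file to the pointer level and indices used to reach it. It assumes the classic ext2/ext3 layout (12 direct pointers, 4-byte block pointers); the block size is a parameter, and the function names are hypothetical.

```python
# Sketch of an ext2/ext3-style inode block map: 12 direct pointers, then one
# single-, one double- and one triple-indirect pointer (4-byte pointers assumed).
N_DIRECT = 12
PTR_SIZE = 4


def pointers_per_block(block_size):
    return block_size // PTR_SIZE


def addressable_blocks(block_size=4096):
    """Number of data blocks reachable through the inode's pointers."""
    p = pointers_per_block(block_size)
    return N_DIRECT + p + p ** 2 + p ** 3


def locate_block(n, block_size=4096):
    """Return (level, indices) used to reach logical block n of a file."""
    p = pointers_per_block(block_size)
    if n < 0:
        raise ValueError("negative block number")
    if n < N_DIRECT:
        return "direct", (n,)
    n -= N_DIRECT
    if n < p:
        return "single", (n,)                 # index inside the indirect block
    n -= p
    if n < p ** 2:
        return "double", divmod(n, p)         # (first-level index, second-level index)
    n -= p ** 2
    if n < p ** 3:
        i, rest = divmod(n, p ** 2)
        j, k = divmod(rest, p)
        return "triple", (i, j, k)
    raise ValueError("block number beyond the triple-indirect range")
```

With 4 KiB blocks, each indirect block holds 1024 pointers, so the pointers can address 12 + 1024 + 1024² + 1024³ blocks, roughly 4 TiB; the real maximum file size on ext2/ext3 is lower because of other on-disk limits.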

Thursday 3 April 2014

How to set the initial password for MySQL

After installing MySQL on your machine, you may see the error below when you first log in:

ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

This means the server expects a password for the root user, but you never set one yourself (for example, some packages generate a random initial root password).

To resolve the problem:

1. Stop the MySQL server

service mysql stop

2. Start MySQL with mysqld_safe, skipping the grant tables (and networking, so that no remote client can connect while authentication is disabled)

/usr/bin/mysqld_safe --user=mysql --skip-grant-tables --skip-networking &

3. Log in to MySQL without a password

mysql -u root

4. Update the root user's password

UPDATE mysql.user SET Password=PASSWORD('111111') WHERE User='root';

5. Stop the mysqld_safe instance, start MySQL as normal (service mysql start) and enjoy your MySQL trip

Django Web Design Step by Step – install and setup environment

The demo source code can be downloaded from GitHub:
git@github.com:aaaaatoz/djangoweb.git

Download and Install Django

Django is one of the most popular Python-based web frameworks (https://www.djangoproject.com/). In this blog series, we are going to set up a website using Django step by step.

Django can be downloaded from the official website. The current version is 1.6, but in this blog we will use 1.5.5 for demonstration.

After downloading Django, extract the archive and change into the extracted Django-1.5.5 directory.

Then use the traditional Python command to install the Django framework:

#python setup.py install

After a few minutes, the Django framework is installed. To test whether it is ready, type:

[root@hadoop1 Django-1.5.5]# which django-admin.py

/usr/bin/django-admin.py

Set up a Project

Set up a Project and an Application for your website

In Django, a project is a collection of configuration and applications for a particular website, and an application is a web application that does something, such as a blog. Navigate to a suitable location and create a project:

#django-admin.py startproject djangoweb

You will find that a directory called djangoweb has been created. Inside it, there is a subdirectory with the same name (djangoweb) and a manage.py file.

Set up an Application

Now, make sure you are in the djangoweb project directory (not the subdirectory) and create an application called mysite with the command below:

#django-admin.py startapp mysite

You will see that a directory called mysite has been created inside your djangoweb project.

Run Django Web Framework

We will explain the files and their functions in the next few posts, but for now let’s enjoy Django by running:

#python manage.py runserver 0.0.0.0:8080

There should be no errors, and the development web server is ready to serve requests.

Point a web browser at port 8080 of the server and you will see the Django welcome page.

Wednesday 2 April 2014

Apache configuration – virtual host

Virtual hosting is widely used in web servers; probably most public-facing web servers rely on it. It lets a single Apache HTTP Server instance (or any other web server) host multiple websites.

There are two types of virtual host:

IP-based virtual host: uses different IP addresses to provide different content to users
Name-based virtual host: uses different host names to provide different content to users

Name-based virtual hosting is usually the more important one, because IP addresses are limited and IP-based hosting involves a lot of infrastructure configuration.

Name-based virtual host

Name-based virtual hosting relies on the Host header in the HTTP request.

Here is part of my HTTP request to the website:

Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-GB,en-US;q=0.8,en;q=0.6
Cache-Control:max-age=0
Connection:keep-alive
Host:fujtest:90
If-Modified-Since:Fri, 23 Sep 2011 00:52:33 GMT

The web server then serves the request according to the Host header.
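The selection step can be modelled in a few lines. This is a simplified Python sketch, not Apache's code: among the virtual hosts bound to the IP:port the request arrived on, pick the one whose ServerName matches the Host header (ignoring the port and letter case), and otherwise fall back to the first one listed, which Apache treats as the default for that address. The hypothetical table mirrors the fujtest1/fujtest2 sample configuration in this post; ServerAlias and wildcards are not modelled.

```python
# Hypothetical table mirroring the fujtest1/fujtest2 sample configuration.
VHOSTS = [
    {"addr": "192.168.179.150:90", "server_name": "fujtest1",
     "document_root": "/usr/local/apache2/htdocs/fujtest1"},
    {"addr": "192.168.179.150:90", "server_name": "fujtest2",
     "document_root": "/usr/local/apache2/htdocs/fujtest2"},
]


def host_name(host_header):
    """'fujtest:90' -> 'fujtest'. IPv6 literals like '[::1]:90' are not handled."""
    name = host_header.strip().lower()  # host names are case-insensitive
    if ":" in name:
        name = name.rsplit(":", 1)[0]
    return name


def select_vhost(addr, host_header, vhosts=VHOSTS):
    """Pick the vhost for a request that arrived on addr with this Host header."""
    candidates = [v for v in vhosts if v["addr"] == addr]
    if not candidates:
        return None
    name = host_name(host_header or "")
    for v in candidates:
        if v["server_name"].lower() == name:
            return v
    return candidates[0]  # first vhost listed is the default for this address
```

Under this model, the Host value fujtest:90 from the request above matches neither ServerName, so it would be served by fujtest1, the first (default) virtual host.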

For a large website, it is strongly recommended to put the virtual host settings in a separate configuration file rather than in the main httpd.conf. This keeps the main httpd.conf concise and readable.

Here is a very basic name-based virtual host configuration.

In the main httpd.conf:

# include the virtual host configuration
Include conf/extra/httpd-vhosts.conf

In conf/extra/httpd-vhosts.conf, we first need to enable name-based virtual hosting (this is required in Apache 2.2; in Apache 2.4 the NameVirtualHost directive is deprecated and has no effect):

NameVirtualHost [FQDN or IP]:[PORT]
Then define the virtual host sections:
<VirtualHost ip:port>
       ServerName
       DocumentRoot
       etc
</VirtualHost>

Here is a simple sample configuration.

NameVirtualHost 192.168.179.150:90
<VirtualHost 192.168.179.150:90>
     DocumentRoot "/usr/local/apache2/htdocs/fujtest1"
     ServerName fujtest1
     ErrorLog "logs/fujtest1_error.log"
     CustomLog "logs/fujtest1_access.log" common
</VirtualHost>

<VirtualHost 192.168.179.150:90>
    DocumentRoot "/usr/local/apache2/htdocs/fujtest2"
    ServerName fujtest2
    ErrorLog "logs/fujtest2_error.log"
    CustomLog "logs/fujtest2_access.log" common
</VirtualHost>

If a virtual host's DocumentRoot is outside your main document root, you may need to grant access to that directory in a <Directory> section.

Tuesday 1 April 2014

Apache HTTP server configuration – Core Part

An Apache HTTP Server configuration usually has 3 parts: the core part, the container part and the extension part. In this post, we briefly cover the core part.

The core part configures Apache's main modules and applies to the whole server. Some common configuration items are:

ServerName: specifies the default global server name of the HTTP server. You may use an FQDN or an IP address.

Sample config: ServerName www.example.com:80 (my host is www.example.com)

Listen: specifies the port the HTTP server listens on. Usually it is 80, but it can be any other port.

Sample config: Listen 80 (the HTTP server listens on port 80)

ServerRoot: defines the server's root directory. For a source installation, it is usually the location given with --prefix=[location] when the sources are configured. For a binary (package) installation, it is usually /etc/apache2 on Debian/Ubuntu or /etc/httpd on Red Hat/CentOS.

Sample: ServerRoot "/usr/local/apache2" (my server root is /usr/local/apache2)

DocumentRoot: defines the default location of the HTML files. It can be overridden in virtual host settings.

Sample: DocumentRoot "/usr/local/apache2/htdocs" (my document root is /usr/local/apache2/htdocs)

ServerAdmin: specifies the web administrator's email address, so that users can report problems by email.

Sample: ServerAdmin rafa.xu.au@gmail.com

ScriptAlias and Alias: map a URL path to a directory on the machine. ScriptAlias makes the server treat the files in that directory as CGI scripts, while Alias maps it as a normal directory.

Sample:

ScriptAlias /cgi-bin/ "/usr/local/apache2/cgi-bin/"

Alias /alias/ "/var/tmp/alias/"

User and Group: define the user and group the child processes run as. For security reasons, they should be a dedicated, non-login account used only by httpd.

User apache

Group apache

LoadModule: loads a module dynamically.

Sample: LoadModule mime_module modules/mod_mime.so (loads mime_module into Apache)

ErrorDocument: maps an error code to a friendly page or script.

Sample: ErrorDocument 404 /404.html (shows a custom 404 page to the user)

WHY INTERNET

I have been working in the traditional IT environment for about 8 years, and now (from 2014) I have decided to move to the Internet area, as it is more exciting and challenging.

The areas I am interested in and working on now:

High-volume web infrastructure. This area has a very wide scope. It includes web deployment architecture, web frameworks, and support procedures for extremely high availability (HA).

Web mining. This is related to artificial intelligence and data mining. It is the technology used to find useful information in semi-structured web pages.

Other areas I am interested in include cloud platforms (OpenStack), NoSQL, Hadoop and streaming computing. These leading technologies are widely used in Internet companies, but I only have a basic idea of how they work.

I can be contacted at rafa.xu.au@gmail.com

My study areas for the next 6 months:

OpenStack

Linux DevOps

Python/C

Puppet (certified)
