Rafa XU's technical blog: May 2014

Friday, 30 May 2014

MySQL data type

In MySQL, Each data type has several characteristics:

What kind of values it can represent
How much space values take up
Whether values are fixed-length (all values of the type take the same amount of space) or variable-length (the amount of space depends on the particular value being stored)
How MySQL compares and sorts values of the type
Whether the type can be indexed

Character String Types

CHAR(n)
VARCHAR(n)
BINARY(n)
VARBINARY(n)
TEXT(n)
BLOB(n)

we need to specify the lenth of the characters. char has a fixed length, When the stored string is less than the number you specify, it will be filled with 'null', when we use varchar,the database will specify the length of the value, additional cost will be incurred.

blob -- case sensitive

text/char -- case insensitive

numeric type:

Exact numeric type
integer:

TINYINT
SMALLINT
MEDIUMINT
INT
BIGINT

Integer has some attributes.

UNSIGNED: the value is just for zero or positive value.
NOT NULL: the field does not allow NULL value
AUTO_INCREMENT: the field's value will be increase for each line. usually leave it to system and don't need to change it manually. DBA can use LAST_INSERT_ID() to get the current value

the exact numeric has another type:
DECIMAL

Approximate numeric type:

FLOAT
DOUBLE

The FLOAT data type is a single-precision floating-point number.A FLOAT is limited in how many significant digits it can store

The DOUBLE data type is a double-precision floating-point number.Like a FLOAT, a DOUBLE is limited in how many significant digits it can store

Datetime:

DATE: only store date
TIME: only store time
DATETIME: store the time eclipsed from 1970-01-01
TIMESTAMP: store the date and time

vThe TIMESTAMP field is stored in UTC. When the value is stored, it is converted to UTC, using the mysqld system variable time_zone to figure out the proper value. When a TIMESTAMP value is retrieved, it is converted to the current time_zone of the mysqld server

Bool:

True or False

Internal：

ENUM， SET

MySQL adds in two data types that are very different from standard data types

The ENUM data type is an enumerated list of 1 to 65,535 strings, which indicate the allowed values for the field. Only one of the allowed values can be stored in an ENUM field
The SET data type is an enumerated list of 1 to 64 strings, which indicate the allowed values for the field.In the SET data type, any combination of the strings in the enumerated list can be stored as a comma-delimited list

Tuesday, 27 May 2014

MySQL client tool - mysql

MySQL client tool mysql is a very powerful tool for DBA

1. how to connect to the database

mysql -- the CLI command for DBA
--user, -u
--host, -h
--password, -p
--port
--protocol
--database DATABASE, -D
-A disable completion of table and column names

mysql < SQLfile.sql - execute the SQL batch

2. common commands:

here is the official help document about the mysql client command. these commands will take effect on the client only.

clear (\c) Clear the current input statement.
connect (\r) Reconnect to the server. Optional arguments are db and host.
delimiter (\d) Set statement delimiter.
edit (\e) Edit command with $EDITOR.
ego (\G) Send command to mysql server, display result vertically.
exit (\q) Exit mysql. Same as quit.
go (\g) Send command to mysql server.
help (\h) Display this help.
nopager (\n) Disable pager, print to stdout.
notee (\t) Don't write into outfile.
pager (\P) Set PAGER [to_pager]. Print the query results via PAGER.
print (\p) Print current command.
prompt (\R) Change your mysql prompt.
quit (\q) Quit mysql.
rehash (\#) Rebuild completion hash.
source (\.) Execute an SQL script file. Takes a file name as an argument.
status (\s) Get status information from the server.
system (\!) Execute a system shell command.
tee (\T) Set outfile [to_outfile]. Append everything into given outfile.
use (\u) Use another database. Takes database name as argument.
charset (\C) Switch to another charset. Might be needed for processing binlog with multi-byte charsets.
warnings (\W) Show warnings after every statement.
nowarning (\w) Don't show warnings after every statement.

I think the most important commands are:
use : switch between database;
MySQL>use mysql
Database changed

status: show briefly about the server performance
MySQL>MySQL>status
--------------
mysql Ver 14.14 Distrib 5.6.15, for Linux (x86_64) using EditLine wrapper

Connection id: 3
Current database: mysql
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.6.15 MySQL Community Server (GPL)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/lib/mysql/mysql.sock
Uptime: 14 min 31 sec

Threads: 1 Questions: 60 Slow queries: 0 Opens: 87 Flush tables: 1 Open tables: 80 Queries per second avg: 0.068

source: execute the SQL in batch
system: execute the shell
MySQL>system cat /var/tmp/test.sql
select user,host from user;
MySQL>source /var/tmp/test.sql
+------+-----------+
| user | host |
+------+-----------+
| root | 127.0.0.1 |
| root | ::1 |
| root | localhost |
| root | x002 |
+------+-----------+
4 rows in set (0.00 sec)

3. how to understand the prompt:

mysql> --- standard prompt
-> --- waiting for statement delimiter
'> --- waiting for end of '
"> --- waiting for end of "
`> --- waiting for end of `

Bind configuration verification

DNS/Bind is the critical infrastructure over Internet. And it has the strict syntax format for it's configuration and zone files. the missing or typo can cause the service unable to start. When we are ready to restart or reload the configuration, we will need to test the configuration.

1. named-checkconf:
This tool is used to check the BIND main configuration (named.conf)
basic usage is:
named-checkconf [path of named.conf]
this is a bad example
[root@ns tmp]# named-checkconf /etc/named.conf
/etc/named.conf:27: missing ';' before end of file

this is a good example:
[root@ns tmp]# named-checkconf /etc/named.conf
[root@ns tmp]#

the tool will check the syntax of named.conf but it won't be able to check the logic mistakes such as missing zone files

2. named-checkzone
This tool is used to check the zone configuration file
basic usage is:
named-checkzone [zone-name] [zone file]
good example
[root@ns tmp]# named-checkzone "localhost" /var/named/named.localhost
zone localhost/IN: loaded serial 0
OK
[root@ns tmp]# named-checkzone "rafa.com" /var/named/rafa.com
zone rafa.com/IN: loaded serial 2014052401
OK
[root@ns tmp]# named-checkzone "56.168.192.in-addr.arpa" /var/named/192.168.56.zone
zone 56.168.192.in-addr.arpa/IN: loaded serial 2014052401
OK
[root@ns tmp]#

bad example:
[root@ns tmp]# named-checkzone "www.rafa.com" /var/named/rafa.com
/var/named/rafa.com:2: ignoring out-of-zone data (rafa.com)
zone www.rafa.com/IN: has 0 SOA records
zone www.rafa.com/IN: has no NS records
zone www.rafa.com/IN: not loaded due to errors.

3. service named configtest
this is the tool for all configuration verification.

[root@ns tmp]# service named configtest
zone localhost/IN: loaded serial 0
zone 0.0.127.in-addr.arpa/IN: loaded serial 0
zone rafa.com/IN: loaded serial 2014052401
zone 56.168.192.in-addr.arpa/IN: loaded serial 2014052401
umount: /var/named/chroot/var/named: device is busy.
(In some cases useful info about processes that use
the device is found by lsof(8) or fuser(1))

Monday, 26 May 2014

memcached client commands

Memcached is a very powerful caching tool and widely used in Internet-based company. We usually consider it as a key-value NoSQL. In this blog, I will introduce some of the common Memcached client commands.

1. connect to memcached

memcached even does not provide any client tool, we can use telnet client to connect to the Memcached database.

[root@X001 ~]# telnet localhost 11211
Trying ::1...
Connected to localhost.
Escape character is '^]'.

2. set

set command is used to set key-value pair to the memcached

<command name> <key> <flags> <exptime> <bytes>
<data block>

parameters：

<command >	set
<key>	key
<flags>	integer, used by client as additional field
<exptime>	keep alive time (seconds), use 0 for never expired
<bytes>	bytes
<data block>	value

for example:

set a 999 0 2
10
STORED

3. add and replace

Set will replace the previous value if the key exist, if you don't want the old value to be overwritten, you can use 'add' instead. if the key is there, it will give out a 'not_stored' warning message.

add a 999 0 2

11

NOT_STORED

If you just wants to update the value, you can use 'replace' to set the value, if the key is not in memcached, it will give out a 'not_stored' warning message.

replace b 99 0 2
44
STORED

4. get and gets

get will return the value of a key while gets will add a version. both of them support get multiple values at a time.

for example:

get b
VALUE b 88 2
tt
END
gets b
VALUE b 88 2 7     # version number
tt
END
gets a b                     #get a and b
VALUE a 888 2 4
12
VALUE b 88 2 7
tt
END

5. delete and flush_all

delete command will delete the value by key. flush_all will just flush the all key-value data.

delete a noreply
get a
END
delete b
DELETED

6. STATS

stats command give the admin the running status for the memcached process

stats
STAT pid 1729
STAT uptime 5865
STAT time 1401159261
STAT version 1.4.4
STAT pointer_size 64
STAT rusage_user 1.018845
STAT rusage_system 1.179820
STAT curr_connections 10
STAT total_connections 11
STAT connection_structures 11
STAT cmd_get 18
STAT cmd_set 14
STAT cmd_flush 2
STAT get_hits 13
STAT get_misses 5
STAT delete_misses 0
STAT delete_hits 2
STAT incr_misses 0
STAT incr_hits 0
STAT decr_misses 0
STAT decr_hits 0
STAT cas_misses 0
STAT cas_hits 0
STAT cas_badval 0
STAT auth_cmds 0
STAT auth_errors 0
STAT bytes_read 503
STAT bytes_written 1432
STAT limit_maxbytes 67108864
STAT accepting_conns 1
STAT listen_disabled_num 0
STAT threads 4
STAT conn_yields 0
STAT bytes 0
STAT curr_items 0
STAT total_items 9
STAT evictions 0

Wednesday, 21 May 2014

DNS Technology Introduction

DNS

DNS (Domain Name Service) is one of the most important Internet infrastructures. It provides a human readable name for the Internet users while translate it to IP address or other machine understandable marks.

In DNS world, we usually use the item domain name or hostname, such as www.google.com or mail.gmail.com. the standard name is FQDN: Full Qualified Domain Name and it should be ended with a '.' eg www.google.com.

In most case, DNS provides the translation between FQDN and IP but that is not all it does.

NSSWITCH

hosts: files dns
file: /etc/hosts
dns: DNS

in OS, it uses libnss_file.so or libnss_dns.so as it's C library.

Stub Resolver

stub resolver: is the resolver client program on the host.it will handle the DNS request for the application but it can only understand a full answer to the request. it can't deal with a referral.
if it doesn't find an answer to the query in its cache, it will ask the recursive resolver that is configured.
it will accept the first valid answer that comes through, ignoring any answers that might come later.

Domain type:

organization domains：.com, .org, .net, .cc
country domains: .cn, .tw, .hk, .iq, .ir, .jp
reversed domains: IP-->FQDN

Query type:

Recursive Query: the DNS client requires that the DNS server respond to the client with either the requested resource record or an error message stating that the record or domain name does not exist. The DNS server cannot just refer the DNS client to a different DNS server.

Iterative Query: is one in which a DNS client allows the DNS server to return the best answer it can give based on its cache or zone data. If the queried DNS server does not have an exact match for the queried name, the best possible information it can return is a referral (that is, a pointer to a DNS server authoritative for a lower level of the domain namespace). The DNS client can then query the DNS server for which it obtained a referral. It continues this process until it locates a DNS server that is authoritative for the queried name, or until an error or time-out condition is met.
DNS database is a global distributed database
The parent know where it's directed attached children
The Child don't know where is his parent node, usually it knows the ROOT.

DNS Server functions:

accept the local requests for
local clients query the non-authority DNS query(recursive)
outside clients query the authority DNS query(iterative)
whether positive or nagative, DNS should give the answer with a TTL
for the outside clients query the non-authority DNS query, usually it should be denied.

DNS server types:

master DNS server: the SOA of the domain and maintain the database.
slave DNS server: the SOA of the domain and but synchronize the database from master
cached DNS server: only cache the result. Non-SOA
forwarding server: only forward the query to some other DNS server.

Friday, 9 May 2014

regular expression cheet sheet

.	any character except newline
\w \d \s	word, digit, whitespace
\W \D \S	not word, digit, whitespace
[abc]	any of a, b, or c
[^abc]	not a, b, or c
[a-g]	charcter between a & g
^abc$	start / end of the string
\b	word boundary
\. \* \\	escaped special characters
\t \n \r	tab, linefeed, carriage return
\u00A9	unicode escaped ©
(abc)	capture group
\1	backreference to group #1
(?:abc)	non-capturing group
(?=abc)	positive lookahead
(?!abc)	negative lookahead
a* a+ a?	0 or more, 1 or more, 0 or 1
a{5} a{2,}	exactly five, two or more
a{1,3}	between one & three
a+? a{2,}?	match as few as possible
ab\|cd	match ab or cd

git common commands

git init --- initialise git repository on the local machine
git status --- view the current files changes (add,remove or untracked...)
git add   --- add untrack files to repository (wildcast supported)
git add   --- remove files to repository and disk as wll (wildcast supported)
git commit --- commit the change to the repository
git log -- show git commit history
git remote add origin [github or git server URL] --- link the local repository to the
git push origin master    --- push the local commits to remote repository
git pull origin master --- pull down the remote repository
git reset --- reset the stage
git branch [branchname] --- create a new branch
git checkout [branchname] --- switch to another branch
git merge [branchname] --- merge the branch with the current branch
git branch -d [branchname] --- delete the branch

Monday, 5 May 2014

Linux Filesystem Hierarchy Standard(FHS)

Linux FHS or Filesystem Hierarchy Standard is the standard linux file system outlet across all distributions.

/boot: boot related file such as
vmlinuz-2.6.32-279.el6.i686
initramfs-2.6.32-279.el6.i686.img
grub (directory, bootloader)

boot must be in a separated file system

/dev: the preudo file system to contain the device files
block device: random access, data is organised as block
char device: sequence access, data is organised as char.
each device is identified by a major and a minor.

/etc：contains the configuration files

/home：user home directory, usually it is /home/USERNAME

/root：super user's home directory

/lib：library files
static linked files .a
dynamic linked files .so (shared object)
/lib/modules：kernel related modules

/media：mount point used for external device

/mnt：mount point used for temporary usage.

/opt：optional directory. often used by 3rd party program
/proc：preudo file system, used as kernel parameters mapping files.
/sys：preudo file system, used as hardware mapping files.
/temporary directory, another option is /var/tmp
/var：variable files
/bin: user level executable binaries
/sbin：super user's executable binaries

/usr：shared, read-only
/usr/bin
/usr/sbin
/usr/lib

/usr/local：third party program
/usr/local/bin
/usr/local/sbin
/usr/local/lib

Sunday, 4 May 2014

man sections

'man' (manual) is the most useful tool for sysadmin. We can use man to understand the command, configuration file format, etc. but man has different sections. they are:

    1      User Commands
    2      System Calls
    3      C Library Functions
    4      Devices and Special Files
    5      File Formats and Conventions
    6      Games et. Al.
    7      Miscellanea
    8      System Administration tools and Deamons

we can use command whatis to check how many sections a command will be in
as

root@ip-172-31-7-84:~# whatis time

time (7)             - overview of time and timers

time (1)             - run programs and summarize system resource usage

time (2)             - get time in seconds

then you can use:

man 7 time
man 1 time
man 2 time

to check different means of time in linux

Thursday, 1 May 2014

Nginx Common Section Configuration Considerations

This blog is about the Nginx common section configuration.

Main module

error_log

timer_resolution

by default, everytime when return from kevent(), epoll, /dev/poll, select() or poll() , nginx will call gettimeofday(), it may not needed, we can specify the internal by
timer_resolution interval
for example
timer_resolution 100ms;

worker_cpu_affinity

this configuration can use sched_setaffinity() to combine the thread to dedicated CPU.

worker_cpu_affinity cpumask ...
for example
worker_processes 4;
worker_cpu_affinity 0001 0010 0100 1000;

worker_priority

This defination setup the nice value for the worker processes.
worker_priority number

worker_processes

this configuration defines the working processes numbers, usually it should match the CPU numbers and also the affinity

worker_rlimit_nofile

this configuration defines the maxim file descriptions each working procedure can open.
worker_rlimit_nofile number

Events module

worker_connections

it defines the max connection each worker process can handle
max clients = worker_processes * worker_connections
under the reverse proxy situation:
max clients = worker_processes * worker_connections/4

use

WHY INTERNET

I have being working in the traditional IT environment for about 8 years and now (from 2004) I decided to turn to the Internet Area as it is more exciting and challengeable.

Now the area I am interested in and working on:

High volume traffic Web infrastructure. This are has a very wide scope. It includes the web deployment Architecture, The web framework, support procedure for extremely high HA.

Web Mining. This is more artificial intelligence and data mining related. It is the technology used to find the useful information from semi-structed web pages.

Other areas I am interested may be include Cloud Platform (OpenStack), NoSQL, Hadoop, Steaming Computing. These are the leading technologies widely used in Internet Company. But I just have the basic idea how these are working.

I can be contacted by rafa.xu.au@gmail.com

My Next 6 months study Areas:

OpenStack

Linux DevOps

Python/C

puppet(certified)

About Me