Friday, 28 February 2014

File operations in Python



1. File Open

Here is the basic function for opening a file:
f = open('file', 'mode')
f is called the file handle (a file object); we use it to work with the file from then on.
Common modes in Python:
  • a: append mode – write the content at the end of the file
  • w: write mode – write the content into the file; it erases the original content of the file
  • r: read mode – read the content from the file. This is the default mode.
  • r+: read and write mode (the file must already exist)
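The modes above can be tried out with a short sketch (assuming Python 3; the temporary file name demo.txt is just for illustration). It uses `with`, which closes the file automatically:

```python
import os
import tempfile

# Work in a throwaway directory so the example does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as f:      # "w" creates the file or erases its old content
    f.write("first\n")
with open(path, "a") as f:      # "a" appends at the end of the file
    f.write("second\n")
with open(path) as f:           # no mode given: "r" (read) is the default
    content = f.read()

with open(path, "r+") as f:     # "r+" reads and writes an existing file
    f.write("FIRST")            # overwrites from position 0, without truncating
    f.seek(0)                   # move the pointer back to the start
    updated = f.read()
```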


2. File Read

There are a few methods for reading content from a file:
f.readline() – reads one line from the file (including its trailing newline) and returns it as a string; at the end of the file it returns an empty string.
f.readlines() – reads all the remaining lines and returns them as a list of strings.
f.read([size]) – reads up to size characters. Without a parameter, it reads all the remaining content. Every read moves the file pointer forward; f.tell() returns the current pointer position.
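A small sketch (assuming Python 3; the file lines.txt is made up) showing how each read call moves the file pointer, so the next call continues where the previous one stopped:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w") as f:
    f.write("line one\nline two\nline three\n")

with open(path) as f:
    first = f.readline()      # "line one\n" (the newline is kept)
    pos = f.tell()            # the pointer has moved past the first line
    chunk = f.read(4)         # the next 4 characters: "line"
    rest = f.readlines()      # the remaining lines as a list
```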

3. File Write

To write content to a file, we need to open it in write or append mode:
f.write("string") – writes a string into the file
f.writelines(list_of_strings) – writes a list of strings into the file; it does not add newline characters, so each string should end with "\n" if you want separate lines
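Because writelines() adds no line breaks of its own, a common pattern is to append "\n" to each item. A minimal sketch, assuming Python 3 (the file out.txt is hypothetical):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")
lines = ["alpha", "beta", "gamma"]

with open(path, "w") as f:
    f.write("header\n")                          # write() takes one string
    f.writelines(line + "\n" for line in lines)  # writelines() adds no newlines itself

with open(path) as f:
    written = f.read()
```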

4. File Delete

We need to use the os module to delete a file from the operating system. To avoid an error when deleting a file that does not exist, we check whether the file is there first.
os.remove('file') or os.unlink('file')    # delete the file
Sample code:
import os
if os.path.exists('/var/tmp/file'):
    os.remove('/var/tmp/file')
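Checking with os.path.exists() first leaves a small window in which another process could delete the file. An alternative sketch (assuming Python 3, where a missing file raises FileNotFoundError; remove_if_exists is a hypothetical helper name) just tries the delete and handles the error:

```python
import os
import tempfile

def remove_if_exists(path):
    """Delete path and return True; return False if it was already gone."""
    try:
        os.remove(path)
        return True
    except FileNotFoundError:
        return False

path = os.path.join(tempfile.mkdtemp(), "file")
open(path, "w").close()          # create an empty file to delete
```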

5. File Copy/Move

We use the shutil module to copy or move files. To avoid an error when the source file does not exist, we check the source file first.
Sample code:
import os
import shutil
if os.path.exists('/var/tmp/file'):
    shutil.copyfile('/var/tmp/file', '/var/tmp/file1')    # copy file
if os.path.exists('/var/tmp/file'):
    shutil.move('/var/tmp/file', '/var/tmp/file1')        # move file
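A self-contained sketch of the same operations in a temporary directory (assuming Python 3; the file names are made up). shutil.copy2() is used here instead of copyfile() because it also keeps permissions and timestamps:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "file")
with open(src, "w") as f:
    f.write("data\n")

copy = os.path.join(workdir, "file1")
shutil.copy2(src, copy)        # like copyfile(), plus permissions and timestamps

moved = os.path.join(workdir, "file2")
shutil.move(src, moved)        # a rename, or copy + delete across file systems
```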

6. Directory Operations

os.mkdir('dirpath', 0o777)      # create a directory (the mode is optional)
os.makedirs('dirpath', 0o777)   # create a directory, including any missing parent directories
os.rmdir('directory')           # remove an empty directory
os.removedirs('dirpath')        # remove the leaf directory, then each parent that becomes empty (use shutil.rmtree() to delete a whole tree with its files)
os.listdir('path')              # list the names of the files and directories in the directory (the name is a little bit confusing)
os.walk() or os.path.walk()     # traverse a directory tree (os.path.walk() exists only in Python 2)
os.walk() is a generator that yields one (dirpath, dirnames, filenames) tuple for each directory in the tree.
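A small sketch of these calls working together (assuming Python 3; the directory names a, b and the file x.txt are made up):

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()                           # scratch directory for the demo
os.makedirs(os.path.join(root, "a", "b"))           # creates "a" and "a/b" in one call
open(os.path.join(root, "a", "b", "x.txt"), "w").close()

entries = os.listdir(os.path.join(root, "a"))       # names only, not full paths

found = []
for dirpath, dirnames, filenames in os.walk(root):  # one tuple per directory
    for name in filenames:
        found.append(os.path.relpath(os.path.join(dirpath, name), root))

shutil.rmtree(root)                                 # delete the whole tree, files included
```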

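7. stdin, stdout, stderr

We can change stdin, stdout and stderr in Python by assigning other file objects to sys.stdin, sys.stdout or sys.stderr.
For example, here we change stdout (keeping the original so we can restore it):
>>> import sys
>>> saved_stdout = sys.stdout                   # keep the original stdout
>>> sys.stdout = open(r"./hello.txt", "a")      # change the stdout value
>>> print("good bye")                           # 'good bye' is not printed onto the screen
>>> sys.stdout.close()                          # it is in the hello.txt file
>>> sys.stdout = saved_stdout                   # restore the original stdout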
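In Python 3.4 and later, contextlib.redirect_stdout() does the same swap and restores sys.stdout automatically. This sketch captures the output in memory with io.StringIO; passing an open file such as hello.txt instead would work the same way:

```python
import contextlib
import io

buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):   # sys.stdout points at buffer inside the block
    print("good bye")                      # not shown on the screen
captured = buffer.getvalue()               # sys.stdout is already restored here
```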



Tuesday, 25 February 2014

BASH command line interpretation and processing


The Bash command line is the interface sysadmins use to control the system. It is very important for a sysadmin to understand how Bash interprets a command. Here is a brief introduction to how it works.

  • split the command into tokens using delimiters. The delimiters include SPACE, TAB, NEWLINE, ;, (, ), <, >, | and &
  • build the command stack (a complicated process, not discussed here)
  • check whether the first token of the command is an alias; if it is, replace the alias with its value
  • brace expansion, e.g. a{a,b} is expanded to aa ab
  • if a token starts with ~, replace the ~ with the home directory
  • replace any expression starting with $ (variables and parameters) with its value
  • execute the command between backquotes `` and replace it with its output
  • calculate $((expression)) and replace it with the result
  • wildcard expansion, such as *, ? and [...]
  • find the actual command (function, builtin, or a program in $PATH)
  • set up the I/O redirection and run the command


Here is an example.

echo ~/i* $PWD `echo Yahoo Hadop` $((21*20)) > output

Step 1. Split the command into tokens:
token[1] = echo
token[2] = ~/i*
token[3] = $PWD
token[4] = `echo Yahoo Hadop`
token[5] = $((21*20))
"> output" is not one of these tokens; it is handled in the I/O redirection step.
Steps 2, 3 and 4 do not change anything here.
Step 5. Replace ~ with the home directory (/root for the root user). The command now looks like:
        echo /root/i* $PWD `echo Yahoo Hadop` $((21*20))
Step 6. Replace $PWD with the current directory, for example /root:
        echo /root/i* /root `echo Yahoo Hadop` $((21*20))
Step 7. Execute the command between the backquotes (the inner command goes through the same processing steps). Now it looks like:
        echo /root/i* /root Yahoo Hadop $((21*20))
Step 8. Calculate the value in $(( )):
        echo /root/i* /root Yahoo Hadop 420
Step 9. Expand the wildcard (for example, if /root contains these files):
        echo /root/indirect.sh /root/install.log /root/install.log.syslog /root Yahoo Hadop 420

Now Bash is ready to execute the command. echo is a builtin command, so Bash runs it directly, with its output redirected to the file output.
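Python's os.path and glob modules can mimic steps 5, 6 and 9 for a single word. This is only a rough analogy under stated assumptions, not how Bash works internally: for example, os.path.expandvars() leaves unknown variables unchanged, while Bash replaces them with an empty string. expand_word is a hypothetical helper name:

```python
import glob
import os

def expand_word(word):
    """Rough analogue of Bash steps 5, 6 and 9 for a single word."""
    word = os.path.expanduser(word)        # step 5: leading ~ -> home directory
    word = os.path.expandvars(word)        # step 6: $NAME / ${NAME} -> value
    if any(ch in word for ch in "*?["):
        matches = sorted(glob.glob(word))  # step 9: wildcard expansion (sorted, like Bash)
        if matches:
            return matches
    return [word]                          # Bash also keeps an unmatched pattern as-is by default
```

For example, expand_word("~/i*") returns the sorted matching paths in your home directory, or the expanded pattern itself if nothing matches.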

Monday, 24 February 2014

Linux trace introduction 1: the strace command



Linux provides system admins quite a few useful tools for troubleshooting. strace is one of them: it shows the details of the system calls (syscalls) a process makes, including their parameters, return values and the time they take.

strace has many options; these are the common ones for daily use:

-c -- count time, calls, and errors for each syscall and report summary
-f -- follow forks, -ff -- with output into separate files
-r -- print relative timestamp, -t -- absolute timestamp, -tt -- with usecs
-e expr -- a qualifying expression: option=[!]all or option=[!]val1[,val2]...
   options: trace, abbrev, verbose, raw, signal, read, or write
-o file -- send trace output to FILE instead of stderr
-p pid -- trace process with process id PID, may be repeated

Some examples

Run ls on a file that does not exist:
[root@X001 tmp]# strace ls notexisting
execve("/bin/ls", ["ls", "notexisting"], [/* 29 vars */]) = 0
brk(0)                                  = 0x1b7b000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5b87f51000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=38923, ...}) = 0
mmap(NULL, 38923, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f5b87f47000
close(3)                                = 0

-----omitted-----

ioctl(1, SNDCTL_TMR_TIMEBASE or TCGETS, {B38400 opost isig icanon echo ...}) = 0
ioctl(1, TIOCGWINSZ, {ws_row=63, ws_col=237, ws_xpixel=0, ws_ypixel=0}) = 0
stat("notexisting", 0x1b7c0e0)          = -1 ENOENT (No such file or directory)
lstat("notexisting", 0x1b7c0e0)         = -1 ENOENT (No such file or directory)
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=2512, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5b87f50000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2512
read(3, "", 4096)                       = 0
close(3)                                = 0

exit_group(2)                           = ?

Try to connect to a port that nothing is listening on, tracing only network-related syscalls:
[root@X001 tmp]# strace -e trace=network telnet localhost 9999
socket(PF_NETLINK, SOCK_RAW, 0)         = 3
bind(3, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 0
getsockname(3, {sa_family=AF_NETLINK, pid=2395, groups=00000000}, [12]) = 0
sendto(3, "\24\0\0\0\26\0\1\3\342\346\vS\0\0\0\0\0\0\0\0", 20, 0, {sa_family=AF_NETLINK, pid=0, groups=00000000}, 12) = 20
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"0\0\0\0\24\0\2\0\342\346\vS[\t\0\0\2\10\200\376\1\0\0\0\10\0\1\0\177\0\0\1"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 108
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"@\0\0\0\24\0\2\0\342\346\vS[\t\0\0\n\200\200\376\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 128
recvmsg(3, {msg_name(12)={sa_family=AF_NETLINK, pid=0, groups=00000000}, msg_iov(1)=[{"\24\0\0\0\3\0\2\0\342\346\vS[\t\0\0\0\0\0\0\1\0\0\0\24\0\1\0\0\0\0\0"..., 4096}], msg_controllen=0, msg_flags=0}, 0) = 20
socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
socket(PF_FILE, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 3
connect(3, {sa_family=AF_FILE, path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(9999), sin_addr=inet_addr("127.0.0.1")}, 16) = 0
getsockname(3, {sa_family=AF_INET, sin_port=htons(33896), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
socket(PF_INET6, SOCK_DGRAM, IPPROTO_IP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(9999), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(3, {sa_family=AF_INET6, sin6_port=htons(57576), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
Trying ::1...
socket(PF_INET6, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET6, sin6_port=htons(9999), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = -1 ECONNREFUSED (Connection refused)
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
setsockopt(3, SOL_IP, IP_TOS, [16], 4)  = 0
connect(3, {sa_family=AF_INET, sin_port=htons(9999), sin_addr=inet_addr("127.0.0.1")}, 16) = -1 ECONNREFUSED (Connection refused)
telnet: connect to address 127.0.0.1: Connection refused
[root@X001 tmp]#

Get a summary of the syscalls with -c:
[root@X001 tmp]# strace -c -e trace=network telnet localhost 9999
Trying ::1...
telnet: connect to address ::1: Connection refused
Trying 127.0.0.1...
telnet: connect to address 127.0.0.1: Connection refused
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.022996        3285         7           socket
  0.00    0.000000           0         6         4 connect
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         3           recvmsg
  0.00    0.000000           0         1           bind
  0.00    0.000000           0         3           getsockname
  0.00    0.000000           0         1           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.022996                    22         4 total
[root@X001 tmp]#
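When comparing many runs, the -c summary can be turned into data. A minimal parsing sketch, assuming the column layout shown above (% time, seconds, usecs/call, calls, an optional errors value, syscall); other strace versions may print the table differently, and parse_strace_summary is a hypothetical helper:

```python
SUMMARY = """\
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
100.00    0.022996        3285         7           socket
  0.00    0.000000           0         6         4 connect
  0.00    0.000000           0         1           sendto
  0.00    0.000000           0         3           recvmsg
  0.00    0.000000           0         1           bind
  0.00    0.000000           0         3           getsockname
  0.00    0.000000           0         1           setsockopt
------ ----------- ----------- --------- --------- ----------------
100.00    0.022996                    22         4 total
"""

def parse_strace_summary(text):
    """Return {syscall: (calls, errors)} from `strace -c` summary text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields (no errors) or 6 fields; skip the total row.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            float(fields[0])            # data rows start with the % time value
            calls = int(fields[3])
            errors = int(fields[4]) if len(fields) == 6 else 0
        except ValueError:
            continue                    # header, separator or program output
        stats[fields[-1]] = (calls, errors)
    return stats

stats = parse_strace_summary(SUMMARY)
```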
 

To understand the output of strace, we need a basic idea of Linux internals and syscalls.

BASH IO redirection

IO redirection:
I/O redirection captures the output of a file, command, program or script and sends it to another file, command or script.

Background: every script has three standard file descriptors:
       stdin (0): standard input
       stdout (1): standard output
       stderr (2): standard error output

IO redirection example:
Output redirection
> file: write the output to the target file, overwriting its content.
>> file: write the output to the target file, appending it at the end of the file.
>| file: overwrite the file even if the noclobber option is set.

stderr redirection
2>newfile: redirect stderr to newfile. For example, with ls -al zz 2>newfile,
              if no file named zz exists, the error message is written to newfile.

stdin redirection
script < file: use file as the script's stdin instead of the default stdin (the keyboard)

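<<delimiter: here document. The lines that follow are used as stdin, up to a line that contains only the delimiter.
For example:
# cat > mytest << GO
> this is my test
> it is ok
> let's GO
> GO
# cat mytest
this is my test
it is ok
let's GO
The line "let's GO" does not end the input, because the delimiter has to be alone on its line.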

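Block redirection

A redirection such as > outputfile or < inputfile can also be applied to a whole block of code (for example a while loop or a { ...; } group). It temporarily changes stdin or stdout for every command in that block.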
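At the process level, these redirections just point a program's file descriptors at files. A rough Python analogue (a sketch, assuming Python 3.5+; the child program and the file names out.txt and newfile are made up for illustration) uses the stdout and stderr arguments of subprocess.run():

```python
import os
import subprocess
import sys
import tempfile

workdir = tempfile.mkdtemp()
out_path = os.path.join(workdir, "out.txt")
err_path = os.path.join(workdir, "newfile")

# A child process that writes one line to stdout and one to stderr.
child = [sys.executable, "-c",
         "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"]

# Roughly: child > out.txt 2> newfile
with open(out_path, "w") as out, open(err_path, "w") as err:
    subprocess.run(child, stdout=out, stderr=err, check=True)

# Roughly: child >> out.txt 2> /dev/null
with open(out_path, "a") as out:
    subprocess.run(child, stdout=out, stderr=subprocess.DEVNULL, check=True)
```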

Saturday, 22 February 2014

BASH variables

Bash variables store information that scripts refer to. There are three kinds:

  1. local variables: declared and used only in the current shell process.
  2. environment variables: used by the login shell and inherited by its child processes, so editors, scripts and other programs started from the shell can use them.
  3. parameter variables: used to pass parameters to shell scripts. They are read-only (they cannot be assigned directly).

some examples:
variable=value          # set a value (no spaces around =)
${variable=value}       # expansion that assigns value only if variable is unset, e.g. echo ${variable=value}
echo $variable          # get the value
unset variable          # clear the variable (no $ here)
readonly variable       # make the variable read-only (immutable); set its value first, then mark it readonly
let a=a+1               # integer operations

By default, Bash variables are strings, and a variable that has not been set expands to an empty (null) value. A variable holding a string can still be used in integer operations: if the string is not a number (and not the name of another variable), it is treated as 0.

environment variables:

e.g.: export VARIABLE   # declare it as an environment variable
You can use the env command to show the defined environment variables. Setting and unsetting them works the same way as for local variables.

There are some important predefined environment variables for users:
PWD, OLDPWD: the current directory and the previous directory
PATH: the directories the shell searches for external commands, scripts and executable programs
HOME: the user's home directory
SHELL: the user's default shell (e.g. /bin/bash)
USER: the user's login name, such as root
UID: the user's ID
PPID: the parent process ID
Tip: a child process inherits the environment variables of its parent process, but if the child changes a variable, the change is not passed back to the parent.
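The one-way inheritance can be shown from Python, which passes its os.environ to child processes in the same way. A sketch, assuming Python 3.7+ (for capture_output); DEMO_COLOR is a made-up variable name:

```python
import os
import subprocess
import sys

os.environ["DEMO_COLOR"] = "blue"     # like: export DEMO_COLOR=blue

# The child prints the inherited value, changes its own copy, and prints again.
child_code = ("import os; print(os.environ['DEMO_COLOR']); "
              "os.environ['DEMO_COLOR'] = 'red'; print(os.environ['DEMO_COLOR'])")
output = subprocess.run([sys.executable, "-c", child_code],
                        capture_output=True, text=True, check=True).stdout.split()

parent_value = os.environ["DEMO_COLOR"]  # the child's change did not reach the parent
```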

~/.bash_profile: usually where you define your Bash environment variables.
A login shell first reads /etc/profile, then the first of ~/.bash_profile, ~/.bash_login and ~/.profile that exists.
~/.login is used by the C shell and ~/.profile by the Korn shell. Bash also reads ~/.profile when the other two files are missing, but Bash-specific settings are better kept in ~/.bash_profile.

parameter variables:


  • $0: the name of the script itself; $1, $2, $3: the first, second and third parameters; ${10}: the 10th parameter
  • $#: the number of parameters
  • $*: all parameters
  • $?: the exit code of the last command, 0 for success and non-zero for failure
  • $$: the current PID

Quotes in Bash


"" (partial quoting): all characters are treated as normal characters except $, ` and \. It also preserves the whitespace in a variable's value.
'' (full quoting): all characters are treated as normal characters.
`command` (command substitution): the command is run and replaced by its output.
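Python's shlex.split() applies POSIX-shell quoting rules when splitting a command line into words. It performs no expansion at all (no $ variables, no backquotes), so this sketch only shows how quotes group words and preserve spaces:

```python
import shlex

# Double quotes keep the two spaces inside one word; single quotes keep $HOME literal.
words = shlex.split('echo "two  spaces" \'$HOME stays\' plain')
```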

Friday, 21 February 2014

The MOOC courses I am taking / have finished (kept updated)

MOOCs are really a revolution for IT engineers. They give us a way to keep learning and make our knowledge base wider and more up to date.

I took some courses in 2013, but I only began recording them in February 2014.

Here is the list of MOOCs I am currently taking.


Here is the list of courses I have attended and finished.

  • Introduction to Google Tools (Udemy)- very small free course
  • TCP, HTTP and SPDY Deep Dive(Udemy)- very small free course

TCP, HTTP and web performance

These are my study notes for the Udemy class:
https://www.udemy.com/tcp-http-spdy-deep-dive/

Page load performance affects how users feel about a website. According to the research cited in the course, 100 ms is the ideal page load time.


In general, we can improve web load time in the following four areas:
Markup/content:
       Make fewer HTTP requests
       Optimize CSS and scripts
       Minimize cookies
Browser:
       Use progressive enhancement
       Load scripts without blocking
       Use AJAX and deferred scripts
Network:
       Use caching and compression
       Use a CDN
       Reduce DNS lookups
       Avoid redirects
       Prefetch commonly used resources
Server:
       Load balancing
       Optimize backend server scripts
       Optimize the database

Besides the web server and backend processing time, network overhead has a great impact on web load time.

TCP was designed in the early 1980s, when networks were much slower, and it handled low-bandwidth networks very well. It is stream-oriented, with features such as slow start, the sliding window, the congestion window and Nagle's algorithm.

RTT (round-trip time) is very important for web response time. It is determined by the time light takes to travel between you and the server, plus many other factors such as the number of network hops and the bandwidth.

How TCP/HTTP influences web load time:
1. 1 RTT to establish the TCP connection (the three-way handshake)
2. 1 RTT to send the HTTP request and get the first part of the response
3. at least 1 more RTT to get data beyond the first 3 packets (the initial congestion window)
4. loading slows down dramatically when a packet is lost and has to be retransmitted

What we can do to improve the response time:
1. parallel TCP connections
2. reuse TCP connections (persistent HTTP connections)
3. pre-establish TCP connections
4. increase the initial congestion window
5. use a CDN to reduce the RTT
6. TCP Fast Open (send the HTTP GET request in the TCP SYN packet)

Persistent HTTP Sessions
The TCP connection is not closed after the HTTP response is sent. This feature is supported by all major web sites and browsers. It saves the overhead of setting up new TCP connections, but the web server has to keep the connections open (more threads or worker processes). In Apache, a timeout (KeepAliveTimeout) controls how long an idle connection is kept open.

Initial congestion window: Google's experiments showed that 10 segments is a suitable value for today's Internet congestion conditions. This allows about 15 KB (10 × 1460-byte segments) to be sent to the browser in the first round trip, so a well-designed page can already start to show its content.
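The effect of the initial congestion window can be estimated with a little arithmetic. This is an idealized sketch, assuming a 1460-byte segment, no packet loss, no receive-window limit and a window that doubles every round trip; real TCP stacks are more complicated, and rtts_to_deliver is a hypothetical helper:

```python
import math

MSS = 1460  # assumed segment payload size in bytes (typical for Ethernet)

def rtts_to_deliver(size_bytes, initcwnd, mss=MSS):
    """Round trips needed to deliver size_bytes under idealized slow start."""
    segments_left = math.ceil(size_bytes / mss)
    cwnd, rtts = initcwnd, 0
    while segments_left > 0:
        segments_left -= cwnd   # one window of segments per round trip
        cwnd *= 2               # slow start doubles the window each RTT
        rtts += 1
    return rtts
```

Under these assumptions, about 15 KB (10 segments) needs 3 round trips with an initial window of 3 segments but only 1 with an initial window of 10; a new connection adds 1 more RTT for the handshake.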

TCP Fast Open: the HTTP request is sent in the SYN packet. At the time of writing, this is only experimental.




My Next 6 month study path - way to a Full Stack Engineer

There is a very popular idea called the full stack engineer (FSE). An FSE is an engineer who understands the whole development stack in the web/Internet environment. It is difficult to become an FSE, as it takes a very long time, great effort and a good DevOps environment.
I won't be able to become an FSE in the next few years, but I believe DevOps will be the future of system administration, which means you have to understand many IT areas and know how to develop systems.
I will work on the areas below in the next 6 months (20 hours per week). Then let's see how it goes.
1)    Linux admin/internals
2)    BASH programming
3)    Python programming
4)    Network infrastructure
5)    TCP/IP stack and application protocols
6)    Web system infrastructure
7)    Web frameworks (SSH/Django)

I will write about 100 blog posts covering these areas by the end of July. I hope to have solid knowledge of them by then.


The 6 months after July will focus on web development, from the backend to the frontend and the mobile side, but first I will review the results of the first 6 months.

Sunday, 16 February 2014

BASH text file processing

In this blog post, we show some common Bash commands for text file processing.

cut: cut is a powerful tool for extracting specific columns (characters) or fields from each line of a text file.
The common options are:
  • -c <list>:      the character positions to output
  • -d <delimiter>: the delimiter that separates the fields; the default is TAB
  • -f <fields>:    the fields to output


For example,
the command to print the 1st and 7th fields of the /etc/passwd file, using : as the delimiter:
cut -d : -f 1,7 /etc/passwd
The command to print the 1st to 10th characters of each line of /etc/passwd:
cut -c 1-10 /etc/passwd
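For comparison, the same field and character selection can be sketched in Python. The passwd lines are made-up sample data, cut_fields and cut_chars are hypothetical helpers, and the sketch does not reproduce every detail of cut (for example, its handling of lines without the delimiter):

```python
SAMPLE_PASSWD = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
alice:x:1000:1000:Alice:/home/alice:/bin/bash
"""

def cut_fields(text, fields, delimiter="\t"):
    """Like `cut -d DELIM -f LIST`; field numbers are 1-based."""
    result = []
    for line in text.splitlines():
        parts = line.split(delimiter)
        result.append(delimiter.join(parts[i - 1] for i in fields if i <= len(parts)))
    return result

def cut_chars(text, start, end):
    """Like `cut -c START-END` (1-based, inclusive)."""
    return [line[start - 1:end] for line in text.splitlines()]
```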


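sort: display the lines of a file in sorted order
Some important parameters:
  • -b: ignore leading blanks
  • -d: dictionary order (consider only blanks and alphanumeric characters)
  • -g: sort by general numeric value (floating point)
  • -f: ignore case
  • -k: define the sort key (field)
  • -n: sort by numeric value
  • -o: write the output to a file
  • -r: reverse the order
  • -t: the field delimiter
  • -u: output only unique lines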


Example: sort a file by its second field as a floating-point number:
sort -k2,2 -g file


Sort the /etc/passwd file by UID in descending order:
sort -t : -k3,3 -n -r /etc/passwd
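The same "numeric key, descending" sort in Python, as a sketch on made-up passwd lines: split on ":", convert the third field to an integer, and reverse:

```python
lines = [
    "root:x:0:0:root:/root:/bin/bash",
    "alice:x:1000:1000:Alice:/home/alice:/bin/bash",
    "daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin",
]

# sort -t : -k3,3 -n -r  ->  key = third field as an integer, largest first
by_uid = sorted(lines, key=lambda line: int(line.split(":")[2]), reverse=True)
names = [line.split(":")[0] for line in by_uid]
```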


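uniq: remove duplicated lines. It only compares adjacent lines, so the input is usually sorted first (sort file | uniq).
-c: prefix each line with the number of times it occurs
-i: ignore case
-u: only show lines that are not repeated
-d: only show repeated lines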
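Python's itertools.groupby() also groups only adjacent equal items, which makes it a close analogue of uniq -c. A sketch (uniq_count is a hypothetical helper) showing why sorting first matters:

```python
from itertools import groupby

def uniq_count(lines):
    """Like `uniq -c`: count runs of adjacent identical lines."""
    return [(len(list(group)), line) for line, group in groupby(lines)]

data = ["b", "a", "a", "b"]
unsorted_counts = uniq_count(data)        # the two "b" lines are not adjacent
sorted_counts = uniq_count(sorted(data))  # like: sort file | uniq -c
```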



wc command:
shows the line count, word count and byte count of each file (-l, -w and -c select just one of them; -m counts characters).
Words are separated by whitespace (spaces, tabs or newlines).
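A sketch of the same three counts in Python, assuming plain ASCII text (so characters and bytes are the same). Note that wc -l counts newline characters, so a last line without a trailing newline is not counted; wc here is a hypothetical helper:

```python
def wc(text):
    """Return (lines, words, characters), roughly like `wc -l`, `wc -w`, `wc -m`."""
    lines = text.count("\n")       # wc counts newline characters
    words = len(text.split())      # words are separated by any whitespace
    chars = len(text)
    return lines, words, chars
```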


head and tail show the first and last lines of a file (10 lines by default):
head -n number <file>:  the first number lines
head -n -number <file>: all lines except the last number lines
tail -n number <file>:  the last number lines
tail -n +number <file>: all lines starting from line number
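The four forms map neatly onto Python list slicing. A sketch on 20 made-up lines (note that line numbers are 1-based in head/tail but list indexes are 0-based):

```python
lines = [f"line {n}" for n in range(1, 21)]   # 20 sample lines

head_5 = lines[:5]          # head -n 5  : the first 5 lines
head_minus_5 = lines[:-5]   # head -n -5 : all but the last 5 lines
tail_5 = lines[-5:]         # tail -n 5  : the last 5 lines
tail_from_5 = lines[4:]     # tail -n +5 : from line 5 to the end
```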