Example Automated MySQL Backup Script

Rather than use --all-databases, which will prevent you from being able to selectively restore any single database, consider the following:

Ideally, you should have a daily backup, with some history. It should be bulletproof (--force), it should be logged (>> ...log), it should be compressed (| gzip), it should keep separate copies of each database, and it should automatically pick up any databases that are added.

Consider, rather, a shell script like this:

#!/bin/bash

Host=server.domain.com
BDir=/home/backup/backup/mysql

Dump="/usr/bin/mysqldump --skip-extended-insert --force"
MySQL=/usr/bin/mysql

Today=$(date "+%a")

# Get a list of all databases
Databases=$(echo "SHOW DATABASES" | $MySQL -h $Host)

for db in $Databases; do
        date=`date`
        file="$BDir/$Host-$db-$Today.sql.gz"
        echo "Backing up '$db' from '$Host' on '$date' to: "
        echo "   $file"
        $Dump -h $Host $db | gzip > $file
done

Which is assuming that you have a file ~/.my.cnf (chmod 600), that has:

[client]
user = "BACKUP"
password = "SOMEPASS8342783492"

Make sure that whatever user you are using for BACKUP has this grant statement:

GRANT
  SELECT, SHOW VIEW ON *.*
  TO BACKUP@localhost
  IDENTIFIED BY 'SOMEPASS8342783492';

So simply add this to a nightly cronjob, and you have a daily backup that rotates each 7 days week.

0 3 * * *   backup-mysql >> backup-mysql.log 2>> backup-mysql.log

The backup directory then contains:

-rw-r--r-- 1 backup backup 2217482184 Sep  3 13:35 base.appcove.net-VOS4_0-20090903.sql.gz
-rw-rw-r-- 1 backup backup 2505876287 Dec 25 00:48 base.appcove.net-VOS4_0-Fri.sql.gz
-rw-r--r-- 1 backup backup 2500384029 Dec 21 00:48 base.appcove.net-VOS4_0-Mon.sql.gz
-rw-r--r-- 1 backup backup 2506849331 Dec 26 00:48 base.appcove.net-VOS4_0-Sat.sql.gz
-rw-r--r-- 1 backup backup 2499859469 Dec 20 00:48 base.appcove.net-VOS4_0-Sun.sql.gz
-rw-rw-r-- 1 backup backup 2505046147 Dec 24 00:48 base.appcove.net-VOS4_0-Thu.sql.gz
-rw-rw-r-- 1 backup backup 2502277743 Dec 22 00:48 base.appcove.net-VOS4_0-Tue.sql.gz
-rw-r--r-- 1 backup backup 2504169910 Dec 23 00:48 base.appcove.net-VOS4_0-Wed.sql.gz
-rw-r--r-- 1 backup backup   76983829 Dec 25 00:49 base.appcove.net-VOS4_Mail_0-Fri.sql.gz
-rw-r--r-- 1 backup backup   76983829 Dec 21 00:49 base.appcove.net-VOS4_Mail_0-Mon.sql.gz
-rw-r--r-- 1 backup backup   76983829 Dec 26 00:49 base.appcove.net-VOS4_Mail_0-Sat.sql.gz
-rw-r--r-- 1 backup backup   76983829 Dec 20 00:48 base.appcove.net-VOS4_Mail_0-Sun.sql.gz
-rw-rw-r-- 1 backup backup   76983829 Dec 24 00:49 base.appcove.net-VOS4_Mail_0-Thu.sql.gz
-rw-rw-r-- 1 backup backup   76983829 Dec 22 00:49 base.appcove.net-VOS4_Mail_0-Tue.sql.gz
-rw-r--r-- 1 backup backup   76983829 Dec 23 00:49 base.appcove.net-VOS4_Mail_0-Wed.sql.gz
-rw-r--r-- 1 backup backup  304803726 Dec 25 00:49 base.appcove.net-WeSell_0-Fri.sql.gz
-rw-r--r-- 1 backup backup  303480087 Dec 21 00:49 base.appcove.net-WeSell_0-Mon.sql.gz
-rw-r--r-- 1 backup backup  304710121 Dec 26 00:49 base.appcove.net-WeSell_0-Sat.sql.gz
-rw-r--r-- 1 backup backup  303791294 Dec 20 00:49 base.appcove.net-WeSell_0-Sun.sql.gz
-rw-rw-r-- 1 backup backup  305315415 Dec 24 00:49 base.appcove.net-WeSell_0-Thu.sql.gz
-rw-rw-r-- 1 backup backup  302516217 Dec 22 00:49 base.appcove.net-WeSell_0-Tue.sql.gz
-rw-r--r-- 1 backup backup  303314217 Dec 23 00:49 base.appcove.net-WeSell_0-Wed.sql.gz
-rw-r--r-- 1 backup backup     135301 Dec 25 00:30 dc40.appcove.net-mysql-Fri.sql.gz
-rw-r--r-- 1 backup backup     135301 Dec 21 00:30 dc40.appcove.net-mysql-Mon.sql.gz
-rw-r--r-- 1 backup backup     135301 Dec 26 00:30 dc40.appcove.net-mysql-Sat.sql.gz
-rw-r--r-- 1 backup backup     135301 Dec 20 00:30 dc40.appcove.net-mysql-Sun.sql.gz
-rw-rw-r-- 1 backup backup     135301 Dec 24 00:30 dc40.appcove.net-mysql-Thu.sql.gz
-rw-rw-r-- 1 backup backup     135301 Dec 22 00:30 dc40.appcove.net-mysql-Tue.sql.gz
-rw-r--r-- 1 backup backup     135301 Dec 23 00:30 dc40.appcove.net-mysql-Wed.sql.gz

How to Author a *simple* jQuery plugin

I use jQuery as the JavaScript library for most projects. It’s terribly convenient for selecting elements and processing them.

However, jQuery is not JavaScript, and there is a lot of things it cannot do. Rather than writing a separate set of functionality outside of jQuery, why not simply extend it on an application-per-application basis? It really is that easy. Perhaps if you get some functionality that is good enough, you will want to convert them into a formal plugin and post it to http://plugins.jquery.com/

Here is the document that explains, in depth, how to author plugins.
http://docs.jquery.com/Plugins/Authoring

Here is an example at a very simple plugin that I whipped together while learning about plugins… It’s designed to accept an object of data, and apply it to any of the elements whose name attribute match the key of the data.

jQuery.fn.populate = function(Data) {
   this.each(function() {
      if (this.name in Data)
      {
         switch(this.type)
         {
            case 'text': this.value = Data[this.name]; break;
            case 'checkbox': this.checked = Data[this.name]; break
         }
      }
   });
   return this;
};

To call it… Select all input elements of the #Login form, and populate them with the passed data.

$('#Login :input').populate({ FirstName: 'sammy', LastName: 'jones' });

The .fn attribute of the jQuery object is where you attach new methods. The this member of those functions is set to the current jQuery object. The function must return the current jQuery object (self) unless explicitly stated otherwise.

Iterating through the elements of the current jQuery object should be done with this.each(…).

All in all, it’s very simple, and very powerful to be able to nest custom functionality into the jQuery object within a given application.

Read more at:

jQuery mouseout vs. mouseleave

Hello!

jQuery has a mouseout and a mouseleave event. The main difference is in how they handle child elements. Mouseout fires when the pointer moves into or out from child element, while mouseleave doesn’t.

This was causing a slight problem of having some menu’s close when the mouse encountered a link or other child element within the menu, it was triggering the mouseout event of the menu.

The fix, was to convert to mouseleave.

   $("#Nav1_Content > p").mouseleave(CloseTopMenuArea);

http://docs.jquery.com/Events/mouseout

Python has keyword only parameters!

Maybe I should have known this about Python by now, but…

Little did I know (was wishing for it today), but there IS a way to specify that arguments MUST be keyword only.

>>> def foo(a,b,*,c,d):
...     print(a,b,c,d)
...
>>> foo(1,2,3,4)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: foo() takes exactly 2 positional arguments (4 given)
In this case, everything after the * is required to be keyword only. This means that you can safely amend the function definition to re-order or even add keyword arguments.
Likewise, the following requires ALL keyword arguments.
>>> def foo(*,a,b):
...     print(a,b)
...
>>> foo(1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: foo() takes exactly 0 positional arguments (2 given)

>>> foo(a=1, b=2)
1 2


Read more about it at:

http://www.python.org/dev/peps/pep-3102/

A response to Preventing a pirate attack

An open letter to Daniel Jones, of GDS Publishing Ltd.
“The definitive resource for the global oil and gas energy industries online…”


Hello Mr. Jones,

 

I just read your well done article on ways to prevent a pirate attack.  I liked the infographic, and the descriptions that accompanied it.

http://www.ngoilgas.com/news/oil-tanker-pirate-attack/

While all seem to be interesting ideas, consider this:

Given that a M2 .50 caliber machine gun and spotlight mounted on the bow and stern would provide a physicalactualcapable, and pointed defense against any small to medium pirate attack…

Given that these ships travel in international water, where it is not reasonable to expect any police protection…

Given that the security issues of maintaining such weapons are important, but any organized army will entrust this and more powerful weapons to a trained 18 year old solider…

Given that the captain of the ship already has the responsiblity for life and wellbeing of his crew, not causing tremendous environmental disasters, and not crashing his ship…

Given that the pirates intend to use real military weapons to cause death and mayhem…

Why is it that such a reasonable (even if politically unlikely) method of defense was not even mentioned in your article — as if it didn’t exist?

As if it didn’t exist, when in reality, it is perhaps the most secure and reasonable approach to deterring armed pirates.   The same reason that army soldiers carry weapons.  The same reason that police carry weapons.  The same reason that security guards carry weapons.  The same reason that many civilians carry weapons.

May I suggest, that if  most rational people who would be asked why police carry weapons, were re-asked why oil tankers should carry weapons, the answer would be the same.

These are the same vessels that transport huge quantities of all kinds of goods, chemicals, military hardware, and other toxic, dangerous, or harmful substances.  It would be pathetic to think of transporting large quantities of cash from bank to bank without armed guards.  How much more so the cargo of merchant ships in the middle of the ocean?

In summary, why did you not mention this as a possibility in your otherwise excellent article?

Sincerely,

Jason Garber
Citizen, Father, Business Owner
Altoona, Pennsylvania

Update:  as soon as I posted this, these were the possibly related links that appeared.  Interesting…

A brief introduction to AppStruct

Have been very busy at work lately.  We made the decision about a month ago to switch (most|all) new projects over to use Python 3 with Apache, mod_wsgi, and AppStruct.  You may know what the first 3 are, but the 4th??

Special thanks goes to Graham Dumpleton behind mod_wsgi, and James William Pye behind Python>>Postgresql.   They are not involved or affiliated with AppCove or AppStruct (aside from great mailing list support) BUT if it were not for them, this framework would not exist.

AppStruct is a component of Shank, a meta-framework.  A stand-alone component in it’s own right, it represents the AppCove approach to web-application development.  Most of it is in planning, but the parts that have materialized are really, really cool.

Briefly, I’ll cover the two emerging areas of interest:

AppStruct.WSGI

This is a very pythonic (in my opinion) web application framework targeted toward Python 3.1 (a challenge in itself at this point).  We really wanted to base new development on Python 3.1+, as well as PostgreSQL using the excellent Python3/PostgreSQL library at http://python.projects.postgresql.org/.  However, none of the popular frameworks that I am aware of support Python 3, and most (if not all) of them have a lot of baggage I do not want to bring to the party.

Werkzeug was the most promising, but alas, I could not find Python 3 support for it either.  In fact, I intend to utilize a good bit of code from Werkzeug in finishing off AppStruct.WSGI.  (Don’t you just love good OSS licences?  AppStruct will be released under one of those also).

HTTP is just not that complicated.  It dosen’t need bundled up into n layers of indecipherable framework upon framework layers.   It doesn’t need abstracted to death.  I just needs to be streamlined a bit (with regard to request routing, reading headers, etc…).

Python is an amazing OO language.  It’s object model (or data model) is one of (if not the) most well conceived of any similar language, ever.  I want to use that to our advantage…

Inheritance, including multiple inheritance, has very simple rules in Python.  Want to use that as well.

Wish to provide developers with access to the low level guts they need for 2% of the requests, but not make them do extra work for the 98% of requests.

Speed is of the essence.  Servers are not cheap, and if you can increase your throughput by 5x, then that’s a lot less servers you need to pay for.

So, how does it work?

Well, those details can wait for another post.  But at this point the library is < 1000 lines of code, and does a lot of interesting things.

  • A fully compliant WSGI application object
  • 1 to 1 request routing to Python packages/modules/classes
  • All request classes derived from AppStruct.WSGI.Request

The application maps requests like /some/path/here to objects, like Project.WSGI.some.path.here.  If there is a trailing slash, then the application assumes that the class is named Index.  The object that is found is verified to be a subclass of AppStruct.WSGI.Request, and then…

Wait!  What about security?

Yes, yes, very important.  A couple things to point out.  First, the URLs are passed through a regular expression that ensures that they adhere to only the characters that may be used in valid python identifiers (not starting with _), delimited by “/”.  Second, the import and attribute lookup verify that any object (eg class) found is a subclass of the right thing.  And so on and so forth…

But you may say “wait, what about my/fancy-shmancy/urls/that-i-am-used-to-seeing?  Ever hear of mod_rewrite?  Yep.  Not trying to re-invent the wheel.  Use apache for what it was made for, not just a dumb request handler.

What about these request objects?

They are quite straightforward.  There are several attributes which represent request data, the environment, query string variables, post variables, and more.  There is a .Response attribute which maps to a very lightweight response object.

Speaking of it — it has only 4 attributes: [Status, Header, Iterator, Length].  As you see, it’s pretty low-level-wsgi-stuff.  But the developer would rarely interact with it, other than to call a method like .Response.Redirect(‘http://somewhere-else.com&#8217;, 302)

Once the application finds the class responsible for handling the URL, it simply does this (sans exception catching code):

RequestObject = Request(...)
with RequestObject:
   RequestObject.Code()
   RequestObject.Data()
return RequestObject.Response

Wow, that’s simple.  Let me point out one more detail.  The default implementation of Data() does this:

def Data(self):
   self.Response.Iterator = [self.Text()]
   self.Response.Length = len(self.Response.Iterator[0])

So the only thing really required of this Request class is to override Text() and return some text?  Yep, that simple.

But typically, you would at some point mixin a template class that would do something like this:

class GoodLookingLayout:
   def Text(self):
      return ( 
         """<html><head>...</head><body><menu>""" +
         self.Menu() +
         """</menu><div>""" +
         self.Body() +
         """</div></body></html>"""
         )

And then it would be up to the developer to override Menu and Body (each returning the appropriate content for the page).

Ohh, you may say.  What about templating engine X?  Well, it didn’t support Python 3, and I probabally didn’t want it anyway (for 9 reasons)…  If it’s really, really good and fits this structure, drop me a line, please.

What about that Code() method?

Yeah, that’s the place that any “logic” of UI interaction should go.  I’m not advocating mixing logic and content here, but you could do that if you wanted.  You will find in our code, a seperate package for application business logic and data access that the request classes will call upon.  But again, if you are writing a one page wonder, why go to all the trouble?

The only requirement for the Code() method is that it calls super().Code() at the top.  Since the idea is that the class .Foo.Bar.Baz.Index will inherit from the class .Foo.Bar.Index, this gives you a very flexible point of creating .htaccess style initialization/access-control code in one place.  So in /Admin/Index, you could put a bit of code in Code() which ensures the user is logged in.  This code will be run by all sub-pages, therefore ensuring that access control is maintained.  Relative imports are important for this task.

from .. import Index as PARENT
from AppStruct.WSGI.Util import *
class Index(PARENT):
   def Code(self):
      super().Code()
      self.Name = self.Post.FirstName + " " + self.Post.LastName
      # some other init stuff
   def Body(self):
      return "Hello there " + HS(self.Name) + "!"

Summary of AppStruct.WSGI…

To get up and running with a web-app using this library:

  1. A couple mod_wsgi lines in httpd.conf
  2. A .wsgi file that has 1 import and 1 line of code
  3. A request class in a package that matches an expected URI (myproject.wsgi.admin.foo ==  /admin/foo)

Speed?

With no database calls, it’s pushing over 2,000 requests per second on a dev server.  With a couple PostgreSQL calls, it is pushing out ~ 800 per second.

AppStruct.Database.PostgreSQL

This is not nearly as deep as the WSGI side of AppStruct, but still really cool.  To start off, I’d like to say that James William Pye has created an amazing Postgresql connection library in Python 3.  I mean just amazing.  In fact, almost so amazing that I didn’t want to change it (but then again…)

What we did here was subclass the Connection, PreparedStatement, and Row classes.  (well, actually replaced the Row class).

Once these were subclassed, we simply added a couple useful features.  Keep in mind that all of the great functionality of the underlying library is retained.

Connection.CachePrepare(SQL)
Connection._PS_Cache

Simply a dictionary of {SQL: PreparedStatementObject} that resides on the connection object.  When you directly or indirectly invoke CachePrepare, it says “if this SQL is in the cache, return the associated object.  Otherwise, prepare it, cache it, and return it”.  This approach really simplifies the storing of prepared statement objects in a multi-threaded environment, where there is really no good place to store them (and be thread safe, connection-failure safe, etc…)

Connection.Value(SQL, *args, **kwargs)
Connection.Row(SQL, *args, **kwargs)

These simple functions take (SQL, *args, **kwargs) and return either a single value or a single row.  They will raise an exception is != 1 row is found.  They make use of CachePrepare(), so you get the performance benefits of prepared statements without the hassle.  More on *args, **kwargs later under PrePrepare()

Connection.ValueList(SQL, *args, **kwargs)
Connection.RowList(SQL, *args, **kwargs)

Same as above, except returns an iterator (or list, not sure) of  zero or more values or rows.

Row class

Simply a dictionary that also supports attribute style access of items.  After evaluating the Tuple that behaves like a mapping, I decided for our needs, a simple dict would be a better representation of a row.

Connection.PrePreprepare(SQL, args, kwargs) -> (SQL, args)

Ok, so what’s the big deal?  Well, the big deal is that we don’t like positional parameters.  They are confusing to write, read, analyze, and see what the heck is going on when you get about 30 fields in an insert statement.  Feel free to argue, but maybe I’m not as young as I once was.

We do like keyword parameters.  But postgresql prepared statement api uses numeric positional parameters.  Not changing that…

The most simple use case is to pass SQL and keyword arguments.

"SELECT a,b,c FROM table WHERE id = $id AND name = $name"
dict(id=100, name='joe')

It returns

"SELECT a,b,c FROM table WHERE id = $1 AND name = $2"
(100, "joe")

Which is suitable for passing directly into the Connection.prepare() method that came with the underlying library.

I don’t know about you, but we find this to be very, very useful.

If you pass tuples of (field, value) as positional arguments, then they will replace [Field][Value], and [Field=Value] (in the SQL) with lists of fields, lists of values (eg $1, $2), or lists of field=values (eg name=$1, age=$2).  That really takes the verbosity out of INSERT and UPDATE statements with long field lists.

Conclusion

This is just in the early stages, and has a good deal of polishing to be done (especially on the WSGI side).  My purpose here was to introduce you to what you can expect to get with AppStruct, some of the rationale behind it, and that it’s really not that hard to take the bull by the horns and make software do what you want it to (especially if you use python).

Feel free to comment.

Python 3.1 and mod_wsgi performance notes

We’re researching the use of Python and mod_wsgi running under apache for developing some extensive web applications.  Here are some notes on a performance test that we recently ran.
==================================================================
Server:

x86_64
Python 3.1.1
mod_wsgi 3.0c5
apache 2.2
RHEL 5.3
quad core xenon
8 GB ram

Development system – not in production use.

==================================================================
Application:

1 import time
2
3 def application(environ, start_response):
4     status = ‘200 OK’
5
6     output = “hello world!”
7
8     #time.sleep(1)
9
10     response_headers = [
11         (‘Content-type’, ‘text/plain’),
12         (‘Content-Length’, str(len(output))),
13         ]
14
15     start_response(status, response_headers)
16
17     return [output]

==================================================================
Apache Configuration:

WSGISocketPrefix run/wsgi
<VirtualHost *>
ServerName shankproject.jason.star.ionzoft.net
DocumentRoot /home/jason/Code/ShankProject/Web
WSGIScriptAlias /Admin /home/jason/Code/ShankProject/WSGI/
Admin.wsgi
WSGIDaemonProcess shankproject.jason.star.ionzoft.net threads=15
WSGIProcessGroup shankproject.jason.star.ionzoft.net
</VirtualHost>

==================================================================
Tests:

—————————————————–
# Baseline with one process and 15 threads
# 15 threads total

threads=15
no process definition

WITHOUT time.sleep(1)
concurrency = 1  >> 1800 / second
concurrency = 100 >> 3900 / second

WITH time.sleep(1)
concurrency = 1  >> 1 / second
concurrency = 100  >> 14 / second

—————————————————–
# Get a marginal improvement by doubling the threads to 30
# 30 threads total

threads=30
no process definition

WITHOUT time.sleep(1)
concurrency = 1  >> 1680 / second
concurrency = 100 >> 3500 / second

WITH time.sleep(1)
concurrency = 1  >> 1 / second
concurrency = 100  >> 30 / second

—————————————————–
# Take processes from 1 to 3
# 90 threads total

threads=30
processes=3

WITHOUT time.sleep(1)
concurrency = 1  >> 1770 / second
concurrency = 100 >> 3500 / second

WITH time.sleep(1)
concurrency = 1  >> 1 / second
concurrency = 100  >> 88 / second

—————————————————–
# Take processes from 3 to 6
# Take threads from 30 to 15
# 90 threads total

threads=30
processes=3

WITHOUT time.sleep(1)
concurrency = 1  >> 1550 / second
concurrency = 100 >> 3300 / second

WITH time.sleep(1)
concurrency = 1  >> 1 / second
concurrency = 100  >> 88 / second

==================================================================
Conclusion:

mod_wsgi performance is outstanding.  Even running slower requests, it
can still handle significant concurrency in daemon mode without any
apparent issues.

Questions:
Is there any information on the balance between more processes less
threads and more threads less processes?

Thanks!

What is a tuple?

I get asked this from time to time, so for the fun of it, I’ll post this:

Tuples are immutable lists in python (and other languages).  Very lightweight, unchangable arrays.  They are used for storing multiple values in one variable (usually a fixed number of values).

x = (1, 2, 3, 'abc')
x[0]
-> 1
x[1] 
-> 2
x[2] 
-> 3

x[3] 
-> abc
x["bob"]
-> TypeError: tuple indices must be integers, not str

Tuples are immutable lists in python (and other languages).  Very lightweight, unchangable arrays.
x = (1,2,3)
x[0] == 1
x[1] == 2
x[2] == 3
x[“bob”] == “tuple indices must be integers”

A common example is storing x,y coordinances.  You could use a tuple for that:

point = (12,38)

Python can unpack tuples also.

x,y = (12,38)
x
-> 12
y
-> 38

Enjoy.

Freeky Bug

Ever have one of those bugs that customers complain about, but you just cannot reproduce it? Here is a good one…

Customers were complaining about being logged out when clicking a download link.

This particular setup is a Cisco CSS 11501 series load balancer with 2 Dell Poweredge web servers sitting behind it.  Each webserver is running apache, as well as an application server (python) which handles authentication and processing for THAT server.

For weeks, I could not reproduce this bug.  So tonight when I finally got bit by it (at home), I was clueless for a while.  The code is so simple.  A simple key lookup in a simple dictionary, yet it just was not making sense.

Here is the story:

A while ago, we were having problems with Internet Explorer downloading content over SSL.  This turns out to be a common problem with IE, so to fix it, I caused the downloads to not use SSL, which is more efficient anyway.

We use a cisco hardware load balancer which balances incoming requests to different backend servers.  It has a feature called STICKY SOURCE IP, which means that any connections routed from the same IP to the same site will be delivered to the same backend server.  This is nice, because you are always visiting the same server.

So as it turns out, by turning the download SSL off, the load balancer was using another “site” definition to handle the DOWNLOAD request.  STICKY SOURCE IP was out the window, and the request was being passed back to a “random” webserver.

About 50% of the time, users (like me tonight) were tossed to the other server, which knew nothing about the user login. That is why it was complaining about the “WB4_App::$DSEG and/or WB4_App::$AuthToken must be set in order to contact the     applications server.” error message, which is not one that should normally be shown.

To make matters worse, our IP address at work was apparently always using the same server, so I could not reproduce the problem.  I’m lucky that it happened to me at home, or I would still be banging my head against the desk…