How to install a Trusted Certificate Authority on Windows 7

At my company AppCove, we have our own certificate authority that we use with development servers and sites.  This allows us to (at no additional cost) use HTTPS and SSL for all of these alternate domains and subdomains.

The downside is that our certificate is not trusted by any stock browser or operating system.

Therefore, to prevent getting an ugly and scary SSL warning, anyone who needs to visit these (private audience) sites must first “trust” our certificate authority.

A note on security.  If you are telling your computer to trust a certificate authority, then you must really actually “trust” that authority.  If the signing key fell into the wrong hands, then they could create fake certificates for other sites you visit, like, and intercept your data.  At AppCove, we use aggressive security measures to protect the certificate authority key (as we do for customer data and applications).

In this example, I am causing my Windows 7 workstation to trust appcove-ca-cert.pem.crt















— Start of slight detour — 

If you want to verify it was installed, do this.  Otherwise, skip the next 2 screens.



— End of slight detour —



At this point, you should be able to visit any HTTPS site that was signed with this certificate authority and your browser will indicate that it is a secure connection.

Introducing FileStruct (for Python)

FileStruct is a lightweight and fast file-cache / file-server designed for web-applications.  It solves the problems of “where do I save all of those uploads” that has been encountered time and time again.  FileStruct uses the local filesystem, but in a sensible way (keeping permissions sane), and with the ability to secure it to a reasonable level.

Here is a simple example of taking an image upload, resizing, and saving it:

with client.TempDir() as TempDir:
   open(TempDir.FilePath('upload.jpg'), 'wb').write(mydata)
   TempDir.ResizeImage('upload.jpg', 'resize.jpg', '100x100')
   hash1 = TempDir.Save('upload.jpg')
   hash2 = TempDir.Save('resize.jpg')

Design Goals

Immutable Files

FileStruct is designed to work with files represented by the SHA-1 hash of their contents. This means that all files in FileStruct are immutable.

High Performance

FileStruct is designed as a local repository of file data accessable (read/write) by an application or web application. All operations are local I/O operations and therefore, very fast.

Where possible, streaming hash functions are used to prevent iterating over a file twice.

Direct serving from Nginx

FileStruct is designed so that Nginx can serve files directly from it’s Data directory using an X-Accel-Redirect header. For more information on this Nginx configuration directive, see

Assuming that nginx runs under nginx user and file database is owned by the fileserver group, nginx needs to be in thefileserver group to serve files:

# usermod -a -G fileserver nginx


FileStruct is designed to be as secure as your hosting configuration. Where possible, a dedicated user should be allocated to read/write to FileStruct, and the database directory restricted to this user.


FileStruct is designed to be incredibly simple to use.

File Manipulaion

FileStruct is designed to simplify common operations on files, especially uploaded files. Image resizing for thumbnails is supported.

Temporary File Management

FileStruct is designed to simplify the use of Temp Files in an application. The API supports creation of a temporary directory, placing files in it, Ingesting files into FileStruct, and deleting the directory when completed (or retaining it in the event of an error)

Garbage Collection

FileStruct is designed to retain files until garbage collection is performed. Garbage collection consists of telling FileStruct what files you are interested in keeping, and having it move the remaining files to the trash.

Backup and Sync with Rsync

FileStruct is designed to work seamlessly with rsync for backups and restores.

Atomic operations

At the point a file is inserted or removed from FileStruct, it is a filesystem move operation. This means that under no circumstances will a file exist in FileStruct that has contents that do not match the name of the file.

No MetaData

FileStruct is not designed to store MetaData. It is designed to store file content. There may be several “files” which refer to the same content. empty.logempty.txt, and empty.ini may all refer to the empty fileData/da/39/da39a3ee5e6b4b0d3255bfef95601890afd80709. However, this file will be retained as long as any aspect of the application still uses it.

Automatic De-Duplication

Because file content is stored in files with the hash of the content, automatic file-level de-duplication occurs. When a file is pushed to FileStruct that already exists, there is no need to write it again.

This carries the distinct benifit of being able to use the same FileStruct database across multiple projects if desired, because the content of file Data/da/39/da39a3ee5e6b4b0d3255bfef95601890afd80709 is always the same, regardless of the application that placed it there.

Note: In the event that multiple instances or applications use the same database, the garbage collection routine MUST take all references to a given hash into account, across all applications that use the database. Otherwise, it would be easy to delete data that should be retained.

nginx + apache + mod_wsgi + python: how to make dynamic pages expire

When writing dynamic web applications, we use nginx as a front-end web server and apache+mod_wsgi as an application server.

It is the job of nginx to:

  1. Handle SSL, and domain-level rewriting/redirects
  2. Handle static content (.jpeg, .png, .css, .js, .txt, .ico, .pdf, etc….)
  3. Handle dynamic downloads through X-Accel-Redirect
  4. Proxy other requests to apache
  5. Set the proper cache-control and expires headers on content

Ever run into the situation where you click log out, and then click the back button, and are still able to see the pages!  That is bad.   They are dynamic pages anyway, and should not be cached.

However, images, etc… SHOULD be cached. It is important that any references to images have a way to invalidate the cache. We append a number as a query string:


This number is updated from time to time (via Python variable) when we need to invalidate the cache.

Anyway, here are some helpful nginx configuration directives.

# Send static requests directly back to the client
location ~ \.(gif|jpg|png|ico|xml|html|css|js|txt|pdf)$
    root  /path/to/document/root;
    expires max;

# Send the rest to apache
location /
    add_header Cache-Control 'no-cache, no-store, max-age=0, must-revalidate';
    add_header Expires 'Thu, 01 Jan 1970 00:00:01 GMT';

Why you should consider using the IUS Community Project


“The IUS Community Project is aimed at providing up to date and regularly maintained RPM packages for the latest upstream versions of PHP, Python, MySQL and other common software specifically for Redhat Enterprise Linux. IUS can be thought of as a better way to upgrade RHEL, when you need to.”

Our Perspective at AppCove

Imagine being able to combine the rock-solid stability of RedHat Enterprise Linux (or Oracle, Centos, Scientific) with the latest versions of popular software packages like PHP, Python, MySQL, mod_wsgi, redis, and others? The IUS Community Project is the answer.

Enterprise Linux is great for the stability, security, and compatibility. But sometimes you need a newer version of an installed package, like Python. At the time of this writing, RedHat is still not providing any standard way to obtain Python 3.2, MySQL 5.5, or PHP 5.4, years after they have been released.

The IUS Community project has provided AppCove, Inc. and all of our clients the perfect mix of stability and functionality. IUS has enabled us to focus on our core competencies (software development) while being confident that the packages we use are as secure and up-to-date as possible.

Our confidence in the IUS team is second to none. AppCove has worked in close conjunction with the IUS team on several occasions, and they have always been impeccably experienced, knowledgeable, and professional.

We highly recommend that any users of RedHat Enterprise Linux, Oracle Enterprise Linux, Scientific Linux, or Centos Linux take a close look at the IUS Community Project for their servers.

A brief introduction to AppStruct

Have been very busy at work lately.  We made the decision about a month ago to switch (most|all) new projects over to use Python 3 with Apache, mod_wsgi, and AppStruct.  You may know what the first 3 are, but the 4th??

Special thanks goes to Graham Dumpleton behind mod_wsgi, and James William Pye behind Python>>Postgresql.   They are not involved or affiliated with AppCove or AppStruct (aside from great mailing list support) BUT if it were not for them, this framework would not exist.

AppStruct is a component of Shank, a meta-framework.  A stand-alone component in it’s own right, it represents the AppCove approach to web-application development.  Most of it is in planning, but the parts that have materialized are really, really cool.

Briefly, I’ll cover the two emerging areas of interest:


This is a very pythonic (in my opinion) web application framework targeted toward Python 3.1 (a challenge in itself at this point).  We really wanted to base new development on Python 3.1+, as well as PostgreSQL using the excellent Python3/PostgreSQL library at  However, none of the popular frameworks that I am aware of support Python 3, and most (if not all) of them have a lot of baggage I do not want to bring to the party.

Werkzeug was the most promising, but alas, I could not find Python 3 support for it either.  In fact, I intend to utilize a good bit of code from Werkzeug in finishing off AppStruct.WSGI.  (Don’t you just love good OSS licences?  AppStruct will be released under one of those also).

HTTP is just not that complicated.  It dosen’t need bundled up into n layers of indecipherable framework upon framework layers.   It doesn’t need abstracted to death.  I just needs to be streamlined a bit (with regard to request routing, reading headers, etc…).

Python is an amazing OO language.  It’s object model (or data model) is one of (if not the) most well conceived of any similar language, ever.  I want to use that to our advantage…

Inheritance, including multiple inheritance, has very simple rules in Python.  Want to use that as well.

Wish to provide developers with access to the low level guts they need for 2% of the requests, but not make them do extra work for the 98% of requests.

Speed is of the essence.  Servers are not cheap, and if you can increase your throughput by 5x, then that’s a lot less servers you need to pay for.

So, how does it work?

Well, those details can wait for another post.  But at this point the library is < 1000 lines of code, and does a lot of interesting things.

  • A fully compliant WSGI application object
  • 1 to 1 request routing to Python packages/modules/classes
  • All request classes derived from AppStruct.WSGI.Request

The application maps requests like /some/path/here to objects, like  If there is a trailing slash, then the application assumes that the class is named Index.  The object that is found is verified to be a subclass of AppStruct.WSGI.Request, and then…

Wait!  What about security?

Yes, yes, very important.  A couple things to point out.  First, the URLs are passed through a regular expression that ensures that they adhere to only the characters that may be used in valid python identifiers (not starting with _), delimited by “/”.  Second, the import and attribute lookup verify that any object (eg class) found is a subclass of the right thing.  And so on and so forth…

But you may say “wait, what about my/fancy-shmancy/urls/that-i-am-used-to-seeing?  Ever hear of mod_rewrite?  Yep.  Not trying to re-invent the wheel.  Use apache for what it was made for, not just a dumb request handler.

What about these request objects?

They are quite straightforward.  There are several attributes which represent request data, the environment, query string variables, post variables, and more.  There is a .Response attribute which maps to a very lightweight response object.

Speaking of it — it has only 4 attributes: [Status, Header, Iterator, Length].  As you see, it’s pretty low-level-wsgi-stuff.  But the developer would rarely interact with it, other than to call a method like .Response.Redirect(‘;, 302)

Once the application finds the class responsible for handling the URL, it simply does this (sans exception catching code):

RequestObject = Request(...)
with RequestObject:
return RequestObject.Response

Wow, that’s simple.  Let me point out one more detail.  The default implementation of Data() does this:

def Data(self):
   self.Response.Iterator = [self.Text()]
   self.Response.Length = len(self.Response.Iterator[0])

So the only thing really required of this Request class is to override Text() and return some text?  Yep, that simple.

But typically, you would at some point mixin a template class that would do something like this:

class GoodLookingLayout:
   def Text(self):
      return ( 
         """<html><head>...</head><body><menu>""" +
         self.Menu() +
         """</menu><div>""" +
         self.Body() +

And then it would be up to the developer to override Menu and Body (each returning the appropriate content for the page).

Ohh, you may say.  What about templating engine X?  Well, it didn’t support Python 3, and I probabally didn’t want it anyway (for 9 reasons)…  If it’s really, really good and fits this structure, drop me a line, please.

What about that Code() method?

Yeah, that’s the place that any “logic” of UI interaction should go.  I’m not advocating mixing logic and content here, but you could do that if you wanted.  You will find in our code, a seperate package for application business logic and data access that the request classes will call upon.  But again, if you are writing a one page wonder, why go to all the trouble?

The only requirement for the Code() method is that it calls super().Code() at the top.  Since the idea is that the class .Foo.Bar.Baz.Index will inherit from the class .Foo.Bar.Index, this gives you a very flexible point of creating .htaccess style initialization/access-control code in one place.  So in /Admin/Index, you could put a bit of code in Code() which ensures the user is logged in.  This code will be run by all sub-pages, therefore ensuring that access control is maintained.  Relative imports are important for this task.

from .. import Index as PARENT
from AppStruct.WSGI.Util import *
class Index(PARENT):
   def Code(self):
      self.Name = self.Post.FirstName + " " + self.Post.LastName
      # some other init stuff
   def Body(self):
      return "Hello there " + HS(self.Name) + "!"

Summary of AppStruct.WSGI…

To get up and running with a web-app using this library:

  1. A couple mod_wsgi lines in httpd.conf
  2. A .wsgi file that has 1 import and 1 line of code
  3. A request class in a package that matches an expected URI ( ==  /admin/foo)


With no database calls, it’s pushing over 2,000 requests per second on a dev server.  With a couple PostgreSQL calls, it is pushing out ~ 800 per second.


This is not nearly as deep as the WSGI side of AppStruct, but still really cool.  To start off, I’d like to say that James William Pye has created an amazing Postgresql connection library in Python 3.  I mean just amazing.  In fact, almost so amazing that I didn’t want to change it (but then again…)

What we did here was subclass the Connection, PreparedStatement, and Row classes.  (well, actually replaced the Row class).

Once these were subclassed, we simply added a couple useful features.  Keep in mind that all of the great functionality of the underlying library is retained.


Simply a dictionary of {SQL: PreparedStatementObject} that resides on the connection object.  When you directly or indirectly invoke CachePrepare, it says “if this SQL is in the cache, return the associated object.  Otherwise, prepare it, cache it, and return it”.  This approach really simplifies the storing of prepared statement objects in a multi-threaded environment, where there is really no good place to store them (and be thread safe, connection-failure safe, etc…)

Connection.Value(SQL, *args, **kwargs)
Connection.Row(SQL, *args, **kwargs)

These simple functions take (SQL, *args, **kwargs) and return either a single value or a single row.  They will raise an exception is != 1 row is found.  They make use of CachePrepare(), so you get the performance benefits of prepared statements without the hassle.  More on *args, **kwargs later under PrePrepare()

Connection.ValueList(SQL, *args, **kwargs)
Connection.RowList(SQL, *args, **kwargs)

Same as above, except returns an iterator (or list, not sure) of  zero or more values or rows.

Row class

Simply a dictionary that also supports attribute style access of items.  After evaluating the Tuple that behaves like a mapping, I decided for our needs, a simple dict would be a better representation of a row.

Connection.PrePreprepare(SQL, args, kwargs) -> (SQL, args)

Ok, so what’s the big deal?  Well, the big deal is that we don’t like positional parameters.  They are confusing to write, read, analyze, and see what the heck is going on when you get about 30 fields in an insert statement.  Feel free to argue, but maybe I’m not as young as I once was.

We do like keyword parameters.  But postgresql prepared statement api uses numeric positional parameters.  Not changing that…

The most simple use case is to pass SQL and keyword arguments.

"SELECT a,b,c FROM table WHERE id = $id AND name = $name"
dict(id=100, name='joe')

It returns

"SELECT a,b,c FROM table WHERE id = $1 AND name = $2"
(100, "joe")

Which is suitable for passing directly into the Connection.prepare() method that came with the underlying library.

I don’t know about you, but we find this to be very, very useful.

If you pass tuples of (field, value) as positional arguments, then they will replace [Field][Value], and [Field=Value] (in the SQL) with lists of fields, lists of values (eg $1, $2), or lists of field=values (eg name=$1, age=$2).  That really takes the verbosity out of INSERT and UPDATE statements with long field lists.


This is just in the early stages, and has a good deal of polishing to be done (especially on the WSGI side).  My purpose here was to introduce you to what you can expect to get with AppStruct, some of the rationale behind it, and that it’s really not that hard to take the bull by the horns and make software do what you want it to (especially if you use python).

Feel free to comment.

Python 3.1 and mod_wsgi performance notes

We’re researching the use of Python and mod_wsgi running under apache for developing some extensive web applications.  Here are some notes on a performance test that we recently ran.

Python 3.1.1
mod_wsgi 3.0c5
apache 2.2
RHEL 5.3
quad core xenon
8 GB ram

Development system – not in production use.


1 import time
3 def application(environ, start_response):
4     status = ‘200 OK’
6     output = “hello world!”
8     #time.sleep(1)
10     response_headers = [
11         (‘Content-type’, ‘text/plain’),
12         (‘Content-Length’, str(len(output))),
13         ]
15     start_response(status, response_headers)
17     return [output]

Apache Configuration:

WSGISocketPrefix run/wsgi
<VirtualHost *>
DocumentRoot /home/jason/Code/ShankProject/Web
WSGIScriptAlias /Admin /home/jason/Code/ShankProject/WSGI/
WSGIDaemonProcess threads=15


# Baseline with one process and 15 threads
# 15 threads total

no process definition

WITHOUT time.sleep(1)
concurrency = 1  >> 1800 / second
concurrency = 100 >> 3900 / second

WITH time.sleep(1)
concurrency = 1  >> 1 / second
concurrency = 100  >> 14 / second

# Get a marginal improvement by doubling the threads to 30
# 30 threads total

no process definition

WITHOUT time.sleep(1)
concurrency = 1  >> 1680 / second
concurrency = 100 >> 3500 / second

WITH time.sleep(1)
concurrency = 1  >> 1 / second
concurrency = 100  >> 30 / second

# Take processes from 1 to 3
# 90 threads total


WITHOUT time.sleep(1)
concurrency = 1  >> 1770 / second
concurrency = 100 >> 3500 / second

WITH time.sleep(1)
concurrency = 1  >> 1 / second
concurrency = 100  >> 88 / second

# Take processes from 3 to 6
# Take threads from 30 to 15
# 90 threads total


WITHOUT time.sleep(1)
concurrency = 1  >> 1550 / second
concurrency = 100 >> 3300 / second

WITH time.sleep(1)
concurrency = 1  >> 1 / second
concurrency = 100  >> 88 / second


mod_wsgi performance is outstanding.  Even running slower requests, it
can still handle significant concurrency in daemon mode without any
apparent issues.

Is there any information on the balance between more processes less
threads and more threads less processes?


Freeky Bug

Ever have one of those bugs that customers complain about, but you just cannot reproduce it? Here is a good one…

Customers were complaining about being logged out when clicking a download link.

This particular setup is a Cisco CSS 11501 series load balancer with 2 Dell Poweredge web servers sitting behind it.  Each webserver is running apache, as well as an application server (python) which handles authentication and processing for THAT server.

For weeks, I could not reproduce this bug.  So tonight when I finally got bit by it (at home), I was clueless for a while.  The code is so simple.  A simple key lookup in a simple dictionary, yet it just was not making sense.

Here is the story:

A while ago, we were having problems with Internet Explorer downloading content over SSL.  This turns out to be a common problem with IE, so to fix it, I caused the downloads to not use SSL, which is more efficient anyway.

We use a cisco hardware load balancer which balances incoming requests to different backend servers.  It has a feature called STICKY SOURCE IP, which means that any connections routed from the same IP to the same site will be delivered to the same backend server.  This is nice, because you are always visiting the same server.

So as it turns out, by turning the download SSL off, the load balancer was using another “site” definition to handle the DOWNLOAD request.  STICKY SOURCE IP was out the window, and the request was being passed back to a “random” webserver.

About 50% of the time, users (like me tonight) were tossed to the other server, which knew nothing about the user login. That is why it was complaining about the “WB4_App::$DSEG and/or WB4_App::$AuthToken must be set in order to contact the     applications server.” error message, which is not one that should normally be shown.

To make matters worse, our IP address at work was apparently always using the same server, so I could not reproduce the problem.  I’m lucky that it happened to me at home, or I would still be banging my head against the desk…

Updating a cert on the Cisco 11500 Series Content Services Switches (CSS)

Having recently moved some of our hosting infrastructure to the excellent Rackspace Platform group, we inherited the management of the Cisco 11500 Series Content Services Switches (CSS), which we use for general load balancing + ssl termination.

As a side note, it’s really powerful, fast, and well, plain nice.  Not having to manage SSL certs on each apache instance is really nice, and all the LAN communication is done over plain old HTTP.

This blog post is a regurgitation of some notes I took internally.  Perhaps someone who finds themselves managing this device will benefit…

The task at hand was re-issuing and updating one of our primary wildcard certificates that powers a lot of subdomains.

The first step is to generate the key, csr, and crt…

All these files should be:

  • Named the same as the domain that SSL is being generated for.
  • use WILD for a wildcard subdomain
  • Use this format “”, where 08 is the from year and 10 is the to year
  • (the short version is because of name length limits on the CSS)

Start by generating the key and csr

This should be done in the ciscoftp role under the ~/load directory

# openssl genrsa -out 1024
# openssl req -new -key -out

Then get the certificate issued by (global sign)

Put the certificate into the the ~/load directory.  When done, it should look like:

-rw-rw-r-- 1 ciscoftp ciscoftp  3139 Apr  6 15:59
-rw-rw-r-- 1 ciscoftp ciscoftp   773 Apr  6 15:49
-rw-rw-r-- 1 ciscoftp ciscoftp   883 Apr  6 15:47

Put the crt and key onto the load balancer

To do this, use the “copy command” on the load balancer

20132-201292# copy ssl ftp base import PEM "rack"
20132-201292# copy ssl ftp base import PEM "rack"

Then make the associations...

20132-201292# config
20132-201292(config)# ssl associate cert 
20132-201292(config)# ssl associate cert

Now, it’s time to install it.  Requires SSL downtime!

  1. Suspend the SSL content rule
  2. Suspend the SSL service
  3. Suspend the SSL proxy list
  4. Run the updates
  5. Activate the SSL proxy list
  6. Activate the SSL service
  7. Activate the SSL content rule

Here are the exact commands:

20132-201292# config
20132-201292(config)# owner
20132-201292(config-owner[])# content
20132-201292(config-owner-content[])# suspend

20132-201292# config
20132-201292(config)# service ssl-service
20132-201292(config-service[ssl-service])# suspend

20132-201292# config
20132-201292(config)# ssl-proxy-list ssl-proxy

In the following commands, we remove the whole ssl-server so that it shows up at the bottom in one concise unit. Otherwise, the startup-config and running-config become fragmented.

20132-201292(config-ssl-proxy-list[ssl-proxy])# suspend
20132-201292(config-ssl-proxy-list[ssl-proxy])# no ssl-server 6
20132-201292(config-ssl-proxy-list[ssl-proxy])# ssl-server 6
20132-201292(config-ssl-proxy-list[ssl-proxy])# ssl-server 6 rsakey
20132-201292(config-ssl-proxy-list[ssl-proxy])# ssl-server 6 rsacert
20132-201292(config-ssl-proxy-list[ssl-proxy])# ssl-server 6 vip address
20132-201292(config-ssl-proxy-list[ssl-proxy])# ssl-server 6 cipher rsa-with-rc4-128-sha 81
20132-201292(config-ssl-proxy-list[ssl-proxy])# active

20132-201292# config
20132-201292(config)# service ssl-service
20132-201292(config-service[ssl-service])# active

20132-201292# config
20132-201292(config)# owner
20132-201292(config-owner[])# content
20132-201292(config-owner-content[])# active

Test test test.  Firefox, IE, Chrome...

20132-201292# copy running-config ftp base running-config

Review changes with git diff

20132-201292# write memory

20132-201292# copy startup-config ftp base startup-config

And… Here is the git diff

diff --git a/load/startup-config b/load/startup-config
index 7042490..36fbbaa 100644
--- a/load/startup-config
+++ b/load/startup-config
@@ -1,4 +1,4 @@
-!Generated on 04/06/2009 16:05:48
+!Generated on 04/06/2009 21:51:02
!Active version: sg0810205

@@ -64,6 +64,8 @@ configure
+  ssl associate rsakey
+  ssl associate cert

!*********************** SSL PROXY LIST ***********************
ssl-proxy-list ssl-proxy
-  ssl-server 6
-  ssl-server 6 rsakey
-  ssl-server 6 rsacert
-  ssl-server 6 vip address
-  ssl-server 6 cipher rsa-with-rc4-128-sha 81
@@ -146,6 +141,11 @@ ssl-proxy-list ssl-proxy
+  ssl-server 6
+  ssl-server 6 rsakey
+  ssl-server 6 rsacert
+  ssl-server 6 vip address
+  ssl-server 6 cipher rsa-with-rc4-128-sha 81

I highly recommend yum + createrepo + rpmbuild

As I was discussing lightly before, I have recently been involved in building quite a few RPMs for our server clusters at AppCove.

Where we have arrived:

Our (new) primary production cluster consists of multiple RedHat Enterprise Linux 5 boxes in different capacities (webserver, appserver, database master, database slave, etc…).

Each machine is registered with 3 yum repositories:

  1. RHEL (RedHat Enterprise Linux)
  2. EPEL (Extra Packages for Enterprise Linux)
  3. ACN (AppCove Network)

All of our custom software packages and custom builds of open source software are placed into individual RPMs, and entered into our ACN repository.

From there, it is a snap to update any given server with the correct version of the software that server needs.

We have a dedicated build area, versioned with git, that is used to build and package all of the custom software that is needed.

(note, RPMs are not used for web application deployment — rsync via ssh is used for that)


Having worked through the process from start to finish, I must say that I would highly recommend the following tools to anyone who is responsible for RedHat Enterprise, Centos, or Fedora system administration.

  • git – to keep your .spec files versioned
  • rpmbuild – to build the rpms
  • createrepo – to create your very own yum repository
  • apache – to serve the yum repository
  • yum – to obtain, install, and upgrade your rpms

Additionally, if you are using RedHat Enterprise or Centos, I would highly recommend using Extra Packages for Enterprise Linux (EPEL) to get a few of those “other” packages that don’t come with your OS (git, for example).

Learning how to build RPMs was a fairly steep curve.  But it wasn’t long.  It is one of those things that if you know it you say “that’s easy” and if you don’t you say “what the ???

yum+rpm was invented (I assume) to make life easier for countless system administrators and software publishers.  So it’s not the kind of thing that everyone is involved in.

I was a bit tough to figure out the caveats of how to correctly build RPM’s that work.  The documentation is a bit sparse.  A bit here and a bit there.

What are the benefits?

Many.  Let me list a few.

Your system stays really clean. With RPMs, you can uninstall everything you installed without leaving extra files laying around.

Upgrades are a snap. Once you have registered your own yum repository on a system, you can upgrade a given package by running:

yum upgrade your-package

All your systems can be on the same “page”. It is very easy, using yum, to ensure that all of your systems are using the exact same version of software.

Custom builds are super easy to maintain. We custom-compile php, python, and various other software.  Once the .spec files are in place, all of your software can be re-packaged with a single command.

In our specific case, we wanted to have the memcached client statically compiled into PHP.  With a few extra commands in the .spec file, it was a snap to pull in the source from pecl, and update `configure` to take it into account.

All builds can take place in one place. With one set of documentation, one consistent set of development tools, etc…  We have a user called `build` on one of the hosts that is specifically used for building all of the RPMs.

Where to learn?

The best way to learn, as usual, is to jump in and figure it out.   There is some really good documentation buried in the site.   It is a book called Maximum RPM, origninally published by redhat.  The current snapshot of the book is available online.

Google is another good resource, depending on what it is you are looking for.

Installing Source RPMs to your home directory

I’ve been involved in an ongoing project to build RPMs for all of the “custom” software installs we use on RedHat Enterprise Linux 5 (RHEL5) at AppCove.

By default (on RHEL), source RPMs are installed to /usr/src/redhat. This is nice, except that I don’t want to be running as root when building software.

rpm -i --relocate /usr/src/redhat=/home/build/RPMBUILD setuptools-0.6c9-1.src.rpm

The previous command will install the specified source rpm to a local directory under the “build” user.  That makes it easy to tweak the .spec file, and then build the desired RPM.