Amazon’s upcoming Android tablet will eclipse the iPhone App Store.

August 27, 2011

This holiday season Amazon will change the mobile industry in a major way. I don’t have inside info, but aggregating the public knowledge produces a conclusive prediction: a $200 Android tablet with a book reader, an app store, and free 3G+ wireless internet. A “Super-Kindle”, if I may call it that. Of course the internet is not going to be entirely free; rather, it will be subsidized by the content providers and the app developers.

Already you can buy a book on the current Kindle device and have it delivered at no extra cost, with no prior data contract required, because Amazon has negotiated the connectivity contract with cell carriers on your behalf and bundled the amortized connectivity cost into the price of the book. The difference from a smartphone data plan is subtle but psychologically very important – with Kindle a user only has to pay when and if he obtains some tangible value in return, a book he liked enough to shell out the money for; with a smartphone data plan a user pays today and gets some value out of it sometime next week, or not at all. There is a huge number of people who are unable to overcome this “pay now, get something later or never” gap, and almost everyone else overcomes it with hesitation and resentment.

Now, there is no reason not to apply the same model to mobile apps. The user will pay nothing upfront for the internet connection, but he will continue to pay his yearly dues for the “Remember the Milk” to-do app. The app developer will in turn pay Amazon for his share of the user’s traffic, and Amazon will pass the money on to the mobile carriers. That’s the pricing model required to get the remaining half of the US population on the mobile device bandwagon, and Amazon will deliver it.

Supporting evidence:

  1. The rumors about Amazon shipping their own Android tablet are widely circulated, including the FCC leaks and Jeff Bezos’ famous “Stay tuned” comment.
  2. The SDK for the current Kindle models is in a beta that it’s not likely to ever come out of, but the monetization thinking is entirely evident: http://www.amazon.com/kdk
  3. Amazon has their own Android app store. Why did they bother with making their own? Well, now it’s clear why they absolutely need one.
  4. Amazon Web Services has huge experience with making pay-as-you-go computing services. If anyone can do this, it’s them.
  5. Amazon already has a billing relationship with millions of Kindle users and many more regular shoppers. They don’t need to do a lot of convincing to get people started. And it’s one of the best audiences in the world – people who like paying for things. Similar to Apple users, and unlike Google users.
  6. Amazon thrives on low margins, and they can also use the content/app subsidy to drive hardware prices down to levels Apple can never reach due to its always-high profit margins.

The outcome described above seems very likely, and it will have far-reaching consequences:

  1. The users will be much more likely to adopt a mobile device, given the low entry cost and no upfront or ongoing commitment.
  2. The developers will work hard to reduce bandwidth usage (incidentally increasing speed), and thus the end-user cost. The current model promotes bandwidth waste, as no individual developer is ever held accountable for his data usage.
  3. Amazon has enough clout and a direct incentive to negotiate the best rates with the carriers. This sort of “collective bargaining on steroids” will drive down the rates.
  4. Amazon, not the carriers, will own the data pipe billing relationship with the end users. Apple wrested billing control of mobile app purchases from the carriers’ hands, and did a much better job of it; now Amazon will do the same with the data pipe.
  5. The developers of connected apps will have a perfect excuse to bill users on a recurring basis, and thus will have enough cash and incentive to actually support and update their apps. The users in turn will become accustomed to such recurring app pricing schemes, just as they became accustomed to making one-time purchases on the iPhone, which will then drive up the conversion rates. Note that recurring billing does not necessitate commitment – the user would only pay for the app if he’s using it.
  6. Consequently, the users will enjoy lasting, well-maintained apps.

This coming change is so profound, I would say it is as important to the mobile computing world as the emergence of the original App Store on the iPhone, if not more so.

And when the apps get taken care of, there is no reason not to try extending the same model to web browsing – a web site operator who wants to be seen on Super-Kindle can either charge the end user a quarter to read the article with the help of Amazon’s payment API, kick back some advertising revenue, or find some other way to pay for the connection. Unlike the app situation, the web sites idea is much less certain, but the potential is so huge it would be “Super Stupid” not to try.

The so-called economic recovery described in a single graph.

August 4, 2011

The “Baltic Dry” index reflects the cost of leasing a large cargo ship. When the economy does well everyone wants to ship cargo, demand is high, and ships are expensive. When the economy does poorly only a few people want to rent a ship, and the price falls. The chart below demonstrates everything that’s wrong with this economic recovery. Simply put, there isn’t one. The goods, they just aren’t moving.

Economic history of the last 5 years

Maze62: a dense and speedy alphanumeric encoding for binary data

May 16, 2011

Abstract

There is a need to encode binary data into a text format, such that it can be used without further encoding where only a subset of ASCII text is normally used: identifiers, URL parameters, HTTP form content, HTTP cookie values, and so on.

There are a number of existing solutions to this problem, but I believe there is room for improvement in all of them, and I propose a new encoding which I dub “Maze62”.

Existing solutions

Base64

Every three bytes (24 bits) of the input get regrouped into four 6-bit numbers, and each such number gets represented with one of the 62 alphanumerics and two other characters. Read more: http://en.wikipedia.org/wiki/Base64

Base16 (hex)

Each byte of the input is split into two 4-bit numbers, and each such number gets represented by one of the alphanumerics, typically constrained to the set of the hex digits [0-9a-f].

Base32

Every five bytes (40 bits) of input get split into a group of eight 5-bit numbers, and each such number is encoded with an alphanumeric.

Base10

The input byte array gets represented as a very large integer number, which is printed in decimal notation.

Base62

The input byte array gets represented as a very large integer number, which is then converted to base-62 representation, similar to Base10 just with a different base. The entire set of alphanumerics is used to represent the “digits”.
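A minimal Python sketch of this big-integer approach may make the idea concrete. The digit order (0-9, then A-Z, then a-z) and the function name are assumptions made for illustration; the description above does not fix either.

```python
import string

# Assumed digit order: 0-9, A-Z, a-z (62 characters in total).
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def base62_encode(data: bytes) -> str:
    """Treat the input as one big-endian integer and print it in base 62."""
    number = int.from_bytes(data, "big")
    digits = []
    while number:
        # Every divmod walks the whole remaining number, and the loop runs
        # once per output character, so the total work grows quadratically.
        number, remainder = divmod(number, 62)
        digits.append(ALPHABET[remainder])
    # In this naive form leading zero bytes of the input are not preserved.
    return "".join(reversed(digits)) or "0"
```

The loop divides the entire number once per output character, which is the source of the processing cost that sets this scheme apart from the fixed-width encodings above.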

Shortcomings of existing solutions

While Base64 is the usual choice, there are often problems with the two non-alphanumeric characters. What’s worse, when an encoding algorithm is chosen, it is not known which symbols will pose problems in any of the future places where the data is to be used. For example, a forward slash is difficult to use in a URL, and a dash is illegal as a first character in a DNS name. Ideally, this mess would be excluded from consideration, which can only be achieved by sticking to the 62 alphanumerics.

An ideal solution therefore is one that only uses alphanumerics, is as dense as possible, and allows for speedy processing. None of the aforementioned solutions satisfies all of these criteria. In particular:

Base64 is dense with only 33% explosion in size, but is using more than the 62 alphanumerics and thus is prone to mangling.

Base16 is easy to code, but its 100% explosion in size makes it undesirable.

Base32 is still a 60% explosion. Absent any better ideas this would be my weapon of choice.

Base62 is almost as dense as Base64, and is neatly contained in its character set, but unlike the other solutions discussed the cost of encoding is quadratic in the size of the input, which is clearly unacceptable.
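The size figures quoted above follow from how many bits one output character can carry. A short Python sketch of that arithmetic is shown here; the function name is made up for illustration, and padding characters and partial final groups are ignored.

```python
import math


def size_overhead(alphabet_size: int) -> float:
    """Fractional growth when 8-bit bytes are rewritten with characters
    that each carry log2(alphabet_size) bits."""
    bits_per_char = math.log2(alphabet_size)
    return 8 / bits_per_char - 1


for name, size in [("Base16", 16), ("Base32", 32), ("Base62", 62), ("Base64", 64)]:
    print(f"{name}: {size_overhead(size):.1%}")
```

By this estimate an ideal 62-character encoding costs roughly 34%, about one percentage point more than Base64.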

Proposed solution

If you would like to try to solve the problem yourself, read no further. Otherwise, the algorithm below defines Maze62 encoding and presents it for your attention:

The algorithm

  1. The input data is represented as a stream of bits.
  2. Six bits are read from the input stream into current value CV. CV is therefore 63 or less.
  3. CV is divided by 62, and the remainder of the division, a value that is certain to be less than 62, is represented with an alphanumeric and is written into the output stream of characters.
  4. In theory, we must also store the quotient, which is either one or zero. However, there is an important fact to note – whenever the remainder of the division by 62 is larger than one, the quotient cannot be anything but zero. Given that the remainder has already been emitted into the output stream, emitting the quotient would add no new information unless said remainder is either zero or one. Only in the latter case do we also need to preserve and emit the quotient, which is a single bit of data.
  5. Therefore, if the remainder is more than one, go to step 2.
  6. If the remainder is zero or one, memorize the one bit which represents the quotient.
  7. Read five bits from the input stream into CV.
  8. Shift CV by one bit to the left, and insert the quotient from step 6 as the least significant bit.
  9. Go to step 3

The algorithm terminates after no more data can be read from the input and the last piece of data is written out, including the potential stray quotient.
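The steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the digit order (0-9, A-Z, a-z), the zero-padding of a short final read, and the choice to write a stray quotient as one extra character are all guesses at details the description leaves open. The decoder is likewise an assumption, added only to show that the output can be reversed.

```python
import string

# Assumed digit order (0-9, A-Z, a-z); the description does not fix one.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def maze62_encode(data: bytes) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)
    pos, out, carry = 0, [], None  # carry: pending quotient bit, if any
    while True:
        if carry is None:
            if pos >= len(bits):
                break  # input exhausted, nothing pending
            cv = int(bits[pos:pos + 6].ljust(6, "0"), 2)  # step 2, zero-padded
            pos += 6
        elif pos >= len(bits):
            out.append(ALPHABET[carry])  # stray quotient as a final character
            break
        else:
            chunk = bits[pos:pos + 5].ljust(5, "0")  # step 7, zero-padded
            pos += 5
            cv = (int(chunk, 2) << 1) | carry  # step 8
        quotient, remainder = divmod(cv, 62)  # step 3
        out.append(ALPHABET[remainder])
        carry = quotient if remainder <= 1 else None  # steps 4 to 6
    return "".join(out)


def maze62_decode(text: str) -> bytes:
    values = [ALPHABET.index(ch) for ch in text]
    # The encoder above never ends on a remainder of 0 or 1, so a final
    # 0/1 character can only be the stray quotient.
    stray = values.pop() if values and values[-1] <= 1 else None
    cvs = [0] * len(values)
    next_low_bit = stray
    for i in range(len(values) - 1, -1, -1):
        # A 0/1 remainder had its quotient stored as the low bit of the next CV.
        quotient = next_low_bit if values[i] <= 1 else 0
        cvs[i] = values[i] + 62 * quotient
        next_low_bit = cvs[i] & 1
    bits = []
    for i, cv in enumerate(cvs):
        carried_in = i > 0 and values[i - 1] <= 1
        bits.append(f"{cv >> 1:05b}" if carried_in else f"{cv:06b}")
    joined = "".join(bits)
    joined = joined[:len(joined) - len(joined) % 8]  # drop the zero padding
    return bytes(int(joined[i:i + 8], 2) for i in range(0, len(joined), 8))
```

Under these assumptions `maze62_encode(b"abc")` gives `OM9Z`, the same length as Base64 would produce, because none of the four 6-bit groups happens to leave a remainder of zero or one.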

Analysis

The proposed encoding algorithm runs in linear time, is nearly as dense as Base64, and is fully constrained to the alphanumeric set of characters. The only downside is a somewhat tricky implementation, but it’s not much worse than Base64. As I refine my Python implementation of the algorithm I will post results on the comparative encoding efficiency.

The name

The name of the encoding is an amalgamation of “base62” and the word amazing, hence “maze62”.

What’s next.

I will be publishing the Python source code I have written to test this idea in the coming days. If there is interest, we could organize a public repository of Maze62 implementations in other languages, as well as a set of acceptance tests for independent implementations.

Discussion

In addition to the comments below there’s a discussion going on at Hacker News: http://news.ycombinator.com/item?id=2553753

Further reading

In particular the reference to base85 encoding may prove useful: http://en.wikipedia.org/wiki/Base85

How to get started with creating your own software business.

April 5, 2011

By now I can say I have successfully bootstrapped my own software business, starting with some very vague ideas in 2005. I’m not out of the woods yet, but I have made enough progress to gain confidence in my choice – the most precious of all commodities. If you want to achieve the same but substantially faster than me, I recommend these four things:

  1. Read a book by someone who has done it before many times.
  2. Come listen to him talk, and ask questions.
  3. Listen to a guy who interviewed hundreds of people who became exactly that – creators of their own self-funded software businesses.
  4. Hang out with like-minded people, people who created or are still creating their software business.

That, and a lot of hard work, is all it takes to succeed. But hard work is nothing when you know you’re doing the right thing.

More specifically:

  1. Buy this book: http://www.startupbook.net/
  2. Attend this conference in the beginning of June in Las Vegas: http://www.microconf.com/ Tickets go on sale tomorrow April 6th at 9am PST, and the price will go up in a few days, if not hours. EDIT: tickets are now on sale: http://microconf.eventbrite.com/
  3. Attend this conference in early May in Vancouver, BC: http://www.agilevancouver.ca/lean-startup-conference/conference-agenda/meet-rob-walling/

At the second conference you will get a chance to talk with the book author.

At the first conference you will hear the same book author, who will give you depth, my self-funded start-up hero role model Patrick McKenzie, who will give you inspiration, and Andrew Warner who interviewed hundreds of startup founders, and who will give you breadth. You will also meet a lot of like-minded people who will give you valuable peer support and a confidence boost.

This is not a paid endorsement; I write this out of sheer enthusiasm. I wish I could have paid someone money for all of this back in 2005, as it would have saved me years of my life. Don’t let yours go by.

If you already know me personally, I can give you a ride to Vancouver. If you don’t, but would like to get to know me, come to either conference and I’ll be happy to get acquainted.

Making sense of Amazon EC2 reserved instance savings.

November 3, 2010

Amazon EC2 allows you to prepay a reservation fee in exchange for much lower hourly instance rates. Savings abound.

However, there’s a little problem – if you decide to upgrade to a beefier instance, change the availability zone, or leave AWS altogether, the fee is non-refundable, and you can’t resell your reservation to someone else. So should you pre-pay or not?

I have a simple answer for you: if you’re likely to be using your Linux box for about 8.5 months or more, you should prepay for 3 years, because that’s how many months it takes for the cost of the reservation to amortize through the lower hourly rates. The same threshold for a Windows box is about 7 months.
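The break-even figure comes from dividing the upfront fee by the money saved per month of continuous use. The Python sketch below shows that arithmetic; the function name and the prices in the example are hypothetical placeholders, not Amazon’s actual rates.

```python
HOURS_PER_MONTH = 365 * 24 / 12  # 730 hours in an average month


def months_to_breakeven(upfront_fee, on_demand_rate, reserved_rate):
    """Months of continuous use after which the reservation fee has been
    recovered through the lower hourly rate."""
    hourly_saving = on_demand_rate - reserved_rate
    if hourly_saving <= 0:
        raise ValueError("the reserved rate must be lower than the on-demand rate")
    return upfront_fee / (hourly_saving * HOURS_PER_MONTH)


# Hypothetical prices: $300 fee, $0.10/h on demand, $0.04/h reserved.
print(round(months_to_breakeven(300, 0.10, 0.04), 1))  # prints 6.8
```

Note that the length of the reservation term does not enter the formula; it only matters that the term is longer than the break-even point.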

The whole breakdown can be found in this Google Spreadsheet, which also has other data such as TCO and the 1-year reservation math.

I found that “months-to-breakeven” is a number that’s much easier to grasp than the accurate but rather dry pricing data on Amazon’s pricing page. Hopefully Amazon will add it there, or at least provide their prices as a public dataset so that I don’t need to copy-paste the relevant data together.

The French Bakery hours and locations

October 16, 2010

The French Bakery web site isn’t much help, so I’ll post their hours and locations below in case I need them again.

This custom Google Map gives the most accurate locations, but the addresses below have the advantage of opening native Maps on the iPhone.

Last updated April 4th 2011. Unfortunately, they keep changing their hours.

Downtown Kirkland

219 Kirkland Ave, #102, Kirkland, WA 98033
(425) 898-4510
Mon-Fri 7am-5pm
Sat 7:30am-4pm
Sun 8:30am-4pm

Downtown Bellevue

909 112th Ave NE, #106, Bellevue, WA 98004
(425) 590-9640
Mon-Fri 6am-5pm
Sat 7:30am-4pm
Sun 8:30am-4pm
The menu (crêpes, paninis etc) officially opens at 11am, although the chef might show up as early as 10am.

Crossroads Bellevue

15600 NE 8th St., #K4 Bellevue, WA 98004
(425) 747-0557
Mon-Sat 7am-7pm
Sun 8am-6pm

(technical) A practical way of encrypting backups

September 27, 2010

TL/DR.

Openssl (a payment-free, GUI-free, setup-free tool) can be used to perform asymmetric encryption on large files without hassle. Rather unexpectedly, one needs to use the SMIME command with DER encoding to achieve the desired result.

Problem.

A sysadmin wants to perform his duty[1] to encrypt the off-site backups in his care, and he wishes for a solution which is reliable and easy to use. In particular:

  1. Resilient against hackers.
  2. Resilient against accidental leaks and human errors, even if leaks are partial. People make mistakes.
  3. Resilient against key loss, which equates to data loss and obviously defeats the purpose of doing the backup.
  4. No messy software to install on the server. Ideally, that would be just an extra line in the backup script and maybe a binary which can be simply copied during deployment.
  5. No configuration hassles on a per-server basis. Obviously, no GUIs to click through for each machine. Ideally, no configuration at all, so scripts can be simply copied directly from source control to a bare machine.
  6. It shouldn’t cost an arm and a leg.
  7. The data formats should not become obsolete and unsupported.

The most simple, obvious, and wrong solution.

Symmetric encryption is easy to use – the user comes up with the “symmetric key”, generally a text pass-phrase[2], and that key is then used to both encrypt the data during backups and decrypt it during restore. Popular free tools such as 7-Zip or Openssl make it easy to encrypt backups, and then to decrypt them back (optionally compressing them as well):

7za a myfile.7z myfile.bkp -pPassPhrase
7za e myfile.7z -pPassPhrase

openssl enc -aes-256-cbc -salt -in myfile.bkp -out myfile.bkp.enc -pass pass:PassPhrase
openssl enc -d -aes-256-cbc -in myfile.bkp.enc -out myfile.bkp.unenc -pass pass:PassPhrase

And yet it’s the wrong approach. The problem with symmetric encryption is in its very definition – the same key that is used to encrypt can be used to decrypt.

  • If the key is stored in the source of the script, a source code leak will automatically compromise all backups taken previously.
  • If the key is not stored in the source of the script it has to be provisioned when the server is being setup, hence no simple copy-deployment is possible. Either a manual intervention is required or a separate key-management procedure must be set up and followed diligently.
  • If a machine that is doing periodic backups of itself is compromised, not only that machine’s current data set is compromised, which is generally inevitable[3], but many other backups are compromised too – past backups of the data assumed to be long gone made on the same machine, and all backups made on other machines with the same key.
  • To partially mitigate the previous problem different keys could be used for different servers and during different time periods, but that will lead to explosion in the number of keys which need to be remembered and correctly applied in order to restore the backups. A key-management database will become inevitable at this point.

The right solution.

Asymmetric encryption is a much better alternative for the backup encryption problem – the encryption key is split in two parts, and one part can only be used to encrypt the data, while the other can also decrypt it. The latter is called “private key”, while the former is known as “public key” – there is no risk in exposing it, and so one doesn’t need to worry about any of the problems mentioned in the previous chapter. One would normally be restoring the backups several orders of magnitude less frequently than creating them, and so separating keys needed to encrypt and decrypt the data offers substantial improvement in security and manageability.

The easiest way this author found to implement such solution in practice is to use a free and open-source tool called Openssl. Many systems already have it installed, and if not, the binary can be copied without any installation. [4]

The first step is to generate the private and public keys [5]. This only needs to be done once.

openssl req -x509 -nodes -days 100000 -newkey rsa:2048 -keyout MyCompanyBackupsPRIVATE.pem -out MyCompanyBackupsPublicCert.pem -subj '/'

The private key is placed into the file MyCompanyBackupsPRIVATE.pem, and the public key is in MyCompanyBackupsPublicCert.pem. The two keys can now be used to encrypt the valuable data:

openssl smime -encrypt -aes256 -in ImportantFile.bkp -binary -outform DER -out ImportantFile.bkp.openssl_smime_dem MyCompanyBackupsPublicCert.pem

and to decrypt it:

openssl smime -decrypt -in ImportantFile.bkp.openssl_smime_dem -binary -inform DER -inkey MyCompanyBackupsPRIVATE.pem -out ImportantFile.bkp.decrypted

That is all there is to know – the three simple commands, and the fact that the private key should be stored in a safe place until a backup needs to be restored.
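For a backup script written in Python, the encrypt and decrypt commands could be wrapped roughly as below. This is a sketch under assumptions: the helper names are made up, the `openssl` binary is assumed to be on the PATH, and the format name is spelled DER, which is the spelling the `openssl smime` documentation lists alongside PEM and SMIME.

```python
import subprocess


def encrypt_command(src, dst, *cert_paths):
    """Build the argv for `openssl smime -encrypt`.

    Each path in cert_paths is a recipient certificate (public key)."""
    return ["openssl", "smime", "-encrypt", "-aes256", "-in", src, "-binary",
            "-outform", "DER", "-out", dst, *cert_paths]


def decrypt_command(src, dst, private_key_path):
    """Build the argv for `openssl smime -decrypt` with the private key."""
    return ["openssl", "smime", "-decrypt", "-in", src, "-binary",
            "-inform", "DER", "-inkey", private_key_path, "-out", dst]


def run(argv):
    """Run a command and raise CalledProcessError if openssl reports failure."""
    subprocess.run(argv, check=True)


# Example call from a backup script (not executed here):
# run(encrypt_command("ImportantFile.bkp",
#                     "ImportantFile.bkp.openssl_smime_dem",
#                     "MyCompanyBackupsPublicCert.pem"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names, and only the public certificate ever needs to be present on the machine being backed up.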

Closing remarks.

A script implementing such a scheme would do well to link back to this page for the benefit of subsequent maintainers.

It is possible to specify more than one public key (certificate) in the encryption command and thus encrypt the data for several keys using very little extra space. This would be practical if one key is used for regular everyday operations and is replaced regularly, while another “failsafe” key is kept by the company executives in case of emergency and is never changed.

Downloading: http://www.google.com/search?q=openssl+download

Coming up next: the simplest script to take a set of files, compress them, slice, encrypt, and upload to Amazon S3 for safekeeping.

Footnotes.

[1] Why encrypt? Data is like DNA – once it spills, the residue is nearly impossible to erase, and so it’s best to keep both neatly contained in the first place. “Secure erase” becomes a fiction when remapped sectors, untrained interns, drifting organizational responsibilities, and cost-efficient third-party cloud storage providers enter the picture. Maintaining privacy of a comparatively small set of keys is a lot easier than doing the same for a large set of backup data.

[2] Actual encryption is performed using a binary key, but most tools provide a way to generate one from an easy-to-remember text string, a “pass-phrase”, so for this discussion we can assume the two to be equivalent. It needs to be noted, however, that there is more than one way to translate a text pass-phrase into a binary key, so it is important to use the same tool to decrypt the data that was used to encrypt it.

[3] Leaving aside the TPM-based solutions, which don’t seem to have gained much traction.

[4] Unlike, say, GPG which requires installation and a somewhat involved key import/export procedure.

[5] Technically, this command is generating a private key and a certificate that has the public key inside of it. The distinction, however, is immaterial in this case. Strangely, SMIME command is the easiest way to get asymmetric encryption using Openssl, and it requires a certificate.

So, why did I leave my job at Microsoft?

May 1, 2010

It’s been 6 months since I left my job as a developer at Microsoft (SQL Server) and I would like to finally make good on my promise and explain why I left, what I do now and how it’s working out. Here’s part one.

There were both fundamental causes and triggers for my decision to leave. The immediate trigger was stress – I was simply miserable under the workload that I had willingly brought upon myself. While the stress could have been “managed”, I instead used it to reassess where I’m going in life.

The reasons to leave.

What did I plan to gain by leaving?

Everyday accomplishments.

SQL Server is a huge product and changing things requires a lot of preparation and research – often weeks or months. Because many different kinds of users need to be served, each new change you want to make has to be vetted against a large number of user scenarios. The latter tend to expand as we add features, so we end up with a quadratic explosion in the amount of work that has to be done. This has to become unsustainable at some point.

A smaller product, such as the one I left for, is easier to change. It feels wonderful to be able to get into something, get out of it quickly, and have something to show for it. It also makes me a lot more likely to start things if I know when and how I will finish them, which in turn makes me accomplish more and feel even more confident about taking on other things. A virtuous cycle, if there ever was one.

Actionable feedback.

SQL Server releases a new version every 3-5 years. When I write a chunk of code I will know about most bugs within the next week or so, thanks to comprehensive testing, but I will only learn whether it was good for the customer a couple of years later. By then the next release is already planned and I can’t go back and change how things are done, so the best option is the release after next. So it’s 5-10 years from the time I make a design mistake until I can get it fixed. This sort of feedback is not actionable.

When I just joined Microsoft I used not to care about such things at all, but with age I got more interested in whether people find my work worthwhile – and not just my managers, but the people who actually have to use what I created. I believe that shorter feedback cycles are not only possible in the enterprise market, but are necessary for the financial success of the product and for the emotional well-being of people who work on it. Sadly, I felt alone in that belief.

With the product I am working on right now I get feedback daily. I might be irrational, but it feels fantastic and I can’t ignore that. If I want to change something I can ship a beta (it takes me about 10 mouse clicks and one minute to make a release) and get verbal feedback the next day. The mechanical feedback starts pouring in within hours, which is still far too long in my book and I have a plan to get it to under 30 minutes.

Knowing what I know.

A lot of decisions at Microsoft are made based on common sense, reason, and experience. In the 1980s that was the best we could do, but times have changed and I have grown to no longer trust common sense, my own or anyone else’s. In the age of high-speed global networks we can do better than that – we can use actual data and scientific methods. Strong opinions are great, data is better.

While I don’t know everything about how people use my new product, I have the capacity to learn anything that piques my interest in a matter of weeks, and oftentimes hours. There is no need to have an opinion anymore.

Keeping what I earn.

Working for someone involves making a deal – you give up what you make and the employer pays you a fixed sum of money each month that you worked. This kind of arrangement is great if you like the stability of the paycheck, but there is a downside – you don’t get to accumulate wealth.

Allow me to explain.

If you wrote a reasonably good book and people are buying it, they will keep doing that for years – it’s called passive income. If you are hired to write a book you get some amount of money as you write, but the moment you stop writing you stop getting paid. The difference is huge – what you get paid by an employer is determined largely by the cost of living, while what your book earns is determined by how much value your book has brought to the world. This blade cuts both ways – if you make something poorly the employer will shield you from the fallout, at least for a while, but if you make something great your employer will shield you from that, too. Inevitably, over a longer stretch of time you get paid less than you contribute, and the difference is known as “profit”.

I have grown convinced that the best way to accumulate wealth is to retain meaningful ownership over the fruits of my labor. When I sell my work I should not sell it at cost (wages), but at (the fraction of) the value it brings to the consumer.

The sum of the parts.

In isolation none of the listed reasons would have driven me to leave, and in particular all but the last one could have been remedied by working within Microsoft or another company. However, combined together they make a compelling case for self-employment, the path I have chosen to pursue.

Will I have to eat my own words? Sign up to my blog to find out. I have given myself 18 months from the start to try it out, and I’m already 6 months in.

Hello world!

October 27, 2009

Today, on Oct 27th 2009, I am leaving my 11-year job at Microsoft to run my own business. In this blog I will describe the reasons that led me to do this, how I worked up the courage to take the step, what the business is, and my progress as I make it.

Grab the popcorn, it will be fun. And follow me on twitter for the byte-sized updates: http://twitter.com/altudov

