Producing AVRO messages with PHP for Kafka Connect

Apache Kafka has become an obvious choice and an industry standard for data streaming. When streaming large amounts of data it’s often reasonable to use the AVRO format, which has at least three advantages:

  • it’s one of the most size-efficient formats (compared to JSON, protobuf, or parquet); an AVRO serialized payload can be 10 times smaller than the JSON equivalent,
  • it enforces the usage of a schema,
  • it works out of the box with Kafka Connect (it’s a requirement if you’d like to use the BigQuery sink connector).

Let’s see how to send data to Kafka in AVRO format from a PHP producer, so that Kafka Connect can parse it and write the data to a sink.

Read more

Real-time big data processing with Spark Streaming

Big Data is a trending topic in the IT sector and has been for quite some time. Nowadays vast amounts of data are being produced, especially by web applications, HTTP logs, or Internet of Things devices.

For such volumes, traditional tools like Relational Database Management Systems are no longer suitable. Terabytes or even petabytes are quite common numbers in a big data context, which is definitely not a capacity that MySQL, PostgreSQL, or any other relational database can handle.

To harness huge amounts of data, Apache Hadoop would generally be the first and natural choice, and it’s probably right, with one assumption: Apache Hadoop is a great tool for batch processing. It proved to be extremely successful for many companies, such as Spotify. Their recommendations, radio, playlist workloads, etc. are suitable for batch processing. However, it has one downside – you need to wait for your turn. It usually takes about one day to process everything, scheduled accordingly and executed in a fail-over manner.

But what if we don’t want or can’t wait?

Read more

Type Hinting is important

One of my favorite PHP interview questions is: what is Type Hinting and why is it important? Putting the definition in one sentence, Type Hinting is a way to define the type of a parameter in a function signature, and it’s a sine qua non to leverage polymorphism. Because of dynamic typing in PHP, parameters don’t need to have a type declared. Also, by type here, I mean complex types (class, abstract class, interface, array, closure), not primitives like integer or double.
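
To make the definition concrete, here is a minimal sketch of a type-hinted constructor parameter. The names Logger, MemoryLogger, NullLogger and Greeter are hypothetical and exist only for this illustration; the point is that Greeter can rely on the log() method because the hint restricts the argument to implementations of the interface.

```php
<?php

// Hypothetical interface and classes, for illustration only.
interface Logger
{
    public function log($message);
}

class MemoryLogger implements Logger
{
    public $lines = array();

    public function log($message)
    {
        // Keeps messages in memory so they can be inspected later.
        $this->lines[] = $message;
    }
}

class NullLogger implements Logger
{
    public function log($message)
    {
        // Intentionally discards the message.
    }
}

class Greeter
{
    private $logger;

    // The type hint accepts any Logger implementation and nothing else.
    public function __construct(Logger $logger)
    {
        $this->logger = $logger;
    }

    public function greet($name)
    {
        $this->logger->log('Greeting ' . $name);

        return 'Hello, ' . $name . '!';
    }
}

$logger = new MemoryLogger();
$greeter = new Greeter($logger);
echo $greeter->greet('Anna'), PHP_EOL;

// Swapping the implementation requires no change in Greeter.
$quietGreeter = new Greeter(new NullLogger());
echo $quietGreeter->greet('Bob'), PHP_EOL;
```

Passing anything that does not implement Logger (a string, for instance) is rejected by PHP at call time, so Greeter never has to check what it received.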

Read more

Immutable value objects in PHP

Value objects are one of the building blocks of Domain Driven Design. They represent a value and do not have an identity. That said, two value objects are equal if their values are equal.

Another important feature is that Value Objects are immutable, i.e. they cannot be modified after creation. The only valid way to create a Value Object is to pass all the required information to the constructor (where it should also be validated). There should be no setter methods.
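
The rules above can be sketched as follows. Money is a hypothetical class chosen for this illustration, and the validation rules (non-negative integer amount, three-letter currency code) are assumptions, not requirements of the pattern.

```php
<?php

// Hypothetical value object, for illustration only.
final class Money
{
    private $amount;
    private $currency;

    // All required data comes in through the constructor and is validated here.
    public function __construct($amount, $currency)
    {
        if (!is_int($amount) || $amount < 0) {
            throw new InvalidArgumentException('Amount must be a non-negative integer.');
        }
        if (!is_string($currency) || strlen($currency) !== 3) {
            throw new InvalidArgumentException('Currency must be a three-letter code.');
        }

        $this->amount = $amount;
        $this->currency = strtoupper($currency);
    }

    public function getAmount()
    {
        return $this->amount;
    }

    public function getCurrency()
    {
        return $this->currency;
    }

    // Equality is based on values, not on object identity.
    public function equals(Money $other)
    {
        return $this->amount === $other->amount
            && $this->currency === $other->currency;
    }

    // No setters: an operation returns a new instance instead of changing this one.
    public function add(Money $other)
    {
        if ($this->currency !== $other->currency) {
            throw new InvalidArgumentException('Currencies must match.');
        }

        return new Money($this->amount + $other->amount, $this->currency);
    }
}

$a = new Money(500, 'EUR');
$b = new Money(500, 'eur');
$sum = $a->add($b);

var_dump($a->equals($b));   // two distinct objects, same values
var_dump($sum->getAmount()); // $a and $b themselves are unchanged
```

Because the properties are private and nothing writes to them after construction, an instance that passed validation once stays valid for its whole lifetime.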

Read more

Software developers care too much about tools

Lately I see a perilous situation in the software development area. There are plenty of good devs who are too tightly bound to their tools. By tools, I mean mostly frameworks. I would like to elaborate a bit on that, but these are my personal opinions and they aren’t meant to offend anyone.

First of all, we all need to admit that the quality of modern MVC frameworks has risen a lot compared with the state of things a few years ago. Speaking about PHP: at the time when I turned my attention to this language, it was pure wilderness. We did not have any strong framework (unlike Ruby on Rails, which was the sine qua non choice for Ruby web development). That led to many competing projects being developed; some of them are dead now (or should be), some haven’t gained good market adoption, and some are industry leaders at the moment (Symfony and Zend).

Read more

Testing in isolation with Symfony2 and WebTestCase

It’s extremely important to have the same state of the System Under Test for every test. In most cases this is possible by having the same contents in the database for every test. I described how to achieve it in the Fully isolated tests in Symfony2 blog post about two years ago (btw. it’s the most popular post on this blog). That was at a time when PHP’s Traits weren’t that popular.

Read more

Injecting repositories to service in Symfony2

It is generally a good idea to wrap business logic into services. Often, such services’ methods use Doctrine’s repositories to operate on data storage. Injecting the whole EntityManager service is a very popular approach, but it isn’t the most elegant way I could think of. EntityManager works only as a factory in that case, and it invites the use of other repositories, which might leave the given service with too many responsibilities.
The better way is to inject single repositories, using the factory-service mechanism provided by the Dependency Injection Container.

Read more

Android Meteoapp released as Open Source

More than a year ago I played a little with the Android Java SDK and created a proof of concept of the Meteoapp application. It fetches meteograms from new.meteo.pl, cuts them into 6 parts and displays the chosen parts for a given city. Meteoapp uses the meteo library to interact with new.meteo.pl. This library has also been open sourced.

Read more

Gender guessing based on name in PHP

Today I discovered a neat PHP extension named Gender: http://www.php.net/manual/en/book.gender.php. It determines gender based on name and country. The code below will probably be useful at some point in your software development career 🙂

Read more

Thoughts after Symfony Live Berlin 2012

Lately I had a great opportunity to attend the Symfony Live conference in Berlin. It was our second chance to meet the Symfony community after the successful edition in London this year.

Read more