Working with large amounts of data can be difficult because of the performance overhead of moving that data between the various tools and systems that make up a data processing pipeline. Different programming languages, file formats, and network protocols each represent the same data differently in memory, so the data has to be serialized and deserialized into a new representation at potentially every step of the pipeline. As pipelines grow, this repeated conversion makes processing slower and more expensive in terms of hardware.
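To make that cost concrete, here is a minimal, hypothetical sketch (not taken from the original text) that times a single hand-off between two pipeline steps using JSON as the interchange format. The record layout and sizes are illustrative assumptions; the point is that every additional tool boundary repeats this kind of serialize/deserialize round trip.

```python
import json
import time

# Hypothetical pipeline: step 1 produces records, step 2 consumes them,
# and the only way to cross the boundary is a textual interchange format.
records = [{"id": i, "value": i * 0.5, "label": f"row-{i}"} for i in range(1_000_000)]

start = time.perf_counter()
payload = json.dumps(records)    # step 1 hands the data off (serialize)
received = json.loads(payload)   # step 2 rebuilds its own in-memory copy (deserialize)
elapsed = time.perf_counter() - start

print(f"one hand-off of {len(received):,} records took {elapsed:.2f}s "
      f"and produced {len(payload) / 1e6:.1f} MB of intermediate text")
```

A real pipeline might use a binary format instead of JSON, but the pattern is the same: each hop pays CPU time and memory to translate between representations of data that has not actually changed.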